Authors: Songxiang Liu, Disong Wang, Yuewen Cao, Lifa Sun, Xixin Wu, Shiyin Kang, Xunying Liu and Helen Meng
Abstract: Techniques for accent conversion (AC) aim to convert non-native-accented speech into native-accented speech. Conventional AC methods convert only the speaker identity of a native speaker's voice to that of the non-native-accented target speaker, leaving the underlying content and pronunciation unchanged. This hinders their use in real-world applications, because native-accented utterances are required at the conversion stage. In this paper, we present an end-to-end framework that can perform AC on non-native-accented utterances without using any native-accented utterances during online conversion. We achieve this by independently extracting linguistic and speaker representations from non-native-accented speech and conditioning a speech synthesis model on these representations to generate native-accented speech. Experiments on open-source data corpora show that the proposed system can convert Hindi-accented English speech into native-accented speech with high naturalness, and that the converted speech is indistinguishable from native-accented recordings in terms of accent.
System Description
Baseline: L2 (non-native) phonetic posteriorgrams (PPGs) are first converted to L1 (native) PPGs, from which the converted speech is synthesized.
Proposed: The proposed end-to-end accent conversion approach.
Ablation: The proposed approach with the accent embedding and accent classification removed.
System Comparison
1. Text content: "Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob."
Non-native Accented Speech
Baseline
Ablation
Proposed
2. Text content: "SCOTLAND has shown the way."
Non-native Accented Speech
Baseline
Ablation
Proposed
3. Text content: "She is given a new deputy minister for transport and planning."
Non-native Accented Speech
Baseline
Ablation
Proposed
4. Text content: "We must provide a long-term solution to tackle this attitude."
Non-native Accented Speech
Baseline
Ablation
Proposed
5. Text content: "She is believed to be in South Africa."
Non-native Accented Speech
Baseline
Ablation
Proposed
6. Text content: "That's all right then."
Non-native Accented Speech
Baseline
Ablation
Proposed
7. Text content: "Health Secretary Frank Dobson made the surprise announcement in the Commons yesterday."
Non-native Accented Speech
Baseline
Ablation
Proposed
8. Text content: "I had relied on him."
Non-native Accented Speech
Baseline
Ablation
Proposed
9. Text content: "He sees a difference in the style of the two teams."
Non-native Accented Speech
Baseline
Ablation
Proposed
10. Text content: "There is a lack of chemistry."
Non-native Accented Speech
Baseline
Ablation
Proposed
11. Text content: "That could mean the difference between life and death in action."
Non-native Accented Speech
Baseline
Ablation
Proposed
12. Text content: "It wasn't to be."
Non-native Accented Speech
Baseline
Ablation
Proposed
13. Text content: "It's just awful."
Non-native Accented Speech
Baseline
Ablation
Proposed
14. Text content: "It is set in Paris."
Non-native Accented Speech
Baseline
Ablation
Proposed
15. Text content: "It is no surprise."
Non-native Accented Speech
Baseline
Ablation
Proposed
16. Text content: "It is also very valuable."
Non-native Accented Speech
Baseline
Ablation
Proposed
17. Text content: "We have to be sure that the taxation system can work."
Non-native Accented Speech
Baseline
Ablation
Proposed
18. Text content: "It's the last thing on my mind."
Non-native Accented Speech
Baseline
Ablation
Proposed
19. Text content: "To do so he reckons that a good opening result is essential."
Non-native Accented Speech
Baseline
Ablation
Proposed
20. Text content: "We also need a small plastic snake and a big toy frog for the kids."