Audio Samples from "End-to-End Accent Conversion"

Authors: Songxiang Liu, Disong Wang, Yuewen Cao, Lifa Sun, Xixin Wu, Shiyin Kang, Xunying Liu and Helen Meng

Abstract: Techniques for accent conversion (AC) aim to convert non-native-accented speech into native-accented speech. Conventional AC methods try to convert only the speaker identity of a native speaker's voice to that of the non-native-accented target speaker, leaving the underlying content and pronunciation unchanged. This hinders their use in real-world applications, because native-accented utterances are required at the conversion stage. In this paper, we present an end-to-end framework that can conduct AC from non-native-accented utterances without using any native-accented utterances during online conversion. We achieve this by independently extracting linguistic and speaker representations from non-native-accented speech and conditioning a speech synthesis model on these representations to generate native-accented speech. Experiments on open-source data corpora show that the proposed system can convert Hindi-accented English speech into native-accented speech with high naturalness, and that the converted speech is indistinguishable from native-accented recordings in terms of accent.


System Description

Baseline: L2 (non-native) phonetic posteriorgrams (PPGs) are first converted to L1 (native) PPGs, from which the converted speech is generated.

Proposed: The proposed end-to-end accent conversion approach.

Ablation: The proposed approach with the accent embedding and accent classification removed.
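
To make the difference between the Proposed and Ablation systems concrete, the data flow can be sketched roughly as below. This is only a toy illustration under stated assumptions, not the authors' implementation: the encoders and synthesizer are random linear maps, all dimensions are hypothetical, and feeding the accent embedding to the synthesizer by concatenation is an assumption made here for illustration. Accent classification and the Baseline's PPG mapping are not modeled.

```python
import numpy as np

# All sizes and weights are hypothetical placeholders, not values from the paper.
N_MELS, LING_DIM, SPK_DIM, ACC_DIM = 80, 32, 16, 8
rng = np.random.default_rng(0)
W_LING = rng.standard_normal((N_MELS, LING_DIM)) * 0.1
W_SPK = rng.standard_normal((N_MELS, SPK_DIM)) * 0.1
W_SYN_FULL = rng.standard_normal((LING_DIM + SPK_DIM + ACC_DIM, N_MELS)) * 0.1
W_SYN_ABLATION = rng.standard_normal((LING_DIM + SPK_DIM, N_MELS)) * 0.1
# Stand-in for a learned native-accent embedding (assumption).
NATIVE_ACCENT = rng.standard_normal(ACC_DIM)


def linguistic_encoder(mel):
    """Frame-level content features: (T, N_MELS) -> (T, LING_DIM)."""
    return np.tanh(mel @ W_LING)


def speaker_encoder(mel):
    """Utterance-level speaker vector: (T, N_MELS) -> (SPK_DIM,)."""
    return np.tanh(mel.mean(axis=0) @ W_SPK)


def convert(mel, use_accent_embedding=True):
    """Condition a toy synthesizer on features taken only from the input speech."""
    ling = linguistic_encoder(mel)
    spk = speaker_encoder(mel)
    num_frames = ling.shape[0]
    # Broadcast utterance-level vectors to every frame before concatenating.
    parts = [ling, np.tile(spk, (num_frames, 1))]
    if use_accent_embedding:
        parts.append(np.tile(NATIVE_ACCENT, (num_frames, 1)))
    cond = np.concatenate(parts, axis=1)
    weights = W_SYN_FULL if use_accent_embedding else W_SYN_ABLATION
    return cond @ weights  # (T, N_MELS) toy "converted" features


# Random features stand in for a non-native-accented utterance.
source_mel = rng.standard_normal((120, N_MELS))
proposed_out = convert(source_mel, use_accent_embedding=True)
ablation_out = convert(source_mel, use_accent_embedding=False)
print(proposed_out.shape, ablation_out.shape)
```

In both calls the only input is the non-native-accented utterance itself; no native-accented reference utterance is passed in at conversion time, which is the property the abstract emphasizes.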


System Comparison

1. Text content: "Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob."

Non-native Accented Speech
Baseline Ablation Proposed

2. Text content: "SCOTLAND has shown the way."

Non-native Accented Speech
Baseline Ablation Proposed

3. Text content: "She is given a new deputy minister for transport and planning."

Non-native Accented Speech
Baseline Ablation Proposed

4. Text content: "We must provide a long-term solution to tackle this attitude."

Non-native Accented Speech
Baseline Ablation Proposed

5. Text content: "She is believed to be in South Africa."

Non-native Accented Speech
Baseline Ablation Proposed

6. Text content: "That's all right then."

Non-native Accented Speech
Baseline Ablation Proposed

7. Text content: "Health Secretary Frank Dobson made the surprise announcement in the Commons yesterday."

Non-native Accented Speech
Baseline Ablation Proposed

8. Text content: "I had relied on him."

Non-native Accented Speech
Baseline Ablation Proposed

9. Text content: "He sees a difference in the style of the two teams."

Non-native Accented Speech
Baseline Ablation Proposed

10. Text content: "There is a lack of chemistry."

Non-native Accented Speech
Baseline Ablation Proposed

11. Text content: "That could mean the difference between life and death in action."

Non-native Accented Speech
Baseline Ablation Proposed

12. Text content: "It wasn't to be."

Non-native Accented Speech
Baseline Ablation Proposed

13. Text content: "It's just awful."

Non-native Accented Speech
Baseline Ablation Proposed

14. Text content: "It is set in Paris."

Non-native Accented Speech
Baseline Ablation Proposed

15. Text content: "It is no surprise."

Non-native Accented Speech
Baseline Ablation Proposed

16. Text content: "It is also very valuable."

Non-native Accented Speech
Baseline Ablation Proposed

17. Text content: "We have to be sure that the taxation system can work."

Non-native Accented Speech
Baseline Ablation Proposed

18. Text content: "It's the last thing on my mind."

Non-native Accented Speech
Baseline Ablation Proposed

19. Text content: "To do so he reckons that a good opening result is essential."

Non-native Accented Speech
Baseline Ablation Proposed

20. Text content: "We also need a small plastic snake and a big toy frog for the kids."

Non-native Accented Speech
Baseline Ablation Proposed