I received the Ph.D. degree from the Human-Computer Communications Laboratory (HCCL) at The Chinese University of Hong Kong, supervised by Prof. Helen Meng. Before that, I obtained the B.Eng. degree in Automation from Department of Control Science and Engineering at Zhejiang University. I did a several-month summer visiting in 2019 at Speech Processing and Machine Learning Lab at National Taiwan University, advised by Prof. Hung-yi Lee, working on adversarial attacks on ASVspoofing countermeasure systems and unsupervised ASR using GAN-based models.

My research interests encompass the extensive domain of Multi/Omni-Modal LLM and speech and language intelligence, which includes speech foundation models, large language models (LLMs), text-to-speech synthesis (TTS), voice conversion (VC), singing synthesis, cross-modal representation learning, audio adversarial attacks \& defense, among other related areas.

工作经历 Working Experiences

腾讯AI Lab -> 米哈游 -> 月之暗面 (Kimi) -> 美团

Tencent AI Lab -> miHoYo -> Moonshot AI (Kimi) -> Meituan

我们正在积极寻找实习生和全职研究人员，从事多模态融合及多模态实时交互算法研究，欢迎感兴趣的人联系我！songxiangliu.cuhk艾特gmail.com

News

2025.05: “ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling “[Arxiv] accepted by ICML 2025.
2025.04.25: 👋 We release the technical report of Kimi-Audio. [Code][Model][Paper].
2024.05: The paper of “UniAudio: Towards Universal Audio Generation with Large Language Models” is accepted at ICML 2024.
2024.04: The paper of “InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt” is accepted by IEEE/ACM Transactions on Audio Speech and Language Processing.
2023.02: Our new work on prompt-based expressive TTS – “InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt” is available [Demo][arXiv].
2022.02: Paper introducing DiffGAN-TTS is available from [arXiv].
2021.09: DiffSVC paper will appear in ASRU2021.
2021.05: Our new work on singing voice conversion with the denoising diffusion probabilistic model (DDPM)[Demo][arXiv].
2021.03: Our FastSVC paper has been accepted as an oral paper in ICME 2021.

Selected Publications (Full list)

Journal Papers

* indicates equal contributions.

Dongchao Yang*, Songxiang Liu*, Rongjie Huang, Chao Weng, Helen Meng, InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt, arXiv:2301.13662, accepted by IEEE/ACM Transactions on Audio Speech and Language Processing, 2024. (Correspondence author)
Songxiang Liu, Yuewen Cao, Disong Wang, Xixin Wu, Xunying Liu, Helen Meng, Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling, in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1717-1728, 2021.
Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Helen Meng, Exemplar-Based Emotive Speech Synthesis, in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 874-886, 2021.
Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Disong Wang, Zhiyong Wu, Xunying Liu, Helen Meng, Speech Emotion Recognition using Sequential Capsule Networks, accepted by IEEE/ACM Transactions on Audio Speech and Language Processing, vol. 29, pp. 874-886, 2021.

Conference Papers

* indicates equal contributions.

Dongchao Yang, Jinchuan Tian, Xu Tan, Rongjie Huang, Songxiang Liu, Haohan Guo, Xuankai Chang, Jiatong Shi, Sheng Zhao, Jiang Bian, Zhou Zhao, Xixin Wu, Helen Meng, UniAudio: Towards Universal Audio Generation with Large Language Models, ICML 2024.
Xiang Li, Songxiang Liu, Max W. Y. Lam, Zhiyong Wu, Chao Weng, Helen Meng, Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model, ISCA Proceedings of Interspeech 2023 (The ISCA Best Student Paper Award).
Sipan Li, Songxiang Liu, Luwen Zhang, Xiang Li, Yanyao Bian, Chao Weng, Zhiyong Wu, Helen Meng, SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias, 2023 IEEE International Conference on Multimedia and Expo (ICME), 1703-1708.
Songxiang Liu, Dan Su, Dong Yu, DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs, 2022 International Conference on Machine Learning (ICML) Workshop on Machine Learning for Audio Synthesis.
Songxiang Liu, Shan Yang, Dan Su, Dong Yu, Referee: towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022.
Songxiang Liu*, Yuewen Cao*, Dan Su, Helen Meng, DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion, IEEE ASRU 2021.
Songxiang Liu, Yuewen Cao, Na Hu, Dan Su, Helen Meng, FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation, 2021 IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6, doi: 10.1109/ICME51207.2021.9428161. (oral)
Songxiang Liu, Yuewen Cao, Shiyin Kang, Na Hu, Xunying Liu, Dan Su, Dong Yu, Helen Meng, Transferring Source Style in Non-Parallel Voice Conversion, ISCA INTERSPEECH 2020.
Songxiang Liu, Disong Wang, Yuewen Cao, Lifa Sun, Xixin Wu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng, “End-to-End Accent Conversion Without Using Native Utterances, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020.
Haibin Wu*, Songxiang Liu*, Helen Meng, Hung-yi Lee, Defense against adversarial attacks on spoofing countermeasures of ASV”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2020.
Songxiang Liu, Haibin Wu, Hung-yi Lee, Helen Meng, Adversarial attacks on spoofing countermeasures of automatic speaker verification, IEEE ASRU 2019.
Songxiang Liu, Yuewen Cao, Xixin Wu, Lifa Sun, Xunying Liu, Helen Meng, Jointly Trained Conversion Model and WaveNet Vocoder for Non-Parallel Voice Conversion Using Mel-Spectrograms and Phonetic Posteriorgrams, ISCA Proceedings of Interspeech 2019.
Songxiang Liu, Jinghua Zhong, Lifa Sun, Xixin Wu, Xunying Liu, Helen Meng, Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance, ISCA Proceedings of Interspeech 2018. (oral)
Songxiang Liu, Lifa Sun, Xixin Wu, Xunying Liu, Helen Meng, The HCCL-CUHK System for the Voice Conversion Challenge 2018, ISCA Speech Odyssey 2018.

Selected Preprints and Technical Reports

* indicates equal contributions.

Dongchao Yang*, Songxiang Liu*, Rongjie Huang*, Jinchuan Tian, Chao Weng, Yuexian Zou, Hifi-codec: Group-residual vector quantization for high fidelity audio codec, arXiv preprint arXiv:2305.02765. 2023.
Songxiang Liu, Dan Su, Dong Yu, Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning, Tech. report, 2021.
Songxiang Liu, Yuewen Cao, Helen Meng, Multi-Target Emotional Voice Conversion With Neural Vocoders, in arXiv:2004.03782, work done in 2018
Songxiang Liu, Yuewen Cao, Helen Meng, Emotional Voice Conversion With Cycle-consistent Adversarial Network, in arXiv:2004.03781, work done in 2018

Internship

Tencent AI Lab (09/2019-12/2022): Research Intern, conducting research on singing voice conversion \& synthesis, accent conversion and voice conversion, advised by Dr. Shiying Kang.

Professional Services

Organizer of Codec SUPERB Challenge @ SLT 2024.
Organizer of Singing Voice Conversion Challenge 2023, the summary was presented at a special session during IEEE ASRU 2023.
Serve as the reviewer of ICLR, COLING, IEEE/ACM Tran. ASLP, ICASSP, ASRU, Signal Processing Letters, INTERSPEECH, Speaker Odyssey Workshop and so on.

Honors and Awards

2023: The ISCA Best Student Paper Award (As the mentor of the first author).
2018: Best paper award in the International Doctoral Forum 2018, Shenzhen, China

Songxiang Liu (刘颂湘)

工作经历 Working Experiences

News

Selected Publications (Full list)

Journal Papers

Conference Papers

Selected Preprints and Technical Reports

Internship

Professional Services

Honors and Awards