News

2025.11: 🔥 We introduce LongCat-Flash-Omni, a SOTA open-source omni-modal model with 560B-A27B parameters that excels at real-time audio-visual interaction. Tech Report [arXiv], GitHub [Code], Model [Weights].
2025.05: “ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling” [arXiv] is accepted by ICML 2025.
2025.04.25: 👋 We release the technical report of Kimi-Audio. [Code][Model][Paper].
2024.05: Our paper “UniAudio: Towards Universal Audio Generation with Large Language Models” is accepted at ICML 2024.
2024.04: Our paper “InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt” is accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing.
2023.02: Our new work on prompt-based expressive TTS, “InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt”, is available [Demo][arXiv].
2022.02: The paper introducing DiffGAN-TTS is available on [arXiv].
2021.09: Our DiffSVC paper will appear at ASRU 2021.
2021.05: Our new work on singing voice conversion with denoising diffusion probabilistic models (DDPMs) is available [Demo][arXiv].
2021.03: Our FastSVC paper has been accepted as an oral paper in ICME 2021.