Text-to-speech, voice cloning, music generation, and audio processing
Robust speech recognition via large-scale weak supervision. State-of-the-art multilingual speech-to-text.
Deep learning toolkit for text-to-speech, battle-tested in research and production.
Transformer-based text-to-audio model by Suno. Generates speech, music, and sound effects.
Instant voice cloning with granular tone color, accent, and style control.
Retrieval-based Voice Conversion. Real-time voice cloning with minimal data.
Generate complete songs from text prompts. Leading AI music generation platform.
Meta's library for audio generation: MusicGen, AudioGen, and EnCodec.