Published On Oct 19, 2024
Meta has released Spirit-LM, a research-only audio language model. It accepts both text and audio as input and can generate both text and audio as output.
Meta Spirit LM is trained on speech and text datasets with a word-level interleaving method, which enables cross-modality generation.
Spirit LM lets people generate more natural-sounding speech, and it can learn new tasks across modalities, such as automatic speech recognition, text-to-speech, and speech classification.
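As a rough illustration of what word-level interleaving means, here is a toy Python sketch. It assumes each word already comes with an aligned list of discrete speech units, and it switches between text and speech only at word boundaries. The function name, the marker strings, and the unit names (u12, u7, ...) are placeholders made up for this example; this is not the Spirit LM code or its real tokenizer.

```python
from typing import List, Sequence

# Assumed marker names for this sketch; the real model's special tokens may differ.
TEXT_MARK = "[TEXT]"
SPEECH_MARK = "[SPEECH]"


def interleave(
    words: Sequence[str],
    speech_units: Sequence[Sequence[str]],
    modalities: Sequence[str],
) -> List[str]:
    """Build one token stream that switches modality at word boundaries.

    words[i]        -> the text form of word i
    speech_units[i] -> the (hypothetical) speech units aligned to word i
    modalities[i]   -> "text" or "speech": which form of word i to emit
    """
    if not (len(words) == len(speech_units) == len(modalities)):
        raise ValueError("all three inputs must have one entry per word")

    stream: List[str] = []
    current = None  # modality of the previously emitted word
    for word, units, modality in zip(words, speech_units, modalities):
        if modality not in ("text", "speech"):
            raise ValueError(f"unknown modality: {modality!r}")
        # Emit a marker only when the modality changes.
        if modality != current:
            stream.append(TEXT_MARK if modality == "text" else SPEECH_MARK)
            current = modality
        if modality == "text":
            stream.append(word)
        else:
            stream.extend(units)
    return stream


words = ["the", "cat", "sat"]
speech_units = [["u12", "u7"], ["u3", "u3", "u41"], ["u9"]]
stream = interleave(words, speech_units, ["text", "speech", "speech"])
print(stream)
```

In this toy setup a single sequence mixes written words and speech units, so one language model can be trained on it with ordinary next-token prediction.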
Spirit-LM blog post:
https://ai.meta.com/blog/fair-news-se...
Spirit-LM research paper:
https://arxiv.org/abs/2402.05755
Spirit LM code:
https://github.com/facebookresearch/s...
Spirit-LM developer’s video:
• SpiRit-LM, an Interleaved Spoken and ...
Follow on Twitter: https://x.com/digi_decode