Published On Apr 28, 2023
Original paper:
https://arxiv.org/pdf/2107.05604.pdf
Summary:
Uses a Transformer encoder-decoder architecture
Uses multi-headed attention
Uses target phonemes as extra (auxiliary) supervision
Predicts discrete units as the model output
Experiments include Spanish-to-English translation
Published in 2022
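To make the "multi-headed attention" point above concrete, here is a minimal NumPy sketch of scaled dot-product multi-head self-attention; this is a generic illustration, not the paper's actual fairseq implementation, and all names (weight matrices `Wq`/`Wk`/`Wv`/`Wo`, head count) are assumed for the example:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads):
    """Self-attention over x of shape (seq_len, d_model); weights are (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Project to queries/keys/values and split into heads: (num_heads, seq_len, d_head).
    q = (x @ Wq).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention per head: (num_heads, seq_len, seq_len).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    # Weighted sum of values, merge heads back, apply output projection.
    out = (weights @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

rng = np.random.default_rng(0)
seq_len, d_model, heads = 5, 8, 2
x = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4))
y = multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads=heads)
print(y.shape)  # (5, 8)
```

In the actual S2UT model, attention layers like this sit inside the Transformer encoder-decoder, and the decoder predicts discrete units (cluster indices of self-supervised speech representations) rather than spectrogram frames.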
Other resources:
/ advancing-direct-speech-to-speech-modeling...
What is self supervised learning?
https://arxiv.org/pdf/2304.12210.pdf
Code:
How to use model for training and evaluation: https://github.com/facebookresearch/f...
Code for the actual S2UT model: https://github.com/facebookresearch/f... (search for s2ut_transformer)