S2UT: Direct Speech-to-Speech Translation with Discrete Units
Prabhjot Gosal

Published on Apr 28, 2023

Original paper:
https://arxiv.org/pdf/2107.05604.pdf

Summary:
Uses a Transformer encoder-decoder architecture
Uses multi-head attention
Uses target phonemes for extra supervision
Predicts discrete units as the model output
Experiments include Spanish-to-English translation
Paper published in 2022
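
Toy example of what "discrete units" means:
The snippet below is only a rough sketch, not the paper's or fairseq's code. It assumes units come from assigning each speech feature frame to its nearest cluster centroid and then collapsing consecutive repeats. The centroids, the frames, and the function names (nearest_centroid, frames_to_units, collapse_repeats) are all made up for illustration.

```python
from itertools import groupby


def nearest_centroid(frame, centroids):
    """Return the index of the centroid closest to `frame` (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((f - c) ** 2 for f, c in zip(frame, centroid))

    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))


def frames_to_units(frames, centroids):
    """Map every feature frame to a discrete unit ID (its nearest centroid index)."""
    return [nearest_centroid(frame, centroids) for frame in frames]


def collapse_repeats(units):
    """Merge runs of identical consecutive units into a single unit."""
    return [unit for unit, _ in groupby(units)]


# Hypothetical 2-D centroids and frames; real speech features have far more dimensions.
centroids = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
frames = [[0.1, -0.1], [0.2, 0.0], [0.9, 1.2], [4.8, 5.1], [5.2, 4.9], [1.1, 0.8]]

units = frames_to_units(frames, centroids)
reduced = collapse_repeats(units)

print(units)    # [0, 0, 1, 2, 2, 1]
print(reduced)  # [0, 1, 2, 1]
```

A sequence of integer IDs like this is what the decoder would be trained to predict instead of a spectrogram; turning the units back into audio needs a separate vocoder, which is not shown here.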

Other resources:
  / advancing-direct-speech-to-speech-modeling...  

What is self supervised learning?
https://arxiv.org/pdf/2304.12210.pdf

Code:
How to use the model for training and evaluation: https://github.com/facebookresearch/f...
Code for the S2UT model itself: https://github.com/facebookresearch/f... (search for s2ut_transformer)
