Accepted as a conference paper at the 38th International Conference on Machine Learning (ICML 2021).
Authors: Xuan-Phi Nguyen, Shafiq Joty, Thanh-Tung Nguyen, Wu Kui, Ai Ti Aw
Paper link: https://arxiv.org/abs/1911.01986
Citation
Please cite as:
```bibtex
@incollection{nguyen2021cbd,
  title = {Cross-model Back-translated Distillation for Unsupervised Machine Translation},
  author = {Xuan-Phi Nguyen and Shafiq Joty and Thanh-Tung Nguyen and Wu Kui and Ai Ti Aw},
  booktitle = {38th International Conference on Machine Learning},
  year = {2021},
}
```
These guidelines demonstrate the steps to run CBD on the WMT En-De dataset.
Finetuned models

Model | Train Dataset | Finetuned model |
---|---|---|
WMT En-Fr | WMT English-French | model: download |
WMT En-De | WMT English-German | model: download |
0. Installation
```bash
./install.sh
pip install fairseq==0.8.0 --progress-bar off
```
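If the installation succeeded, fairseq should be importable at the pinned version. A minimal sanity check (not part of the original instructions, just a quick way to verify the environment):

```bash
# Optional check: confirm the pinned fairseq version is importable.
python -c "import fairseq; print(fairseq.__version__)"  # expect: 0.8.0
```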
1. Prepare data
Follow the instructions from the MASS paper to create the WMT En-De dataset.
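For reference, the MASS repository prepares unsupervised NMT data with an XLM-style script. The sketch below assumes that pipeline; the script name, flags, and the `codes_ende`/`vocab_ende` file names come from the MASS/XLM repos and may have changed since:

```bash
# Sketch, assuming the MASS-unsupNMT data pipeline; adjust paths and flags to your setup.
git clone https://github.com/microsoft/MASS.git
cd MASS/MASS-unsupNMT
./get-data-nmt.sh --src en --tgt de --reload_codes codes_ende --reload_vocab vocab_ende
```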
2. Prepare pretrained models

- Download the XLM finetuned model (theta_1) here, and save its path to a bash variable: `export xlm_path=...`
- Download the MASS finetuned model (theta_2) here, and save its path: `export mass_path=...`
- Download the XLM pretrained model (theta) here, and save its path: `export pretrain_path=...`
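Putting the three exports together (the paths below are placeholders, not the actual checkpoint file names):

```bash
# Placeholder paths; point each variable at the corresponding downloaded checkpoint.
export xlm_path=/path/to/xlm_finetuned.pth        # theta_1: XLM finetuned model
export mass_path=/path/to/mass_finetuned.pth      # theta_2: MASS finetuned model
export pretrain_path=/path/to/xlm_pretrained.pth  # theta:   XLM pretrained model
```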
3. Run CBD model
```bash
# you may change the inputs in the file according to your context
bash run_ende.sh
```
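A hypothetical end-to-end invocation, assuming `run_ende.sh` reads the variables exported in step 2 (check the script itself for the exact variable names it expects):

```bash
# Hypothetical full run; the variable names must match what run_ende.sh actually reads,
# and the model paths are illustrative placeholders.
export xlm_path=$HOME/models/xlm_ft_ende.pth
export mass_path=$HOME/models/mass_ft_ende.pth
export pretrain_path=$HOME/models/xlm_pretrain_ende.pth
bash run_ende.sh
```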