Outline of Research at the Start
Machine translation has achieved significant advances during the last decade thanks to deep learning technologies and the establishment of neural machine translation (NMT). However, noisy user-generated content (UGC), for instance from online social networks, can still cause disastrous mistranslations in most NMT systems. NMT for UGC remains an under-studied and challenging topic. This research will create new datasets of UGC for evaluating state-of-the-art NMT systems and will propose new methods to improve NMT for UGC through unsupervised machine translation and style-transfer technologies.
Outline of Annual Research Achievements
The main achievement of the second year of this research is a new method for extending monolingual data in a low-resource domain and style (e.g., tweets on the topic of COVID-19) into a much larger corpus for training NMT. For instance, given a small set of Japanese tweets (e.g., 1,000 tweets) about the COVID-19 crisis, which is too small to train NMT, the method artificially extends it to millions of tweets on the same topic, making it useful for training better NMT systems for translating tweets. Training NMT on this artificial data yields systems that translate better even in domains and styles for which very little data is available. Experiments have been successfully conducted in various domains and styles (medical, IT, news, tweets, online discussions) and languages (French, German, Japanese). This work has also been extended to "personalizing" NMT, i.e., adapting NMT so that it translates texts written by a specific person while preserving that person's writing style.
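The report does not specify how the monolingual data is extended, so the snippet below is only an illustrative sketch of one common way such an extension could be implemented: sampling synthetic in-domain sentences from a causal language model that is assumed to have been fine-tuned beforehand on the small seed set of tweets. The checkpoint path, prompt, and sampling parameters are placeholders and are not part of the reported method; the resulting synthetic monolingual corpus could then be used, e.g., via back-translation, to adapt an NMT system.

    # Illustrative sketch (not the reported method): sample synthetic in-domain
    # text from a causal LM assumed to be fine-tuned on a small seed set of tweets.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_DIR = "path/to/lm-finetuned-on-seed-tweets"  # hypothetical checkpoint

    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)
    model.eval()

    def generate_synthetic_tweets(prompt: str, n: int = 8, max_new_tokens: int = 40):
        """Sample n synthetic in-domain sentences conditioned on a short prompt."""
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                do_sample=True,           # stochastic decoding for diverse outputs
                top_p=0.95,               # nucleus sampling
                temperature=0.9,
                max_new_tokens=max_new_tokens,
                num_return_sequences=n,
                pad_token_id=tokenizer.eos_token_id,
            )
        return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

    if __name__ == "__main__":
        # Repeating this over many prompts drawn from the seed tweets would expand
        # a small seed corpus into a much larger synthetic in-domain corpus.
        for sentence in generate_synthetic_tweets("COVID-19"):
            print(sentence)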