Summary of Research Achievements
The main achievement of the second year of this research is a new method for extending monolingual data in a low-resource domain and style (e.g., tweets on the topic of COVID-19) in order to generate larger data for training NMT. For instance, given a small set of Japanese tweets about the COVID-19 crisis (e.g., 1,000 tweets), which is too small to train NMT, the method artificially extends it to millions of tweets on the same topic, making it useful for training better NMT to translate tweets. Training NMT on this artificial data yields systems that become better at translating texts even for domains and styles for which very little data is available. Experiments have been successfully conducted in various domains and styles (medical, IT, news, tweets, online discussions) and languages (French, German, Japanese). This work has also been extended to "personalizing" NMT, i.e., adapting NMT so that it translates texts written by a specific person while preserving that person's writing characteristics.
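One plausible way to realize this kind of data extension is to condition a generic language model on the small in-domain seed set and sample synthetic sentences in the same domain and style until the corpus reaches the desired size; the synthetic monolingual text can then, for instance, be back-translated with an existing NMT system to obtain parallel training data. The sketch below is only a minimal illustration under these assumptions: the "gpt2" checkpoint, the file name small_tweets.txt, and all sampling parameters are hypothetical placeholders, and the actual method developed in this research may differ.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_synthetic(model_name="gpt2", seed_file="small_tweets.txt",
                       n_sentences=1000):
    """Sample synthetic in-domain sentences from a generic language model,
    prompted with the openings of the small in-domain seed corpus.
    (Illustrative sketch; model, file, and parameters are hypothetical.)"""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    with open(seed_file, encoding="utf-8") as f:
        seeds = [line.strip() for line in f if line.strip()]
    synthetic = []
    with torch.no_grad():
        for i in range(n_sentences):
            # The opening characters of a seed tweet anchor the sample
            # to the target domain and style (works for Japanese, too).
            prompt = seeds[i % len(seeds)][:10]
            ids = tok(prompt, return_tensors="pt").input_ids
            out = model.generate(ids, do_sample=True, top_p=0.9,
                                 max_new_tokens=40,
                                 pad_token_id=tok.eos_token_id)
            synthetic.append(tok.decode(out[0], skip_special_tokens=True))
    return synthetic

In practice, sampling of this kind would be repeated until millions of sentences are collected, the output filtered for quality, and the retained sentences back-translated to produce the synthetic parallel data used for NMT training.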