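Stepwise Automatic Construction of Parallel Resources toward Practical Japanese-Chinese Machine Translation (日中機械翻訳の実用化を目指した対訳資源の段階的自動構築)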

Research Project

Project/Area Number	14J02353
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Single-year Grants
Section	Domestic
Research Field	Intelligent informatics
Research Institution	Kyoto University
Principal Investigator	チョ シンキ (Chenhui Chu), Kyoto University, JSPS Research Fellow (DC2)
Project Period (FY)	2014-04-25 – 2016-03-31
Project Status	Declined (Fiscal Year 2015)
Budget Amount	¥1,700,000 (Direct Cost: ¥1,700,000); Fiscal Year 2015: ¥800,000 (Direct Cost: ¥800,000); Fiscal Year 2014: ¥900,000 (Direct Cost: ¥900,000)
Keywords	Machine translation / Comparable corpora / Parallel data
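Outline of Annual Research Achievements	Statistical machine translation (SMT) acquires its translation knowledge from parallel corpora, so translation accuracy depends on the quantity and quality of the parallel corpus. However, few language pairs and domains have a large, high-quality parallel corpus. One way to address this problem is to use comparable corpora. A comparable corpus is a set of document pairs written independently in each language about a particular topic. Comparable corpora contain a large amount of parallel data of three kinds: words, word sequences (fragments), and sentences. During this year, we studied a framework that extracts these three kinds of parallel data in an integrated way, and improved translation accuracy with it. The main results are as follows.
1. For bilingual word pair extraction, we proposed an iterative extraction method that uses topic and context knowledge. The proposed method requires no seed prior knowledge (such as a bilingual dictionary), and its extraction performance can be improved iteratively. Experiments on Japanese-English, Chinese-English, and Japanese-Chinese Wikipedia data showed the effectiveness of the proposed method. The extracted bilingual word pairs were then used in the subsequent parallel fragment and parallel sentence extraction.
2. We proposed a robust parallel sentence extraction system for constructing a Japanese-Chinese parallel corpus from Wikipedia data. The system consists mainly of a filter for parallel sentence candidates and a classifier that decides whether a candidate pair is parallel. Experiments showed the effectiveness of the system from two perspectives: parallel sentence extraction performance and improvement in translation accuracy.
3. We proposed a system that extracts parallel fragments with high precision by filtering the fragment candidates extracted by a word alignment model against the bilingual word pairs already extracted. Experiments on a Japanese-Chinese comparable corpus confirmed that the system extracts parallel fragments accurately and that using them also improves translation accuracy.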
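As an illustration of the lexicon-based filtering of fragment candidates described in result 3, the following is a minimal sketch, not the project's actual system. The coverage criterion (share of source tokens that have a lexicon translation in the target fragment), the 0.5 threshold, the function names, and the toy data are all assumptions made for this example; the record does not specify how the filter is scored.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical types: the lexicon maps a source word to its extracted translations.
Lexicon = Dict[str, Set[str]]
Fragment = Tuple[List[str], List[str]]  # (source tokens, target tokens)


def lexicon_coverage(src: List[str], tgt: List[str], lexicon: Lexicon) -> float:
    """Fraction of source tokens with at least one lexicon translation in the target."""
    if not src:
        return 0.0
    tgt_set = set(tgt)
    covered = sum(1 for w in src if lexicon.get(w, set()) & tgt_set)
    return covered / len(src)


def filter_fragments(candidates: List[Fragment], lexicon: Lexicon,
                     threshold: float = 0.5) -> List[Fragment]:
    """Keep candidates whose coverage reaches the (assumed) threshold."""
    return [(s, t) for s, t in candidates
            if lexicon_coverage(s, t, lexicon) >= threshold]


# Toy data, invented for illustration only.
toy_lexicon: Lexicon = {"機械": {"机器"}, "翻訳": {"翻译"}}
toy_candidates: List[Fragment] = [
    (["機械", "翻訳"], ["机器", "翻译"]),  # both source words covered: kept
    (["機械", "学習"], ["天气", "预报"]),  # no source word covered: dropped
]
kept = filter_fragments(toy_candidates, toy_lexicon)
```

In this sketch the candidates stand in for the output of a word alignment model, and the lexicon stands in for the bilingual word pairs extracted in result 1; a stricter threshold trades recall for precision.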
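Research Progress Status	Not entered, because the grant application for the following fiscal year is being declined.
Strategy for Future Research Activity	Not entered, because the grant application for the following fiscal year is being declined.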

Report

(1 results)

2014 Annual Research Report

Research Products
(7 results)


Journal Article (2 results; of which Peer Reviewed: 2 results, Open Access: 2 results, Acknowledgement Compliant: 2 results) / Presentation (3 results) / Book (1 result) / Remarks (1 result)

[Journal Article] Integrated Parallel Sentence and Fragment Extraction from Comparable Corpora: A Case Study on Chinese-Japanese Wikipedia (2015)
- Author(s)
  Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi
- Journal Title
  ACM Transactions on Asian Language Information Processing (TALIP)
  Volume: In press
- Related Report
  2014 Annual Research Report
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Journal Article] Parallel Sentence Extraction Based on Unsupervised Bilingual Lexicon Extraction from Comparable Corpora (2015)
- Author(s)
  Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi
- Journal Title
  自然言語処理 (Journal of Natural Language Processing)
  Volume: In press
- Related Report
  2014 Annual Research Report
- Peer Reviewed / Open Access / Acknowledgement Compliant
[Presentation] Large-scale Japanese-Chinese Scientific Dictionary Construction via Pivot-based Statistical Machine Translation (2015)
- Author(s)
  Chenhui Chu, Raj Dabre, Toshiaki Nakazawa and Sadao Kurohashi
- Organizer
  In Proceedings of the 21st Annual Meeting of the Association for Natural Language Processing (NLP2015)
- Place of Presentation
  Kyoto University, Yoshida-Honmachi, Sakyo-ku, Kyoto
- Year and Date
  2015-03-17 – 2015-03-19
- Related Report
  2014 Annual Research Report
[Presentation] Improving Statistical Machine Translation Accuracy Using Bilingual Lexicon Extraction with Paraphrases (2014)
- Author(s)
  Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi
- Organizer
  In Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing (PACLIC2014)
- Place of Presentation
  Phuket, Thailand
- Year and Date
  2014-12-12 – 2014-12-14
- Related Report
  2014 Annual Research Report
[Presentation] Constructing a Chinese-Japanese Parallel Corpus from Wikipedia (2014)
- Author(s)
  Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi
- Organizer
  In Proceedings of the 9th Conference on International Language Resources and Evaluation (LREC2014)
- Place of Presentation
  Reykjavik, Iceland
- Year and Date
  2014-05-26 – 2014-05-31
- Related Report
  2014 Annual Research Report
[Book] Using Comparable Corpora for Under-Resourced Areas of Machine Translation (2015)
- Author(s)
  Chenhui Chu, Toshiaki Nakazawa and Sadao Kurohashi
- Publisher
  Springer
- Related Report
  2014 Annual Research Report
[Remarks] Chenhui Chu
- URL
  http://lotus.kuee.kyoto-u.ac.jp/~chu
- Related Report
  2014 Annual Research Report
