Semi-supervised word alignment model for parallel corpus
Project/Area Number |
20500149
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | National Institute of Information and Communications Technology |
Principal Investigator |
YAMAMOTO Hiroifumi National Institute of Information and Communications Technology, Faculty of Science and Engineering, Professor (00395013)
|
Co-Investigator(Renkei-kenkyūsha) |
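SUMITA Eiichiro National Institute of Information and Communications Technology (90395020)
YASUDA Keiji National Institute of Information and Communications Technology (50395018)
GOH Ghooi-Ling National Institute of Information and Communications Technology (90531616)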
|
Project Period (FY) |
2008 – 2010
|
Project Status |
Completed (Fiscal Year 2010)
|
Budget Amount |
¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000)
Fiscal Year 2010: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2009: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2008: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
|
Keywords | natural language processing / alignment / multilingualization / semi-supervised learning / proper nouns / probabilistic constraints |
Research Abstract |
The purpose of this research is to improve word alignment accuracy in parallel corpora. In addition to word information, part-of-speech (POS) information and sentence structure are used. A semi-supervised approach is used for training, because it is difficult to add this additional information to every sentence in a corpus. For a Japanese, English, and Chinese parallel corpus, a semi-supervised alignment method using POS tags and semantic tags for proper nouns was applied, and its effectiveness was confirmed. Next, sentence structure information was used for alignment, and its effectiveness was also confirmed.
|
Report
(4 results)
Research Products
(15 results)