2019 Fiscal Year Final Research Report
Constructing simplified Japanese corpus and prototyping automatic text simplification
Project/Area Number | 17K18481 |
Research Category | Grant-in-Aid for Challenging Research (Exploratory) |
Allocation Type | Multi-year Fund |
Research Field | Literature, Linguistics, and related fields |
Research Institution | Nagaoka University of Technology |
Principal Investigator | |
Project Period (FY) | 2017-06-30 – 2020-03-31 |
Keywords | Easy Japanese (やさしい日本語) / text simplification / parallel corpus |
Outline of Final Research Achievements | (1) A simple Japanese checker, which automatically determines whether input text is written in simple Japanese, was developed and released to the public as a web application. (2) Using this tool, a simplified Japanese corpus was created that contains 50,000 original sentences paired with their simplified counterparts. At present, this is the only simplified Japanese corpus and is also the largest in the world. At the same time, a simple Japanese vocabulary of 2,000 words and a simple Japanese grammar were defined. In addition, the corpus was expanded through crowdsourcing, yielding a corpus of 35,000 new sentences. (3) Using these corpora, various studies on Japanese simplification were conducted. |
Free Research Field | Natural language processing |
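Academic Significance and Societal Importance of the Research Achievements | Latent demand for Easy Japanese (やさしい日本語) and public interest in it are both high, and it is gradually gaining social recognition through NHK News Web Easy, local governments, and similar channels. Against this background, building and releasing the only simplification corpus for Japanese is highly significant. Apart from lexical simplification, this project is currently the only research on automatic simplification of Japanese, and we believe its contribution to natural language processing is also very large. If notices from the national and local governments could be converted automatically into Easy Japanese, this would be highly beneficial from the standpoint of information accessibility, and we believe this project has laid the foundation for that. |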