Project Area | Compilation of a balanced corpus of written Japanese: Infrastructure for the coming Japanese linguistics |
Project/Area Number |
18061002
|
Research Category |
Grant-in-Aid for Scientific Research on Priority Areas
|
Allocation Type | Single-year Grants |
Review Section |
Humanities and Social Sciences
|
Research Institution | Chiba University |
Principal Investigator |
DEN Yasuharu Chiba University, Faculty of Letters, Professor (70291458)
|
Co-Investigator(Kenkyū-buntansha) |
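YAMADA Atsushi Advanced Science, Technology & Management Research Institute of KYOTO (ASTEM), Research Department, Chief Researcher (20240004)
MINEMATSU Nobuaki The University of Tokyo, Graduate School of Frontier Sciences, Associate Professor (90273333)
UCHIMOTO Kiyotaka National Institute of Information and Communications Technology, Strategic Planning Department, Planning Manager (60358885)
OGISO Tomonobu National Institute for Japanese Language and Linguistics, Department of Corpus Studies, Associate Professor (20337489)
KOISO Hanae National Institute for Japanese Language and Linguistics, Department of Linguistic Theory and Structure, Associate Professor (30312200)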
|
Project Period (FY) |
2006 – 2010
|
Project Status |
Completed (Fiscal Year 2010)
|
Budget Amount |
¥91,900,000 (Direct Cost: ¥91,900,000)
Fiscal Year 2010: ¥17,700,000 (Direct Cost: ¥17,700,000)
Fiscal Year 2009: ¥19,000,000 (Direct Cost: ¥19,000,000)
Fiscal Year 2008: ¥19,000,000 (Direct Cost: ¥19,000,000)
Fiscal Year 2007: ¥19,000,000 (Direct Cost: ¥19,000,000)
Fiscal Year 2006: ¥17,200,000 (Direct Cost: ¥17,200,000)
|
Keywords | electronic dictionary / morphological analysis / written-language corpus / sound change / accent / accent change / dictionary database / automatic unit construction |
Research Abstract |
(1) An electronic dictionary for morphological analyzers has been developed, with the following characteristics:
・ Lexical entries of uniform unit size, based on Short-Unit Words
・ A hierarchical representation of lexical entries, consisting of lemma, form, orthography, and pronunciation, which makes it possible to handle variation in orthography and word form
・ Rich information, including features for phonological and accentual sandhi
(2) A version for the morphological analyzer MeCab has been derived from the dictionary database and updated several times. It contains 210K lemmas and 330K orthographic entries, and achieves an accuracy of 98.9% in part-of-speech tagging and 98.6% in lemma identification.
(3) An XML version of the dictionary database has also been developed, which lets users build customized dictionaries for morphological analyzers according to their preferences and purposes.
(4) Post-processing tools, including Middle- and Long-Unit-Word analyzers, have been developed for advanced uses of the dictionary, such as syntactic analysis and text-to-speech applications.
|
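The hierarchical representation described in item (1) of the abstract can be pictured with a small Python sketch. This is only an illustration under stated assumptions: the class names `Lemma` and `Form`, the helper `build_surface_index`, and the sample entry are all hypothetical and are not taken from the project's dictionary database or its XML schema. The sketch shows one lemma grouping several forms, each with its own orthographic and pronunciation variants, so that any written variant can be traced back to its lemma.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Form:
    """A word form with its orthographic and pronunciation variants (hypothetical layout)."""
    reading: str
    orthographies: List[str] = field(default_factory=list)
    pronunciations: List[str] = field(default_factory=list)


@dataclass
class Lemma:
    """Top of the hierarchy: one lemma groups all of its forms."""
    lemma: str
    pos: str
    forms: List[Form] = field(default_factory=list)


def build_surface_index(lemmas: List[Lemma]) -> Dict[str, List[str]]:
    """Map every orthographic variant back to the lemma(s) it belongs to."""
    index: Dict[str, List[str]] = {}
    for lem in lemmas:
        for form in lem.forms:
            for surface in form.orthographies:
                bucket = index.setdefault(surface, [])
                if lem.lemma not in bucket:
                    bucket.append(lem.lemma)
    return index


# Illustrative entry only; not copied from the actual dictionary.
yahari = Lemma(
    lemma="矢張り",
    pos="副詞",
    forms=[
        Form("ヤハリ", ["やはり", "矢張り"], ["ヤハリ"]),
        Form("ヤッパリ", ["やっぱり", "矢っ張り"], ["ヤッパリ"]),
    ],
)

index = build_surface_index([yahari])
print(index["やっぱり"])
```

Keeping orthography below form, and form below lemma, means a variant spelling or a colloquial form is recorded once as a child node rather than as a separate, unrelated dictionary entry.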