2022 Fiscal Year Final Research Report
Construction of a large word database with accent information
Project/Area Number |
19K13173
|
Research Category |
Grant-in-Aid for Early-Career Scientists
|
Allocation Type | Multi-year Fund |
Review Section |
Basic Section 02060:Linguistics-related
|
Research Institution | Tokyo Metropolitan University (2021-2022) / National Institute for Japanese Language and Linguistics (2019-2020) |
Principal Investigator |
Oka Teruaki Tokyo Metropolitan University, Graduate School of Systems Design, Project Assistant Professor (50782942)
|
Project Period (FY) |
2019-04-01 – 2023-03-31
|
Keywords | accent / morphological analysis dictionary |
Outline of Final Research Achievements |
We used crowdsourcing to add accent information to UniDic, an electronic dictionary for morphological analysis. Because the workers were an unspecified group of non-specialists, the task asked them to select the accent that sounded familiar from synthesized speech. Each speech sample presented the target word together with a following word that does not change the target word's accent. Workers were filtered using words with known accents as gold questions, and accent information was assigned by weighted majority voting based on Bayesian estimates of the level of each worker and each question. Together, the filtering and the estimation of each worker's level on the task enabled large-scale accent assignment that was not affected by differences in the workers' places of residence.
|
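The filtering and weighted-voting steps described in the outline can be illustrated with a minimal sketch. This is not the project's actual model: the report does not specify its Bayesian estimator, so the sketch assumes a simple Beta prior over each worker's accuracy on gold questions and uses the posterior mean as the vote weight. It does not estimate question levels at all. The function names, the `min_accuracy` threshold, the identifiers, and the integer accent labels are all hypothetical.

```python
from collections import defaultdict


def worker_weights(gold, answers, prior_correct=1.0, prior_wrong=1.0,
                   min_accuracy=0.5):
    """Estimate a weight per worker from gold questions.

    Assumption: the weight is the posterior mean of a Beta prior updated
    with the worker's correct/incorrect gold answers. Workers below
    min_accuracy, or with no gold answers, are filtered out.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for worker, question, label in answers:
        if question in gold:
            total[worker] += 1
            correct[worker] += int(label == gold[question])

    weights = {}
    for worker, n in total.items():
        accuracy = (correct[worker] + prior_correct) / (
            n + prior_correct + prior_wrong)
        if accuracy >= min_accuracy:
            weights[worker] = accuracy
    return weights


def weighted_vote(gold, answers, weights):
    """Pick, for each non-gold question, the label with the largest total weight."""
    scores = defaultdict(lambda: defaultdict(float))
    for worker, question, label in answers:
        if question in gold or worker not in weights:
            continue  # skip gold questions and filtered workers
        scores[question][label] += weights[worker]
    return {q: max(s, key=s.get) for q, s in scores.items()}


# Hypothetical data: (worker, question, chosen accent label).
gold = {"g1": 0, "g2": 1}
answers = [
    ("w1", "g1", 0), ("w1", "g2", 1), ("w1", "q1", 2), ("w1", "q2", 1),
    ("w2", "g1", 0), ("w2", "g2", 1), ("w2", "q1", 2),
    ("w3", "g1", 1), ("w3", "g2", 0), ("w3", "q1", 0), ("w3", "q2", 0),
    ("w4", "g1", 0), ("w4", "g2", 0), ("w4", "q2", 0),
]

weights = worker_weights(gold, answers)
labels = weighted_vote(gold, answers, weights)
print(weights)
print(labels)
```

In this toy data, worker `w3` fails both gold questions and is filtered out. For `q2`, a plain majority of the raw answers would pick label 0, but the weighted vote picks label 1 because the worker who chose it was more reliable on the gold questions.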
Free Research Field |
Natural language processing
|
Academic Significance and Societal Importance of the Research Achievements |
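Assigning accents to words is influenced by the annotator's place of residence and place of origin, which makes it difficult for non-specialists and has made large-scale annotation hard to carry out. By using methods for estimating the level of questions and workers, which developed alongside the spread of crowdsourcing, we realized an accent-assignment workflow that does not tie specialists down in time or place.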
|