Establishment of Automatic Word Segmentation Technology from Large-scale Text Data Independent of Language
Project/Area Number |
16K01267
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Research Field |
Social systems engineering/Safety system
|
Research Institution | Shonan Institute of Technology |
Principal Investigator |
Suzuki Makoto Shonan Institute of Technology, Faculty of Engineering, Professor (80339796)
|
Co-Investigator(Kenkyū-buntansha) |
三川 健太 Shonan Institute of Technology, Faculty of Engineering, Associate Professor (40707733)
|
Project Period (FY) |
2016-04-01 – 2020-03-31
|
Project Status |
Completed (Fiscal Year 2019)
|
Budget Amount |
¥3,250,000 (Direct Cost: ¥2,500,000, Indirect Cost: ¥750,000)
Fiscal Year 2018: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2017: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2016: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000)
|
Keywords | Multilingual processing / Emotional polarity dictionary / Text mining / N-gram / Word extraction / Word segmentation / Automatic extraction / Automatic segmentation |
Outline of Final Research Achievements |
In this research, we developed a word segmentation technology that processes Unicode text containing a mixture of languages with a single program. The technique is a language-independent word segmentation method based on a simple state transition model, and it requires no dictionary or grammatical knowledge for any individual language. The research proceeded in two main directions: (1) extending the range of languages processed and (2) extending the range of applications. Regarding (1), we confirmed that the method is effective not only for Japanese but also for classical Japanese and for foreign languages such as English, Chinese, and Korean. Regarding (2), we proposed a method for automatically creating an emotional polarity dictionary from user reviews of products and facilities.
|
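The summary above does not specify the states or transitions of the model, so the following is only a minimal sketch of the general idea of dictionary-free segmentation driven by state transitions. It assumes, purely for illustration, that the state is a coarse Unicode character class and that a word boundary is emitted whenever the class changes. The names `char_class` and `segment` are hypothetical and are not taken from the project.

```python
import unicodedata


def char_class(ch):
    """Return a coarse class for one character (an illustrative choice of states)."""
    if ch.isspace():
        return "SPACE"
    category = unicodedata.category(ch)
    if category.startswith("P") or category.startswith("S"):
        return "PUNCT"
    if category.startswith("N"):
        return "DIGIT"
    name = unicodedata.name(ch, "")
    if not name:
        return "OTHER"
    # The first word of the Unicode character name roughly identifies the
    # script, e.g. "LATIN", "HIRAGANA", "KATAKANA", "CJK", "HANGUL".
    return name.split(" ")[0].split("-")[0]


def segment(text):
    """Split text wherever the character class (the state) changes.

    No dictionary or grammar is used; whitespace runs are dropped.
    """
    tokens = []
    current = ""
    state = None
    for ch in text:
        cls = char_class(ch)
        if cls != state:
            # State transition: close the token collected so far.
            if current and state != "SPACE":
                tokens.append(current)
            current = ""
            state = cls
        current += ch
    if current and state != "SPACE":
        tokens.append(current)
    return tokens


print(segment("Hello 世界 2016"))
print(segment("東京タワー"))
```

Because the same function handles Latin, kana, kanji, and Hangul input, it illustrates how one program can process mixed-language Unicode text. A real system would need a richer model than script changes alone, for example to split long runs of kanji or unspaced Chinese text.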
Academic Significance and Societal Importance of the Research Achievements |
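In this research, we proposed a method for automatically creating an emotional polarity dictionary from a target set of review data. An emotional polarity dictionary assigns a polarity value to each word, based on the idea that text contains words carrying a characteristic polarity (positive or negative). Here we used user reviews of products and facilities (text data accompanied by a five-level rating) and proposed a method that creates the dictionary automatically by computing emotional polarity values from the ratings. This suggests that a computer could collect user reviews automatically and build an emotional polarity dictionary specialized for a particular product or facility.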
|
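The statement above says that polarity values are computed from the five-level ratings but gives no formula. As a hedged sketch, the example below assumes one simple possibility: each rating is mapped linearly from 1..5 to -1..+1, and a word's polarity is the mean mapped rating of the reviews that contain it. The function name `build_polarity_dictionary`, the `min_count` parameter, and the whitespace tokenization are all assumptions made for illustration, not the project's method.

```python
from collections import defaultdict


def build_polarity_dictionary(reviews, min_count=1):
    """Build a word -> polarity mapping from (text, rating) pairs.

    rating is an integer from 1 to 5. Polarity is the mean of the
    ratings, rescaled to [-1, +1], over the reviews containing the word.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, rating in reviews:
        score = (rating - 3) / 2.0  # assumed mapping: 1 -> -1.0, 3 -> 0.0, 5 -> +1.0
        # Count each word once per review; whitespace splitting stands in
        # for a real word segmenter here.
        for word in set(text.lower().split()):
            totals[word] += score
            counts[word] += 1
    return {
        word: totals[word] / counts[word]
        for word in totals
        if counts[word] >= min_count
    }


sample_reviews = [
    ("great room", 5),
    ("dirty room", 1),
    ("great view", 4),
]
print(build_polarity_dictionary(sample_reviews))
```

In this toy data, "great" appears only in highly rated reviews and receives a positive value, "dirty" receives a negative value, and "room" appears in both a 5-star and a 1-star review and so averages to zero. Raising `min_count` would drop rare words whose averages rest on very few reviews.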
Report
(5 results)
Research Products
(38 results)