2022 Fiscal Year Final Research Report

Study on Improving Performance of Natural Language Processing by Integrating Collocation Extraction and Deep Learning

Project/Area Number	19K20333
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	University of Tsukuba
Principal Investigator	Wakabayashi Kei, University of Tsukuba, Faculty of Library, Information and Media Science, Associate Professor (40631908)
Project Period (FY)	2019-04-01 – 2023-03-31
Keywords	collocation extraction / deep learning / active learning / hidden Markov model / document summarization / dialog system / topic model
Outline of Final Research Achievements	This study addressed three research questions. (A) To extract meaningful collocations from text more accurately, we proposed a method for training collocation extraction models that makes efficient use of linguistic resources and human annotators, and we advanced the basic theory of the statistical models used for collocation extraction. (B) We proposed deep learning methods that use the extracted collocations to improve the accuracy of three natural language processing applications: document summarization, language understanding in dialog systems, and topic modeling. (C) We proposed a method that, while a deep learning model is being trained for a downstream natural language processing task, dynamically extracts the collocations that contribute to improving that task's accuracy.
Free Research Field	Machine learning, natural language processing
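Academic Significance and Societal Importance of the Research Achievements	Accounting for collocations, in which multiple words together carry a specific meaning, is an important issue for improving the accuracy of many natural language processing applications. However, how the properties of a collocation extraction method affect the deep learning methods that learn downstream natural language processing tasks had not previously been clarified. The significance of this research lies in presenting a methodology that links collocation extraction with accuracy improvements in deep-learning-based natural language processing tasks, and in clarifying its effect. In particular, we believe the results are directly applicable to tasks that explicitly present collocations as part of their analysis results, such as topic modeling and language understanding in dialog systems.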