2022 Fiscal Year Final Research Report

Pre-trained language models using the network structure of large-scale scholarly data

Project/Area Number	20K12076
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 62020:Web informatics and service informatics-related
Research Institution	The University of Tokyo
Principal Investigator	Mori Junichiro The University of Tokyo, Graduate School of Information Science and Technology, Associate Professor (30508924)
Project Period (FY)	2020-04-01 – 2023-03-31
Keywords	Scholarly literature data / Pre-trained language model / Citation network / Representation learning
Outline of Final Research Achievements	Extracting diverse scholarly knowledge from the vast body of academic literature is widely recognized as important for new discoveries and problem-solving. To support the extraction and discovery of useful knowledge from large-scale academic literature data, this study investigated foundational methods for constructing pre-trained language models from large-scale hypertext data, taking the network structure of the literature into account. As results, we developed (1) a technique for constructing pre-trained language models from large-scale academic literature data, treated as hypertext data, based on the citation relationships between documents, and (2) a technique that uses pre-trained language models to support the extraction and discovery of useful knowledge from such data.
Free Research Field	Intelligent informatics
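Academic Significance and Societal Importance of the Research Achievements	First, we extracted information related to COVID-19, such as scientific evidence and key technologies, and made the analysis results widely available to the public. Next, we designed and implemented prediction tasks for constructing pre-trained language models from literature corpora that take the citation network structure into account. We also evaluated the distributed representations acquired by the pre-trained language models on link prediction and node classification tasks over citation networks. Finally, applying the methods developed during the project period, we addressed several new tasks, including the discovery of emerging academic papers, the automatic generation of survey papers, and the extraction of research topics with visualization of their changes over time. These results were presented at multiple academic conferences.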