2022 Fiscal Year Final Research Report
Pre-trained language models using the network structure of large-scale scholarly data
Project/Area Number |
20K12076
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Review Section |
Basic Section 62020:Web informatics and service informatics-related
|
Research Institution | The University of Tokyo |
Principal Investigator |
Mori Junichiro The University of Tokyo, Graduate School of Information Science and Technology, Associate Professor (30508924)
|
Project Period (FY) |
2020-04-01 – 2023-03-31
|
Keywords | scholarly literature data / pre-trained language models / citation networks / representation learning |
Outline of Final Research Achievements |
Extracting diverse scholarly knowledge from the vast body of academic literature is increasingly recognized as important for new discoveries and problem-solving. To support the extraction and discovery of useful knowledge from large-scale academic literature data, this study investigated foundational methods for constructing pre-trained language models from large-scale hypertext data, taking the network structure of the literature into account. We developed two techniques: one for constructing pre-trained language models from large-scale academic literature data treated as hypertext, based on the citation relationships between documents, and one that uses these pre-trained language models to support the extraction and discovery of useful knowledge from such data.
|
Free Research Field |
Intelligent informatics
|
Academic Significance and Societal Importance of the Research Achievements |
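First, we extracted information such as scientific evidence and important technologies related to COVID-19 and made the analysis results widely available to the public. Next, we designed and implemented prediction tasks for constructing pre-trained language models from a literature corpus, taking the citation network structure into account. We also evaluated the distributed representations obtained from the pre-trained language models on link prediction and node classification tasks over citation networks. Finally, applying the methods developed during the project period, we addressed several new tasks, including discovering emerging academic papers, automatically generating survey papers, and extracting research topics and visualizing how they change over time. These results were presented at multiple academic conferences.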
|