Pre-trained language models using the network structure of large-scale scholarly data

Research Project

Project/Area Number	20K12076
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 62020:Web informatics and service informatics-related
Research Institution	The University of Tokyo
Principal Investigator	Mori Junichiro, The University of Tokyo, Graduate School of Information Science and Technology, Associate Professor (30508924)
Project Period (FY)	2020-04-01 – 2023-03-31
Project Status	Completed (Fiscal Year 2022)
Budget Amount	¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
Fiscal Year 2022: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Fiscal Year 2021: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2020: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
Keywords	Scholarly literature data / Pre-trained language models / Citation networks / Representation learning / Pre-trained models
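Outline of Research at the Start	To support the extraction and discovery of useful knowledge from large-scale scholarly literature data, this research studies the fundamental methodology for building pre-trained language models from large-scale hypertext data while taking the network structure of scholarly literature into account. It clarifies how to build pre-trained language models from text corpora, such as scholarly literature, in which texts are linked to one another by relations and the corpus as a whole forms a network. The resulting pre-trained language models are then applied to and evaluated on tasks related to knowledge extraction and discovery from scholarly literature data, in order to clarify findings relevant to practical application.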
Outline of Final Research Achievements	Extracting diverse academic knowledge that leads to new discoveries and problem-solving from vast amounts of academic literature data is widely recognized as important. To support the extraction and discovery of useful knowledge from large-scale academic literature data, this study investigated the fundamental methodology for constructing pre-trained language models from large-scale hypertext data while taking the network structure of academic literature into account. The results are twofold: a technique for constructing pre-trained language models from large-scale academic literature data, treated as hypertext, based on the citation relationships between documents; and a technique that uses pre-trained language models to support the extraction and discovery of useful knowledge from such data.
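Academic Significance and Societal Importance of the Research Achievements	First, we extracted information such as scientific evidence and important technologies related to COVID-19 and made the analysis results publicly available. Next, we designed and implemented prediction tasks for building pre-trained language models from a literature corpus that take the citation network structure into account. We also evaluated the distributed representations obtained from the pre-trained language models on link prediction and node classification tasks over citation networks. Finally, we applied the methods developed during the project period to several new tasks, including discovering emerging scholarly papers, automatically generating survey papers, and extracting research topics and visualizing how they change over time. These results were presented at multiple academic conferences.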

Report

(4 results)

2022 Annual Research Report
Final Research Report
2021 Research-status Report
2020 Research-status Report

Research Products
(19 results)

Journal Article (5 results) (of which Int'l Joint Research: 2 results, Peer Reviewed: 5 results, Open Access: 3 results)
Presentation (13 results) (of which Int'l Joint Research: 3 results)
Remarks (1 result)

[Journal Article] Classification of the Top-cited Literature by Fusing Linguistic and Citation Information with the Transformer Model (2022)
- Author(s)
  Masanao Ochi, Masanori Shiro, Junichiro Mori, Ichiro Sakata
- Journal Title
  Proceedings of the 18th International Conference on Web Information Systems and Technologies
- Related Report
  2022 Annual Research Report
- Peer Reviewed
[Journal Article] Predictive analysis of multiple future scientific impacts by embedding a heterogeneous network (2022)
- Author(s)
  Masanao Ochi, Masanori Shiro, Jun’ichiro Mori, Ichiro Sakata
- Journal Title
  PLOS ONE
  Volume: 17 Issue: 9 Pages: e0274253
- DOI
  10.1371/journal.pone.0274253
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Unsupervised Abstractive Opinion Summarization by Generating Sentences with Tree-Structured Topic Guidance (2021)
- Author(s)
  Masaru Isonuma, Junichiro Mori, Danushka Bollegala, Ichiro Sakata
- Journal Title
  Transactions of the Association for Computational Linguistics
  Volume: 9 Pages: 945-961
- DOI
  10.1162/tacl_a_00406
- Related Report
  2021 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Journal Article] Tree-Structured Neural Topic Model (2020)
- Author(s)
  Masaru Isonuma, Junichiro Mori, Danushka Bollegala, Ichiro Sakata
- Journal Title
  Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL2020)
  Volume: 1 Pages: 995-1005
- DOI
  10.18653/v1/2020.acl-main.73
- Related Report
  2020 Research-status Report
- Peer Reviewed / Int'l Joint Research
[Journal Article] Constructive Approach for Early Extraction of Viral Spreading Social Issues from Twitter (2020)
- Author(s)
  Chou Jen Shiau, Masanao Ochi, Takeshi Sakaki, Ken Nagahama, Kanji Sakai, Junichiro Mori, Ichiro Sakata
- Journal Title
  Proceedings of ACM Web Science 2020 (WebSci2020)
  Volume: 1 Pages: 96-105
- DOI
  10.1145/3394231.3397899
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access
[Presentation] Time-Series Structured Neural Topic Model (2023)
- Author(s)
  宮本望, 磯沼大, 高瀬翔, 森純一郎, 坂田一郎
- Organizer
  The 29th Annual Meeting of the Association for Natural Language Processing
- Related Report
  2022 Annual Research Report
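[Presentation] Constructing a Large-Scale Benchmark Dataset for Automatic Survey Paper Generation (2023)
- Author(s)
  笠西哲, 磯沼大, 森純一郎, 坂田一郎
- Organizer
  The 29th Annual Meeting of the Association for Natural Language Processing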
- Related Report
  2022 Annual Research Report
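[Presentation] Fusing Linguistic and Citation Information of Scholarly Literature Using the Transformer Model (2022)
- Author(s)
  大知正直, 城真範, 森純一郎, 坂田一郎
- Organizer
  2022 Annual Conference of the Japanese Society for Artificial Intelligence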
- Related Report
  2022 Annual Research Report
[Presentation] Dynamic Structured Neural Topic Model Based on the Self-Attention Mechanism (2022)
- Author(s)
  宮本望, 磯沼大, 森純一郎, 坂田一郎
- Organizer
  2022 Annual Conference of the Japanese Society for Artificial Intelligence
- Related Report
  2022 Annual Research Report
[Presentation] Automatic Generation of Survey Papers with a Transformer Encoder-Decoder Model (2022)
- Author(s)
  笠西哲, 磯沼大, 森純一郎, 坂田一郎
- Organizer
  2022 Annual Conference of the Japanese Society for Artificial Intelligence
- Related Report
  2022 Annual Research Report
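[Presentation] Estimating the Opinions of the Silent Majority Based on Homophily (2022)
- Author(s)
  向井穂乃花, 磯沼大, 森純一郎, 坂田一郎
- Organizer
  The 28th Annual Meeting of the Association for Natural Language Processing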
- Related Report
  2021 Research-status Report
[Presentation] Which Is More Helpful in Finding Scientific Papers to Be Top-cited in the Future: Content or Citations? Case Analysis in the Field of Solar Cells 2009 (2021)
- Author(s)
  Masanao Ochi, Masanori Shiro, Junichiro Mori, Ichiro Sakata
- Organizer
  International Conference on Web Information Systems and Technologies
- Related Report
  2021 Research-status Report
- Int'l Joint Research
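[Presentation] Identifiability Analysis Using Distributed Representations Extracted from Scholarly Literature Information, Toward Predicting the Impact of Scientific Research (2021)
- Author(s)
  大知正直, 城真範, 森純一郎, 坂田一郎
- Organizer
  2021 Annual Conference of the Japanese Society for Artificial Intelligence (35th)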
- Related Report
  2021 Research-status Report
[Presentation] Unsupervised Abstractive Opinion Summarization by Generating Sentences with Tree Structured Topic Guidance (2021)
- Author(s)
  Masaru Isonuma, Junichiro Mori, Danushka Bollegala, Ichiro Sakata
- Organizer
  The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP2021)
- Related Report
  2021 Research-status Report
- Int'l Joint Research
[Presentation] Citation Network Analysis of the COVID-19 Open Research Dataset (2020)
- Author(s)
  Junichiro Mori
- Organizer
  Second International Workshop on SCIentific DOCument Analysis (SCIDOCA 2020)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
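[Presentation] Unsupervised Abstractive Opinion Summarization Capturing Latent Topic Structure (2020)
- Author(s)
  磯沼大
- Organizer
  The 246th Meeting of the IPSJ Special Interest Group on Natural Language Processing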
- Related Report
  2020 Research-status Report
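[Presentation] Unsupervised Opinion Summarization by Generating Topic Sentences (2020)
- Author(s)
  磯沼大
- Organizer
  The 27th Annual Meeting of the Association for Natural Language Processing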
- Related Report
  2020 Research-status Report
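[Presentation] Early Extraction of Virally Spreading Social Issues Based on a Constructionist Approach (2020)
- Author(s)
  蕭喬仁
- Organizer
  2020 Annual Conference of the Japanese Society for Artificial Intelligence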
- Related Report
  2020 Research-status Report
[Remarks] Citation Analysis of COVID-19-Related Papers
- URL
  https://academic-landscape.com/analysis/36093
- Related Report
  2020 Research-status Report
