
Pre-trained language models using the network structure of large-scale scholarly data

Research Project

Project/Area Number 20K12076
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Multi-year Fund
Section General
Review Section Basic Section 62020: Web informatics and service informatics-related
Research Institution The University of Tokyo

Principal Investigator

Mori Junichiro  The University of Tokyo, Graduate School of Information Science and Technology, Associate Professor (30508924)

Project Period (FY) 2020-04-01 – 2023-03-31
Project Status Completed (Fiscal Year 2022)
Budget Amount
¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
Fiscal Year 2022: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Fiscal Year 2021: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2020: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
Keywords scholarly literature data / pre-trained language model / citation network / representation learning / pre-trained model
Outline of Research at the Start

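This research aims to support the extraction and discovery of useful knowledge from large-scale scholarly literature data. To that end, it studies fundamental methodology for constructing pre-trained language models from large-scale hypertext data while taking the network structure of scholarly literature data into account. It clarifies how to build pre-trained language models from a text corpus in which, as in scholarly literature, texts are linked to one another by relations and form a network structure as a whole. It then applies the pre-trained language models to tasks related to knowledge extraction and discovery from scholarly literature data, evaluates them, and clarifies findings relevant to practical application.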

Outline of Final Research Achievements

Extracting diverse academic knowledge from the vast body of academic literature has been recognized as important because it leads to new discoveries and problem-solving. To support the extraction and discovery of useful knowledge from large-scale academic literature data, this study investigated fundamental methodology for constructing pre-trained language models from large-scale hypertext data while taking the network structure of academic literature data into account. The results are twofold: a technology for constructing pre-trained language models from large-scale academic literature data, treated as hypertext, based on the citation relationships between documents; and a technology that uses pre-trained language models to support the extraction and discovery of useful knowledge from such data.

Academic Significance and Societal Importance of the Research Achievements

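First, we extracted information such as scientific evidence and important technologies related to COVID-19 and made the analysis results publicly available. Next, we designed and implemented prediction tasks for building pre-trained language models from a literature corpus that take the citation network structure into account. We also evaluated the distributed representations obtained by the pre-trained language models on link prediction and node classification tasks over citation networks. Finally, applying the methods developed during the project period, we worked on several new tasks, including the discovery of emerging academic papers, the automatic generation of survey papers, and the extraction of research topics and visualization of their changes over time. These results were presented at multiple academic conferences.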
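
As an illustration only, the link-prediction evaluation mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual method: the function names (`cosine`, `link_prediction_auc`), the toy embeddings, the use of cosine similarity as the link score, and the treatment of citations as undirected pairs are all hypothetical choices made for the example.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_prediction_auc(embeddings, held_out_edges):
    """AUC of cosine scores: the probability that a held-out citation pair
    scores higher than a non-linked pair (ties count as one half).

    Assumption for this sketch: citations are treated as undirected pairs,
    and every pair not in held_out_edges is a negative example.
    """
    positives = {frozenset(edge) for edge in held_out_edges}
    negatives = [
        frozenset(pair)
        for pair in combinations(sorted(embeddings), 2)
        if frozenset(pair) not in positives
    ]

    def score(pair):
        a, b = tuple(pair)
        return cosine(embeddings[a], embeddings[b])

    pos_scores = [score(p) for p in positives]
    neg_scores = [score(n) for n in negatives]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


# Toy document embeddings (hypothetical values, not project data).
embeddings = {
    "A": [1.0, 0.0],
    "B": [0.9, 0.1],
    "C": [0.0, 1.0],
    "D": [0.1, 0.9],
}
held_out = [("A", "B"), ("C", "D")]  # citation links hidden from the model
auc = link_prediction_auc(embeddings, held_out)
print(f"link prediction AUC = {auc:.2f}")
```

In a real evaluation the vectors would come from the pre-trained language model, and negatives would typically be sampled rather than enumerated, since a citation network has far more non-linked pairs than links.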

Report

(4 results)
  • 2022 Annual Research Report   Final Research Report ( PDF )
  • 2021 Research-status Report
  • 2020 Research-status Report
  • Research Products

    (19 results)

Journal Article (5 results) (of which Int'l Joint Research: 2 results, Peer Reviewed: 5 results, Open Access: 3 results) Presentation (13 results) (of which Int'l Joint Research: 3 results) Remarks (1 result)

  • [Journal Article] Classification of the Top-cited Literature by Fusing Linguistic and Citation Information with the Transformer Model (2022)

    • Author(s)
      Masanao Ochi, Masanori Shiro, Junichiro Mori, Ichiro Sakata
    • Journal Title

      Proceedings of the 18th International Conference on Web Information Systems and Technologies

      Volume: -

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Predictive analysis of multiple future scientific impacts by embedding a heterogeneous network (2022)

    • Author(s)
      Masanao Ochi, Masanori Shiro, Jun’ichiro Mori, Ichiro Sakata
    • Journal Title

      PLOS ONE

      Volume: 17 Issue: 9 Pages: e0274253

    • DOI

      10.1371/journal.pone.0274253

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Unsupervised Abstractive Opinion Summarization by Generating Sentences with Tree-Structured Topic Guidance (2021)

    • Author(s)
      Masaru Isonuma, Junichiro Mori, Danushka Bollegala, Ichiro Sakata
    • Journal Title

      Transactions of the Association for Computational Linguistics

      Volume: 9 Pages: 945-961

    • DOI

      10.1162/tacl_a_00406

    • Related Report
      2021 Research-status Report
    • Peer Reviewed / Open Access / Int'l Joint Research
  • [Journal Article] Tree-Structured Neural Topic Model (2020)

    • Author(s)
      Isonuma Masaru, Mori Junichiro, Bollegala Danushka, Sakata Ichiro
    • Journal Title

      Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL2020)

      Volume: 1 Pages: 995-1005

    • DOI

      10.18653/v1/2020.acl-main.73

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] Constructive Approach for Early Extraction of Viral Spreading Social Issues from Twitter (2020)

    • Author(s)
      Chou Jen Shiau, Masanao Ochi, Takeshi Sakaki, Ken Nagahama, Kanji Sakai, Junichiro Mori, Ichiro Sakata
    • Journal Title

      Proceedings of ACM Web Science 2020 (WebSci2020)

      Volume: 1 Pages: 96-105

    • DOI

      10.1145/3394231.3397899

    • Related Report
      2020 Research-status Report
    • Peer Reviewed / Open Access
  • [Presentation] Time-Series Structured Neural Topic Model (2023)

    • Author(s)
      宮本望, 磯沼大, 高瀬翔, 森純一郎, 坂田一郎
    • Organizer
      The 29th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2022 Annual Research Report
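  • [Presentation] Building a Large-Scale Benchmark Dataset for Automatic Survey Paper Generation (2023)

    • Author(s)
      笠西哲, 磯沼大, 森純一郎, 坂田一郎
    • Organizer
      The 29th Annual Meeting of the Association for Natural Language Processing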
    • Related Report
      2022 Annual Research Report
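  • [Presentation] Fusing Linguistic and Citation Information of Scholarly Literature Using a Transformer Model (2022)

    • Author(s)
      大知正直, 城真範, 森純一郎, 坂田一郎
    • Organizer
      2022 Annual Conference of the Japanese Society for Artificial Intelligence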
    • Related Report
      2022 Annual Research Report
  • [Presentation] Dynamic Structured Neural Topic Model Based on a Self-Attention Mechanism (2022)

    • Author(s)
      宮本望, 磯沼大, 森純一郎, 坂田一郎
    • Organizer
      2022 Annual Conference of the Japanese Society for Artificial Intelligence
    • Related Report
      2022 Annual Research Report
  • [Presentation] Automatic Generation of Survey Papers with a Transformer Encoder-Decoder Model (2022)

    • Author(s)
      笠西哲, 磯沼大, 森純一郎, 坂田一郎
    • Organizer
      2022 Annual Conference of the Japanese Society for Artificial Intelligence
    • Related Report
      2022 Annual Research Report
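  • [Presentation] Estimating the Opinions of the Silent Majority Based on Homophily (2022)

    • Author(s)
      向井穂乃花, 磯沼大, 森純一郎, 坂田一郎
    • Organizer
      The 28th Annual Meeting of the Association for Natural Language Processing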
    • Related Report
      2021 Research-status Report
  • [Presentation] Which Is More Helpful in Finding Scientific Papers to Be Top-cited in the Future: Content or Citations? Case Analysis in the Field of Solar Cells 2009 (2021)

    • Author(s)
      Masanao Ochi, Masanori Shiro, Junichiro Mori, Ichiro Sakata
    • Organizer
      International Conference on Web Information Systems and Technologies
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
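  • [Presentation] Identifiability Analysis Using Distributed Representations Extracted from Scholarly Literature Information, Toward Predicting the Impact of Scientific Research (2021)

    • Author(s)
      大知 正直, 城 真範, 森 純一郎, 坂田 一郎
    • Organizer
      2021 Annual Conference of the Japanese Society for Artificial Intelligence (35th)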
    • Related Report
      2021 Research-status Report
  • [Presentation] Unsupervised Abstractive Opinion Summarization by Generating Sentences with Tree Structured Topic Guidance (2021)

    • Author(s)
      Masaru Isonuma, Junichiro Mori, Danushka Bollegala, Ichiro Sakata
    • Organizer
      The 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP2021)
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Citation Network Analysis of the COVID-19 Open Research Dataset (2020)

    • Author(s)
      Junichiro Mori
    • Organizer
      Second International Workshop on SCIentific DOCument Analysis (SCIDOCA 2020)
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
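  • [Presentation] Unsupervised Abstractive Opinion Summarization Capturing Latent Topic Structure (2020)

    • Author(s)
      磯沼大
    • Organizer
      The 246th Meeting of the IPSJ Special Interest Group on Natural Language Processing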
    • Related Report
      2020 Research-status Report
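  • [Presentation] Unsupervised Opinion Summarization by Generating Topic Sentences (2020)

    • Author(s)
      磯沼大
    • Organizer
      The 27th Annual Meeting of the Association for Natural Language Processing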
    • Related Report
      2020 Research-status Report
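  • [Presentation] Early Extraction of Information-Diffusion-Type Social Issues Based on a Constructionist Approach (2020)

    • Author(s)
      蕭喬仁
    • Organizer
      2020 Annual Conference of the Japanese Society for Artificial Intelligence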
    • Related Report
      2020 Research-status Report
  • [Remarks] Citation analysis of COVID-19-related papers

    • URL

      https://academic-landscape.com/analysis/36093

    • Related Report
      2020 Research-status Report

Published: 2020-04-28   Modified: 2024-01-30  
