Building structured Knowledge Base for trustable NLP application systems by Resource by Collaborative Construction scheme

Research Project

Project/Area Number	20H00617
Research Category	Grant-in-Aid for Scientific Research (A)
Allocation Type	Single-year Grants
Section	General
Review Section	Medium-sized Section 61: Human informatics and related fields
Research Institution	Institute of Physical and Chemical Research
Principal Investigator	SEKINE SATOSHI RIKEN (National Research and Development Agency), Center for Advanced Intelligence Project, Team Leader (00813255)
Project Period (FY)	2020-04-01 – 2023-03-31
Project Status	Completed (Fiscal Year 2023)
Budget Amount	¥45,240,000 (Direct Cost: ¥34,800,000, Indirect Cost: ¥10,440,000); Fiscal Year 2022: ¥14,560,000 (Direct Cost: ¥11,200,000, Indirect Cost: ¥3,360,000); Fiscal Year 2021: ¥14,560,000 (Direct Cost: ¥11,200,000, Indirect Cost: ¥3,360,000); Fiscal Year 2020: ¥16,120,000 (Direct Cost: ¥12,400,000, Indirect Cost: ¥3,720,000)
Keywords	knowledge graph / natural language processing / information extraction / named entity / attribute extraction / text classification / entity linking / knowledge construction / collaborative knowledge construction / document classification / collaborative resource construction
Outline of Research at the Start	We believe that a trustworthy artificial intelligence system must be able to explain its decisions in language that humans can understand. This requires world knowledge in a form that AI can handle. However, building such knowledge properly is very difficult, and at present no knowledge base exists with the accuracy and scale needed for AI applications. This project structures Wikipedia, which is edited and built collaboratively by people around the world, into a form that AI can use. Because doing all of this structuring by hand is impractical in terms of cost and time, we aim to build the structured knowledge by running many AI systems and aggregating their results.
Outline of Final Research Achievements	The main results of the Shinra Project are as follows; they are published on the Shinra Project homepage. 1) Category classification data for all items in Japanese Wikipedia, attribute value extraction data for samples of all categories, entity linking data, and comprehensive category classification data in 30 languages. 2) A baseline system that semi-automatically performs each of the tasks of category classification, attribute value extraction, and entity linking. 3) An access API that lets applications use the Shinra data.
Academic Significance and Societal Importance of the Research Achievements	The outcomes of this project, including the Shinra data, are indispensable for natural language processing. They can serve as core data for explainable NLP components for trustworthy AI and for addressing hallucination in generative AI. Data of this kind is unique worldwide; its applications can be used widely in society and will help spread trustworthy AI.

Report

(5 results)

2023 Final Research Report ( PDF )
2022 Annual Research Report
2021 Annual Research Report
2020 Comments on the Screening Results
2020 Annual Research Report

Research Products
(22 results)

All Presentation (22 results) (of which Int'l Joint Research: 10 results)

[Presentation] Shinra Project (2024)
- Author(s)
  Satoshi Sekine, Yu Usami, Kazuma Kadowaki, 三浦明波, Kouta Nakayama, Maya Ando
- Organizer
  The Association for Natural Language Processing
- Related Report
  2022 Annual Research Report
[Presentation] JEMHopQA: Improving the Japanese Multi-Hop QA Dataset (2024)
- Author(s)
  Ai Ishii, Naoya Inoue, Hisami Suzuki, Satoshi Sekine
- Organizer
  The Association for Natural Language Processing
- Related Report
  2022 Annual Research Report
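[Presentation] Analysis of LLMs' "False" Correct Answers Using Evidence Information from Multi-Hop QA (2024)
- Author(s)
  Ai Ishii, Naoya Inoue, Hisami Suzuki, Satoshi Sekine
- Organizer
  The Association for Natural Language Processing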
- Related Report
  2022 Annual Research Report
[Presentation] JEMHopQA: Dataset for Japanese Explainable Multi-Hop Question Answering (2024)
- Author(s)
  Ai Ishii, Naoya Inoue, Hisami Suzuki and Satoshi Sekine
- Organizer
  LREC-COLING 2024
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
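[Presentation] Shinra Tasks and Shinra Public Data (2023)
- Author(s)
  Satoshi Sekine, Kouta Nakayama, Asuka Sumida, Hideyuki Shibuki, Kazuma Kadowaki, 三浦明波, Yu Usami, Maya Ando
- Organizer
  The Association for Natural Language Processing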
- Related Report
  2022 Annual Research Report
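[Presentation] Shinra Tasks and Shinra Public Data (2023)
- Author(s)
  Satoshi Sekine (RIKEN), Kouta Nakayama (RIKEN / University of Tsukuba), Asuka Sumida (RIKEN), Hideyuki Shibuki (BESNA), Kazuma Kadowaki (Japan Research Institute), 三浦明波 (アティード), Yu Usami (Usami LLC), Maya Ando (freelance)
- Organizer
  Annual Meeting of the Association for Natural Language Processing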
- Related Report
  2021 Annual Research Report
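[Presentation] Wikipedia Knowledge Base in 31 Languages Classified into Extended Named Entities (2022)
- Author(s)
  Satoshi Sekine, Kouta Nakayama, Masako Nomoto, Maya Ando, Asuka Sumida, Koji Matsuda
- Organizer
  The Association for Natural Language Processing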
- Related Report
  2022 Annual Research Report
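[Presentation] Analysis of SHINRA2021-LinkJP Results: Comparing BERT and Rule-Based Approaches (2022)
- Author(s)
  Masako Nomoto, Yu Usami, Maya Ando, Kouta Nakayama, Satoshi Sekine
- Organizer
  The Association for Natural Language Processing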
- Related Report
  2022 Annual Research Report
[Presentation] Resource of Wikipedias in 31 Languages Categorized into Fine-Grained Named Entities (2022)
- Author(s)
  Satoshi Sekine, Kouta Nakayama, Masako Nomoto, Maya Ando, Asuka Sumida, Koji Matsuda
- Organizer
  COLING 2022
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Resource of Wikipedias in 31 Languages Categorized into Fine-Grained Named Entities (2022)
- Author(s)
  Satoshi Sekine, Kouta Nakayama, Masako Nomoto, Maya Ando, Asuka Sumida, Koji Matsuda
- Organizer
  COLING 2022
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
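[Presentation] Wikipedia Knowledge Base in 31 Languages Classified into Extended Named Entities (2022)
- Author(s)
  Satoshi Sekine, Kouta Nakayama, Masako Nomoto (RIKEN), Maya Ando (freelance), Asuka Sumida, Koji Matsuda (RIKEN)
- Organizer
  Annual Meeting of the Association for Natural Language Processing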
- Related Report
  2021 Annual Research Report
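[Presentation] Analysis of SHINRA2021-LinkJP Results: Comparing BERT and Rule-Based Approaches (2022)
- Author(s)
  Masako Nomoto, Yu Usami, Maya Ando, Kouta Nakayama, Satoshi Sekine
- Organizer
  28th Annual Meeting of the Association for Natural Language Processing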
- Related Report
  2020 Annual Research Report
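[Presentation] Wikipedia Knowledge Base in 31 Languages Classified into Extended Named Entities (2022)
- Author(s)
  Satoshi Sekine, Kouta Nakayama, Masako Nomoto, Maya Ando, Asuka Sumida, Koji Matsuda
- Organizer
  28th Annual Meeting of the Association for Natural Language Processing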
- Related Report
  2020 Annual Research Report
[Presentation] SHINRA2020-ML: Categorizing 30-language Wikipedia into fine-grained NE based on “Resource by Collaborative Contribution” scheme (2021)
- Author(s)
  Satoshi Sekine, Kouta Nakayama, Koji Matsuda, Asuka Sumida, Maya Ando, Yu Usami, Masako Nomoto
- Organizer
  Automated Knowledge Base Construction
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Co-Teaching Student-Model through Submission Results of Shared Task (2021)
- Author(s)
  Kouta Nakayama, Yukino Baba, Satoshi Sekine
- Organizer
  EMNLP 2021
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
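[Presentation] SHINRA2020-ML: Classification of Wikipedia Pages in 30 Languages (2021)
- Author(s)
  Satoshi Sekine, Masako Nomoto, Kouta Nakayama, Asuka Sumida, Koji Matsuda, Maya Ando
- Organizer
  27th Annual Meeting of the Association for Natural Language Processing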
- Related Report
  2020 Annual Research Report
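[Presentation] Reducing Prediction Target Data in Resource-Construction Shared Tasks Using Active Sampling (2021)
- Author(s)
  Kouta Nakayama, Shuhei Kurita, Yukino Baba, Satoshi Sekine
- Organizer
  27th Annual Meeting of the Association for Natural Language Processing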
- Related Report
  2020 Annual Research Report
[Presentation] SHINRA2020-ML: Categorizing 30-language Wikipedia into fine-grained NE based on “Resource by Collaborative Contribution” scheme (2021)
- Author(s)
  Satoshi Sekine, Kouta Nakayama, Maya Ando, Yu Usami, Masako Nomoto and Koji Matsuda
- Organizer
  3rd conference on the Automated Knowledge Base Construction (AKBC 2021)
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Studio Ousia at the NTCIR-15 SHINRA2020-ML Task (2020)
- Author(s)
  Sosuke Nishikawa and Ikuya Yamada
- Organizer
  In Proceedings of the NTCIR-15 Conference
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] HUKB at SHINRA2020-ML task (2020)
- Author(s)
  Masaharu Yoshioka and Yoshiaki Koitabashi
- Organizer
  In Proceedings of the NTCIR-15 Conference
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] LIAT Team’s Wikipedia Classifier at NTCIR-15 SHINRA2020-ML: Classification Task (2020)
- Author(s)
  Kouta Nakayama and Satoshi Sekine
- Organizer
  In Proceedings of the NTCIR-15 Conference
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
[Presentation] Overview of SHINRA2020-ML Task (2020)
- Author(s)
  Satoshi Sekine, Masako Nomoto, Kouta Nakayama, Asuka Sumida, Koji Matsuda, and Maya Ando
- Organizer
  In Proceedings of the NTCIR-15 Conference
- Related Report
  2020 Annual Research Report
- Int'l Joint Research
