Constructing Reading Comprehension Datasets to Evaluate Discourse-level Language Understanding

Research Project

Project/Area Number	22K17954
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	National Institute of Informatics
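Principal Investigator	Sugawara Saku (菅原 朔), National Institute of Informatics, Digital Content and Media Sciences Research Division, Assistant Professor (10855894)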
Project Period (FY)	2022-04-01 – 2025-03-31
Project Status	Granted (Fiscal Year 2023)
Budget Amount	¥4,550,000 (Direct Cost: ¥3,500,000, Indirect Cost: ¥1,050,000); Fiscal Year 2024: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000); Fiscal Year 2023: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000); Fiscal Year 2022: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
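Keywords	Natural language understanding / Natural language processing / Computational linguistics / Reading comprehension / Discourse understanding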
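Outline of Research at the Start	Reading comprehension tasks, in which a system reads a passage and answers questions about it, have recently been widely used to evaluate systems that aim at language understanding. However, most existing datasets cannot test understanding of content that spans multiple sentences or the whole passage, which has been a major limitation for tasks meant to evaluate advanced, human-like language understanding. This project collects questions about passages that contain linguistic phenomena and inferences related to understanding the relations between sentences.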
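Outline of Annual Research Achievements	From the latter half of fiscal year 2021 through fiscal year 2022, there has been a rapid increase in research based on systems called large language models, which are built by training architectures with a large number of parameters on large-scale corpora. Against this background, this project focuses in particular on understanding the relations between sentences and aims to construct evaluation datasets that test discourse-level text understanding with high explainability. In evaluating the behavior of increasingly sophisticated systems, approaches that comprehensively test the understanding of multiple sentences, rather than single sentences alone, are highly important and need to be pursued intensively. In step with the development of large language models, evaluation datasets for language understanding are also becoming more diverse and larger in scale, so a broad and accurate survey is needed of what current datasets address and what current systems can do. In fiscal year 2023, while continuing a literature survey that takes these developments into account, we carried out two main lines of work: (1) analysis of existing evaluation methods and datasets, and (2) creation of a new dataset that involves understanding relations between clauses and between sentences. For (1), drawing on the construct of validity in psychometrics, we prepared a checklist of requirements that existing dataset designs should satisfy. For (2), starting from an existing multiple-choice machine reading comprehension dataset, we created questions that ask why each answer option is correct or incorrect, themselves in the form of a multiple-choice machine reading comprehension dataset, and thereby worked on the consistent evaluation of natural language understanding models from the perspective of rationale understanding.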
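Current Status of Research Progress	3: Progress in research has been slightly delayed. Reason: Large language models have advanced remarkably over the past one to two years, and we surveyed and analyzed evaluation methods and constructed datasets based on them. However, the evaluation of discourse understanding remains only a secondary item in the datasets constructed so far, and we have not yet constructed a dataset that evaluates it as the central target. In addition, the need for such a dataset and its importance in evaluating recent systems require further consideration, together with a broad and accurate survey.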
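Strategy for Future Research Activity	In the next fiscal year, we will likewise focus on system analysis and dataset construction. It is also important to comprehensively reproduce the systems that are currently in wide use, assess their current capabilities, and investigate the evaluation items of major datasets and system behavior on them. To make the construction of evaluation datasets, which is the aim of this project, more meaningful, our top priority is to deepen our understanding, with a particular focus on how relations between sentences are grasped, of whether important linguistic phenomena are being evaluated and what performance systems exhibit. At the same time, we consider it desirable to be able to conduct evaluation from a developmental perspective as well, that is, how advanced capabilities such as understanding relations between sentences are acquired by systems.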

Report

(2 results)

2023 Research-status Report
2022 Research-status Report

Research Products
(9 results)

Journal Article (9 results) (of which Peer Reviewed: 9 results, Open Access: 9 results)

[Journal Article] Analyzing the Effectiveness of the Underlying Reasoning Tasks in Multi-hop Question Answering (2023)
- Author(s)
  Ho Xanh, Duong Nguyen Anh-Khoa, Sugawara Saku, Aizawa Akiko
- Journal Title
  
  Findings of the Association for Computational Linguistics: EACL 2023
  
  Volume: 1 Pages: 1163-1180
- DOI
  10.18653/v1/2023.findings-eacl.87
- Related Report
  2023 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] On Degrees of Freedom in Defining and Testing Natural Language Understanding (2023)
- Author(s)
  Sugawara Saku, Tsugita Shun
- Journal Title
  
  Findings of the Association for Computational Linguistics: ACL 2023
  
  Volume: 1 Pages: 13625-13649
- DOI
  10.18653/v1/2023.findings-acl.861
- Related Report
  2023 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] PROPRES: Investigating the Projectivity of Presupposition with Various Triggers and Environments (2023)
- Author(s)
  Asami Daiki, Sugawara Saku
- Journal Title
  
  Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)
  
  Volume: 1 Pages: 122-137
- DOI
  10.18653/v1/2023.conll-1.9
- Related Report
  2023 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension (2023)
- Author(s)
  Kawabata Akira, Sugawara Saku
- Journal Title
  
  Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
  
  Volume: 1 Pages: 116-143
- DOI
  10.18653/v1/2023.emnlp-main.9
- Related Report
  2023 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Penalizing Confident Predictions on Largely Perturbed Inputs Does Not Improve Out-of-Distribution Generalization in Question Answering (2023)
- Author(s)
  Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa
- Journal Title
  
  Proceedings of the Workshop on Knowledge Augmented Methods for NLP
  
  Volume: 1
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Which Shortcut Solution Do Question Answering Models Prefer to Learn? (2023)
- Author(s)
  Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa
- Journal Title
  
  Proceedings of the 37th AAAI Conference on Artificial Intelligence
  
  Volume: 1
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] How Well Do Multi-hop Reading Comprehension Models Understand Date Information? (2022)
- Author(s)
  Xanh Ho, Saku Sugawara, Akiko Aizawa
- Journal Title
  
  Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing
  
  Volume: 1 Pages: 470-479
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Possible Stories: Evaluating Situated Commonsense Reasoning under Multiple Possible Scenarios (2022)
- Author(s)
  Mana Ashida, Saku Sugawara
- Journal Title
  
  Proceedings of the 29th International Conference on Computational Linguistics
  
  Volume: 1 Pages: 3606-3630
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Look to the Right: Mitigating Relative Position Bias in Extractive Question Answering (2022)
- Author(s)
  Kazutoshi Shinoda, Saku Sugawara, Akiko Aizawa
- Journal Title
  
  Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP
  
  Volume: 1 Pages: 418-425
- Related Report
  2022 Research-status Report
- Peer Reviewed / Open Access
