A Dynamic Computational Model of the Meaning of Words Used by Individuals, Taking into Account Linguistic Contact with Others

Research Project

Project/Area Number	22KJ0950
Project/Area Number (Other)	22J14451 (2022)
Research Category	Grant-in-Aid for JSPS Fellows
Allocation Type	Multi-year Fund (2023) / Single-year Grants (2022)
Section	Domestic
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	The University of Tokyo
Principal Investigator	Daisuke Oba, The University of Tokyo, Institute of Industrial Science, JSPS Research Fellow (PD)
Project Period (FY)	2023-03-08 – 2024-03-31
Project Status	Completed (Fiscal Year 2023)
Budget Amount	¥1,700,000 (Direct Cost: ¥1,700,000); Fiscal Year 2023: ¥800,000 (Direct Cost: ¥800,000); Fiscal Year 2022: ¥900,000 (Direct Cost: ¥900,000)
Keywords	Natural language processing / Personalization / Word sense disambiguation / Interpretability / Debiasing / Word embeddings / Entity linking
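Outline of Research at the Start	Accurately understanding the first-hand experiences and opinions that people share through microblogs and similar media makes it possible to grasp social conditions and to conduct effective marketing. However, the meaning of the language (words and phrases) that people write and speak differs from person to person, depending on their expertise in a topic or on a biased understanding of it; moreover, even for the same person, the meaning of their words changes as they read and hear the words used by others. This research establishes a methodology for computing, as a continuous real-valued vector representation, the meaning that any writer attaches to a word at any point in time, while also taking into account the language used by the people around that writer. In doing so, it aims to realize accurate language processing for text produced by a wide variety of people.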
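Outline of Annual Research Achievements	This project aims to establish a methodology for mathematically representing the meaning that any writer attaches to natural language at any point in time. In the first year, we validated the base model for computing writer-level word meaning representations [Oba et al., 2020] and showed, from quantitative and qualitative perspectives, how features of the training data affect the computed results. In addition, with the aim of also capturing the temporal variability of meaning in natural language by building on LLMs, which can compute context-dependent meaning, we began work on personalizing LLMs and developed a method that broadens the range of datasets to which personalization can be applied. Furthermore, beyond factors such as writer and time, to make use of the extra-linguistic information needed for text understanding, namely "world knowledge," without incurring costs such as retraining, we proposed a method that dynamically estimates and completes continuous representations of world knowledge from descriptions and similar text; this work was accepted at the international conference EMNLP. In the first half of the final year, we mainly continued working on personalizing LLMs. In the course of identifying which writer-specific additional information is useful for personalization, we verified, across different data and LLM settings, the finding that discrete additional information such as random ID strings and past conversation history helps personalize LLMs more than continuous additional information such as parameters. The results were released as a preprint. That study, however, also revealed cases in which the behavior of LLMs was difficult to control. In the second half of the year, we hypothesized that one cause is the "biases" that LLMs implicitly acquire from pre-training data, and worked on removing bias in a post-hoc manner. In the process, we showed that feeding counterfactual or descriptive text as additional input can induce an illusion in LLMs and thereby suppress bias. At the same time, we also looked to pre-training itself for the cause, and showed that models overfit to particular contexts through pre-training. These results were accepted at the international conference EACL.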
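The idea of suppressing bias by supplying counterfactual or descriptive text as additional input can be illustrated with a minimal sketch. It only shows how such preamble text might be prepended to a prompt before the prompt is passed to an LLM. The function name `with_preambles` and the example sentences are hypothetical and are not taken from the project's publications, and no model call is made.

```python
from typing import Sequence

# Hypothetical preamble sentences, invented purely for illustration.
# They are NOT the preambles used in the project's publications.
COUNTERFACTUAL_PREAMBLES = [
    "Despite being a woman, Alex became an engineer.",
    "Despite being a man, Sam became a nurse.",
]


def with_preambles(prompt: str, preambles: Sequence[str]) -> str:
    """Prepend non-empty preamble sentences to a prompt.

    Each piece is stripped of surrounding whitespace and the pieces are
    joined with single spaces, preambles first and the prompt last.
    """
    cleaned = [p.strip() for p in preambles if p.strip()]
    return " ".join([*cleaned, prompt.strip()])


if __name__ == "__main__":
    # The resulting string would be given to an LLM in place of the bare prompt.
    print(with_preambles("The doctor said that", COUNTERFACTUAL_PREAMBLES))
```

Because the preamble is ordinary input text, this kind of intervention needs no access to model parameters, which is consistent with the post-hoc framing in the summary above.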

Report

(2 results)

2023 Annual Research Report
2022 Annual Research Report

Research Products
(7 results)

All Journal Article (3 results) (of which Peer Reviewed: 3 results, Open Access: 3 results) Presentation (4 results) (of which Int'l Joint Research: 2 results)

[Journal Article] In-Contextual Gender Bias Suppression for Large Language Models (2024)
- Author(s)
  Daisuke Oba, Masahiro Kaneko, Danushka Bollegala.
- Journal Title
  Findings of the Association for Computational Linguistics: EACL 2024
  Volume: - Pages: 1722-1742
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Tracing the Roots of Facts in Multilingual Language Models: Independent, Shared, and Transferred Knowledge (2024)
- Author(s)
  Xin Zhao, Naoki Yoshinaga, Daisuke Oba
- Journal Title
  Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics
  Volume: - Pages: 2088-2102
- Related Report
  2023 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Entity Embedding Completion for Wide-Coverage Entity Disambiguation (2022)
- Author(s)
  Daisuke Oba, Ikuya Yamada, Naoki Yoshinaga, Masashi Toyoda
- Journal Title
  Findings of the Association for Computational Linguistics: EMNLP 2022
  Volume: - Pages: 6333-6344
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
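[Presentation] In-Context Learning-Based Gender Bias Suppression for Large Language Models (2024)
- Author(s)
  Daisuke Oba, Masahiro Kaneko, Danushka Bollegala.
- Organizer
  The 30th Annual Meeting of the Association for Natural Language Processing (NLP2024)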
- Related Report
  2023 Annual Research Report
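[Presentation] Multifaceted Knowledge Evaluation of Language Models Using Diverse Prompts (2024)
- Author(s)
  Xin Zhao, Naoki Yoshinaga, Daisuke Oba.
- Organizer
  The 259th Natural Language Processing Research Meeting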
- Related Report
  2023 Annual Research Report
[Presentation] In-Contextual Gender Bias Suppression for Large Language Models (2024)
- Author(s)
  Daisuke Oba, Masahiro Kaneko, Danushka Bollegala.
- Organizer
  The 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL2024)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
[Presentation] Tracing the Roots of Facts in Multilingual Language Models: Independent, Shared, and Transferred Knowledge (2024)
- Author(s)
  Xin Zhao, Naoki Yoshinaga, Daisuke Oba.
- Organizer
  The 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL2024)
- Related Report
  2023 Annual Research Report
- Int'l Joint Research
