2023 Fiscal Year Research-status Report

Extending knowledge graph structures through deep text understanding

Research Project

Project/Area Number	22K12044
Research Institution	Waseda University
Principal Investigator	Mizuho Iwaihara, Waseda University, Faculty of Science and Engineering (Graduate School of Information, Production and Systems), Professor (40253538)
Project Period (FY)	2022-04-01 – 2025-03-31
Keywords	text mining / information extraction / deep learning / pretrained language models / knowledge graphs / text classification
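Outline of Annual Research Achievements	Structured data is extracted as knowledge graphs from knowledge-accumulating content such as Wikipedia and is used for classifying search results and for various kinds of knowledge processing. Enriching a knowledge graph requires extracting new knowledge from Wikipedia, tweets, and documents to extend the graph, which in turn requires an integrated analysis of the structural and textual information in web content. This research aims to develop new information extraction methods, based on deep analysis of text and graph structure, for three themes: (1) multi-label document classification with a small amount of training data, (2) keyphrase extraction and generation using pretrained language models, and (3) structural extension of knowledge graphs. This fiscal year, for (1), we developed a method that uses a masked language model to discover and add characteristic words and phrases related to label names, and that further uses an attention mechanism to compute sentence-level importance. We applied it to a multi-label document classification task that determines whether a document mentions predefined aspects, and showed that accuracy improves. For (2), we split the task into keyphrases that appear in the document and keyphrases that do not, trained a separate generative language model for each, and shuffled the order of the target keyphrases to reduce order dependence; we showed that combining these techniques achieves keyphrase extraction and generation performance exceeding previously known results. For (3), we addressed the problem of determining what semantic types the members of Wikipedia lists and categories have. To augment the small amount of training data, we designed rules that automatically generate pseudo-labels from the semantic, grammatical, and structural features of lists and categories. We showed that training a language model with these pseudo-labels improves the accuracy of semantic type prediction.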
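The splitting and shuffling steps described for theme (2) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `split_keyphrases` and `shuffled_targets`, the case-insensitive substring test for "appears in the document", and the `" ; "` separator are all assumptions made for illustration. Published keyphrase work typically matches on stemmed tokens rather than raw substrings.

```python
import random


def split_keyphrases(document, keyphrases):
    """Split keyphrases into those present in the document text and those absent.

    Assumption: presence is a case-insensitive substring match.
    """
    text = document.lower()
    present = [k for k in keyphrases if k.lower() in text]
    absent = [k for k in keyphrases if k.lower() not in text]
    return present, absent


def shuffled_targets(keyphrases, n_variants, seed=0, sep=" ; "):
    """Build several target strings, each with the keyphrases in a random order.

    Training a generative model on differently ordered copies of the same
    target set is one way to reduce dependence on keyphrase order.
    """
    rng = random.Random(seed)
    targets = []
    for _ in range(n_variants):
        order = list(keyphrases)
        rng.shuffle(order)
        targets.append(sep.join(order))
    return targets


doc = "Knowledge graphs are extracted from Wikipedia."
gold = ["knowledge graphs", "Wikipedia", "entity typing"]
present, absent = split_keyphrases(doc, gold)
present_targets = shuffled_targets(present, n_variants=3)
```

In this sketch the present and absent lists would each feed a separate model's training set, mirroring the two-model setup the report describes.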
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: Each research topic was carried out as originally planned.
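Strategy for Future Research Activity	For (1), document classification with a small amount of training data, we will further improve the prompt tuning method, continue evaluating a newly designed threshold function used when unlabeled documents are given their predicted results as pseudo-labels for self-training, and verify its effect by visualizing how inter-document similarity changes during training. For (2), keyphrase extraction, we will examine keyphrase generation with large language models and improve how those models are trained. For (3), structural extension of knowledge graphs, we aim to improve the accuracy of the semantic typing method by extracting new structural features and by improving the judgment accuracy of large language models; as an application of the method, we will also develop a technique for discovering triples in documents that should be added to the knowledge graph as edges.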
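The pseudo-labeling step of self-training mentioned for theme (1) can be sketched as follows. The report does not specify the newly designed threshold function, so the linear schedule in `threshold` below is purely a hypothetical placeholder, and `select_pseudo_labels` with its dictionary input format is likewise an assumption for illustration.

```python
def threshold(epoch, start=0.9, end=0.7, total_epochs=10):
    """Hypothetical schedule: relax the confidence threshold linearly
    from `start` to `end` over training. Not the function from the report."""
    if total_epochs <= 1:
        return end
    frac = min(epoch, total_epochs - 1) / (total_epochs - 1)
    return start + (end - start) * frac


def select_pseudo_labels(probabilities, tau):
    """Keep, for each unlabeled document, the labels predicted with
    probability >= tau. Documents with no confident label are skipped.

    `probabilities` maps document id -> {label: predicted probability}.
    """
    selected = {}
    for doc_id, label_probs in probabilities.items():
        labels = sorted(l for l, p in label_probs.items() if p >= tau)
        if labels:
            selected[doc_id] = labels
    return selected


predictions = {
    "d1": {"food": 0.95, "price": 0.40},
    "d2": {"food": 0.55, "price": 0.60},
}
strict = select_pseudo_labels(predictions, tau=0.9)
loose = select_pseudo_labels(predictions, tau=0.5)
```

A stricter threshold admits fewer but cleaner pseudo-labels; a looser one adds more training signal at the cost of more noise, which is the trade-off a threshold function has to balance.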

Research Products
(6 results)

Journal Article (1 result, of which Peer Reviewed: 1) / Presentation (5 results, of which Int'l Joint Research: 3)

[Journal Article] Self-training involving semantic-space finetuning for semi-supervised multi-label document classification (2024)
- Author(s)
  Zhewei Xu, Mizuho Iwaihara
- Journal Title
  International Journal on Digital Libraries
  Volume: 25, Pages: 25-39
- DOI
  10.1007/s00799-023-00355-4
- Peer Reviewed
[Presentation] Empowering Zero-Shot Extreme Multi-Label Text Classification via Weighted Contrastive Learning and Semantic Label Augmentation (2024)
- Author(s)
  Zhao Yanan, Mizuho Iwaihara
- Organizer
  DEIM Forum 1a-3-4, Online, February 2024.
[Presentation] Evaluating the Performance of ChatGPT for Aspect-Based Sentiment Analysis (2024)
- Author(s)
  Yifei Wang, Mizuho Iwaihara
- Organizer
  DEIM Forum, T1-B-8-0-2, Online, February 2024.
[Presentation] Few-Shot Multi-Label Aspect Category Detection Utilizing Prototypical Network with Sentence-Level Weighting and Label Augmentation (2023)
- Author(s)
  Zeyu Wang and Mizuho Iwaihara
- Organizer
  Proc. 34th Int. Conf. on Database and Expert Systems Applications (DEXA2023), LNCS Vol.14147, pp.363-377, Aug. 2023.
- Int'l Joint Research
[Presentation] Enhancing Keyphrase Generation by BART Finetuning with Splitting and Shuffling (2023)
- Author(s)
  Bin CHEN and Mizuho IWAIHARA
- Organizer
  Proc. 20th Pacific Rim Int. Conf. on Artificial Intelligence (PRICAI23), Jakarta, Nov. 2023.
- Int'l Joint Research
[Presentation] SLHCat: Mapping Wikipedia Categories and Lists to DBpedia by Leveraging Semantic, Lexical, and Hierarchical Features (2023)
- Author(s)
  Zhaoyi WANG, Zhenyang ZHANG, Jiaxin QIN, Mizuho IWAIHARA
- Organizer
  Proc. 25th Int. Conf. on Asian Digital Libraries (ICADL2023), LNCS Vol. 14457, pp.133-148, Dec. 2023.
- Int'l Joint Research
