Extending knowledge graph structures through deep text understanding

Research Project

Project/Area Number	22K12044
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 60080: Database-related
Research Institution	Waseda University
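Principal Investigator	IWAIHARA Mizuho (岩井原 瑞穂), Waseda University, Faculty of Science and Engineering (Graduate School / Center of Information, Production and Systems), Professor (40253538)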
Project Period (FY)	2022-04-01 – 2025-03-31
Project Status	Granted (Fiscal Year 2023)
Budget Amount	¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
	Fiscal Year 2024: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
	Fiscal Year 2023: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
	Fiscal Year 2022: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Keywords	Text mining / Information extraction / Deep learning / Pre-trained language models / Knowledge graphs / Text classification / Social media / Knowledge processing
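Outline of Research at the Start	Structured data are extracted as knowledge graphs from knowledge-accumulating content such as Wikipedia, and these graphs are used to classify search results and to support various kinds of knowledge processing. Enriching knowledge graphs requires extracting new knowledge from Wikipedia, tweets, and other documents and using it to extend the graphs. This study develops new information extraction methods, based on deep analysis of text and graph structure, for three themes: (1) multi-label document classification with a small amount of training data, (2) keyphrase extraction and generation using pre-trained language models, and (3) structural extension of knowledge graphs.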
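Outline of Annual Research Achievements	Structured data are extracted as knowledge graphs from knowledge-accumulating content such as Wikipedia, and these graphs are used to classify search results and to support various kinds of knowledge processing. Enriching knowledge graphs requires extracting new knowledge from Wikipedia, tweets, and other documents to extend the graphs, which in turn requires an integrated analysis of the structural and textual information in web content. The aim of this study is to develop new information extraction methods, based on deep analysis of text and graph structure, for three themes: (1) multi-label document classification with a small amount of training data, (2) keyphrase extraction and generation using pre-trained language models, and (3) structural extension of knowledge graphs.
For (1), this fiscal year we developed a method that uses a masked language model to discover characteristic words and phrases related to each label name and adds them to the label, and that computes sentence-level importance with an attention mechanism. We applied the method to a multi-label document classification task that determines whether a document mentions each of a set of predefined aspects, and showed that it improves accuracy.
For (2), keyphrase extraction and generation, we split the task into keyphrases that appear in the document and keyphrases that do not, trained a separate generative language model for each, and combined this with techniques such as shuffling the order of the target keyphrases to reduce the model's dependence on keyphrase order. We showed that this combination achieves keyphrase extraction and generation performance exceeding previously reported results.
For (3), structural extension of knowledge graphs, we addressed the problem of determining which semantic types the elements of Wikipedia lists and categories belong to. To augment the small amount of training data, we designed rules that automatically generate pseudo-labels from the semantic, grammatical, and structural features of lists and categories. We showed that training a language model with these pseudo-labels improves the accuracy of semantic type prediction.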
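The report gives no implementation details for the split-and-shuffle training described under theme (2). The following is a minimal sketch of how training targets could be prepared under stated assumptions: "present" means the keyphrase occurs as a contiguous span of lowercased word tokens in the document, keyphrases in a target are joined with a hypothetical `;` separator, and the names `split_keyphrases`, `build_target`, and `make_training_pairs` are illustrative, not the authors' code.

```python
import random
import re
from typing import List, Tuple

SEP = ";"  # hypothetical separator placed between keyphrases in a target string


def _tokens(text: str) -> List[str]:
    # Lowercased word tokens; a simplification of real subword tokenization.
    return re.findall(r"\w+", text.lower())


def _contains(doc: List[str], phrase: List[str]) -> bool:
    # True if the phrase occurs as a contiguous token span in the document.
    n = len(phrase)
    return any(doc[i:i + n] == phrase for i in range(len(doc) - n + 1))


def split_keyphrases(document: str, keyphrases: List[str]) -> Tuple[List[str], List[str]]:
    """Split gold keyphrases into present (found in the document) and absent ones."""
    doc = _tokens(document)
    present = [kp for kp in keyphrases if _contains(doc, _tokens(kp))]
    absent = [kp for kp in keyphrases if not _contains(doc, _tokens(kp))]
    return present, absent


def build_target(keyphrases: List[str], rng: random.Random) -> str:
    """Join keyphrases into one target string in a random order, so the
    generator is not trained to reproduce a single fixed keyphrase ordering."""
    shuffled = list(keyphrases)
    rng.shuffle(shuffled)
    return f" {SEP} ".join(shuffled)


def make_training_pairs(document: str, keyphrases: List[str], seed: int = 0):
    """Return (source, target) pairs for a present-keyphrase generator and an
    absent-keyphrase generator, which would be trained as separate models."""
    rng = random.Random(seed)
    present, absent = split_keyphrases(document, keyphrases)
    return (document, build_target(present, rng)), (document, build_target(absent, rng))


doc = "We extend knowledge graphs using keyphrase generation with BART."
gold = ["keyphrase generation", "BART", "information extraction"]
present_pair, absent_pair = make_training_pairs(doc, gold)
print(present_pair[1])  # the two present keyphrases, in shuffled order
print(absent_pair[1])   # information extraction
```

Reshuffling with a fresh seed each epoch would expose the model to many orderings of the same keyphrase set; whether the project shuffles once or per epoch is not stated in the report.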
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned.
Reason	Each research theme was carried out as originally planned.
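Strategy for Future Research Activity	For (1), document classification with a small amount of training data, we will further improve the prompt-tuning method. We will also continue evaluating a newly designed threshold function used in self-training, where predictions on unlabeled documents are assigned as pseudo-labels, and verify its effect by visualizing how similarities between documents change during training. For (2), keyphrase extraction, we will study keyphrase generation with large language models and improve the methods for training them. For (3), structural extension of knowledge graphs, we aim to improve the accuracy of semantic typing by extracting new structural features and by using large language models to improve prediction accuracy. As an application of this method, we will also develop a method for discovering triples in documents that should be added to the knowledge graph as edges.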
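The report does not define the new threshold function used for self-training in theme (1). The sketch below shows only the generic confidence-threshold selection step of multi-label self-training, assuming fixed thresholds `hi` and `lo` as placeholders for that function; `select_pseudo_labels` and the probability values are hypothetical.

```python
from typing import List, Tuple


def select_pseudo_labels(
    probs: List[List[float]], hi: float = 0.9, lo: float = 0.1
) -> List[Tuple[int, List[int]]]:
    """Pick unlabeled documents whose per-label probabilities are all confident.

    A label counts as a confident positive if p >= hi and a confident negative
    if p <= lo; a document is kept only if every label is confident. The fixed
    hi/lo values are placeholders for the project's own threshold function.
    """
    selected = []
    for doc_idx, row in enumerate(probs):
        if all(p >= hi or p <= lo for p in row):
            selected.append((doc_idx, [1 if p >= hi else 0 for p in row]))
    return selected


# Predicted probabilities for 3 unlabeled documents over 3 labels (made-up numbers).
probs = [
    [0.95, 0.05, 0.92],
    [0.60, 0.02, 0.99],  # label 0 is uncertain, so this document is skipped
    [0.03, 0.97, 0.08],
]
print(select_pseudo_labels(probs))  # [(0, [1, 0, 1]), (2, [0, 1, 0])]
```

In a self-training loop, the selected documents and their pseudo-labels would be added to the training set before the classifier is retrained; a threshold that varies by label or by training round would replace the fixed `hi`/`lo` here.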

Report

(2 results)

2023 Research-status Report
2022 Research-status Report

Research Products
(12 results)


Journal Article (1 result, of which Peer Reviewed: 1) / Presentation (11 results, of which Int'l Joint Research: 6)

[Journal Article] Self-training involving semantic-space finetuning for semi-supervised multi-label document classification (2024)
- Author(s)
  Zhewei Xu, Mizuho Iwaihara
- Journal Title
  International Journal on Digital Libraries
  Volume: 25 Issue: 1 Pages: 25-39
- DOI
  10.1007/s00799-023-00355-4
- Related Report
  2023 Research-status Report
- Peer Reviewed
[Presentation] Empowering Zero-Shot Extreme Multi-Label Text Classification via Weighted Contrastive Learning and Semantic Label Augmentation (2024)
- Author(s)
  Zhao Yanan, Mizuho Iwaihara
- Organizer
  DEIM Forum, 1a-3-4, Online, February 2024.
- Related Report
  2023 Research-status Report
[Presentation] Evaluating the Performance of ChatGPT for Aspect-Based Sentiment Analysis (2024)
- Author(s)
  Yifei Wang, Mizuho Iwaihara
- Organizer
  DEIM Forum, T1-B-8-0-2, Online, February 2024.
- Related Report
  2023 Research-status Report
[Presentation] Few-Shot Multi-Label Aspect Category Detection Utilizing Prototypical Network with Sentence-Level Weighting and Label Augmentation (2023)
- Author(s)
  Zeyu Wang and Mizuho Iwaihara
- Organizer
  Proc. 34th Int. Conf. on Database and Expert Systems Applications (DEXA2023), LNCS Vol.14147, pp.363-377, Aug. 2023.
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Enhancing Keyphrase Generation by BART Finetuning with Splitting and Shuffling (2023)
- Author(s)
  Bin CHEN and Mizuho IWAIHARA
- Organizer
  Proc. 20th Pacific Rim Int. Conf. on Artificial Intelligence (PRICAI23), Jakarta, Nov. 2023.
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] SLHCat: Mapping Wikipedia Categories and Lists to DBpedia by Leveraging Semantic, Lexical, and Hierarchical Features (2023)
- Author(s)
  Zhaoyi WANG, Zhenyang ZHANG, Jiaxin QIN, Mizuho IWAIHARA
- Organizer
  Proc. 25th Int. Conf. on Asian Digital Libraries (ICADL 2023), LNCS Vol. 14457, pp. 133-148, Dec. 2023.
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Mapping Wikipedia Categories and Lists to DBpedia Ontology Based on Structural and Semantic Features (2023)
- Author(s)
  Zhang Zhenyang, Wang Zhaoyi, Mizuho Iwaihara
- Organizer
  15th Forum on Data Engineering and Information Management (DEIM 2023)
- Related Report
  2022 Research-status Report
[Presentation] Utilizing Keyphrase Generation and Semantic Similarity for Extreme Multi-Label Text Classification (2023)
- Author(s)
  Dai Xiangting, Mizuho Iwaihara
- Organizer
  15th Forum on Data Engineering and Information Management (DEIM 2023)
- Related Report
  2022 Research-status Report
[Presentation] Efficient Summarization of Long Documents Using Hybrid Extractive-Abstractive Method (2023)
- Author(s)
  Chen Weichao, Mizuho Iwaihara
- Organizer
  15th Forum on Data Engineering and Information Management (DEIM 2023)
- Related Report
  2022 Research-status Report
[Presentation] Capsule Network Over Pre-Trained Language Model and User Writing Styles for Authorship Attribution on Short Texts (2022)
- Author(s)
  Zeping Huang, Mizuho Iwaihara
- Organizer
  Proc. 2022 3rd International Conference on Control, Robotics and Intelligent System (CCRIS’22)
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] Extractive Summarization Utilizing Keyphrases by Finetuning BERT-Based Model (2022)
- Author(s)
  Xiaoye Wang, Mizuho Iwaihara
- Organizer
  Proc. 24th Int. Conf. on Asian Digital Libraries (ICADL 2022), LNCS Vol. 13636
- Related Report
  2022 Research-status Report
- Int'l Joint Research
[Presentation] Unsupervised Keyphrase Generation by Utilizing Masked Words Prediction and Pseudo-label BART Finetuning (2022)
- Author(s)
  Yingchao Ju and Mizuho Iwaihara
- Organizer
  Proc. 24th Int. Conf. on Asian Digital Libraries (ICADL 2022), LNCS Vol. 13636
- Related Report
  2022 Research-status Report
- Int'l Joint Research
