Structural extension of knowledge graph utilizing temporal and semantic analysis of social media

Research Project

Project/Area Number	19K11983
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 60080: Database-related
Research Institution	Waseda University
Principal Investigator	Iwaihara Mizuho, Waseda University, Faculty of Science and Engineering (Graduate School of Information, Production and Systems / Center), Professor (40253538)
Project Period (FY)	2019-04-01 – 2022-03-31
Project Status	Completed (Fiscal Year 2021)
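Budget Amount	¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000); Fiscal Year 2021: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000); Fiscal Year 2020: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000); Fiscal Year 2019: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
Keywords	Data mining / Text mining / Information extraction / Knowledge graph / Time series analysis / Social media / Knowledge processing / Sentiment analysis / Knowledge extraction / Time series data
Outline of Research at the Start	Wikipedia, a representative knowledge-accumulating social media platform, contains structured data that is easy for computers to use, such as link relations and categories. Knowledge graphs, in which article entries are nodes and relations between entries are edges, have therefore been extracted from it and are widely used for search, classification, and diverse natural language processing tasks. This research addresses the following topics: (1) structural extension of knowledge graphs, comprising (1-a) link prediction between articles and prediction of article splitting and merging, and (1-b) the element membership problem for Wikipedia lists and the table schema generation problem; (2) extraction of characteristic phrases from edit histories; and (3) aggregated representation of sentiment in social media.
Outline of Final Research Achievements	Wikipedia is known as the largest knowledge-collecting social media platform, and knowledge graphs are extracted from it as machine-readable structured knowledge models. Knowledge graphs are used for search result enrichment and various natural language processing tasks. Developing high-quality knowledge graphs from Wikipedia requires utilizing structured data such as lists and categories. In this research, we developed new methods for predicting pairs of Wikipedia articles that should be merged and pairs that should be linked. For extracting keyphrases from article texts, we developed a method utilizing pretrained language models, improving on previously reported results for this task. We also proposed new methods for authorship attribution on tweets that utilize text sentiment.
Academic Significance and Societal Importance of the Research Achievements	Extracting useful information from the web includes the basic step of organizing and classifying the enormous volume of data generated every day. Many methods have traditionally been proposed for text classification, but new forms of text raise new challenges, such as the hierarchical structure of Wikipedia articles, hashtags in tweets, and the temporal aspects of both. Meanwhile, deep-learning-based methods known as pretrained models are transforming conventional approaches. In this research, we studied a broad range of problems, including keyphrase extraction, link prediction, and hierarchical classification, as well as applications of knowledge graphs, and achieved performance exceeding that of prior methods on several problems.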

Report

(4 results)

2021 Annual Research Report
Final Research Report (PDF)
2020 Research-status Report
2019 Research-status Report

Research Products
(14 results)


Journal Article (1 result) (of which Peer Reviewed: 1 result, Open Access: 1 result) Presentation (13 results) (of which Int'l Joint Research: 10 results)

[Journal Article] Detection of Mergeable Wikipedia Articles Utilizing Multiple Similarity Measures (2020)
- Author(s)
  Renzhi Wang, Mizuho Iwaihara
- Journal Title
  Journal of Information Processing
  Volume: 28 Issue: 0 Pages: 178-191
- DOI
  10.2197/ipsjjip.28.178
- NAID
  130007798617
- ISSN
  1882-6652
- Related Report
  2019 Research-status Report
- Peer Reviewed / Open Access
[Presentation] Keyphrase Generation by Utilizing BART Finetuning and BERT-Based Ranking (2022)
- Author(s)
  A Diya, Mizuho Iwaihara
- Organizer
  Forum on Data Engineering and Information Management (DEIM Forum) G24-3, Online
- Related Report
  2021 Annual Research Report
[Presentation] Annotating Column Type Utilizing BERT and Knowledge Graph Over Wikipedia Categories and Lists (2022)
- Author(s)
  Qin Jiaxin, Mizuho Iwaihara
- Organizer
  Forum on Data Engineering and Information Management (DEIM Forum) G33-1, Online
- Related Report
  2021 Annual Research Report
[Presentation] Column Type Detection Based on Pretrained Language Models with Various Column Encodings (2022)
- Author(s)
  Li Peining, Mizuho Iwaihara
- Organizer
  Forum on Data Engineering and Information Management (DEIM Forum) E43-3, Online
- Related Report
  2021 Annual Research Report
[Presentation] Integrating RoBERTa Fine-Tuning and User Writing Styles for Authorship Attribution of Short Texts (2021)
- Author(s)
  Xiangyu Wang, Mizuho Iwaihara
- Organizer
  Proc. 5th APWeb-WAIM Joint Conference on Web and Big Data (APWeb-WAIM 2021), LNCS 12858
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Link Prediction for Wikipedia Articles based on Temporal Article Embedding (2021)
- Author(s)
  Jiaji Ma, Mizuho Iwaihara
- Organizer
  Proc. 13th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1: KDIR
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Integrating Semantic Space Finetuning and Self-training for Semi-supervised Multi-label Text Classification (2021)
- Author(s)
  Zhewei Xu, Mizuho Iwaihara
- Organizer
  Proc. 23rd Int. Conf. Asia-Pacific Digital Libraries (ICADL21), online, LNCS Vol. 13133
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Supervised Learning of Keyphrase Extraction Utilizing Prior Summarization (2021)
- Author(s)
  Tingyi Liu, Mizuho Iwaihara
- Organizer
  Proc. 23rd Int. Conf. Asia-Pacific Digital Libraries (ICADL21), online, LNCS Vol. 13133
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] Contribution of Improved Character Embedding and Latent Posting Styles to Authorship Attribution of Short Texts (2020)
- Author(s)
  Wenjing Huang, Rui Su and Mizuho Iwaihara
- Organizer
  4th APWeb-WAIM Joint Conference on Web and Big Data (APWeb-WAIM 2020)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Utilizing BERT Pretrained Models with Various Fine-tune Methods for Subjectivity Detection (2020)
- Author(s)
  Hairong Huo and Mizuho Iwaihara
- Organizer
  4th APWeb-WAIM Joint Conference on Web and Big Data (APWeb-WAIM 2020)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Detection of Editing Bursts and Extraction of Significant Keyphrases from Wikipedia Edit History, In Big Data Analyses, Services, and Smart Data, Advances in Intelligent Systems and Computing book series (2020)
- Author(s)
  Zihang Chen, Mizuho Iwaihara
- Organizer
  Advances in Intelligent Systems and Computing book series
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Weakly-Supervised Neural Categorization of Wikipedia Articles (2019)
- Author(s)
  Xingyu Chen, Mizuho Iwaihara
- Organizer
  Proc. ICADL2019, LNCS11853, pp. 16-22, Nov. 2019.
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Presentation] Two-Encoder Pointer-Generator Network for Summarizing Segments of Long Articles (2019)
- Author(s)
  Junhao Li and Mizuho Iwaihara
- Organizer
  The Asia Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint Conf. Web and Big Data (APWeb-WAIM 2019), LNCS 11641, pp. 299-313, Chengdu
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Presentation] Utilizing Latent Posting Style for Authorship Attribution on Short Texts (2019)
- Author(s)
  Patamawadee Leepaisomboon, Mizuho Iwaihara
- Organizer
  Proc. IEEE Int. Conf. Cloud and Big Data Computing (CBDCom 2019), pp.1015-1022, Fukuoka
- Related Report
  2019 Research-status Report
- Int'l Joint Research
