Study on Improving Performance of Natural Language Processing by Integrating Collocation Extraction and Deep Learning

Research Project

Project/Area Number	19K20333
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 61030:Intelligent informatics-related
Research Institution	University of Tsukuba
Principal Investigator	Wakabayashi Kei, University of Tsukuba, Faculty of Library, Information and Media Science, Associate Professor (40631908)
Project Period (FY)	2019-04-01 – 2023-03-31
Project Status	Completed (Fiscal Year 2022)
Budget Amount	¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000); Fiscal Year 2021: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000); Fiscal Year 2020: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000); Fiscal Year 2019: ¥1,950,000 (Direct Cost: ¥1,500,000, Indirect Cost: ¥450,000)
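Keywords	collocation extraction / deep learning / active learning / hidden Markov model / document summarization / dialog system / topic model / crowdsourcing / natural language instruction / natural language processing / distantly supervised learning / collocation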
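Outline of Research at the Start	A major cause common to many errors in natural language processing applications, such as mistranslations in machine translation, off-target responses in dialog systems, and misread question intent in question answering, is the failure to account for collocations: sequences of multiple words that together carry a specific meaning. In this research, we develop and integrate two lines of work: collocation extraction methods, which extract word sequences from text that are likely to be meaningful, and collocation-aware deep learning methods, which use the extracted collocations to improve accuracy. The goal is to build a foundational technology that improves the accuracy of a wide range of natural language processing applications.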
Outline of Final Research Achievements	In this study, we addressed three research questions. (A) To extract meaningful collocations from text more accurately, we proposed a method for training collocation extraction models that makes efficient use of linguistic resources and human annotators, and we advanced the basic theory of the statistical models used for collocation extraction. (B) We proposed a deep learning method that uses the extracted collocations to improve the accuracy of natural language processing applications, namely document summarization, language understanding in dialog systems, and topic modeling. (C) We proposed a method that, while a deep learning model is being trained for a downstream natural language processing task, dynamically extracts the collocations that contribute to that task's accuracy.
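Academic Significance and Societal Importance of the Research Achievements	Accounting for collocations, in which multiple words together carry a specific meaning, is an important challenge for improving the accuracy of many natural language processing applications. However, how the properties of a collocation extraction method affect the deep learning methods that learn downstream natural language processing tasks had not previously been clarified. The significance of this research lies in presenting a methodology that links collocation extraction to accuracy improvements in deep-learning-based natural language processing tasks, and in clarifying its effect. In particular, we believe the results are directly applicable to topic modeling, which presents collocations explicitly as analysis results, and to language understanding tasks in dialog systems.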

Report

(5 results)

2022 Annual Research Report
Final Research Report (PDF)
2021 Research-status Report
2020 Research-status Report
2019 Research-status Report

Research Products
(30 results)

Journal Article (18 results; of which Peer Reviewed: 18 results, Open Access: 11 results), Presentation (12 results)

[Journal Article] Keyphrase-based Refinement Functions for Efficient Improvement on Document-Topic Association in Human-in-the-Loop Topic Models (2023)
- Author(s)
  Muhammad Haseeb UR Rehman Khan, Kei Wakabayashi
- Journal Title
  IPSJ Transactions on Databases (TOD)
  Volume: 16
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Robust Slot Filling Modeling for Incomplete Annotations using Segmentation-Based Formulation (2022)
- Author(s)
  Wakabayashi Kei, Takeuchi Johane, Nakano Mikio
- Journal Title
  Transactions of the Japanese Society for Artificial Intelligence
  Volume: 37 Issue: 3 Pages: IDS-E_1-12
- DOI
  10.1527/tjsai.37-3_IDS-E
- ISSN
  1346-0714, 1346-8030
- Year and Date
  2022-05-01
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Effect of Label Redundancy in Crowdsourcing for Training Machine Learning Models (2022)
- Author(s)
  Shimizu Ayame, Wakabayashi Kei
- Journal Title
  Journal of Data Intelligence
  Volume: 3 Issue: 3 Pages: 301-315
- DOI
  10.26421/jdi3.3-1
- Related Report
  2022 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Effect of Label Redundancy in Crowdsourcing for Training Machine Learning Models (2022)
- Author(s)
  Ayame Shimizu, Kei Wakabayashi
- Journal Title
  Journal of Data Intelligence
  Volume: 3
- Related Report
  2021 Research-status Report
- Peer Reviewed
[Journal Article] Robust Slot Filling Modeling for Incomplete Annotations using Segmentation-Based Formulation (2022)
- Author(s)
  Kei Wakabayashi, Johane Takeuchi, Mikio Nakano
- Journal Title
  Transactions of the Japanese Society for Artificial Intelligence
  Volume: 37
- Related Report
  2021 Research-status Report
- Peer Reviewed
[Journal Article] Efficient Training Method for Phrase Extraction Models using Natural Language Explanations (2021)
- Author(s)
  Ryosuke Saito, Koga Kobayashi, Kei Wakabayashi
- Journal Title
  Proceedings of the 23rd International Conference on Information Integration and Web Intelligence
  Volume: - Pages: 288-295
- DOI
  10.1145/3487664.3487703
- Related Report
  2021 Research-status Report
- Peer Reviewed
[Journal Article] Active Learning for Extracting Technical Terms Covering Multiword Phrases (2021)
- Author(s)
  Fumimaro Odakura, Koga Kobayashi, Kei Wakabayashi
- Journal Title
  Proceedings of the 23rd International Conference on Information Integration and Web Intelligence
  Volume: - Pages: 311-318
- DOI
  10.1145/3487664.3487706
- Related Report
  2021 Research-status Report
- Peer Reviewed
[Journal Article] Examining Effect of Label Redundancy for Machine Learning using Crowdsourcing (2021)
- Author(s)
  Ayame Shimizu, Kei Wakabayashi
- Journal Title
  Proceedings of the 23rd International Conference on Information Integration and Web Intelligence
  Volume: - Pages: 87-94
- DOI
  10.1145/3487664.3487677
- Related Report
  2021 Research-status Report
- Peer Reviewed
[Journal Article] Segmentation-Based Formulation of Slot Filling Task for Better Generative Modeling (2021)
- Author(s)
  Kei Wakabayashi, Johane Takeuchi, Mikio Nakano
- Journal Title
  Proceedings of the 12th International Workshop on Spoken Dialog System Technology
  Volume: -
- Related Report
  2021 Research-status Report
- Peer Reviewed
[Journal Article] Drifting and Popularity: A Study of Time Series Analysis of Topics (2021)
- Author(s)
  Muhammad Haseeb UR Rehman Khan, Kei Wakabayashi
- Journal Title
  Proceedings of the Seventh International Conference on Big Data, Small Data, Linked Data and Open Data
  Volume: - Pages: 16-22
- Related Report
  2021 Research-status Report 2020 Research-status Report
- Peer Reviewed
[Journal Article] Partial Annotation Scheme for Active Learning on Named Entity Recognition Tasks (2020)
- Author(s)
  Koga Kobayashi, Kei Wakabayashi
- Journal Title
  Journal of Data Intelligence
  Volume: 1 Issue: 3 Pages: 319-332
- DOI
  10.26421/jdi1.3-2
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Batch Prioritization of Data Labeling Tasks for Training Classifiers (2020)
- Author(s)
  Masanori Kimura, Kei Wakabayashi, Atsuyuki Morishima
- Journal Title
  Proceedings of the 8th AAAI Conference on Human Computation and Crowdsourcing
  Volume: - Pages: 163-167
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Effect of Semantic Content Generalization on Pointer Generator Network in Text Summarization (2020)
- Author(s)
  Yixuan Wu, Kei Wakabayashi
- Journal Title
  Proceedings of the 22nd International Conference on Information Integration and Web-based Applications & Services
  Volume: - Pages: 72-76
- DOI
  10.1145/3428757.3429118
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Mitigating Effect of Dictionary Matching Errors in Distantly Supervised Named Entity Recognition (2020)
- Author(s)
  Koga Kobayashi, Kei Wakabayashi
- Journal Title
  Proceedings of the 22nd International Conference on Information Integration and Web-based Applications & Services
  Volume: - Pages: 111-114
- DOI
  10.1145/3428757.3429142
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Silent HMMs: Generalized Representation of Hidden Semi-Markov Models and Hierarchical HMMs (2019)
- Author(s)
  Kei Wakabayashi
- Journal Title
  Proceedings of the 14th International Conference on Finite State Methods and Natural Language Processing
  Volume: - Pages: 98-107
- DOI
  10.18653/v1/w19-3113
- Related Report
  2019 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Named entity recognition using point prediction and active learning (2019)
- Author(s)
  Koga Kobayashi, Kei Wakabayashi
- Journal Title
  Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services
  Volume: - Pages: 287-295
- DOI
  10.1145/3366030.3366072
- Related Report
  2019 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Events Insights Extraction from Twitter Using LDA and Day-Hashtag Pooling (2019)
- Author(s)
  Muhammad Haseeb Ur Rehman Khan, Kei Wakabayashi, Satoshi Fukuyama
- Journal Title
  Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services
  Volume: - Pages: 240-244
- DOI
  10.1145/3366030.3366090
- Related Report
  2019 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Estimation Method of L2 Learners' Second Language Ability by using Features in Conversation (2019)
- Author(s)
  Xinnan Chen, Muhammad Haseeb Ur Rehman Khan, Kei Wakabayashi
- Journal Title
  Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services
  Volume: - Pages: 142-150
- DOI
  10.1145/3366030.3366037
- Related Report
  2019 Research-status Report
- Peer Reviewed / Open Access
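[Presentation] Effectiveness of Joint Attention in Deep Learning for Generating Language That Describes Actions (2023)
- Author(s)
  小田倉史麿, 若林啓
- Organizer
  The 37th Annual Conference of the Japanese Society for Artificial Intelligence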
- Related Report
  2022 Annual Research Report
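[Presentation] Optimizing Multi-level Word Segmentation to Improve Downstream Task Accuracy (2022)
- Author(s)
  小田倉史麿, 若林啓
- Organizer
  The 36th Annual Conference of the Japanese Society for Artificial Intelligence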
- Related Report
  2022 Annual Research Report 2021 Research-status Report
[Presentation] Topic Modeling using Jointly Fine-tuned BERT for Phrases and Sentences (2022)
- Author(s)
  Zikai Zhou, Kei Wakabayashi
- Organizer
  The 14th Forum on Data Engineering and Information Management
- Related Report
  2021 Research-status Report
[Presentation] A Study on Training Phrase Extractors with Natural Language Instructions (2021)
- Author(s)
  齊藤亮将, 小林滉河, 若林啓
- Organizer
  The 13th Forum on Data Engineering and Information Management
- Related Report
  2020 Research-status Report
[Presentation] Technical Term Extraction Covering Compound Words Using Active Learning (2021)
- Author(s)
  小田倉史麿, 小林滉河, 若林啓
- Organizer
  The 13th Forum on Data Engineering and Information Management
- Related Report
  2020 Research-status Report
[Presentation] Examining the Optimal Redundancy in Building Training Datasets via Crowdsourcing (2021)
- Author(s)
  清水綾女, 若林啓
- Organizer
  The 13th Forum on Data Engineering and Information Management
- Related Report
  2020 Research-status Report
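[Presentation] A Method for Selecting Training Data from the Source Domain for Zero-shot Document Classification (2021)
- Author(s)
  大畑直輝, 白井匡人, 若林啓, 劉健全
- Organizer
  The 13th Forum on Data Engineering and Information Management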
- Related Report
  2020 Research-status Report
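[Presentation] Accounting for Dictionary Matching Errors in Distantly Supervised Named Entity Recognition (2020)
- Author(s)
  小林滉河, 若林啓
- Organizer
  The 26th Annual Meeting of the Association for Natural Language Processing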
- Related Report
  2019 Research-status Report
[Presentation] Effect of Semantic Content Generalization on Pointer Generator Network in Text Summarization (2020)
- Author(s)
  Wu Yixuan, 若林啓
- Organizer
  The 26th Annual Meeting of the Association for Natural Language Processing
- Related Report
  2019 Research-status Report
[Presentation] Examining the Efficiency of Optimal Solution Search in Variational Bayes (2020)
- Author(s)
  岡威久馬, 若林啓
- Organizer
  The 12th Forum on Data Engineering and Information Management
- Related Report
  2019 Research-status Report
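[Presentation] A Proposal of a Document Selection Method for Transfer Learning in Document Summarization (2020)
- Author(s)
  白井匡人, 若林啓
- Organizer
  The 12th Forum on Data Engineering and Information Management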
- Related Report
  2019 Research-status Report
[Presentation] Topic Visualization Using Retweet Information of Twitter Users (2020)
- Author(s)
  清水綾女, 若林啓, 佐藤哲司
- Organizer
  The 12th Forum on Data Engineering and Information Management
- Related Report
  2019 Research-status Report
