
Research and development of information extraction methods using word sense disambiguation and domain adaptation

Research Project

Project/Area Number 17KK0002
Research Category Fund for the Promotion of Joint International Research (Fostering Joint International Research)

Allocation Type Multi-year Fund
Research Field Intelligent informatics
Research Institution Tokyo University of Agriculture and Technology (2021-2023)
Ibaraki University (2017-2020)

Principal Investigator

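Komiya Kanako, Tokyo University of Agriculture and Technology, Graduate School of Engineering (Institute of Engineering), Associate Professor (10592339)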

Project Period (FY) 2018 – 2023
Project Status Completed (Fiscal Year 2023)
Budget Amount ¥10,140,000 (Direct Cost: ¥7,800,000, Indirect Cost: ¥2,340,000)
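Keywords problem extraction / annotation / scientific and technical papers / word sense disambiguation / BERT / "problem" (問題) / extraction / distributed representations / corpus / corpus creation / information extraction / domain adaptation / artificial intelligence / linguistics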
Outline of Final Research Achievements

We extracted statements of "problems" (in the sense of something problematic, not tasks) from Japanese scientific and technical papers. Our starting point was prior work that performed the same extraction on English scientific and technical papers. Because Japanese expresses problems in more complex ways, we established annotation rules for these forms of expression and defined them linguistically. Following these rules, we also annotated whether the problem statement referred to by the word "problem" (問題) in a sentence is contained within that same sentence, and we conducted classification experiments on this distinction using various methods.

Academic Significance and Societal Importance of the Research Achievements

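We analyzed how problem content is described in Japanese papers compared with English ones. The English study handled only problem content written in the form "The problem is X." In Japanese, by contrast, we found that besides the copular expression 「Xが問題だ」 ("X is a problem"), modifying expressions such as 「Xという問題」 ("the problem that X") are common. Based on these findings, we formulated annotation rules for problem content and built a corpus. In doing so, we analyzed the following properties and reflected them in the rules: problem content can be nested; problem content may be expressed by a whole sentence or by a single word or phrase; and the granularity of the problem content referred to varies.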

Report

(7 results)
  • 2023 Annual Research Report
  • Final Research Report (PDF)
  • 2022 Research-status Report
  • 2021 Research-status Report
  • 2020 Research-status Report
  • 2019 Research-status Report
  • 2018 Research-status Report
Research Products

(44 results)


All: Int'l Joint Research (1 result), Journal Article (6 results, of which peer reviewed: 6, open access: 6), Presentation (36 results, of which Int'l joint research: 8, invited: 2), Book (1 result)

  • [Int'l Joint Research] University of Cambridge, United Kingdom (2018)

    • Year and Date
      2018-05-01
    • Related Report
      2023 Annual Research Report
  • [Journal Article] Composing Word Embeddings for Compound Words Using Linguistic Knowledge (2023)

    • Author(s)
      Komiya Kanako, Kono Shinji, Seito Takumi, Hirabayashi Teruo
    • Journal Title

      ACM Transactions on Asian and Low-Resource Language Information Processing

      Volume: 22 Issue: 2 Pages: 1-22

    • DOI

      10.1145/3561299

    • Related Report
      2022 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Japanese Parsing Using Smaller BERT (2022)

    • Author(s)
      Shinji Kono, Kanako Komiya, Hiroyuki Shinnou
    • Journal Title

      Journal of Natural Language Processing

      Volume: 29 Issue: 3 Pages: 854-874

    • DOI

      10.5715/jnlp.29.854

    • ISSN
      1340-7619, 2185-8314
    • Related Report
      2022 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Diachronic Domain Adaptation of Word Sense Disambiguation in Corpus of Historical Japanese Using Word Embeddings (2022)

    • Author(s)
      古宮 嘉那子, 田邊 絢, 新納 浩幸
    • Journal Title

      国立国語研究所論集 = NINJAL Research Papers

      Volume: 23 Issue: 23 Pages: 59-73

    • DOI

      10.15084/00003566

    • ISSN
      2186-1358
    • URL

      https://repository.ninjal.ac.jp/records/3583

    • Related Report
      2022 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Morphological Analyzer Using the Bi-LSTM Model Only for Japanese Hiragana Sentences (2022)

    • Author(s)
      Jun Izutsu, Kanako Komiya
    • Journal Title

      International Journal on Natural Language Computing

      Volume: 11 Issue: 1 Pages: 29-45

    • DOI

      10.5121/ijnlc.2022.11103

    • Related Report
      2021 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Extracting Speech Patterns of Japanese Fictional Characters Using Subword Units (2022)

    • Author(s)
      Mika Kishino, Kanako Komiya
    • Journal Title

      International Journal on Natural Language Computing

      Volume: 11 Issue: 1 Pages: 1-14

    • DOI

      10.5121/ijnlc.2022.11101

    • Related Report
      2021 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Unsupervised All-words WSD Using Synonyms and Embeddings (2019)

    • Author(s)
      Suzuki Rui、Komiya Kanako、Asahara Masayuki、Sasaki Minoru、Shinnou Hiroyuki
    • Journal Title

      Journal of Natural Language Processing

      Volume: 26 Issue: 2 Pages: 361-379

    • DOI

      10.5715/jnlp.26.361

    • NAID

      130007706831

    • ISSN
      1340-7619, 2185-8314
    • Year and Date
      2019-06-15
    • Related Report
      2019 Research-status Report
    • Peer Reviewed / Open Access
  • [Presentation] All-Words Word Sense Disambiguation for Historical Japanese (2023)

    • Author(s)
      Shoma Asada, Kanako Komiya, and Masayuki Asahara
    • Organizer
      The 37th Pacific Asia Conference on Language, Information and Computation
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Word Segmentation of Hiragana Sentences Using Hiragana BERT (2023)

    • Author(s)
      Jun Izutsu, Kanako Komiya, and Hiroyuki Shinnou
    • Organizer
      PRICAI 2023 (LNCS)
    • Related Report
      2023 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Exhaustive Assignment of Word List by Semantic Principles Numbers to the Balanced Corpus of Contemporary Written Japanese (2023)

    • Author(s)
      浅田 宗磨, 古宮嘉那子, 浅原 正幸
    • Organizer
      The 30th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2023 Annual Research Report
  • [Presentation] Japanese Word Sense Disambiguation Using Translation and BabelNet (2023)

    • Author(s)
      Ganbat Naranbuuvei, 浅田宗磨, 古宮嘉那子
    • Organizer
      The 30th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2023 Annual Research Report
  • [Presentation] Word Sense Disambiguation: A System for Assigning Semantic Tags to Corpora (2023)

    • Author(s)
      古宮嘉那子
    • Organizer
      Lexicon and Dictionary Research Society, Autumn Meeting
    • Related Report
      2023 Annual Research Report
    • Invited
  • [Presentation] An Introduction to Natural Language Processing for Premodern Japanese (2023)

    • Author(s)
      古宮嘉那子
    • Organizer
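      The Society for Japanese Linguistics 2023 Spring Meeting, Symposium "Research on the History of Japanese Opened Up by Information Technology and Large-Scale Text Resources"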
    • Related Report
      2023 Annual Research Report
    • Invited
  • [Presentation] Word Sense Disambiguation of Corpus of Historical Japanese Using Japanese BERT Trained with Contemporary Texts (2022)

    • Author(s)
      Kanako Komiya, Nagi Oki and Masayuki Asahara
    • Organizer
      The 36th Pacific Asia Conference on Language, Information and Computation
    • Related Report
      2022 Research-status Report
    • Int'l Joint Research
  • [Presentation] Translation from Classical Japanese to Modern Japanese Using T5 (2022)

    • Author(s)
      臼井久生, 古宮嘉那子
    • Organizer
      The 29th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2022 Research-status Report
  • [Presentation] All-words WSD for the Corpus of Historical Japanese (2022)

    • Author(s)
      浅田宗磨, 古宮嘉那子
    • Organizer
      The 29th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2022 Research-status Report
  • [Presentation] Converting Utterances into the Speaking Style of a Specific Character Using T5 and Building Its Language Model (2022)

    • Author(s)
      岸野望叶, 古宮嘉那子, 新納浩幸
    • Organizer
      The 253rd Meeting of the Special Interest Group on Natural Language Processing
    • Related Report
      2022 Research-status Report
  • [Presentation] Extracting Problem Content from the Sentences Surrounding "Problem" (問題) in Scientific and Technical Papers (2022)

    • Author(s)
      平林照雄, 古宮嘉那子, 浅原正幸
    • Organizer
      Language Resources Workshop 2022
    • Related Report
      2022 Research-status Report
  • [Presentation] Morphological Analysis of Japanese Hiragana Sentences Using the Bi-LSTM CRF Model (2021)

    • Author(s)
      Jun Izutsu, Kanako Komiya
    • Organizer
      10th International Conference on Natural Language Processing (NLP 2021)
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Extraction of Linguistic Speech Patterns of Japanese Fictional Characters Using Subword Units (2021)

    • Author(s)
      Mika Kishino, Kanako Komiya
    • Organizer
      10th International Conference on Natural Language Processing (NLP 2021)
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Word Sense Disambiguation of the Corpus of Historical Japanese Using a BERT Model Trained on Contemporary Japanese (2021)

    • Author(s)
      多喜 凪, 古宮嘉那子
    • Organizer
      The 28th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2021 Research-status Report
  • [Presentation] Aligning Two Dictionaries Using BERT (2021)

    • Author(s)
      河野稜斗, 平林照雄, 古宮嘉那子
    • Organizer
      The 28th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2021 Research-status Report
  • [Presentation] Extracting Negative Opinion Sentences from Review Documents by Co-training (2021)

    • Author(s)
      三戸尚樹, 古宮嘉那子, 佐々木稔
    • Organizer
      The 28th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2021 Research-status Report
  • [Presentation] Reputation Analysis Using Key Phrases and Sentiment Scores Extracted from Reviews (2021)

    • Author(s)
      HUANG YIPU, 佐々木稔, 古宮嘉那子
    • Organizer
      The 28th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2021 Research-status Report
  • [Presentation] Composing Distributed Representations of Compound Words Using Part-of-Speech Information (2021)

    • Author(s)
      河野 慎司, 古宮嘉那子
    • Organizer
      Symposium on Spoken Language and Natural Language Processing
    • Related Report
      2020 Research-status Report
  • [Presentation] Morphological Analysis of Hiragana Sentences Using the Bi-LSTM CRF Model (2021)

    • Author(s)
      井筒順, 古宮嘉那子
    • Organizer
      The 27th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2020 Research-status Report
  • [Presentation] Sense Annotation of "Problem" (問題) in a Corpus of Japanese Papers (2021)

    • Author(s)
      平林照雄, 河野慎司, 古宮嘉那子, 新納浩幸
    • Organizer
      The 27th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2020 Research-status Report
  • [Presentation] Extracting Characteristic Words of Fictional Characters Using SentencePiece (2021)

    • Author(s)
      岸野望叶, 古宮嘉那子
    • Organizer
      The 27th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2020 Research-status Report
  • [Presentation] Effects of Keywords and Field-Specific Fine-Tuning on Title Generation from Paper Abstracts (2021)

    • Author(s)
      金野佑太, 古宮嘉那子
    • Organizer
      The 27th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2020 Research-status Report
  • [Presentation] Estimating the Readings of Heteronymous Homographs in the BCCWJ Using SVMs (2021)

    • Author(s)
      小林汰一郎, 古宮嘉那子
    • Organizer
      The 27th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2020 Research-status Report
  • [Presentation] Automatic Creation of Correspondence Table of Meaning Tags from Two Dictionaries in One Language Using Bilingual Word Embedding (2020)

    • Author(s)
      Teruo Hirabayashi, Kanako Komiya, Masayuki Asahara and Hiroyuki Shinnou
    • Organizer
      13th BUCC Workshop at LREC 2020
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] Generation and Evaluation of Concept Embeddings Via Fine-Tuning Using Automatically Tagged Corpus (2020)

    • Author(s)
      Kanako Komiya, Daiki Yaginuma, Masayuki Asahara, Hiroyuki Shinnou
    • Organizer
      PACLIC 2020
    • Related Report
      2020 Research-status Report
  • [Presentation] Composing Word Vectors for Japanese Compound Words Using Bilingual Word Embeddings (2020)

    • Author(s)
      Teruo Hirabayashi, Kanako Komiya, Masayuki Asahara, Hiroyuki Shinnou
    • Organizer
      PACLIC 2020
    • Related Report
      2020 Research-status Report
  • [Presentation] Neural Machine Translation from Historical Japanese to Contemporary Japanese Using Diachronically Domain-Adapted Word Embeddings (2020)

    • Author(s)
      Masashi Takaku, Tosho Hirasawa, Mamoru Komachi, Kanako Komiya
    • Organizer
      PACLIC 2020
    • Related Report
      2020 Research-status Report
  • [Presentation] Classification of Question-Answer Pairs on Q&A Sites Using Multiple Pretrained Models (2020)

    • Author(s)
      佐々木稔, 古宮嘉那子
    • Organizer
      IDR User Forum 2020
    • Related Report
      2020 Research-status Report
  • [Presentation] Morphological Analysis of Hiragana-Only Text with MeCab (2020)

    • Author(s)
      井筒順, 明石陸, 加藤涼, 岸野望叶, 小林汰一郎, 金野佑太, 古宮嘉那子
    • Organizer
      The 26th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2019 Research-status Report
  • [Presentation] Composing Distributed Representations of Long Units from Those of Short Units Using Multi-Task Learning (2020)

    • Author(s)
      河野慎司, 古宮嘉那子
    • Organizer
      The 26th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2019 Research-status Report
  • [Presentation] Neural Machine Translation from Classical to Modern Japanese Using Diachronically Domain-Adapted Word Embeddings (2020)

    • Author(s)
      高久雅史, 平澤寅庄, 小町守, 古宮嘉那子
    • Organizer
      The 26th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2019 Research-status Report
  • [Presentation] Alignment of Short Units and Long Units Using Bilingual Word Embeddings (2020)

    • Author(s)
      平林照雄, 古宮嘉那子, 新納浩幸
    • Organizer
      The 26th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2019 Research-status Report
  • [Presentation] Composing Word Vectors for Japanese Compound Words Using Dependency Relations (2019)

    • Author(s)
      Kanako Komiya, Takumi Seitou, Minoru Sasaki, Hiroyuki Shinnou
    • Organizer
      CICLing 2019
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Constructing Distributed Representations of Word Senses in the Word List by Semantic Principles Using All-words WSD and Fine-Tuning (2019)

    • Author(s)
      柳沼 大輝, 古宮 嘉那子, 新納 浩幸
    • Organizer
      Language Resources Utilization Workshop 2019
    • Related Report
      2019 Research-status Report
  • [Presentation] Classification of Question-Answer Pairs on Q&A Sites Under Different Word Segmentations (2019)

    • Author(s)
      佐々木稔, 古宮嘉那子
    • Organizer
      IDR User Forum 2019
    • Related Report
      2019 Research-status Report
  • [Presentation] All-words Word Sense Disambiguation in Japanese (2019)

    • Author(s)
      Kanako Komiya
    • Organizer
      6th Annual Oxbridge Women in Computer Science Conference 2019
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Book] 機械学習教本 ("Machine Learning Textbook") (2019)

    • Author(s)
      柴原 一友, 築地 毅, 古宮 嘉那子, 宮武孝尚, 小谷 善行
    • Total Pages
      240
    • Publisher
      Morikita Publishing (森北出版)
    • ISBN
      9784627854512
    • Related Report
      2019 Research-status Report


Published: 2018-01-25   Modified: 2025-01-30  


Powered by NII kakenhi