
Transfer Learning of Word Sense Disambiguation with Corpora Tagged with Multiple Tag Sets

Research Project

Project/Area Number 18K11421
Research Category Grant-in-Aid for Scientific Research (C)

Allocation Type Multi-year Fund
Section General
Review Section Basic Section 61030: Intelligent informatics-related
Research Institution Tokyo University of Agriculture and Technology (2021-2022)
Ibaraki University (2018-2020)

Principal Investigator

Komiya Kanako, Tokyo University of Agriculture and Technology, Graduate School of Engineering, Associate Professor (10592339)

Project Period (FY) 2018-04-01 – 2023-03-31
Project Status Completed (Fiscal Year 2022)
Budget Amount
¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000)
Fiscal Year 2020: ¥1,690,000 (Direct Cost: ¥1,300,000, Indirect Cost: ¥390,000)
Fiscal Year 2019: ¥1,820,000 (Direct Cost: ¥1,400,000, Indirect Cost: ¥420,000)
Fiscal Year 2018: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000)
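Keywords word sense disambiguation / distributed representations / mapping / dictionary / word delimitation / compound words / classical Japanese / BERT / corpus / unsupervised / pseudo data / bilingual distributed representations / word / Fine Tuning / word sense / transfer learning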
Outline of Final Research Achievements

We conducted research on word sense disambiguation using corpora tagged with multiple word sense tag sets.
First, we mapped the word senses of two dictionaries to each other using bilingual word embeddings and BERT. We also studied word sense disambiguation of historical texts annotated with two kinds of tags, contemporary and historical. Furthermore, because differences between tag sets sometimes stem from differences in word delimitation, we composed distributed representations of compound words from those of their constituent words, using bilingual distributed representations and multi-task learning with neural networks. Finally, we conducted a related study on word segmentation of text written only in hiragana.
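As a rough illustration of the dictionary-mapping step, the Python sketch below assumes that every sense in both dictionaries has already been assigned a vector in one shared embedding space (for example via bilingual word embeddings or BERT), and then links each sense of dictionary A to its most similar sense in dictionary B by cosine similarity. The sense IDs, the toy vectors, and the functions `cosine` and `align_senses` are hypothetical; this is a minimal nearest-neighbour baseline, not the method used in the project's publications.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical input format: sense ID -> embedding vector. Both dictionaries'
# vectors are ASSUMED to already share one vector space (e.g. after a
# bilingual-embedding mapping); producing them is out of scope here.
SenseVectors = Dict[str, List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align_senses(
    senses_a: SenseVectors, senses_b: SenseVectors
) -> Dict[str, Tuple[Optional[str], float]]:
    """Link every sense of dictionary A to its nearest sense of dictionary B.

    Returns {sense_a: (best_sense_b, similarity)}. If dictionary B is empty,
    each entry is (None, -inf). Several A senses may map to the same B sense.
    """
    table: Dict[str, Tuple[Optional[str], float]] = {}
    for id_a, vec_a in senses_a.items():
        best_id: Optional[str] = None
        best_score = -math.inf
        for id_b, vec_b in senses_b.items():
            score = cosine(vec_a, vec_b)
            if score > best_score:
                best_id, best_score = id_b, score
        table[id_a] = (best_id, best_score)
    return table


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real sense embeddings (hypothetical).
    dict_a = {"A:sense-1": [1.0, 0.1], "A:sense-2": [0.0, 1.0]}
    dict_b = {"B:sense-x": [0.9, 0.0], "B:sense-y": [0.1, 0.95]}
    for sense, (match, sim) in align_senses(dict_a, dict_b).items():
        print(f"{sense} -> {match} (cosine {sim:.3f})")
```

Because each sense is matched independently, this baseline can assign several senses of one dictionary to a single sense of the other; a one-to-one constraint or a similarity threshold would be natural refinements.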

Academic Significance and Societal Importance of the Research Achievements

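When we applied for this KAKENHI grant, pre-trained models such as BERT did not yet exist, so we planned to study word sense disambiguation using corpora annotated with several different tag sets. However, we judged that the advent of BERT had reduced the value of word sense disambiguation as a preprocessing step for downstream tasks such as translation, and we therefore decided to study the mapping between dictionaries instead. We also noticed that pre-trained models come with their own tokenizer and a limited vocabulary, so their word boundaries can differ from those of other resources; this led us to study the composition of distributed representations for compound words and the word segmentation of hiragana text. Finally, because knowing word senses is meaningful from linguistic perspectives such as the study of classical Japanese, we studied word sense disambiguation of classical Japanese texts.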

Report

(6 results)
  • 2022 Annual Research Report; Final Research Report (PDF)
  • 2021 Research-status Report
  • 2020 Research-status Report
  • 2019 Research-status Report
  • 2018 Research-status Report
Research Products

(39 results)


Journal Article: 6 results (peer reviewed: 6, open access: 6) / Presentation: 32 results (international joint research: 10) / Book: 1 result

  • [Journal Article] Composing Word Embeddings for Compound Words Using Linguistic Knowledge (2023)

    • Author(s)
      Komiya Kanako, Kono Shinji, Seito Takumi, Hirabayashi Teruo
    • Journal Title

      ACM Transactions on Asian and Low-Resource Language Information Processing

      Volume: 22 Issue: 2 Pages: 1-22

    • DOI

      10.1145/3561299

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Japanese Parsing Using Smaller BERT (2022)

    • Author(s)
      Shinji Kono、Komiya Kanako、Hiroyuki Shinnou
    • Journal Title

      Journal of Natural Language Processing

      Volume: 29 Issue: 3 Pages: 854-874

    • DOI

      10.5715/jnlp.29.854

    • ISSN
      1340-7619, 2185-8314
    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Diachronic Domain Adaptation of Word Sense Disambiguation in Corpus of Historical Japanese Using Word Embeddings (2022)

    • Author(s)
      古宮 嘉那子, 田邊 絢, 新納 浩幸
    • Journal Title

      国立国語研究所論集 = NINJAL Research Papers

      Volume: 23 Issue: 23 Pages: 59-73

    • DOI

      10.15084/00003566

    • ISSN
      2186-1358
    • URL

      https://repository.ninjal.ac.jp/records/3583

    • Related Report
      2022 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Morphological Analyzer Using the Bi-LSTM Model Only for Japanese Hiragana Sentences (2022)

    • Author(s)
      Jun Izutsu, Kanako Komiya
    • Journal Title

      International Journal on Natural Language Computing

      Volume: 11 Issue: 1 Pages: 29-45

    • DOI

      10.5121/ijnlc.2022.11103

    • Related Report
      2021 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Extracting Speech Patterns of Japanese Fictional Characters Using Subword Units (2022)

    • Author(s)
      Mika Kishino, Kanako Komiya
    • Journal Title

      International Journal on Natural Language Computing

      Volume: 11 Issue: 1 Pages: 1-14

    • DOI

      10.5121/ijnlc.2022.11101

    • Related Report
      2021 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Unsupervised All-words WSD Using Synonyms and Embeddings (2019)

    • Author(s)
      Suzuki Rui, Komiya Kanako, Asahara Masayuki, Sasaki Minoru, Shinnou Hiroyuki
    • Journal Title

      Journal of Natural Language Processing

      Volume: 26 Issue: 2 Pages: 361-379

    • DOI

      10.5715/jnlp.26.361

    • NAID

      130007706831

    • ISSN
      1340-7619, 2185-8314
    • Year and Date
      2019-06-15
    • Related Report
      2019 Research-status Report
    • Peer Reviewed / Open Access
  • [Presentation] Word Sense Disambiguation of Corpus of Historical Japanese Using Japanese BERT Trained with Contemporary Texts (2022)

    • Author(s)
      Kanako Komiya, Nagi Oki and Masayuki Asahara
    • Organizer
      The 36th Pacific Asia Conference on Language, Information and Computation
    • Related Report
      2022 Annual Research Report
    • Int'l Joint Research
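  • [Presentation] All-words WSD for the Corpus of Historical Japanese (2022, in Japanese)

    • Author(s)
      臼井久生, 古宮嘉那子
    • Organizer
      The 29th Annual Meeting of the Association for Natural Language Processing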
    • Related Report
      2022 Annual Research Report
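  • [Presentation] Segmentation of Hiragana Sentences Using Hiragana BERT (2022, in Japanese)

    • Author(s)
      井筒順, 古宮嘉那子, 新納浩幸
    • Organizer
      The 29th Annual Meeting of the Association for Natural Language Processing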
    • Related Report
      2022 Annual Research Report
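  • [Presentation] Segmentation of Hiragana Sentences with Hiragana BERT (2022, in Japanese)

    • Author(s)
      井筒順, 古宮嘉那子, 新納浩幸
    • Organizer
      The 253rd Meeting of the Special Interest Group on Natural Language Processing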
    • Related Report
      2022 Annual Research Report
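  • [Presentation] Extracting Problem Content from Sentences Surrounding "Mondai" (Problem) in Scientific and Technical Papers (2022, in Japanese)

    • Author(s)
      平林照雄, 古宮嘉那子, 浅原正幸
    • Organizer
      Language Resources Workshop 2022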
    • Related Report
      2022 Annual Research Report
  • [Presentation] Morphological Analysis of Japanese Hiragana Sentences Using the Bi-LSTM CRF Model (2021)

    • Author(s)
      Jun Izutsu, Kanako Komiya
    • Organizer
      10th International Conference on Natural Language Processing (NLP 2021)
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
  • [Presentation] Extraction of Linguistic Speech Patterns of Japanese Fictional Characters Using Subword Units (2021)

    • Author(s)
      Mika Kishino, Kanako Komiya
    • Organizer
      10th International Conference on Natural Language Processing (NLP 2021)
    • Related Report
      2021 Research-status Report
    • Int'l Joint Research
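  • [Presentation] Word Sense Disambiguation of the Corpus of Historical Japanese Using a BERT Model Trained on Contemporary Texts (2021, in Japanese)

    • Author(s)
      多喜 凪, 古宮嘉那子
    • Organizer
      The 28th Annual Meeting of the Association for Natural Language Processing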
    • Related Report
      2021 Research-status Report
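  • [Presentation] Mapping Between Two Dictionaries Using BERT (2021, in Japanese)

    • Author(s)
      河野稜斗, 平林照雄, 古宮嘉那子
    • Organizer
      The 28th Annual Meeting of the Association for Natural Language Processing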
    • Related Report
      2021 Research-status Report
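  • [Presentation] Extracting Negative Opinion Sentences from Review Documents by Co-training (2021, in Japanese)

    • Author(s)
      三戸尚樹, 古宮嘉那子, 佐々木稔
    • Organizer
      The 28th Annual Meeting of the Association for Natural Language Processing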
    • Related Report
      2021 Research-status Report
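  • [Presentation] Reputation Analysis Using Key Phrases and Sentiment Scores Extracted from Reviews (2021, in Japanese)

    • Author(s)
      HUANG YIPU, 佐々木稔, 古宮嘉那子
    • Organizer
      The 28th Annual Meeting of the Association for Natural Language Processing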
    • Related Report
      2021 Research-status Report
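  • [Presentation] Morphological Analysis of Hiragana Sentences Using the Bi-LSTM CRF Model (2021, in Japanese)

    • Author(s)
      井筒順, 古宮嘉那子
    • Organizer
      The 27th Annual Meeting of the Association for Natural Language Processing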
    • Related Report
      2020 Research-status Report
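  • [Presentation] Word Sense Annotation of "Mondai" (Problem) in a Corpus of Japanese Papers (2021, in Japanese)

    • Author(s)
      平林照雄, 河野慎司, 古宮嘉那子, 新納浩幸
    • Organizer
      The 27th Annual Meeting of the Association for Natural Language Processing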
    • Related Report
      2020 Research-status Report
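  • [Presentation] Extracting Characteristic Words of Fictional Characters Using SentencePiece (2021, in Japanese)

    • Author(s)
      岸野望叶, 古宮嘉那子
    • Organizer
      The 27th Annual Meeting of the Association for Natural Language Processing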
    • Related Report
      2020 Research-status Report
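  • [Presentation] Effects of Keywords and Field-Specific Fine-Tuning on Title Generation from Paper Abstracts (2021, in Japanese)

    • Author(s)
      金野佑太, 古宮嘉那子
    • Organizer
      The 27th Annual Meeting of the Association for Natural Language Processing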
    • Related Report
      2020 Research-status Report
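  • [Presentation] Estimating the Readings of Heteronyms in BCCWJ Using SVMs (2021, in Japanese)

    • Author(s)
      小林汰一郎, 古宮嘉那子
    • Organizer
      The 27th Annual Meeting of the Association for Natural Language Processing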
    • Related Report
      2020 Research-status Report
  • [Presentation] Automatic Creation of Correspondence Table of Meaning Tags from Two Dictionaries in One Language Using Bilingual Word Embedding (2020)

    • Author(s)
      Teruo Hirabayashi, Kanako Komiya, Masayuki Asahara and Hiroyuki Shinnou
    • Organizer
      13th BUCC Workshop at LREC 2020
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] Generation and Evaluation of Concept Embeddings Via Fine-Tuning Using Automatically Tagged Corpus (2020)

    • Author(s)
      Kanako Komiya, Daiki Yaginuma, Masayuki Asahara, Hiroyuki Shinnou
    • Organizer
      PACLIC 2020
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] Composing Word Vectors for Japanese Compound Words Using Dependency Relations (2020)

    • Author(s)
      Teruo Hirabayashi, Kanako Komiya, Masayuki Asahara
    • Organizer
      PACLIC 2020
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] Neural Machine Translation from Historical Japanese to Contemporary Japanese Using Diachronically Domain-Adapted Word Embeddings (2020)

    • Author(s)
      Masashi Takaku, Tosho Hirasawa, Mamoru Komachi, Kanako Komiya
    • Organizer
      PACLIC 2020
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] Classification of Question-Answer Pairs from a Q&A Site Using Multiple Pre-trained Models (2020, in Japanese)

    • Author(s)
      佐々木稔, 古宮嘉那子
    • Organizer
      IDR User Forum 2020
    • Related Report
      2020 Research-status Report
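  • [Presentation] Composing Distributed Representations of Compound Words Using Part-of-Speech Information (2020, in Japanese)

    • Author(s)
      河野 慎司, 古宮嘉那子
    • Organizer
      Symposium on Spoken Language and Natural Language Processing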
    • Related Report
      2020 Research-status Report
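  • [Presentation] Morphological Analysis of Hiragana-Only Text with MeCab (2020, in Japanese)

    • Author(s)
      井筒順, 明石陸, 加藤涼, 岸野望叶, 小林汰一郎, 金野佑太, 古宮嘉那子
    • Organizer
      The 26th Annual Meeting of the Association for Natural Language Processing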
    • Related Report
      2019 Research-status Report
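  • [Presentation] Composing Distributed Representations of Long Units from Those of Short Units Using Multi-task Learning (2020, in Japanese)

    • Author(s)
      河野慎司, 古宮嘉那子
    • Organizer
      The 26th Annual Meeting of the Association for Natural Language Processing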
    • Related Report
      2019 Research-status Report
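  • [Presentation] Neural Machine Translation from Classical to Contemporary Japanese Using Diachronically Domain-Adapted Word Embeddings (2020, in Japanese)

    • Author(s)
      高久雅史, 平澤寅庄, 小町守, 古宮嘉那子
    • Organizer
      The 26th Annual Meeting of the Association for Natural Language Processing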
    • Related Report
      2019 Research-status Report
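  • [Presentation] Alignment of Short Units and Long Units Using Bilingual Word Embeddings (2020, in Japanese)

    • Author(s)
      平林照雄, 古宮嘉那子, 新納浩幸
    • Organizer
      The 26th Annual Meeting of the Association for Natural Language Processing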
    • Related Report
      2019 Research-status Report
  • [Presentation] Composing Word Vectors for Japanese Compound Words Using Dependency Relations (2019)

    • Author(s)
      Kanako Komiya, Takumi Seitou, Minoru Sasaki, Hiroyuki Shinnou
    • Organizer
      CICLing 2019
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
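  • [Presentation] Constructing Distributed Representations of Word Senses in the Bunrui Goi Hyo Using All-words WSD and Fine-tuning (2019, in Japanese)

    • Author(s)
      柳沼 大輝, 古宮 嘉那子, 新納 浩幸
    • Organizer
      Language Resource Utilization Workshop 2019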
    • Related Report
      2019 Research-status Report
  • [Presentation] Classifying Question-Answer Pairs on a Q&A Site with Different Word Delimitations (2019, in Japanese)

    • Author(s)
      佐々木稔, 古宮嘉那子
    • Organizer
      IDR User Forum 2019
    • Related Report
      2019 Research-status Report
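  • [Presentation] Mapping the Word Senses of the Iwanami Kokugo Jiten to Those of the Bunrui Goi Hyo Using Bilingual Word Embeddings (2019, in Japanese)

    • Author(s)
      平林照雄, 古宮 嘉那子, 新納浩幸
    • Organizer
      The 25th Annual Meeting of the Association for Natural Language Processing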
    • Related Report
      2018 Research-status Report
  • [Presentation] Fine-tuning for Named Entity Recognition Using Part-of-Speech Tagging (2018)

    • Author(s)
      Masaya Suzuki, Kanako Komiya, Minoru Sasaki and Hiroyuki Shinnou
    • Organizer
      The 32nd Pacific Asia Conference on Language, Information and Computation
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Presentation] Detecting Unknown Word Senses in Contemporary Japanese Dictionary from Corpus of Historical Japanese (2018)

    • Author(s)
      Aya Tanabe, Kanako Komiya, Masayuki Asahara, Minoru Sasaki and Hiroyuki Shinnou
    • Organizer
      Japanese Association for Digital Humanities 2018
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Book] Machine Learning Textbook (2019, in Japanese)

    • Author(s)
      柴原 一友, 築地 毅, 古宮 嘉那子, 宮武孝尚, 小谷 善行
    • Total Pages
      240
    • Publisher
      Morikita Publishing
    • ISBN
      9784627854512
    • Related Report
      2019 Research-status Report


Published: 2018-04-23   Modified: 2024-01-30  


Powered by NII kakenhi