Compositionality and Interpretation of Word Embeddings

Research Project

Project/Area Number	19K12099
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 61030: Intelligent informatics-related
Research Institution	Tokyo Metropolitan University
Principal Investigator	Komachi Mamoru Tokyo Metropolitan University, Graduate School of Systems Design, Professor (60581329)
Project Period (FY)	2019-04-01 – 2022-03-31
Project Status	Completed (Fiscal Year 2021)
Budget Amount	¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
Fiscal Year 2021: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Fiscal Year 2020: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Fiscal Year 2019: ¥2,990,000 (Direct Cost: ¥2,300,000, Indirect Cost: ¥690,000)
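Keywords	distributed word representations / compositionality / machine translation / grammatical error correction / semantic change / deep learning / natural language processing / grammatical error detection / machine learning / distributed representations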
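Outline of Research at the Start	This research studies, from an information-theoretic perspective, how semantic compositionality is realized when learning distributed word representations in natural language processing, and how the meaning representation of a sentence can be computed. The morpheme is said to be the smallest unit that composes meaning, but it is not clear which constituents are needed to compute the meaning of a sentence. This research therefore aims to explore meaning-composing elements at units smaller than the morpheme and to establish techniques for computing sentence meaning from them.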
Outline of Final Research Achievements	In this research, we studied methods for composing distributed word representations from smaller units in representation learning for natural language processing. Specifically, focusing on machine translation, we explored the optimal input granularity for learning distributed word representations in Japanese-Chinese translation. We also clarified what kind of knowledge transfers across languages such as Japanese, English, German, and Russian in grammatical error correction. In addition, we addressed the interpretation of word representations and proposed a highly interpretable, information-theoretically motivated method for learning word representations that capture diachronic semantic change.
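Academic Significance and Societal Importance of the Research Achievements	The results of this research suggest that, for languages written with logographic characters such as Japanese and Chinese, it may be more appropriate to capture meaning at units finer than the character. Worldwide, languages that use a small alphabet, typified by English, are the most widely studied, but this finding means that methods proposed for such languages are not necessarily optimal for Japanese or Chinese. With the advent of deep learning, various methods that handle multiple languages simultaneously have been proposed; these results reaffirm the importance of also taking the characteristics of each language into account.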

Report

(4 results)

2021 Annual Research Report, Final Research Report (PDF)
2020 Research-status Report
2019 Research-status Report

Research Products
(25 results)

All Int'l Joint Research (2 results) Journal Article (6 results) (of which Peer Reviewed: 6 results, Open Access: 6 results) Presentation (17 results) (of which Int'l Joint Research: 17 results)

[Int'l Joint Research] IT University of Copenhagen/University of Groningen (Denmark)
- Related Report
  2020 Research-status Report
[Int'l Joint Research] University of Liverpool (UK)
- Related Report
  2019 Research-status Report
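[Journal Article] Grammatical Error Correction Using Pre-trained Models and Multilingual Learner Data for Cross-lingual Transfer Learning (2022)
- Author(s)
  Ikumi Yamashita, Masahiro Kaneko, Masato Mita, Satoru Katsumata, Aizhan Imankulova, Mamoru Komachi
- Journal Title
  Journal of Natural Language Processing (自然言語処理)
  Volume: 29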
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Using Sub-character Level Information for Neural Machine Translation of Logographic Languages (2021)
- Author(s)
  Zhang Longtu and Komachi Mamoru
- Journal Title
  ACM Transactions on Asian and Low-Resource Language Information Processing
  Volume: 20 Issue: 2 Pages: 1-15
- DOI
  10.1145/3431727
- Related Report
  2021 Annual Research Report
- Peer Reviewed / Open Access
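[Journal Article] Optimization for Reference-less Automatic Evaluation of Grammatical Error Correction (2021)
- Author(s)
  Ryoma Yoshimura, Masahiro Kaneko, Tomoyuki Kajiwara, Mamoru Komachi
- Journal Title
  Journal of Natural Language Processing (自然言語処理)
  Volume: 28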
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Using Sub-Character Level Information for Neural Machine Translation of Logographic Languages (2021)
- Author(s)
  Longtu Zhang and Mamoru Komachi
- Journal Title
  ACM Transactions on Asian and Low-Resource Language Information Processing
  Volume: -
- Related Report
  2020 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] Multi-Head Multi-Layer Attention to Deep Language Representations for Grammatical Error Detection (2019)
- Author(s)
  Masahiro Kaneko and Mamoru Komachi
- Journal Title
  Computación y Sistemas
  Volume: 23 Pages: 883-891
- Related Report
  2019 Research-status Report
- Peer Reviewed / Open Access
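[Journal Article] Automatic Evaluation of Machine Translation Using Pre-trained Distributed Sentence Representations (2019)
- Author(s)
  Hiroki Shimanaka, Tomoyuki Kajiwara, Mamoru Komachi
- Journal Title
  Journal of Natural Language Processing (自然言語処理)
  Volume: 26 Pages: 613-634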
- NAID
  130007761392
- Related Report
  2019 Research-status Report
- Peer Reviewed / Open Access
[Presentation] Analyzing Semantic Changes in Japanese Words Using BERT (2021)
- Author(s)
  Kazuma Kobayashi, Taichi Aida and Mamoru Komachi
- Organizer
  35th Pacific Asia Conference on Language, Information and Computation (PACLIC 2021)
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] A Comprehensive Analysis of PMI-based Models for Measuring Semantic Differences (2021)
- Author(s)
  Taichi Aida, Mamoru Komachi, Toshinobu Ogiso, Hiroya Takamura, Daichi Mochihashi
- Organizer
  35th Pacific Asia Conference on Language, Information and Computation (PACLIC 2021)
- Related Report
  2021 Annual Research Report
- Int'l Joint Research
[Presentation] From Masked-Language Modeling to Translation: Non-English Auxiliary Tasks Improve Zero-shot Spoken Language Understanding (2021)
- Author(s)
  Rob van der Goot (IT University of Copenhagen), Marija Stepanovic (IT University of Copenhagen), Alan Ramponi (IT University of Copenhagen), Ibrahim Sharaf, Ahmet Ustun (University of Groningen), Aizhan Imankulova, Siti Oryza Khairunnisa, Mamoru Komachi and Barbara Plank (IT University of Copenhagen)
- Organizer
  2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2021)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] SOME: Reference-less Sub-Metrics Optimized for Manual Evaluations of Grammatical Error Correction (2020)
- Author(s)
  Ryoma Yoshimura, Masahiro Kaneko, Tomoyuki Kajiwara (Osaka University) and Mamoru Komachi
- Organizer
  28th International Conference on Computational Linguistics (COLING)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Cross-lingual Transfer Learning for Grammatical Error Correction (2020)
- Author(s)
  Ikumi Yamashita, Satoru Katsumata, Masahiro Kaneko, Aizhan Imankulova and Mamoru Komachi
- Organizer
  28th International Conference on Computational Linguistics (COLING)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Chinese Grammatical Correction Using BERT-based Pre-trained Model (2020)
- Author(s)
  Hongfei Wang, Michiki Kurosawa, Satoru Katsumata and Mamoru Komachi
- Organizer
  1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing (AACL-IJCNLP)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Stronger Baselines for Grammatical Error Correction Using a Pretrained Encoder-Decoder Model (2020)
- Author(s)
  Satoru Katsumata and Mamoru Komachi
- Organizer
  1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing (AACL-IJCNLP)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Non-Autoregressive Grammatical Error Correction Towards a Writing Support System (2020)
- Author(s)
  Hiroki Homma and Mamoru Komachi
- Organizer
  6th Workshop on Natural Language Processing Techniques for Educational Applications (NLP-TEA)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Zero-shot North Korean to English Neural Machine Translation by Character Tokenization and Phoneme Decomposition (2020)
- Author(s)
  Hwichan Kim, Tosho Hirasawa and Mamoru Komachi
- Organizer
  58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop (ACL 2020 SRW)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Automated Essay Scoring System for Nonnative Japanese Learners (2020)
- Author(s)
  Reo Hirao, Mio Arai, Hiroki Shimanaka, Satoru Katsumata and Mamoru Komachi
- Organizer
  12th International Conference on Language Resources and Evaluation (LREC 2020)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Korean to Japanese Neural Machine Translation System Using Hanja Information (2020)
- Author(s)
  Hwichan Kim, Tosho Hirasawa and Mamoru Komachi
- Organizer
  7th Workshop on Asian Translation (WAT)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] TMU System Using BERT-based Pre-trained Model to the NLP-TEA CGED Shared Task 2020 (2020)
- Author(s)
  Hongfei Wang and Mamoru Komachi
- Organizer
  6th Workshop on Natural Language Processing Techniques for Educational Applications (NLP-TEA)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] TMUOU submission for WMT20 Quality Estimation Shared Task (2020)
- Author(s)
  Akifumi Nakamachi (Osaka University), Hiroki Shimanaka, Tomoyuki Kajiwara (Osaka University) and Mamoru Komachi
- Organizer
  Fifth Conference on Machine Translation (WMT 2020)
- Related Report
  2020 Research-status Report
- Int'l Joint Research
[Presentation] Automated Essay Scoring System for Nonnative Japanese Learners (2020)
- Author(s)
  Reo Hirao, Mio Arai, Hiroki Shimanaka, Satoru Katsumata and Mamoru Komachi
- Organizer
  12th International Conference on Language Resources and Evaluation
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Presentation] Zero-shot North Korean to English Neural Machine Translation by Character Tokenization and Phoneme Decomposition (2020)
- Author(s)
  Hwichan Kim, Tosho Hirasawa and Mamoru Komachi
- Organizer
  ACL 2020 Student Research Workshop
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Presentation] Chinese-Japanese Unsupervised Neural Machine Translation Using Sub-character Level Information (2019)
- Author(s)
  Longtu Zhang and Mamoru Komachi
- Organizer
  The 33rd Pacific Asia Conference on Language, Information and Computation
- Related Report
  2019 Research-status Report
- Int'l Joint Research
[Presentation] Debiasing Word Embeddings Improves Multimodal Machine Translation (2019)
- Author(s)
  Tosho Hirasawa and Mamoru Komachi
- Organizer
  17th Machine Translation Summit
- Related Report
  2019 Research-status Report
- Int'l Joint Research
