
Constructing simplified Japanese corpus and prototyping automatic text simplification

Research Project

Project/Area Number 17K18481
Research Category

Grant-in-Aid for Challenging Research (Exploratory)

Allocation Type Multi-year Fund
Research Field Literature, Linguistics, and related fields
Research Institution Nagaoka University of Technology

Principal Investigator

Yamamoto Kazuhide  Nagaoka University of Technology, Graduate School of Engineering, Associate Professor (40359708)

Project Period (FY) 2017-06-30 – 2020-03-31
Project Status Completed (Fiscal Year 2019)
Budget Amount
¥6,240,000 (Direct Cost: ¥4,800,000, Indirect Cost: ¥1,440,000)
Fiscal Year 2019: ¥2,600,000 (Direct Cost: ¥2,000,000, Indirect Cost: ¥600,000)
Fiscal Year 2018: ¥2,210,000 (Direct Cost: ¥1,700,000, Indirect Cost: ¥510,000)
Fiscal Year 2017: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
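Keywords Easy Japanese / simplification / parallel corpus / text simplification / neural machine translation / WordNet / USMT / natural language processing / automatic simplification / artificial intelligence / educational technology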
Outline of Final Research Achievements

(1) A simple Japanese checker, which automatically determines whether a text is written in simple Japanese, was developed and released to the public as a web application.
(2) Using this tool, a simplified Japanese corpus was created, consisting of 50,000 original sentences paired with their simplified counterparts. At present, it is the only simplified Japanese corpus and also the largest in the world. In parallel, a simple Japanese vocabulary of 2,000 words and a simple Japanese grammar were defined. In addition, the corpus was expanded through crowdsourcing, yielding 35,000 new sentences.
(3) Using the above corpora, various studies on Japanese simplification were conducted.

Academic Significance and Societal Importance of the Research Achievements

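There is strong latent demand for, and public interest in, Easy Japanese (yasashii nihongo), which is gradually gaining social recognition through services such as NHK News Web Easy and through local governments. In this context, constructing and publicly releasing the only simplification corpus for Japanese is highly significant. Apart from lexical simplification, this project is currently the only research on automatic simplification of Japanese, and we believe its contribution to natural language processing is likewise substantial. If notices from national and local governments could be converted into Easy Japanese automatically, this would be very valuable for guaranteeing equal access to information, and we believe this project has laid the groundwork for that.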

Report

(4 results)
  • 2019 Annual Research Report, Final Research Report (PDF)
  • 2018 Research-status Report
  • 2017 Research-status Report
Research Products

    (12 results)


Journal Article (1 result; Int'l Joint Research: 1, Peer Reviewed: 1); Presentation (11 results; Int'l Joint Research: 4)

  • [Journal Article] Extremely Low-Resource Text Simplification with Pre-trained Transformer Language Model (2020)

    • Author(s)
      Maruyama Takumi, Yamamoto Kazuhide
    • Journal Title

      International Journal of Asian Language Processing

      Volume: 30, Issue: 01, Pages: 2050001

    • DOI

      10.1142/s2717554520500010

    • Related Report
      2019 Annual Research Report
    • Peer Reviewed / Int'l Joint Research
  • [Presentation] Unsupervised Lexical Simplification and Output Restriction with WordNet (2020)

    • Author(s)
      勝田 哲弘, 山本 和英
    • Organizer
      The 26th Annual Meeting of the Association for Natural Language Processing, pp.1217-1220
    • Related Report
      2019 Annual Research Report
  • [Presentation] Output Control in Public Document Simplification (2020)

    • Author(s)
      丸山 拓海, 山本 和英
    • Organizer
      The 26th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2019 Annual Research Report
  • [Presentation] Extremely Low Resource Simplification with Pre-trained Transformer Language Model (2019)

    • Author(s)
      Takumi Maruyama and Kazuhide Yamamoto
    • Organizer
      Proceedings of the International Conference on Asian Language Processing (IALP 2019), Best Paper Award, pp.53-58
    • Related Report
      2019 Annual Research Report
  • [Presentation] Improving text simplification by corpus expansion with unsupervised learning (2019)

    • Author(s)
      Akihiro Katsuta and Kazuhide Yamamoto
    • Organizer
      Proceedings of the International Conference on Asian Language Processing (IALP 2019), pp.216-221
    • Related Report
      2019 Annual Research Report
  • [Presentation] Construction of a Japanese Grammatical Simplification Corpus (2019)

    • Author(s)
      稲岡 夢人, 山本 和英
    • Organizer
      The 25th Annual Meeting of the Association for Natural Language Processing, pp.375-378
    • Related Report
      2018 Research-status Report
  • [Presentation] Lexical Substitution is Practical for Rare Word Simplification (2018)

    • Author(s)
      Takumi Maruyama and Kazuhide Yamamoto
    • Organizer
      The 32nd Pacific Asia Conference on Language, Information and Computation (PACLIC 32), no page numbers
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Presentation] Simplified Corpus with Core Vocabulary (2018)

    • Author(s)
      Takumi Maruyama and Kazuhide Yamamoto
    • Organizer
      The 11th International Conference on Language Resources and Evaluation (LREC 2018), pp.1153-1160
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Presentation] Crowdsourced Corpus of Sentence Simplification with Core Vocabulary (2018)

    • Author(s)
      Akihiro Katsuta and Kazuhide Yamamoto
    • Organizer
      The 11th International Conference on Language Resources and Evaluation (LREC 2018), pp.461-466
    • Related Report
      2018 Research-status Report
    • Int'l Joint Research
  • [Presentation] Construction of a Large-Scale Easy Japanese Paraphrase Dictionary via Crowdsourcing (2018)

    • Author(s)
      角張 竜晴, 山本 和英
    • Organizer
      The 24th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2017 Research-status Report
  • [Presentation] Dependency Parsing through Construction of Easy Japanese Case Frames (2018)

    • Author(s)
      角張 竜晴, 山本 和英
    • Organizer
      The 24th Annual Meeting of the Association for Natural Language Processing
    • Related Report
      2017 Research-status Report
  • [Presentation] Sentence Simplification with Core Vocabulary (2017)

    • Author(s)
      Takumi Maruyama and Kazuhide Yamamoto
    • Organizer
      Proceedings of the International Conference on Asian Language Processing (IALP 2017)
    • Related Report
      2017 Research-status Report
    • Int'l Joint Research


Published: 2017-07-21   Modified: 2021-02-19  


Powered by NII kakenhi