
2010 Fiscal Year Final Research Report

The development of a multi-purpose electronic dictionary for morphological analyzers

Planned Research

Project Area Compilation of a balanced corpus of written Japanese: Infrastructure for the coming Japanese linguistics
Project/Area Number 18061002
Research Category

Grant-in-Aid for Scientific Research on Priority Areas

Allocation Type Single-year Grants
Review Section Humanities and Social Sciences
Research Institution Chiba University

Principal Investigator

DEN Yasuharu  Chiba University, Faculty of Letters, Professor (70291458)

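Co-Investigator (Kenkyū-buntansha) YAMADA Atsushi  Advanced Science, Technology & Management Research Institute of KYOTO, Research Department, Chief Researcher (20240004)
MINEMATSU Nobuaki  The University of Tokyo, Graduate School of Frontier Sciences, Associate Professor (90273333)
UCHIMOTO Kiyotaka  National Institute of Information and Communications Technology, Strategic Planning Department, Planning Manager (60358885)
OGISO Tomonobu  National Institute for Japanese Language and Linguistics, Department of Corpus Studies, Associate Professor (20337489)
KOISO Hanae  National Institute for Japanese Language and Linguistics, Department of Linguistic Theory and Structure, Associate Professor (30312200)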
Project Period (FY) 2006 – 2010
Keywords electronic dictionary / morphological analysis / written-language corpus / sound change / accent
Research Abstract

(1) An electronic dictionary for morphological analyzers has been developed with the following characteristics:
  • Lexical entries of uniform unit size, based on Short-Unit Words
  • A hierarchical representation of lexical entries, consisting of lemma, form, orthography, and pronunciation, which makes it possible to handle variation in orthography and word form
  • Rich information, including features for phonological and accentual sandhi
(2) A version for the morphological analyzer MeCab has been derived from the dictionary database and, after several updates, contains 210K lemmas and 330K orthographic entries. It achieves 98.9% accuracy in part-of-speech tagging and 98.6% accuracy in lemma identification.
(3) A version of the dictionary database represented as XML files has also been developed, which enables users to build customized dictionaries for morphological analyzers according to their preferences and purposes.
(4) Post-processing tools, including Middle- and Long-Unit-Word analyzers, have been developed for advanced uses of the dictionary, such as syntactic analysis and text-to-speech applications.
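
As a rough illustration of the hierarchical representation described in item (1), and of how a flat entry list for a customized dictionary might be derived from it, the following Python sketch models lemma, form, orthography, and pronunciation as nested records. All class names, field names, the row layout, and the sample entry are hypothetical; they are not taken from the dictionary's actual schema, its XML files, or the MeCab dictionary format.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Orthography:
    surface: str  # spelling as it appears in text
    pronunciations: List[str] = field(default_factory=list)


@dataclass
class Form:
    form: str  # word-form variant, written here in katakana
    orthographies: List[Orthography] = field(default_factory=list)


@dataclass
class Lemma:
    lemma: str  # headword shared by all variants
    pos: str    # part of speech
    forms: List[Form] = field(default_factory=list)


# Hypothetical flat row: (surface, pos, lemma, form, pronunciation)
Row = Tuple[str, str, str, str, str]


def flatten(entry: Lemma) -> Iterator[Row]:
    """Walk lemma -> form -> orthography -> pronunciation, one row per leaf."""
    for form in entry.forms:
        for orth in form.orthographies:
            for pron in orth.pronunciations:
                yield (orth.surface, entry.pos, entry.lemma, form.form, pron)


# Illustrative sample only: one lemma with two word forms and three spellings.
sample = Lemma(
    lemma="矢張り",
    pos="adverb",
    forms=[
        Form("ヤハリ", [
            Orthography("やはり", ["ヤハリ"]),
            Orthography("矢張り", ["ヤハリ"]),
        ]),
        Form("ヤッパリ", [
            Orthography("やっぱり", ["ヤッパリ"]),
        ]),
    ],
)

rows = list(flatten(sample))
for row in rows:
    print(",".join(row))
```

In this sketch every surface spelling carries a reference back to the same lemma, so variant spellings and variant word forms can be grouped under one headword when counting or searching.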

  • Research Products

    (18 results)


Journal Article (12 results, of which Peer Reviewed: 8 results) / Presentation (3 results) / Book (2 results) / Remarks (1 result)

  • [Journal Article] Design, compilation, and preliminary analyses of Balanced Corpus of Contemporary Written Japanese (2010)

    • Author(s)
      K. Maekawa, M. Yamazaki, T. Maruyama, M. Yamaguchi, H. Ogura, W. Kashino, T. Ogiso, H. Koiso, and Y. Den
    • Journal Title

      Proceedings of LREC2010

      Pages: 1483-1486

    • Peer Reviewed
  • [Journal Article] Development of a morphological analysis dictionary for Early Middle Japanese prose [中古和文を対象とした形態素解析辞書の開発] (2010)

    • Author(s)
      小木曽智信・小椋秀樹・田中牧郎・近藤明日子・伝康晴
    • Journal Title

      IPSJ SIG Technical Report (情報処理学会研究報告)

      Volume: 2010-CH-85 Pages: 49-64

  • [Journal Article] Development of an on-line word accent dictionary of Japanese (2009)

    • Author(s)
      H. Hirano, M. Suzuki, K. Innami, N. Minematsu, and K. Hirose
    • Journal Title

      Proceedings of JSAA-ICJLE 2009

      Volume: 24 Pages: 640-646

  • [Journal Article] An electronic dictionary for morphological analysis systems suited to diverse purposes [多様な目的に適した形態素解析システム用電子化辞書] (2009)

    • Author(s)
      伝康晴
    • Journal Title

      Journal of the Japanese Society for Artificial Intelligence (人工知能学会誌)

      Volume: 24 Pages: 640-646

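  • [Journal Article] Automatic identification of quotation and inserted clauses in spoken language and its application to dependency analysis [話し言葉における引用節・挿入節の自動認定および係り受け解析への応用] (2009)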

    • Author(s)
      浜辺良二・内元清貴・河原達也・井佐原均
    • Journal Title

      Journal of Natural Language Processing (自然言語処理)

      Volume: 16(1) Pages: 3-23

    • Peer Reviewed
  • [Journal Article] Automatic annotation of morphological information and its problems [形態論情報の自動付与とその問題点] (2009)

    • Author(s)
      小木曽智信
    • Journal Title

      Kokubungaku: Kaishaku to Kanshō (国文学解釈と鑑賞)

      Volume: 74(1) Pages: 35-43

  • [Journal Article] Word-level dependency-structure annotation to Corpus of Spontaneous Japanese and its application (2008)

    • Author(s)
      K. Uchimoto and Y. Den
    • Journal Title

      Proceedings of LREC2008

      Pages: 3118-3122

    • Peer Reviewed
  • [Journal Article] A proper approach to Japanese morphological analysis: Dictionary, model, and evaluation (2008)

    • Author(s)
      Y. Den, J. Nakamura, T. Ogiso, and H. Ogura
    • Journal Title

      Proceedings of LREC2008

      Pages: 1019-1024

    • Peer Reviewed
  • [Journal Article] CRF-based statistical learning of Japanese accent sandhi for developing Japanese text-to-speech synthesis systems (2007)

    • Author(s)
      N. Minematsu, R. Kuroiwa, K. Hirose, and M. Watanabe
    • Journal Title

      Proceedings of ISCA Workshop on Speech Synthesis

      Pages: 148-153

    • Peer Reviewed
  • [Journal Article] Morphological annotation of a large spontaneous speech corpus in Japanese (2007)

    • Author(s)
      K. Uchimoto and H. Isahara
    • Journal Title

      Proceedings of IJCAI2007

      Pages: 1731-1737

    • Peer Reviewed
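  • [Journal Article] Language resources for corpus-based Japanese linguistics: Development of an electronic dictionary for morphological analysis and its applications [コーパス日本語学のための言語資源:形態素解析用電子化辞書の開発とその応用] (2007)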

    • Author(s)
      伝康晴・小木曽智信・小椋秀樹・山田篤・峯松信明・内元清貴・小磯花絵
    • Journal Title

      Japanese Linguistics (日本語科学)

      Volume: 22 Pages: 101-122

    • Peer Reviewed
  • [Journal Article] Dependency-structure annotation to Corpus of Spontaneous Japanese (2006)

    • Author(s)
      K. Uchimoto, R. Hamabe, T. Maruyama, K. Takanashi, T. Kawahara, and H. Isahara
    • Journal Title

      Proceedings of LREC2006

      Pages: 635-638

    • Peer Reviewed
  • [Presentation] An attempt to systematize classification indices for capturing text diversity [テキストの多様性をとらえる分類指標の体系化の試み] (2011)

    • Author(s)
      小磯花絵・田中弥生・小木曽智信・近藤明日子
    • Organizer
      The 17th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      Toyohashi University of Technology (Aichi)
    • Year and Date
      2011-03-09
  • [Presentation] Design and implementation of general-purpose post-processing tools for UniDic [UniDic汎用後処理ツールの設計と実装] (2010)

    • Author(s)
      山田篤・伝康晴
    • Organizer
      Priority-Area Research "Japanese Corpus" FY2009 Open Workshop
    • Place of Presentation
      Tokyo Institute of Technology (Tokyo)
    • Year and Date
      2010-03-15
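  • [Presentation] Benchmark test of morphological analysis dictionaries: A genre-by-genre accuracy comparison of IPAdic, NAIST-jdic, and UniDic [形態素解析辞書のベンチマークテスト―IPAdic・NAIST-jdic・UniDic のジャンル別精度比較―] (2010)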

    • Author(s)
      小木曽智信・小椋秀樹・小磯花絵・宮内佐夜香・渡部涼子・伝康晴
    • Organizer
      The 16th Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      The University of Tokyo (Tokyo)
    • Year and Date
      2010-03-10
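  • [Book] Priority-Area Research "Japanese Corpus" FY2010 Research Report: Morphological Information Guidelines for the Balanced Corpus of Contemporary Written Japanese, 4th edition (2 volumes) [特定領域研究「日本語コーパス」平成22年度研究成果報告書『現代日本語書き言葉均衡コーパス』形態論情報規定集 第4版(上・下)] (2011)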

    • Author(s)
      小椋秀樹・小磯花絵・冨士池優美・宮内佐夜香・小西光・原裕
    • Total Pages
      359
    • Publisher
      National Institute for Japanese Language and Linguistics
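  • [Book] Priority-Area Research "Japanese Corpus" FY2010 Research Report: Design and Implementation of the Morphological Information Database for the Balanced Corpus of Contemporary Written Japanese, revised edition [特定領域研究「日本語コーパス」平成22年度研究成果報告書『現代日本語書き言葉均衡コーパス』形態論情報データベースの設計と実装 改訂版] (2011)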

    • Author(s)
      小木曽智信・中村壮範
    • Total Pages
      145
    • Publisher
      National Institute for Japanese Language and Linguistics
  • [Remarks]

    • URL

      http://download.unidic.org/


Published: 2013-07-31  
