
Analysis of all-digitized textbook data and the construction of a large-scale Japanese corpus

Research Project

Project/Area Number 14380174
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Single-year Grants
Section General
Research Field Information systems science (including information and library science)
Research Institution Tokyo University of Foreign Studies

Principal Investigator

SANO Hiroshi  Tokyo University of Foreign Studies, Faculty of Foreign Studies, Professor (30282776)

Co-Investigator(Kenkyū-buntansha) SHIBANO Koji  Tokyo University of Foreign Studies, Research Institute for Languages and Cultures of Asia and Africa, Professor (50216024)
MINEGISHI Makoto  Tokyo University of Foreign Studies, Research Institute for Languages and Cultures of Asia and Africa, Professor (20190712)
FUJIMURA Tomoko  Tokyo University of Foreign Studies, Japanese Language Center for International Studies, Associate Professor (20229040)
Project Period (FY) 2002 – 2004
Project Status Completed (Fiscal Year 2004)
Budget Amount
¥14,100,000 (Direct Cost: ¥14,100,000)
Fiscal Year 2004: ¥4,600,000 (Direct Cost: ¥4,600,000)
Fiscal Year 2003: ¥4,100,000 (Direct Cost: ¥4,100,000)
Fiscal Year 2002: ¥5,400,000 (Direct Cost: ¥5,400,000)
Keywords Japanese language processing / Large-scale Japanese corpus / Mathematical linguistics / Corpus linguistics / Corpus language
Research Abstract

Research results
In 2002
We applied morphological analysis to a Japanese corpus of 285 digitized textbook files covering Japanese language (65 files), social studies (209 files), and home economics (11 files). Because primary school textbooks contain a relatively high proportion of hiragana, the automatic morphological analysis produced many errors. These errors had to be examined manually, from a linguistic standpoint, in order to improve the methodology. The examination showed that the errors stemmed not only from the high proportion of hiragana in primary and secondary school textbooks but also from insufficient linguistic consideration in parts of the morphological analysis. The error types were classified and organized. In 2005, the plan is to refer to this classification and add more textbook data to the corpus for processing.
For the morphological analysis, one of the programs used was "ChaSen" (developed by the Matsumoto Lab at the Nara Institute of Science and Technology). Additional software was developed to support analysis of the morphologically processed corpus, and the Japanese analysis software was improved over time as it was used. These programs were made available as freeware on the Internet.
In 2003
A Japanese corpus was built from 285 textbook files: Japanese language (65 files), social studies (209 files), and home economics (11 files). The corpus was morphologically analyzed and the results were examined. Software was developed for this analysis in 2004, and this Japanese text analysis software was released as freeware in 2004.
The software made it possible to analyze Japanese efficiently and to collect information from a large-scale corpus to a degree not previously possible. Graduate students used it in their own Japanese language research. A book introducing methods for analyzing Japanese was published, with the software included on a CD-ROM.
Using the textbook corpus, we also analyzed kanji usage attributes with the word as the unit, the contextual usage of words, and the usage ratios of kanji in context. We found that the use of kanji depends largely on the type of word in which they appear. Prior to this research, no definitive study existed of how kanji are used, how they are distributed, or how frequently they occur in written text.
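As a rough illustration of what a script-level usage ratio looks like, the following minimal sketch counts the share of hiragana, katakana, and kanji characters in a text. It is not the project's software: the function names `char_class` and `script_ratios` and the sample sentence are hypothetical, and the classification relies only on the basic Unicode blocks (Hiragana, Katakana, CJK Unified Ideographs), so characters such as the iteration mark 々 fall under "other".

```python
from collections import Counter


def char_class(ch: str) -> str:
    """Classify one character by Unicode block (a simplifying assumption)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"


def script_ratios(text: str) -> dict[str, float]:
    """Return the share of each script among the non-whitespace characters."""
    counts = Counter(char_class(ch) for ch in text if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


# Hypothetical sample sentence: 3 kanji and 8 hiragana out of 11 characters.
sample = "わたしは学校へ行きます"
ratios = script_ratios(sample)
print(ratios)
```

Counting per word rather than per character, as the abstract describes, would additionally require a morphological analyzer to segment the text first; that step is left out here.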
In 2004
Further incremental improvements were made to our methodology. The usage attributes of kanji and words were further analyzed with the paragraph as the unit. English textbooks were collected for concurrent analysis of an English corpus.

Report

(4 results)
  • 2004 Annual Research Report   Final Research Report Summary
  • 2003 Annual Research Report
  • 2002 Annual Research Report
  • Research Products

    (16 results)


  • [Journal Article] Development of multilingual e-learning materials for elementary Japanese (多言語対応・初級日本語e-Learning教材の開発), 2004

    • Author(s)
      佐野洋, 林俊成, 藤村知子, 芝野耕司
    • Journal Title

      CIEC(コンピュータ&エデュケーション)会誌 VOL17

      Pages: 8-8

    • NAID

      130004709698

    • Related Report
      2004 Annual Research Report
  • [Journal Article] Japanese language processing software for creating Japanese learning materials (日本語学習素材作成のための日本語処理ソフトウェア), 2003

    • Author(s)
      佐野洋
    • Journal Title

      CIEC(コンピュータ&エデュケーション)会誌 VOL15

      Pages: 8-8

    • NAID

      130004709693

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] CLTOOL, a tool for Japanese language surveys (日本語調査用ツール CLTOOL), 2003

    • Author(s)
      佐野洋
    • Journal Title

      東京外国語大学,語学研究所論集 第8号

      Pages: 8-8

    • NAID

      120000992495

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] A language education support system that realizes ESP-compliant teaching content (ESP適合の教材コンテンツを実現する語学教育支援システム), 2003

    • Author(s)
      佐野洋
    • Journal Title

      CIEC(コンピュータ利用教育協議会)外国語教育研究部会 (CD-ROM書籍出版)

      Pages: 10-10

    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2004 Final Research Report Summary
  • [Journal Article] Development of a Software for Creating Japanese Educational Materials, 2003

    • Author(s)
      SANO, Hiroshi
    • Journal Title

      Computer & Education, Council for Improvement of Education through Computers Vol15

      Pages: 85-90

    • NAID

      130004709693

    • Description
      From the Final Research Report Summary (Western-language version)
    • Related Report
      2004 Final Research Report Summary
  • [Book] Methods for Japanese language research using a Windows PC (WindowsPCによる日本語研究法), 2003

    • Author(s)
      佐野洋
    • Total Pages
      148
    • Publisher
      共立出版
    • Description
      From the Final Research Report Summary (Japanese version)
    • Related Report
      2004 Final Research Report Summary
  • [Publications] 佐野洋: "日本語学習素材作成のための日本語処理ソフトウェア" [Japanese language processing software for creating Japanese learning materials]. CIEC(コンピュータ&エデュケーション)会誌. Vol15. (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] 佐野洋: "日本語調査用ツール CLTOOL" [CLTOOL, a tool for Japanese language surveys]. 東京外国語大学,語学研究所論集. 第8号. (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] 佐野洋: "ESP適合の教材コンテンツを実現する語学教育支援システム" [A language education support system that realizes ESP-compliant teaching content]. CIEC(コンピュータ利用教育協議会),外国語教育研究部会. CD-ROM書籍出版. (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] 佐野洋: "WindowsPCによる日本語研究法" [Methods for Japanese language research using a Windows PC]. 共立出版. 148 (2003)

    • Related Report
      2003 Annual Research Report
  • [Publications] 佐野洋: "ESP適合の教材コンテンツを実現する語学教育支援システム" [A language education support system that realizes ESP-compliant teaching content]. 『最新外国語CALLの研究と実践』,外国語教育研究部会,CIEC会誌. (CD-ROM出版). 8 (2003)

    • Related Report
      2002 Annual Research Report
  • [Publications] 佐野洋: "日本語調査用ツールCLTOOL" [CLTOOL, a tool for Japanese language surveys]. 『語学研究所論集』,語学研究所,東京外国語大学. 8号. 10 (2003)

    • Related Report
      2002 Annual Research Report
  • [Publications] 佐野洋, 幸松英恵: "外国人のための日本語教育教材の問題点について" [On problems with Japanese language teaching materials for foreigners]. コンピュータと教育研究会報告,情報処理学会. Vol.2002, No96. 37-44 (2002)

    • Related Report
      2002 Annual Research Report
  • [Publications] 佐野洋: "ソフトウェア再利用による語彙調査用ツールの開発" [Development of a vocabulary survey tool through software reuse]. 言語処理学会年次大会講演論文集,言語処理学会. 679-682 (2003)

    • Related Report
      2002 Annual Research Report
  • [Publications] 佐野洋: "ESP適合の教材生成を目指した語学教育支援システム" [A language education support system aimed at generating ESP-compliant teaching materials]. 情報処理学会全国大会講演論文集,情報処理学会. 4 (2003)

    • Related Report
      2002 Annual Research Report
  • [Publications] 佐野洋: "WindowsPCによる日本語研究" [Japanese language research using a Windows PC]. 共立出版 (forthcoming). 200 (2003)

    • Related Report
      2002 Annual Research Report

Published: 2002-04-01   Modified: 2016-04-21  
