
Construction of Programming Support Environment for Japanese Text Processing

Research Project

Project/Area Number 60460228
Research Category

Grant-in-Aid for General Scientific Research (B)

Allocation Type Single-year Grants
Research Field Informatics
Research Institution Kyushu University

Principal Investigator

USHIJIMA Kazuo  Kyushu University, Faculty of Engineering, Professor (40037750)

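Co-Investigator (Kenkyū-buntansha) TAKAGI Toshihisa  Kyushu University, Faculty of Engineering, Research Associate (30110836)
SUEYOSHI Toshinori  Kyushu University, Interdisciplinary Graduate School of Engineering Sciences, Associate Professor (00117136)
ARAKI Keijiro  Kyushu University, Faculty of Engineering, Associate Professor (40117057)
FUJIMURA Naomi  Kyushu University, Educational Center for Information Processing, Associate Professor (40117239)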
Project Period (FY) 1985 – 1986
Project Status Completed (Fiscal Year 1986)
Budget Amount
¥5,600,000 (Direct Cost: ¥5,600,000)
Fiscal Year 1986: ¥1,900,000 (Direct Cost: ¥1,900,000)
Fiscal Year 1985: ¥3,700,000 (Direct Cost: ¥3,700,000)
Keywords Japanese Text Processing / Normalized Japanese Text / String Matching / Boyer-Moore Algorithm / Software Developing Environment / Ada Package Facility / Japanese Localization of Error Messages
Research Abstract

1. Japanese text freely mixes passages in two kinds of character sets: the traditional alphabetic set, coded in one-byte units, and the Japanese character set, coded in two-byte units. At present there is no standard form for expressing a mixture of these two character sets. As a starting point, we proposed a normalized Japanese text in which each one-byte character is coded in two bytes by prepending a leading byte. This form allows both character sets to be treated uniformly.
2. We applied string matching algorithms that are well known for Roman-alphabet texts to normalized Japanese texts and compared their performance experimentally. We found that the Boyer-Moore algorithm, which is extremely efficient on Roman-alphabet texts, shows the worst performance on Japanese texts, which have a large character set. This is because the time needed to initialize the table that the algorithm requires is proportional to the size of the character set. We presented an efficient method that reduces the initialization time by representing the table hierarchically. We also developed a text scanning method that treats normalized Japanese text simply as a sequence of one-byte codes. Mis-detections may occur with this method, but with normalized Japanese text they are easy to find. The Boyer-Moore algorithm implemented with this method shows the best performance.
3. We constructed packages for handling Japanese text in Ada, which does not originally treat Japanese characters as predefined characters. With these packages, Ada programs that process Japanese text can be written easily.
4. We developed a prototype set of writing tools for Japanese documents, called SUIKOU, which analyzes machine-readable Japanese documents textually and provides writers with useful information for polishing them.
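
Items 1 and 2 can be illustrated with a minimal Python sketch. It is not the project's implementation, and it rests on several assumptions the abstract does not confirm: the input is EUC-JP-style text limited to one-byte ASCII and two-byte characters, the lead byte `LEAD` is a hypothetical value, the Horspool simplification stands in for the full Boyer-Moore algorithm, and a mis-detection is taken to mean a byte-level match that starts at an odd offset and therefore straddles two characters. The function names are illustrative only.

```python
LEAD = 0x00  # hypothetical lead byte; the abstract does not say which value was used


def normalize(raw: bytes) -> bytes:
    """Widen each one-byte character to two bytes; copy two-byte characters as-is.

    Assumes EUC-JP-style input limited to one-byte ASCII and two-byte
    characters whose first byte is >= 0x80 (no three-byte 0x8F sequences).
    """
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i] < 0x80:                 # one-byte character
            out += bytes([LEAD, raw[i]])
            i += 1
        else:                             # two-byte character
            if i + 1 >= len(raw):
                raise ValueError("truncated two-byte character")
            out += raw[i:i + 2]
            i += 2
    return bytes(out)


def horspool_all(text: bytes, pattern: bytes) -> list:
    """Return every byte offset where pattern occurs, using a 256-entry skip table."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    skip = [m] * 256                      # table size is fixed by the byte alphabet
    for k in range(m - 1):
        skip[pattern[k]] = m - 1 - k      # distance from byte k to the pattern's end
    hits = []
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            hits.append(i)
        i += skip[text[i + m - 1]]        # shift by the byte under the window's end
    return hits


def find_chars(norm_text: bytes, norm_pattern: bytes) -> list:
    """Byte-level search on normalized text; returns character indices.

    Every character occupies exactly two bytes, so a match at an odd byte
    offset straddles two characters and is discarded as a mis-detection.
    """
    return [h // 2 for h in horspool_all(norm_text, norm_pattern) if h % 2 == 0]
```

For the byte string `b"A"` followed by `A4 A2 A4 A2 A2 A4`, the normalized text is eight bytes long. A byte-level search for `A2 A4` hits offsets 3 and 6, and only the even offset 6 (character index 3) survives the alignment check. Because the skip table is indexed by single bytes, its size stays at 256 entries however large the character set is, which is in line with the initialization-cost argument in item 2.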

Report

(1 result)
  • 1986 Final Research Report Summary
  • Research Products

    (11 results)

All Publications (11 results)

  • [Publications] J. Yoon, T. Takagi, K. Ushijima: Proc. of Pacific Computer Communications Symposium. 400-402 (1985)

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      1986 Final Research Report Summary
  • [Publications] K. Ushijima, J. Hinami, J. Yoon, T. Takagi: Computer Software. 3. 35-46 (1986)

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      1986 Final Research Report Summary
  • [Publications] J. Yoon, T. Takagi, K. Ushijima: Kokyuroku, Research Institute for Mathematical Sciences, Kyoto University. 586. 18-34 (1986)

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      1986 Final Research Report Summary
  • [Publications] K. Ushijima, T. Matsuo, K. Araki: Proc. 1986 International Conference on Chinese Computing. 93-97 (1986)

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      1986 Final Research Report Summary
  • [Publications] J. Yoon, T. Takagi, K. Ushijima: Proc. 1986 International Conference on Chinese Computing. 297-304 (1986)

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      1986 Final Research Report Summary
  • [Publications] M. Hirabaru, K. Ushijima: Proc. of International Computer Symposium 1986. 889-895 (1986)

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      1986 Final Research Report Summary
  • [Publications] J. Yoon, T. Takagi, and K. Ushijima: "Comparison and Improvement of String Matching Algorithms for Texts with a Large Character Set" Proc. of Pacific Computer Communications Symposium. 400-402 (1985)

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      1986 Final Research Report Summary
  • [Publications] K. Ushijima, J. Hinami, J. Yoon, and T. Takagi: "Prototyping the System of Writing Tools for Japanese Documents" Computer Software. 3. 35-46 (1986)

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      1986 Final Research Report Summary
  • [Publications] J. Yoon, T. Takagi, and K. Ushijima: "Comparison and Improvement of String Matching Algorithms for Japanese Texts" Memoirs of the Research Institute for Mathematical Sciences, Kyoto Univ. 586. 18-34 (1986)

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      1986 Final Research Report Summary
  • [Publications] K. Ushijima, T. Matsuo, and K. Araki: "Development of Ada Packages for Japanese Text Handling" Proc. 1986 International Conference on Chinese Computing. 93-97 (1986)

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      1986 Final Research Report Summary
  • [Publications] J. Yoon, T. Takagi, and K. Ushijima: "An Experimental Study of String Matching Algorithms for Japanese Texts" Proc. 1986 International Conference on Chinese Computing. 297-304 (1986)

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      1986 Final Research Report Summary

Published: 1987-03-31   Modified: 2016-04-21  
