Construction of Programming Support Environment for Japanese Text Processing

Research Project

Project/Area Number	60460228
Research Category	Grant-in-Aid for General Scientific Research (B)
Allocation Type	Single-year Grants
Research Field	Informatics
Research Institution	Kyushu University
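Principal Investigator	USHIJIMA Kazuo Kyushu University, Faculty of Engineering, Professor (40037750)
Co-Investigator(Kenkyū-buntansha)	TAKAGI Toshihisa Kyushu University, Faculty of Engineering, Research Associate (30110836); SUEYOSHI Toshinori Kyushu University, Interdisciplinary Graduate School of Engineering Sciences, Associate Professor (00117136); ARAKI Keijiro Kyushu University, Faculty of Engineering, Associate Professor (40117057); FUJIMURA Naomi Kyushu University, Educational Center for Information Processing, Associate Professor (40117239)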
Project Period (FY)	1985 – 1986
Project Status	Completed (Fiscal Year 1986)
Budget Amount	¥5,600,000 (Direct Cost: ¥5,600,000); Fiscal Year 1986: ¥1,900,000 (Direct Cost: ¥1,900,000); Fiscal Year 1985: ¥3,700,000 (Direct Cost: ¥3,700,000)
Keywords	Japanese Text Processing / Normalized Japanese Text / String Matching / Boyer-Moore Algorithm / Software Developing Environment / Ada Package Facility / Japanese Localization of Error Messages
Research Abstract	1. Japanese text freely mixes passages in two kinds of character sets: the traditional alphabetic set, coded in one-byte units, and the Japanese character set, coded in two-byte units. At present there is no standard form for expressing a mixture of these two character sets. As a starting point, we proposed a normalized Japanese text in which each one-byte character is coded in two bytes by prepending a leading byte. This form allows both character sets to be treated uniformly.
2. We applied string matching algorithms that are well known for Roman alphabetic texts to normalized Japanese texts and compared their performance experimentally. We found that the Boyer-Moore algorithm, which is extremely efficient on Roman alphabetic texts, performs worst on Japanese texts with a large character set, because the time needed to initialize the table the algorithm requires is proportional to the size of the character set. We presented an efficient method that reduces the initialization time by representing the table hierarchically. We also developed a text scanning method that treats normalized Japanese text simply as a sequence of one-byte codes. This method can produce false detections, but in normalized Japanese text they are easy to identify. The Boyer-Moore algorithm implemented with this method showed the best performance.
3. We constructed packages that make it possible to handle Japanese text in Ada, which does not originally treat Japanese characters as predefined characters. With these packages, Ada programs that process Japanese text can be written easily.
4. We developed SUIKOU, a prototype writing tool for Japanese documents, which analyzes machine-readable Japanese documents textually and gives writers useful information for polishing them.
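The byte-wise scanning idea in item 2 of the abstract can be illustrated with a minimal sketch. It is not the project's implementation, and several details are assumptions made only for illustration: the input is taken to be EUC-JP-like (bytes below 0x80 are one-byte characters, any byte of 0x80 or above starts a two-byte character), the leading byte `LEAD` is a hypothetical value, and the search uses the Horspool simplification of Boyer-Moore (bad-character shift only) rather than the full algorithm. The names `normalize` and `find_all` are likewise hypothetical. The point shown is that the shift table has only 256 entries regardless of the size of the character set, and that a byte-level hit at an odd offset straddles two characters and can be rejected as a false detection.

```python
LEAD = 0x00  # hypothetical leading byte for one-byte characters (assumption)


def normalize(mixed: bytes) -> bytes:
    """Expand every character to exactly two bytes.

    Assumes EUC-JP-like input: a byte < 0x80 is a one-byte character,
    and a byte >= 0x80 starts a two-byte character.
    """
    out = bytearray()
    i = 0
    while i < len(mixed):
        if mixed[i] < 0x80:
            out += bytes((LEAD, mixed[i]))  # prepend the leading byte
            i += 1
        else:
            out += mixed[i:i + 2]           # already two bytes wide
            i += 2
    return bytes(out)


def find_all(text: bytes, pattern: bytes) -> list[int]:
    """Return character indices of pattern in text (both normalized).

    Scans the text as plain one-byte codes with a Horspool-style
    bad-character table, then discards hits at odd byte offsets.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # The table is indexed by byte value, so its size is always 256.
    shift = [m] * 256
    for k in range(m - 1):
        shift[pattern[k]] = m - 1 - k
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            # Every character is two bytes wide, so a genuine match
            # must begin on an even byte offset.
            if pos % 2 == 0:
                hits.append(pos // 2)
        pos += shift[text[pos + m - 1]]
    return hits
```

For example, `find_all(normalize(text), normalize(pattern))` returns the character positions of `pattern` in `text` when both are byte strings in the assumed encoding. The alignment test is what makes false detections cheap to identify here; without normalization, deciding whether a byte-level hit lies on a character boundary would require rescanning the text from a known boundary.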

Report

(1 results)

1986 Final Research Report Summary

Research Products
(11 results)

All Publications (11 results)

[Publications] J.Yoon;T.Takagi;K.Ushijima: Proc.of Pacific Computer Communications Symposium. 400-402 (1985)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  1986 Final Research Report Summary
[Publications] K.Ushijima;J.Hinami;J.Yoon;T.Takagi: Computer Software. 3. 35-46 (1986)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  1986 Final Research Report Summary
[Publications] J.Yoon;T.Takagi;K.Ushijima: Memoirs of the Research Institute for Mathematical Sciences, Kyoto Univ. 586. 18-34 (1986)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  1986 Final Research Report Summary
[Publications] K.Ushijima;T.Matsuo;K.Araki: Proc.1986 International Conference on Chinese Computing. 93-97 (1986)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  1986 Final Research Report Summary
[Publications] J.Yoon;T.Takagi;K.Ushijima: Proc.1986 International Conference on Chinese Computing. 297-304 (1986)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  1986 Final Research Report Summary
[Publications] M.Hirabaru;K.Ushijima: Proc.of International Computer Symposium 1986. 889-895 (1986)
- Description
  From the "Summary of Research Results Report (Japanese)"
- Related Report
  1986 Final Research Report Summary
[Publications] J. Yoon, T. Takagi, and K. Ushijima: "Comparison and Improvement of String Matching Algorithms for Texts with a Large Character Set" Proc. of Pacific Computer Communications Symposium. 400-402 (1985)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  1986 Final Research Report Summary
[Publications] K. Ushijima, J. Hinami, J. Yoon, and T. Takagi: "Prototyping the System of Writing Tools for Japanese Documents" Computer Software. 3. 35-46 (1986)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  1986 Final Research Report Summary
[Publications] J. Yoon, T. Takagi, and K. Ushijima: "Comparison and Improvement of String Matching Algorithms for Japanese Texts" Memoirs of the Research Institute for Mathematical Sciences, Kyoto Univ. 586. 18-34 (1986)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  1986 Final Research Report Summary
[Publications] K. Ushijima, T. Matsuo, and K. Araki: "Development of Ada Packages for Japanese Text Handling" Proc. 1986 International Conference on Chinese Computing. 93-97 (1986)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  1986 Final Research Report Summary
[Publications] J. Yoon, T. Takagi, and K. Ushijima: "An Experimental Study of String Matching Algorithms for Japanese Texts" Proc. 1986 International Conference on Chinese Computing. 297-304 (1986)
- Description
  From the "Summary of Research Results Report (English)"
- Related Report
  1986 Final Research Report Summary
