Construction of Programming Support Environment for Japanese Text Processing
Project/Area Number |
60460228
|
Research Category |
Grant-in-Aid for General Scientific Research (B)
|
Allocation Type | Single-year Grants |
Research Field |
Informatics
|
Research Institution | Kyushu University |
Principal Investigator |
USHIJIMA Kazuo Kyushu University, Faculty of Engineering, Professor (40037750)
|
Co-Investigator (Kenkyū-buntansha) |
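TAKAGI Toshihisa Kyushu University, Faculty of Engineering, Research Associate (30110836)
SUEYOSHI Toshinori Kyushu University, Interdisciplinary Graduate School of Engineering Sciences, Associate Professor (00117136)
ARAKI Keijiro Kyushu University, Faculty of Engineering, Associate Professor (40117057)
FUJIMURA Naomi Kyushu University, Educational Center for Information Processing, Associate Professor (40117239)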
|
Project Period (FY) |
1985 – 1986
|
Project Status |
Completed (Fiscal Year 1986)
|
Budget Amount |
¥5,600,000 (Direct Cost: ¥5,600,000)
Fiscal Year 1986: ¥1,900,000 (Direct Cost: ¥1,900,000)
Fiscal Year 1985: ¥3,700,000 (Direct Cost: ¥3,700,000)
|
Keywords | Japanese Text Processing / Normalized Japanese Text / String Matching / Boyer-Moore Algorithm / Software Developing Environment / Ada Package Facility / Japanese Localization of Error Messages |
Research Abstract |
1. Japanese text is a free mixture of passages in two kinds of character sets: the traditional alphabetic set, coded in one-byte units, and the Japanese character set, coded in two-byte units. At present there is no standard form for expressing a mixture of these two character sets. As a starting point, we proposed a normalized Japanese text in which each one-byte character is coded in two bytes by prepending a leading byte. This form allows both character sets to be treated uniformly.
2. We applied string matching algorithms that are well known for Roman-alphabet text to normalized Japanese texts and compared their performance experimentally. We found that the Boyer-Moore algorithm, which is extremely efficient on Roman-alphabet texts, shows the worst performance on Japanese texts with a large character set, because the time needed to initialize the table the algorithm requires is proportional to the size of the character set. We presented an efficient method that reduces the initialization time by representing the table hierarchically. We also developed a text scanning method that treats normalized Japanese text simply as a sequence of one-byte codes. This method can produce mis-detections, but with normalized Japanese text they are easy to identify. The Boyer-Moore algorithm implemented with this method shows the best performance.
3. We constructed packages that make it possible to handle Japanese text in Ada, which does not originally treat Japanese characters as predefined characters. With these packages, Ada programs that process Japanese text can be written easily.
4. We developed a prototype writing tool for Japanese documents, called SUIKOU, that analyzes machine-readable Japanese documents textually and provides writers with useful information for polishing them.
|
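The normalization and byte-wise scanning described in items 1 and 2 of the abstract can be illustrated with a minimal sketch. This is not the project's implementation. It assumes an EUC-JP-like input in which one-byte characters have the high bit clear and every other character occupies exactly two bytes; the leading byte `LEAD = 0x00` and the function names `normalize` and `find_all` are hypothetical; and the search uses the Horspool simplification of Boyer-Moore, so its shift table has only 256 entries. Because every character in normalized text is two bytes wide, a byte-level match that starts at an odd offset straddles two characters and is discarded as a mis-detection.

```python
LEAD = 0x00  # hypothetical leading byte for one-byte characters (assumption)


def normalize(mixed: bytes) -> bytes:
    """Widen each one-byte character to two bytes; copy two-byte characters."""
    out = bytearray()
    i = 0
    while i < len(mixed):
        if mixed[i] < 0x80:          # one-byte (alphabetic) character
            out += bytes((LEAD, mixed[i]))
            i += 1
        else:                        # assumed to start a two-byte character
            out += mixed[i:i + 2]
            i += 2
    return bytes(out)


def find_all(text: bytes, pattern: bytes) -> list[int]:
    """Byte-wise Horspool search over normalized text.

    Returns character offsets of matches. Byte matches at odd offsets
    are treated as mis-detections and dropped.
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    # Shift table over byte values only: 256 entries, whatever the
    # size of the underlying character set.
    shift = [m] * 256
    for k in range(m - 1):
        shift[pattern[k]] = m - 1 - k
    hits = []
    i = 0
    while i <= len(text) - m:
        if text[i:i + m] == pattern and i % 2 == 0:
            hits.append(i // 2)
        i += shift[text[i + m - 1]]  # shift on the last byte of the window
    return hits


if __name__ == "__main__":
    text = normalize("あいAい".encode("euc_jp"))
    pattern = normalize("い".encode("euc_jp"))
    print(find_all(text, pattern))
```

In this sketch the alignment test `i % 2 == 0` is the only extra work needed to reject mis-detections, which is one reading of why the abstract calls them easy to find in normalized text.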
Report
(1 results)
Research Products
(11 results)