Co-Investigator (Kenkyū-buntansha)
NAMIKI Mitaro Tokyo University of Agriculture and Technology, Faculty of Engineering, Associate Professor (10208077)
HARA Shoichiro National Institute of Japanese Literature, Department of Research Information, Associate Professor (50218616)
YAMADA Shoji International Research Center for Japanese Studies, Research Division, Associate Professor (20248751)
IWASAKI Hiroshi Kyoto University, Professor Emeritus; Faculty of Community Development, Professor (50087904)
KAWAGUCHI Hiroshi Tezukayama University, Faculty of Information and Management, Associate Professor (80224749)

Budget Amount
¥5,800,000 (Direct Cost: ¥5,800,000)
Fiscal Year 2002: ¥2,000,000 (Direct Cost: ¥2,000,000)
Fiscal Year 2001: ¥3,800,000 (Direct Cost: ¥3,800,000)

Research Abstract
The purpose of this research is to build electronic versions of the "Kuzushiji Kaidoku dictionary" and the "Kuzushiji Yourei dictionary", which specialists in history, paleography, and literature use to decipher historical handwritten documents, for use on computers including mobile and notebook machines, and to develop a computerized dictionary that can be used in a mobile environment. A further aim is to apply the dictionaries directly to character recognition research for the transliteration supporting system for historical documents (historical document OCR) mentioned above. The following results were obtained during the research period.
(1) The index images of the "Kuzushiji Yourei dictionary" (which lets users retrieve letter shapes and examples of letter use through a stroke-based (Kihitsu-jun) index) were input together with attributes such as the "Kuzushiji Yourei dictionary code", the "Mojikyo code", and the Shift-JIS internal code, and an electronic Moji (character) database was built.
(2) A retrieval function that lets users search for similar characters in the above-mentioned dictionary was developed.
(3) The n-gram method was applied in research on the historical document transliteration supporting system (historical document OCR), and it was confirmed to be effective for estimating lost or missing characters in a document.
(4) To build a character pattern dictionary of about 240,000 characters from historical documents for use in the recognition process, a segmentation program was developed and character selection work was carried out.
(5) The second edition of the following HCD series in the historical document character database was produced as one of the computerized dictionaries:
(a) HCD2: title lines of debt bonds, Fushimiya Zenbei documents, 200 lines, 1,378 characters, binary format.
(b) HCD2a: title lines of debt bonds, Fushimiya Zenbei documents, 200 lines, 1,378 characters, 256 gray levels.
(c) HCD2b: title lines of debt bonds, Fushimiya Zenbei documents, 200 lines, 1,378 characters, 24-bit color format.
(d) HCD3: title lines of debt bonds, Fushimiya Zenbei documents, 183 character types, 4,933 characters, binary format.
(6) Character recognition focused on the title lines of the documents was carried out using the above-mentioned dictionary, and a recognition technique that matches character patterns in a title line without segmenting it into individual characters was developed.
(7) A study on estimating stroke order from the "Database of Kuzushiji Kaidoku dictionary", which was built from that dictionary, was carried out.
Research reports for this study, including an intermediate version, were published in March 2001 and March 2000 respectively, in addition to papers on the historical document transliteration supporting system.
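Result (3) reports that an n-gram method helped estimate lost or missing characters, but the abstract does not say which n, which smoothing, or what training text was used. The following is only a minimal sketch under those assumptions: a character bigram model with simple add-k smoothing scores each candidate c for a gap by P(c | left neighbour) × P(right neighbour | c). The function names `train_bigrams` and `fill_missing`, the `?` gap marker, the `^`/`$` boundary markers, and the toy training lines (written in the style of debt-bond titles, not taken from the HCD data) are all hypothetical.

```python
from collections import Counter

BOS, EOS = "^", "$"  # hypothetical line-boundary markers


def train_bigrams(lines):
    """Count character unigrams and bigrams over boundary-padded lines."""
    unigrams, bigrams = Counter(), Counter()
    for line in lines:
        padded = BOS + line + EOS
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def fill_missing(text, unigrams, bigrams, gap="?", k=0.1):
    """Replace each gap marker with the character c that maximizes
    P(c | left neighbour) * P(right neighbour | c) under add-k smoothing.
    Adjacent gaps are filled greedily from left to right."""
    candidates = [c for c in unigrams if c not in (BOS, EOS)]
    v = len(unigrams)

    def prob(a, b):
        # Add-k smoothed conditional probability P(b | a).
        return (bigrams[(a, b)] + k) / (unigrams[a] + k * v)

    chars = list(BOS + text + EOS)
    for i in range(1, len(chars) - 1):
        if chars[i] == gap:
            left, right = chars[i - 1], chars[i + 1]
            chars[i] = max(candidates, key=lambda c: prob(left, c) * prob(c, right))
    return "".join(chars[1:-1])


# Toy training lines in the style of debt-bond titles (illustrative only).
unigrams, bigrams = train_bigrams(["借用申金子之事", "借用申金子之事", "預り申金子之事"])
print(fill_missing("借用申金?之事", unigrams, bigrams))  # → 借用申金子之事
```

Scoring with both neighbours, rather than the left context alone, lets the character after the gap rule out candidates. A practical OCR pipeline would presumably combine such language-model scores with the recognizer's confidence for each character image, which this sketch leaves out.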