Project/Area Number |
62460124
|
Research Category |
Grant-in-Aid for General Scientific Research (B)
|
Allocation Type | Single-year Grants |
Research Field |
Electronic communication systems engineering
|
Research Institution | KOGAKUIN University |
Principal Investigator |
MINAMI Toshi Kogakuin Univ., Dept. of Eng., Professor (80146729)
|
Co-Investigator (Kenkyū-buntansha) |
KURUMA Koji Oki Electric Industry Co., Ltd., Video Communication Engineering Dept., General Manager
SHINOHARA Katsuyuki Kogakuin Univ., Dept. of Eng., Lecturer (40100309)
NAKAMURA Osamu Kogakuin Univ., Dept. of Eng., Associate Professor (70100336)
TAKAHASHI Shizuaki Kogakuin Univ., Dept. of Eng., Associate Professor (90100304)
|
Project Period (FY) |
1987 – 1990
|
Project Status |
Completed (Fiscal Year 1989)
|
Budget Amount |
¥6,400,000 (Direct Cost: ¥6,400,000)
Fiscal Year 1989: ¥1,000,000 (Direct Cost: ¥1,000,000)
Fiscal Year 1988: ¥2,400,000 (Direct Cost: ¥2,400,000)
Fiscal Year 1987: ¥3,000,000 (Direct Cost: ¥3,000,000)
|
Keywords | text processing / coding / mixed-mode / word recognition / character recognition / font recognition / multimedia database access / figure primitive / segmentation / word-unit coding / mixed-mode communication / document image coding / document image editing / automatic index extraction |
Research Abstract |
An encoding algorithm for English documents based on pattern recognition techniques is proposed. First, each page of a document is segmented, by weighted propagation and shrinking operations, into blocks that each contain a specific kind of content such as text, tables, or graphs. For the text blocks, words are identified using dictionaries of 26 prefixes, 338 roots, and 24 suffixes, and the identified words are then encoded into digital signals. Unidentified words are encoded character by character using the ISO character code for information interchange. For the table blocks, straight lines are extracted by the Hough transform, and the attributes of the lines are transmitted with graphic commands. The remaining contents of the table blocks are encoded into digital signals in the same way as the text blocks. For the graphic blocks, straight lines are again extracted by the Hough transform, and symbols of six types: *, *, *, *, *, * are identified by pattern matching. The attributes of the lines and symbol elements are also transmitted with graphic commands. After the lines and symbols are removed, the remaining picture is encoded by the MMR coding scheme with line-skipped scanning. The skipped lines are restored from the preceding and following lines. The number of skipped lines is selected from 0, 1, or 3 according to the number of error pixels in the restored picture. To transmit the documents, editing information indicating the document profile, layout objects, presentation functions, etc. is added to the encoded signals. Computer simulation results for the transmission of English documents containing text and black-and-white images show an overall compression ratio of 124.9 relative to the conventional facsimile signal. For the text blocks, word-unit encoding improves the compression ratio by a factor of 1.91 compared with character-by-character encoding.
|
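The word-unit encoding of text blocks described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation. The tiny dictionaries, the function names `identify_word` and `encode_word`, the use of an empty string to mean "no prefix" or "no suffix", and the ASCII fallback (standing in for the ISO character code) are all assumptions made for illustration. A word is emitted as a triple of dictionary indices when it decomposes into prefix + root + suffix, and otherwise character by character.

```python
from typing import Optional, Tuple, Union

# Hypothetical toy dictionaries. The abstract reports 26 prefixes, 338 roots,
# and 24 suffixes; their actual contents are not given in this record.
PREFIXES = ["", "re", "un", "trans"]   # "" = no prefix (assumption)
ROOTS = ["code", "form", "mit", "port"]
SUFFIXES = ["", "s", "ed", "ing"]      # "" = no suffix (assumption)

ROOT_INDEX = {root: i for i, root in enumerate(ROOTS)}


def identify_word(word: str) -> Optional[Tuple[int, int, int]]:
    """Return (prefix, root, suffix) dictionary indices, or None if unidentified."""
    w = word.lower()
    for pi, prefix in enumerate(PREFIXES):
        if not w.startswith(prefix):
            continue
        for si, suffix in enumerate(SUFFIXES):
            if not w.endswith(suffix):
                continue
            end = len(w) - len(suffix)
            if end < len(prefix):
                continue  # prefix and suffix would overlap
            root = w[len(prefix):end]
            if root in ROOT_INDEX:
                return (pi, ROOT_INDEX[root], si)
    return None


def encode_word(word: str) -> Tuple[str, Union[Tuple[int, ...]]]:
    """Encode a word as dictionary indices, or fall back to character codes."""
    ident = identify_word(word)
    if ident is not None:
        return ("WORD", ident)
    # Fallback for unidentified words: one code per character (ASCII assumed).
    return ("CHARS", tuple(word.encode("ascii")))


if __name__ == "__main__":
    for w in ["transform", "reported", "Hough"]:
        print(w, encode_word(w))
```

In this sketch an identified word costs three small indices regardless of its length, while an unidentified word costs one code per character, which is the kind of saving the word-unit scheme relies on.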