1989 Fiscal Year Final Research Report Summary
A study of Mixed-Mode Communication based on Word-Unit Processing and Graphic Commands
Project/Area Number | 62460124
Research Category | Grant-in-Aid for General Scientific Research (B)
Allocation Type | Single-year Grants
Research Field | Electronic and Communication Systems Engineering
Research Institution | KOGAKUIN University
Principal Investigator | MINAMI Toshi, Kogakuin Univ., Dept. of Eng., Professor (80146729)
Co-Investigator (Kenkyū-buntansha) |
KURUMA Koji, Oki Electric Industry Co., Ltd., Video Communication Engineering Dept., General Manager
SHINOHARA Katsuyuki, Kogakuin Univ., Dept. of Eng., Lecturer (40100309)
NAKAMURA Osamu, Kogakuin Univ., Dept. of Eng., Associate Professor (70100336)
TAKAHASHI Shizuaki, Kogakuin Univ., Dept. of Eng., Associate Professor (90100304)
Project Period (FY) | 1987 – 1990
Keywords | text processing / coding / mixed-mode / word recognition / character recognition / font recognition / multimedia database access / figure primitive
Research Abstract |
An algorithm for encoding English texts using pattern recognition techniques is proposed. First, each page of a document is segmented, using weighted propagation and shrinking operations, into blocks that each contain a specific kind of content, such as text, tables, or graphs.

For the text blocks, words are identified using dictionaries of 26 prefixes, 338 roots, and 24 suffixes, and the identified words are then encoded into digital signals. Unidentified words are encoded character by character in the ISO character code for information interchange.

For the table blocks, straight lines are extracted by the Hough transform, and the attributes of the lines are transmitted as graphic commands. The remainder of each table block is encoded into digital signals in the same way as the text blocks.

For the graphic blocks, straight lines are again extracted by the Hough transform, and symbols of six types are identified by pattern matching. The attributes of the lines and symbol elements are likewise transmitted as graphic commands. After the lines and symbols are removed, the remaining picture is encoded by the MMR coding scheme with line-skipped scanning. The skipped lines are restored from the preceding and following lines, and the number of skipped lines (0, 1, or 3) is selected according to the number of error pixels in the restored picture.

To transmit the documents, editing information indicating the document profile, layout objects, presentation functions, and so on is added to the encoded signals. Computer simulations of the transmission of English documents containing text and black-and-white images show an overall compression ratio of 124.9, measured against the conventional facsimile signal. For the text blocks, word-unit encoding improves the compression ratio by a factor of 1.91 compared with character-by-character encoding.
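The word-unit coding of text blocks can be illustrated with a small dictionary decomposition: a word is accepted when it splits into an optional prefix, a root, and an optional suffix, and is otherwise spelled out in character codes. This is a minimal sketch under stated assumptions, not the report's coder: the miniature dictionaries, the function names `decompose` and `coded_bits`, the first-match search order, the 1-bit word/spelled flag, and the fixed 5/9/5-bit index widths (sized for 26 prefixes, 338 roots, and 24 suffixes) are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical miniature dictionaries; the report's real ones hold
# 26 prefixes, 338 roots and 24 suffixes. "" stands for "no affix".
PREFIXES = ["", "re", "un", "pre"]
ROOTS = ["code", "form", "process", "read"]
SUFFIXES = ["", "s", "ed", "ing", "er"]

# Assumed fixed-length index widths, sized for the full dictionaries:
# 26 prefixes + "none" -> 5 bits, 338 roots -> 9 bits, 24 suffixes + "none" -> 5 bits.
PREFIX_BITS, ROOT_BITS, SUFFIX_BITS = 5, 9, 5
CHAR_BITS = 7  # assumed 7-bit character code for unidentified words
FLAG_BITS = 1  # assumed 1-bit flag: dictionary word vs. spelled-out word


def decompose(word: str) -> Optional[Tuple[int, int, int]]:
    """Return (prefix, root, suffix) indices if word = prefix + root + suffix.
    The first matching split wins (an assumed search order)."""
    for p, prefix in enumerate(PREFIXES):
        if not word.startswith(prefix):
            continue
        for s, suffix in enumerate(SUFFIXES):
            if not word.endswith(suffix):
                continue
            root = word[len(prefix):len(word) - len(suffix)]
            if root in ROOTS:
                return p, ROOTS.index(root), s
    return None


def coded_bits(words: List[str]) -> Tuple[int, int]:
    """Bits for word-unit coding vs. plain character coding of the same words."""
    word_unit = char_only = 0
    for w in words:
        spelled = CHAR_BITS * (len(w) + 1)  # characters plus a space/terminator
        char_only += spelled
        if decompose(w) is not None:
            word_unit += FLAG_BITS + PREFIX_BITS + ROOT_BITS + SUFFIX_BITS
        else:
            word_unit += FLAG_BITS + spelled  # fall back to character codes
    return word_unit, char_only
```

For example, `coded_bits(["reprocessing", "forms", "hough"])` returns `(83, 175)`: the two dictionary words cost 20 bits each, while "hough" falls back to character codes. This ratio depends entirely on the assumed code widths and sample words and is not a reproduction of the reported 1.91-fold gain.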
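The choice among 0, 1, or 3 skipped lines can be sketched as a try-and-measure loop: drop lines, restore them from the neighboring scanned lines, count error pixels, and keep the largest skip whose error is acceptable. The report does not give its restoration rule or error criterion, so the nearest-line/AND interpolation, the `max_errors` threshold, per-picture (rather than per-region) selection, and all function names here are assumptions; the MMR coding of the scanned lines is not shown.

```python
from typing import List

Bitmap = List[List[int]]  # rows of pixels, 0 = white, 1 = black


def kept_rows(height: int, skip: int) -> List[int]:
    """Rows actually scanned when `skip` rows are dropped after each scanned
    row; the last row is always kept so every gap has two neighbors."""
    rows = list(range(0, height, skip + 1))
    if rows[-1] != height - 1:
        rows.append(height - 1)
    return rows


def restore(img: Bitmap, skip: int) -> Bitmap:
    """Rebuild skipped rows from the scanned rows above and below them.
    Hypothetical rule: copy the nearer scanned row; a row equidistant from
    both takes their logical AND (black only where both neighbors are black)."""
    out = [row[:] for row in img]  # scanned rows are reproduced exactly
    rows = kept_rows(len(img), skip)
    for a, b in zip(rows, rows[1:]):
        for r in range(a + 1, b):
            if r - a < b - r:
                out[r] = img[a][:]
            elif r - a > b - r:
                out[r] = img[b][:]
            else:
                out[r] = [p & q for p, q in zip(img[a], img[b])]
    return out


def error_pixels(original: Bitmap, restored: Bitmap) -> int:
    """Number of pixels that differ between the original and the restoration."""
    return sum(p != q
               for row_o, row_r in zip(original, restored)
               for p, q in zip(row_o, row_r))


def choose_skip(img: Bitmap, max_errors: int) -> int:
    """Pick the largest skip from (3, 1, 0) whose restoration stays within
    `max_errors` error pixels; skip 0 (scan every row) is always exact."""
    for skip in (3, 1):
        if error_pixels(img, restore(img, skip)) <= max_errors:
            return skip
    return 0
```

For a picture made of identical rows, `choose_skip(img, 0)` returns 3 because every restored row is exact; a one-pixel-high horizontal rule that falls on a skipped row can push the choice down to a smaller skip.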
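Both the table and graphic blocks rely on the Hough transform to find straight lines. The sketch below is a standard (θ, ρ) accumulator, with ρ = x cos θ + y sin θ, followed by a simple peak-and-remove loop for extracting several lines. The 1-degree and 1-pixel bin sizes, the `min_votes` and `tol` parameters, the tie-breaking rule, and the function names are assumptions rather than the report's settings, and line attributes such as endpoints and width, which the report transmits as graphic commands, are not computed.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple

Point = Tuple[int, int]  # (x, y) coordinates of a black pixel


def _normal(theta_deg: int) -> Tuple[float, float]:
    """(cos, sin) of the line-normal angle, given in whole degrees."""
    t = math.radians(theta_deg)
    return math.cos(t), math.sin(t)


def hough_peak(points: Iterable[Point], n_theta: int = 180) -> Tuple[int, int, int]:
    """Vote in a (theta, rho) accumulator with 1-degree steps and unit rho bins;
    return (theta, rho, votes) of the strongest cell, breaking ties by the
    smallest theta, then the smallest rho."""
    normals = [_normal(t) for t in range(n_theta)]
    acc: Counter = Counter()
    for x, y in points:
        for t, (c, s) in enumerate(normals):
            acc[(t, round(x * c + y * s))] += 1
    (theta, rho), votes = min(acc.items(), key=lambda kv: (-kv[1], kv[0]))
    return theta, rho, votes


def extract_lines(points: Set[Point], min_votes: int,
                  tol: float = 1.0) -> List[Tuple[int, int]]:
    """Repeatedly take the strongest line and delete the pixels lying on it
    (within `tol`), until no line collects at least `min_votes` votes."""
    remaining = set(points)
    lines = []
    while remaining:
        theta, rho, votes = hough_peak(remaining)
        if votes < min_votes:
            break
        lines.append((theta, rho))
        c, s = _normal(theta)
        remaining = {(x, y) for x, y in remaining
                     if abs(x * c + y * s - rho) > tol}
    return lines
```

With a 100-pixel horizontal rule at y = 3 and a 100-pixel vertical rule at x = 5, `extract_lines(points, min_votes=50)` returns `[(0, 5), (90, 3)]`, that is, the vertical line (θ = 0°) and then the horizontal line (θ = 90°).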