2001 Fiscal Year Final Research Report Summary

Development of OCR (optical character recognition) system for scientific documents

Research Project

Project/Area Number	10558056
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Single-year Grants
Section	Developmental Research
Research Field	Information systems (including information and library science)
Research Institution	KYUSHU UNIVERSITY
Principal Investigator	SUZUKI Masakazu, Kyushu University, Faculty of Mathematics, Professor (20112302)
Co-Investigator(Kenkyū-buntansha)	FUKUDA Ryoji, Oita University, Faculty of Engineering, Assoc. Prof. (70238492); EJIMA Toshiaki, Kyushu Inst. of Technology, Faculty of Computer Science and Systems Engineering, Prof. (00124553); TAMARI Fumikazu, Fukuoka Univ. of Education, Faculty of Education, Prof. (70036937); YAMAGATA Hedeaki, Ricoh Co., Ltd., Software Research Center, Laboratory 2, Researcher; TACHIKAWA Michiyoshi, Ricoh Co., Ltd., Software Research Center, Laboratory 2, Head of Laboratory
Project Period (FY)	1998 – 2001
Keywords	OCR / Mathematical symbol recognition / Mathematical formula recognition / Document analysis / Digital library
Research Abstract	In this research, we developed an OCR system adapted to scientific documents, with a view to applying it to the retro-digitization of mathematical journals and the automatic Braille transcription of mathematical documents. The target images are scans of clearly printed documents taken at 400 to 600 dpi. Since no commercial OCR software can recognize mathematical symbols, we developed our own OCR engine. It recognizes about 450 kinds of characters and symbols used in mathematical expressions and distinguishes well between italic and upright letters. In text areas, several efficient post-processing methods improve recognition results using linguistic information, whereas in mathematical expression areas, different post-processing methods based on the structure of mathematical notation are effective. We therefore developed algorithms to separate text areas from mathematical expression areas, for both Japanese and English documents. For the structure analysis of mathematical expressions, we developed a new method that is robust against character recognition errors and against confusion between similar characters of different sizes. We first construct a network that joins characters (symbols) by candidate relation links, each with a cost. We then re-evaluate the candidates using a cost that reflects the global structure of the mathematical expression, and obtain the recognition result as the minimum-cost spanning tree of the network. The advantage of this method is that local recognition errors are recovered automatically through the total cost of the recognition tree. We also developed a handwriting interface for editing mathematical expressions, which serves as an easy user interface for correcting recognition errors in mathematical expressions.
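
The spanning-tree step described in the abstract can be illustrated with a small sketch. This is not the project's implementation: it assumes a hypothetical `Link` record, made-up costs, and plain Kruskal's algorithm on an undirected network, and it leaves out the candidate re-evaluation with global-structure costs that the abstract mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A candidate relation between two symbols (all fields are illustrative)."""
    parent: int    # index of the symbol the relation starts from
    child: int     # index of the related symbol
    relation: str  # e.g. "horizontal", "superscript", "subscript"
    cost: float    # lower means more plausible


def min_cost_spanning_tree(n_symbols, links):
    """Kruskal's algorithm: take the cheapest links that connect all symbols without cycles."""
    root = list(range(n_symbols))  # union-find table, one entry per symbol

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving keeps lookups short
            i = root[i]
        return i

    tree = []
    for link in sorted(links, key=lambda lk: lk.cost):
        a, b = find(link.parent), find(link.child)
        if a != b:  # the link joins two components that were separate so far
            root[a] = b
            tree.append(link)
    return tree


# Hypothetical network for "x^2 + y": symbols 0:'x', 1:'2', 2:'+', 3:'y'
links = [
    Link(0, 1, "superscript", 0.2),
    Link(0, 1, "horizontal", 0.9),  # competing, less plausible reading
    Link(0, 2, "horizontal", 0.3),
    Link(1, 2, "horizontal", 0.8),
    Link(2, 3, "horizontal", 0.1),
    Link(1, 3, "subscript", 1.5),
]

tree = min_cost_spanning_tree(4, links)
total_cost = sum(link.cost for link in tree)
for link in tree:
    print(link.parent, link.relation, link.child, link.cost)
```

In this toy network the costly "horizontal" reading of the pair x, 2 is never selected, because the cheaper "superscript" link already connects the two symbols. That is the sense in which a low total tree cost can override a locally ambiguous relation.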

Research Products
(12 results)

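[Publications] S. Nouzumi, R. Fukuda, F. Tamari, M. Suzuki: "Mathematical symbol recognition using filtering method and its application to the segmentation of Japanese area/Mathematical area" Transactions of IEICE (in Japanese). J83-DII, No.3. 895-906 (2000)
- Description
  From the "Final Research Report Summary (Japanese)"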
[Publications] T. Kanahori, K. Tabata, W. Cong, F. Tamari, M. Suzuki: "On-Line Recognition of Mathematical Expressions Using Automatic Rewriting Method" Advances in Multimodal Interfaces ICMI2000, Lecture Notes in Computer Science, Springer. 1948. 394-401 (2000)
- Description
  From the "Final Research Report Summary (Japanese)"
[Publications] Y. Eto, M. Suzuki: "Mathematical Formula Recognition Using Virtual Link Network" Proceedings of the 6th International Conference on Document Analysis and Recognition, Seattle, IEEE Computer Society Press. 430-437 (2001)
- Description
  From the "Final Research Report Summary (Japanese)"
[Publications] T. Kanahori, M. Suzuki: "A Recognition Method of Matrices by Using Variable Block Pattern Elements Generating Rectangular Areas" Proceedings of the 4th IAPR International Workshop on Graphics Recognition. 455-469 (2001)
- Description
  From the "Final Research Report Summary (Japanese)"
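[Publications] M. Murakami, M. Suzuki: "Improvement of Mathematical structure analysis by using Center-Band" Technical Report of IEICE (in Japanese). PRMU2001-270. 203-210 (2002)
- Description
  From the "Final Research Report Summary (Japanese)"
[Publications] E. Ando, M. Suzuki: "Document recognition by real-time classifications of character images and reduction of correction labor of recognition results: English mathematical books" Technical Report of IEICE (in Japanese). PRMU2001-271. 211-218 (2002)
- Description
  From the "Final Research Report Summary (Japanese)"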
[Publications] S. Nouzumi, R. Fukuda, F. Tamari, M. Suzuki: "Mathematical symbol recognition using filtering method and its application to the segmentation of Japanese area/Mathematical area" Transactions of IEICE. J83-DII, No.3. 895-906 (2000)
- Description
  From the "Final Research Report Summary (English)"
[Publications] T. Kanahori, K. Tabata, W. Cong, F. Tamari, M. Suzuki: "On-Line Recognition of Mathematical Expressions Using Automatic Rewriting Method" Advances in Multimodal Interfaces-ICMI2000, Lecture Notes in Computer Science 1948, Springer. 394-401 (2000)
- Description
  From the "Final Research Report Summary (English)"
[Publications] Y. Eto, M. Suzuki: "Mathematical Formula Recognition Using Virtual Link Network" Proceedings of the Sixth International Conference on Document Analysis and Recognition, Seattle, IEEE Computer Society Press. 430-437 (2001)
- Description
  From the "Final Research Report Summary (English)"
[Publications] T. Kanahori, M. Suzuki: "A Recognition Method of Matrices by Using Variable Block Pattern Elements Generating Rectangular Areas" Proceedings of the Fourth IAPR International Workshop on Graphics Recognition. 455-469 (2001)
- Description
  From the "Final Research Report Summary (English)"
[Publications] M. Murakami, M. Suzuki: "Improvement of Mathematical structure analysis by using Center-Band" Technical Report of IEICE, PRMU2001-270. 203-210 (2002)
- Description
  From the "Final Research Report Summary (English)"
[Publications] E. Ando, M. Suzuki: "Document recognition by real-time classifications of character images and reduction of correction labor of recognition results" Technical Report of IEICE, PRMU2001-271. 211-218 (2002)
- Description
  From the "Final Research Report Summary (English)"
