Improvement and performance evaluation of the mathematical formula recognition method for digitalization of mathematical journals
Project/Area Number |
14580446
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Information systems (including information and library science)
|
Research Institution | Shinshu University |
Principal Investigator |
OKAMOTO Masayuki Shinshu University, Faculty of Engineering, Department of Information Engineering, Professor (50109196)
|
Co-Investigator(Kenkyū-buntansha) |
SUZUKI Masakazu Kyushu University, Graduate School of Mathematics, Professor (20112302)
|
Project Period (FY) |
2002 – 2004
|
Project Status |
Completed (Fiscal Year 2004)
|
Budget Amount |
¥3,300,000 (Direct Cost: ¥3,300,000)
Fiscal Year 2004: ¥900,000 (Direct Cost: ¥900,000)
Fiscal Year 2003: ¥1,000,000 (Direct Cost: ¥1,000,000)
Fiscal Year 2002: ¥1,400,000 (Direct Cost: ¥1,400,000)
|
Keywords | Mathematical Formula Recognition / Document Image Processing / Character Recognition / Pattern Recognition |
Research Abstract |
This research project aimed to improve and evaluate the performance of the mathematical formula recognition system developed in our laboratory. Automatic recognition of mathematical formulas plays an important role in the digitization of scientific and engineering documents, but current OCR systems cannot handle mathematical formulas because their characters and symbols are laid out in two dimensions. We collaborated with Professor Michler of the University of Essen, Germany, on the project "Retro-digitalization of mathematical journals, and their integration searchable digital libraries", in which we developed a mathematical formula recognition system. In the present project, we improved this system to cope with problems such as the wide variety of formula types, low printing quality, and touching or separated characters and symbols. To evaluate recognition performance, two kinds of mathematical journals were scanned and a ground-truth data set of formula images was created. This ground truth contains 21,472 formula images. In the performance evaluation, the recognition rates for symbols and structures were 99.4% and 99.09%, respectively. These results show the potential of OCR to convert scientific documents into electronic form.
|
Report
(4 results)
Research Products
(21 results)