Implementation of supporting system and environment for auto-extracting texts from early-modern printed books
Project/Area Number |
26280119
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Partial Multi-year Fund |
Section | General |
Research Field |
Library and information science/Humanistic social informatics
|
Research Institution | Nara Women's University |
Principal Investigator |
Joe Kazuki Nara Women's University, Division of Human Life and Environmental Sciences, Professor (90283928)
|
Co-Investigator(Kenkyū-buntansha) |
Takata Masami Nara Women's University, Division of Human Life and Environmental Sciences, Lecturer (20397574)
|
Research Collaborator |
KIMEZAWA Tsukasa National Diet Library (West Building), Digital Library Division, Clerk
|
Project Period (FY) |
2014-04-01 – 2017-03-31
|
Project Status |
Completed (Fiscal Year 2016)
|
Budget Amount |
¥11,960,000 (Direct Cost: ¥9,200,000, Indirect Cost: ¥2,760,000)
Fiscal Year 2016: ¥3,380,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥780,000)
Fiscal Year 2015: ¥3,380,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥780,000)
Fiscal Year 2014: ¥5,200,000 (Direct Cost: ¥4,000,000, Indirect Cost: ¥1,200,000)
|
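Keywords | OCR for early-modern books / character recognition / feature values / ensemble learning / feature extraction / Web application / genetic programming / content archive / text conversion / text conversion of early-modern books / evolutionary computation / Web service / database / digital archive / Web programming |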
Outline of Final Research Achievements |
In this research, we implemented a supporting system and environment for automatically extracting text from early-modern printed books. Unlike text produced with current desktop publishing (DTP), early-modern printed characters can only be recognized by a system trained on character images taken from early-modern printed books themselves. Collecting such training samples is not very difficult for up to about 1,000 character types, but becomes almost impossible at about 2,000 types. We therefore implemented an early-modern printed character recognition system that can be applied to early-modern printed books even with insufficient training samples. The system detects characters whose type it cannot recognize and asks the user for the correct type. The correctly identified characters are then added to the training samples, so that the recognition system improves.
|
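The feedback loop described in the outline above can be sketched as follows. This is only an illustrative sketch, not the project's implementation: the nearest-centroid classifier, the distance-based rejection rule, and all names (`IncrementalRecognizer`, `reject_distance`, `transcribe`, `ask_user`) are assumptions introduced here, and the record does not say which features or classifiers the actual system uses. The sketch also assumes that only characters labeled by the user are added to the training samples.

```python
from dataclasses import dataclass, field
from math import dist
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]  # hypothetical feature vector of one character image


@dataclass
class IncrementalRecognizer:
    """Toy nearest-centroid recognizer that can reject unknown characters."""

    reject_distance: float  # assumed threshold: farther than this means "unrecognizable"
    samples: Dict[str, List[Vector]] = field(default_factory=dict)

    def add_sample(self, label: str, features: Vector) -> None:
        """Add one labeled character to the training samples."""
        self.samples.setdefault(label, []).append(features)

    @staticmethod
    def _centroid(vectors: Sequence[Vector]) -> Vector:
        count = len(vectors)
        return tuple(sum(axis) / count for axis in zip(*vectors))

    def recognize(self, features: Vector) -> Optional[str]:
        """Return the closest character type, or None if nothing is close enough."""
        best_label, best_distance = None, float("inf")
        for label, vectors in self.samples.items():
            distance = dist(features, self._centroid(vectors))
            if distance < best_distance:
                best_label, best_distance = label, distance
        if best_distance > self.reject_distance:
            return None
        return best_label


def transcribe(
    recognizer: IncrementalRecognizer,
    glyphs: Sequence[Vector],
    ask_user: Callable[[Vector], str],
) -> str:
    """Recognize each glyph; ask the user about rejected ones and learn from the answer."""
    text = []
    for features in glyphs:
        label = recognizer.recognize(features)
        if label is None:
            label = ask_user(features)               # user supplies the correct type
            recognizer.add_sample(label, features)   # the answer becomes a training sample
        text.append(label)
    return "".join(text)
```

In this sketch, a character type that is missing from the training samples triggers one question to the user; later occurrences of a similar glyph are then recognized without asking again, which is how a small initial sample set could grow while the books are being transcribed.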
Report
(4 results)
Research Products
(8 results)