Implementation of supporting system and environment for auto-extracting texts from early-modern printed books

Research Project

Project/Area Number	26280119
Research Category	Grant-in-Aid for Scientific Research (B)
Allocation Type	Partial Multi-year Fund
Section	General
Research Field	Library and information science/Humanistic social informatics
Research Institution	Nara Women's University
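Principal Investigator	Joe Kazuki Nara Women's University, Division of Human Life and Environmental Sciences, Professor (90283928)
Co-Investigator (Kenkyū-buntansha)	Takata Masami Nara Women's University, Division of Human Life and Environmental Sciences, Lecturer (20397574)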
Research Collaborator	KIMEZAWA Tsukasa National Diet Library, West Building, Digital Library Division, Clerk
Project Period (FY)	2014-04-01 – 2017-03-31
Project Status	Completed (Fiscal Year 2016)
Budget Amount	¥11,960,000 (Direct Cost: ¥9,200,000, Indirect Cost: ¥2,760,000)
Fiscal Year 2016: ¥3,380,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥780,000)
Fiscal Year 2015: ¥3,380,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥780,000)
Fiscal Year 2014: ¥5,200,000 (Direct Cost: ¥4,000,000, Indirect Cost: ¥1,200,000)
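Keywords	OCR for early-modern books / character recognition / features / ensemble learning / feature extraction / Web application / genetic programming / content archive / text conversion / text conversion of early-modern books / evolutionary computation / Web service / database / digital archive / Web programming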
Outline of Final Research Achievements	In this research, we implemented a supporting system and environment for automatically extracting text from early-modern printed books. Unlike recognition of text produced with current DTP, recognition of early-modern printed characters requires images of early-modern printed books as learning samples. Collecting samples for up to about 1,000 character types is not especially difficult, but collecting them for about 2,000 types is almost impossible. We therefore implemented a recognition system for early-modern printed characters that can be applied to early-modern printed books even when the learning samples are insufficient. The system detects character types it cannot recognize and asks the user for the correct type. Correctly recognized characters are added to the learning samples so that the recognition system improves.
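
The feedback loop described in the outline can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the nearest-centroid classifier, the `InteractiveRecognizer` class, the `reject_distance` threshold, and the `ask_user` callback are all hypothetical names introduced here, and the two-dimensional vectors stand in for real glyph features. In this sketch only characters labeled by the user are added to the learning samples.

```python
from collections import defaultdict
from math import dist


class InteractiveRecognizer:
    """Nearest-centroid classifier that rejects uncertain inputs (illustrative only)."""

    def __init__(self, reject_distance):
        self.reject_distance = reject_distance  # hypothetical rejection threshold
        self.samples = defaultdict(list)        # label -> list of feature vectors

    def add_sample(self, label, features):
        self.samples[label].append(tuple(features))

    @staticmethod
    def _centroid(vectors):
        # Component-wise mean of all vectors stored for one label.
        return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))

    def recognize(self, features):
        """Return the nearest label, or None if no centroid is close enough."""
        best_label, best_distance = None, float("inf")
        for label, vectors in self.samples.items():
            d = dist(features, self._centroid(vectors))
            if d < best_distance:
                best_label, best_distance = label, d
        return best_label if best_distance <= self.reject_distance else None

    def process(self, features, ask_user):
        """Recognize one glyph; on rejection, ask the user and learn the answer."""
        label = self.recognize(features)
        if label is None:
            label = ask_user(features)        # the user supplies the correct type
            self.add_sample(label, features)  # the answer becomes a learning sample
        return label


# Usage with a stand-in for the human user.
recognizer = InteractiveRecognizer(reject_distance=1.0)
recognizer.add_sample("日", (0.0, 0.0))
recognizer.add_sample("本", (5.0, 5.0))

asked = []

def ask_user(features):
    asked.append(features)
    return "語"

first = recognizer.process((10.0, 0.0), ask_user)   # far from every centroid: user is asked
second = recognizer.process((10.1, 0.0), ask_user)  # now close to the learned sample
```

After the first call the unknown character type has one learning sample, so the second, similar glyph is recognized without asking the user again. A real system would replace the distance threshold with whatever confidence measure its classifier provides.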

Report

(4 results)

2016 Annual Research Report
Final Research Report (PDF)
2015 Annual Research Report
2014 Annual Research Report

Research Products
(8 results)


Journal Article (3 results) (of which Peer Reviewed: 3 results, Acknowledgement Compliant: 2 results, Open Access: 1 result)
Presentation (5 results) (of which Int'l Joint Research: 1 result, Invited: 2 results)

[Journal Article] Multi-Font Kanji Recognition for Early-Modern Books (2016)
- Author(s)
  Taeka Awazu, Kazumi Kosaka, Masami Takata, Kazuki Joe
- Journal Title
  IPSJ Transactions on Mathematical Modeling and Its Applications
  Volume: 9(2) Pages: 33-40
- NAID
  170000148129
- Related Report
  2016 Annual Research Report
- Peer Reviewed / Acknowledgement Compliant
[Journal Article] Comparison of Feature Extraction Methods for Early-Modern Japanese Printed Character Recognition (2016)
- Author(s)
  Kazumi Kosaka, Kaori Fujimoto, Yu Ishikawa, Masami Takata, Kazuki Joe
- Journal Title
  Proceedings of PDPTA2016
  Volume: Final Edition Pages: 408-414
- Related Report
  2016 Annual Research Report
- Peer Reviewed / Open Access / Acknowledgement Compliant
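[Journal Article] Ruby Removal from Early-Modern Books by Evolutionary Computation Using Classification of Printed-Type Data (2015)
- Author(s)
  Taeka Awazu, Masami Takata, Kazuki Joe
- Journal Title
  IPSJ Transactions on Mathematical Modeling and Its Applications
  Volume: 8(1) Pages: 72-79
- NAID
  110009886645
- Related Report
  2014 Annual Research Report
- Peer Reviewed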
[Presentation] Digital Collections: The Road to Automatic Text Conversion (2016)
- Author(s)
  Kazuki Joe
- Organizer
  National Diet Library, Digital Library Cafe
- Place of Presentation
  National Diet Library
- Year and Date
  2016-11-25
- Related Report
  2016 Annual Research Report
- Invited
[Presentation] Comparison of Feature Extraction Methods for Early-Modern Japanese Printed Character Recognition (2016)
- Author(s)
  Kazumi Kosaka, Kaori Fujimoto, Yu Ishikawa, Masami Takata, Kazuki Joe
- Organizer
  PDPTA2016 MPS workshop
- Place of Presentation
  Las Vegas, USA
- Year and Date
  2016-07-25
- Related Report
  2015 Annual Research Report
- Int'l Joint Research
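[Presentation] An Automatic Generation Method of Specific Font Sets for Training an OCR for Early-Modern Books (2015)
- Author(s)
  岩田彩, Kazumi Kosaka, Taeka Awazu, Yu Ishikawa, Masami Takata, Kazuki Joe
- Organizer
  IPSJ Special Interest Group on Mathematical Modeling and Problem Solving
- Place of Presentation
  Kitami Institute of Technology
- Year and Date
  2015-09-22
- Related Report
  2015 Annual Research Report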
[Presentation] An Effective and Interactive Training Data Collection Method for Early-Modern Japanese Printed Character Recognition (2015)
- Author(s)
  Kazumi Kosaka, Taeka Awazu, Yu Ishikawa, Masami Takata, Kazuki Joe
- Organizer
  PDPTA2015 MPS workshop
- Place of Presentation
  Las Vegas, USA
- Year and Date
  2015-07-27 – 2015-07-30
- Related Report
  2014 Annual Research Report
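[Presentation] Automatic Text Conversion of the Kindai Digital Library: Technologies for Solving the Problems of OCR and Collaborative Proofreading (2015)
- Author(s)
  Kazuki Joe
- Organizer
  Lecture hosted by the National Diet Library
- Place of Presentation
  Kansai-kan of the National Diet Library
- Year and Date
  2015-03-05
- Related Report
  2014 Annual Research Report
- Invited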
