
2017 Fiscal Year Final Research Report

Improvement of Modern Document Textualization System with Integrated Use of Letter Shape Information and Language Model

Research Project

Project/Area Number: 26730161
Research Category: Grant-in-Aid for Young Scientists (B)
Allocation Type: Multi-year Fund
Research Field: Library and information science/Humanistic social informatics
Research Institution: The University of Tokyo

Principal Investigator: Masuda Katsuya, The University of Tokyo, Center for Research and Development of Higher Education, Project Assistant Professor (20512114)

Project Period (FY): 2014-04-01 – 2018-03-31
Keywords: OCR / Digital textualization / Error correction / Natural language processing / Digital archive / Modern books
Outline of Final Research Achievements

In this research, we developed an OCR error correction system with the aim of improving the accuracy of digitizing modern documents. We constructed language resources of modern documents, both for evaluating our system and for building a language model of modern documents. The error correction system consists of three parts: OCR error detection, candidate character generation, and selection of a character from the candidates. In each part, we use both letter shape information and the language model to detect errors and to generate candidates. We confirmed that feeding the results of OCR error correction back into the OCR system improves the accuracy of the OCR system.
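The three-part pipeline above (detection, candidate generation, selection), with shape information and a language model used together, can be illustrated with a toy corrector. This is a minimal sketch under loudly stated assumptions, not the project's implementation: the shape-similarity table `SHAPE_CANDIDATES`, the character bigram language model with add-one smoothing, the detection threshold of -4.0, the log-linear scoring, and every function name are hypothetical choices made only for illustration. The 未/末 pair is used because the two glyphs are visually close.

```python
import math
from collections import Counter

# HYPOTHETICAL shape-similarity table: for a character emitted by OCR, visually
# similar characters it may have been confused with, scored in (0, 1].
# A real system would derive such scores from glyph images.
SHAPE_CANDIDATES = {
    "未": {"未": 1.0, "末": 0.8},
    "末": {"末": 1.0, "未": 0.8},
    "土": {"土": 1.0, "士": 0.7},
    "士": {"士": 1.0, "土": 0.7},
}


def train_bigram_lm(corpus):
    """Return logprob(prev, cur) for a character bigram model with add-one smoothing."""
    context_counts = Counter()
    bigram_counts = Counter()
    for line in corpus:
        padded = "^" + line + "$"  # ^ marks line start, $ marks line end
        context_counts.update(padded[:-1])
        bigram_counts.update(zip(padded, padded[1:]))
    vocab_size = len(set("".join(corpus))) + 1  # +1 for the end marker "$"

    def logprob(prev, cur):
        return math.log((bigram_counts[(prev, cur)] + 1)
                        / (context_counts[prev] + vocab_size))

    return logprob


def local_score(logprob, prev, cur, nxt):
    """Language-model score of `cur` given its left and right neighbours."""
    return logprob(prev, cur) + logprob(cur, nxt)


def detect_errors(text, logprob, threshold=-4.0):
    """Part 1: flag positions whose character has look-alikes AND fits its context poorly."""
    padded = "^" + text + "$"
    return [i for i, ch in enumerate(text)
            if len(SHAPE_CANDIDATES.get(ch, {})) > 1
            and local_score(logprob, padded[i], ch, padded[i + 2]) < threshold]


def generate_candidates(ch):
    """Part 2: candidate characters with shape-similarity scores (ch itself included)."""
    return SHAPE_CANDIDATES.get(ch, {ch: 1.0})


def correct(text, logprob, threshold=-4.0, lm_weight=1.0):
    """Part 3: at each flagged position, pick the candidate maximising shape + LM score."""
    flagged = set(detect_errors(text, logprob, threshold))
    out = []
    for i, ch in enumerate(text):
        if i in flagged:
            prev = out[-1] if out else "^"  # already-corrected left context
            nxt = text[i + 1] if i + 1 < len(text) else "$"
            ch = max(generate_candidates(ch).items(),
                     key=lambda kv: math.log(kv[1])
                     + lm_weight * local_score(logprob, prev, kv[0], nxt))[0]
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    lm = train_bigram_lm(["未来の話", "週末の話", "未来は明るい"])
    print(detect_errors("末来の話", lm))  # [0]
    print(correct("末来の話", lm))        # 未来の話
    print(correct("週末の話", lm))        # 週末の話 (left unchanged)
```

In this sketch, the shape table restricts both where errors can be flagged and which characters can replace them, while the language model decides whether a flagged character fits its context. Correcting 末 to 未 costs a little shape score (log 0.8) but gains much more language-model score, whereas 末 in 週末 already fits its context and is never flagged.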

Free Research Field: Natural language processing


Published: 2019-03-29  


Powered by NII kakenhi