
Improvement of Modern Document Textualization System with Integrated Use of Letter Shape Information and Language Model

Research Project

Project/Area Number 26730161
Research Category

Grant-in-Aid for Young Scientists (B)

Allocation Type Multi-year Fund
Research Field Library and information science/Humanistic social informatics
Research Institution The University of Tokyo

Principal Investigator

Masuda Katsuya  The University of Tokyo, Center for Research and Development of Higher Education, Project Assistant Professor (20512114)

Project Period (FY) 2014-04-01 – 2018-03-31
Project Status Completed (Fiscal Year 2017)
Budget Amount
¥3,380,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥780,000)
Fiscal Year 2016: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Fiscal Year 2015: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2014: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Keywords OCR / Digital textualization / Error correction / Natural language processing / Digital archive / Modern books / Digital humanities
Outline of Final Research Achievements

In this research, we developed an OCR error correction system with the aim of improving the accuracy of digitizing modern documents. We constructed language resources of modern documents, both for evaluating our system and for building a language model for modern documents. The error correction system consists of three parts: OCR error detection, candidate character generation, and selection of a character from the candidates. Each part uses both letter shape information and the language model to detect errors or to generate candidates. We confirmed that feeding the results of OCR error correction back to the OCR system improves the accuracy of the OCR system.
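
The three parts described above can be illustrated with a minimal sketch. It is not the project's implementation: the toy corpus, the character bigram model, the look-alike table SIMILAR, the weights, and all function names are assumptions made up for illustration. It only shows how a language model score and a letter shape score might be combined in detection, candidate generation, and selection.

```python
import math
from collections import Counter

# Toy text standing in for a modern-document language resource (assumption).
CORPUS = "the cat sat on the mat. the hat is on the cat."

# Hypothetical look-alike pairs with made-up shape similarity scores in (0, 1].
SIMILAR = {
    frozenset("ce"): 0.8,
    frozenset("hb"): 0.7,
    frozenset("l1"): 0.9,
    frozenset("o0"): 0.9,
}

ALPHABET = sorted(set(CORPUS))
UNIGRAMS = Counter(CORPUS)
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))
SHAPE_WEIGHT = 1.0  # relative weight of the shape term (assumption)


def shape_similarity(a, b):
    """Return 1.0 for identical characters, else the table value or 0.0."""
    if a == b:
        return 1.0
    return SIMILAR.get(frozenset((a, b)), 0.0)


def bigram_logprob(prev, cur):
    """Character bigram log-probability with add-one smoothing."""
    return math.log((BIGRAMS[(prev, cur)] + 1) / (UNIGRAMS[prev] + len(ALPHABET)))


def context_score(text, i, ch):
    """Language model score of placing ch at position i, given its neighbours."""
    score = 0.0
    if i > 0:
        score += bigram_logprob(text[i - 1], ch)
    if i + 1 < len(text):
        score += bigram_logprob(ch, text[i + 1])
    return score


def combined_score(text, i, cand):
    """Language model score plus a penalty for differing in shape from the OCR output."""
    return context_score(text, i, cand) + SHAPE_WEIGHT * math.log(
        shape_similarity(text[i], cand)
    )


def candidates(ch):
    """Part 2: characters known to the language model that look like ch."""
    return [c for c in ALPHABET if c != ch and shape_similarity(ch, c) > 0.0]


def detect_errors(text, margin=1.0):
    """Part 1: flag positions where a look-alike beats the observed character."""
    suspicious = []
    for i, ch in enumerate(text):
        observed = combined_score(text, i, ch)
        if any(combined_score(text, i, c) > observed + margin for c in candidates(ch)):
            suspicious.append(i)
    return suspicious


def correct(text):
    """Part 3: at each flagged position, keep the best-scoring character."""
    chars = list(text)
    for i in detect_errors(text):
        options = [text[i]] + candidates(text[i])
        chars[i] = max(options, key=lambda c: combined_score(text, i, c))
    return "".join(chars)


print(correct("the eat sat on the mat"))
```

In this sketch a position is changed only when a visually similar character fits the context clearly better than the OCR output, so a confusion such as "e" for "c" is corrected while correct occurrences of "e" are left alone. A real system would replace the toy table and bigram counts with shape features from the OCR engine and a language model trained on modern documents.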

Report

(5 results)
  • 2017 Annual Research Report   Final Research Report ( PDF )
  • 2016 Research-status Report
  • 2015 Research-status Report
  • 2014 Research-status Report
  • Research Products

    (3 results)


All Journal Article (1 results) (of which Peer Reviewed: 1 results,  Open Access: 1 results) Presentation (2 results)

  • [Journal Article] Revealing the Modern History of Japanese Philosophy Using Digitization, Natural Language Processing, and Visualization (2015)

    • Author(s)
      Katsuya Masuda, Makoto Tanji, Hideki Mima
    • Journal Title

      Journal of the Japanese Association for Digital Humanities

      Volume: 1 Issue: 1 Pages: 37-43

    • DOI

      10.17928/jjadh.1.1_37

    • NAID

      130005096576

    • ISSN
      2188-7276
    • Related Report
      2015 Research-status Report
    • Peer Reviewed / Open Access
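  • [Presentation] OCR Error Correction for Modern Books Using Linguistic Information and Letter Shape Information (2016)

    • Author(s)
      Katsuya Masuda
    • Organizer
      Computers and the Humanities Symposium (Jinmoncom) 2016
    • Place of Presentation
      National Institute for Japanese Language and Linguistics (Tachikawa, Tokyo)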
    • Year and Date
      2016-12-10
    • Related Report
      2016 Research-status Report
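  • [Presentation] OCR Character Error Correction Using Global Information (2015)

    • Author(s)
      Katsuya Masuda
    • Organizer
      The 21st Annual Meeting of the Association for Natural Language Processing
    • Place of Presentation
      Kyoto University (Kyoto, Kyoto Prefecture)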
    • Year and Date
      2015-03-17
    • Related Report
      2014 Research-status Report


Published: 2014-04-04   Modified: 2019-03-29  
