2018 Fiscal Year Final Research Report
Recognizing Japanese brush script in image
Project/Area Number |
16K12545
|
Research Category |
Grant-in-Aid for Challenging Exploratory Research
|
Allocation Type | Multi-year Fund |
Research Field |
Library and information science/Humanistic social informatics
|
Research Institution | National Institute of Japanese Literature |
Principal Investigator |
Nomoto Tadashi, National Institute of Japanese Literature, Research Department, Associate Professor (20321557)
|
Co-Investigator (Kenkyū-buntansha) |
Aida Mitsuru (相田 満), National Institute of Japanese Literature, Research Department, Associate Professor (00249921)
|
Research Collaborator |
Terasawa Kengo
|
Project Period (FY) |
2016-04-01 – 2019-03-31
|
Keywords | Kuzushi-ji (cursive script) / image retrieval |
Outline of Final Research Achievements |
The goal of this work is to develop an approach that recognizes Japanese brush script in images without resorting to OCR or hand-annotated labels. To this end, we considered three approaches: (1) mapping a modern Kanji to its corresponding Kuzushi-ji forms, which come in a variety of shapes, and using those forms to identify the Kuzushi-ji character of interest; (2) using the modern Kanji itself in place of Kuzushi-ji for the identification; and (3) using CycleGAN to generate a pseudo Kuzushi-ji, which serves as a query to match against the image. In our experiments, the first method, which relies on mapping a modern Kanji to possible Kuzushi-ji forms, performed significantly better than the other two, suggesting that Kuzushi-ji recognition benefits greatly from the mapping.
|
Free Research Field |
Natural language processing
|
Academic Significance and Societal Importance of the Research Achievements |
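With the advance of digital technology, large numbers of Japan's historical texts have been digitized and archived. Because most of them are stored as images, they cannot be searched freely by keyword, which is a major barrier to efforts to reuse the content and turn it into intellectual property. Search based on manual or OCR transcription has been proposed but has not reached a practical level. In this respect, the present work is expected to make a useful contribution.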
|
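As a rough illustration of approach (1) in the outline above, the sketch below looks up several Kuzushi-ji variants for one modern character and slides each over a page image, keeping the best match. The report does not say how matching was done; normalized cross-correlation is used here only as a stand-in, and every name (`VARIANTS`, `ncc`, `find_character`) and the 3x3 toy glyphs are hypothetical, not real Kuzushi-ji data.

```python
from math import sqrt

def ncc(a, b):
    """Normalized cross-correlation of two equal-length pixel lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A flat patch or glyph has zero variance; treat it as no match.
    return sum(x * y for x, y in zip(da, db)) / denom if denom > 0 else 0.0

def find_character(page, variants):
    """Slide each variant glyph over the page.

    Returns (score, row, col, variant_index) of the best match.
    """
    best = (-1.0, -1, -1, -1)
    for k, glyph in enumerate(variants):
        h, w = len(glyph), len(glyph[0])
        flat = [v for line in glyph for v in line]
        for r in range(len(page) - h + 1):
            for c in range(len(page[0]) - w + 1):
                patch = [page[r + i][c + j] for i in range(h) for j in range(w)]
                score = ncc(patch, flat)
                if score > best[0]:
                    best = (score, r, c, k)
    return best

# Hypothetical mapping: modern character -> toy stand-ins for its variants.
VARIANTS = {
    "十": [
        [[0, 1, 0], [1, 1, 1], [0, 1, 0]],
        [[0, 1, 0], [1, 1, 0], [0, 1, 1]],
    ],
}

# Toy 8x8 "page" with the second variant pasted at row 4, column 2.
page = [[0] * 8 for _ in range(8)]
for i, line in enumerate(VARIANTS["十"][1]):
    for j, v in enumerate(line):
        page[4 + i][2 + j] = v

score, row, col, idx = find_character(page, VARIANTS["十"])
print(row, col, idx)
```

Querying with every known variant, rather than with the modern glyph alone, is what separates approach (1) from approach (2) in this toy setting; a real system would need far more robust features than raw pixel correlation.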