2014 Fiscal Year Research-status Report

Research on visualization and information extraction from ancient Mongolian historical documents

Research Project

Project/Area Number	26730166
Research Institution	Ritsumeikan University
Principal Investigator	バトジャルガル ビルゲ, Ritsumeikan University, Kinugasa Research Organization, Researcher (30725396)
Project Period (FY)	2014-04-01 – 2017-03-31
Keywords	historical documents / traditional Mongolian / named entity extraction / digital library
Outline of Annual Research Achievements	In this research, we propose an information extraction method for digitized ancient Mongolian documents that utilizes an ancient-modern dictionary. In FY2014, the following language resources were prepared. 1. An ancient-modern (traditional Mongolian and Cyrillic Mongolian) dictionary and parallel corpora: the dictionary was built by comparing statistical information, such as word frequencies and co-occurrence frequencies, in modern and ancient parallel corpora of ancient Mongolian historical documents such as "The Altan Tobchi", "The Story of Asragch" and "The Secret History of the Mongols". 2. Annotated training data: annotated training data were prepared manually using "Altan tovch", a chronicle of ancient Mongolian kings and the Mongol Empire.
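The report does not state which statistic was used to compare word and co-occurrence frequencies. As a minimal sketch, assuming a sentence-aligned parallel corpus and the Dice coefficient as the association score (an assumed choice, not necessarily the one used in this project), dictionary candidates could be extracted as follows. The function name `induce_dictionary`, the `min_cooc` threshold and the toy pairs are hypothetical; the ancient side is shown in Latin transliteration rather than traditional Mongolian script for readability.

```python
from collections import Counter
from itertools import product

def induce_dictionary(pairs, min_cooc=2):
    """Hypothetical sketch: map each ancient word to the modern word with the
    highest Dice coefficient over sentence-aligned (ancient, modern) pairs."""
    anc_freq = Counter()  # aligned segments containing each ancient word
    mod_freq = Counter()  # aligned segments containing each modern word
    cooc = Counter()      # aligned segments containing both words of a pair
    for ancient, modern in pairs:
        a_words = set(ancient.split())  # count each word once per segment
        m_words = set(modern.split())
        anc_freq.update(a_words)
        mod_freq.update(m_words)
        cooc.update(product(a_words, m_words))
    best = {}
    for (a, m), c in cooc.items():
        if c < min_cooc:  # ignore pairs seen together too rarely
            continue
        dice = 2 * c / (anc_freq[a] + mod_freq[m])
        current = best.get(a)
        # Keep the highest score; break ties alphabetically for determinism.
        if current is None or dice > current[1] or (dice == current[1] and m < current[0]):
            best[a] = (m, dice)
    return best

# Toy aligned pairs (illustrative only, not taken from the project's corpora).
pairs = [
    ("yeke mongγol ulus", "их монгол улс"),
    ("mongγol qaγan", "монгол хаан"),
    ("yeke qaγan", "их хаан"),
    ("mongγol ulus", "монгол улс"),
]
d = induce_dictionary(pairs)
print(d["qaγan"])  # ('хаан', 1.0)
```

Counting each word at most once per aligned segment keeps very frequent words from dominating the scores; with real corpora, a higher `min_cooc` and manual checking of low-frequency candidates would likely be needed.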
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned.
Reason	Useful language resources have been prepared according to the research plan. As planned, these language resources will allow the research to advance toward its goal: an automatic named entity extraction method that employs text mining techniques to reduce the labor-intensive manual annotation of historical texts. Several rules for information extraction and named entity recognition have been partially prepared.
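The rules themselves are not listed in the report. A minimal rule-based sketch, assuming whitespace-tokenized Latin-transliterated text, a hypothetical list of title cue words and a hypothetical place-name gazetteer, might tag a token followed by a noble title as a personal name and match place names after stripping a hyphenated case suffix. The names `tag_entities`, `TITLE_CUES` and `PLACE_GAZETTEER`, and the toy input sentence, are illustrative only.

```python
from typing import List, Tuple

# Hypothetical cue lists in Latin transliteration (not the project's actual rules).
TITLE_CUES = {"qaγan", "tayiji", "noyan"}  # titles that typically follow a name
PLACE_GAZETTEER = {"onan", "kerülen"}      # known place names (e.g. rivers)

def tag_entities(sentence: str) -> List[Tuple[str, str]]:
    """Tag each whitespace-separated token as B-PER, B-TITLE, B-LOC or O."""
    tokens = sentence.split()
    tags = ["O"] * len(tokens)
    for i, tok in enumerate(tokens):
        stem = tok.split("-")[0]  # drop a hyphenated case suffix: "onan-a" -> "onan"
        if stem in PLACE_GAZETTEER:
            tags[i] = "B-LOC"
        elif stem in TITLE_CUES:
            tags[i] = "B-TITLE"
            # Rule: an untagged token directly before a title is read as a personal name.
            if i > 0 and tags[i - 1] == "O":
                tags[i - 1] = "B-PER"
    return list(zip(tokens, tags))

print(tag_entities("činggis qaγan onan-a kürbe"))
# [('činggis', 'B-PER'), ('qaγan', 'B-TITLE'), ('onan-a', 'B-LOC'), ('kürbe', 'O')]
```

Hand-written rules like these break down when cues are ambiguous (for example, a title used on its own without a preceding name), which is one reason to pair them with a statistical model.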
Strategy for Future Research Activity	In FY2015, we will propose a named entity extraction method for ancient Mongolian historical documents that combines grammar-based techniques drawing on ancient Mongolian linguistics with a statistical model, using text mining techniques. The following tasks will be implemented: 1) extracting and tagging named entities, such as historical figures and place names, in ancient Mongolian historical documents; 2) tagging personal names together with generational or dynastic information, inherited or lifetime titles of nobility, and traditional descriptive phrases or nicknames. Besides extracting named entities, the following tasks will be carried out to create digital representations of ancient Mongolian historical documents: 1. encoding contextual information in order to formalize and represent explicit information about context; 2. encoding ancient words that were misspelled or written differently from ancient orthography, together with their modern orthography, while preserving the writing of the original manuscripts; 3. representing editorial markup, commentaries, alterations, revisions, corrections, transcriptions and interpretations. In addition, experiments will be conducted continuously to improve the proposed methods.
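The report does not name an encoding scheme. Assuming a TEI-style approach (TEI's `<choice>` element pairing `<orig>`/`<reg>` for variant spellings and `<sic>`/`<corr>` for editorial corrections), the second and third encoding tasks could be sketched with Python's standard `xml.etree.ElementTree`. The helper name `encode_variant`, the `resp` value and the example word pair are illustrative assumptions; a full TEI document would also place these elements in the TEI namespace.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Element pairs used inside <choice>, keyed by the kind of editorial decision.
PAIRS = {
    "reg": ("orig", "reg"),   # spelling as written vs. regularized (modern) form
    "corr": ("sic", "corr"),  # apparent scribal error vs. editorial correction
}

def encode_variant(as_written: str, normalized: str,
                   kind: str = "reg", resp: Optional[str] = None) -> ET.Element:
    """Keep the manuscript form and its normalized form side by side."""
    if kind not in PAIRS:
        raise ValueError(f"unknown kind: {kind!r}")
    first_tag, second_tag = PAIRS[kind]
    choice = ET.Element("choice")
    first = ET.SubElement(choice, first_tag)
    first.text = as_written       # preserved exactly as in the manuscript
    second = ET.SubElement(choice, second_tag)
    second.text = normalized      # modern orthography or editor's correction
    if resp is not None:
        second.set("resp", resp)  # records who made the editorial decision
    return choice

# Example: a classical spelling paired with its modern Cyrillic form.
elem = encode_variant("qaγan", "хаан", resp="#editor")
print(ET.tostring(elem, encoding="unicode"))
# <choice><orig>qaγan</orig><reg resp="#editor">хаан</reg></choice>
```

Because the original form is never overwritten, a viewer can display either the manuscript reading or the modern orthography from the same encoded text.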
Causes of Carryover	The remaining budget arose from my maedaoshi (前倒し, a request to bring forward funds planned for later years) application. When I applied for maedaoshi, I requested an approximate amount because my actual expenditure was not yet clear. Moreover, the kaken-hi system allowed me to select only round figures.
Expenditure Plan for Carryover Budget	I will use the remaining budget for next year's research.

Research Products
(4 results)


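[Presentation] A Proposal of a Method for Identifying the Same Works across Ukiyo-e Databases in Different Languages (言語が異なる浮世絵データベース間における同一作品の同定手法の提案), 2015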
- Author(s)
  木村泰典, Biligsaikhan Batjargal, 木村文則, 前田亮
- Organizer
  The 77th National Convention of the Information Processing Society of Japan (第77回情報処理学会全国大会)
- Place of Presentation
  Kyoto University (Kyoto Prefecture)
- Year and Date
  2015-03-18
[Presentation] On Building a Platform for Managing Collaborative Research on Humanities Databases (人文系データベースの共同研究を管理するプラットフォーム構築について), 2015
- Author(s)
  山路正憲 and Biligsaikhan Batjargal
- Organizer
  The 4th Meeting of the Knowledge, Arts and Cultural Informatics Research Group (第4回知識・芸術・文化情報学研究会)
- Place of Presentation
  Ritsumeikan University, Umeda Campus (Osaka Prefecture)
- Year and Date
  2015-02-07
[Presentation] Identifying the Same Records across multiple Ukiyo-e Image Databases Using Textual Data in Different Languages, 2014
- Author(s)
  Biligsaikhan Batjargal, Takeo Kuyama, Fuminori Kimura, and Akira Maeda
- Organizer
  Digital Libraries 2014: ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014) and International Conference on Theory and Practice of Digital Libraries (TPDL 2014)
- Place of Presentation
  London, U.K.
- Year and Date
  2014-09-10
[Presentation] An Approach to Named Entity Extraction from Historical Documents in Traditional Mongolian Script, 2014
- Author(s)
  Biligsaikhan Batjargal, Garmaabazar Khaltarkhuu, Fuminori Kimura, and Akira Maeda
- Organizer
  Digital Libraries 2014: ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014) and International Conference on Theory and Practice of Digital Libraries (TPDL 2014)
- Place of Presentation
  London, U.K.
- Year and Date
  2014-09-09