
2014 Fiscal Year Research-status Report

Research on visualization and information extraction from ancient Mongolian historical documents

Research Project

Project/Area Number 26730166
Research Institution Ritsumeikan University

Principal Investigator

Biligsaikhan Batjargal (バトジャルガル ビルゲ), Ritsumeikan University, Kinugasa Research Organization, Researcher (30725396)

Project Period (FY) 2014-04-01 – 2017-03-31
Keywords historical documents / traditional Mongolian / named entity extraction / digital library
Outline of Annual Research Achievements

In this research, we propose an information extraction method for digitized ancient Mongolian documents that uses an ancient-modern dictionary. In FY2014, the following language resources were prepared.

1. An ancient-modern (traditional Mongolian and Cyrillic Mongolian) dictionary and parallel corpora:
A dictionary was built by comparing statistical information, such as co-occurrence frequencies and word frequencies, for words that appear in both the modern and the ancient sides of parallel corpora of ancient Mongolian historical documents such as "The Altan Tobchi", "The Story of Asragch" and "The Secret History of the Mongols".
2. Annotated training data:
Annotated training data were prepared manually from "Altan tovch", a chronicle of the ancient Mongolian kings and the Mongol Empire.
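
To make the dictionary-building step concrete, here is a minimal sketch of pairing words by co-occurrence across sentence-aligned parallel text. It is an illustration only, not the method used in this project: the Dice coefficient, the `build_dictionary` function, the `min_score` threshold, and the placeholder tokens (`a1`, `m1`, ...) are all assumptions introduced for the example.

```python
from collections import Counter
from itertools import product


def build_dictionary(parallel_pairs, min_score=0.5):
    """Map each ancient word to the modern word with the highest Dice score.

    parallel_pairs: iterable of (ancient_sentence, modern_sentence) strings,
    assumed to be sentence-aligned and whitespace-tokenized.
    """
    ancient_freq = Counter()   # number of sentence pairs containing each ancient word
    modern_freq = Counter()    # number of sentence pairs containing each modern word
    cooc = Counter()           # number of sentence pairs containing both words

    for ancient_sent, modern_sent in parallel_pairs:
        a_words = set(ancient_sent.split())
        m_words = set(modern_sent.split())
        ancient_freq.update(a_words)
        modern_freq.update(m_words)
        cooc.update(product(a_words, m_words))

    best = {}
    for (a, m), c in cooc.items():
        # Dice coefficient: 2 * joint count / (count of a + count of m)
        score = 2 * c / (ancient_freq[a] + modern_freq[m])
        if score >= min_score and score > best.get(a, ("", 0.0))[1]:
            best[a] = (m, score)
    return {a: m for a, (m, _) in best.items()}


# Placeholder tokens stand in for ancient (a*) and modern (m*) word forms.
toy_pairs = [("a1 a2", "m1 m2"), ("a1 a3", "m1 m3"), ("a2 a3", "m2 m3")]
toy_dictionary = build_dictionary(toy_pairs)
```

On real corpora, frequency thresholds and tie handling would need tuning, and a score based on sentence-level co-occurrence alone is likely to be noisy for rare words.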

Current Status of Research Progress

2: Research has progressed on the whole more than it was originally planned.

Reason

Useful language resources have been prepared according to the research plan. As planned, these resources will allow the research to advance toward its goal of developing an automatic named entity extraction method that employs automated text mining techniques to reduce labor-intensive annotation of historical text.

Several rules for information extraction and named entity recognition have been partially prepared.

Strategy for Future Research Activity

In FY2015, we will propose a named entity extraction method for ancient Mongolian historical documents that combines techniques based on ancient Mongolian grammar with a statistical model, using text mining techniques. The following tasks will be implemented: 1) extracting and tagging named entities such as historical figures and place names in ancient Mongolian historical documents; 2) tagging personal names, including generational or dynastic information, an inherited or lifetime title of nobility, or a traditional descriptive phrase or nickname.
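
As a rough illustration of what rule-based tagging of this kind could look like, the sketch below marks place names from a small gazetteer and treats the token before a title word as a personal name. It is not the proposed method: the `tag_entities` function, the cue and gazetteer sets, the tag names, and the romanized English-like sample sentence are all hypothetical, and the statistical model is not shown.

```python
# Hypothetical resources; a real system would derive these from the
# annotated training data and the ancient-modern dictionary.
TITLE_CUES = {"khaan", "khan"}
PLACE_GAZETTEER = {"kherlen"}


def tag_entities(tokens):
    """Return (token, tag) pairs using two simple rules.

    Rule 1: a token found in the place gazetteer is tagged PLACE.
    Rule 2: a title cue is tagged TITLE, and the untagged token
            immediately before it is tagged PERSON.
    """
    tags = ["O"] * len(tokens)
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if low in PLACE_GAZETTEER:
            tags[i] = "PLACE"
        elif low in TITLE_CUES:
            tags[i] = "TITLE"
            if i > 0 and tags[i - 1] == "O":
                tags[i - 1] = "PERSON"
    return list(zip(tokens, tags))


# Illustrative romanized input; actual documents are in traditional Mongolian script.
sample = tag_entities("Chinggis khaan crossed the Kherlen".split())
```

Rules like these can seed annotations, which a statistical tagger could then generalize from.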
Besides extracting named entities, the following tasks will be carried out to create digital representations of ancient Mongolian historical documents:
1. To encode contextual information, formalizing and representing explicit information about context.
2. To encode ancient words that were misspelled or written differently from the ancient orthography, along with their modern orthography, while preserving the writing of the original manuscripts.
3. To represent editorial markup, commentaries, alterations, revisions, corrections, transcriptions and interpretations.
Moreover, continuous experiments will be conducted to improve the proposed methods.
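
One way to keep an original spelling next to its modern form is paired markup. The sketch below is only an assumption about how this might be encoded: this report does not name an encoding scheme, so the TEI-style `choice`, `orig`, and `reg` elements and the `encode_variant` helper are illustrative choices. The sample forms are romanizations taken from the two spellings of the chronicle title that appear in this report.

```python
import xml.etree.ElementTree as ET


def encode_variant(original, regularized):
    """Wrap an original spelling and its modern form in one element.

    The original manuscript form goes in <orig> and the modern
    orthography in <reg>, so neither reading is lost.
    """
    choice = ET.Element("choice")
    ET.SubElement(choice, "orig").text = original
    ET.SubElement(choice, "reg").text = regularized
    return ET.tostring(choice, encoding="unicode")


encoded = encode_variant("tobchi", "tovch")
```

Editorial notes, corrections, and commentary could be attached in the same way, as sibling elements of the encoded word.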

Causes of Carryover

The remaining budget resulted from my maedaoshi (前倒し, advance payment) application. When I applied for maedaoshi, I requested an approximate amount because my actual expenditure was still unclear. Moreover, the KAKENHI system allowed me to select round figures only.

Expenditure Plan for Carryover Budget

I will use the remaining budget for next year's research.

  • Research Products

    (4 results)

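  • [Presentation] A Proposal of a Method for Identifying the Same Works across Ukiyo-e Databases in Different Languages (言語が異なる浮世絵データベース間における同一作品の同定手法の提案), 2015

    • Author(s)
      木村 泰典, Biligsaikhan Batjargal, Fuminori Kimura (木村 文則), and Akira Maeda (前田 亮)
    • Organizer
      The 77th National Convention of the Information Processing Society of Japan (第77回情報処理学会全国大会)
    • Place of Presentation
      Kyoto University (Kyoto Prefecture)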
    • Year and Date
      2015-03-18
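  • [Presentation] On Building a Platform for Managing Collaborative Research on Humanities Databases (人文系データベースの共同研究を管理するプラットフォーム構築について), 2015

    • Author(s)
      山路正憲 and Biligsaikhan Batjargal
    • Organizer
      The 4th Workshop on Knowledge, Art, and Cultural Informatics (第4回 知識・芸術・文化情報学研究会)
    • Place of Presentation
      Ritsumeikan University Umeda Campus (Osaka Prefecture)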
    • Year and Date
      2015-02-07
  • [Presentation] Identifying the Same Records across Multiple Ukiyo-e Image Databases Using Textual Data in Different Languages, 2014

    • Author(s)
      Biligsaikhan Batjargal, Takeo Kuyama, Fuminori Kimura, and Akira Maeda
    • Organizer
      Digital Libraries 2014: ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014) and International Conference on Theory and Practice of Digital Libraries (TPDL 2014)
    • Place of Presentation
      London, U.K.
    • Year and Date
      2014-09-10
  • [Presentation] An Approach to Named Entity Extraction from Historical Documents in Traditional Mongolian Script, 2014

    • Author(s)
      Biligsaikhan Batjargal, Garmaabazar Khaltarkhuu, Fuminori Kimura, and Akira Maeda
    • Organizer
      Digital Libraries 2014: ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014) and International Conference on Theory and Practice of Digital Libraries (TPDL 2014)
    • Place of Presentation
      London, U.K.
    • Year and Date
      2014-09-09

Published: 2016-06-01  
