2014 Fiscal Year Research-status Report
Research on visualization and information extraction from ancient Mongolian historical documents
Project/Area Number | 26730166
Research Institution | Ritsumeikan University
Principal Investigator | バトジャルガル ビルゲ, Ritsumeikan University, Kinugasa Research Organization, Researcher (30725396)
Project Period (FY) | 2014-04-01 – 2017-03-31
Keywords | historical documents / traditional Mongolian / named entity extraction / digital library
Outline of Annual Research Achievements
In this research, we propose an information extraction method for digitized ancient Mongolian documents that uses an ancient-modern dictionary. In FY2014, the following language resources were prepared:
1. An ancient-modern (traditional Mongolian and Cyrillic Mongolian) dictionary and parallel corpora: the dictionary was built by comparing statistical information, such as word frequencies and co-occurrence frequencies, observed in modern and ancient parallel corpora of ancient Mongolian historical documents such as "The Altan Tobchi", "The Story of Asragch" and "The Secret History of the Mongols".
2. Annotated training data: annotated training data were prepared manually from "Altan tovch", a chronicle of the ancient Mongolian kings and the Mongol Empire.
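The dictionary-building step in item 1 can be illustrated with a minimal sketch. The report does not say which association score was used, so the Dice coefficient below is an assumed choice: for each ancient word a and modern word m, it compares the number of aligned sentence pairs containing both words with the number containing each word. The function name `build_dictionary`, the `min_score` threshold and the three-pair toy corpus (traditional Mongolian in Latin transliteration, modern Mongolian in Cyrillic) are hypothetical and are not taken from the project's data.

```python
from collections import Counter
from itertools import product


def build_dictionary(pairs, min_score=0.5):
    """Map each ancient word to its best modern counterpart.

    `pairs` is a list of (ancient_sentence, modern_sentence) strings that are
    already sentence-aligned and whitespace-tokenized. The association score
    is the Dice coefficient over sentence-level counts (an assumed measure):
        dice(a, m) = 2 * C(a, m) / (F(a) + F(m))
    """
    anc_freq = Counter()  # F(a): pairs whose ancient side contains word a
    mod_freq = Counter()  # F(m): pairs whose modern side contains word m
    co_freq = Counter()   # C(a, m): pairs containing both a and m
    for ancient, modern in pairs:
        a_words = set(ancient.split())
        m_words = set(modern.split())
        anc_freq.update(a_words)
        mod_freq.update(m_words)
        co_freq.update(product(a_words, m_words))

    best = {}
    for (a, m), c in co_freq.items():
        dice = 2 * c / (anc_freq[a] + mod_freq[m])
        # Keep only the highest-scoring modern word above the threshold.
        if dice >= min_score and dice > best.get(a, (None, 0.0))[1]:
            best[a] = (m, dice)
    return best


# Toy, hand-made aligned pairs (hypothetical; for illustration only).
PAIRS = [
    ("mongγol ulus", "монгол улс"),
    ("qaγan ulus", "хаан улс"),
    ("mongγol qaγan", "монгол хаан"),
]

if __name__ == "__main__":
    for word, (modern, score) in sorted(build_dictionary(PAIRS).items()):
        print(word, "->", modern, round(score, 2))
```

On real corpora, sentence alignment, spelling variation in the ancient texts and rare words would all need extra handling. A plain frequency-based score like this one only yields candidate pairs that still need manual checking.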
Current Status of Research Progress
2: Research has progressed largely as originally planned.
Reason
The planned language resources were prepared according to the research plan. As planned, these resources will allow the research to advance toward its goal: an automatic named entity extraction method that uses text mining techniques to reduce the labor-intensive manual annotation of historical texts.
Some rules for information extraction and named entity recognition have also been partially prepared.
Strategy for Future Research Activity
In FY2015, we will propose a named entity extraction method for ancient Mongolian historical documents that combines grammar-based techniques for ancient Mongolian with a statistical model built using text mining techniques. The following tasks will be carried out:
1) Extracting and tagging named entities, such as historical figures and place names, in ancient Mongolian historical documents.
2) Tagging personal names together with generational or dynastic information, inherited or lifetime titles of nobility, and traditional descriptive phrases or nicknames.
Besides extracting named entities, the following tasks will be carried out to create digital representations of ancient Mongolian historical documents:
1. Encoding contextual information so that explicit information about context is formalized and represented.
2. Encoding ancient words that were misspelled or written differently from the ancient orthography together with their modern orthography, while preserving the writing of the original manuscripts.
3. Representing editorial markup, commentaries, alterations, revisions, corrections, transcriptions and interpretations.
In addition, experiments will be conducted continuously to improve the proposed methods.
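The grammar-based part of task 2) might start from a simple rule: a known personal name, followed by any run of title or epithet tokens, is tagged as one person span. The sketch below is only an illustration of such a rule and not the project's method. The gazetteer `NAMES`, the list `TRAILERS` and the function `tag_persons` are hypothetical. The example phrase is a hand-made Latin transliteration and is not drawn from the annotated "Altan tovch" data.

```python
# Hypothetical gazetteer of personal names (lower-cased transliteration).
NAMES = {"činggis", "qubilai", "ögedei"}
# Hypothetical titles and epithets that may follow a name.
TRAILERS = {"qaγan", "qatun", "noyan", "taiji", "sečen"}


def tag_persons(tokens, names=NAMES, trailers=TRAILERS):
    """Return (start, end, "PERSON") spans over `tokens` (end is exclusive).

    Rule: a token found in `names` starts a span, and the span is extended
    over every immediately following token found in `trailers`.
    """
    spans = []
    i = 0
    while i < len(tokens):
        if tokens[i].lower() in names:
            end = i + 1
            while end < len(tokens) and tokens[end].lower() in trailers:
                end += 1
            spans.append((i, end, "PERSON"))
            i = end
        else:
            i += 1
    return spans


if __name__ == "__main__":
    words = "Qubilai sečen qaγan ba Činggis qaγan".split()
    for start, end, label in tag_persons(words):
        print(label, " ".join(words[start:end]))
```

A lookup rule like this cannot handle unseen names or ambiguous title words. That is one reason to pair such rules with a statistical model trained on the annotated data, as the plan above describes.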
Causes of Carryover
The remaining budget arose from my maedaoshi (前倒し, early use of funds scheduled for a later fiscal year) application. When I applied for maedaoshi, I requested an approximate amount because my actual expenditure was not yet known. Moreover, the kaken-hi system only allowed me to select round figures.
Expenditure Plan for Carryover Budget
I will use the remaining budget for next year's research.