2016 Fiscal Year Annual Research Report
Research on visualization and information extraction from ancient Mongolian historical documents
Project/Area Number | 26730166 |
Research Institution | Ritsumeikan University |
Principal Investigator | バトジャルガル ビルゲ, Ritsumeikan University, Research Organization of Science and Technology, Researcher (30725396) |
Project Period (FY) | 2014-04-01 – 2017-03-31 |
Keywords | historical documents / traditional Mongolian / named entity extraction / digital library / machine learning |
Outline of Annual Research Achievements |
In this fiscal year, we proposed and demonstrated a named entity extraction method for digitized ancient Mongolian historical documents that uses features of the traditional Mongolian script and language resources such as an ancient-modern dictionary. Named entities such as personal names and place names were extracted with machine learning techniques, which reduce the labor-intensive analysis of historical text. The Text Encoding Initiative (TEI) guidelines were applied to the digital text representations of the ancient Mongolian historical documents. We developed a web-based system that visualizes historical figures and ancient place names and makes the TEI-encoded text and the scanned images of manuscripts available on the Internet. The research results have been published in part in several international conference papers.
|