Summary of Research Achievements |
In this fiscal year, as part of our work on information extraction methods for digitized ancient Mongolian documents, we proposed and demonstrated a named entity extraction method for digitized ancient Mongolian historical documents. The method uses features of the traditional Mongolian script together with language resources such as an ancient-modern dictionary. Named entities such as personal names and place names were extracted with machine learning techniques, which reduce the labor-intensive analysis of historical text. The Text Encoding Initiative (TEI) guidelines were applied to the digital text representations of the ancient Mongolian historical documents. We also developed a web-based system that visualizes historical figures and ancient place names and makes the TEI-encoded text and the scanned images of the manuscripts available on the Internet. Parts of the research results and achievements have been published in several international conference papers.
|