2019 Fiscal Year Research-status Report
Research on Knowledge Extraction from Ancient Mongolian Historical Documents using Deep Learning
Project/Area Number |
17K00457
|
Research Institution | Ritsumeikan University |
Principal Investigator |
Batjargal Biligsaikhan, Ritsumeikan University, Kinugasa Research Organization, Researcher (30725396)
|
Project Period (FY) |
2017-04-01 – 2021-03-31
|
Keywords | historical documents / traditional Mongolian / named entity extraction / deep learning / machine learning |
Outline of Annual Research Achievements |
In FY2019, the following tasks were mainly performed: 1) developing a deep learning network and training the model, and 2) conducting intermediate experiments. Some unique features of ancient Mongolian historical documents were defined. The current outputs of the deep learning network over ancient Mongolian historical documents are: 1) possible alterations, insertions and corrections in the manuscript, 2) predictions that distinguish different letters having the same shape, and 3) predictions of ancient words that were misspelled or written differently from ancient orthography, as well as of mistakes and missing parts or letters in manuscripts. Moreover, continuous experiments were conducted to check the accuracy of the deep learning model.
|
Current Status of Research Progress |
3: Progress in research has been slightly delayed.
Reason
The planned user evaluations by experts and humanities researchers in both Japan and Mongolia were significantly delayed, and results and feedback were not obtained as planned, due to the spread of the novel coronavirus (2019-nCoV). Travel restrictions imposed in response to the rapid spread of coronavirus disease 2019 (COVID-19), the prohibition on entering the University, and the inaccessibility of research facilities slowed down this research.
|
Strategy for Future Research Activity |
In FY2020, the delayed user evaluations will be conducted by experts and humanities researchers. The proposed system will be evaluated by 1) conducting experiments and calculating standard measures such as precision, recall and F-measure; and 2) user evaluations among experts and users who have tried the proposed system. We plan to conduct the evaluations at the National University of Mongolia and at Ritsumeikan University in Japan, which will require the part-time assistance of experts and students. Feedback from the researchers will be collected in a timely manner, and the system will be further improved based on the evaluation results and user feedback. Continuous experiments will be conducted to improve the proposed methods, and development of the proposed method will continue. Research achievements and results will be presented at domestic and international conferences. Also in FY2020, based on the research results obtained in previous years, we will develop a web-based system and make it available on the Internet. The results extracted by the deep learning model will be used to build digital text representations of manuscripts. Moreover, we want to apply the method to other domains, or to historical documents of different genres, periods and languages, since some recent deep learning models do not require NLP subtasks such as morphological or syntactic analysis in certain languages.
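As an illustration of the first evaluation step, the standard measures can be computed as in the minimal sketch below. It assumes that extracted entities are compared with a gold annotation by exact match on (start, end, label) tuples; the report does not specify the matching scheme, and the function name and sample values are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for two collections of extracted items.

    Items are compared by exact equality; this matching rule is an
    assumption made for illustration, not the project's stated method.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical entities as (start_token, end_token, label) tuples.
gold = {(0, 2, "PERSON"), (5, 6, "LOCATION"), (9, 11, "PERSON"),
        (14, 15, "TITLE"), (20, 22, "LOCATION")}
predicted = {(0, 2, "PERSON"), (5, 6, "LOCATION"), (9, 11, "PERSON"),
             (17, 18, "TITLE")}

precision, recall, f1 = precision_recall_f1(predicted, gold)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Here three of the four predicted entities match the five gold entities, so precision is 3/4 and recall is 3/5. The F-measure computed is the balanced F1, the harmonic mean of the two.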
|
Causes of Carryover |
A remaining budget arose due to the restrictions and bans imposed in response to the spread of coronavirus disease 2019 (COVID-19).
I will use the remaining budget for next year’s research to 1) conduct surveys and evaluations among overseas users, and 2) obtain analyses and feedback from face-to-face meetings, which were delayed due to COVID-19.
|
Research Products
(7 results)