Project/Area Number |
17K00457
|
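Research Institution | Ritsumeikan University |
Principal Investigator |
Batjargal Biligsaikhan, Ritsumeikan University, Kinugasa Research Organization, Researcher (30725396)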
|
Project Period (FY) |
2017-04-01 – 2021-03-31
|
Keywords | historical documents / traditional Mongolian / named entity extraction / deep learning / machine learning |
Outline of Annual Research Achievements |
In this research, we propose a comprehensive information extraction and analysis method for digitized ancient Mongolian historical documents. The proposed method will recognize new features and patterns in historical manuscripts by using deep learning techniques. In FY2018, we mainly carried out the following tasks:
1. Defining unique features of ancient Mongolian historical documents for the deep learning model. We defined features of documents written in traditional Mongolian script that could be given higher weights in deep learning networks: 1) suffixes, which have unique features, and 2) the end of a token, since several final letters have special features in traditional Mongolian script.
2. Building and training a deep learning model for ancient Mongolian historical documents. We worked on a deep learning model for processing, classifying, and analyzing digital texts and scanned images of ancient Mongolian historical documents at massive scale. Manually annotated training data and collected digital texts of ancient Mongolian manuscripts were used to recognize grammatical features and patterns of ancient Mongolian within the manuscripts with deep learning networks.
|
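The two feature types named above (suffixes and token-final letters) can be illustrated with a minimal sketch. It is not the project's implementation. It assumes tokens arrive as Unicode strings in which suffixes are separated from the stem by U+202F (NARROW NO-BREAK SPACE); the function name `token_features`, the parameter `n_final`, and the ASCII sample token are hypothetical placeholders.

```python
# Minimal sketch: suffix and token-final features for one token.
# Assumption: suffixes are separated from the stem by U+202F.
NNBSP = "\u202f"


def token_features(token: str, n_final: int = 2) -> dict:
    """Return the stem, the list of suffixes, and the final letters of a token."""
    stem, *suffixes = token.split(NNBSP)
    last_segment = suffixes[-1] if suffixes else stem
    return {
        "stem": stem,
        "suffixes": suffixes,
        "has_suffix": bool(suffixes),
        # Final letters are taken from the last written segment of the token.
        "final_letters": last_segment[-n_final:],
    }


# Hypothetical placeholder token: stem "abcd" followed by suffix "ef".
features = token_features("abcd" + NNBSP + "ef")
```

Features of this kind could be passed to a model as additional inputs alongside word vectors; how the project actually weights them is not stated in the report.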
Current Status of Research Progress (Category) |
2: Research is progressing largely as planned
Reason
Some unique features of ancient Mongolian historical documents have been defined according to the research plan. As planned, these features will advance the research toward its goal of developing a comprehensive information extraction and analysis method that recognizes new features and patterns in historical manuscripts by using deep learning techniques. Continuous experiments were conducted to check the accuracy of the deep learning model. The deep learning models under consideration are word vector representation, recursive neural network, and convolutional neural network. Ongoing research results and achievements have been published in part in a book chapter and an international conference paper.
|
Strategy for Future Research Activity |
In FY2019, user evaluations will be conducted by experts and humanities researchers while the deep learning models for ancient Mongolian historical documents are improved. The proposed system will be evaluated by 1) conducting experiments and calculating standard measures such as precision, recall, and F-measure; and 2) user evaluations among several experts and users who have tried the proposed system. We plan to conduct the evaluations at the National University of Mongolia and at Ritsumeikan University in Japan. Part-time assistance from experts and students will be necessary. Continuous experiments will be conducted to improve the proposed methods. Feedback from the researchers will be collected in a timely manner, and the system will be further improved based on the evaluation results and user feedback. Research achievements and results will be presented at domestic and international conferences. Development of the proposed method will also continue.
|
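The standard measures mentioned above can be computed as in the following minimal sketch. It assumes that extracted named entities are compared with manual annotations as exact-match `(start, end, label)` tuples; the function name, the entity labels, and the sample data are hypothetical and are not results from this project.

```python
# Minimal sketch: precision, recall, and F-measure (F1) for entity extraction.
# Assumption: an entity counts as correct only on an exact (start, end, label) match.
def precision_recall_f1(gold: set, predicted: set) -> tuple:
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: three annotated entities, two predictions, one exact match.
gold = {(0, 2, "PER"), (5, 6, "LOC"), (9, 10, "PER")}
predicted = {(0, 2, "PER"), (5, 6, "PER")}
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

Exact matching is the strictest choice; a partial-match scheme would credit the second prediction for finding the right span with the wrong label.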