
2018 Fiscal Year Research-status Report

Research on Knowledge Extraction from Ancient Mongolian Historical Documents using Deep Learning

Research Project

Project/Area Number 17K00457
Research Institution Ritsumeikan University

Principal Investigator

Biligsaikhan Batjargal, Ritsumeikan University, Kinugasa Research Organization, Researcher (30725396)

Project Period (FY) 2017-04-01 – 2021-03-31
Keywords historical documents / traditional Mongolian / named entity extraction / deep learning / machine learning
Outline of Annual Research Achievements

In this research, we propose a comprehensive information extraction and analysis method for digitized ancient Mongolian historical documents. The proposed method will recognize new features and patterns in historical manuscripts by utilizing deep learning techniques.
In FY2018, the following tasks were mainly performed:
1. Defining some unique features of ancient Mongolian historical documents for the deep learning model:
We have defined features of ancient Mongolian historical documents written in traditional Mongolian script that could be given higher weights in deep learning networks: 1) suffixes, which have some distinctive features, and 2) token endings, since several letters have special features in final position in traditional Mongolian script.
2. Building and training a deep learning model for ancient Mongolian historical documents:
We have been working to build a deep learning model for processing, classifying and analyzing digital texts and scanned images of ancient Mongolian historical documents at a massive scale. Manually annotated training data and collected digital texts of ancient Mongolian manuscripts were used with deep learning networks to recognize features and patterns of ancient Mongolian grammar within the manuscripts.
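The two feature types defined in task 1 can be illustrated with a small feature extractor. This is a minimal sketch under stated assumptions, not the project's actual feature set: it assumes tokens are given in a Latin transliteration with suffixes attached, and the suffix inventory `SUFFIXES`, the weights in `WEIGHTS` and the function `token_features` are all hypothetical. For each token it emits a suffix feature and a token-ending feature, each weighted more heavily than the plain token-identity feature.

```python
# Sketch only: the suffix inventory and weights below are hypothetical
# illustrations, not the feature set defined in this research.

# Illustrative Classical Mongolian case endings in Latin transliteration
# (ablative, instrumental, genitive, dative-locative, accusative).
SUFFIXES = ("ača", "eče", "iyar", "yin", "un", "dur", "tur", "yi")

# Hypothetical weights: suffix and token-ending features count more than
# the plain token-identity feature.
WEIGHTS = {"token": 1.0, "suffix": 2.0, "end": 1.5}


def token_features(token: str, end_len: int = 2) -> dict[str, float]:
    """Return a sparse feature dict (feature name -> weight) for one token."""
    feats = {f"token={token}": WEIGHTS["token"]}

    # 1) Suffix feature: the longest inventory suffix that ends the token
    #    (the token must be longer than the suffix itself).
    matches = [s for s in SUFFIXES if token.endswith(s) and len(token) > len(s)]
    if matches:
        feats[f"suffix={max(matches, key=len)}"] = WEIGHTS["suffix"]

    # 2) Token-ending feature: the final `end_len` characters, standing in for
    #    letters that take a special form at the end of a word.
    feats[f"end={token[-end_len:]}"] = WEIGHTS["end"]
    return feats
```

For example, `token_features("ordudur")` yields a `suffix=dur` feature with weight 2.0 and an `end=ur` feature with weight 1.5. In a neural network such weights would more likely be learned (for instance as feature embeddings); they are hard-coded here only to show the idea of emphasizing these two feature types.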

Current Status of Research Progress

2: On the whole, research has progressed as originally planned.

Reason

Some unique features of ancient Mongolian historical documents have been defined according to the research plan. As planned, these features will allow the research to advance toward its goal: developing a comprehensive information extraction and analysis method that recognizes new features and patterns in historical manuscripts by utilizing deep learning techniques. Experiments have been conducted continuously to check the accuracy of the deep learning model. The deep learning models under consideration are word vector representations, recursive neural networks and convolutional neural networks. Parts of the ongoing research results have been published in a book chapter and an international conference paper.
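Of the models listed above, the core operation of a character-level convolutional neural network can be sketched in a few lines: a filter slides over the sequence of character embeddings of a token, a ReLU is applied to each window score, and max-over-time pooling keeps the strongest response. This is an illustrative sketch, not the model used in this research; the function name `conv1d_max_pool`, the toy 2-dimensional embeddings and the hand-set filter are assumptions, and a real network would learn many filters together with the embeddings.

```python
from typing import Sequence

Vector = list[float]


def conv1d_max_pool(
    embeddings: Sequence[Vector],  # one embedding vector per character
    kernel: Sequence[Vector],      # filter weights: width rows x embedding dim
    bias: float = 0.0,
) -> float:
    """Slide `kernel` over the characters, apply ReLU, return the maximum."""
    width = len(kernel)
    if len(embeddings) < width:
        raise ValueError("sequence is shorter than the filter width")

    activations = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        # Dot product between the filter and the current window of characters.
        score = bias + sum(
            w * x
            for k_row, e_row in zip(kernel, window)
            for w, x in zip(k_row, e_row)
        )
        activations.append(max(0.0, score))  # ReLU

    # Max-over-time pooling: one feature value per filter, whatever the length.
    return max(activations)
```

With toy embeddings `d = [0, 1]`, `u = [0.5, 0.5]`, `r = [0, 0]` and the width-2 filter `[[0, 1], [1, 0]]`, the token `dur` produces window scores 1.5 and 0.5, so the pooled feature is 1.5. Because pooling keeps the maximum over all positions, a filter tuned to a particular character pattern, such as a suffix, responds wherever that pattern occurs in the token.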

Strategy for Future Research Activity

In FY2019, user evaluations will be conducted by experts and humanities researchers, while the deep learning models for ancient Mongolian historical documents continue to be improved.
The proposed system will be evaluated by 1) conducting experiments and calculating standard measures such as precision, recall and F-measure, and 2) conducting user evaluations among experts and users who have tried the proposed system. We plan to conduct these evaluations at the National University of Mongolia and at Ritsumeikan University in Japan, which will require the part-time assistance of experts and students. Experiments will be conducted continuously to improve the proposed methods, and user evaluations will also be carried out by several experts, with feedback from these researchers collected in a timely manner. The system will be further improved based on the evaluation results and user feedback. Research achievements and results will be presented at domestic and international conferences, and development of the proposed method will continue.
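The standard measures named above can be computed for named entity extraction as follows. This minimal sketch assumes exact matching, i.e. a predicted entity counts as correct only if its (start, end, type) triple appears in the gold standard; the report does not state the matching rule, and the function name `precision_recall_f` is hypothetical. Precision is TP / |predicted|, recall is TP / |gold|, and the F-measure is their weighted harmonic mean (F1 when `beta` is 1).

```python
Entity = tuple[int, int, str]  # (start offset, end offset, entity type)


def precision_recall_f(
    predicted: set[Entity], gold: set[Entity], beta: float = 1.0
) -> tuple[float, float, float]:
    """Return (precision, recall, F_beta) under exact-match evaluation."""
    true_positives = len(predicted & gold)  # entities both predicted and gold
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f
```

For example, if the system returns two entities and one of them matches one of three gold entities, precision is 0.5, recall is 1/3 and F1 is 0.4.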

  • Research Products

    (12 results)


Journal Article (1 result; peer reviewed: 1, open access: 1) / Presentation (10 results; international joint research: 3) / Book (1 result)

  • [Journal Article] Cross-Language Record Linkage based on Semantic Matching of Metadata (2019)

    • Author(s)
      Yuting SONG, Biligsaikhan BATJARGAL, Akira MAEDA
    • Journal Title

      The Database Society of Japan English Journal (DBSJ Journal)

      Volume: 17 Issue: 1 Pages: 1-18

    • Peer Reviewed / Open Access
  • [Presentation] Metadata Similarity Calculation in Cross-Language Record Linkage based on Cross-lingual Embedding Models (2019)

    • Author(s)
      Yuting Song, Biligsaikhan Batjargal, Akira Maeda
    • Organizer
      The 11th Forum on Data Engineering and Information Management (the 17th Annual Meeting of the Database Society of Japan)
  • [Presentation] A Study on Extracting Glyph Features from Fonts for Ancient Character Search and Their Potential Applications (2019)

    • Author(s)
      Kangying Li, Biligsaikhan Batjargal, Akira Maeda
    • Organizer
      The 11th Forum on Data Engineering and Information Management (the 17th Annual Meeting of the Database Society of Japan)
  • [Presentation] A Recommender System for Ukiyo-e Digital Archives Based on Artwork Relatedness Using Distributed Representations (2019)

    • Author(s)
      Jiayun Wang, Biligsaikhan Batjargal, Akira Maeda, Kyoji Kawagoe
    • Organizer
      The 11th Forum on Data Engineering and Information Management (the 17th Annual Meeting of the Database Society of Japan)
  • [Presentation] Creating a Digital Edition of Ancient Mongolian Historical Documents (2018)

    • Author(s)
      Biligsaikhan Batjargal, Garmaabazar Khaltarkhuu, and Akira Maeda
    • Organizer
      Digital Humanities 2018
    • Int'l Joint Research
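  • [Presentation] Building Large-Scale Humanities Databases for Deepening Specialized Expertise: A Worldwide Collection Search and Browsing System Using a Portal Database and a Cross-Search System (2018)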

    • Author(s)
      Biligsaikhan Batjargal
    • Organizer
      International Symposium "Academic Infrastructure for the Humanities in the Digital Age"
  • [Presentation] Creating Digital Editions of Historical Documents Written in Traditional Mongolian Script (2018)

    • Author(s)
      Biligsaikhan Batjargal
    • Organizer
      The 52nd ARC Seminar
  • [Presentation] Digitization of Ancient Characters and Its Potential Applications (2018)

    • Author(s)
      Akira Maeda, Biligsaikhan Batjargal, Kangying Li
    • Organizer
      FY2018 Annual Conference of the Japanese Society of Diplomatics: "An Invitation to Diplomatics: Opening Windows onto Research"
  • [Presentation] Ownership Stamp Character Recognition System Based on Ancient Character Typeface (2018)

    • Author(s)
      Kangying Li, Biligsaikhan Batjargal, and Akira Maeda
    • Organizer
      The 20th International Conference on Asia-Pacific Digital Libraries (ICADL2018)
    • Int'l Joint Research
  • [Presentation] A Recommender System in Ukiyo-e Digital Archive for Japanese Art Novices (2018)

    • Author(s)
      Jiayun Wang, Biligsaikhan Batjargal, Akira Maeda, and Kyoji Kawagoe
    • Organizer
      The 20th International Conference on Asia-Pacific Digital Libraries (ICADL2018)
    • Int'l Joint Research
  • [Presentation] Search Support for Ownership Stamps Based on Feature Extraction from Ancient Character Font Glyphs (2018)

    • Author(s)
      Kangying Li, Biligsaikhan Batjargal, Akira Maeda
    • Organizer
      Symposium on Humanities and Computers (Jinmoncom 2018)
  • [Book] Cross-Lingual and Cross-Chronological Information Access to Multilingual Historical Documents. In Sammy Beban Chumbow, Editor, Multilingualism and Bilingualism (2018)

    • Author(s)
      Biligsaikhan Batjargal
    • Total Pages
      174
    • Publisher
      IntechOpen
    • ISBN
      978-1-78923-226-4


Published: 2019-12-27  
