Multilingual Knowledge Discovery in Digital Cultural Collections
Project/Area Number |
20K20135
|
Research Category |
Grant-in-Aid for Early-Career Scientists
|
Allocation Type | Multi-year Fund |
Review Section |
Basic Section 90020: Library and information science, humanistic and social informatics-related
|
Research Institution | Ritsumeikan University |
Principal Investigator |
SONG Yuting Ritsumeikan University, College of Information Science and Engineering, Assistant Professor (50849388)
|
Project Period (FY) |
2020-04-01 – 2022-03-31
|
Project Status |
Discontinued (Fiscal Year 2021)
|
Budget Amount |
¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000)
Fiscal Year 2022: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2021: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2020: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
|
Keywords | Entity matching / MT evaluation / Entity recognition / Relation extraction / Word embeddings / Metadata translation / Knowledge extraction / Cultural collections / Multilingual information |
Outline of Research at the Start |
In recent years, many cultural institutions have made their collections accessible through metadata. However, the multilingual knowledge contained in digital collections has received little attention as a means of accessing them. This research aims to extract multilingual knowledge from metadata, including entities and relations between objects, using neural-network-based techniques for entity extraction and representation learning. The extracted knowledge can be applied to improve multilingual information access to digital cultural collections and to help people understand digital cultural objects.
|
Outline of Annual Research Achievements |
This year we focused on improving our method for cross-lingual entity matching and on collecting datasets for machine translation evaluation. First, we proposed a novel method to identify records that refer to the same Japanese artwork entity in Japanese and English data sources. Our approach treats an entity as a sequence of attributes and employs a multilingual BERT-based network, so that cross-lingual entities can be compared without aligning their schemas. In addition, we collected datasets and conducted further experiments to evaluate machine translation of ukiyo-e metadata records, especially for the bijin-e genre. In separate work, we investigated and evaluated current state-of-the-art models for automatically discovering entities and relations in short texts.
|
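The attribute-sequence idea described in the achievements above can be illustrated with a minimal sketch. It is not the project's implementation: the `[COL]`, `[VAL]` and `[SEP]` markers, the function names, and the two sample records are all assumptions made for illustration, and the multilingual BERT classifier that would consume the resulting text is not shown.

```python
from typing import Dict


def serialize_record(record: Dict[str, str]) -> str:
    """Flatten a metadata record into one attribute sequence.

    The [COL]/[VAL] markers are hypothetical; the source does not say
    how attributes are delimited.
    """
    parts = []
    for name, value in record.items():
        if value:  # skip attributes with empty values
            parts.append(f"[COL] {name} [VAL] {value}")
    return " ".join(parts)


def build_pair_input(left: Dict[str, str], right: Dict[str, str]) -> str:
    """Join two serialized records into a single text pair.

    Neither record is mapped onto the other's schema; each keeps its
    own attribute names and order.
    """
    return f"{serialize_record(left)} [SEP] {serialize_record(right)}"


# Illustrative records with different schemas (made up for this sketch).
ja_record = {"作品名": "見返り美人図", "絵師": "菱川師宣"}
en_record = {
    "title": "Beauty Looking Back",
    "artist": "Hishikawa Moronobu",
    "date": "",
}

pair_text = build_pair_input(ja_record, en_record)
print(pair_text)
```

Because each record is serialized with its own attribute names, the Japanese and English sources never need a shared schema; a matching model would only have to decide whether the two halves of `pair_text` describe the same artwork.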
Report
(2 results)
Research Products
(6 results)