2020 Fiscal Year Research-status Report
Multilingual Knowledge Discovery in Digital Cultural Collections
Project/Area Number | 20K20135 |
Research Institution | Ritsumeikan University |
Principal Investigator | SONG Yuting, Ritsumeikan University, College of Information Science and Engineering, Assistant Professor (50849388) |
Project Period (FY) | 2020-04-01 – 2023-03-31 |
Keywords | Word embeddings / MT evaluation / Metadata translation / Entity recognition / Relation extraction |
Outline of Annual Research Achievements |
This year we focused on improving bilingual word embedding models and collecting datasets of metadata records. First, we proposed a method to improve the accuracy of Japanese-English bilingual word embeddings. Second, we made preliminary attempts to evaluate machine translation of ukiyo-e metadata records from Japanese to English. To support further experiments, we collected English human translations of Japanese ukiyo-e metadata records through a crowdsourcing platform. In addition, the machine translations of ukiyo-e metadata records were evaluated by both Japanese and English native speakers through a crowdsourcing platform (Lancers). Overall, the project has proceeded step by step in line with the research proposal.
|
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
The project is progressing smoothly as planned. We have proposed a method to improve Japanese-English word embeddings. We have also evaluated the performance of online machine translation systems (Google Translate, Microsoft Translator, and DeepL Translator) on translating Japanese ukiyo-e metadata into English. In addition, we have collected Japanese-English metadata records for future research. Furthermore, we have investigated current neural-network-based models for entity and relation extraction, which can be applied to the ukiyo-e metadata dataset next year.
|
Strategy for Future Research Activity |
In future work, we will focus on developing neural-network-based methods for learning multilingual representations of metadata and for extracting named entities from Japanese and English textual metadata in cultural collections. We will also manually annotate named entities in metadata records; these annotations are essential for training and evaluating entity extraction models.
|
Causes of Carryover |
We will use the budget to purchase hardware such as GPUs for research based on deep neural networks. Some funds will also be spent on crowdsourcing jobs for data annotation. Finally, we will attend conferences to disseminate research results.
|
Research Products
(5 results)