2023 Fiscal Year Research-status Report
Building an Error-Annotated Corpus of Learner Indonesian and Developing an Automated Writing Support for Japanese Students Using Deep Linguistic Indonesian Parsers
Project/Area Number | 23K12235 |
Research Institution | Kanda University of International Studies |
Principal Investigator | MOELJADI David, Kanda University of International Studies, Faculty of Foreign Languages, Lecturer (60928290) |
Project Period (FY) | 2023-04-01 – 2026-03-31 |
Keywords | learner corpus / Indonesian language / language education / error annotation |
Outline of Annual Research Achievements |
In 2023, I collected more than 1,200 written assignments (essays) from more than 300 students, all of whom gave their consent. The students are from 6 universities: Kanda University of International Studies (KUIS), Tokyo University of Foreign Studies (TUFS), Ritsumeikan Asia Pacific University (APU), Sophia University, Chuo University, and Keio University. I created and revised an error tagset, which currently consists of 4 categories (lexical, grammatical, spelling, and other errors) and 48 error tags. For annotation, I use UAM CorpusTool version 3. I employed 4 Japanese students from KUIS to enter the data from the consent forms and to type up the handwritten assignments. Four Indonesian teachers from KUIS, TUFS, and APU annotated the corpus.
|
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
Initially, I planned to collect essays from 30 students at 3 universities in Japan (KUIS, TUFS, and APU). However, I collected more than 1,200 essays from more than 300 students at 6 universities. Because of the large number of essays, the annotation process has not yet been completed. At present, an estimated fewer than one in four essays have been annotated and checked. In addition, I had planned to release the annotated corpus in the first year, but for the reason above, I now plan to release it after all the essays have been annotated and checked.
|
Strategy for Future Research Activity |
During my presentation at a research meeting at TUFS, I received feedback from Malay/Indonesian lecturers and experts. They suggested that I focus on building the learner corpus for all 3 years, rather than building it in the first year only and spending the next two years developing an automated writing support system for students. Building a learner corpus is time-consuming and labor-intensive. However, it is very important not only for language teaching but also for grammar research and other research purposes. Building a useful, high-quality data source (corpus) is a substantial project in itself. Thus, I would like to continue collecting essays from students while annotating the errors in them.
|
Causes of Carryover |
This year, I will use the carried-over funds to attend the international and domestic conferences that I had scheduled for the previous year (2023) (travel expenses) and to pay annotators (personnel expenses and remuneration).
|
Research Products
(6 results)
[Journal Article] NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages (2023)
Author(s): Winata Genta Indra, Aji Alham Fikri, Cahyawijaya Samuel, Mahendra Rahmad, Koto Fajri, Romadhony Ade, Kurniawan Kemal, Moeljadi David, Prasojo Radityo Eko, Fung Pascale, Baldwin Timothy, Lau Jey Han, Sennrich Rico, Ruder Sebastian
Journal Title: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Volume: 1
Pages: 815-834
DOI
Peer Reviewed / Open Access / Int'l Joint Research