Building an Error-Annotated Corpus of Learner Indonesian and Developing an Automated Writing Support for Japanese Students Using Deep Linguistic Indonesian Parsers

Research Project

Project/Area Number	23K12235
Research Category	Grant-in-Aid for Early-Career Scientists
Allocation Type	Multi-year Fund
Review Section	Basic Section 02100: Foreign language education-related
Research Institution	Kanda University of International Studies
Principal Investigator	MOELJADI David, Kanda University of International Studies, Faculty of Foreign Languages, Lecturer (60928290)
Project Period (FY)	2023-04-01 – 2026-03-31
Project Status	Granted (Fiscal Year 2023)
Budget Amount	¥2,470,000 (Direct Cost: ¥1,900,000, Indirect Cost: ¥570,000); Fiscal Year 2025: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000); Fiscal Year 2024: ¥910,000 (Direct Cost: ¥700,000, Indirect Cost: ¥210,000); Fiscal Year 2023: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
Keywords	learner corpus / Indonesian language / language education / error annotation / feedback system
Outline of Research at the Start	An error-annotated learner corpus is a very useful source of information on the types and frequencies of mistakes made by foreign language learners. It can also be used to develop a Computer-Assisted Language Learning system that provides accurate and immediate feedback. In this research, I focus on the Indonesian writing skills of Japanese university students taking Indonesian language courses, particularly at Kanda University of International Studies, Tokyo University of Foreign Studies, and Ritsumeikan Asia Pacific University.
Outline of Annual Research Achievements	In 2023, I gathered more than 1,200 written assignments (essays) from more than 300 students, all of whom gave their consent. The students are from six universities: Kanda University of International Studies (KUIS), Tokyo University of Foreign Studies (TUFS), Ritsumeikan Asia Pacific University (APU), Sophia University, Chuo University, and Keio University. I created and revised an error tagset, which currently consists of 4 categories (lexical, grammatical, spelling, and other errors) and 48 error tags. For annotation, I use UAM Corpus Tool version 3. I employed four Japanese students from KUIS to enter the data from the consent forms and to type up the handwritten assignments. Four Indonesian teachers from KUIS, TUFS, and APU annotated the corpus.
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: Initially, my plan was to gather essays from 30 students at 3 universities in Japan (KUIS, TUFS, and APU). However, I managed to gather more than 1,200 essays from more than 300 students at 6 universities. Because of the large number of essays gathered, the annotation process is not yet finished. At present, fewer than about one fourth of the essays have been annotated and checked. In addition, I had planned to release the annotated corpus in the first year, but for the reason given above, I now plan to release it after all the essays have been annotated and checked.
Strategy for Future Research Activity	During my presentation at a research meeting at TUFS, I received feedback from Malay/Indonesian lecturers and experts. They suggested that I focus on building the learner corpus for all three years, instead of building it in only one year and spending the following two years developing an automated writing support system for students. Building a learner corpus is time-consuming and labor-intensive. However, it is very important not only for language teaching but also for grammar research and other research purposes. Building a useful, good-quality data source (corpus) is already a big project in itself. Thus, I would like to continue gathering more essays from students and, at the same time, annotating the errors in the essays.

Report

(1 result)

2023 Research-status Report

Research Products
(6 results)

Journal Article (2 results) (of which Int'l Joint Research: 1 result, Peer Reviewed: 2 results, Open Access: 2 results) Presentation (4 results) (of which Int'l Joint Research: 3 results)

[Journal Article] Penyusunan KOPER: Korpus Pemelajar Bahasa Indonesia Beranotasi Eror (2023)
- Author(s)
  David Moeljadi
- Journal Title
  Prosiding Kongres Bahasa Indonesia XII
  Volume: 1 Pages: 429-444
- Related Report
  2023 Research-status Report
- Peer Reviewed / Open Access
[Journal Article] NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages (2023)
- Author(s)
  Winata Genta Indra, Aji Alham Fikri, Cahyawijaya Samuel, Mahendra Rahmad, Koto Fajri, Romadhony Ade, Kurniawan Kemal, Moeljadi David, Prasojo Radityo Eko, Fung Pascale, Baldwin Timothy, Lau Jey Han, Sennrich Rico, Ruder Sebastian
- Journal Title
  Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
  Volume: 1 Pages: 815-834
- DOI
  10.18653/v1/2023.eacl-main.57
- Related Report
  2023 Research-status Report
- Peer Reviewed / Open Access / Int'l Joint Research
[Presentation] Penyusunan KOPER: Korpus Pemelajar Bahasa Indonesia Beranotasi Eror (2023)
- Author(s)
  David Moeljadi
- Organizer
  Kongres Bahasa Indonesia XII
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] エラータグ付きインドネシア語学習者コーパスの構築 (Building an Error-Tagged Corpus of Learner Indonesian) (2023)
- Author(s)
  David Moeljadi
- Organizer
  日本インドネシア学会第54回研究大会
- Related Report
  2023 Research-status Report
[Presentation] A study of morphology of onomatopoeias in Indonesian (2023)
- Author(s)
  David Moeljadi
- Organizer
  The 26th International Symposium on Malay/Indonesian Linguistics (ISMIL)
- Related Report
  2023 Research-status Report
- Int'l Joint Research
[Presentation] Building the Old Javanese Wordnet (2023)
- Author(s)
  David Moeljadi
- Organizer
  International Kawi Culture Festival
- Related Report
  2023 Research-status Report
- Int'l Joint Research
