Improving Measures of Lexical Diversity and Multi-word Expressions for Japanese EFL Learners

Research Project

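Project/Area Number	23K00597
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Application Category	General
Review Section	Basic Section 02080: English linguistics-related
Research Institution	Kyoto Sangyo University
Principal Investigator	Brooks, Gavin  Kyoto Sangyo University, Faculty of Foreign Studies, Lecturer (10610818)
Co-Investigators	Jordan Jennifer  Kwansei Gakuin University, School of Policy Studies, Full-time Lecturer (00469264); Higginbotham George  Eikei University of Hiroshima, Faculty of Social System Design, Associate Professor (20885090); CLENTON JONATHAN  Hiroshima University, Graduate School of Humanities and Social Sciences (総), Associate Professor (80762434)
Project Period (FY)	2023-04-01 – 2026-03-31
Project Status	Granted (FY2023)
Budget Amount *Note	4,680 thousand yen (direct costs: 3,600 thousand yen; indirect costs: 1,080 thousand yen); FY2025: 2,210 thousand yen (direct costs: 1,700 thousand yen; indirect costs: 510 thousand yen); FY2024: 1,430 thousand yen (direct costs: 1,100 thousand yen; indirect costs: 330 thousand yen); FY2023: 1,040 thousand yen (direct costs: 800 thousand yen; indirect costs: 240 thousand yen)
Keywords	lexical diversity / multi-word expressions / NLP / learner corpora / corpus linguistics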
Outline of Research at the Start	This project addresses the accuracy gap that arises when part-of-speech (POS) taggers trained and validated on L1 corpora are applied to learner corpora produced by L1 Japanese learners of English. To do this, we will create a tagged corpus of written essays, transcribed discussions, and transcribed presentations from a cohort of L1 Japanese university students. We will use this corpus to train a POS model, test the model on a similar corpus of texts, and then make the resulting POS tagger publicly available.
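
As an illustration of what testing a POS model on a tagged corpus could involve, the sketch below scores a tagger's output against hand-corrected gold tags and counts which tags it confuses. It is a minimal example under stated assumptions, not the project's evaluation procedure: the function name `tagging_accuracy`, the sample learner sentence, and the use of Universal Dependencies style tags are all hypothetical.

```python
from collections import Counter


def tagging_accuracy(gold, predicted):
    """Return (token-level accuracy, Counter of (gold_tag, predicted_tag) errors).

    Both arguments are lists of (token, tag) pairs that must already be
    aligned token-for-token.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    correct = 0
    errors = Counter()
    for (gold_tok, gold_tag), (pred_tok, pred_tag) in zip(gold, predicted):
        if gold_tok != pred_tok:
            raise ValueError(f"token mismatch: {gold_tok!r} vs {pred_tok!r}")
        if gold_tag == pred_tag:
            correct += 1
        else:
            errors[(gold_tag, pred_tag)] += 1
    return correct / len(gold), errors


# Hypothetical learner sentence with gold tags and an imagined tagger output.
gold = [("He", "PRON"), ("go", "VERB"), ("to", "ADP"),
        ("school", "NOUN"), ("yesterday", "ADV")]
predicted = [("He", "PRON"), ("go", "NOUN"), ("to", "ADP"),
             ("school", "NOUN"), ("yesterday", "NOUN")]

accuracy, errors = tagging_accuracy(gold, predicted)
print(f"accuracy = {accuracy:.2f}")
for (gold_tag, pred_tag), count in errors.most_common():
    print(f"gold {gold_tag} tagged as {pred_tag}: {count}")
```

Keeping the confusion counts alongside the single accuracy figure makes it easier to see which categories a tagger trained on L1 text tends to mislabel in learner text.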
Outline of Annual Research Achievements	Following the research plan, this year I began collecting and analyzing the data needed for this project. I set up two corpora: one of texts by L1 Japanese learners of English and one of texts by learners from a diverse range of L1 backgrounds. I also ran a preliminary analysis of how well existing NLP and LLM libraries handle L2 learner data, which identified several areas where existing packages struggle with L2 spoken and written texts. I presented these findings at a number of conferences. Identifying these shortcomings serves the purpose of the research, because it shows which issues I need to address in the next stage of the project.
Current Status of Research Progress (Category)	2: Progressing rather smoothly. Reason: As stated above, this semester I was able to meet most of the goals set out in the initial proposal. I organized the two corpora into a format that will allow them to be analyzed effectively in the next stage of the project. Two items were slightly more difficult than expected. First, transcribing the spoken texts was slower and slightly more expensive than anticipated, which resulted in a smaller spoken corpus than planned. In the preliminary analysis, however, this did not seem to affect my ability to use the corpus to assess the effectiveness of the tools. Second, I was not able to get the GUI from the previous POS tagger application to work with the additional features. As a result, the coding had to be done manually, which made it difficult to find RAs who were able to assist with this part of the project. This year, I hope to rewrite the GUI so that it works with the additional features that this project requires.
Strategy for Future Research Activity	This year I intend to finish updating the POS tagger and begin validating it on four test corpora, two spoken and two written. If necessary, I will also continue to add to the spoken corpus by having more texts transcribed for analysis. One method I intend to investigate for increasing the number of spoken texts is a revised version of Whisper that has been updated to improve its performance on L2 speech. Although the resulting texts will still need to be checked and cleaned, having an RA do this will be faster and more economical than hiring a transcriber to complete the whole process. Once this is done, my goal is to use the updated tagger to replicate three existing studies on lexical diversity and the use of multi-word expressions (MWEs). The first two will look at lexical diversity, and the third will examine MWE usage over time. I hope to complete and present these studies by the end of the year and to submit them for publication. After I have tested the updated tagger, I will begin to examine how best to make it available to other researchers.
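
The summary above does not say which lexical diversity measures the replication studies will use. As one illustration only, the sketch below computes a plain type-token ratio (TTR) and a moving-average type-token ratio (MATTR), which averages the TTR over every window of a fixed length and is therefore less sensitive to text length. The function names, the window size, the whitespace tokenization, and the sample sentence are assumptions made for this example.

```python
def ttr(tokens):
    """Type-token ratio: number of distinct tokens divided by total tokens."""
    if not tokens:
        raise ValueError("cannot compute TTR of an empty text")
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window=50):
    """Moving-average TTR: mean TTR over all windows of `window` tokens.

    Texts no longer than the window fall back to plain TTR.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(tokens) <= window:
        return ttr(tokens)
    ratios = [
        ttr(tokens[start:start + window])
        for start in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Hypothetical sample; real transcripts would need proper tokenization
# and, for many measures, lemmatization.
tokens = "the cat saw the dog and the dog saw the cat".split()

print(f"TTR   = {ttr(tokens):.3f}")
print(f"MATTR = {mattr(tokens, window=4):.3f}")
```

Because measures like these are often computed over lemmas or over tokens filtered by word class, errors in POS tagging can carry through into the resulting scores, which is one reason tagger accuracy on learner text matters here.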

Reports

(1 result)

2023 Research-status Report

Research Products
(3 results)

Presentations (3 results, including 1 international conference)

[Presentation] Using Whisper to automate the transcription of L2 learners' spoken texts (2023)
- Author(s)/Presenter(s)
  Brooks, Gavin; Jordan, Jen
- Conference
  JALT CALL Conference 2023
- Related Report
  2023 Research-status Report
[Presentation] Automated transcription and measures of LD in spoken texts (2023)
- Author(s)/Presenter(s)
  Brooks, Gavin; Jordan, Jen
- Conference
  H-LRF Conference 2023
- Related Report
  2023 Research-status Report
- International conference
[Presentation] Automated Transcription and Measures of Lexical Diversity in L2 Spoken Texts (2023)
- Author(s)/Presenter(s)
  Brooks, Gavin; Jordan, Jen
- Conference
  JALT CUE Conference 2023
- Related Report
  2023 Research-status Report
