Project/Area Number |
23K00597
|
Research Institution | Kyoto Sangyo University |
Principal Investigator |
Brooks Gavin Kyoto Sangyo University, Faculty of Foreign Studies, Lecturer (10610818)
|
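Co-Investigators |
Jordan Jennifer Kwansei Gakuin University, School of Policy Studies, Full-time Lecturer (00469264)
Higginbotham George Eikei University of Hiroshima, Faculty of Social System Design, Associate Professor (20885090)
CLENTON JONATHAN Hiroshima University, Graduate School of Humanities and Social Sciences (総), Associate Professor (80762434)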
|
Project Period (FY) |
2023-04-01 – 2026-03-31
|
Keywords | lexical diversity / multi-word expressions / NLP / learner corpora |
Outline of Annual Research Achievements |
Following the research plan, this year I began collecting and analyzing the data needed for this project. I set up two corpora: one of L1 Japanese English language learners and one covering a diverse range of L1 backgrounds. I also ran a preliminary analysis of how well existing NLP and LLM libraries handle L2 learner data, and in doing so identified areas where existing packages struggle with L2 spoken and written texts. I presented these findings at a number of conferences. Identifying these shortcomings serves the purpose of the research, because it allows me to begin addressing them in the next stage of the project.
|
Current Status of Research Progress (Category) |
2: Research is progressing rather smoothly
Reason
As stated above, this semester I was able to meet most of the goals set out in the initial proposal. I organized the two corpora into a format that will allow them to be analyzed effectively in the next stage of the project. Two items proved more difficult than expected. First, transcribing the spoken texts was slower and slightly more expensive than anticipated, which resulted in a smaller spoken corpus than planned. In the preliminary analysis, however, this did not seem to affect my ability to use the corpus to evaluate the effectiveness of the tools. Second, I was not able to get the POS tagger GUI from the previous application to work with the additional features. As a result, the work had to be coded manually, which made it difficult to find RAs who were able to assist with this part of the project. This year, I hope to rewrite the GUI so that it works with the additional features required for this project.
|
Strategy for Future Research Activity |
This year I intend to finish updating the POS tagger and begin validating it on four test corpora, two spoken and two written. If necessary, I will also continue to add to the spoken corpus by having more texts transcribed for analysis. One method I intend to investigate for this is a revised version of Whisper that has been updated to improve its performance on L2 speaker texts. Although the resulting transcripts will still need to be checked and cleaned, having an RA do this will be faster and more economical than hiring a transcriber for the whole process. Once this is complete, my goal is to use the updated tagger to replicate three existing studies involving lexical diversity and the use of multi-word expressions (MWEs). The first two will look at lexical diversity, and the third will examine MWE usage over time. I hope to complete and present these studies by the end of the year and submit them for publication. After I have tested the updated tagger, I will begin to examine how best to make it available to other researchers.
|
Reason for Funds Carried Over to the Next Fiscal Year |
Some of the analysis intended for RAs had to be done by the PI instead because the code had to be entered manually. This year we hope to update the GUI so that RAs can work on this part of the project. We also plan to use a more economical method for transcribing the data. For these reasons, the funds were not used in the previous year but will be used this year for their intended purpose.
|