| Project/Area Number |
23K00597
|
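| Research Category |
Grant-in-Aid for Scientific Research (C)
|
| Allocation Type | Multi-year Fund |
| Section | General |
| Review Section |
Basic Section 02080: English linguistics-related
|
| Research Institution | Kyoto Sangyo University |
Principal Investigator |
Brooks Gavin Kyoto Sangyo University, Faculty of Foreign Studies, Lecturer (10610818)
|
| Co-Investigators |
Jordan Jennifer Kwansei Gakuin University, School of Policy Studies, Full-time Lecturer (00469264)
Higginbotham George Eikei University of Hiroshima, Faculty of Social System Design, Associate Professor (20885090)
CLENTON JONATHAN Hiroshima University, Graduate School of Humanities and Social Sciences (総), Associate Professor (80762434)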
|
| Project Period (FY) |
2023-04-01 – 2026-03-31
|
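| Project Status |
Granted (FY2024)
|
| Budget Amount *note |
¥4,680,000 (Direct costs: ¥3,600,000; Indirect costs: ¥1,080,000)
FY2025: ¥2,210,000 (Direct costs: ¥1,700,000; Indirect costs: ¥510,000)
FY2024: ¥1,430,000 (Direct costs: ¥1,100,000; Indirect costs: ¥330,000)
FY2023: ¥1,040,000 (Direct costs: ¥800,000; Indirect costs: ¥240,000)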
|
| Keywords | learner corpora / NLP / Lexical Diversity / Lexical Sophistication / Multi-word Expressions / Corpus Linguistics |
| Outline of Research at the Start |
This project addresses the accuracy gap that arises when part-of-speech (POS) taggers trained and validated on L1 corpora are applied to learner corpora produced by L1 Japanese learners of English. To do this, we will build a tagged corpus of written essays, transcribed discussions, and transcribed presentations from a cohort of L1 Japanese university students. The corpus will be used to train a POS model, which will then be tested on a similar corpus of texts. The resulting POS tagger will be made publicly available.
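As a rough illustration of how a tagger's output might be compared against a hand-tagged gold standard, the sketch below computes overall and per-tag accuracy for two aligned tag sequences. The function name `tagging_accuracy`, the example sentence, and its Penn Treebank-style tags are assumptions made for illustration; they are not taken from the project's corpus or evaluation procedure.

```python
from collections import Counter


def tagging_accuracy(gold, predicted):
    """Return (overall accuracy, per-tag accuracy) for two aligned tag lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be the same length")
    # How often each gold tag occurs, and how often the tagger matched it.
    total = Counter(gold)
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)
    overall = sum(correct.values()) / len(gold) if gold else 0.0
    per_tag = {tag: correct[tag] / n for tag, n in total.items()}
    return overall, per_tag


# Hypothetical learner sentence: "He go to school yesterday"
# The gold tags follow the intended past-tense reading (an assumption);
# the predicted tags follow the surface form, as an L1-trained tagger might.
gold_tags = ["PRP", "VBD", "TO", "NN", "NN"]
pred_tags = ["PRP", "VBP", "TO", "NN", "NN"]

overall, per_tag = tagging_accuracy(gold_tags, pred_tags)
print(f"overall accuracy: {overall:.2f}")
for tag, acc in sorted(per_tag.items()):
    print(f"{tag}: {acc:.2f}")
```

Per-tag figures of this kind can show whether errors cluster on particular categories, such as verb forms, rather than being spread evenly across the tagset.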
|
| Outline of Annual Research Achievements |
Following the research plan, I continued to build on the corpora developed in the previous year. This year, I used NLP tools to analyze both written and spoken learner texts, focusing on how lexical and structural features relate to proficiency. Using a random forest approach, I examined how linguistic measures derived from the corpus data varied across proficiency levels. This analysis identified which features were most predictive of learner proficiency and highlighted the relative strengths and limitations of current NLP metrics when applied to L2 data.
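The report does not itemize the linguistic measures that were used. As one example of the kind of lexical measure that could serve as an input feature in such an analysis, the sketch below computes a simple type-token ratio (TTR) and a moving-average TTR (MATTR) over a tokenized text. The function names, the window size, and the sample sentence are assumptions for illustration only.

```python
def type_token_ratio(tokens):
    """Number of distinct word forms divided by the number of tokens."""
    tokens = [t.lower() for t in tokens]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def mattr(tokens, window=5):
    """Moving-average TTR: the mean TTR over every window of fixed size.

    Averaging over fixed-size windows makes the score less sensitive to
    text length than plain TTR.
    """
    tokens = [t.lower() for t in tokens]
    if len(tokens) < window:
        return type_token_ratio(tokens)
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Hypothetical learner sentence with heavy repetition.
sample = "I like music and I like movies and I like games".split()
ttr_score = type_token_ratio(sample)
mattr_score = mattr(sample, window=5)
print(f"TTR: {ttr_score:.3f}  MATTR: {mattr_score:.3f}")
```

Scores like these, computed per text, could then be arranged as feature columns alongside a proficiency label for a classifier such as a random forest.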
|
| Current Status of Research Progress |
2: Progressing mostly as planned
Reason
As noted above, the project is progressing mostly as planned. The learner corpus is now in an analyzable format, and initial findings have been presented at international conferences. Limitations in available tagging tools caused some minor delays, but I have begun developing an updated version of the POS tagger based on TreeTagger and expect to have a functional version available by summer. This will streamline further analysis and support broader dissemination of the tools. Preparations for cross-corpus comparisons are also underway.
|
| Strategy for Future Research Activity |
This year, I will continue validating the updated POS tagger across multiple corpora and begin presenting the results to gather feedback for refinement. I also plan to use LLMs to investigate the relationship between L1 and L2 multi-word expressions (MWEs) by analyzing next-word prediction probabilities. This approach may reveal new patterns in how L2 learners acquire and use MWEs. In addition, I will assess how best to integrate these tools into the broader research workflow and make them accessible to other researchers through appropriate documentation and support.
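To make the idea of a next-word prediction probability concrete, the sketch below estimates P(next word | previous word) from raw counts. A bigram count model is used here purely as a small, self-contained stand-in for an LLM; it is not the method proposed in the plan. The toy corpus, the function names, and the example word pairs are all invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_prob(counts, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 if prev is unseen."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[nxt] / total if total else 0.0


# Invented toy corpus standing in for L1 reference text.
toy_corpus = [
    "heavy rain fell",
    "heavy rain again",
    "heavy traffic today",
    "strong wind blew",
    "strong coffee please",
]
model = train_bigram(toy_corpus)

# A conventional collocation versus a pairing never seen in the toy corpus.
p_heavy_rain = next_word_prob(model, "heavy", "rain")
p_strong_rain = next_word_prob(model, "strong", "rain")
print(f"P(rain | heavy)  = {p_heavy_rain:.2f}")
print(f"P(rain | strong) = {p_strong_rain:.2f}")
```

With an LLM, the same comparison would use the model's predicted probability for the next token given a longer context, which could then be set against the expressions learners actually produce.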
|