2017 Fiscal Year Annual Research Report
Investigating a Learner Corpus of Computer-mediated Communication
Project/Area Number |
26580077
|
Research Institution | Gakushuin University |
Principal Investigator |
MARCHAND Tim Gakushuin University, Faculty of International Social Sciences, Associate Professor (20645197)
|
Co-Investigator(Kenkyū-buntansha) |
阿久津 純恵 Toyo University, Faculty of Human Life Design, Lecturer (20460024)
|
Project Period (FY) |
2014-04-01 – 2018-03-31
|
Keywords | learner corpus / CMC / longitudinal development / mixed-effects regression |
Outline of Annual Research Achievements |
(1) Initiated a more robust, three-step process to identify and tag spelling errors, resulting in 5567 potential replacements tagged in the learner corpus. US and UK spelling alternates have also been identified and replaced where necessary, to allow more accurate bigram analysis against large reference corpora (such as COCA).
(2) Tested the POS-tag accuracy for the learner corpus by comparing a manually tagged random token sample from the corpus with the tagged output from WMatrix. Accuracy figures for the CLAWS7 tags: Precision 0.974 (0.980), Recall 0.977 (0.981) and F-measure 0.976 (0.981). The figures in parentheses are the results after manually correcting some of the corpus-based tagging errors.
(3) Identified learner proficiency levels through analysis of the questionnaire data. Learners were placed into CEFR-equivalent proficiency levels: A1 13%, A2 19%, B1 39%, B2 16%, C1 2%, NA 10%.
(4) Piloted mixed-effects regression models, although the work is incomplete, to find correlations between learner profile and longitudinal development. Initial results suggest that the most significant correlations occur between learner engagement variables and development, rather than between proficiency level and development, although this needs to be examined more thoroughly.
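The accuracy check in item (2) compares a manually tagged sample with automatic tagger output. The report does not state how precision, recall and F-measure were aggregated across tags, so the following is only a minimal sketch, assuming per-tag scores that are then macro-averaged. The function names and the toy tag sequences are hypothetical and are not taken from the project data.

```python
from collections import Counter


def per_tag_scores(gold, predicted):
    """Return {tag: (precision, recall, f_measure)} for two aligned tag lists.

    `gold` is the manually assigned tag per token, `predicted` the tagger output.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    gold_count = Counter(gold)        # how often each tag is correct in truth
    pred_count = Counter(predicted)   # how often the tagger emitted each tag
    true_pos = Counter(g for g, p in zip(gold, predicted) if g == p)

    scores = {}
    for tag in set(gold) | set(predicted):
        tp = true_pos[tag]
        precision = tp / pred_count[tag] if pred_count[tag] else 0.0
        recall = tp / gold_count[tag] if gold_count[tag] else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        scores[tag] = (precision, recall, f_measure)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall and F-measure over all tags.

    Assumption: macro-averaging is one plausible aggregation, not
    necessarily the one used in the report.
    """
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Toy example with CLAWS7-style tags (invented data, for illustration only).
gold = ["NN1", "VVD", "AT", "NN1", "JJ", "NN1"]
predicted = ["NN1", "VVD", "AT", "JJ", "JJ", "NN1"]

scores = per_tag_scores(gold, predicted)
macro = macro_average(scores)
```

In this toy sample one noun is mistagged as an adjective, which lowers recall for `NN1` and precision for `JJ`. Precision and recall differ in aggregate only because they are computed per tag before averaging.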
|