2017 Fiscal Year Annual Research Report
Design and collection of a written corpus of learner Spanish in Japan
Project/Area Number | 17H07270 |
Research Institution | Kansai Gaidai University |
Principal Investigator | VALVERDE Pilar, Kansai Gaidai University, Faculty of Foreign Languages, Lecturer (10588205) |
Project Period (FY) | 2017-08-25 – 2019-03-31 |
Keywords | learner corpus / corpus linguistics / SLA / learner Spanish |
Outline of Annual Research Achievements |
During the last six months I have completed the design of the learner corpus and have started to collect data at some universities.
1) With regard to the design of the corpus: first, I have established the list of variables relevant to the study, concerning both the learners and the written tasks. I have then designed a learner profile questionnaire to be administered to students majoring in Spanish at Kansai Gaidai (around 500 in total) and a set of written tasks to be carried out by those students.
2) With regard to the collection of data: at Kyoto University I have obtained, from students of Spanish as a second foreign language, four compositions from each of 280 students, along with information about the students' linguistic background.
|
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
The research has progressed as expected: the design of the corpus has been completed and the collection of data from learners has started.
|
Strategy for Future Research Activity |
From April, data is being collected at Kansai Gaidai from students majoring in Spanish: specifically, learner profiles from around 500 students and two written tasks per semester. In addition, I will seek collaboration with other universities from which I can obtain data.
During the coming year, the data collected on paper will be entered in electronic form and stored in a database. The learner texts will then be processed further. First, the orthographic mistakes in the texts will be corrected. Second, the texts will be enriched with lemma and part-of-speech information with the help of automatic tools, so that complex searches can be run on them. Finally, the texts will be uploaded to a corpus search system and the corpus will be distributed under a Creative Commons BY-NC-SA license.
|
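The processing steps planned above (orthographic correction, then lemma and part-of-speech enrichment, then export for a search system) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual pipeline: the correction table, the toy lexicon standing in for an automatic tagger, the tag labels, and the tab-separated "vertical" output layout are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical correction table: learner spelling -> normalized spelling.
CORRECTIONS = {"tambien": "también", "espanol": "español"}

# Toy lexicon standing in for an automatic tagger: form -> (lemma, POS).
# A real project would call a lemmatizer/tagger here instead.
LEXICON = {
    "yo": ("yo", "PRON"),
    "estudio": ("estudiar", "VERB"),
    "español": ("español", "NOUN"),
    "también": ("también", "ADV"),
}


@dataclass
class Token:
    original: str   # form as written by the learner (lowercased)
    corrected: str  # form after orthographic correction
    lemma: str
    pos: str


def annotate(text: str) -> List[Token]:
    """Tokenize a learner text, correct spelling, and attach lemma and POS."""
    tokens = []
    for form in re.findall(r"\w+", text.lower()):
        corrected = CORRECTIONS.get(form, form)
        # Unknown words keep their own form as lemma and get the tag "UNK".
        lemma, pos = LEXICON.get(corrected, (corrected, "UNK"))
        tokens.append(Token(form, corrected, lemma, pos))
    return tokens


def to_vertical(tokens: List[Token]) -> str:
    """Serialize one token per line: original, corrected, lemma, POS."""
    return "\n".join(
        "\t".join((t.original, t.corrected, t.lemma, t.pos)) for t in tokens
    )


if __name__ == "__main__":
    print(to_vertical(annotate("Yo estudio espanol tambien.")))
```

Keeping the learner's original form next to the corrected one means that the errors themselves remain searchable after normalization, which is usually the point of a learner corpus.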