Creating the first oral and written corpus of Japanese learners of Spanish as a foreign language
Project/Area Number | 23K00698 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Multi-year Fund |
Section | General |
Review Section | Basic Section 02100: Foreign language education-related |
Research Institution | Hiroshima University |
Principal Investigator | GARCIA CARLOS Hiroshima University, Institute for Foreign Language Research and Education, Associate Professor (30817169) |
Co-Investigator (Kenkyū-buntansha) | VALVERDE Pilar Kansai Gaidai University, College of Foreign Studies, Associate Professor (10588205) |
Project Period (FY) | 2023-04-01 – 2026-03-31 |
Project Status | Granted (Fiscal Year 2023) |
Budget Amount |
¥1,950,000 (Direct Cost: ¥1,500,000, Indirect Cost: ¥450,000)
Fiscal Year 2025: ¥520,000 (Direct Cost: ¥400,000, Indirect Cost: ¥120,000)
Fiscal Year 2024: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Fiscal Year 2023: ¥780,000 (Direct Cost: ¥600,000, Indirect Cost: ¥180,000)
|
Keywords | learner corpus / oral corpus / written corpus / Spanish / foreign language / Spanish Foreign Language |
Outline of Research at the Start |
In this research project, we aim to create the first oral and written corpus of learners of Spanish as a Foreign Language in Japan by merging and enriching two corpora we created in our previous work.
|
Outline of Annual Research Achievements |
In this research project, we aim to create the first oral and written corpus of learners of Spanish as a Foreign Language (SFL) in Japan by merging and enriching two corpora we created in our previous work. The objectives for FY2023 were to establish the requirements for integrating both corpora. A systematic review of SFL learner corpora available on the Internet guided us on how to maximize the potential use of the oral corpus in studies of linguistic and interactional phenomena. We have established the steps for processing the transcriptions of the oral corpus for linguistic annotation. In short: 1) preprocessing for automatic part-of-speech (PoS) tagging; 2) PoS tagging using the Freeling tool; 3) adding metadata and XML tags. We have also addressed two key issues: the logical structure of the metadata scheme and the XML tags. In the written corpus, each document is assigned to one participant, but in the oral corpus, each transcript corresponds to a conversation between two participants. In addition, in its present version, the written corpus uses XML tags for sentence and document. For the oral corpus, tags for segmenting each turn and identifying its participant are also required. Finally, by using the markup language HTML5 for the oral corpus, it will be possible to read the transcription with interactional codes and listen to the audio for each turn. This will enable studies of interactional phenomena. Preliminary tests exporting our transcriptions from the linguistic annotation software ELAN to HTML5 have been successful, showing that this approach is feasible.
|
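The difference in tagging between the two corpora can be illustrated with a minimal Python sketch. It assumes a document element that lists both participants, with each turn wrapped in its own element that names the speaker and contains sentence elements. The tag names (`doc`, `turn`, `s`), the attribute names, and the participant identifiers are hypothetical placeholders, not the scheme adopted by the project.

```python
import xml.etree.ElementTree as ET


def build_conversation_xml(doc_id, participants, turns):
    """Wrap a two-party transcript in document, turn and sentence elements.

    All tag and attribute names are hypothetical placeholders chosen for
    illustration; they are not the project's actual annotation scheme.
    """
    doc = ET.Element("doc", {"id": doc_id, "participants": " ".join(participants)})
    for index, (speaker, sentences) in enumerate(turns, start=1):
        if speaker not in participants:
            raise ValueError(f"unknown participant: {speaker}")
        # One <turn> per speaker turn, labelled with the participant who produced it.
        turn = ET.SubElement(doc, "turn", {"n": str(index), "participant": speaker})
        for sentence in sentences:
            # Sentence-level tag, mirroring the sentence tag of the written corpus.
            s = ET.SubElement(turn, "s")
            s.text = sentence
    return doc


conversation = build_conversation_xml(
    "conv001",
    ["P01", "P02"],
    [
        ("P01", ["Hola.", "¿Qué tal?"]),
        ("P02", ["Muy bien."]),
    ],
)
xml_text = ET.tostring(conversation, encoding="unicode")
print(xml_text)
```

Keeping the participant on the turn rather than on the document is one way to let a single transcript carry two speakers while reusing the sentence and document levels of the written corpus.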
Current Status of Research Progress |
3: Progress in research has been slightly delayed.
Reason
As our research objectives for FY2023 were accomplished, we consider that this project is progressing rather smoothly. However, one crucial point could delay the project. In FY2024, funds are required as honoraria for a graduate student with sufficient knowledge of Spanish, Japanese and linguistics to prepare the data for processing with an automatic linguistic annotation tool. We have not yet found a graduate student with the required background.
|
Strategy for Future Research Activity |
In FY2024 we aim to process the data for the integration of the oral corpus (Corpus of Natural Conversations) into the platform and online interface of the written corpus (CELEN corpus). This processing will include: (1) preprocessing each transcription for PoS tagging (for instance, deleting transcription marks of interactional phenomena such as codes for pauses, fillers or overlapping), (2) automatic PoS tagging of each transcription using the Freeling tool, (3) assigning metadata and XML tags to each transcription, (4) preparing an HTML5 version of each transcription, (5) reviewing the materials, and (6) integrating the materials into the Sketch platform and online site of the written corpus (CELEN corpus). In addition, we will prepare the transcriptions of the oral corpus to be incorporated as HTML5 documents in the CELEN corpus online site. To achieve these objectives, we need the collaboration of a graduate student under our guidance.
|
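Step (1) above can be sketched in Python as a small cleaning function applied to each turn before PoS tagging. The transcription conventions assumed here are hypothetical, since the record does not specify them: pauses written as `(.)` or `(1.5)`, overlapping speech enclosed in square brackets, and a short illustrative list of fillers.

```python
import re

# Hypothetical conventions, assumed for illustration only:
# pauses such as (.) or (1.5), and square brackets around overlapping speech.
PAUSE = re.compile(r"\(\d*\.?\d*\)")
OVERLAP = re.compile(r"[\[\]]")
# Illustrative filler list; a real list would come from the transcription guide.
FILLERS = {"eh", "mm", "em"}


def clean_turn(text, fillers=FILLERS):
    """Remove interactional marks so that only the words reach the PoS tagger."""
    text = PAUSE.sub(" ", text)    # drop pause codes
    text = OVERLAP.sub("", text)   # drop overlap brackets, keep the words inside
    tokens = [
        token for token in text.split()
        if token.lower().strip(",.") not in fillers  # drop filler tokens
    ]
    return " ".join(tokens)


print(clean_turn("eh (.) yo [vivo] en (1.5) Hiroshima"))
```

Under these assumptions the call prints `yo vivo en Hiroshima`. The original transcription would remain untouched, so the interactional codes stay available for the HTML5 version.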
Report
(1 results)
Research Products
(3 results)