Summary of Research Achievements |
The two primary objectives of this research project were to create a corpus of Kansai vernacular Japanese and to make that corpus available on the internet. In total, 138 sociolinguistic interviews were conducted. Each interview was transcribed and checked for errors. The transcriptions were parsed and tagged with part-of-speech data using MeCab. Students hired for the task checked the tagged data and corrected mistakes. I estimate that the final accuracy rate is about 98%. The final data, along with supporting documents such as a description of the transcription methods, are available on a Google website. The data are shared under a Creative Commons license and may be downloaded and used free of charge. However, users are prohibited from using the data for profit.
|