2006 Fiscal Year Final Research Report Summary
The Project for the Corpus of Spontaneous Japanese Spoken by Non-Native Speakers
Project/Area Number |
17202011
|
Research Category |
Grant-in-Aid for Scientific Research (A)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Japanese language education
|
Research Institution | Osaka University |
Principal Investigator |
TOKI Satoshi, Osaka University, School of Letters, Professor (10138662)
|
Co-Investigator (Kenkyū-buntansha) |
MAEKAWA Kikuo, The National Institute for Japanese Language, Department of Language Research, Leader of the Corpus Compilation Group (20173693)
KASHIMA Tanomu, Nagoya University, Education Center for International Students, Professor (60204377)
NAKANISHI Kumiko, Kyoto University of Foreign Studies, Faculty of Foreign Studies, Assistant Professor (30296769)
YAMASHITA Yoichi, Ritsumeikan University, College of Information Science and Engineering, Professor (80174689)
ESAKI Tetsuya, University of Yamanashi, International Student Center, Lecturer (40420343)
|
Project Period (FY) |
2005 – 2006
|
Keywords | linguistics / phonetics / non-native speakers of Japanese / spontaneous speech corpus / teaching Japanese as a foreign language |
Research Abstract |
The Corpus of Spontaneous Japanese Spoken by Non-Native Speakers is a large-scale annotated corpus for spoken language research. Large, high-quality corpora have attracted attention as a new kind of data for language research. Most of them, however, target the speech of native speakers, and few cover non-native speakers. Our project focused on the speech of non-native speakers and built a large-scale annotated corpus of it. This corpus can reveal phonetic features of non-native speech and can also provide useful data for research on interlanguage and sociolinguistics. The corpus contains about 2000 minutes of spontaneous speech, corresponding to about 360k words. All of the speech material was recorded using head-worn close-talking microphones and DAT, and down-sampled to 16 kHz at 16-bit resolution. The speech material is transcribed using a two-way transcription scheme designed especially for CSJ (the Corpus of Spontaneous Japanese): recorded speech is transcribed both orthographically and phonetically. In the orthographic transcription, speech is written in Kanji (Chinese logographs) and Kana (Japanese syllabary) just like ordinary Japanese text, but unlike ordinary Japanese writing, the transcription follows rigorous rules about the usage of Kanji and Kana. Part of the corpus is segmentally labeled. The labels are basically phonemic, but some phonetic labels are also used; these were introduced for the study of phonetic variation and of phenomena specific to spontaneous speech. In addition, 5 utterances are labeled for intonation with X-JToBI. In the X-JToBI scheme, both the tone labels and the BI (boundary index) labels were considerably extended to capture the paralinguistic features of spontaneous speech intonation.
|