2005 Fiscal Year Final Research Report Summary
Annotation and Computer Processing of Language Resources in Non-Latin Scripts and Phonetic Transcription
Project/Area Number | 15202008 |
Research Category | Grant-in-Aid for Scientific Research (A) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Linguistics |
Research Institution | The University of Tokyo |
Principal Investigator |
MATSUMURA Kazuto, The University of Tokyo, Graduate School of Humanities and Sociology, Professor (40165866)
|
Co-Investigators (Kenkyū-buntansha) |
FUKUI Rei, The University of Tokyo, Graduate School of Humanities and Sociology, Associate Professor (50199189)
TAKIZAWA Naohiro, Nagoya University, Graduate School of International Development, Professor (60252285)
YAMADA Hisanari, Otaru University of Commerce, Center for Language Studies, Associate Professor (60345246)
CHIBA Shoju, Reitaku University, College of Foreign Studies, Associate Professor (70337723)
HATANO Toshie, The University of Tokyo, Graduate School of Humanities and Sociology, Research Associate (40376520)
|
Project Period (FY) | 2003 – 2005 |
Keywords | phonetic alphabet / Cyrillic / corpus / endangered language / markup / multilingual computing / language resources / Unicode |
Research Abstract |
The main objective of this three-year project was to digitize linguistic resources for endangered and minority languages of Russia and neighboring countries. Most of the resources digitized during the project are texts written in some variety of Cyrillic script or phonetic transcriptions of recorded speech. The languages covered were Avar (Daghestan), Itelmen (Kamchatka), and the Uralic languages Estonian, Mari, and Vepsian. Every digitized text or linguistic document is encoded in UTF-8, currently the most widely used Unicode encoding. For this purpose, a high-quality OpenType font named JLOT-Fluralic was created in collaboration with Finnish colleagues; it contains all the glyphs of the Uralic Phonetic Alphabet (UPA) as well as the Cyrillic characters defined in the Unicode Standard 4.0. An open hands-on seminar on the XML markup of texts was also held. Most of the linguistic documents created in this project were given XML markup and converted into well-formed XML documents. Such digitized resources constitute a major part of the linguistic documentation of endangered and minority languages. To obtain a concrete picture of minority-language communities, visits were made to active community members and researchers in local speech communities: the city of Hitoyoshi in Kumamoto Prefecture (Kuma dialect), and Naha, Ginowan, and Itoman in Okinawa Prefecture (Okinawan).
|
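The abstract states that every text was encoded in UTF-8 and that the marked-up documents are well-formed XML. As an illustration only, the following is a minimal sketch of how those two properties might be checked with Python's standard library; it is not the project's own tooling. The element names `<text>`, `<s>` and `<w>` and the sample strings are hypothetical and do not reproduce the project's annotation scheme. The sketch also shows why UTF-8 byte counts differ by script: ASCII takes 1 byte, Cyrillic (such as the palochka, U+04C0, used in Avar orthography) takes 2, and UPA letters in the Phonetic Extensions block (such as U+1D00) take 3.

```python
import xml.etree.ElementTree as ET


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def is_well_formed(data: bytes) -> bool:
    """Return True if the bytes parse as a well-formed XML document.

    This checks well-formedness only; no DTD or schema validation is done.
    """
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


def utf8_widths(text: str) -> list[tuple[str, int]]:
    """Pair each character with the number of bytes it occupies in UTF-8."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


# Hypothetical markup: element names are illustrative, not the project's scheme.
# The <w> content is an arbitrary Cyrillic string containing the palochka (U+04C0).
GOOD = "<text><s><w>\u0433\u04c0\u0430</w></s></text>".encode("utf-8")
# Malformed on purpose: <w> is never closed before </s>.
BAD = "<text><s><w>\u1d00</s></text>".encode("utf-8")

if __name__ == "__main__":
    print(is_valid_utf8(GOOD), is_well_formed(GOOD))  # True True
    print(is_well_formed(BAD))  # False
    # ASCII 'a', Cyrillic palochka, UPA small-capital A: widths 1, 2, 3
    print(utf8_widths("a\u04c0\u1d00"))
```

Well-formedness is the weakest useful guarantee for marked-up texts: it confirms that tags nest and close correctly, but says nothing about whether the annotation follows a particular schema.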
Research Products
(42 results)
[Book] 言語学 (Linguistics), 2nd edition, 2004
Author(s)
MATSUMURA Kazuto (松村一登)
Total Pages
272
Publisher
University of Tokyo Press
Description
From the Research Report Summary (Japanese version)