2005 Fiscal Year Final Research Report Summary

Annotation and Computer Processing of Language Resources in Non-Latin Scripts and Phonetic Transcription

Research Project

Project/Area Number	15202008
Research Category	Grant-in-Aid for Scientific Research (A)
Allocation Type	Single-year Grants
Section	General
Research Field	Linguistics
Research Institution	The University of Tokyo
Principal Investigator	MATSUMURA Kazuto, The University of Tokyo, Graduate School of Humanities and Sociology, Professor (40165866)
Co-Investigator(Kenkyū-buntansha)	FUKUI Rei, The University of Tokyo, Graduate School of Humanities and Sociology, Associate Professor (50199189); TAKIZAWA Naohiro, Nagoya University, Graduate School of International Development, Professor (60252285); YAMADA Hisanari, Otaru University of Commerce, Center for Language Studies, Associate Professor (60345246); CHIBA Shoju, Reitaku University, College of Foreign Studies, Associate Professor (70337723); HATANO Toshie, The University of Tokyo, Graduate School of Humanities and Sociology, Research Associate (40376520)
Project Period (FY)	2003 – 2005
Keywords	phonetic alphabet / Cyrillic / corpus / endangered language / markup / multilingual computing / Language resources / Unicode
Research Abstract	The main objective of this three-year project was to digitize linguistic resources of endangered and minority languages of Russia and neighboring countries. Most of the resources digitized in the course of the project are texts written in some variety of Cyrillic script or phonetic transcriptions of recorded speech. The languages concerned were Avar (Daghestan), Itelmen (Kamchatka), and Uralic languages (Estonian, Mari, Vepsian). Each digitized text or linguistic document is encoded in UTF-8 (currently the most common Unicode encoding). A high-quality OpenType font, JLOT-Fluralic, containing all the glyphs of the Uralic Phonetic Alphabet (UPA) as well as the Cyrillic characters defined in the Unicode Standard 4.0, was created for this purpose in collaboration with Finnish colleagues. An open hands-on seminar was held on the XML markup of texts. Most of the linguistic documents created in the project were given XML markup and converted into well-formed XML documents. Digitized linguistic resources constitute a major part of the linguistic documentation of endangered and minority languages. To obtain a concrete picture of minority-language communities, visits were paid to active members and researchers in local speech communities: the city of Hitoyoshi, Kumamoto Prefecture (Kuma dialect), and Naha, Ginowan, and Itoman in Okinawa Prefecture (Okinawan).
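
The two technical properties named in the abstract, UTF-8 encoding and XML well-formedness, can be illustrated with a minimal sketch. It does not reproduce the project's actual markup scheme or tools: the element names (`text`, `s`), the attribute `n`, the placeholder Cyrillic word, and the helper `is_well_formed` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes: bytes) -> bool:
    """Return True if the byte string parses as well-formed XML."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError:
        return False
    return True


# Hypothetical markup: element and attribute names are illustrative only,
# and the Cyrillic word is a placeholder, not data from the project.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<text xml:lang="ru"><s n="1">Пример</s></text>'
).encode("utf-8")

# Same content with a missing </s> end tag, so it is not well-formed.
BROKEN = "<text><s>Пример</text>".encode("utf-8")

if __name__ == "__main__":
    print(is_well_formed(SAMPLE))          # True
    print(is_well_formed(BROKEN))          # False
    # Each basic Cyrillic letter occupies two bytes in UTF-8.
    print(len("Пример".encode("utf-8")))   # 12
```

Well-formedness only guarantees correct nesting and syntax; it says nothing about whether a document follows a particular schema, which is why the sketch uses a plain parser rather than a validator.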

Research Products
(42 results)

Journal Article (35 results), Book (7 results)

[Journal Article] マリ語の言語資料とその電子化 (2006)
- Author(s)
  松村一登
- Journal Title
  Uralica 14
  Pages: 45-56
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] 『青空文庫』を言語コーパスとして使おう-メタデータ構築による歴史的・社会言語学的研究への応用の試み- (2006)
- Author(s)
  千葉庄寿
- Journal Title
  言語処理学会第12回年次大会発表論文集
  Pages: 915-918
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] Digitization of Mari linguistic resources (2006)
- Author(s)
  Kazuto Matsumura
- Journal Title
  Uralica 14
  Pages: 45-56
- Description
  From the Research Report Summary (English version)
[Journal Article] How to use Aozora Bunko as linguistic corpora: building and utilizing metadata for diachronic and sociolinguistic study (2006)
- Author(s)
  Shoju Chiba
- Journal Title
  Proceedings of the 12th Annual Meeting of the Association for Natural Language Processing
  Pages: 915-918
- Description
  From the Research Report Summary (English version)
[Journal Article] A corpus-based study of the 'haven't NP' pattern in American English (2005)
- Author(s)
  滝沢直宏
- Journal Title
  Aspects of English Negation
  Pages: 159-171
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] コーパスと言語研究 (2005)
- Author(s)
  滝沢直宏
- Journal Title
  日語教育 32
  Pages: 3-20
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] イテリメン語テキスト1 (2005)
- Author(s)
  小野智香子
- Journal Title
  環北太平洋の言語 12
  Pages: 81-88
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] 1930年代のカルムイクにおける言語政策 (2005)
- Author(s)
  荒井幸康
- Journal Title
  日本モンゴル学会紀要 35
  Pages: 41-56
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] 1930年代のブリヤートの言語政策 (2005)
- Author(s)
  荒井幸康
- Journal Title
  スラヴ研究 52
  Pages: 145-176
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] A corpus-based study of the 'haven't NP' pattern in American English (2005)
- Author(s)
  Naohiro Takizawa
- Journal Title
  Aspects of English Negation
  Pages: 159-171
- Description
  From the Research Report Summary (English version)
[Journal Article] Corpora and Linguistic Studies (2005)
- Author(s)
  Naohiro Takizawa
- Journal Title
  Journal of Japanese Language Education Association 32
  Pages: 3-20
- Description
  From the Research Report Summary (English version)
[Journal Article] Itelmen Text 1 (2005)
- Author(s)
  Chikako Ono
- Journal Title
  Languages of the North Pacific Rim 12
  Pages: 81-88
- Description
  From the Research Report Summary (English version)
[Journal Article] 周辺的な構文を記述するためのコーパス利用-現代英語におけるSOV構文を例に- (2004)
- Author(s)
  滝沢直宏
- Journal Title
  英語コーパス研究 11
  Pages: 153-167
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] 形態情報注釈入りロシア語コーパス作成のためのツール (2004)
- Author(s)
  山田久就
- Journal Title
  ロシア語ロシア文学研究 36
  Pages: 111-118
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] The Cognitive Unit of Segmentation for Speech in Japanese (2004)
- Author(s)
  畑野智栄
- Journal Title
  The 18th International Congress on Acoustics
  Pages: Th. P3.15
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] Itelmen Verb stem: Morphological Features and Syntactic structure of Intransitive and Transitive (2004)
- Author(s)
  小野智香子
- Journal Title
  Languages of the North Pacific Rim 9
  Pages: 169-177
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] イテリメン語の両唇軟口蓋音について (2004)
- Author(s)
  小野智香子
- Journal Title
  環北太平洋の言語 11
  Pages: 79-90
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] A Corpus-Based Description of Peripheral Linguistic Phenomena: With Special Reference to the SOV Construction in Present-Day English (2004)
- Author(s)
  Naohiro Takizawa
- Journal Title
  English Corpus Studies 11
  Pages: 153-167
- Description
  From the Research Report Summary (English version)
[Journal Article] Tools for building morphologically annotated corpora of Russian (2004)
- Author(s)
  Hisanari Yamada
- Journal Title
  Bulletin of the Japan Association for the Study of Russian Language and Literature 36
  Pages: 111-118
- Description
  From the Research Report Summary (English version)
[Journal Article] The cognitive unit of segmentation for speech in Japanese (2004)
- Author(s)
  Toshie Hatano
- Journal Title
  The 18th International Congress on Acoustics
  Pages: Th.3.15
- Description
  From the Research Report Summary (English version)
[Journal Article] Itelmen verb stem: morphological features and syntactic structure of Intransitive and Transitive (2004)
- Author(s)
  Chikako Ono
- Journal Title
  Languages of the North Pacific Rim 9
  Pages: 169-177
- Description
  From the Research Report Summary (English version)
[Journal Article] Itelmen labial-velar fricative and approximant (2004)
- Author(s)
  Chikako Ono
- Journal Title
  Languages of the North Pacific Rim 11
  Pages: 79-90
- Description
  From the Research Report Summary (English version)
[Journal Article] Pitch accent systems in Korean (2003)
- Author(s)
  福井玲
- Journal Title
  Proceedings of the Symposium: Cross-linguistic Studies of Tonal Phenomena. ILCAA
  Pages: 275-286
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] ロシア北東部における先住少数民族の言語使用 (2003)
- Author(s)
  小野智香子
- Journal Title
  ことばと社会 7
  Pages: 63-87
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] カムチャッカの自然とともに生きる-イテリメン (2003)
- Author(s)
  小野智香子
- Journal Title
  北のことばフィールド・ノート-18の言語と文化-
  Pages: 119-134
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] Pitch accent systems in Korean (2003)
- Author(s)
  Rei Fukui
- Journal Title
  Proceedings of the Symposium: Cross-linguistic Studies of Tonal Phenomena. ILCAA
  Pages: 275-286
- Description
  From the Research Report Summary (English version)
[Journal Article] Language use of the indigenous minority peoples in the north-eastern part of Russia (2003)
- Author(s)
  Chikako Ono
- Journal Title
  Language and Society 7
  Pages: 63-87
- Description
  From the Research Report Summary (English version)
[Journal Article] To Live in the Nature of Kamchatka-Itelmen (2003)
- Author(s)
  Chikako Ono
- Journal Title
  Field notes on Northern Languages-18 languages and cultures
  Pages: 119-134
- Description
  From the Research Report Summary (English version)
[Journal Article] カルムィクのことば (2002)
- Author(s)
  荒井幸康
- Journal Title
  日本モンゴル学会紀要 32
  Pages: 13-27
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] Mapping (in)direct causation: a corpus-based approach to the Finnish causative constructions
- Author(s)
  千葉庄寿
- Journal Title
  東北大学言語学論集 (in press)
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] 構造化された言語データが言語研究にもたらすもの-コーパスを利用する言語研究者の知識基盤としてのXML-
- Author(s)
  千葉庄寿
- Journal Title
  麗澤大学紀要 82 (in press)
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] イテリメン語の動詞語幹の分類とその派生法
- Author(s)
  小野智香子
- Journal Title
  環北太平洋の言語 13 (in press)
- Description
  From the Research Report Summary (Japanese version)
[Journal Article] Mapping (in)direct causation: a corpus-based approach to the Finnish causative constructions
- Author(s)
  Shoju Chiba
- Journal Title
  Tohoku University Linguistics Journal (in press)
- Description
  From the Research Report Summary (English version)
[Journal Article] Structured electronic data and corpus-based research: XML as (coming) technological core for linguists
- Author(s)
  Shoju Chiba
- Journal Title
  Reitaku University Journal 82 (in press)
- Description
  From the Research Report Summary (English version)
[Journal Article] Classification of Itelmen verb stems and their derivation system
- Author(s)
  Chikako Ono
- Journal Title
  Languages of the North Pacific Rim 13 (in press)
- Description
  From the Research Report Summary (English version)
[Book] コーパスで一目瞭然 (2006)
- Author(s)
  滝沢直宏
- Total Pages
  207
- Publisher
  小学館
- Description
  From the Research Report Summary (Japanese version)
[Book] 言語の統合と分離-1920-1940年代のモンゴル・ブリヤート・カルムイクの言語政策の相関関係を中心に- (2006)
- Author(s)
  荒井幸康
- Total Pages
  251
- Publisher
  三元社
- Description
  From the Research Report Summary (Japanese version)
[Book] A corpus is Indeed Informative! (2006)
- Author(s)
  Naohiro Takizawa
- Total Pages
  207
- Publisher
  Shogakukan
- Description
  From the Research Report Summary (English version)
[Book] 言語学 第2版 (2004)
- Author(s)
  松村一登
- Total Pages
  272
- Publisher
  東京大学出版会
- Description
  From the Research Report Summary (Japanese version)
[Book] Linguistics: an introduction, 2nd Edition (2004)
- Author(s)
  Kazuto Matsumura
- Total Pages
  272
- Publisher
  University of Tokyo Press
- Description
  From the Research Report Summary (English version)
[Book] 文科系研究者のための多言語処理入門-Windows XP環境を例に-
- Author(s)
  千葉庄寿
- Publisher
  麗澤大学言語研究センター (in press)
- Description
  From the Research Report Summary (Japanese version)
[Book] An Introduction to Multilingual Computing for the Humanities: Handling Multilingual Texts with Windows XP
- Author(s)
  Shoju Chiba
- Publisher
  Linguistic Research Center, Reitaku University (in press)
- Description
  From the Research Report Summary (English version)
