2010 Fiscal Year Final Research Report
Compilation of a balanced book corpus of contemporary written Japanese
Project Area | Compilation of a balanced corpus of written Japanese: Infrastructure for the coming Japanese linguistics |
Project/Area Number |
18061007
|
Research Category |
Grant-in-Aid for Scientific Research on Priority Areas
|
Allocation Type | Single-year Grants |
Review Section |
Humanities and Social Sciences
|
Research Institution | The National Institute for Japanese Language |
Principal Investigator |
YAMAZAKI Makoto The National Institute for Japanese Language, Department of Corpus Studies, Associate Professor (30182489)
|
Co-Investigator (Kenkyū-buntansha) |
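MARUYAMA Takehiko The National Institute for Japanese Language, Department of Corpus Studies, Assistant Professor (90392539)
KASHINO Wakako The National Institute for Japanese Language, Department of Corpus Studies, Associate Professor (50311147)
SANO Motoki The National Institute for Japanese Language, Center for Corpus Development, Project Research Fellow (60455425)
YAMAGUCHI Masaya The National Institute for Japanese Language, Department of Corpus Studies, Assistant Professor (30302920)
MABUCHI Yoko The National Institute for Japanese Language, Center for Corpus Development, Project Research Fellow (10415614)
TAKADA Tomokazu The National Institute for Japanese Language, Department of Linguistic Theory and Structure, Associate Professor (90415612)
OGURA Hideki The National Institute for Japanese Language, Department of Corpus Studies, Associate Professor (00321547)
FUJIIKE Yumi The National Institute for Japanese Language, Center for Corpus Development, Project Research Fellow (20510572)
ONUMA Etsu The National Institute for Japanese Language, Administrative Department, Research Promotion Division, Specialist (00311150)
MORIMOTO Sachiko Gakushuin University, Graduate School of Humanities, Assistant Professor (80342939)
大和 淳 Agency for Cultural Affairs, Commissioner's Secretariat, Copyright Division, Deputy Director (10377103)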
|
Project Period (FY) |
2006 – 2010
|
Keywords | balanced corpus / written language / representativeness / books / sampling / XML / morphological analysis / copyright clearance |
Research Abstract |
We have compiled a large balanced corpus of books that will be a highly useful resource for future research on the Japanese language. It is the first genuinely balanced written corpus in Japan and has the following characteristics: (1) It properly represents the distribution of the population through random sampling. (2) It is segmented into two kinds of word units (short word units and long word units). (3) Text structure, morphological information, and character information are annotated in XML. (4) Copyright permission was sought for every sample as far as possible. The book corpus is the main part of the BCCWJ (Balanced Corpus of Contemporary Written Japanese) and will be open to the public in 2011.
|
Research Products
(35 results)
[Presentation] Design, Compilation, and Preliminary Analyses of Balanced Corpus of Contemporary Written Japanese (2010)
Author(s)
Maekawa, Kikuo, Makoto Yamazaki, Takehiko Maruyama, Masaya Yamaguchi, Hideki Ogura, Wakako Kashino, Toshinobu Ogiso, Hanae Koiso, Yasuharu Den
Organizer
7th International Conference on Language Resources and Evaluation (LREC2010)
Place of Presentation
Mediterranean Conference Centre, Valletta, Malta
Year and Date
2010-05-20
[Presentation] Design of the Balanced Corpus of Contemporary Written Japanese and a Search Demonstration (2007)
Author(s)
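YAMAZAKI Makoto, MARUYAMA Takehiko, YAMAGUCHI Masaya, OGURA Hideki, MORIMOTO Sachiko, KASHINO Wakako, SANO Motoki, TAKADA Tomokazu, MABUCHI Yoko, KITAMURA Masanori, OGISO Toshinobu, KOISO Hanae, FUJIIKE Yumi, ONUMA Etsu, TANAKA Makiro, MAEKAWA Kikuo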
Organizer
The Society for Japanese Linguistics, 2007 Autumn Meeting (Okinawa International University), Proceedings (pp. 239-246)
Place of Presentation
Okinawa International University
Year and Date
2007-11-18
[Remarks] Website, etc.