FY2018 Annual Research Report

Construction of a model for automatically extracting good example sentences from a large-scale corpus, and development of a prototype

Research Project

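Project/Area Number	18F18808
Research Institution	National Institute for Japanese Language and Linguistics, National Institutes for the Humanities (Inter-University Research Institute Corporation)
Principal Investigator	PARDESHI P.V. National Institute for Japanese Language and Linguistics, National Institutes for the Humanities, Theory & Typology Division, Professor (00374984)
Co-Investigator	HMELJAK MARIJA National Institute for Japanese Language and Linguistics, National Institutes for the Humanities, Theory & Typology Division, International Research Fellow
Project Period (FY)	2018-11-09 – 2020-03-31
Keywords	example sentences / learners' dictionary / lexicography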
Outline of Research Achievements	As the title suggests, the aim of this project is to develop a model for selecting pedagogically valid Japanese example sentences from a general corpus. To build such a filter, we began investigating automatically measurable criteria of readability, typicality and informativity. We collected example sentences from learners' dictionaries, reference works and graded readers, and are constructing a graded corpus of example sentences. This corpus will serve as a data set for verifying the usability of existing readability formulas on single sentences or short usage examples for learners of Japanese as a foreign language.
Current Status of Research Progress (Category)	2: Progressing generally smoothly. Reason: We collected example sentences from learners' dictionaries, reference works and graded readers, and are constructing a graded corpus of example sentences. This corpus will serve as a data set for verifying the usability of existing readability formulas on single sentences or short usage examples for learners of Japanese as a foreign language.
Strategy for Future Research Activity	Firstly, we plan to verify the usability of existing readability formulas on the graded corpus of example sentences. Secondly, using the same data set, we plan to develop a methodology for assessing the typicality of example candidates by comparing their syntactic and collocational patterns to those found in NINJAL-LWP for BCCWJ. Thirdly, to assess the informativity of example sentence candidates, we plan to collect examples of different lengths from other corpora, annotate the informativity level of each example sentence, investigate measurable criteria (including the presence of typical syntactic patterns and their elements, and the proportion of proper nouns, pronouns, etc.) and produce a statistical model for the assessment of informativity. Finally, we plan to implement this model in an openly accessible online example search system.
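
The measurable criteria mentioned for informativity (sentence length, proportion of proper nouns and pronouns) could be computed along the following lines. This is only a minimal sketch under stated assumptions, not the project's model: the input is assumed to be already tokenized and tagged, the tag names `PROPN` and `PRON` are illustrative, and the thresholds in `passes_filter` are hypothetical placeholders standing in for the statistical model that the report plans to develop.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A token is (surface form, coarse part-of-speech tag).
# ASSUMPTION: the tag names are illustrative; a real pipeline would take
# them from whatever morphological analyzer is used.
Token = Tuple[str, str]


@dataclass
class CandidateFeatures:
    length: int               # number of tokens in the candidate sentence
    proper_noun_ratio: float  # share of tokens tagged as proper nouns
    pronoun_ratio: float      # share of tokens tagged as pronouns


def extract_features(tokens: List[Token]) -> CandidateFeatures:
    """Compute simple, automatically measurable features of one candidate."""
    n = len(tokens)
    if n == 0:
        return CandidateFeatures(0, 0.0, 0.0)
    proper = sum(1 for _, pos in tokens if pos == "PROPN")
    pronouns = sum(1 for _, pos in tokens if pos == "PRON")
    return CandidateFeatures(n, proper / n, pronouns / n)


def passes_filter(
    f: CandidateFeatures,
    min_len: int = 5,          # hypothetical threshold
    max_len: int = 25,         # hypothetical threshold
    max_proper: float = 0.2,   # hypothetical threshold
    max_pronoun: float = 0.2,  # hypothetical threshold
) -> bool:
    """Threshold stand-in for a trained informativity model."""
    return (
        min_len <= f.length <= max_len
        and f.proper_noun_ratio <= max_proper
        and f.pronoun_ratio <= max_pronoun
    )


if __name__ == "__main__":
    # Hand-tagged example: 彼は東京でそれを買った
    example = [
        ("彼", "PRON"), ("は", "ADP"), ("東京", "PROPN"), ("で", "ADP"),
        ("それ", "PRON"), ("を", "ADP"), ("買っ", "VERB"), ("た", "AUX"),
    ]
    features = extract_features(example)
    print(features, passes_filter(features))
```

In the hand-tagged example, two of the eight tokens are pronouns, so the candidate exceeds the placeholder pronoun threshold and is rejected. In an actual system, the annotated informativity levels described above would be used to fit the model rather than fixing thresholds by hand.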