Summary of Research Achievements |
As its title 良質な用例を大規模なコーパスから自動的に抽出できるモデルの構築および試作版の開発 ("Construction of a model for automatically extracting good usage examples from large corpora, and development of a prototype") suggests, this project aims to develop a model for selecting pedagogically valid Japanese example sentences from a general corpus. As a first step toward a filter for such sentences, we began investigating automatically measurable criteria of readability, typicality and informativity. We collected example sentences from learners' dictionaries, reference works and graded readers, and are constructing a graded corpus of example sentences. This corpus will serve as a data set for verifying the usability of existing readability formulas on single sentences or short usage examples for learners of Japanese as a foreign language.
|
Current Status of Research Progress (Category) |
2: Progressing rather smoothly
Reason
We collected example sentences from learners' dictionaries, reference works and graded readers, and are constructing a graded corpus of example sentences. This corpus will serve as a data set for verifying the usability of existing readability formulas on single sentences or short usage examples for learners of Japanese as a foreign language.
|
Strategy for Future Research Activity |
First, we plan to verify the usability of existing readability formulas on the graded corpus of example sentences. Second, using the same data set, we plan to develop a methodology for assessing the typicality of example candidates by comparing their syntactic and collocational patterns to those found in NINJAL-LWP for BCCWJ. Third, to assess the informativity of example sentence candidates, we plan to collect examples of different lengths from other corpora, annotate the informativity level of each example sentence, investigate measurable criteria (including the presence of typical syntactic patterns and their elements, and the proportion of proper nouns, pronouns, etc.), and produce a statistical model for the assessment of informativity. Finally, we plan to implement this model in an openly accessible online example search system.
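As a rough illustration of the kind of measurable criteria mentioned above, the following minimal sketch computes sentence length and the proportions of proper nouns and pronouns for one example sentence. It is not the project's method: the function name `informativity_features`, the coarse part-of-speech labels `PROPN` and `PRON`, and the assumption that each sentence arrives already tokenized and tagged as `(surface, pos)` pairs are all hypothetical choices made for this example.

```python
from collections import Counter

# Hypothetical coarse POS labels; a real tagger's tag set would differ.
PROPER_NOUN = "PROPN"
PRONOUN = "PRON"


def informativity_features(tagged_tokens):
    """Return simple per-sentence features from (surface, pos) pairs.

    The features are illustrative candidates only: token count and the
    proportions of proper nouns and pronouns among all tokens.
    """
    total = len(tagged_tokens)
    if total == 0:
        # Avoid division by zero for empty input.
        return {"length": 0, "proper_noun_ratio": 0.0, "pronoun_ratio": 0.0}
    counts = Counter(pos for _, pos in tagged_tokens)
    return {
        "length": total,
        "proper_noun_ratio": counts[PROPER_NOUN] / total,
        "pronoun_ratio": counts[PRONOUN] / total,
    }


# A pre-tagged example sentence (the tags are assigned by hand here).
example = [
    ("彼", "PRON"), ("は", "ADP"), ("東京", "PROPN"), ("に", "ADP"),
    ("住ん", "VERB"), ("で", "SCONJ"), ("いる", "AUX"), ("。", "PUNCT"),
]
features = informativity_features(example)
```

Feature dictionaries of this kind could then be paired with the annotated informativity levels and passed to whatever statistical model is eventually chosen; the sketch deliberately stops before that step.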
|