Feature visualizer and detector for scientific texts
Project/Area Number |
19K00850
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Review Section |
Basic Section 02100: Foreign language education-related
|
Research Institution | The University of Aizu |
Principal Investigator |
BLAKE John The University of Aizu, School of Computer Science and Engineering, Senior Associate Professor (80635954)
|
Co-Investigator (Kenkyū-buntansha) |
Mozgovoy Maxim The University of Aizu, School of Computer Science and Engineering, Associate Professor (60571776)
|
Project Period (FY) |
2019-04-01 – 2022-03-31
|
Project Status |
Granted (Fiscal Year 2020)
|
Budget Amount |
¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000)
Fiscal Year 2021: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2020: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2019: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
|
Keywords | language processing / feature extraction / tense identification / feature visualization / lexical patterns / grammatical patterns / genre / visualization / language features / nlp / iCALL |
Outline of Research at the Start |
This research aims to develop and evaluate an interactive online multimedia tool that can visualize the typical language features in scientific texts written in English. There are two functionalities. (1) The feature visualizer shows and explains commonly-used language features present in a corpus of fully-annotated short research articles. (2) The feature detector identifies core language features in texts submitted by users. This helps students compare their own writing to expected conventions in scientific writing.
|
Outline of Annual Research Achievements |
In the second year, we improved the feature detector by integrating further functionalities, such as tense-aspect identification and various types of information structure analysis. The tense-aspect identification function classifies and labels grammatical tenses using the twelve commonly used terms (e.g. past progressive, future perfect). It also classifies finite verbs by voice, so that feature will also be available to users. The information structure function, which identifies information focus, information flow and end-weight, is currently deployed. The accuracy and precision of both functionalities can be further improved. In the deployed feature detector, users can do the following for any submitted text:
1. Create a text profile using standard lists such as the Academic Word List and the Academic Vocabulary List;
2. Identify particular sets of words, such as TOEIC vocabulary and words related to computer science;
3. Display readability indices (e.g. Gunning Fog and Flesch-Kincaid scores);
4. Show text statistics (e.g. percentage of complex words, average words per sentence);
5. Identify whether sentences are front-heavy or adhere to the end-weight principle;
6. Display the thematic development of successive sentences (e.g. constant or ruptured); and
7. Show the information focus (e.g. new or given information).
Links to the deployed version are available on the homepage of the principal investigator.
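As an illustration of the readability indices and text statistics mentioned above, the following is a minimal Python sketch and not the project's actual implementation. It uses the standard published formulas for the Flesch-Kincaid grade level and the Gunning Fog index. The function names `count_syllables` and `text_profile`, the regular-expression tokenization, and the vowel-group syllable heuristic are all assumptions made for this example; "complex words" are taken here to mean words of three or more syllables, without the usual exclusions for proper nouns or common suffixes.

```python
import re

VOWEL_GROUP = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups and
    discount a trailing silent 'e', except in words ending in 'le'."""
    w = word.lower()
    n = len(VOWEL_GROUP.findall(w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def text_profile(text: str) -> dict:
    """Return basic text statistics and two readability indices."""
    # Naive sentence and word splitting; a real tool would use a tokenizer.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = [count_syllables(w) for w in words]
    complex_words = sum(1 for s in syllables if s >= 3)
    words_per_sentence = len(words) / len(sentences)
    percent_complex = 100 * complex_words / len(words)

    return {
        "sentences": len(sentences),
        "words": len(words),
        "avg_words_per_sentence": words_per_sentence,
        "percent_complex_words": percent_complex,
        # Flesch-Kincaid grade level
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * sum(syllables) / len(words)
        - 15.59,
        # Gunning Fog index
        "gunning_fog": 0.4 * (words_per_sentence + percent_complex),
    }


if __name__ == "__main__":
    sample = "The feature detector identifies core language features. It helps students."
    for key, value in text_profile(sample).items():
        print(f"{key}: {value}")
```

Because the syllable count is heuristic, the scores will differ slightly from those reported by tools that use a pronunciation dictionary.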
|
Current Status of Research Progress |
1: Research has progressed more than it was originally planned.
Reason
Many of the technical challenges have been overcome. The primary focus now is on increasing the accuracy and precision of pattern-matching functions, and increasing the usability of the system.
|
Strategy for Future Research Activity |
In the third year, our focus will be on increasing the usability of both the feature visualizer, which reveals language features in a pre-annotated corpus, and the feature detector, which shows language features in raw text. Functionalities developed for the feature detector that can be adapted for use in the feature visualizer will be identified and incorporated. A systematic evaluation of the system's accuracy, usability and efficacy will be conducted to identify areas for future work.
|
Report
(2 results)
Research Products
(6 results)