2020 Fiscal Year Research-status Report
Feature visualizer and detector for scientific texts
Project/Area Number |
19K00850
|
Research Institution | The University of Aizu |
Principal Investigator |
BLAKE John, The University of Aizu, School of Computer Science and Engineering, Senior Associate Professor (80635954)
|
Co-Investigator(Kenkyū-buntansha) |
Mozgovoy Maxim, The University of Aizu, School of Computer Science and Engineering, Associate Professor (60571776)
|
Project Period (FY) |
2019-04-01 – 2022-03-31
|
Keywords | language processing / feature extraction / tense identification / feature visualization |
Outline of Annual Research Achievements |
In the second year, we set out to improve the feature detector by integrating further functions, such as tense-aspect identification and several types of information structure, and achieved this aim. The tense-aspect identification function classifies and labels grammatical tenses using the twelve commonly used terms (e.g. past progressive, future perfect). It also classifies finite verbs by voice, so this feature will also be available to users. The information structure function, which identifies information focus, information flow and end-weight, is currently deployed. For both functions, accuracy and precision can be further improved. In the deployed feature detector, users can do the following for any submitted text:
1. Create a text profile using standard lists such as the Academic Word List and the Academic Vocabulary List;
2. Identify particular sets of words, such as TOEIC vocabulary and words related to computer science;
3. Display readability indices (e.g. Gunning Fog and Flesch-Kincaid scores);
4. Show text statistics (e.g. percentage of complex words, average words per sentence);
5. Identify whether sentences are front-heavy or adhere to the end-weight principle;
6. Display the thematic development of subsequent sentences (e.g. constant or ruptured); and
7. Show the information focus (e.g. new or given information).
Links to the deployed version are available on the principal investigator's homepage.
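A minimal Python sketch of how rule-based tense-aspect and voice labelling of a finite verb group might work is given below. It is an illustration, not the project's implementation. It assumes the verb group has already been extracted and tagged with Penn Treebank part-of-speech tags (for example by an off-the-shelf tagger), it treats only "will" and "shall" as future markers, and the function name classify_verb_group is hypothetical.

```python
# Hypothetical sketch: label a pre-extracted, Penn-Treebank-tagged verb group
# with one of the twelve tense-aspect terms plus voice. Not the project's code.
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}
HAVE_FORMS = {"have", "has", "had", "having"}


def classify_verb_group(chain):
    """chain: list of (word, tag) pairs, e.g. [("was", "VBD"), ("written", "VBN")].

    Returns (tense_aspect_label, voice), or None when the group falls outside
    the twelve-term scheme (e.g. modals such as "could" or "would").
    """
    words = [w.lower() for w, _ in chain]
    tags = [t for _, t in chain]

    # Time reference comes from the first (finite) element of the group.
    if tags[0] == "MD":
        if words[0] not in {"will", "shall"}:
            return None  # assumption: other modals are not labelled
        time = "future"
    elif tags[0] == "VBD":
        time = "past"
    elif tags[0] in {"VBP", "VBZ"}:
        time = "present"
    else:
        return None

    # Perfect aspect: a form of HAVE directly followed by a past participle.
    perfect = any(words[i - 1] in HAVE_FORMS and tags[i] == "VBN"
                  for i in range(1, len(chain)))
    # Progressive aspect: a form of BE directly followed by an -ing participle.
    progressive = any(words[i - 1] in BE_FORMS and tags[i] == "VBG"
                      for i in range(1, len(chain)))
    # Passive voice: the group ends in a past participle preceded by a BE form.
    passive = len(chain) > 1 and tags[-1] == "VBN" and words[-2] in BE_FORMS

    if perfect and progressive:
        aspect = "perfect progressive"
    elif perfect:
        aspect = "perfect"
    elif progressive:
        aspect = "progressive"
    else:
        aspect = "simple"
    return f"{time} {aspect}", "passive" if passive else "active"


print(classify_verb_group([("is", "VBZ"), ("being", "VBG"), ("written", "VBN")]))
```

Here "is being written" is labelled present progressive in the passive voice. Working from tagged auxiliary chains keeps the rules short, but tagging errors propagate directly into the labels, which is one reason pattern-matching accuracy remains a focus.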
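The readability indices and text statistics in items 3 and 4 can be sketched with the standard Flesch-Kincaid grade-level formula (0.39 x words per sentence + 11.8 x syllables per word - 15.59) and the Gunning Fog formula (0.4 x (words per sentence + percentage of complex words)). The detector's own counting rules are not described in this report, so the following are assumptions: syllables are counted with a rough vowel-group heuristic, sentences are split naively on terminal punctuation, and every word of three or more syllables counts as complex (the original Fog definition excludes some proper nouns, compounds and inflected forms). The name text_statistics is hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    # Heuristic (assumption): count vowel groups, then drop a silent final "e".
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def text_statistics(text: str) -> dict:
    # Naive sentence split on ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    n_sentences = max(len(sentences), 1)  # guard against division by zero
    n_words = max(len(words), 1)

    syllables = [count_syllables(w) for w in words]
    complex_words = sum(1 for s in syllables if s >= 3)
    words_per_sentence = len(words) / n_sentences
    pct_complex = 100 * complex_words / n_words

    return {
        "sentences": len(sentences),
        "words": len(words),
        "avg_words_per_sentence": words_per_sentence,
        "pct_complex_words": pct_complex,
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * sum(syllables) / n_words - 15.59,
        "gunning_fog": 0.4 * (words_per_sentence + pct_complex),
    }


stats = text_statistics("The cat sat on the mat. It was happy.")
print(stats["avg_words_per_sentence"], stats["gunning_fog"])
```

Because both indices share the words-per-sentence term, the same counts can feed the statistics display and the readability display, which keeps the two views consistent with each other.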
|
Current Status of Research Progress |
1: Research has progressed further than originally planned.
Reason
Many of the technical challenges have been overcome. The primary focus now is on increasing the accuracy and precision of pattern-matching functions, and increasing the usability of the system.
|
Strategy for Future Research Activity |
In the third year, our focus will be on improving the usability of both the text visualizer, which reveals language features in a pre-annotated corpus, and the text detector, which shows language features in raw text. Functions developed for the text detector that can be adapted for use in the text visualizer will be identified and incorporated. A systematic evaluation of accuracy, usability and efficacy will be conducted to identify areas for future work.
|
Causes of Carryover |
Payment is to be made for services that have been received but not yet invoiced.
|