2019 Fiscal Year Research-status Report
Using Google Translate for Academic English Writing Instruction
Project/Area Number: 18K00656
Research Institution: The University of Aizu
Principal Investigator: Heo Younghyon, The University of Aizu, School of Computer Science and Engineering, Senior Associate Professor (10631476)
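Co-Investigator (Kenkyū-buntansha):
Perkins Jeremy, The University of Aizu, School of Computer Science and Engineering, Senior Associate Professor (30725635)
Paik Incheon, The University of Aizu, School of Computer Science and Engineering, Professor (70336478)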
Project Period (FY): 2018-04-01 – 2021-03-31
Keywords: machine-translated text / keyword analysis
Outline of Annual Research Achievements
We first compared two text classification approaches based on different machine learning techniques: a deep learning model (Deep MPN) and a conventional supervised learning model, a support vector machine (SVM) with keyword analysis (TF-IDF). The SVM with keyword analysis identified machine-translated texts more accurately, with accuracy rates above 80% in every condition, whereas the accuracy of Deep MPN ranged from only 49.9% to 66.7%. This suggests that analyzing word use (keyword or n-gram analysis) is the key to detecting machine-translated texts with machine learning. In the next stage, to identify the best candidate model for this analysis, we calculated the accuracy of document classification and the similarity of feature vectors for each number of words, using two types of n-gram (unigram and bigram). For both n-gram types, setting the document size to a minimum of 50 sentences and the number of words to 1,200 yielded high classification accuracy (unigram: 0.98; bigram: 0.979). We therefore concluded that the best detection model can be built with a document size of 50 sentences and a word size of 1,200.
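The classification step summarized above (TF-IDF keyword features with unigrams or bigrams, documents built from a fixed number of sentences, a capped number of words, and a linear SVM) can be sketched as follows. This is a minimal, dependency-free Python sketch under stated assumptions, not the authors' implementation. It assumes that the "number of words" means the vocabulary size (the 1,200 most frequent n-grams) and that a "document" is a group of consecutive sentences. The IDF formula is one common smoothed variant, and the SVM is trained with a Pegasos-style stochastic sub-gradient solver chosen only to keep the example self-contained. All function names (`make_documents`, `fit_tfidf`, `transform`, `train_linear_svm`, `predict`, `accuracy`) are hypothetical, and the Deep MPN model is not reproduced.

```python
import math
import random
from collections import Counter


def make_documents(sentences, sentences_per_doc=50):
    """Group consecutive sentences into fixed-size documents (short remainder dropped).
    Assumed reading of the report's 'document size'."""
    last_start = len(sentences) - sentences_per_doc
    return [" ".join(sentences[i:i + sentences_per_doc])
            for i in range(0, last_start + 1, sentences_per_doc)]


def ngrams(text, n=1):
    """Lower-cased, whitespace-tokenised word n-grams (n=1 unigram, n=2 bigram)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def fit_tfidf(docs, n=1, max_terms=1200):
    """Learn a vocabulary of the max_terms most frequent n-grams and their IDF weights.
    max_terms is an assumed reading of the report's 'number of words'."""
    counts, doc_freq = Counter(), Counter()
    for doc in docs:
        grams = ngrams(doc, n)
        counts.update(grams)
        doc_freq.update(set(grams))
    # Most frequent first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts, key=lambda term: (-counts[term], term))
    vocab = {term: i for i, term in enumerate(ranked[:max_terms])}
    num_docs = len(docs)
    # Smoothed IDF: log((1 + N) / (1 + df)) + 1, one common variant.
    idf = {term: math.log((1 + num_docs) / (1 + doc_freq[term])) + 1 for term in vocab}
    return vocab, idf


def transform(docs, vocab, idf, n=1):
    """Map each document to an L2-normalised TF-IDF vector over the fitted vocabulary."""
    vectors = []
    for doc in docs:
        v = [0.0] * len(vocab)
        for term, count in Counter(ngrams(doc, n)).items():
            if term in vocab:  # n-grams outside the vocabulary are ignored
                v[vocab[term]] = count * idf[term]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM (hinge loss + L2 penalty, no bias) trained with Pegasos-style
    stochastic sub-gradient steps. Labels: +1 machine-translated, -1 human-written."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * dot(w, X[i])  # margin at the current weights
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the L2 term
            if margin < 1.0:  # hinge loss active: step towards the correct side
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, X):
    """+1 (machine-translated) if the linear score is positive, else -1."""
    return [1 if dot(w, x) > 0 else -1 for x in X]


def accuracy(predicted, gold):
    """Fraction of documents whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

For example, calling `fit_tfidf(train_docs, n=2, max_terms=1200)`, then `transform` and `train_linear_svm`, mirrors the bigram, 1,200-word setting described above; accuracy should be measured with `accuracy` on held-out documents rather than on the training set. In practice, a library implementation such as scikit-learn's `TfidfVectorizer` (with `ngram_range` and `max_features`) together with `LinearSVC` would be the more usual choice.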
Current Status of Research Progress
2: Research has progressed on the whole more than it was originally planned.
Reason
In this research, two linguists and one computer scientist work together as a team. The research methodology and results involve technical terms and concepts from computer science that are quite challenging for the linguists to understand, so we rely heavily on the computer scientist's general interpretation of the results. By making a deliberate effort to communicate in general terms, team members have managed to discuss the experimental setup and results with one another. We believe that drawing linguistic implications from highly technical experimental results would be much easier if communication were smoother, for example with a team member who has expertise in both linguistics and computer science.
Strategy for Future Research Activity
Based on the results of our 2018 and 2019 studies, we will develop class materials that teach how to use Google Translate properly for academic English writing, for use in the Thesis Writing and Presentation class for fourth-year students at our university. The materials will consist of four parts: 1) machine learning detection of Google-translated documents, 2) linguistic features of machine-translated texts, 3) how to use Google Translate properly, and 4) how to use Google Translate to develop writing skills. When teaching its proper use, we will advise students based on our 2018 and 2019 findings. Over several sessions using these materials, students will also practice using Google Translate on sentences from their own thesis drafts. We expect that learning the features of Google-translated sentences, and effective ways to produce natural sentences with Google Translate, will help students write better English in their theses.
Causes of Carryover
We originally planned to attend two conferences in FY2019 but attended only one. In FY2020, we plan to spend more of the budget on creating high-quality teaching materials and to attend more than two international or domestic conferences to report on the overall results of the entire three-year project.
Research Products
(2 results)