Project/Area Number |
18K00656
|
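Research Institution | University of Aizu |
Principal Investigator |
Heo Younghyon, University of Aizu, School of Computer Science and Engineering, Senior Associate Professor (10631476)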
|
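Co-Investigators |
Perkins Jeremy, University of Aizu, School of Computer Science and Engineering, Senior Associate Professor (30725635)
Paik Incheon, University of Aizu, School of Computer Science and Engineering, Professor (70336478)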
|
Project Period (FY) |
2018-04-01 – 2021-03-31
|
Keywords | machine-translated text |
Summary of Research Achievements |
In 2018, we examined the features that distinguish machine-translated texts from texts written in English by Japanese L2 speakers. Using LimeSurvey, we asked participants to distinguish machine-translated scientific abstracts from L2 human-written scientific abstracts from Japan. Three surveys were prepared, each with five machine-translated and five human-written abstracts. Twenty-four participants each judged ten abstracts and gave the basis for each judgment. Accuracy was 51%, which is essentially chance level. Teachers noted that machine-translated texts contained long sentences with many subordinate clauses; incoherent sentences or paragraphs; stylistic problems (not in keeping with the conventions of scientific writing); and incorrect passive use. Human-written texts were reported to contain simple grammar errors; misspellings; punctuation and spacing errors; passive overuse; incorrect passive use; and typical L2 expressions. We also conducted an n-gram analysis of the machine-translated and human-written texts. Because the full corpus (385,184 machine-translated and 193,922 human-written English sentences) was too large to analyze, we limited the data to 150,000 lines per text type. The most frequent unigrams, bigrams, and trigrams were analyzed separately for each text type, and the rankings of the same n-grams were then compared across the two text types.
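The rank comparison described above can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the whitespace tokenizer, the function names (`ngram_counts`, `ranks`, `rank_shifts`), and the toy sentences are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield consecutive n-token tuples from a token list."""
    return zip(*(tokens[i:] for i in range(n)))


def ngram_counts(lines, n):
    """Count n-grams over lines, using a naive lowercase whitespace tokenizer (assumption)."""
    counts = Counter()
    for line in lines:
        counts.update(ngrams(line.lower().split(), n))
    return counts


def ranks(counts):
    """Map each n-gram to its frequency rank (1 = most frequent).

    Ties keep first-seen order, because Counter.most_common() sorts stably.
    """
    return {gram: r for r, (gram, _) in enumerate(counts.most_common(), start=1)}


def rank_shifts(mt_lines, human_lines, n, top=10):
    """Return (n-gram, rank in MT text, rank in human text) for shared n-grams,
    ordered by the size of the rank difference."""
    mt_rank = ranks(ngram_counts(mt_lines, n))
    hu_rank = ranks(ngram_counts(human_lines, n))
    rows = [(g, mt_rank[g], hu_rank[g]) for g in set(mt_rank) & set(hu_rank)]
    rows.sort(key=lambda row: (-abs(row[1] - row[2]), row[0]))
    return rows[:top]


if __name__ == "__main__":
    # Toy data only; the real corpora had 150,000 lines per text type.
    mt = ["in this paper we propose", "in this paper we show"]
    human = ["we propose a method", "we show a method in this paper"]
    for gram, mt_r, hu_r in rank_shifts(mt, human, n=1, top=3):
        print(" ".join(gram), mt_r, hu_r)
```

Calling `rank_shifts` with `n=1`, `n=2`, and `n=3` would give the unigram, bigram, and trigram comparisons in turn. Only n-grams that occur in both text types are compared here, which is one possible design choice among several.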
|
Current Status of Research Progress |
2: Progressing largely as planned
Reason
We had hoped to recruit thirty participants for the survey. The analysis can still be carried out with the results from twenty-four participants, but we will try to recruit more L2 teachers. Although the n-gram analysis took much longer than expected, we completed all three types of n-gram analysis (unigram, bigram, and trigram).
|
Strategy for Future Research Activity |
In 2019, we will run machine learning (both supervised learning and deep learning), using the technique that detects machine-translated texts most accurately. Its accuracy will be compared with that of the human raters (the experienced L2 instructors who participated in the survey). We will also run machine learning in combination with techniques such as TF-IDF (keyword analysis) and n-gram analysis. The result of the keyword analysis (identifying keywords that distinguish machine-translated from human-written texts) will be compared with the result of the n-gram analysis (the ranking of the most frequent n-grams in the two text types). This comparison can help us explore the differences between machine-translated and human-written texts from a linguistic point of view. Professor Paik and his students will run the machine learning, and the PI and the two Co-Is (Professor Perkins and Professor Paik) will compare and analyze the results of the machine learning, the n-gram analysis, and the LimeSurvey survey.
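As a rough illustration of how a TF-IDF keyword analysis might surface words that separate the two text types, the following Python sketch weights each word by TF-IDF and compares the mean weight per class. It is an assumption-laden sketch, not the planned method: the smoothed IDF variant (`log(N/df) + 1`), the whitespace tokenizer, the function names, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Uses relative term frequency and IDF = log(N / df) + 1 (an assumed variant).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count each term once per doc
    idf = {term: math.log(n_docs / d) + 1.0 for term, d in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in tf.items()})
    return vectors


def mean_weights(vectors):
    """Average TF-IDF weight of each term over a non-empty list of documents."""
    totals = Counter()
    for vec in vectors:
        totals.update(vec)  # Counter.update adds the values of a mapping
    return {t: w / len(vectors) for t, w in totals.items()}


def distinguishing_keywords(mt_docs, human_docs, top=5):
    """Return (terms weighted more in MT texts, terms weighted more in human texts)."""
    vectors = tfidf_vectors(mt_docs + human_docs)
    mt_mean = mean_weights(vectors[:len(mt_docs)])
    hu_mean = mean_weights(vectors[len(mt_docs):])
    diff = {t: mt_mean.get(t, 0.0) - hu_mean.get(t, 0.0)
            for t in set(mt_mean) | set(hu_mean)}
    mt_terms = sorted(diff, key=lambda t: (-diff[t], t))[:top]
    hu_terms = sorted(diff, key=lambda t: (diff[t], t))[:top]
    return mt_terms, hu_terms


if __name__ == "__main__":
    # Toy documents only, invented for illustration.
    mt_docs = ["the result was obtained obtained", "the method was obtained"]
    human_docs = ["we got the result", "we tried the method"]
    print(distinguishing_keywords(mt_docs, human_docs, top=2))
```

The per-document weights from `tfidf_vectors` could equally serve as input features for a supervised classifier; the class-mean difference used here is only one simple way to read keywords off the weights.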
|
Reason for Carry-over to the Next Fiscal Year |
We plan to attend more conferences in 2019 than originally planned in order to report two sets of results separately, since each is worth reporting on its own and has different points of interest. We therefore saved part of the 2018 budget so that it can be used for additional conference presentations. We have applied to two international conferences and are awaiting the acceptance decisions.
|