2020 Fiscal Year Research-status Report
Natural language processing for academic writing in English
Project/Area Number |
18K11446
|
Research Institution | The University of Kitakyushu |
Principal Investigator |
Goh ChooiLing The University of Kitakyushu, Faculty of Environmental Engineering, Specially Appointed Associate Professor (90531616)
|
Co-Investigator(Kenkyū-buntansha) |
LEPAGE YVES Waseda University, Faculty of Science and Engineering (Graduate School of Information, Production and Systems / Center), Professor (70573608)
|
Project Period (FY) |
2018-04-01 – 2022-03-31
|
Keywords | academic writing aids / lexical bundles / word embeddings / sentence embeddings / text generation / plagiarism detection / text style transfer |
Outline of Annual Research Achievements |
In the third fiscal year, research was carried out on style transfer between the abstract and conclusion sections of academic papers. These two sections usually share similar content but differ in style; for example, their tenses typically differ. Using CycleGANs, a method was developed to transfer the style of an abstract into the style of a conclusion, and vice versa, while preserving content (1 paper at a reviewed international conference, ICACSIS 2020). The typicality of lexical bundles collected from academic papers was quantified using the product of individual KL-divergence scores and the probability that a bundle appears in a given type of section. The proposed typicality measure appears to rank typical lexical bundles in the right sections: abstract, introduction or conclusion (1 paper at a reviewed international conference, ICNLP 2020, best presentation award). This typicality measure ensures the use of plagiarism-free lexical bundles in given sections of articles. Masked language models (BERT, etc.) were used to investigate the usefulness of filling in blanks for verbs in academic writing, as in Cloze tests. In experiments, we were able to control the selection of terms used in scientific articles and improve proficiency in academic writing style (1 paper at the 27th Annual Meeting of the Association for Natural Language Processing (言語処理学会第27回年次大会), not reviewed). A website has been set up and designed to help researchers compose their scientific articles. It includes a text drafting pane, dictionary lookup, search for similar words and sentences, text generation and plagiarism checking. Further development is in progress.
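The typicality measure described above can be illustrated with a minimal sketch. The report does not give the exact formula, so the version below is an assumption: the "individual KL-divergence score" is taken to be the pointwise term p log(p/q) comparing a bundle's relative frequency in one section type with its relative frequency over all sections, and the "probability of a bundle to appear in a given type of section" is taken to be the share of the bundle's occurrences that fall in that section. The function name `typicality` and the toy counts are hypothetical.

```python
import math
from collections import Counter


def typicality(counts):
    """Score each (bundle, section) pair.

    counts: {section_name: Counter({bundle: frequency})}
    Returns {(bundle, section_name): score}.
    Assumed formula (not taken from the report):
        score = [p * log(p / q)] * P(section | bundle)
    """
    section_totals = {s: sum(c.values()) for s, c in counts.items()}

    # Pool all sections to get the background distribution q.
    overall = Counter()
    for c in counts.values():
        overall.update(c)
    grand_total = sum(overall.values())

    scores = {}
    for section, c in counts.items():
        for bundle, freq in c.items():
            p = freq / section_totals[section]      # P(bundle | section)
            q = overall[bundle] / grand_total       # P(bundle) over all sections
            kl_term = p * math.log(p / q)           # pointwise KL contribution
            p_section = freq / overall[bundle]      # P(section | bundle)
            scores[(bundle, section)] = kl_term * p_section
    return scores


# Toy counts, invented purely for illustration.
toy_counts = {
    "abstract": Counter({"in this paper we": 8, "the results show that": 2}),
    "conclusion": Counter({"in this paper we": 2, "the results show that": 8}),
}
scores = typicality(toy_counts)
for key in sorted(scores, key=scores.get, reverse=True):
    print(key, round(scores[key], 3))
```

With these toy counts, each bundle scores higher in the section where it is over-represented and gets a negative score in the other section, which is the ranking behaviour the measure is meant to capture.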
|
Current Status of Research Progress |
2: On the whole, research has progressed more than originally planned.
Reason
Improvements have been made in querying for similar words or sentences. A collection of plagiarism-free lexical bundles has been assembled. These lexical bundles are classified based on their typicality in several types of sections. Verbs can be suggested by filling in blanks (Cloze tests). Sentences in abstracts and conclusions can be generated from one another using two in-house style transfer methods. A website with a demo site is being designed and implemented.
|
Strategy for Future Research Activity |
In the extended fiscal year, a text generation engine and plagiarism checking will be linked on the website. The text editor on the website will include features to combine possible chunks and lexical bundles from already published articles, and to substitute words with ones that conform to academic style. A style transfer module between abstract and conclusion (both ways) will be integrated. An academic style score will be calculated to ensure that the text style is typical of the corresponding section of a paper. Metrics used for plagiarism will be surveyed and some algorithms used for detecting plagiarism will be integrated.
|
Causes of Carryover |
During fiscal year 2020, conferences were canceled due to COVID-19. Therefore, travel expenses were left over. These funds will be used to attend conferences if the situation allows, and to hire a research assistant for the continued development of the web application during the extended fiscal year.
|
Research Products
(5 results)