2019 Fiscal Year Research-status Report
Natural language processing for academic writing in English
Project/Area Number |
18K11446
|
Research Institution | The University of Kitakyushu |
Principal Investigator |
Goh ChooiLing The University of Kitakyushu, Faculty of Environmental Engineering, Specially Appointed Associate Professor (90531616)
|
Co-Investigator(Kenkyū-buntansha) |
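LEPAGE YVES Waseda University, Faculty of Science and Engineering (Graduate School and Research Center of Information, Production and Systems), Professor (70573608)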
|
Project Period (FY) |
2018-04-01 – 2021-03-31
|
Keywords | academic writing aids / lexical bundles / word embeddings / sentence embeddings / text generation / plagiarism detection |
Outline of Annual Research Achievements |
In the second fiscal year, research was carried out on using word embedding models to search for substitute words suitable for academic writing. A human evaluation was carried out and the results were compared with those of a machine translation system (1 paper at an international conference with a reviewing committee, PACLING 2019). N-grams were extracted from ACL-ARC and classified into true and false lexical bundles using machine learning models trained on manually checked bundles. In total, 18,000 true lexical bundles have been collected and publicly released (1 paper at an international conference with a reviewing committee, ICACSIS 2019). These bundles are useful for composing fluent academic text and are plagiarism-free. Work was also conducted on using sentence embeddings to search for similar sentences in Abstract sections. The similar sentences are presented to non-native writers to help them make corrections (1 paper at the 26th Annual Meeting of the Association for Natural Language Processing, no reviewing committee). A website has been set up based on the prototype built in the first fiscal year. A part-time research assistant has been hired to set up the server, create and administer the website, and design and implement the front-end user interface. The website is designed to help researchers compose their scientific articles. It includes a text drafting pane, automatic translation into English when necessary, dictionary lookup, search for similar words and sentences, text generation, and finally plagiarism checking. Currently only the interface is provided; the main engines will be linked in the future.
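As an illustration only, the search for substitute words (or similar sentences) with embeddings can be sketched as a nearest-neighbour lookup by cosine similarity. The vectors, the vocabulary, and the function names `cosine` and `nearest` below are hypothetical toy values chosen for this sketch; they are not the models, data, or code used in the project.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest(query, embeddings, k=3):
    """Return the k items most similar to `query`, as (item, similarity) pairs."""
    query_vec = embeddings[query]
    scored = [
        (item, cosine(query_vec, vec))
        for item, vec in embeddings.items()
        if item != query  # do not propose the query as its own substitute
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional vectors; real word or sentence embeddings
# have hundreds of dimensions and come from a trained model.
TOY_EMBEDDINGS = {
    "show": [0.9, 0.1, 0.0],
    "demonstrate": [0.8, 0.2, 0.1],
    "indicate": [0.7, 0.3, 0.0],
    "banana": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for word, score in nearest("show", TOY_EMBEDDINGS, k=2):
        print(f"{word}\t{score:.3f}")
```

The same lookup applies unchanged to sentence embeddings: the dictionary keys become sentences from Abstract sections and the values their sentence vectors.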
|
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
Improvements have been made to the research on searching for similar words and sentences, and to the collection of plagiarism-free lexical bundles. A website has been built and will be put into operation once the text generation part is ready.
|
Strategy for Future Research Activity |
In the third fiscal year, the main focus will be on the text generation part. Following the current research trend, deep learning will be applied to generate new text based on the original text, the collection of lexical bundles, and the ACL-ARC knowledge base. The text generation engine must be able to combine possible chunks, lexical bundles, and discursive and argumentative connectors from already published articles while preserving the original meaning of the text. Furthermore, the text style must be typical of the corresponding section of a paper, which means that the generated phrases must conform to the phrasing typical of that section. The final part of the research concerns plagiarism. Metrics used for measuring plagiarism will be surveyed, and the algorithms to be used for detecting it will be determined.
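The report does not yet name a plagiarism metric. Purely as an example of the kind of measure such a survey might cover, the sketch below computes word n-gram containment: the fraction of a candidate text's n-grams that also occur in a source text. The function names, the whitespace tokenisation, and the choice of n = 3 are assumptions made for this illustration, not decisions taken in the project.

```python
def word_ngrams(text, n=3):
    """Set of word n-grams in `text`, using lowercase whitespace tokenisation (an assumption)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def containment(candidate, source, n=3):
    """Fraction of the candidate's n-grams that also appear in the source.

    Returns 0.0 when the candidate is shorter than n words.
    """
    candidate_grams = word_ngrams(candidate, n)
    if not candidate_grams:
        return 0.0
    shared = candidate_grams & word_ngrams(source, n)
    return len(shared) / len(candidate_grams)

if __name__ == "__main__":
    source = "in this paper we propose a new method for parsing"
    print(containment("we propose a new method", source))  # fully contained in the source
    print(containment("the cat sat on the mat", source))   # no shared trigrams
```

A score near 1 means most of the candidate's word sequences are copied from the source, while a score near 0 means little overlap. Containment is asymmetric by design, which suits comparing a short draft against a long published article.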
|
Causes of Carryover |
At the end of fiscal year 2019, conferences were canceled due to COVID-19, so travel expenses were left over. These funds will be used to attend international conferences in the next fiscal year. In addition, a research assistant for developing the web application was not found until the end of the fiscal year. This research assistant will continue to be employed during the third fiscal year.
|
Research Products
(4 results)