2019 Fiscal Year Research-status Report
Development of a System for Collecting Context Data for Large-Scale Inverse Reinforcement Learning
Project/Area Number | 17K00295 |
Research Institution | Hokkaido University |
Principal Investigator | RZEPKA Rafal, Hokkaido University, Faculty of Information Science and Technology, Assistant Professor (80396316) |
Project Period (FY) | 2017-04-01 – 2021-03-31 |
Keywords | knowledge completion / natural language / language models / context processing |
Outline of Annual Research Achievements |
I proceeded with further development of a system for acquiring contextual common-sense knowledge. I developed a module that automatically completes a given sentence with contextual information in order to predict its changes, and any situation can be given as input. The completion is performed not only for standard semantic categories such as actors or places; the system is also able to predict possible causes, effects, problems and costs. I conducted experiments comparing automatic and human evaluations to measure the reliability of the system. Ethically ambiguous input sentences were used, but the algorithm achieved rather low agreement with human evaluators; more natural sentences (I used self-generated ones to avoid bias) and more evaluators will be needed. The context retrieval and processing were also used in other tasks involving affective processing, motivation-oriented dialog, metaphor understanding, and persuasiveness estimation. Trials with event completion were continued, and event chain generation was extended to four related events, which had not been achieved in previous work. A Slack-based chatbot for demonstrating context processing was developed, but the retrieved knowledge base is still not large enough for open dialog. I also tried utilizing a BERT language model directly, but a single input sentence requires several queries, which takes too much time. I received help from researchers and students at Queensland University of Technology to acquire tools for more sophisticated language processing in English, but the cooperation was halted by the coronavirus outbreak.
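The timing problem with BERT comes from the fact that each contextual category needs its own masked-LM query. The following is only a minimal sketch of that query pattern, assuming the Hugging Face transformers fill-mask pipeline; the model name and slot templates are illustrative placeholders, not the ones used in the project.

```python
# Minimal sketch only: it illustrates why completing one input sentence requires
# several masked-LM queries (one per contextual slot), which is the source of the
# latency mentioned above. Model name and slot templates are illustrative
# placeholders, not the project's actual module.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Hypothetical templates, one per semantic category the completion targets.
SLOT_TEMPLATES = {
    "cause":   "{} because of the [MASK].",
    "effect":  "{} and as a result the [MASK] changed.",
    "problem": "{} which caused a problem with the [MASK].",
    "cost":    "{} and it cost a lot of [MASK].",
}

def complete_context(sentence: str) -> dict:
    """Issue one fill-mask query per slot and keep the top three candidates."""
    completions = {}
    for slot, template in SLOT_TEMPLATES.items():
        predictions = fill_mask(template.format(sentence))
        completions[slot] = [p["token_str"] for p in predictions[:3]]
    return completions

# Four slots -> four separate BERT forward passes for a single input sentence.
print(complete_context("A man left his dog in a hot car"))
```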
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
The creation of the algorithms went rather smoothly, but the amount of context-related data for Japanese is still relatively small. To alleviate this problem, I automatically translated a corpus of almost 100,000 short stories from English, to be followed by a hopefully unsupervised post-editing process. The toolkit for English is still not fully completed, but a new multilingual NLP toolkit, Stanza, was published, and it should put the processing of other languages back on track. More language models are also now available, which will accelerate comparing and sharing knowledge between languages. As mentioned above, the new insights about context helped in various tasks and, combined with standard methods such as deep learning, led to improvements of existing systems and several published papers; therefore I assess the output as rather satisfactory.
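As an illustration of how Stanza could put multilingual processing back on track, the sketch below applies the same dependency-based analysis to an English and a Japanese sentence; the pipelines and example sentences are illustrative and do not come from the project's code.

```python
# Minimal sketch, assuming the stanza package (https://stanfordnlp.github.io/stanza/).
# The same pipeline is applied to English and Japanese, which is what makes
# comparing and sharing knowledge between languages easier.
import stanza

stanza.download("en")  # one-time model download per language
stanza.download("ja")

nlp_en = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")
nlp_ja = stanza.Pipeline("ja", processors="tokenize,pos,lemma,depparse")

examples = [
    (nlp_en, "A man left his dog in a hot car."),
    (nlp_ja, "男は犬を暑い車の中に残した。"),
]

for nlp, text in examples:
    doc = nlp(text)
    for sentence in doc.sentences:
        for word in sentence.words:
            # Identical word-level attributes for both languages.
            print(word.text, word.lemma, word.upos, word.deprel)
```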
Strategy for Future Research Activity |
As the methods have been tested, the last year of the project will concentrate on three topics: the first will be further accumulation of knowledge, also in languages other than Japanese; the second will be implementing the knowledge in dialog; the third will be experimentation. More empirical proof is needed to show that, thanks to contextual processing, the acquired knowledge accurately reflects the real world. In addition, an evaluation experiment will be conducted with a scenario-type dialogue system to confirm the authenticity of contextualized output. A new type of evaluation experiment will also be conducted in which the naturalness of the acquired knowledge can be confirmed while conversing, with each subject given slightly different information. Finally, I plan to release the generated contextual data and publish the results domestically and internationally.
Remarks |
As the collaboration ended prematurely due to the COVID-19 outbreak, there is no concrete material yet.
Research Products
(13 results)