Outline of Research Achievements |
As the first step of my research on contextual data, I concentrated on collecting textual data sets and analyzing existing corpora from the point of view of context processing. I obtained two large Japanese web corpora (NWC and ClueWeb) and, for further processing, several corpora in other languages: English (Book Sentences, Book 5grams, COW), Chinese (Weibo, SogouT), German (COW), Spanish (Billion Words Corpus, COW) and Polish (IPIPAN). For basic discovery of emotional relationships, I prepared or obtained polarity lexicons for these languages. At the same time, I tested various parameters such as time (presented in the paper "Natural Language Processing for Predicting Everyday Behavior with and without Time and Duration Information") and moral judgement (presented in "What People Say? - Web-based Casuistry for Artificial Morality Experiments"). Next, I performed tests with automatically retrieved relational chains and discussed whether they could replace humans in the data annotation process and in evaluating other artificial intelligence systems, acting as automatic behavior evaluators (presented in "Conscious vs. Unaware Evaluation - Using Collective Intelligence for an Automatic Evaluation of Acts"). After receiving feedback and holding several discussions with AI researchers, I slightly revised the proposed database architecture to mirror probabilistic communication between chains of consecutive knowledge processed in parallel. I described the latest ideas in the paper "Importance of Contextual Knowledge in Artificial Moral Agents Development", presented at the AAAI Spring Symposium.
|
Current Status of Research Progress (Category) |
2: Progressing more or less smoothly
Reason
I have collected language data for knowledge acquisition on a larger scale than expected. However, because collecting the data and preparing the multilingual lexicons was time-consuming, the expansion of the ConceptNet ontology is slower than planned. The algorithm for automatically evaluating the usualness of knowledge works well for shorter chains but is still insufficient for longer ones. Because the Japanese corpus data grew to three times its earlier size, indexing it correctly also takes time. On the other hand, collecting corpora in other languages allowed me to start automatic concept translation and evaluation, which should lead directly to the enlargement of the ontology. I planned to write two publications but had twice as many accepted, and for that reason I rate my progress as rather good.
|
Strategy for Future Research Activity |
Once the multilingual data has been indexed for fast search, the next step is to prepare linguistic clues for finding relations between cause and effect, in order to collect contextual data such as agent, object, place, and time. Because I have also prepared lexicons for Russian and Korean, this should be done for all seven languages (Japanese, English, Chinese, German, Polish, Russian and Korean); however, Russian and Korean still require corpora, which I have not yet obtained. After that, the retrieval of knowledge chains will start. As mentioned in the grant application, I will first concentrate on the features described by Bentham in his idea of the Felicific Algorithm. Because the topic of common sense is very wide, I will focus on context for ethical judgement, using sentiment analysis. I will try to keep pace with experimenting and publishing my progress for both national and international audiences, which should be easier now that I have decided to work with more languages than planned. First comparisons between cultures and first attempts at discovering common ethical drives should then become possible.
|