2020 Fiscal Year Final Research Report

Research on the effectiveness of using RNN in topic models

Project/Area Number	18K11440
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Review Section	Basic Section 61030:Intelligent informatics-related
Research Institution	Rikkyo University (2020), Nagasaki University (2018-2019)
Principal Investigator	MASADA Tomonari, Rikkyo University, Graduate School of Artificial Intelligence and Science, Professor (60413928)
Project Period (FY)	2018-04-01 – 2021-03-31
Keywords	machine learning / text mining / topic model / deep learning
Outline of Final Research Achievements	Topic models such as LDA (latent Dirichlet allocation) can automatically extract semantically meaningful themes from a large corpus. However, text analysis with topic models typically considers only the word frequencies in each document and ignores the order in which the words appear. This work aims to improve topic models by using an RNN (recurrent neural network) to model word order. Because several previous studies have already proposed methods for combining RNNs with topic models, we sought a new method. As a result, we proposed a new topic model based on NNs (neural networks) that does not use VAE (variational autoencoder) inference. Instead, we maximize the objective given in the original LDA paper by training NNs in an amortized manner, so that the posterior parameters are obtained as outputs of the NNs. However, the model currently uses only an MLP (multilayer perceptron), so we have not yet achieved our goal. We plan to replace the MLP with an RNN or a more recent NN architecture in the near future.
Free Research Field	Machine learning
Academic Significance and Societal Importance of the Research Achievements	How can topic models, which have been used in text mining for nearly 20 years, be combined with deep learning, a newer technology, to achieve effective text mining? This research addressed that question. Although the results are still at an intermediate stage, an important outcome is that we clearly confirmed the possibility of combining topic models with deep learning on the basis of an idea different from the variational autoencoder that previous studies use to realize this combination. If research continues in this direction, topic models, as a tool for extracting the diverse topics latent in huge document collections, can likely be made even more powerful by combining them with deep-learning-based modeling of language data.
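
The amortized approach described in the outline can be illustrated with a minimal sketch. It is not the project's implementation. It assumes that the "target given in the original LDA paper" is the per-document variational lower bound (ELBO), that a small MLP maps a document's normalized word counts to the Dirichlet posterior parameter gamma, and that the word-level parameter phi is taken from the closed-form update in the LDA paper. The function names (`mlp_gamma`, `closed_form_phi`, `lda_elbo`), the network shape, and all weights are hypothetical.

```python
import math

def digamma(x):
    """Digamma for x > 0: shift x upward by recurrence, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 * inv - series

def softplus(z):
    """Numerically stable log(1 + exp(z)); keeps the MLP output positive."""
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)

def mlp_gamma(counts, W1, W2):
    """Hypothetical one-hidden-layer MLP: word counts -> Dirichlet parameter gamma."""
    total = sum(counts)
    x = [c / total for c in counts]
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return [softplus(sum(w * hj for w, hj in zip(row, h))) + 1e-3 for row in W2]

def closed_form_phi(gamma, beta, v):
    """Assumed word-level update: phi_k proportional to beta[k][v] * exp(digamma(gamma_k))."""
    scores = [beta[k][v] * math.exp(digamma(g)) for k, g in enumerate(gamma)]
    s = sum(scores)
    return [sc / s for sc in scores]

def lda_elbo(counts, alpha, beta, gamma):
    """Per-document LDA lower bound for word counts, given gamma (beta assumed > 0)."""
    dg_sum = digamma(sum(gamma))
    e_log_theta = [digamma(g) - dg_sum for g in gamma]
    # E_q[log p(theta | alpha)]
    elbo = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    elbo += sum((a - 1.0) * e for a, e in zip(alpha, e_log_theta))
    # Entropy of q(theta | gamma)
    elbo -= math.lgamma(sum(gamma)) - sum(math.lgamma(g) for g in gamma)
    elbo -= sum((g - 1.0) * e for g, e in zip(gamma, e_log_theta))
    # Word terms: E_q[log p(z | theta)] + E_q[log p(w | z, beta)] + entropy of q(z)
    for v, c in enumerate(counts):
        if c == 0:
            continue
        phi = closed_form_phi(gamma, beta, v)
        for k, p in enumerate(phi):
            if p > 0.0:
                elbo += c * p * (e_log_theta[k] + math.log(beta[k][v]) - math.log(p))
    return elbo

if __name__ == "__main__":
    # Toy document over a 4-word vocabulary with 2 topics; all values are made up.
    counts = [2, 0, 1, 3]
    alpha = [0.1, 0.1]
    beta = [[0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4]]
    W1 = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1], [0.3, 0.3, -0.5, 0.2]]
    W2 = [[0.7, -0.3, 0.2], [-0.6, 0.4, 0.5]]
    gamma = mlp_gamma(counts, W1, W2)
    print("gamma:", gamma)
    print("ELBO:", lda_elbo(counts, alpha, beta, gamma))
```

The sketch only evaluates the objective for fixed weights. In an amortized setup the MLP weights would be updated, typically by automatic differentiation in a deep learning framework, to maximize the sum of this bound over all documents. Replacing `mlp_gamma` with a sequence model such as an RNN would require the input to be the word sequence rather than the counts.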