2011 Fiscal Year Final Research Report
Information Navigation using Statistical Rhymes
Project/Area Number |
22700150
|
Research Category |
Grant-in-Aid for Young Scientists (B)
|
Allocation Type | Single-year Grants |
Research Field |
Intelligent informatics
|
Research Institution | Nagasaki University |
Principal Investigator |
MASADA Tomonari Nagasaki University, Graduate School of Engineering, Associate Professor (60413928)
|
Project Period (FY) |
2010 – 2011
|
Keywords | Data mining / Probabilistic model / Bayesian theory / Topic model / Parallelization |
Research Abstract |
This project is based on the following assumption: words that co-occur with statistically significant frequency can serve as a guide in a useful information navigation system, even when those co-occurrences are not based on semantic similarity or relatedness. We call such co-occurrences statistical rhymes. We have been trying to extract statistical rhymes with Bayesian probabilistic models. As a result, we proposed a new LDA-like (latent Dirichlet allocation) topic extraction method that segments the word token sequences appearing in bibliographic data, such as those found in the reference sections of academic papers or the publication sections of researchers' Web sites. Our method splits each bibliographic entry into segments, each corresponding to a different data field, e.g., authors, paper title, journal, pages, and publication year. Further, we improved segmentation accuracy by making the inference semi-supervised.
|
Research Products
(4 results)