2014 Fiscal Year Final Research Report
Hierarchical Discovery of Sub-structures and Rare Patterns of Them in Large Text Data
Project/Area Number | 24300059 |
Research Category | Grant-in-Aid for Scientific Research (B) |
Allocation Type | Partial Multi-year Fund |
Section | General |
Research Field | Intelligent informatics |
Research Institution | Kyushu University |
Principal Investigator | IKEDA Daisuke, Kyushu University, Graduate School / Faculty of Information Science and Electrical Engineering, Associate Professor (00294992) |
Co-Investigator(Kenkyū-buntansha) |
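NAKATOH Tetsuya, Kyushu University, Research Institute for Information Technology, Assistant Professor (20253502)
YAMADA Yasuhiro, Shimane University, Interdisciplinary Graduate School of Science and Engineering, Assistant Professor (50529609)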
|
Co-Investigator(Renkei-kenkyūsha) |
BABA Kensuke, Kyushu University, University Library, Associate Professor (70380681)
|
Project Period (FY) | 2012-04-01 – 2015-03-31 |
Keywords | exceptional string patterns / high-purity patterns / purity measure |
Outline of Final Research Achievements |
This research is devoted to finding infrequent patterns composed of frequent sub-patterns in large text data. Because text data follows Zipf's law, it contains an enormous number of infrequent patterns, which makes this goal quite challenging. Among this vast number of infrequent candidates, we aim to find composite patterns of frequent patterns that are relatively frequent yet rare in absolute terms. We take two basic approaches: extending the framework of peculiar patterns that we developed previously, and creating a new framework based on pure patterns. We evaluated the effectiveness of both approaches on bacterial genome sequences. In addition, we developed mining methods for data in various fields, including clustering of geotagged blogs, context-aware information retrieval, and query expansion for academic theses.
|
Free Research Field | Text mining |