Project/Area Number |
24300059
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Partial Multi-year Fund |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Kyushu University |
Principal Investigator |
IKEDA Daisuke Kyushu University, Graduate School (Faculty) of Information Science and Electrical Engineering, Associate Professor (00294992)
|
Co-Investigator(Kenkyū-buntansha) |
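NAKATOH Tetsuya Kyushu University, Research Institute for Information Technology, Assistant Professor (20253502)
YAMADA Yasuhiro Shimane University, Interdisciplinary Graduate School of Science and Engineering, Assistant Professor (50529609)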
|
Co-Investigator(Renkei-kenkyūsha) |
BABA Kensuke Kyushu University, University Library, Associate Professor (70380681)
|
Project Period (FY) |
2012-04-01 – 2015-03-31
|
Project Status |
Completed (Fiscal Year 2014)
|
Budget Amount |
¥9,230,000 (Direct Cost: ¥7,100,000, Indirect Cost: ¥2,130,000)
Fiscal Year 2014: ¥2,860,000 (Direct Cost: ¥2,200,000, Indirect Cost: ¥660,000)
Fiscal Year 2013: ¥3,640,000 (Direct Cost: ¥2,800,000, Indirect Cost: ¥840,000)
Fiscal Year 2012: ¥2,730,000 (Direct Cost: ¥2,100,000, Indirect Cost: ¥630,000)
|
Keywords | exceptional string pattern / high-purity pattern / purity measure / text mining / rare pattern discovery / exceptional pattern / approximate string matching / purity |
Outline of Final Research Achievements |
This research aims to find infrequent patterns composed of frequent sub-patterns in large text data. Because text data follows Zipf's law, infrequent patterns are extremely numerous, which makes the goal challenging. Among these many candidates, we try to find composite patterns of frequent patterns that occur relatively often but are few in absolute terms. Our two basic approaches are to extend the framework of peculiar patterns that we have already developed and to create a new framework based on pure patterns. We evaluated the effectiveness of both approaches using bacterial genome sequences. In addition, we developed mining methods for data in various fields, such as clustering geotagged blogs, context-aware information retrieval, and query expansion for academic theses.
|