2006 Fiscal Year Final Research Report Summary
Automated synthesis of frequent event-sequences corpus from large-scale textual data and its application to WEB content tracking
Project/Area Number | 16500078 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Intelligent informatics |
Research Institution | University of Yamanashi |
Principal Investigator | IWANUMA Koji, University of Yamanashi, Interdisciplinary Graduate School of Medicine and Engineering (Research Division), Professor (30176557) |
Project Period (FY) | 2004 – 2006 |
Keywords | sequential data mining / frequent sequence / text / WEB / online algorithm / newspaper article / relaxation method / event sequence corpus |
Research Abstract |
In this research, we studied and developed the following technologies:
1. A novel, rational frequency measure, called the Total Frequency Measure, which satisfies the anti-monotonic property and never causes duplicated counting within a single very long data sequence.
2. A fast online sequential data mining algorithm for extracting frequent subsequences within the framework of an infinite-length window.
3. A fast sequential mining algorithm, based on the relaxation method, intended for the framework of a finite-length window.
4. An intelligent sequential data mining method that uses an integrated occurrence criterion combining frequency and information gain for subsequences.
5. A sequential pattern mining method for WEB access logs, which allows access log data to be analyzed while taking page-staying time sequences into account.
6. A new method for extracting important keywords and/or phrases from the articles of a huge newspaper corpus.
We demonstrated the significance of these technologies through a large number of evaluation experiments.
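The summary does not give the definition of the Total Frequency Measure, so the sketch below does not reproduce it. To make the two properties named in items 1 and 2 concrete (anti-monotonicity, no duplicated counting in one long sequence, online processing over an unbounded stream), it uses a different, well-known measure instead: the number of non-overlapped occurrences of a pattern, counted online by one small greedy matcher per pattern. The names `NonOverlappedCounter` and `count_all_embeddings` are hypothetical and are not part of the project's software.

```python
from typing import Dict, Hashable, Iterable, Sequence, Tuple

Pattern = Tuple[Hashable, ...]


class NonOverlappedCounter:
    """Online counter of non-overlapped occurrences (hypothetical sketch).

    NOTE: this is NOT the report's Total Frequency Measure; it is a
    different, well-known anti-monotone measure used only for illustration.
    Each pattern keeps the index of the next element it is waiting for.
    Events are consumed one at a time, so the stream may be unbounded
    (an infinite-length window: every past event contributes).
    """

    def __init__(self, patterns: Iterable[Sequence[Hashable]]) -> None:
        # De-duplicate patterns while preserving their order.
        self._patterns = list(dict.fromkeys(tuple(p) for p in patterns))
        if any(len(p) == 0 for p in self._patterns):
            raise ValueError("patterns must be non-empty")
        self._next: Dict[Pattern, int] = {p: 0 for p in self._patterns}
        self.counts: Dict[Pattern, int] = {p: 0 for p in self._patterns}

    def push(self, event: Hashable) -> None:
        """Consume one event and advance every pattern's greedy matcher."""
        for p in self._patterns:
            i = self._next[p]
            if p[i] == event:
                i += 1
                if i == len(p):
                    # Occurrence completed: count it once and restart, so no
                    # event is ever shared between two counted occurrences.
                    self.counts[p] += 1
                    i = 0
                self._next[p] = i


def count_all_embeddings(sequence: Iterable[Hashable],
                         pattern: Sequence[Hashable]) -> int:
    """Count every embedding of `pattern` as a subsequence (duplicated counting)."""
    pattern = tuple(pattern)
    ways = [1] + [0] * len(pattern)  # ways[j]: embeddings of pattern[:j] so far
    for event in sequence:
        # Go right to left so each event extends a given prefix at most once.
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == event:
                ways[j] += ways[j - 1]
    return ways[-1]
```

Feeding the events A, B, A, B one at a time into `NonOverlappedCounter([("A",), ("A", "B"), ("A", "B", "A")])` gives counts 2, 2 and 1, which never increase as the pattern grows. Counting all embeddings instead gives 3 for (A, B) but only 2 for (A), which shows how duplicated counting within a single long sequence breaks anti-monotonicity.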
|
Research Products
(18 results)