2005 Fiscal Year Final Research Report Summary
STUDIES ON KNOWLEDGE DISCOVERY FROM DATA STREAM
Project/Area Number |
16500070
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Media informatics/Database
|
Research Institution | Hosei University |
Principal Investigator |
MIURA Takao Hosei University, Dept. of Elect., Elect. & Comp. Engr., Prof., Faculty of Engineering, Professor (00219586)
|
Co-Investigator(Kenkyū-buntansha) |
SHIOYA Isamu SANNO University, Dept. of Mgmt & Informatics, Prof., Faculty of Management and Informatics, Professor (70170850)
|
Project Period (FY) |
2004 – 2005
|
Keywords | time estimation of incomplete data / skewed projection / temporal clustering / topic generation / topic tracking / Extensible Grid File / Verification of UML consistency / Colony Network |
Research Abstract |
In this research, we discussed and analyzed event detection, topic analysis, and topic tracking for news streams and Web pages on current topics, with the aim of knowledge discovery from data streams. We also proposed a general framework for modeling and manipulating this information and examined prototype systems. In particular, we proposed several key approaches for discovering important knowledge in data streams and identifying the underlying trends within them.
(1) Information Retrieval for Data Streams: We examined basic properties of projection techniques for stream information retrieval and clarified that Random Projection is robust for dimensionality reduction in a dynamically changing information environment. However, the technique is not fully efficient, so new approaches are indispensable. We proposed a more sophisticated projection, called Skewed Projection, based on term distribution, and showed its usefulness.
(2) Extraction and Estimation of Events Based on Temporal Semantics: We extracted the content time of data streams in Web pages and showed that events can be detected by clustering along the time axis. We also proposed a probabilistic method for estimating the time from the stream when time information cannot be obtained.
(3) Labeling Events for Topic Generation and Summarization: Extracted events need labels, a task we treat as a form of text summarization. We took a heuristic approach: we extracted important words from the stream and scored each sentence by the words it contains. We then ranked the sentences and selected some of them according to the compression rate. We discussed some experimental results.
(4) Topic Tracking of Stream Events: As with topic extraction, we took a heuristic approach. We examined adjacent events and measured the similarity between them by evaluating the frequent words they contain, and showed the possibility of tracking topics.
(5) Extensible Grid File Structure for Multidimensional Information: News streams require very high dimensionality and a truly efficient structure on secondary storage. We proposed a new structure, called EGF, and showed that it can manage data with hundreds of thousands of dimensions efficiently and precisely.
We also discussed the foundations of advanced applications of our techniques, such as UML verification using Description Logic and modeling trend propagation in a cooperative manner.
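The sentence-scoring heuristic outlined in item (3) can be illustrated with a minimal Python sketch. It is not the project's implementation: the stop-word list, the use of raw term frequency to pick "important words", the `top_k` parameter, and the reading of "compression rate" as the fraction of sentences kept are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the report does not specify one.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "for"}


def tokenize(text):
    """Lowercase the text and return alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def summarize(sentences, compression_rate=0.5, top_k=5):
    """Select sentences that contain the most 'important' words.

    Assumptions: important words are the top_k most frequent tokens in the
    whole stream, and compression_rate is the fraction of sentences to keep.
    """
    # Step 1: extract important words from the stream.
    counts = Counter(w for s in sentences for w in tokenize(s))
    important = {w for w, _ in counts.most_common(top_k)}

    # Step 2: score each sentence by the important words it contains.
    scores = [sum(1 for w in tokenize(s) if w in important) for s in sentences]

    # Step 3: rank by score (earlier sentences win ties) and keep the top share.
    keep = max(1, round(len(sentences) * compression_rate))
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:keep]

    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(ranked)]
```

Calling `summarize(sentences, compression_rate=0.25)` on a four-sentence event would keep only the single highest-scoring sentence, which could then serve as a candidate label for that event.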
|
Research Products
(65 results)