2005 Fiscal Year Final Research Report Summary
A Study on Information Acquisition and Extraction from Dynamic Information Sources Based on Knowledge Discovery and Learning
Project/Area Number |
15300027
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Media informatics/Database
|
Research Institution | University of Tsukuba |
Principal Investigator |
KITAGAWA Hiroyuki University of Tsukuba, Graduate School of Systems and Information Engineering, Professor (00204876)
|
Co-Investigator(Kenkyū-buntansha) |
ISHIKAWA Yoshiharu University of Tsukuba, Graduate School of Systems and Information Engineering, Associate Professor (80263440)
AMAGASA Toshiyuki University of Tsukuba, Graduate School of Systems and Information Engineering, Assistant Professor (70314531)
SHOUJI Isao University of Tsukuba, Graduate School of Systems and Information Engineering, Professor (20282329)
MORISHIMA Atsuyuki University of Tsukuba, Graduate School of Library, Information, and Media Studies, Associate Professor (70338309)
|
Project Period (FY) |
2003 – 2005
|
Keywords | Web / Knowledge discovery / Knowledge extraction / Information acquisition / Time series / XML / Outlier detection |
Research Abstract |
In this research project, we studied information extraction from information sources in the web environment and user-friendly data operation facilities based on knowledge discovery and machine learning techniques. The research results can be summarized as follows.
1. Extraction of information from hidden web sites is important. We developed a scheme to extract new topic contents from hidden web sites, including document databases.
2. We developed several new information analysis methods, in particular methods for topic detection from text streams, outlier detection, and correlation analysis.
3. We studied information extraction from the web using classification techniques. In particular, we developed a web information retrieval system that utilizes existing taxonomy hierarchies. We also developed a new web information extraction method that combines database-oriented focused crawling with information extraction from web pages.
4. We studied basic schemes for information sharing in distributed environments.
5. We developed a knowledge discovery technique for rule mining from transaction databases containing noise.
6. We developed a smart facility that automatically generates XML queries based on data manipulation examples given by a user.
7. We developed basic schemes to map binary data in original information sources into XML views.
8. We studied a system architecture that can integrate a variety of information extracted from different information sources, taking the properties of the original sources into consideration.
|
Research Products
(128 results)