2005 Fiscal Year Final Research Report Summary
Study on Information Utilization System for Heterogeneous Contents
Project/Area Number | 13224087 |
Research Category | Grant-in-Aid for Scientific Research on Priority Areas |
Allocation Type | Single-year Grants |
Review Section | Science and Engineering |
Research Institution | National Institute of Informatics |
Principal Investigator | ADACHI Jun National Institute of Informatics, Software Research Division, Professor (80143551) |
Co-Investigator(Kenkyū-buntansha) |
AIZAWA Akiko National Institute of Informatics, Research Center for Information Resources, Professor (90222447)
KANDO Noriko National Institute of Informatics, Software Research Division, Professor (80270445)
KAGEURA Kyo The University of Tokyo, Graduate School of Education, Associate Professor (00211152)
TAKASU Atsuhiro National Institute of Informatics, Research Center for Testbeds and Prototyping, Professor (90216648)
AIHARA Kenro National Institute of Informatics, Software Research Division, Associate Professor (90300706)
|
Project Period (FY) | 2001 – 2005 |
Keywords | Informatics / Information Retrieval / Text Processing / Text Mining / Multimedia Processing / Data Engineering |
Research Abstract |
This project aimed to develop technology for utilizing heterogeneous content. We studied link and structural analysis of the Web, cross-media processing technology, and the epistemological framework of the Web, and developed corpora for evaluating information utilization methods for the Web.
1) We developed information extraction and organization methods that use the textual and graphical structure of the Web:
-Web page clustering methods based on the link structure
-Topic tracking using non-linear time-content analysis
2) We proposed several advanced methods for processing and utilizing multimedia, focusing on media heterogeneity:
-topic detection from multilingual text collections
-user-adaptive text summarization based on content types
-cross-media search that enhances an annotation-based image retrieval model with content-based features
-JuNii+: a user interface for image retrieval
-use of interview video archives for learning
3) We organized a series of evaluation workshops, "NTCIR", in which a number of researchers participated to develop new testbeds, each consisting of common test data for research on heterogeneous digital content. As a result, for instance, we built a terabyte-scale dataset by crawling the .jp domain and established evaluation methodologies that reflect practical conditions. These efforts contributed to the progress of research in this area.
4) We analyzed the epistemological framework within which engineers process and model Web information sources, contrasting it with the modern system of printed books. On the basis of this analysis, we concluded that it is hard to apply directly to the Web the model defined by the quintessentially modern concept of information accumulation, as represented in the ideal of libraries, and showed that "information editing" would be necessary to explore fully the potential of Web information sources.
|
Research Products
(34 results)