2010 Fiscal Year Final Research Report
Web Application Technology in Retrieval System
Project/Area Number |
20500086
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Media informatics/Database
|
Research Institution | The University of Electro-Communications |
Principal Investigator |
ONAI Rikio The University of Electro-Communications, Graduate School of Informatics and Engineering, Professor (70323871)
|
Co-Investigator(Kenkyū-buntansha) |
HAYASHI Takahiro Niigata University, Institute of Science and Technology, Associate Professor (60342490)
|
Project Period (FY) |
2008 – 2010
|
Keywords | Information retrieval |
Research Abstract |
As a result of this study, the overall system was built from a document collection and registration unit (a crawler for newly acquired documents, a crawler for updated documents, and a document registration module) and a search unit (a back-end search component, an indexer, and a scoring module). We considered open source software for the two crawlers: Heritrix was adopted as the crawler for newly acquired documents, and the update crawler and the document registration module were implemented. After examining load reduction, scalability, and fault tolerance, we introduced Hadoop and HDFS. To speed up indexing, we used MapReduce, which made indexing about 15 times faster than our conventional method while producing an index of the same size. These results confirmed the validity of the approach.
|
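The abstract does not describe how the indexing job was structured. As a rough illustration of the general idea only, the following is a minimal single-process sketch of building an inverted index in the MapReduce style. It does not use Hadoop or HDFS, and every name in it (`map_phase`, `shuffle`, `reduce_phase`, `build_index`, the sample documents) is a hypothetical choice for this example, not the project's code.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per word in a document."""
    for term in text.lower().split():
        yield term, doc_id


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework does between map and reduce."""
    grouped = defaultdict(list)
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce step: merge the ids for one term into a sorted, de-duplicated posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce in sequence over a dict of {doc_id: text}."""
    pairs = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


# Made-up sample documents, for illustration only.
docs = {1: "web search crawler", 2: "search index", 3: "Web index"}
index = build_index(docs)
print(index["search"])
```

In a real cluster the map and reduce calls would run in parallel on separate nodes, since each document and each term can be processed independently. That independence is where a speedup over a sequential indexer would come from.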