Project/Area Number |
12558038
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | Developmental Research |
Research Field |
Information Systems (including Library and Information Science)
|
Research Institution | Keio University |
Principal Investigator |
UEDA Shuichi Keio University, Faculty of Letters, Professor (50134218)
|
Co-Investigator (Kenkyū-buntansha) |
WATANABE Michiko Railway Technical Research Institute, Technical Support Center, Researcher
KUNO Takashi Sakushin Gakuin University, Women's College, Lecturer (30310212)
AGATA Teru Asia University, Faculty of International Relations, Lecturer (80306505)
|
Project Period (FY) |
2000 – 2001
|
Project Status |
Completed (Fiscal Year 2001)
|
Budget Amount |
¥5,600,000 (Direct Cost: ¥5,600,000)
Fiscal Year 2001: ¥2,900,000 (Direct Cost: ¥2,900,000)
Fiscal Year 2000: ¥2,700,000 (Direct Cost: ¥2,700,000)
|
Keywords | World Wide Web / Search engine / Web page / Automatic Classification / Automatic Rating |
Research Abstract |
The number of World Wide Web (WWW) pages has grown dramatically over the last few years with the growth of the Internet, and it is estimated that there are currently over 3,200 million WWW pages. To meet the demand for new search engines for WWW pages, it is necessary to develop automatic mechanisms for removing less important pages, judging the usefulness of pages, and classifying Web pages by subject. In the first year, an automatic procedure for judging page type was developed. Web pages were first classified manually into eight types: standard pages, top pages, contents pages, bulletin boards, chat pages, link pages, diary pages, and input forms. An automatic judgment method was then developed from a quantitative analysis of these manually judged pages. The type-judgment algorithm was based on the frequency of HTML tags, page length, and words appearing in titles and file names, all obtained from Web pages in Japanese. In the second year, the total number of Web pages was estimated, and an automatic system for judging useful Web pages and an automatic classification system were developed. The usefulness-judgment algorithm is based on morphological analysis of pages that received high scores when judged as "being good sources of information". To classify WWW pages in Japanese by subject, we present two classification algorithms based on relative frequencies of terms and on an information retrieval technique using the vector-space model. These methods were incorporated into a search engine, which participated in the Web task of the 2nd NTCIR Workshop.
|
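The abstract names relative term frequencies and the vector-space model as the basis for subject classification but gives no further detail. The following is a minimal illustrative sketch of how those two ideas can be combined, not the project's actual algorithm. It assumes pages have already been split into terms (for Japanese text, by a morphological analyzer upstream); the function names, the subject labels, and the toy training data are all hypothetical.

```python
import math
from collections import Counter


def relative_frequencies(tokens):
    """Map each term to its share of all tokens in the document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_profiles(labelled_docs):
    """Pool the tokens of each subject into one relative-frequency vector."""
    pooled = {}
    for subject, tokens in labelled_docs:
        pooled.setdefault(subject, []).extend(tokens)
    return {subject: relative_frequencies(toks) for subject, toks in pooled.items()}


def classify(tokens, profiles):
    """Return the subject whose profile is most similar to the page."""
    vec = relative_frequencies(tokens)
    return max(profiles, key=lambda subject: cosine(vec, profiles[subject]))


# Hypothetical toy training data: (subject, pre-tokenized page) pairs.
training = [
    ("sports", ["baseball", "game", "team", "score"]),
    ("sports", ["team", "player", "game"]),
    ("cooking", ["recipe", "rice", "soup", "cook"]),
    ("cooking", ["cook", "recipe", "noodle"]),
]
profiles = build_profiles(training)
label = classify(["team", "game", "ticket"], profiles)
```

Each subject is represented by a single pooled vector, so classifying a page costs one cosine comparison per subject. A real system would likely add term weighting such as inverse document frequency and a threshold for pages that match no subject well; neither is described in the abstract.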