2009 Fiscal Year Final Research Report
Exploring Automatic Non-Topical Classification
Project/Area Number | 19700232 |
Research Category | Grant-in-Aid for Young Scientists (B) |
Allocation Type | Single-year Grants |
Research Field | Library and information science / Humanistic social informatics |
Research Institution | Surugadai University |
Principal Investigator | ISHITA Emi, Surugadai University, Faculty of Cultural Information Resources, Associate Professor (50364815) |
Project Period (FY) | 2007 – 2009 |
Keywords | Automatic text classification / Non-topical categories / Web pages |
Research Abstract | The goal of this research project was to explore the potential for automatically classifying Web pages by non-topical categories, in addition to topical ones. Two kinds of classification were explored. The first was classification by document type: a search engine was developed to automatically detect academic articles on the Web. PDF files were collected from the Web and classified using attributes such as the terms they contain. The second was the development and use of a new test collection for automatically labeling sentences with ten human values. The experimental results of this preliminary study appear promising and point to productive directions for future work. |
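The abstract says only that PDF files were classified "using attributes such as terms in PDF files"; it does not name the classifier. As an illustrative sketch only, assuming the text has already been extracted from each PDF and substituting a multinomial Naive Bayes over word terms (a swapped-in technique, not necessarily the one used in this project), term-based document-type classification could look like this. The class `NaiveBayesDocType`, the labels `academic`/`other`, and the toy training texts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic terms."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesDocType:
    """Hypothetical multinomial Naive Bayes over terms with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, used for the prior
        self.term_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.term_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.term_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for term in tokenize(doc):
                if term not in self.vocab:
                    continue  # terms unseen in training carry no evidence here
                count = self.term_counts[c][term]
                score += math.log((count + self.alpha) / (self.totals[c] + self.alpha * v))
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)


# Made-up toy texts standing in for text extracted from collected PDF files.
train_docs = [
    "abstract introduction method results references",
    "abstract experiments conclusion references doi",
    "price order shipping cart catalog",
    "menu opening hours contact catalog",
]
labels = ["academic", "academic", "other", "other"]

model = NaiveBayesDocType().fit(train_docs, labels)
print(model.predict("abstract conclusion references"))
```

Naive Bayes is used here only because it is a compact, well-known term-based baseline; a real system would likely add non-term attributes (layout, metadata, URL features) and far more training data.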