Project/Area Number |
13680452
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Kyushu Institute of Technology |
Principal Investigator |
ENDO Tsutomu, Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Professor (10112294)
|
Co-Investigator (Kenkyū-buntansha) |
SHIMADA Kazutaka, Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Research Associate (50346863)
TOKUHISA Masato, Kyushu Institute of Technology, Faculty of Computer Science and Systems Engineering, Research Associate (10274557)
|
Project Period (FY) |
2001 – 2003
|
Project Status |
Completed (Fiscal Year 2003)
|
Budget Amount |
¥2,400,000 (Direct Cost: ¥2,400,000)
Fiscal Year 2003: ¥600,000 (Direct Cost: ¥600,000)
Fiscal Year 2002: ¥600,000 (Direct Cost: ¥600,000)
Fiscal Year 2001: ¥1,200,000 (Direct Cost: ¥1,200,000)
|
Keywords | multimedia document / WWW / summarization / information integration / information retrieval / information extraction / Kansei information / document generation |
Research Abstract |
This research aims to develop a system that summarizes product (PC) information retrieved from Web sites, based on the relational structure among text, tables, and images, and presents products suited to a user's request.
1. Extraction of product specifications from HTML documents. We proposed a method for extracting specifications from HTML documents using TSVMs (Transductive Support Vector Machines). The elements of each feature vector are keywords with normalized TF-DF weighting. We achieved 95% recall with 99% precision.
2. Characteristic-data extraction and support system for PC selection. The specifications written in HTML are converted into a normal form called a table structure. Quantitative attributes are extracted by comparing them with the mean or mode of all sample data, and qualitative attributes are extracted using manually provided knowledge. The recommended PCs are determined dynamically from the extracted data according to the user's request and relevance feedback. In addition, a radar chart and Japanese sentences are generated from the specifications.
3. Classification of images and feature extraction. We proposed a method for classifying the contents of images using weighted keywords extracted from their neighboring sentences, achieving 79% accuracy with TF-IDF weighting. We also developed a system that removes the background from a PC image and classifies the PC's color using C4.5.
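Item 2 of the abstract extracts quantitative characteristics by comparing each product's values with the mean (numeric attributes) or mode (categorical attributes) over all sample data. The following is a minimal sketch of that comparison only, not the project's implementation: the attribute names, the sample records, and the 10% relative-deviation threshold `rel_threshold` are all hypothetical, since the abstract does not state the actual criteria.

```python
from statistics import mean, mode

# Hypothetical sample data: each PC is a dict of attribute -> value.
SAMPLES = [
    {"cpu_ghz": 2.4, "memory_mb": 512, "weight_kg": 2.8, "os": "WinXP"},
    {"cpu_ghz": 1.8, "memory_mb": 256, "weight_kg": 1.4, "os": "WinXP"},
    {"cpu_ghz": 2.0, "memory_mb": 256, "weight_kg": 3.1, "os": "Win2000"},
]


def characteristics(pc, samples, rel_threshold=0.10):
    """Return the attributes of `pc` that stand out from the sample population.

    Numeric attributes are compared with the mean over all samples and
    labeled "high" or "low" when they deviate by more than `rel_threshold`
    (a hypothetical parameter). Categorical attributes are compared with the
    mode and labeled "uncommon" when they differ from it.
    """
    result = {}
    for attr, value in pc.items():
        column = [s[attr] for s in samples]
        if isinstance(value, (int, float)):
            avg = mean(column)
            if value > avg * (1 + rel_threshold):
                result[attr] = "high"
            elif value < avg * (1 - rel_threshold):
                result[attr] = "low"
        elif value != mode(column):
            result[attr] = "uncommon"
    return result


print(characteristics(SAMPLES[2], SAMPLES))
```

For the third sample, the CPU clock lies within 10% of the mean and is not reported, while the memory, weight, and operating system are flagged; these flags are the kind of input a radar chart or generated sentence could then draw on.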
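Item 2 also states that recommendations are updated by relevance feedback, without naming the method. As one illustration only, the sketch below substitutes the classic Rocchio update over attribute vectors. Rocchio is not confirmed by the source. The product names, the two-dimensional vectors (for example, normalized CPU speed and memory), and the weights `alpha`, `beta`, and `gamma` are assumptions.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update of a query vector (all vectors are equal-length lists).

    Moves the query toward the centroid of products the user marked relevant
    and away from the centroid of those marked non-relevant. The weights are
    assumed textbook defaults, not values from the project.
    """
    dim = len(query)

    def centroid(vecs):
        if not vecs:
            return [0.0] * dim
        return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

    rel, non = centroid(relevant), centroid(nonrelevant)
    return [alpha * q + beta * r - gamma * s for q, r, s in zip(query, rel, non)]


def rank(products, query):
    """Sort product names by dot-product score against the query vector."""
    def score(vec):
        return sum(q * x for q, x in zip(query, vec))

    return sorted(products, key=lambda name: score(products[name]), reverse=True)


# Hypothetical products: [normalized CPU speed, normalized memory].
PRODUCTS = {"A": [0.9, 0.1], "B": [0.2, 0.9], "C": [0.6, 0.6]}
initial = [1.0, 0.0]  # the user first asks only for a fast CPU
print(rank(PRODUCTS, initial))
# The user marks B as relevant and A as non-relevant.
updated = rocchio(initial, [PRODUCTS["B"]], [PRODUCTS["A"]])
print(rank(PRODUCTS, updated))
```

After one round of feedback, the balanced product C moves ahead of A because the query now also rewards memory.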
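Item 3 of the abstract classifies image contents using TF-IDF-weighted keywords taken from the sentences near each image. The abstract does not name the classifier, so the sketch below assumes a simple nearest-centroid rule with cosine similarity. The tokenized "neighboring sentences" and the content labels `"screen"` and `"keyboard"` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return TF-IDF weight dicts for tokenized documents, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df_t) for t, df_t in df.items()}
    vecs = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vecs, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(docs, labels):
    """Sum the TF-IDF vectors per label (cosine ignores scale, so sum == mean)."""
    vecs, idf = tfidf_vectors(docs)
    centroids = {}
    for vec, label in zip(vecs, labels):
        centroids.setdefault(label, Counter()).update(vec)  # adds weights
    return centroids, idf


def classify(tokens, centroids, idf):
    """Assign the label whose centroid is most similar; unseen terms weigh 0."""
    vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokens).items()}
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical keywords from sentences surrounding product images.
DOCS = [["front", "view", "lcd", "screen"], ["keyboard", "keys", "layout"],
        ["lcd", "display", "screen", "size"], ["keyboard", "touchpad", "keys"]]
LABELS = ["screen", "keyboard", "screen", "keyboard"]
CENTROIDS, IDF = train_centroids(DOCS, LABELS)
print(classify(["lcd", "screen", "bright"], CENTROIDS, IDF))
```

A term that appears in every training document gets an IDF of zero, which is why common function words would contribute nothing here.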
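Item 3 also mentions classifying the PC's color with C4.5 after the background is removed. The sketch below is not a full C4.5 implementation: it has no tree growth, pruning, or missing-value handling. It shows only the split-scoring core of C4.5, which is entropy, information gain, and gain ratio, applied to a binary threshold split on one numeric feature. The feature (mean brightness of foreground pixels) and the color labels are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_scores(values, labels, threshold):
    """Information gain and gain ratio of the binary split value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    parts = [p for p in (left, right) if p]
    remainder = sum(len(p) / n * entropy(p) for p in parts)
    gain = entropy(labels) - remainder
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


def best_threshold(values, labels):
    """Pick the midpoint between adjacent distinct values with the highest gain."""
    xs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    if not candidates:
        return None  # all values identical: no split possible
    return max(candidates, key=lambda t: split_scores(values, labels, t)[0])


# Hypothetical feature: mean brightness of the PC body after background removal.
BRIGHTNESS = [0.05, 0.10, 0.12, 0.55, 0.60, 0.62]
COLORS = ["black", "black", "black", "silver", "silver", "silver"]
t = best_threshold(BRIGHTNESS, COLORS)
print(t, split_scores(BRIGHTNESS, COLORS, t))
```

Gain ratio divides information gain by the split information, which penalizes splits that fragment the data into many small subsets. In a full tree, this is the quantity C4.5 uses to compare attributes.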
|