2000 Fiscal Year Final Research Report Summary
RESEARCH ON KEY WORDS SYSTEMATIZATION FOR INFORMATION RETRIEVAL OF EDUCATIONAL RESOURCE ON THE INTERNET
Project/Area Number |
11680210
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Educational technology
|
Research Institution | TOKYO INSTITUTE OF TECHNOLOGY |
Principal Investigator |
NAKAYAMA Minoru TOKYO INSTITUTE OF TECHNOLOGY, THE CENTER FOR RESEARCH AND DEVELOPMENT OF EDUCATIONAL TECHNOLOGY, Assoc. Professor (40221460)
|
Co-Investigator (Kenkyū-buntansha) |
MASAO Murota TOKYO INSTITUTE OF TECHNOLOGY, GRADUATE SCHOOL OF DECISION SCIENCE AND TECHNOLOGY, Assoc. Professor (30222342)
NISHIKATA Atsuhiro TOKYO INSTITUTE OF TECHNOLOGY, THE CENTER FOR RESEARCH AND DEVELOPMENT OF EDUCATIONAL TECHNOLOGY, Assoc. Professor (60260535)
SHIMIZU Yasutaka TOKYO INSTITUTE OF TECHNOLOGY, GRADUATE SCHOOL OF DECISION SCIENCE AND TECHNOLOGY, Professor (10016561)
AOYAGI Takahiro TOKYO INSTITUTE OF TECHNOLOGY, THE CENTER FOR RESEARCH AND DEVELOPMENT OF EDUCATIONAL TECHNOLOGY, Research Associate (10302944)
|
Project Period (FY) |
1999 – 2000
|
Keywords | EDUCATIONAL RESOURCE / WEB INFORMATION / SINGULAR VALUE DECOMPOSITION / INDEXING / SELF-ORGANIZING MAP / DOCUMENT RETRIEVAL / DOCUMENT CLASSIFICATION |
Research Abstract |
The aim of this research is the efficient categorization of Internet information for school teaching. The study categorizes websites into subject units, based on an analysis of term frequencies in school textbooks. A term-document matrix is generated by extracting terms from unit documents with morphological analysis and tallying each term's frequency, so that each element of the matrix is the frequency of one term in one class unit. Methods are established to extract feature vectors of terms and documents from the adjusted matrix using SVD (Singular Value Decomposition), and to compose feature vectors of arbitrary documents from the term vectors. A suitable class unit for a document is estimated from the similarity of these feature vectors. A test collection of over 120 websites, each with a reference answer provided by high-school teachers, was used for the evaluation. Parameters and variables in the procedure, such as which word classes to use, how to adjust the term-document matrix, and the dimension of the feature vectors, were optimized based on this evaluation. Results are given as a list of unit titles in descending order of similarity between the site's feature vector and the unit vectors. To illustrate the structure of websites, a SOM (Self-Organizing Map), a kind of unsupervised neural network, is used. As a result, the textbook concept space was visualized, and mapping every page of an arbitrary site showed how the site's pages were distributed in this space. The validity of this map is also discussed, and guidelines for reading these maps are given. In conclusion, document classification specialized for educational information was carried out. Methods to extract feature vectors from descriptions in textbooks and to compose vectors of arbitrary documents were established. The class units matching a document are obtained by comparing similarities between the feature vectors. Categorization results and website structures are also visualized using SOM. These methods show that categorization and visualization of websites or documents is feasible.
|
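The classification procedure described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the toy terms, unit titles, and counts are hypothetical, log scaling stands in for the unspecified matrix adjustment, cosine similarity stands in for the unspecified similarity measure, and the dimension k = 2 is arbitrary. Morphological analysis is skipped, so term counts are assumed to be given.

```python
import numpy as np

# Hypothetical toy data: rows are terms, columns are textbook class units.
terms = ["cell", "gene", "force", "energy", "velocity"]
units = ["Biology: Cells", "Biology: Heredity", "Physics: Mechanics"]
counts = np.array([
    [5, 1, 0],   # cell
    [1, 6, 0],   # gene
    [0, 0, 4],   # force
    [1, 0, 3],   # energy
    [0, 0, 5],   # velocity
], dtype=float)


def adjust(freq):
    """Log-scale raw frequencies (an assumed choice of adjustment)."""
    return np.log1p(freq)


def fit(freq, k):
    """Truncated SVD of the adjusted matrix; returns k-dimensional vectors."""
    u, s, vt = np.linalg.svd(adjust(freq), full_matrices=False)
    term_vecs = u[:, :k]           # one row per term
    unit_vecs = vt[:k].T * s[:k]   # one row per class unit
    return term_vecs, unit_vecs


def compose(doc_freq, term_vecs):
    """Build a document vector as a weighted sum of its terms' vectors."""
    return adjust(doc_freq) @ term_vecs


def rank_units(doc_vec, unit_vecs, names):
    """List units in descending order of cosine similarity to the document."""
    norms = np.linalg.norm(unit_vecs, axis=1) * np.linalg.norm(doc_vec)
    sims = unit_vecs @ doc_vec / np.maximum(norms, 1e-12)
    order = np.argsort(-sims)
    return [(names[i], float(sims[i])) for i in order]


term_vecs, unit_vecs = fit(counts, k=2)

# Hypothetical web page that mentions "force" twice and "velocity" three times.
page = np.array([0, 0, 2, 0, 3], dtype=float)
ranking = rank_units(compose(page, term_vecs), unit_vecs, units)
for name, sim in ranking:
    print(f"{name}: {sim:.3f}")
```

With this scaling, composing a vector from a unit's own term counts reproduces that unit's vector, so arbitrary documents and textbook units are compared in the same space.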
Research Products
(9 results)