2007 Fiscal Year Final Research Report Summary
A Study for Knowledge Extraction Aid System from Web Text
Project/Area Number | 17200007 |
Research Category | Grant-in-Aid for Scientific Research (A) |
Allocation Type | Single-year Grants |
Section | General |
Research Field | Intelligent informatics |
Research Institution | The University of Tokyo |
Principal Investigator | NAKAGAWA Hiroshi The University of Tokyo, Information Technology Center, Professor (20134893) |
Co-Investigator (Kenkyū-buntansha) |
YONEZAWA Akinori The University of Tokyo, Graduate School of Information Science and Technology, Professor (00133116)
TAURA Kenjiro The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (90282714)
NINOMIYA Takashi The University of Tokyo, Information Technology Center, Lecturer (20444094)
YOSHIDA Minoru The University of Tokyo, Information Technology Center, Assistant Professor (40361688)
KIYOTA Youji The University of Tokyo, Information Technology Center, Assistant Professor (10401316)
|
Project Period (FY) | 2005 – 2007 |
Keywords | WWW / Knowledge / Text / Mining / Usage Retrieval / People Name Search / Terminology Extraction / Machine Learning |
Research Abstract |
In this research, we aimed to build a system that extracts, from a huge number of Web pages, the texts or parts of texts containing knowledge that various users are interested in. For this purpose we developed the following systems.
(1) A system that extracts terms characterizing the web pages returned by a search engine, using "Gensen Web," the term extraction system we had already developed.
(2) A system that extracts definitions of the terms obtained by system (1) and the relations among those terms. For this task we use "Kiwi," a usage consultation system that works via a Web search engine.
(3) To make system (2) more efficient, we employed suffix array technology and used web pages crawled in advance. We named this system "UT-Kiwi" and made it publicly available on the Internet.
(4) To enhance the systems described above, we developed a person name search engine named "Nayose." A search for a given person's name returns pages that refer to different people who share that name; our system clusters those web pages by the actual person.
(5) Aiming at more innovative knowledge extraction, we also studied new machine learning algorithms based on nonparametric Bayesian theory.
(6) To make greater use of English web pages, we developed the Sakumon system, an assistance system for English cloze tests that uses English web pages.
|
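The abstract does not describe how UT-Kiwi applies suffix arrays, so the following is only a minimal sketch of the general technique: index a text once, then look up every occurrence of a phrase by binary search and show it in context. The function names (`build_suffix_array`, `find_occurrences`, `show_usages`), the sample sentence, and the naive construction are all assumptions made for illustration, not details of the project's system.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start offsets of all suffixes of text, in sorted order.

    Naive construction (sorts full suffixes); adequate for a small
    illustration, not for a crawled web corpus.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted offsets at which pattern occurs in text."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with pattern.
    return sorted(sa[start:lo])


def show_usages(text: str, sa: list[int], pattern: str, width: int = 12) -> list[str]:
    """Return each occurrence of pattern with up to width characters of context."""
    return [
        text[max(0, pos - width):pos + len(pattern) + width]
        for pos in find_occurrences(text, sa, pattern)
    ]


if __name__ == "__main__":
    # Hypothetical sample text standing in for pre-crawled pages.
    sample = "in terms of cost, in terms of speed, and in spite of delays"
    index = build_suffix_array(sample)
    for usage in show_usages(sample, index, "in terms of"):
        print(usage)
```

Because the index is built once in advance, each phrase lookup needs only a logarithmic number of string comparisons, which is the usual reason for pairing a suffix array with a pre-crawled collection instead of querying a live search engine.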
Research Products
(56 results)
[Journal Article] Fast and scalable HPSG parsing (2006)
Author(s)
Ninomiya, Takashi, Yoshimasa Tsuruoka, Yusuke Miyao, Kenjiro Taura and Jun'ichi Tsujii.
Journal Title
Journal of Traitement Automatique des Langues (TAL) 46(2)
Pages: 91-114
Description
From the Summary of Research Results Report (Japanese)
Peer Reviewed
[Journal Article] Fast and scalable HPSG parsing (2006)
Author(s)
Ninomiya, Takashi, Yoshimasa Tsuruoka, Yusuke Miyao, Kenjiro Taura, Jun'ichi Tsujii
Journal Title
Journal of Traitement Automatique des Langues (TAL) 46(2)
Pages: 91-114
Description
From the Summary of Research Results Report (English)