A Study for Knowledge Extraction Aid System from Web Text
Project/Area Number |
17200007
|
Research Category |
Grant-in-Aid for Scientific Research (A)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | The University of Tokyo |
Principal Investigator |
NAKAGAWA Hiroshi The University of Tokyo, Information Technology Center, Professor (20134893)
|
Co-Investigator (Kenkyū-buntansha) |
YONEZAWA Akinori The University of Tokyo, Graduate School of Information Science and Technology, Professor (00133116)
TAURA Kenjiro The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (90282714)
NINOMIYA Takashi The University of Tokyo, Information Technology Center, Lecturer (20444094)
YOSHIDA Minoru The University of Tokyo, Information Technology Center, Assistant Professor (40361688)
KIYOTA Youji The University of Tokyo, Information Technology Center, Assistant Professor (10401316)
TSUJII Jun'ichi The University of Tokyo, Interfaculty Initiative in Information Studies, Professor (20026313)
|
Project Period (FY) |
2005 – 2007
|
Project Status |
Completed (Fiscal Year 2007)
|
Budget Amount |
¥43,420,000 (Direct Cost: ¥33,400,000, Indirect Cost: ¥10,020,000)
Fiscal Year 2007: ¥13,910,000 (Direct Cost: ¥10,700,000, Indirect Cost: ¥3,210,000)
Fiscal Year 2006: ¥13,910,000 (Direct Cost: ¥10,700,000, Indirect Cost: ¥3,210,000)
Fiscal Year 2005: ¥15,600,000 (Direct Cost: ¥12,000,000, Indirect Cost: ¥3,600,000)
|
Keywords | WWW / Knowledge / Text / Mining / Usage Retrieval / People Name Search / Terminology Extraction / Machine Learning / Search / Text Mining / Semi-structured Text / Blog / Trie / n-gram / Information Retrieval / Indexing / Natural Language Processing / Usage Examples |
Research Abstract |
In this research, we aimed to build a system that extracts, from a huge number of Web pages, the texts or parts of texts containing knowledge that various users are interested in. We developed the following systems for this purpose. (1) A system that extracts terms characterizing the Web pages returned by a search engine, using "Gensen Web," a term extraction system that we had already developed. (2) A system that extracts definitions of the terms obtained by system (1) and the relations among those terms. To accomplish this task, we use "Kiwi," a usage consultation system that works via a Web search engine. (3) To make system (2) more efficient, we employed suffix array technology and used Web pages crawled in advance. We named this system "UT-Kiwi" and made it publicly available on the Internet. (4) To enhance the systems described above, we developed a people name search engine named "Nayose." A search for a given person's name returns pages that refer to different people who share that name; our system clusters those Web pages by real-world person. (5) Aiming at more innovative knowledge extraction, we also studied new machine learning algorithms based on nonparametric Bayesian theory. (6) To make greater use of English Web pages, we developed "Sakumon," an assistance system for English cloze tests that uses English Web pages.
|
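The abstract states only that UT-Kiwi uses suffix array technology over pre-crawled pages; it does not describe the implementation. As a rough illustration of why a suffix array suits usage lookup, the following minimal sketch indexes a small text and finds every occurrence of a phrase by binary search. The function names (`build_suffix_array`, `find_occurrences`, `usage_examples`) and the naive construction are assumptions made for this example, not the project's actual code.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start offsets of all suffixes of `text` in sorted order."""
    # Naive construction by sorting suffix strings; adequate for a small
    # demonstration, not for a Web-scale corpus.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return the sorted offsets at which `pattern` occurs in `text`."""
    if not pattern:
        return []
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def usage_examples(text: str, sa: List[int], pattern: str, width: int = 15) -> List[str]:
    """Return each occurrence of `pattern` with `width` characters of context."""
    return [
        text[max(0, i - width): i + len(pattern) + width]
        for i in find_occurrences(text, sa, pattern)
    ]
```

Because all suffixes that begin with the query phrase are adjacent in the sorted array, each lookup needs only two binary searches once the index has been built, which is one plausible reason to index crawled pages in advance rather than query a search engine at request time.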
Report
(4 results)
Research Products
(78 results)
[Journal Article] Fast and scalable HPSG parsing (2006)
Author(s)
Ninomiya, Takashi, Yoshimasa Tsuruoka, Yusuke Miyao, Kenjiro Taura and Jun'ichi Tsujii.
Journal Title
Journal of Traitement Automatique des Langues (TAL) 46(2)
Pages: 91-114
Description
From "Summary of Research Results Report (Japanese)"
Related Report
Peer Reviewed
[Journal Article] Fast and scalable HPSG parsing (2006)
Author(s)
Ninomiya, Takashi, Yoshimasa Tsuruoka, Yusuke Miyao, Kenjiro Taura, Jun'ichi Tsujii
Journal Title
Journal of Traitement Automatique des Langues (TAL) 46(2)
Pages: 91-114
Description
From "Summary of Research Results Report (English)"
Related Report