
2007 Fiscal Year Final Research Report Summary

A Study for Knowledge Extraction Aid System from Web Text

Research Project

Project/Area Number 17200007
Research Category

Grant-in-Aid for Scientific Research (A)

Allocation Type Single-year Grants
Section General
Research Field Intelligent informatics
Research Institution The University of Tokyo

Principal Investigator

NAKAGAWA Hiroshi  The University of Tokyo, Information Technology Center, Professor (20134893)

Co-Investigator (Kenkyū-buntansha) YONEZAWA Akinori  The University of Tokyo, Graduate School of Information Science and Technology, Professor (00133116)
TAURA Kenjiro  The University of Tokyo, Graduate School of Information Science and Technology, Assistant Professor (90282714)
NINOMIYA Takashi  The University of Tokyo, Information Technology Center, Lecturer (20444094)
YOSHIDA Minoru  The University of Tokyo, Information Technology Center, Assistant Professor (40361688)
KIYOTA Youji  The University of Tokyo, Information Technology Center, Assistant Professor (10401316)
Project Period (FY) 2005 – 2007
Keywords WWW / Knowledge / Text / Mining / Usage Retrieval / People Name Search / Terminology Extraction / Machine Learning
Research Abstract

In this research we aimed to build a system that extracts, from a huge number of Web pages, the texts or parts of texts containing knowledge that various users are interested in. We developed the following systems for this purpose.
(1) A system that extracts terms characterizing the web pages returned by a search engine, using the term extraction system "Gensen Web," which we had already developed.
(2) A system that extracts definitions of the terms obtained by system (1) and the relations among those terms. To accomplish this task, we utilized "Kiwi," a usage consultation system built on a Web search engine.
(3) To make system (2) more efficient, we employed suffix array technology and used web pages crawled in advance. We named this system "UT-Kiwi" and made it publicly available on the Internet.
(4) To enhance the systems described above, we developed a people name search engine named "Nayose." A search for a given person's name returns pages that refer to different people who happen to share that name. Our system clusters those web pages by the real person they refer to.
(5) Aiming at more innovative knowledge extraction, we also studied new machine learning algorithms based on nonparametric Bayesian theory.
(6) To make greater use of English web pages, we developed the Sakumon system, which assists in creating English cloze tests from English web pages.
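
Item (3) mentions suffix arrays without further detail. The following is only a minimal illustrative sketch of how a suffix array over pre-crawled text can answer phrase lookups quickly; it is not the UT-Kiwi implementation, and the function names `build_suffix_array` and `find_occurrences` and the sample text are assumptions made for this example.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text`, sorted lexicographically.

    This naive construction is fine for a small demo; a real system over
    crawled web pages would need a more scalable construction algorithm.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, query):
    """Return every position where `query` occurs, using two binary searches."""
    k = len(query)

    # Lower bound: first suffix whose first k characters are >= query.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + k] < query:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first k characters are > query.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + k] <= query:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in [start, lo) begin with the query.
    return sorted(suffix_array[start:lo])


# Hypothetical stand-in for text crawled in advance.
corpus = "suffix arrays index every suffix of a text"
sa = build_suffix_array(corpus)
for pos in find_occurrences(corpus, sa, "suffix"):
    # Print each hit with a little right-hand context, as a usage example would.
    print(pos, corpus[pos:pos + 20])
```

Because the index is built once over the crawled pages, each lookup costs a logarithmic number of string comparisons instead of a scan of the whole collection.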

  • Research Products

    (56 results)

All Journal Article (15 results) (of which Peer Reviewed: 8 results) Presentation (39 results) Remarks (2 results)

  • [Journal Article] Application of Collapsed Variational Bayes to the Dirichlet Process Unigram Mixture Model (2007)

    • Author(s)
      Issei Sato, Hiroshi Nakagawa
    • Journal Title

      IPSJ Transactions (情報処理学会論文誌) 48 TOM19

      Pages: 107-116

    • Description
      From the Final Research Report Summary (Japanese)
    • Peer Reviewed
  • [Journal Article] Semi-structure Mining Method for Text Mining with a Chunk-based Dependency Structure (2007)

    • Author(s)
      Issei Sato, Hiroshi Nakagawa
    • Journal Title

      Springer LNAI 4426

      Pages: 777-784

    • Description
      From the Final Research Report Summary (Japanese)
    • Peer Reviewed
  • [Journal Article] Application of Variational Bayes to Dirichlet Process Unigram Mixture Model (2007)

    • Author(s)
      Issei Sato, Hiroshi Nakagawa
    • Journal Title

      IPSJ Transactions 48

      Pages: 107-116

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Fast and scalable HPSG parsing (2006)

    • Author(s)
      Ninomiya, Takashi, Yoshimasa Tsuruoka, Yusuke Miyao, Kenjiro Taura and Jun'ichi Tsujii.
    • Journal Title

      Journal of Traitement Automatique des Langues (TAL) 46(2)

      Pages: 91-114

    • Description
      From the Final Research Report Summary (Japanese)
    • Peer Reviewed
  • [Journal Article] NAYOSE: A System for Reference Disambiguation of Proper Nouns Appearing on Web Pages (2006)

    • Author(s)
      Shingo Ono, Minoru Yoshida, Hiroshi Nakagawa
    • Journal Title

      Springer LNCS 4182

      Pages: 338-349

    • Description
      From the Final Research Report Summary (Japanese)
    • Peer Reviewed
  • [Journal Article] Fast and scalable HPSG parsing (2006)

    • Author(s)
      Ninomiya, Takashi, Yoshimasa Tsuruoka, Yusuke Miyao, Kenjiro Taura, Jun'ichi Tsujii
    • Journal Title

      Journal of Traitement Automatique des Langues (TAL) 46(2)

      Pages: 91-114

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] NAYOSE: A System for Reference Disambiguation of Proper Nouns Appearing on Web Pages (2006)

    • Author(s)
      Shingo Ono, Minoru Yoshida, Hiroshi Nakagawa
    • Journal Title

      Springer LNCS 4182

      Pages: 338-349

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Specification Retrieval-How to Find Attribute-Value Information on the Web (2005)

    • Author(s)
      Minoru Yoshida and Hiroshi Nakagawa
    • Journal Title

      Springer LNCS 3248

      Pages: 338-347

    • Description
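      From the Final Research Report Summary (Japanese)
    • Peer Reviewed
  • [Journal Article] Sentence Final Paraphrase Extraction from Aligned Corpus of News Articles for Web and Mobile Terminals (2005)

    • Author(s)
      Moritaka Iwakoshi, Hidetaka Masuda, Hiroshi Nakagawa
    • Journal Title

      Natural Language Processing (自然言語処理) 12(4)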

      Pages: 157-184

    • Description
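      From the Final Research Report Summary (Japanese)
    • Peer Reviewed
  • [Journal Article] Kiwi: A Multilingual Usage Search System (2005)

    • Author(s)
      Hiroshi Nakagawa
    • Journal Title

      Kanji-Bunken-Jouhou-Shori-Kenkyu (漢字文献情報処理研究) 6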

      Pages: 116-123

    • Description
      From the Final Research Report Summary (Japanese)
    • Peer Reviewed
  • [Journal Article] Extracting Paraphrases of Japanese Action Word of Sentence Ending Part From Web and Mobile News Articles (2005)

    • Author(s)
      Hiroshi Nakagawa and Hidetaka Masuda
    • Journal Title

      Springer LNCS 3411

      Pages: 94-105

    • Description
      From the Final Research Report Summary (Japanese)
    • Peer Reviewed
  • [Journal Article] Specification Retrieval-How to Find Attribute-Value Information on the Web (2005)

    • Author(s)
      Minoru Yoshida and Hiroshi Nakagawa
    • Journal Title

      Springer LNCS 3248

      Pages: 338-347

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Sentence Final Paraphrase Extraction from Aligned Corpus of News Articles for Web and Mobile Terminals (2005)

    • Author(s)
      Moritaka Iwakoshi, Hidetaka Masuda, Hiroshi Nakagawa
    • Journal Title

      Natural Language Processing (in Japanese) 12(4)

      Pages: 157-184

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Kiwi (2005)

    • Author(s)
      Hiroshi Nakagawa
    • Journal Title

      Kanji-Bunken-Jouhou-Shori (in Japanese) 6

      Pages: 116-123

    • Description
      From the Final Research Report Summary (English)
  • [Journal Article] Extracting Paraphrases of Japanese Action Word of Sentence Ending Part From Web and Mobile News Articles (2005)

    • Author(s)
      Hiroshi Nakagawa, Hidetaka Masuda
    • Journal Title

      Springer LNCS 3411

      Pages: 94-105

    • Description
      From the Final Research Report Summary (English)
  • [Presentation] Gram-Free Synonym Extraction via Suffix Arrays (2008)

    • Author(s)
      Minoru Yoshida, Hiroshi Nakagawa
    • Organizer
      AIRS2008 (Asia Information Retrieval Symposium 2008)
    • Place of Presentation
      Harbin, China
    • Year and Date
      20080115-18
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] Gram-Free Synonym Extraction via Suffix Arrays (2008)

    • Author(s)
      Minoru Yoshida, Hiroshi Nakagawa
    • Organizer
      AIRS2008
    • Place of Presentation
      Harbin, China
    • Year and Date
      20080115-18
    • Description
      From the Final Research Report Summary (English)
  • [Presentation] Web Document Parsing: A New Approach to Modeling Layout-Language Relations (2007)

    • Author(s)
      Minoru Yoshida, Hiroshi Nakagawa
    • Organizer
      ICDAR2007 (The 9th International Conference on Document Analysis and Recognition)
    • Place of Presentation
      Curitiba, Brazil
    • Year and Date
      20070923-26
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] Web Document Parsing: A New Approach to Modeling Layout-Language Relations (2007)

    • Author(s)
      Minoru Yoshida, Hiroshi Nakagawa
    • Organizer
      ICDAR2007
    • Place of Presentation
      Curitiba, Brazil
    • Year and Date
      20070923-26
    • Description
      From the Final Research Report Summary (English)
  • [Presentation] Knowledge Discovery of Multiple-topic Document using Parametric Mixture Model with Dirichlet Prior (2007)

    • Author(s)
      Issei Sato, Hiroshi Nakagawa
    • Organizer
      Thirteenth ACM SIGKDD
    • Place of Presentation
      San Jose, USA
    • Year and Date
      20070815-18
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] Knowledge Discovery of Multiple-topic Document using Parametric Mixture Model with Dirichlet Prior (2007)

    • Author(s)
      Issei Sato, Hiroshi Nakagawa
    • Organizer
      Thirteenth ACM SIGKDD
    • Place of Presentation
      San Jose, USA
    • Year and Date
      20070815-18
    • Description
      From the Final Research Report Summary (English)
  • [Presentation] A Cloze Test Authoring System and its Automation (2007)

    • Author(s)
      Ayako Hoshino, Hiroshi Nakagawa
    • Organizer
      ICWL2007-The 6th International Conference on Web-based Learning
    • Place of Presentation
      Edinburgh, Scotland
    • Year and Date
      20070815-17
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] A Cloze Test Authoring System and its Automation (2007)

    • Author(s)
      Ayako Hoshino, Hiroshi Nakagawa
    • Organizer
      ICWL2007
    • Place of Presentation
      Edinburgh, Scotland
    • Year and Date
      20070815-17
    • Description
      From the Final Research Report Summary (English)
  • [Presentation] Cross-Lingual Concern Analysis from Multilingual Weblog Articles (2007)

    • Author(s)
      Tomohiro Fukuhara, Takehito Utsuro, Hiroshi Nakagawa
    • Organizer
      The 6th International Workshop on Social Intelligence Design (SID 2007)
    • Place of Presentation
      Trento, Italy
    • Year and Date
      20070702-04
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] Cross-Lingual Concern Analysis from Multilingual Weblog Articles (2007)

    • Author(s)
      Tomohiro Fukuhara, Takehito Utsuro, Hiroshi Nakagawa
    • Organizer
      6th SID 2007
    • Place of Presentation
      Trento, Italy
    • Year and Date
      20070702-04
    • Description
      From the Final Research Report Summary (English)
  • [Presentation] Bayesian Document Generative Model with Explicit Multiple Topics (2007)

    • Author(s)
      Issei Sato, Hiroshi Nakagawa
    • Organizer
      EMNLP-CoNLL2007
    • Place of Presentation
      Prague, Czech Republic
    • Year and Date
      20070625-28
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] Structural Correspondence Learning for Dependency Parsing (2007)

    • Author(s)
      Nobuyuki Shimizu, Hiroshi Nakagawa
    • Organizer
      EMNLP-CoNLL-ST
    • Place of Presentation
      Prague, Czech Republic
    • Year and Date
      20070625-28
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] A log-linear model with an n-gram reference distribution for accurate HPSG parsing (2007)

    • Author(s)
      Ninomiya, Takashi, Takuya Matsuzaki, Yusuke Miyao and Jun'ichi Tsujii
    • Organizer
      IWPT-2007
    • Place of Presentation
      Prague, Czech Republic
    • Year and Date
      20070623-24
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] A log-linear model with an n-gram reference distribution for accurate HPSG parsing (2007)

    • Author(s)
      Ninomiya, Takashi, Takuya Matsuzaki, Yusuke Miyao, Jun'ichi Tsujii
    • Organizer
      IWPT-2007
    • Place of Presentation
      Prague, Czech Republic
    • Year and Date
      20070623-24
    • Description
      From the Final Research Report Summary (English)
  • [Presentation] Semi-structure Mining Method for Text Mining with a Chunk-based Dependency Structure (2007)

    • Author(s)
      Issei Sato, Hiroshi Nakagawa
    • Organizer
      PAKDD'07 (The 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining)
    • Place of Presentation
      Nanjing, China
    • Year and Date
      20070522-25
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] Semi-structure Mining Method for Text Mining with a Chunk-based Dependency Structure (2007)

    • Author(s)
      Issei Sato, Hiroshi Nakagawa
    • Organizer
      PAKDD'07
    • Place of Presentation
      Nanjing, China
    • Year and Date
      20070522-25
    • Description
      From the Final Research Report Summary (English)
  • [Presentation] Assisting cloze test making with a web application (2007)

    • Author(s)
      Ayako Hoshino, Hiroshi Nakagawa
    • Organizer
      SITE 2007-Society for Information Technology & Teacher Education International Conference
    • Place of Presentation
      San Antonio, Texas, USA
    • Year and Date
      20070326-30
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] Sakumon: An assisting system for English cloze test (2007)

    • Author(s)
      Ayako Hoshino, Hiroshi Nakagawa
    • Organizer
      SITE 2007-Society for Information Technology & Teacher Education International Conference (Demo)
    • Place of Presentation
      Texas, USA
    • Year and Date
      20070326-30
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] Assisting cloze test making with a web application (2007)

    • Author(s)
      Ayako Hoshino, Hiroshi Nakagawa
    • Organizer
      SITE 2007
    • Place of Presentation
      San Antonio, Texas, USA
    • Year and Date
      20070326-30
    • Description
      From the Final Research Report Summary (English)
  • [Presentation] Sakumon: An assisting system for English cloze test (2007)

    • Author(s)
      Ayako Hoshino, Hiroshi Nakagawa
    • Organizer
      SITE 2007 (Demo)
    • Place of Presentation
      San Antonio, Texas, USA
    • Year and Date
      20070326-30
    • Description
      From the Final Research Report Summary (English)
  • [Presentation] Understanding Sentiment of People from News Articles: Temporal Sentiment Analysis of Social Events (2007)

    • Author(s)
      Tomohiro Fukuhara, Hiroshi Nakagawa, Toyoaki Nishida
    • Organizer
      ICWSM-2007-Int. Conf. on Weblogs and Social Media
    • Place of Presentation
      Boulder, Colorado, USA
    • Year and Date
      20070326-28
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] Understanding Sentiment of People from News Articles: Temporal Sentiment Analysis of Social Events (2007)

    • Author(s)
      Tomohiro Fukuhara, Hiroshi Nakagawa, Toyoaki Nishida
    • Organizer
      ICWSM-2007
    • Place of Presentation
      Boulder, Colorado, USA
    • Year and Date
      20070326-28
    • Description
      From the Final Research Report Summary (English)
  • [Presentation] NAYOSE: A System for Reference Disambiguation of Proper Nouns Appearing on Web Pages (2006)

    • Author(s)
      Shingo Ono, Minoru Yoshida, Hiroshi Nakagawa
    • Organizer
      AIRS2006 (Asia Information Retrieval Symposium)
    • Place of Presentation
      Singapore
    • Year and Date
      20061016-18
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] NAYOSE: A System for Reference Disambiguation of Proper Nouns Appearing on Web Pages (2006)

    • Author(s)
      Shingo Ono, Minoru Yoshida, Hiroshi Nakagawa
    • Organizer
      AIRS2006
    • Place of Presentation
      Singapore
    • Year and Date
      20061016-18
    • Description
      From the Final Research Report Summary (English)
  • [Presentation] A Domain Ontology Production Tool Kit Based on Automatically Constructed Case Frames (2006)

    • Author(s)
      Youji Kiyota, Hiroshi Nakagawa
    • Organizer
      LREC2006
    • Place of Presentation
      Genova, Italy
    • Year and Date
      20060525-27
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] A Domain Ontology Production Tool Kit Based on Automatically Constructed Case Frames (2006)

    • Author(s)
      Youji Kiyota, Hiroshi Nakagawa
    • Organizer
      LREC2006
    • Place of Presentation
      Genova, Italy
    • Year and Date
      20060525-27
    • Description
      From the Final Research Report Summary (English)
  • [Presentation] Browsing System for Weblog Articles based on Automated Folksonomy (2006)

    • Author(s)
      Tsutomu Ohkura, Youji Kiyota, Hiroshi Nakagawa
    • Organizer
      WWW2006 Workshop on the Weblogging Ecosystem
    • Place of Presentation
      Edinburgh, Scotland
    • Year and Date
      2006-05-24
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] Browsing System for Weblog Articles based on Automated Folksonomy (2006)

    • Author(s)
      Tsutomu Ohkura, Youji Kiyota, Hiroshi Nakagawa
    • Organizer
      WWW2006 Workshop on the Weblogging Ecosystem
    • Place of Presentation
      Edinburgh, Scotland
    • Year and Date
      2006-05-24
    • Description
      From the Final Research Report Summary (English)
  • [Presentation] Automatic Term Extraction based on Perplexity of Compound Words (2005)

    • Author(s)
      Minoru Yoshida, Hiroshi Nakagawa
    • Organizer
      IJCNLP 2005
    • Place of Presentation
      Jeju, Korea
    • Year and Date
      20051011-13
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] WebExperimenter for Multiple Choice Question Generation (2005)

    • Author(s)
      Ayako Hoshino, Hiroshi Nakagawa
    • Organizer
      HLT-EMNLP-05 Interactive Demonstrations
    • Place of Presentation
      Vancouver, B.C., Canada
    • Year and Date
      20051006-08
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] WebExperimenter for Multiple Choice Question Generation (2005)

    • Author(s)
      Ayako Hoshino, Hiroshi Nakagawa
    • Organizer
      HLT-EMNLP-05 Interactive Demo
    • Place of Presentation
      Vancouver, B.C., Canada
    • Year and Date
      20051006-08
    • Description
      From the Final Research Report Summary (English)
  • [Presentation] Web-based Acquisition of Japanese Katakana Variants (2005)

    • Author(s)
      Takeshi Masuyama, Hiroshi Nakagawa
    • Organizer
      SIGIR2005(The 28th Annual International ACM SIGIR Conference)
    • Place of Presentation
      Salvador, Brazil
    • Year and Date
      20050815-18
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] Web-based Acquisition of Japanese Katakana Variants (2005)

    • Author(s)
      Takeshi Masuyama, Hiroshi Nakagawa
    • Organizer
      ACM SIGIR2005
    • Place of Presentation
      Salvador, Brazil
    • Year and Date
      20050815-18
    • Description
      From the Final Research Report Summary (English)
  • [Presentation] Reformatting Web Documents via Header Trees (2005)

    • Author(s)
      Minoru Yoshida and Hiroshi Nakagawa
    • Organizer
      43rd ACL2005 Poster/Demo
    • Place of Presentation
      Ann Arbor, USA
    • Year and Date
      20050724-30
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] A real-time multiple-choice question generation for language testing: a preliminary study (2005)

    • Author(s)
      Ayako Hoshino and Hiroshi Nakagawa
    • Organizer
      43rd ACL2005 Second Workshop on Building Educational Applications Using Natural Language Processing
    • Place of Presentation
      Ann Arbor, USA
    • Year and Date
      20050724-30
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] Reformatting Web Documents via Header Trees (2005)

    • Author(s)
      Minoru Yoshida, Hiroshi Nakagawa
    • Organizer
      ACL2005 Poster/Demo
    • Place of Presentation
      Ann Arbor, USA
    • Year and Date
      20050724-30
    • Description
      From the Final Research Report Summary (English)
  • [Presentation] A real-time multiple-choice question generation for language testing: a preliminary study (2005)

    • Author(s)
      Ayako Hoshino, Hiroshi Nakagawa
    • Organizer
      ACL2005 Second Workshop on Building Educational Applications Using Natural Language Processing
    • Place of Presentation
      Ann Arbor, USA
    • Year and Date
      20050724-30
    • Description
      From the Final Research Report Summary (English)
  • [Presentation] A Multilingual Usage Consultation Tool based on Internet Searching: More than search engine, Less than QA (2005)

    • Author(s)
      Kumiko Tanaka-Ishii, Hiroshi Nakagawa
    • Organizer
      The 14th International World Wide Web Conference (WWW2005)
    • Place of Presentation
      Chiba, Japan
    • Year and Date
      20050510-14
    • Description
      From the Final Research Report Summary (Japanese)
  • [Presentation] A Multilingual Usage Consultation Tool based on Internet Searching: More than search engine, Less than QA (2005)

    • Author(s)
      Kumiko Tanaka-Ishii, Hiroshi Nakagawa
    • Organizer
      WWW2005
    • Place of Presentation
      Chiba, Japan
    • Year and Date
      20050510-14
    • Description
      From the Final Research Report Summary (English)
  • [Remarks] From the Final Research Report Summary (Japanese)

    • URL

      http://gensen.dl.itc.u-tokyo.ac.jp/

  • [Remarks] From the Final Research Report Summary (Japanese)

    • URL

      http://kiwi.r.dl.itc.u-tokyo.ac.jp/ut-kiwi/

Published: 2010-02-04  
