2001 Fiscal Year Final Research Report Summary

A Study of Automatic Domain Term Extraction from Corpora

Research Project

Project/Area Number 12680368
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Intelligent informatics
Research Institution The University of Tokyo

Principal Investigator

NAKAGAWA Hiroshi  Information Technology Center, The University of Tokyo, Professor (20134893)

Co-Investigator (Kenkyū-buntansha) TANAKA Kumiko (ISHII Kumiko)  Interfaculty Information Initiative, The University of Tokyo, Lecturer (10323528)
Project Period (FY) 2000 – 2001
Keywords Term Extraction / Information Extraction / Domain Term / Corpus / Translation / NTCIR / Natural Language Processing / Information Retrieval
Research Abstract

We worked mainly on automatic term extraction methods that extract domain-specific terms from the domain corpora distributed by the NTCIR1 TMREC task group. Most existing work on automatic term extraction relies on statistics such as frequency in the corpus, and few studies have focused on the characteristics of the space formed by the extracted terms. This work focuses on the latter. We propose a method that uses the statistical relation between compound nouns, which account for up to 85% of all terms, and simple nouns, which account for the remaining 15%. For instance, given many compound nouns such as "human information system" and "social information system", the importance of "information" is defined as the number of distinct nouns that adjoin it or are adjoined by it. The importance of a compound noun is then defined as the geometric mean of the importance values of its component nouns. Our system consists of (1) morphological analysis, (2) extraction of candidate terms, (3) assignment of an importance value to each candidate term, and (4) evaluation with the NTCIR1 TMREC test collection. The proposed method achieved a high score among the methods that participated in NTCIR1. We also adapted our method to English so that translation extraction can be investigated in the following year.
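The scoring idea in the abstract can be illustrated with a minimal Python sketch. It is based only on the description above and is not the project's actual implementation. The function names, the toy input, and the +1 smoothing (so that a noun with no neighbours does not force the product to zero) are assumptions made for illustration. Compound nouns are assumed to be already segmented into component nouns, which the morphological analysis step would provide.

```python
import math
from collections import defaultdict


def noun_importance(compounds):
    """Count the distinct nouns that adjoin each noun on its left and right."""
    left = defaultdict(set)   # noun -> distinct nouns seen immediately before it
    right = defaultdict(set)  # noun -> distinct nouns seen immediately after it
    for compound in compounds:
        for a, b in zip(compound, compound[1:]):
            right[a].add(b)
            left[b].add(a)
    nouns = {n for c in compounds for n in c}
    # Assumption: +1 smoothing, not stated in the abstract.
    return {n: len(left[n]) + len(right[n]) + 1 for n in nouns}


def compound_importance(compound, importance):
    """Geometric mean of the importance values of the component nouns."""
    values = [importance.get(n, 1) for n in compound]
    return math.prod(values) ** (1.0 / len(values))


# Toy candidate terms, already split into component nouns (hypothetical data).
compounds = [
    ("human", "information", "system"),
    ("social", "information", "system"),
    ("information", "retrieval"),
]
importance = noun_importance(compounds)
ranked = sorted(
    compounds,
    key=lambda c: compound_importance(c, importance),
    reverse=True,
)
```

In this toy input, "information" adjoins four distinct nouns ("human" and "social" on the left, "system" and "retrieval" on the right), so it receives the highest single-noun value, and compounds containing it are ranked accordingly.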

  • Research Products

    (8 results)

All Publications (8 results)

  • [Publications] Hiroshi Nakagawa: "Automatic Term Recognition based on Statistics of Compound Nouns" Terminology. Vol. 6, No. 2. 195-210 (2000)

    • Description
      From the Summary of Research Results Report (Japanese)
  • [Publications] Tatsunori Mori, Mamoru Matsuo, Hiroshi Nakagawa: "Zero pronoun resolution by Linguistic Constraints and Defaults: The case of Japanese Instruction Manuals" Special Issue on Anaphora Resolution in Machine Translation (Ruslan Mitkov, editor), Machine Translation. Vol. 14. 231-245 (2000)

    • Description
      From the Summary of Research Results Report (Japanese)
  • [Publications] Hiroshi Nakagawa: "Disambiguation of Lexical Translations Based on Bilingual Comparable Corpora" 2nd International Conference on Language Resources and Evaluation: LREC2000 Workshop of Terminology Resources and Computation: WTRC2000. 33-38 (2000)

    • Description
      From the Summary of Research Results Report (Japanese)
  • [Publications] Hiroshi Nakagawa: "Experimental evaluation of ranking and selection methods in term extraction", in "Recent Advances in Computational Terminology", D. Bourigault, C. Jacquemin, M.-C. L'Homme (editors). John Benjamins. 303-325 (2001)

    • Description
      From the Summary of Research Results Report (Japanese)
  • [Publications] Hiroshi Nakagawa: "Automatic Term Recognition based on Statistics of Compound Nouns" Terminology. Vol. 6, No. 2. 195-210 (2000)

    • Description
      From the Summary of Research Results Report (English)
  • [Publications] Tatsunori Mori, Mamoru Matsuo, Hiroshi Nakagawa: "Zero pronoun resolution by Linguistic Constraints and Defaults: The case of Japanese Instruction Manuals" Special Issue on Anaphora Resolution in Machine Translation (Ruslan Mitkov, editor), Machine Translation. Vol. 14. 231-245 (2000)

    • Description
      From the Summary of Research Results Report (English)
  • [Publications] Hiroshi Nakagawa: "Disambiguation of Lexical Translations Based on Bilingual Comparable Corpora" 2nd International Conference on Language Resources and Evaluation: LREC2000 Workshop of Terminology Resources and Computation: WTRC2000, Athens. 33-38 (2000)

    • Description
      From the Summary of Research Results Report (English)
  • [Publications] Hiroshi Nakagawa: "Experimental evaluation of ranking and selection methods in term extraction", in "Recent Advances in Computational Terminology", D. Bourigault, C. Jacquemin, M.-C. L'Homme (editors). John Benjamins. 303-325 (2001)

    • Description
      From the Summary of Research Results Report (English)

Published: 2003-09-17   Modified: 2021-12-08  
