
A Study about automatic domain term extraction from corpus

Research Project

Project/Area Number 12680368
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Intelligent informatics
Research Institution The University of Tokyo

Principal Investigator

NAKAGAWA Hiroshi  Information Technology Center, The University of Tokyo, Professor (20134893)

Co-Investigator (Kenkyū-buntansha) TANAKA Kumiko (ISHII Kumiko)  Interfaculty Information Initiative, The University of Tokyo, Lecturer (10323528)
Project Period (FY) 2000 – 2001
Project Status Completed (Fiscal Year 2001)
Budget Amount
¥3,600,000 (Direct Cost: ¥3,600,000)
Fiscal Year 2001: ¥1,500,000 (Direct Cost: ¥1,500,000)
Fiscal Year 2000: ¥2,100,000 (Direct Cost: ¥2,100,000)
Keywords Term Extraction / Information Extraction / domain term / Corpus / Translation / NTCIR / Natural Language Processing / Information Retrieval / Automatic Term Extraction / Bilingual Dictionary / Index Term
Research Abstract

We mainly worked on automatic term extraction methods that extract domain-specific terms from the domain corpora distributed by the NTCIR1 TMREC task group. Most previous work on automatic term extraction relies on statistics such as frequency in corpora, and few studies focus on the characteristics of the space formed by the extracted terms. This work focuses mainly on the latter. We propose a method that uses the statistical relation between compound nouns, which account for up to 85% of all terms, and simple nouns, which account for the remaining 15%. For instance, given many compound nouns such as "human information system", "social information system" and so on, the importance of "information" is defined as the number of distinct nouns that adjoin or are adjoined by "information". The importance of a compound noun is then defined as the geometric mean of the importance of its component nouns. Our system consists of (1) morphological analysis, (2) extraction of candidate terms, (3) assignment of an importance value to each candidate term and (4) evaluation with the NTCIR1 TMREC test collection. The proposed method achieved a high score among the methods participating in NTCIR1. We also adapted our method to English so that translation extraction can be investigated in the following year.
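
The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the function names `noun_importance` and `compound_importance`, the toy candidate list, and the add-one smoothing (used so that a noun with no neighbours does not force the geometric mean to zero) are all assumptions made for illustration. Candidate terms are assumed to be already segmented into nouns, as a morphological analyser would produce.

```python
from collections import defaultdict
from math import prod


def noun_importance(compounds):
    """Score each noun by how many distinct nouns adjoin it.

    Assumption: importance = 1 + (distinct left neighbours)
    + (distinct right neighbours); the leading 1 is smoothing.
    """
    left = defaultdict(set)   # noun -> distinct nouns seen directly before it
    right = defaultdict(set)  # noun -> distinct nouns seen directly after it
    for compound in compounds:
        for prev, nxt in zip(compound, compound[1:]):
            right[prev].add(nxt)
            left[nxt].add(prev)
    nouns = {noun for compound in compounds for noun in compound}
    return {n: 1 + len(left[n]) + len(right[n]) for n in nouns}


def compound_importance(compound, importance):
    """Geometric mean of the importance of the component nouns."""
    scores = [importance.get(noun, 1) for noun in compound]
    return prod(scores) ** (1 / len(scores))


# Hypothetical candidate terms, already split into component nouns.
candidates = [
    ("human", "information", "system"),
    ("social", "information", "system"),
    ("information", "retrieval"),
]
importance = noun_importance(candidates)
ranked = sorted(
    candidates,
    key=lambda c: compound_importance(c, importance),
    reverse=True,
)
```

In this toy data "information" has two distinct left neighbours and two distinct right neighbours, so it scores higher than any other noun, and compounds containing it rank accordingly. Using the geometric mean rather than a sum keeps longer compounds from scoring higher simply because they contain more nouns.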

Report

(3 results)
  • 2001 Annual Research Report   Final Research Report Summary
  • 2000 Annual Research Report
  • Research Products

    (22 results)


All Publications (22 results)

  • [Publications] Hiroshi Nakagawa: "Automatic Term Recognition based on Statistics of Compound Nouns" Terminology. Vol.6 No.2. 195-210 (2000)

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      2001 Final Research Report Summary
  • [Publications] Tatsunori Mori, Mamoru Matsuo, Hiroshi Nakagawa: "Zero pronoun resolution by Linguistic Constraints and Defaults --The case of Japanese Instruction Manuals--" SPECIAL ISSUE ON ANAPHORA RESOLUTION IN MACHINE TRANSLATION, (Ruslan Mitkov editor), Machine Translation. 14. 231-245 (2000)

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      2001 Final Research Report Summary
  • [Publications] Hiroshi Nakagawa: "Disambiguation of Lexical Translations Based on Bilingual Comparable Corpora" 2nd International Conference on Language Resources and Evaluation: LREC2000 Workshop of Terminology Resources and Computation: WTRC2000. 33-38 (2000)

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      2001 Final Research Report Summary
  • [Publications] Hiroshi Nakagawa: "Experimental evaluation of ranking and selection methods in term extraction" in "Recent Advances in Computational Terminology", D. Bourigault, C. Jacquemin, M.-C. L'Homme (editors). John Benjamins. 303-325 (2001)

    • Description
      From the "Summary of Research Results Report (Japanese)"
    • Related Report
      2001 Final Research Report Summary
  • [Publications] Hiroshi Nakagawa: "Automatic Term Recognition based on Statistics of Compound Nouns" Terminology. Vol. 6, No. 2. 195-210 (2000)

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2001 Final Research Report Summary
  • [Publications] Tatsunori Mori, Mamoru Matsuo, Hiroshi Nakagawa: "Zero pronoun resolution by Linguistic Constraints and Defaults The case of Japanese Instruction Manuals" SPECIAL ISSUE ON ANAPHORA RESOLUTION IN MACHINE TRANSLATION, (Ruslan Mitkov editor), Machine Translation. Vol. 14. 231-245 (2000)

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2001 Final Research Report Summary
  • [Publications] Hiroshi Nakagawa: "Disambiguation of Lexical Translations Based on Bilingual Comparable Corpora" 2nd International Conference on Language Resources and Evaluation: LREC2000 Workshop of Terminology Resources and Computation: WTRC2000, Athens. 33-38 (2000)

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2001 Final Research Report Summary
  • [Publications] Hiroshi Nakagawa: "Experimental evaluation of ranking and selection methods in term extraction" in "Recent Advances in Computational Terminology", D. Bourigault, C. Jacquemin, M.-C. L'Homme (editors). John Benjamins. 303-325 (2001)

    • Description
      From the "Summary of Research Results Report (English)"
    • Related Report
      2001 Final Research Report Summary
  • [Publications] 大畑 博一, Hiroshi Nakagawa: "Technical Term Extraction Based on the Number of Distinct Adjoining Words" IPSJ SIG Technical Report. NL-136. 199-126 (2000)

    • Related Report
      2001 Annual Research Report
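  • [Publications] 鈴木正史, Hiroshi Nakagawa: "Disambiguation of Compound Word Translations from Bilingual Corpora" 7th Annual Meeting of the Association for Natural Language Processing. 66-69 (2001)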

    • Related Report
      2001 Annual Research Report
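  • [Publications] 湯本 紘彰, 大畑 博一, Tatsunori Mori, Hiroshi Nakagawa: "Technical Term Extraction Using Adjacency Information of Word Bases" 7th Annual Meeting of the Association for Natural Language Processing. 161-164 (2001)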

    • Related Report
      2001 Annual Research Report
  • [Publications] Hiroshi Nakagawa: "Disambiguation of Lexical Translations Based on Bilingual Comparable Corpora" 2nd International Conference on Language Resources and Evaluation: LREC2000 Workshop of Terminology Resources and Computation: WTRC2000. 33-38 (2000)

    • Related Report
      2001 Annual Research Report
  • [Publications] Hiroshi Nakagawa: "Automatic Term Recognition based on Statistics of Compound Nouns" Terminology. Vol.6 No.2. 195-210 (2000)

    • Related Report
      2001 Annual Research Report
  • [Publications] Hiroshi Nakagawa: "Disambiguation of Compound Noun Translations Extracted from Bilingual Comparable Corpora" 6th Natural Language Processing Pacific Rim Symposium (NLPRS'01). 67-74 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] Hiroshi Nakagawa: "Experimental evaluation of ranking and selection methods in term extraction" Recent Advances in Computational Terminology, D. Bourigault, C. Jacquemin, M.-C. L'Homme (editors). John Benjamins. 23 (2001)

    • Related Report
      2001 Annual Research Report
  • [Publications] Hiroshi Nakagawa, 木村浩康, 三瓶光司, 松本勉: "Information Hiding in Japanese Text Based on a Dictionary Conversion Method" IPSJ Journal. Vol. 41, No. 8. 2272-2280 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] Tatsunori Mori, Mamoru Matsuo, Hiroshi Nakagawa: "Zero pronoun resolution by Linguistic Constraints and Defaults -The Case of Japanese Instruction Manual-" The Machine Translation Journal. 14-2-3. (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] Hiroshi Nakagawa: "Disambiguation of Lexical Translations Based on Bilingual Comparable Corpora" LREC2000 Workshop of Terminology Resources and Computation: WTRC2000. 33-38 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] Hiroshi Nakagawa: "Automatic Term Recognition based on Statistics of Compound Nouns" Terminology. Vol. 6 (To be published). (2001)

    • Related Report
      2000 Annual Research Report
  • [Publications] Kumiko Tanaka-Ishii, Ian Frank, Katsuto Arai: "Trying to Understand RoboCup" Artificial Intelligence Magazine. 21-Winter. 19-25 (2000)

    • Related Report
      2000 Annual Research Report
  • [Publications] Hiroshi Nakagawa: "Experimental evaluation of ranking and selection methods in term extraction" Recent Advances in Computational Terminology. 303-325 (2001)

    • Related Report
      2000 Annual Research Report
  • [Publications] Kumiko Tanaka-Ishii, Ian Frank: "Multi-Agent Explanation Strategies in Real-Time Domains" 2000 Annual Meeting for Association of Computational Linguistics. 158-165 (2000)

    • Related Report
      2000 Annual Research Report


Published: 2000-04-01   Modified: 2021-12-08  
