2001 Fiscal Year Final Research Report Summary

A Study about automatic domain term extraction from corpus

Research Project

Project/Area Number	12680368
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	The University of Tokyo
Principal Investigator	NAKAGAWA Hiroshi Information Technology Center, The University of Tokyo, Professor (20134893)
Co-Investigator(Kenkyū-buntansha)	TANAKA Kumiko (ISHII Kumiko) Interfaculty Information Initiative, The University of Tokyo, Lecturer (10323528)
Project Period (FY)	2000 – 2001
Keywords	Term Extraction / Information Extraction / domain term / Corpus / Translation / NTCIR / Natural Language Processing / Information Retrieval
Research Abstract	We mainly addressed automatic term extraction methods that extract domain-specific terms from the domain corpora distributed by the NTCIR1 TMREC task group. Most existing work on automatic term extraction relies on corpus statistics such as frequency, and little of it has focused on the characteristics of the space formed by the extracted terms. This work focuses on the latter. We propose a method that uses the statistical relations between compound nouns, which make up as much as 85% of all terms, and simple nouns, which make up the remaining 15%. For instance, given many compound nouns such as "human information system" and "social information system", the importance of "information" is defined by how many kinds of nouns adjoin it or are adjoined by it. The importance of a compound noun is then defined as the geometric mean of the importance values of its component nouns. Our system consists of (1) morphological analysis, (2) candidate term extraction, (3) assignment of an importance value to each candidate term, and (4) evaluation with the NTCIR1 TMREC test collection. The proposed method achieved a high score among the methods participating in NTCIR1. We have also adapted the method to English in preparation for investigating translation extraction in the following year.
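The importance scoring described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: candidate terms are assumed to be already segmented into lists of simple nouns (steps (1) and (2) are not shown), the numbers of distinct left and right neighbours of a noun are assumed to be combined as (L + 1) * (R + 1) so that a noun with no neighbours still scores above zero, and the function names adjacency_counts, noun_score, compound_score and rank_terms are hypothetical.

```python
from collections import defaultdict
from math import prod


def adjacency_counts(compounds):
    """Collect, for each simple noun, the distinct nouns seen directly to its left and right."""
    left = defaultdict(set)
    right = defaultdict(set)
    for compound in compounds:
        for i, noun in enumerate(compound):
            if i > 0:
                left[noun].add(compound[i - 1])
            if i + 1 < len(compound):
                right[noun].add(compound[i + 1])
    return left, right


def noun_score(noun, left, right):
    """Importance of a simple noun from how many kinds of nouns adjoin it.

    Assumption (not stated in the abstract): left and right counts are
    combined as (L + 1) * (R + 1); the +1 keeps isolated nouns above zero.
    """
    return (len(left.get(noun, ())) + 1) * (len(right.get(noun, ())) + 1)


def compound_score(compound, left, right):
    """Importance of a term as the geometric mean of its component noun scores."""
    if not compound:
        return 0.0
    scores = [noun_score(noun, left, right) for noun in compound]
    return prod(scores) ** (1.0 / len(scores))


def rank_terms(candidates):
    """Return (term, score) pairs sorted by descending score, ties broken alphabetically."""
    left, right = adjacency_counts(candidates)
    scored = {tuple(c): compound_score(c, left, right) for c in candidates}
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical toy candidates echoing the example in the abstract
    candidates = [
        ["human", "information", "system"],
        ["social", "information", "system"],
        ["information", "retrieval"],
        ["information"],
    ]
    for term, score in rank_terms(candidates):
        print(f"{' '.join(term)}: {score:.3f}")
```

On this toy input, "information" receives the highest score (9.0) because it has two distinct left neighbours ("human", "social") and two distinct right neighbours ("system", "retrieval"). The sketch uses only the adjacency statistics described in the abstract; any further weighting used in the actual system is not shown.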

Research Products

[Publications] Hiroshi Nakagawa: "Automatic Term Recognition based on Statistics of Compound Nouns", Terminology, Vol. 6, No. 2, 195-210 (2000)
- Description
  From the Research Report Summary (Japanese version)
[Publications] Tatsunori Mori, Mamoru Matsuo, Hiroshi Nakagawa: "Zero pronoun resolution by Linguistic Constraints and Defaults: The case of Japanese Instruction Manuals", Special Issue on Anaphora Resolution in Machine Translation (Ruslan Mitkov, editor), Machine Translation, Vol. 14, 231-245 (2000)
- Description
  From the Research Report Summary (Japanese version)
[Publications] Hiroshi Nakagawa: "Disambiguation of Lexical Translations Based on Bilingual Comparable Corpora", 2nd International Conference on Language Resources and Evaluation (LREC2000), Workshop of Terminology Resources and Computation (WTRC2000), Athens, 33-38 (2000)
- Description
  From the Research Report Summary (Japanese version)
[Publications] Hiroshi Nakagawa: "Experimental evaluation of ranking and selection methods in term extraction", in D. Bourigault, C. Jacquemin, M.-C. L'Homme (editors), Recent Advances in Computational Terminology, John Benjamins, 303-325 (2001)
- Description
  From the Research Report Summary (Japanese version)