Development of an Integrated Tool System for Technical Term Auto-Extraction and Knowledge Acquisition from Corpora

Research Project

Project/Area Number	08558027
Research Category	Grant-in-Aid for Scientific Research (A)
Allocation Type	Single-year Grants
Section	Developmental Research
Research Field	Intelligent informatics
Research Institution	The University of Tokyo
Principal Investigator	TSUJII Junichi, The University of Tokyo, Graduate School of Science, Professor (20026313)
Co-Investigator (Kenkyū-buntansha)	IKEHARA Satoru, Tottori University, Faculty of Engineering, Professor (70283968); KAGEURA Kyo, National Center for Science Information Systems, Associate Professor (00211152); KOYAMA Teruo, National Center for Science Information Systems, Professor (80124410); KIYONO Masaki, Matsushita Electric Industrial Co., Tokyo Research Institute, Researcher
Project Period (FY)	1996 – 1998
Project Status	Completed (Fiscal Year 1998)
Budget Amount	¥13,200,000 (Direct Cost: ¥13,200,000); Fiscal Year 1998: ¥2,600,000 (Direct Cost: ¥2,600,000); Fiscal Year 1997: ¥3,200,000 (Direct Cost: ¥3,200,000); Fiscal Year 1996: ¥7,400,000 (Direct Cost: ¥7,400,000)
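Keywords	knowledge acquisition / semantic classification / database / technical term extraction / technical terms / ontology / dependency parsing / distribution model / corpus / automatic extraction / symbolic processing program / statistical language processing / terminology / knowledge representation / information retrieval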
Research Abstract	The goal of this project was to provide systems that acquire knowledge about terminology from texts in a semi-automatic manner. To accomplish this goal, we developed the following three systems. 1. Central database for terminology: We created a terminology database system by integrating the text/lexicon database developed by EDR with the programming language LiLFeS, which was developed at the University of Tokyo for easy and flexible treatment of linguistic entities. With this system, we can systematically maintain the knowledge acquired by the following two systems. 2. Systems for term recognition: The research group at NACSIS introduced a statistical metric for identifying technical terminology in texts and built programs that recognize terms using this metric. The group at the University of Tokyo approached the same problem from a different perspective and provided a term recognition method based on character n-grams. These programs are integrated so that they work as a front end to the database system described in 1. 3. Systems for acquiring ontological knowledge about terms: The research group at the University of Tokyo developed programs that obtain semantic classifications of words from surface clues appearing in texts. The Matsushita research group developed a similar technique that uses deeper syntactic structures of texts. These systems were applied to genome-related documents, news articles about stock markets, and other texts.
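
To make the character n-gram idea in item 2 more concrete, here is a minimal sketch. It is not the project's actual method, which this record does not describe in detail. It assumes that adjacent characters are scored by pointwise mutual information (PMI) and that a character string is split wherever the score falls below a threshold. The function names `bigram_pmi` and `segment`, the threshold value, and the toy corpus are all hypothetical choices made for this illustration.

```python
import math
from collections import Counter


def bigram_pmi(corpus):
    """Build a scorer giving the PMI of two adjacent characters.

    Assumption for this sketch: PMI over character bigrams stands in for
    whatever association measure a real system would use.
    """
    chars = Counter()
    pairs = Counter()
    for text in corpus:
        chars.update(text)                  # single-character counts
        pairs.update(zip(text, text[1:]))   # adjacent-character pair counts
    n_chars = sum(chars.values())
    n_pairs = sum(pairs.values())

    def score(a, b):
        if pairs[(a, b)] == 0:
            return float("-inf")  # unseen pair: weakest possible association
        p_ab = pairs[(a, b)] / n_pairs
        p_a = chars[a] / n_chars
        p_b = chars[b] / n_chars
        return math.log2(p_ab / (p_a * p_b))

    return score


def segment(text, score, threshold):
    """Split text between characters whose score is below the threshold."""
    if not text:
        return []
    units = [text[0]]
    for a, b in zip(text, text[1:]):
        if score(a, b) >= threshold:
            units[-1] += b      # strong association: extend the current unit
        else:
            units.append(b)     # weak association: start a new unit
    return units


# Toy corpus (hypothetical): two two-character words, each seen twice.
corpus = ["情報", "情報", "検索", "検索"]
score = bigram_pmi(corpus)
print(segment("情報検索", score, threshold=0.0))  # prints ['情報', '検索']
```

In this toy setting the pair 報/検 never occurs in the corpus, so the compound is split at that boundary. A realistic system would need a large corpus, smoothing for unseen pairs, and a tuned threshold.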

Report

(4 results)

1998 Annual Research Report
1998 Final Research Report Summary
1997 Annual Research Report
1996 Annual Research Report

Research Products
(29 results)

All Publications (29 results)

[Publications] T.Koyama: "Research on Natural Law Database" Proc.JCKBSE'96. 242-245 (1996)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  1998 Final Research Report Summary
[Publications] K.Kageura: "Some Statistical Characterizations of Terminological and Non-Terminological Elements: Evaluation and Examination in Japanese Technical Abstracts" TKE'96. 131-138 (1996)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  1998 Final Research Report Summary
[Publications] J.Tsujii: "Analysis of Word Structure of Medical Synonyms" TKE'96. 190-196 (1996)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  1998 Final Research Report Summary
[Publications] K.Kageura: "A Statistical Analysis of Morphemes in Japanese Terminology" COLING-ACL'98. 638-645 (1998)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  1998 Final Research Report Summary
[Publications] T.Makino,K.Torisawa,J.Tsujii: "LiLFeS-Practical Programming Language for Typed Feature Structures" Proc.NLPRS'97. 239-244 (1997)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  1998 Final Research Report Summary
[Publications] T.Sekimizu,H.S.Park,J.Tsujii: "Identifying the Interaction between Genes and Gene Products Based on Frequently Seen Verbs in Medline Abstracts" Genome Informatics. 9. 62-71 (1998)
- Description
  From the Final Research Report Summary (Japanese)
- Related Report
  1998 Final Research Report Summary
[Publications] Teruo Koyama: "Research on Natural Law Database" Proceedings of JCKBSE'96. 242-245 (1996)
- Description
  From the Final Research Report Summary (English)
- Related Report
  1998 Final Research Report Summary
[Publications] Kyo Kageura: "Some Statistical Characterizations of Terminological and Non-Terminological Elements: Evaluation and Examination in Japanese Technical Abstracts" Proceedings of TKE'96. 131-138 (1996)
- Description
  From the Final Research Report Summary (English)
- Related Report
  1998 Final Research Report Summary
[Publications] Junichi Tsujii: "Analysis of Word Structure of Medical Synonyms" Proceedings of TKE'96. 190-196 (1996)
- Description
  From the Final Research Report Summary (English)
- Related Report
  1998 Final Research Report Summary
[Publications] Kyo Kageura: "A Statistical Analysis of Morphemes in Japanese Terminology" Proceedings of COLING'98. 638-645 (1998)
- Description
  From the Final Research Report Summary (English)
- Related Report
  1998 Final Research Report Summary
[Publications] Takaki Makino, Kentaro Torisawa, Junichi Tsujii: "LiLFeS-Practical Programming Language for Typed Feature Structures" Proceedings of NLPRS'97. 239-244 (1997)
- Description
  From the Final Research Report Summary (English)
- Related Report
  1998 Final Research Report Summary
[Publications] Tsuyoshi Sekimizu, H. S. Park, Junichi Tsujii: "Identifying the Interaction between Genes and Gene Products Based on Frequently Seen Verbs in Medline Abstracts" Proceedings of Genome Informatics. Vol.9. 62-71 (1998)
- Description
  From the Final Research Report Summary (English)
- Related Report
  1998 Final Research Report Summary
[Publications] Kageura,K.: "A Statistical Analysis of Morphemes in Japanese Terminology" COLING-ACL'98. 638-645 (1998)
- Related Report
  1998 Annual Research Report
[Publications] Kageura,K.: "Some Characteristics of Bibliometric Samples" Annals of Japan Society of Library Science. 443. 97-110 (1999)
- Related Report
  1998 Annual Research Report
[Publications] T.Sekimizu,H.S.Park,J.Tsujii: "Identifying the Interaction between Genes and Gene Products Based on Frequently Seen Verbs in Medline Abstracts" Genome Informatics. 9. 62-71 (1998)
- Related Report
  1998 Annual Research Report
[Publications] T.Hishiki,C.Nigel,C.Nobata,T.Ohta,N.Ogata,T.Sekimizu,R.Stener: "Developing NLP tools for Genome Informatics : An Information Extraction perspective" Genome Informatics.
- Related Report
  1998 Annual Research Report
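[Publications] Nakai, Ikehara, Shirai: "Automatic Generation of Semantic Dependency Rules between Noun Phrases for 'no'-Type Noun Phrases" (in Japanese) IEICE Technical Report, NLC Workshop. NLC98-3. 15-22 (1998)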
- Related Report
  1998 Annual Research Report
[Publications] Norihiro Ogata: "Dynamic Construction of Taxonomy and Mereology from Domain-Specific Texts Based on Type Theory" (in Japanese) IPSJ SIG Technical Report. 98-NL-127. 133-140 (1998)
- Related Report
  1998 Annual Research Report
[Publications] J.Tsujii et al: "Towards a Sublanguage-Based Semantic Clustering Algorithm" Recent Advances in Natural Language Processing. 377-392 (1997)
- Related Report
  1997 Annual Research Report
[Publications] Norihiro Ogata: "Dynamic Constructive Thesaurus" Proceedings of the 5th National Language Research Institute International Symposium, Session 1. 182-189 (1997)
- Related Report
  1997 Annual Research Report
[Publications] T.Makino,K.Torisawa,J.Tsujii: "LiLFeS-Practical Programming Language for Typed Feature Structures" Proc.NLPRS'97. 239-244 (1997)
- Related Report
  1997 Annual Research Report
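[Publications] Norihiro Ogata, Ruriko Takahashi: "Towards Extraction of Causal Relations in Texts Based on Formal Discourse Theory" (in Japanese) JSAI SIG on Spoken Language Understanding and Dialogue Processing. SIG SLUD 9703. 13-20 (1998)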
- Related Report
  1997 Annual Research Report
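[Publications] Junichi Tsujii: "A Change of Viewpoint: From Theories of Language to Theories of Design" (in Japanese) Journal of the Japanese Society for Artificial Intelligence. 11・4. 530-541 (1996)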
- Related Report
  1996 Annual Research Report
[Publications] Teruo Koyama: "An Attempt at Keyword Estimation by Comparing Multiple Papers" (in Japanese) Proceedings of the 4th Research Meeting, 情報知能学会. 43-46 (1996)
- Related Report
  1996 Annual Research Report
[Publications] T.Koyama: "Research on Natural Law Database" Proc.JCKBSE'96. 242-245 (1996)
- Related Report
  1996 Annual Research Report
[Publications] K.Kageura: "Some Statistical Characterizations of Terminological and Non-Terminological Elements:Evaluation and Examination in Japanese Technical Abstracts" TKE'96. 131-138 (1996)
- Related Report
  1996 Annual Research Report
[Publications] K.Tsuji: "Analysis of Word Structure of Medical Synonyms" TKE'96. 190-196 (1996)
- Related Report
  1996 Annual Research Report
[Publications] K.Kageura: "Methods of Automatic Term Recognition-A Review" Terminology. 3・2(to appear).
- Related Report
  1996 Annual Research Report
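[Publications] Kyo Kageura: "A Method for Segmenting Compound Kanji Strings Based on a Character-Level Bigram Measure" (in Japanese) 3rd Annual Meeting of the Association for Natural Language Processing. (to be presented).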
- Related Report
  1996 Annual Research Report
