2003 Fiscal Year Final Research Report Summary

The Use of Internet Corpus in Natural Language Processing

Research Project

Project/Area Number	14580411
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	The University of Electro-Communications
Principal Investigator	FURUGORI Teiji Computer Science, The Univ. of Electro-comm., Faculty of Electro-Communications, Professor (80114932)
Project Period (FY)	2002 – 2003
Keywords	Natural Language Processing / Information Extraction / Internet Corpus / Automatic Summarization / Machine Translation / Structural Analysis / Compound Word Analysis
Research Abstract	Textual material on the Internet, or the Internet corpus, is an important and valuable language resource for natural language processing. In this research, we have tried to use it in devising a method for analyzing compound words in Japanese, a writer's aid program for translating Japanese into English, and an automatic summarization system for newspaper articles on sassho-jiken. The approach we use in natural language processing is statistical, not linguistic-theoretical. This approach runs into a difficulty that requires a solution to the sparse data problem: whatever result we get will not be reliable if it is obtained from an insufficient amount of data. The data on the Internet are practically infinite, and our research has demonstrated that the Internet corpus can be used effectively in the areas we dealt with. At the same time, however, it has revealed a problem: the data on the Internet are not well formed, and a mechanism for eliminating "junk" data would be a necessary step for many language processing systems.

Research Products
(13 results)

All Publications (13 results)

[Publications] Dongli Han, Takeshi Ito, Teiji Furugori: "Structural Analysis of Compound Words in Japanese Using Semantic Dependency Relations" J. of Quantitative Linguistics. 9. 1-17 (2002)
- Description
  From "Final Research Report Summary (Japanese)"
[Publications] Teiji Furugori, Lin Rihua, Takeshi Ito, Dongli Han: "Information Extraction and Summarization for Newspaper Articles on Sassho-Jiken" IEICE Transactions. E86-D. 1728-1735 (2003)
- Description
  From "Final Research Report Summary (Japanese)"
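[Publications] 韓勅, 伊藤毅志, 古郡廷治: "Structural Analysis of Compound Words Based on Dependency Relations between Constituents" (in Japanese) Journal of the Institute of Electronics, Information and Communication Engineers. J86-DII. 706-714 (2003)
- Description
  From "Final Research Report Summary (Japanese)"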
[Publications] Dongli Han, Takeshi Ito, Teiji Furugori: "A deterministic method for structural analysis of compound words in Japanese" Proc. of 16th Pacific Asia Conf. on Language, Information and Computation. 79-91 (2002)
- Description
  From "Final Research Report Summary (Japanese)"
[Publications] Dongli Han, Takeshi Ito, Teiji Furugori: "Rewriting Japanese compound nouns into expressions usable effectively in machine translation system" Proc. of 2002 IEEE Int'l Conf. on System Man and Cybernetics. (2002)
- Description
  From "Final Research Report Summary (Japanese)"
[Publications] Sawa Takakura, Takeshi Ito, Teiji Furugori: "TransAid: a writer's aid system for translating Japanese into English" Proc. of 2002 IEEE Int'l Conf. on System Man and Cybernetics. (2002)
- Description
  From "Final Research Report Summary (Japanese)"
[Publications] Sawa Takakura, Dongli Han, Teiji Furugori: "An experiment for determining semantic relations between main and subordinate clauses in complex sentences" Proc. of Winter Int'l Symp. on Info and Comm Technology. (2004)
- Description
  From "Final Research Report Summary (Japanese)"
[Publications] Dongli Han, Takeshi Ito, Teiji Furugori: "A deterministic method for structural analysis of compound words in Japanese" Proc. of the 16th Pacific Asia Conference on Language, Information and Computation. 79-91 (2002)
- Description
  From "Final Research Report Summary (English)"
[Publications] Dongli Han, Takeshi Ito, Teiji Furugori: "Rewriting Japanese compound nouns into expressions usable effectively in machine translation system" Proc. of 2002 IEEE International Conference on System, Man and Cybernetics. (2002)
- Description
  From "Final Research Report Summary (English)"
[Publications] Sawa Takakura, Takeshi Ito, Teiji Furugori: "TransAid: a writer's aid system for translating Japanese into English" Proc. of 2002 IEEE International Conference on System, Man and Cybernetics. (2002)
- Description
  From "Final Research Report Summary (English)"
[Publications] Sawa Takakura, Dongli Han, Teiji Furugori: "An experiment for determining semantic relations between main and subordinate clauses in complex sentences" Proc. of Winter International Symposium on Information and Communication Technology. (2004)
- Description
  From "Final Research Report Summary (English)"
[Publications] Dongli Han, Takeshi Ito, Teiji Furugori: "Structural Analysis of Compound Words in Japanese Using Semantic Dependency Relations" Journal of Quantitative Linguistics. Vol.9, No.1. 1-17 (2002)
- Description
  From "Final Research Report Summary (English)"
[Publications] Teiji Furugori, Lin Rihua, Takeshi Ito, Dongli Han: "Information Extraction and Summarization for Newspaper Articles on Sassho-Jiken" IEICE Trans. Vol.E86-D, No.9. 1728-35 (2003)
- Description
  From "Final Research Report Summary (English)"
