Construction of Large Scale Japanese Text Database Based on Advanced Retrieval Method

Research Project

Project/Area Number	61880005
Research Category	Grant-in-Aid for Developmental Scientific Research
Allocation Type	Single-year Grants
Research Field	Informatics
Research Institution	University of Tokyo
Principal Investigator	FUJISAKI Hiroya, Faculty of Engineering, University of Tokyo, Professor (80010776)
Co-Investigator(Kenkyū-buntansha)	KAMEDA Hiroyuki, Faculty of Engineering, Tokyo Engineering University, Lecturer (00194994); KURASHIMA Tokihisa, Dictionary Department, Sanseido Co., Ltd., Director of the Japanese Dictionary Editorial Office; TANAKA Yasuhito, Management and Information Science Department, Himeji College, Associate Professor (00163585); OGINO Tsunao, Institute of Literature and Linguistics, University of Tsukuba, Associate Professor (00111443); MIYAZAKI Koichi, Production Department, Tokyo Main Office, Asahi Shinbun Publish Company, Director of the Production Bureau; HIROSE Keikichi, Faculty of Engineering, University of Tokyo, Associate Professor (50111472)
Project Period (FY)	1986 – 1987
Project Status	Completed (Fiscal Year 1987)
Budget Amount	¥9,500,000 (Direct Cost: ¥9,500,000) Fiscal Year 1987: ¥3,000,000 (Direct Cost: ¥3,000,000) Fiscal Year 1986: ¥6,500,000 (Direct Cost: ¥6,500,000)
Keywords	Retrieval With Advanced Functions / Large Scale Japanese Text Database / Retrieval of Linguistic Usages / Morpheme Analysis / Automatic Assignment of Part-of-Speech Information / Morphological Analysis / Automatic Assignment of Reading Information
Research Abstract	The study of the construction of a large-scale text database with advanced retrieval functions yielded the following outcomes. 1. Compilation of a word dictionary for linguistic analysis: the dictionary, which had already been generated from two computer-readable dictionaries, i.e., the Shinmeikai-Kokugojiten and the Nihongotango-Kikaijisho, was modified and expanded by adding proper nouns and other words that occur frequently in daily newspaper articles, so that it could be applied to the analysis of a large volume of newspaper articles. The number of lexical items in this dictionary reached about 200,000. Each item holds information on spellings, conjugations and declensions. 2. Algorithms for the analysis of morphemes and parts of speech and their implementation on a computer: the relationships between parts of speech were exhaustively investigated. As a result, an 86 by 59 connection table of parts of speech was obtained, and the structure of the bunsetsu was described as a transition network. Furthermore, algorithms for the analysis of morphemes and parts of speech, based on the grammatical knowledge mentioned above, were developed and implemented in FORTRAN77 on a large-scale computer at the University of Tokyo. 3. Preparation of newspaper data: articles from 84 days of a Japanese daily newspaper, the Asahi Newspaper, in 1982 were selected, processed and stored on the large-scale computer as text data. 4. Construction of a database system with advanced retrieval functions: based on these outcomes, a large-scale text database was constructed. It provides advanced retrieval functions in which a character, a sequence of characters, a word, a sequence of words, a part of speech, a sequence of parts of speech, and arbitrary combinations of these can be used as retrieval keys. The database management system was built in FORTRAN77 on the large-scale computer. As described above, the aims of this study have been fully accomplished.
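Items 2 and 4 of the abstract can be illustrated with a minimal Python sketch. It is not the project's FORTRAN77 implementation. The lexicon, the tag names, the connection pairs, and the names LEXICON, CONNECT, analyze and search are all hypothetical and chosen only for illustration; the project's actual connection table was 86 by 59 and its dictionary held about 200,000 items. The sketch segments a string by dictionary lookup, accepting a morpheme only when the connection table allows its part of speech to follow the previous one, and then searches the analyzed text with a key sequence that mixes words and parts of speech.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]   # (surface form, part-of-speech tag)
Key = Tuple[str, str]     # ("word", surface) or ("pos", tag)

# Hypothetical toy lexicon: surface form -> possible part-of-speech tags.
LEXICON = {
    "新聞": ["noun"],
    "を": ["particle"],
    "読": ["verb_stem"],
    "む": ["verb_ending"],
}

# Hypothetical toy connection table: allowed (left tag, right tag) pairs.
# "BOS" marks the beginning of the string.
CONNECT = {
    ("BOS", "noun"),
    ("noun", "particle"),
    ("particle", "noun"),
    ("particle", "verb_stem"),
    ("verb_stem", "verb_ending"),
}


def analyze(text: str, pos: int = 0, prev: str = "BOS") -> Optional[List[Token]]:
    """Return one segmentation allowed by CONNECT, or None if there is none."""
    if pos == len(text):
        return []
    # Try the longest dictionary match first, backtracking on failure.
    for end in range(len(text), pos, -1):
        surface = text[pos:end]
        for tag in LEXICON.get(surface, []):
            if (prev, tag) in CONNECT:
                rest = analyze(text, end, tag)
                if rest is not None:
                    return [(surface, tag)] + rest
    return None


def matches(key: Key, token: Token) -> bool:
    """A key matches a token by its surface form or by its tag."""
    kind, value = key
    surface, tag = token
    return value == (surface if kind == "word" else tag)


def search(tokens: List[Token], keys: List[Key]) -> List[int]:
    """Return start indices where consecutive tokens match the key sequence."""
    n = len(keys)
    return [
        i
        for i in range(len(tokens) - n + 1)
        if all(matches(k, t) for k, t in zip(keys, tokens[i:i + n]))
    ]


if __name__ == "__main__":
    tokens = analyze("新聞を読む")
    print(tokens)
    if tokens is not None:
        # Any noun immediately followed by the word "を".
        print(search(tokens, [("pos", "noun"), ("word", "を")]))
```

Representing each retrieval key as a (kind, value) pair is one simple way to let words and parts of speech be combined freely in a single query; character-level keys could be handled the same way over the raw text.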

Report

(2 results)

1987 Final Research Report Summary
1986 Annual Research Report

Research Products
(18 results)


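[Publications] Hiroya Fujisaki: Proceedings of the 33rd National Convention of the Information Processing Society of Japan. 1831-1832 (1986)
- Description
  From "Summary of Research Results Report (Japanese)"
- Related Report
  1987 Final Research Report Summary
[Publications] Hiroya Fujisaki: Proceedings of the 35th National Convention of the Information Processing Society of Japan. 1269-1270 (1987)
- Description
  From "Summary of Research Results Report (Japanese)"
- Related Report
  1987 Final Research Report Summary
[Publications] Hiroya Fujisaki: Proceedings of the 36th National Convention of the Information Processing Society of Japan. (1988)
- Description
  From "Summary of Research Results Report (Japanese)"
- Related Report
  1987 Final Research Report Summary
[Publications] Tsunao Ogino: Keiryo Kokugogaku. 16. 81-87 (1987)
- Description
  From "Summary of Research Results Report (Japanese)"
- Related Report
  1987 Final Research Report Summary
[Publications] Yasuhito Tanaka: Proceedings of the 35th National Convention of the Information Processing Society of Japan. 1211-1212 (1987)
- Description
  From "Summary of Research Results Report (Japanese)"
- Related Report
  1987 Final Research Report Summary
[Publications] Hiroyuki Kameda: Transactions of Information Processing Society of Japan. 28. 1103-1111 (1987)
- Description
  From "Summary of Research Results Report (Japanese)"
- Related Report
  1987 Final Research Report Summary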
[Publications] Hiroya Fujisaki: "Organization of Large Scale Japanese Text Database with Advanced Functions" Reports of the 33rd Meeting of Information Processing Society of Japan. 1831-1832 (1986)
- Description
  From "Summary of Research Results Report (English)"
- Related Report
  1987 Final Research Report Summary
[Publications] Hiroya Fujisaki: "Lexical Category Analysis for a Large-Scale Japanese Text Database with Advanced Functions" Reports of the 35th Meeting of Information Processing Society of Japan. 1269-1270 (1987)
- Description
  From "Summary of Research Results Report (English)"
- Related Report
  1987 Final Research Report Summary
[Publications] Hiroya Fujisaki: "Morphemic and Syntactic Analysis for Constructing a Text Database with Advanced Functions" Reports of the 36th Meeting of Information Processing Society of Japan. (1988)
- Description
  From "Summary of Research Results Report (English)"
- Related Report
  1987 Final Research Report Summary
[Publications] Tsunao Ogino: "Methodology to Evaluate the Performance of Kana-Kanji Conversion Systems" Computational Linguistics. 16. 81-87 (1987)
- Description
  From "Summary of Research Results Report (English)"
- Related Report
  1987 Final Research Report Summary
[Publications] Yasuhito Tanaka: "Acquisition of Knowledge Data by Analyzing Natural Language" Reports of the 35th Meeting of Information Processing Society of Japan. 1211-1212 (1987)
- Description
  From "Summary of Research Results Report (English)"
- Related Report
  1987 Final Research Report Summary
[Publications] Hiroyuki Kameda: "Classification and Retrieval System for Newspaper Information Based on a Theme - Key Concept - Key Word Hierarchy" Transactions of Information Processing Society of Japan. 28. 1103-1111 (1987)
- Description
  From "Summary of Research Results Report (English)"
- Related Report
  1987 Final Research Report Summary
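[Publications] Hiroyuki Kameda: Proceedings of the 33rd National Convention of the Information Processing Society of Japan. 1831-1832 (1986)
- Related Report
  1986 Annual Research Report
[Publications] Hiroyuki Kameda: Proceedings of the 33rd National Convention of the Information Processing Society of Japan. 1833-1834 (1986)
- Related Report
  1986 Annual Research Report
[Publications] Tsunao Ogino: Mai Wapuro (My Word Processor). (1987)
- Related Report
  1986 Annual Research Report
[Publications] Tsunao Ogino: Materials of the 93rd Meeting of the Linguistic Society of Japan. 54 (1986)
- Related Report
  1986 Annual Research Report
[Publications] Yasuhito Tanaka: Proceedings of the 34th National Convention of the Information Processing Society of Japan. (1987)
- Related Report
  1986 Annual Research Report
[Publications] Yasuhito Tanaka: Materials of the Natural Language Research Group, Information Processing Society of Japan. (1987)
- Related Report
  1986 Annual Research Report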
