Project/Area Number |
61880005
|
Research Category |
Grant-in-Aid for Developmental Scientific Research
|
Allocation Type | Single-year Grants |
Research Field |
Informatics
|
Research Institution | University of Tokyo |
Principal Investigator |
FUJISAKI Hiroya Faculty of Engineering, University of Tokyo, Professor (80010776)
|
Co-Investigator(Kenkyū-buntansha) |
KAMEDA Hiroyuki Faculty of Engineering, Tokyo Engineering University, Lecturer (00194994)
MIYAZAKI Koichi Asahi Shimbun Company, Tokyo Head Office, Director of the Production Bureau
KURASHIMA Tokihisa Dictionary Department, Sanseido Co., Ltd., Director of the Japanese Dictionary Editorial Office
TANAKA Yasuhito Management and Information Science Department, Himeji College, Associate Professor (00163585)
OGINE Tsunao Institute of Literature and Linguistics, University of Tsukuba, Associate Professor (00111443)
MIYAZAKI Koichi Production Department, Tokyo Main Office, Asahi Shinbun Publish Company
HIROSE Keikichi Faculty of Engineering, University of Tokyo, Associate Professor (50111472)
|
Project Period (FY) |
1986 – 1987
|
Project Status |
Completed (Fiscal Year 1987)
|
Budget Amount |
¥9,500,000 (Direct Cost: ¥9,500,000)
Fiscal Year 1987: ¥3,000,000 (Direct Cost: ¥3,000,000)
Fiscal Year 1986: ¥6,500,000 (Direct Cost: ¥6,500,000)
|
Keywords | Retrieval With Advanced Functions / Large Scale Japanese Text Database / Retrieval of Linguistic Usages / Morpheme Analysis / Automatic Assignment of Part-of-Speech Information / Morphological Analysis / Automatic Assignment of Reading Information |
Research Abstract |
The study on constructing a large-scale text database with advanced retrieval functions yielded the following outcomes. 1. Compilation of a word dictionary for linguistic analysis: The dictionary, which had already been generated from two computer-readable dictionaries, i.e., the Shinmeikai-Kokugojiten and the Nihongotango-Kikaijisho, was modified and expanded by adding proper nouns and other words that occur frequently in daily newspaper articles, so that it could be applied to the analysis of a large volume of newspaper articles. The number of lexical items in this dictionary reached about 200,000. Each item holds information on spelling, conjugation, and declension. 2. Algorithms for the analysis of morphemes and parts of speech, and their implementation on a computer: The relationships between parts of speech were exhaustively investigated. As a result, an 86 by 59 connection table of parts of speech was obtained, and the structure of bunsetsu was described as a transition network. Furthermore, algorithms for the analysis of morphemes and parts of speech, based on the grammatical knowledge mentioned above, were devised and implemented in FORTRAN77 on a large-scale computer at the University of Tokyo. 3. Preparation of newspaper data: Articles from 84 days of a Japanese daily newspaper, the Asahi Newspaper of 1982, were selected, processed, and stored on the large-scale computer as text data. 4. Construction of a database system with advanced retrieval functions: Based on these outcomes, a large-scale text database was constructed. It provides advanced retrieval functions in which a character, a sequence of characters, a word, a sequence of words, a part of speech, a sequence of parts of speech, and arbitrary combinations of these can be used as retrieval keys. The database management system was built in FORTRAN77 on the large-scale computer. As described above, the aims of this study have been fully accomplished.
|
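The Research Abstract describes two mechanisms that a small example may make more concrete: morpheme analysis constrained by a part-of-speech connection table, and retrieval with keys that combine words and parts of speech. The following is a minimal Python sketch under stated assumptions, not the project's FORTRAN77 implementation. The lexicon, the four-entry connection table (standing in for the 86 by 59 table), the romanized input, and the names `analyze`, `matches`, and `search` are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Token(NamedTuple):
    surface: str  # the word as written
    pos: str      # its part of speech


# Hypothetical toy lexicon: surface form -> possible parts of speech.
LEXICON = {
    "neko": ["noun"],
    "ga": ["particle"],
    "naku": ["verb"],
    "na": ["noun"],  # deliberately ambiguous prefix of "naku"
}

# Hypothetical connection table: pairs (left POS, right POS) that may be adjacent.
# "BOS" and "EOS" mark the beginning and end of the input.
CONNECT = {
    ("BOS", "noun"),
    ("noun", "particle"),
    ("particle", "verb"),
    ("verb", "EOS"),
}


def analyze(text, start=0, prev="BOS"):
    """Return every segmentation of text[start:] allowed by LEXICON and CONNECT."""
    if start == len(text):
        return [[]] if (prev, "EOS") in CONNECT else []
    results = []
    for end in range(start + 1, len(text) + 1):
        surface = text[start:end]
        for pos in LEXICON.get(surface, []):
            if (prev, pos) in CONNECT:  # reject ungrammatical adjacency early
                for rest in analyze(text, end, pos):
                    results.append([Token(surface, pos)] + rest)
    return results


def matches(token, key):
    """A key is a dict with optional 'surface' and 'pos'; absent fields are wildcards."""
    return all(getattr(token, field) == value for field, value in key.items())


def search(corpus, pattern):
    """Return (sentence index, token offset) for each run of tokens matching pattern."""
    hits = []
    for i, sentence in enumerate(corpus):
        for j in range(len(sentence) - len(pattern) + 1):
            if all(matches(sentence[j + k], key) for k, key in enumerate(pattern)):
                hits.append((i, j))
    return hits


# Usage: analyze one sentence, then retrieve "any noun followed by the word 'ga'".
corpus = [analyze("nekoganaku")[0]]
noun_then_ga = search(corpus, [{"pos": "noun"}, {"surface": "ga"}])
```

In this sketch the connection table prunes the candidate "na" as a noun after a particle, so only one segmentation survives. Each retrieval key constrains the word, the part of speech, or both, which is one simple way to express the arbitrary combinations of keys mentioned in the abstract. Character-level keys are omitted here; they could be handled as a substring search over the stored text.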