Project/Area Number |
09044179
|
Research Category |
Grant-in-Aid for International Scientific Research
|
Allocation Type | Single-year Grants |
Section | Joint Research |
Research Field |
Intelligent informatics
|
Research Institution | Hiroshima City University |
Principal Investigator |
REN Fuji Faculty of Information Sciences, Hiroshima City University, Associate Professor (20264947)
|
Co-Investigator(Kenkyū-buntansha) |
LUO Zhensheng Department of Chinese Language and Literature, Tsinghua University, Professor
DAVIS Mark CRL, New Mexico State University, Researcher
NIE Jianyun Informatique et Recherche opérationnelle, Université de Montréal, Associate Professor
CHEN Chunxiang Information Center, Hiroshima Prefectural University, Associate Professor (90264944)
KITAKAMI Hajime Faculty of Information Sciences, Hiroshima City University, Professor (50234240)
|
Project Period (FY) |
1997 – 1998
|
Keywords | Corpus / Chinese / Japanese / English / Machine Translation / Natural Language Processing / Information Retrieval / Knowledge Acquisition |
Research Abstract |
In this project, we studied a Japanese-Chinese-English translation corpus and its applications. (1) We proposed an algorithm for automatically aligning Chinese and Japanese parallel corpora. The algorithm first attempts to align the Chinese and Japanese sentences by measuring the differences between sentence lengths in the two documents. (2) We proposed a new concept called the "sensitive word", described the segmentation problem and the impact of sensitive words on natural language processing, and reported the results of our examination. (3) A crucial problem in rule-based machine translation is the acquisition of translation knowledge. Many earlier studies addressed automatic acquisition, but they require a large number of annotated examples. In this project, we developed a semi-automatic mechanism to support knowledge acquisition from translation examples in a Japanese-Chinese environment. (4) We proposed a relaxed segmentation process for Chinese that extracts not only the longest words, as is usually done, but also all the shorter words implied within them. Special rules are also designed to recognize and standardize special words such as proper names and nominal pre-determiners. Information retrieval (IR) based on this segmentation is shown to be more effective than IR based on bigrams. (5) We adopted a new hybrid multi-engine approach to machine translation, which can combine the advantages of previously proposed methods while avoiding their disadvantages. (6) We will publicize our "Translation corpora" and "Chinese dictionary" on the Web after we improve the multilingual environment.
|
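The relaxed segmentation idea in item (4) can be illustrated with a small sketch. The abstract gives no implementation details, so everything here is an assumption made for illustration: forward maximum matching stands in for the "longest word" step, the lexicon is a hypothetical toy set, and the function names are invented. The special rules for proper names and nominal pre-determiners are not modeled.

```python
def longest_match_segment(text, lexicon, max_len=4):
    """Forward maximum matching (assumed baseline): at each position,
    take the longest lexicon word; fall back to a single character."""
    words = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + n]
            if n == 1 or cand in lexicon:
                words.append(cand)
                i += n
                break
    return words


def implied_short_words(word, lexicon):
    """Lexicon words of length >= 2 that are proper substrings of `word`."""
    found = []
    for start in range(len(word)):
        for end in range(start + 2, len(word) + 1):
            sub = word[start:end]
            if sub != word and sub in lexicon:
                found.append(sub)
    return found


def relaxed_segment(text, lexicon, max_len=4):
    """Emit each longest word followed by the shorter words it contains."""
    terms = []
    for word in longest_match_segment(text, lexicon, max_len):
        terms.append(word)
        terms.extend(implied_short_words(word, lexicon))
    return terms


# Hypothetical toy lexicon, for illustration only.
TOY_LEXICON = {"清华大学", "清华", "大学", "学生"}
print(relaxed_segment("清华大学学生", TOY_LEXICON))
```

With this toy lexicon, the index terms include both the long word 清华大学 and the shorter words 清华 and 大学 inside it, so a query for 大学 alone can still match the document. That is presumably the kind of retrieval benefit the abstract refers to.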