Project/Area Number |
11480088
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Information Systems (including Information and Library Science)
|
Research Institution | NARA INSTITUTE OF SCIENCE AND TECHNOLOGY |
Principal Investigator |
UEMURA Shunsuke Nara Institute of Science and Technology, Graduate School of Information Science, Professor (00203480)
|
Co-Investigator(Kenkyū-buntansha) |
HATANO Kenji Nara Institute of Science and Technology, Graduate School of Information Science, Assistant Professor (80314532)
AMAGASA Toshiyuki Nara Institute of Science and Technology, Graduate School of Information Science, Assistant Professor (70314531)
YOSHIKAWA Masatoshi Nara Institute of Science and Technology, Graduate School of Information Science, Associate Professor (30182736)
WATANABE Masahiro The National Institute of Special Education Center for Policy Research, International Collaboration and Special Education Information Services, Researcher (80321595)
MAEDA Akira Ritsumeikan University, Department of Computer Science, Associate Professor (20351322)
ISHIKAWA Masatoshi University of Shimane, Faculty of Policy Studies, Assistant Professor (90332973)
|
Project Period (FY) |
1999 – 2002
|
Project Status |
Completed (Fiscal Year 2002)
|
Budget Amount |
¥14,800,000 (Direct Cost: ¥14,800,000)
Fiscal Year 2002: ¥1,800,000 (Direct Cost: ¥1,800,000)
Fiscal Year 2001: ¥4,500,000 (Direct Cost: ¥4,500,000)
Fiscal Year 2000: ¥5,000,000 (Direct Cost: ¥5,000,000)
Fiscal Year 1999: ¥3,500,000 (Direct Cost: ¥3,500,000)
|
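Keywords | cross-lingual information retrieval / query term disambiguation / parallel corpus / WWW / CLIR / XML database / cross-language retrieval / multilingual browser / relevance feedback / query expansion / multilingual / knowledge / discovery / database / multilingual processing / information retrieval / monolingual corpus / character encoding / mutual information |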
Research Abstract |
With the growth of the Internet and the WWW in recent years, documents are being published in many languages. Although 80% of current Web pages are written in English, it is estimated that more than half of Web documents will be non-English in 2003. The WWW can therefore be regarded as a huge document database containing a mixture of documents written in various languages. However, many problems remain to be solved before a retrieval system can handle such multilingual documents in a unified way, e.g. the diversity of document coding systems used in Web pages, the language barrier a non-native user faces when formulating a query, and the limitations on inputting query strings and displaying search results. In this research project we studied key technologies for realizing cross-language information retrieval (CLIR) that supports conversion of cultural factors. Existing CLIR approaches require a parallel corpus or a comparable corpus to disambiguate translated query terms, but such corpora are not readily available. Furthermore, bilingual dictionaries may not be readily available for a particular language pair (e.g. one involving a minor language). Our approach therefore focuses on a method that depends as little as possible on available language resources. To disambiguate translated query terms, we use co-occurrence statistics of word pairs in the target-language corpus. The advantage of this approach is that it does not require rarely available language resources such as a parallel corpus or a comparable corpus.
|
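The disambiguation idea in the abstract, choosing among translation candidates by how often they co-occur in a target-language corpus, can be illustrated with a minimal sketch. This is not the project's implementation: it assumes pointwise mutual information (suggested only by the "mutual information" keyword) computed over document-level co-occurrence, and the function names, the toy corpus, and the candidate lists are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations, product


def cooccurrence_counts(documents):
    """Count, per word and per word pair, the number of documents containing them."""
    word_df = Counter()
    pair_df = Counter()
    for doc in documents:
        words = set(doc.lower().split())
        word_df.update(words)
        pair_df.update(frozenset(p) for p in combinations(sorted(words), 2))
    return word_df, pair_df, len(documents)


def pmi(a, b, word_df, pair_df, n_docs):
    """Pointwise mutual information of two words (assumed scoring function).

    Returns -inf when the words never co-occur, so such pairs always rank last.
    """
    joint = pair_df[frozenset((a, b))]
    if joint == 0:
        return float("-inf")
    return math.log2(joint * n_docs / (word_df[a] * word_df[b]))


def disambiguate(candidates, word_df, pair_df, n_docs):
    """Pick one translation per query term, maximizing the summed pairwise PMI."""
    def score(choice):
        return sum(pmi(a, b, word_df, pair_df, n_docs)
                   for a, b in combinations(choice, 2))
    return max(product(*candidates), key=score)


# Toy target-language (English) corpus; purely illustrative data.
corpus = [
    "the bank raised its interest rate",
    "interest paid by the bank fell",
    "the river shore was quiet",
    "curiosity drove the research",
    "a walk along the shore",
]
word_df, pair_df, n_docs = cooccurrence_counts(corpus)

# Hypothetical dictionary output: each source query term has two candidate translations.
candidates = [["bank", "shore"], ["interest", "curiosity"]]
best = disambiguate(candidates, word_df, pair_df, n_docs)
print(best)
```

Only a monolingual target-language corpus and a list of candidate translations are needed here, which matches the abstract's point that no parallel or comparable corpus is required. Enumerating every combination is exponential in the number of query terms, so a real system would likely need a greedy or pruned search.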