Project/Area Number |
19300047
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | Osaka University |
Principal Investigator |
HORI Kazunari Osaka University, Institute for Higher Education Research and Practice, Associate Professor (80270346)
|
Co-Investigator(Kenkyū-buntansha) |
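TAKEHARA Shin Osaka University, Research Institute for World Languages, Associate Professor (20324874)
YAMAZAKI Naoki Kansai University, Faculty of Foreign Language Studies, Professor (30230402)
KOJIMA Kazuhide Osaka University, Cybermedia Center, Lecturer (60372637)
UEHARA Junichi Osaka University, Graduate School of Language and Culture, Associate Professor (30252737)
SUZUKI Shingo Kyoto Sangyo University, Faculty of Foreign Studies, Assistant Professor (20513360)
ISHIJIMA Dai Technology Research Institute of Osaka Prefecture, Information and Electronics Department, Senior Researcher (80359398)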
|
Co-Investigator(Renkei-kenkyūsha) |
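ISHIJIMA Dai Technology Research Institute of Osaka Prefecture, Information and Electronics Department, Senior Researcher (80359398)
TAKASHINA Yoshiyuki Osaka University, Research Institute for World Languages, Professor (70144540)
HUZIIE Hiroaki Osaka University, Research Institute for World Languages, Associate Professor (90283837)
TANIMURA Midori Kyoto University of Foreign Studies, Faculty of Foreign Studies, Lecturer (00434647)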
|
Project Period (FY) |
2007 – 2010
|
Project Status |
Completed (Fiscal Year 2010)
|
Budget Amount |
¥18,720,000 (Direct Cost: ¥14,400,000, Indirect Cost: ¥4,320,000)
Fiscal Year 2010: ¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000)
Fiscal Year 2009: ¥3,770,000 (Direct Cost: ¥2,900,000, Indirect Cost: ¥870,000)
Fiscal Year 2008: ¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000)
Fiscal Year 2007: ¥6,370,000 (Direct Cost: ¥4,900,000, Indirect Cost: ¥1,470,000)
|
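Keywords | natural language processing / multilingual resources / conversational sentence database / vocabulary database / foreign languages / multilingual processing / content archive / language resources / XML / GDA / LCTL |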
Research Abstract |
The Multi-language Resources Research Group of Osaka University has been developing multi-language parallel corpora and XML annotation tools. The corpora comprise 5000-word lists (seven languages), 1000 plain-text sentences (12 languages), and XML-formatted sentences (five languages) that carry syntactic information. In each corpus, the words and sentences are organized in a tabular format. The XML format used for these corpora is Global Document Annotation (GDA), which allows computers to automatically recognize the semantic and pragmatic structures of texts. Three XML annotation tools were created specifically for tagging the words and sentences in these corpora; they are implemented as Flash applications so that they can run in Web browsers. The tools' GUI can draw parse trees, which helps annotators who are not familiar with XML data structures. The corpora can serve as fundamental data for contrastive and comparative linguistics, and as training data for verifying the validity of statistical analyses in natural language processing research.
|