Augmenting Terminologies through Proactive Extraction of Term Translation Pairs from the Web
Project/Area Number |
24650122
|
Research Category |
Grant-in-Aid for Challenging Exploratory Research
|
Allocation Type | Multi-year Fund |
Research Field |
Library and information science/Humanistic social informatics
|
Research Institution | The University of Tokyo |
Principal Investigator |
KAGEURA Kyo The University of Tokyo, Interfaculty Initiative in Information Studies, Professor (00211152)
|
Co-Investigator (Kenkyū-buntansha) |
TAKEUCHI Koichi Okayama University, Graduate School of Natural Science and Technology, Lecturer (80311174)
|
Project Period (FY) |
2012-04-01 – 2015-03-31
|
Project Status |
Completed (Fiscal Year 2014)
|
Budget Amount |
¥3,900,000 (Direct Cost: ¥3,000,000, Indirect Cost: ¥900,000)
Fiscal Year 2014: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Fiscal Year 2013: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2012: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
|
Keywords | Technical vocabulary / Web crawling / Translation pair extraction / Vocabulary growth / Lexical network |
Outline of Final Research Achievements |
This project examined how native and borrowed constituent elements contribute to the construction of technical terminology and how these elements are used as the terminology grows. By defining a terminological network (with terms as vertices and shared constituents as edges) and a constituent network (with constituent elements as vertices and co-occurrence in terms as edges), we defined indices for evaluating the consistency and coherence of a terminology. Building on these observations, we developed a method that generates candidate bilingual pairs of new terms from existing terminologies and validates them against monolingual and comparable domain corpora obtained from the web. Experiments showed that the performance of bilingual term crawling is at least comparable to that of existing corpus-based extraction, and that the two approaches are complementary in that they extract different types of pairs, with the crawled pairs being more relevant to existing terminologies. The theoretical implications of this work were clarified in terms of lexicographic issues.
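The two networks described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the function name `build_networks`, the toy English terms, and the assumption that each term already comes segmented into constituent elements are all hypothetical, and the consistency and coherence indices are not reproduced because the outline does not define them.

```python
from collections import defaultdict
from itertools import combinations


def build_networks(terms):
    """Build both networks from a mapping: term -> list of constituent elements.

    Assumption: the segmentation of each term into constituents is given.
    """
    # Index each constituent by the set of terms that contain it.
    by_constituent = defaultdict(set)
    for term, constituents in terms.items():
        for c in constituents:
            by_constituent[c].add(term)

    # Terminological network: terms are vertices, and two terms are
    # linked when they share at least one constituent.
    term_edges = set()
    for members in by_constituent.values():
        for a, b in combinations(sorted(members), 2):
            term_edges.add((a, b))

    # Constituent network: constituents are vertices, and two constituents
    # are linked when they co-occur in at least one term.
    constituent_edges = set()
    for constituents in terms.values():
        for a, b in combinations(sorted(set(constituents)), 2):
            constituent_edges.add((a, b))

    return term_edges, constituent_edges


# Hypothetical toy terminology, for illustration only.
terms = {
    "information retrieval": ["information", "retrieval"],
    "information extraction": ["information", "extraction"],
    "term extraction": ["term", "extraction"],
}
term_edges, constituent_edges = build_networks(terms)
```

In this toy example, "information retrieval" and "term extraction" share no constituent, so they are connected in the terminological network only indirectly, through "information extraction".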
|
Report
(4 results)
Research Products
(8 results)