Project/Area Number |
15K12158
|
Research Institution | Kyoto University |
Principal Investigator |
Adam Jatowt Kyoto University, Graduate School of Informatics, Associate Professor (00415861)
|
Project Period (FY) |
2015-04-01 – 2018-03-31
|
Keywords | language evolution / temporal counterparts / across-time similarity |
Outline of Annual Research Achievements |
The project takes a multi-perspective approach to studying word evolution, concept evolution, and related topics. In the first year, we advanced the framework for analyzing the similarity of words across time. The framework finds words from two distant time periods that are similar to each other. For example, for the input word ipod the system finds a semantic counterpart such as walkman. This makes it possible to track the evolution of a concept (e.g., a portable music device) across time. This work was evaluated on the New York Times Annotated news article dataset and published at ACL 2015. Subsequently, we developed an algorithm for explaining across-time similarity. Given an input pair of semantically similar words that existed in different time periods (e.g., walkman and ipod), it produces an explanation of their similarity. In other words, the objective is to provide evidence (e.g., music, portable, device) that the two input words from different time periods are indeed similar. The approach that we propose finds this evidence in the form of word pairs. Finally, we have been exploring the evolution of entire concepts, such as technology-related concepts. We proposed an algorithm that finds causal relations among words over time and uses them to explore the evolution of technology concepts. It extends state-of-the-art causality research towards token-level causality detection in text document collections. This work was published at WWW 2016.
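The following is a minimal sketch of the idea of temporal counterpart finding and similarity explanation, not the method published in the project's papers. It assumes each word is represented by a small vector of context-term counts per time period, and that a hand-written alignment (`CONTEXT_MAP`, hypothetical) translates present-day context terms into past ones. All counts, names, and functions are invented for illustration.

```python
import math

# Toy context-term counts, invented for illustration (not project data).
PAST = {
    "walkman": {"music": 8, "portable": 6, "device": 5, "cassette": 7},
    "radio": {"music": 5, "broadcast": 9, "station": 7},
    "typewriter": {"office": 6, "device": 3, "paper": 8},
}
PRESENT = {
    "ipod": {"music": 9, "portable": 7, "device": 6, "mp3": 8},
}

# Hypothetical alignment of present-day context terms to past ones.
# Terms that are not listed are assumed to map to themselves.
CONTEXT_MAP = {"mp3": "cassette"}


def project(vec, mapping):
    """Re-express a present-day context vector using past context terms."""
    out = {}
    for term, weight in vec.items():
        key = mapping.get(term, term)
        out[key] = out.get(key, 0) + weight
    return out


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_counterparts(query, present, past, mapping):
    """Rank past words by similarity to the projected present-day query."""
    projected = project(present[query], mapping)
    scored = [(cosine(projected, vec), word) for word, vec in past.items()]
    return sorted(scored, reverse=True)


def explain(query, counterpart, present, past, mapping, top_k=3):
    """Return (present_term, past_term) pairs that contribute most to the match."""
    pairs = []
    for term, weight in present[query].items():
        past_term = mapping.get(term, term)
        past_weight = past[counterpart].get(past_term, 0)
        if past_weight:
            pairs.append((weight * past_weight, term, past_term))
    pairs.sort(reverse=True)
    return [(term, past_term) for _, term, past_term in pairs[:top_k]]


ranking = find_counterparts("ipod", PRESENT, PAST, CONTEXT_MAP)
best = ranking[0][1]
evidence = explain("ipod", best, PRESENT, PAST, CONTEXT_MAP)
print(best, evidence)
```

On this toy data the top-ranked past word for ipod is walkman, and the evidence is returned as word pairs such as (music, music) and (mp3, cassette). A real system would learn both the word representations and the across-time alignment from a large corpus rather than from hand-written tables.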
|
Current Status of Research Progress (Category) |
1: Research has progressed more than originally planned
Reason
The research so far has focused on concrete applications of language evolution technologies, such as temporal counterpart finding and the study of concept evolution. The focus of the project is now shifting to creating an online interface for studying language evolution interactively. According to the plan, the objective is to develop a ready-to-use, rich web interface through which users can interact with big data from which information on word evolution can be extracted. For this purpose we are using the Google Books Ngram dataset, which spans from the 1600s to the current decade. Implementing this service requires many novel research solutions.
|
Strategy for Future Research Activity |
Besides the planned online interface for interactive exploration of word evolution, we plan to propose additional technologies for understanding how our language has changed over time. One such direction is classifying words according to the types of evolution they have undergone, such as pejoration, amelioration, and sense narrowing or widening. This can be achieved with a machine learning framework, given enough training data. To do this we will also need to determine the set of possible word meaning changes based on the prior literature in evolutionary linguistics. Another direction is finding similar events in the past, building on the previously designed technology for detecting temporal counterparts and on our across-time similarity studies.
|
Causes of Carryover |
The full budget was not spent last year because the hardware already owned by the university was in reasonably good condition, so no new computers had to be bought. In addition, because the project was in its initial phase, much of the effort went into designing algorithms rather than implementing them.
|
Expenditure Plan for Carryover Budget |
The funds carried over from the previous year will mainly be spent on programming work and on travel to international conferences. The project requires handling large amounts of data, which should be processed using big data technologies. In addition, I plan to attend international conferences to disseminate the results of the project.
|