2016 Fiscal Year Research-status Report
Framework for Studying Language Evolution using Large Scale Data
Project/Area Number |
15K12158
|
Research Institution | Kyoto University |
Principal Investigator |
Adam Jatowt Kyoto University, Graduate School of Informatics, Program-Specific Associate Professor (00415861)
|
Project Period (FY) |
2015-04-01 – 2018-03-31
|
Keywords | term semantic change / historical linguistics |
Outline of Annual Research Achievements |
First, we investigated in more detail the methods for measuring across-time similarity. We tested both a global approach, which is based on a single transformation matrix, and local approaches, which provide transformation points specific to a given query. The transformation matrix maps a query term's representation from one vector space (e.g., trained on documents from the 2010s) to another vector space (e.g., trained on documents published in the 1980s). Using the transformation matrix, we can compare terms at different times based on their neural network representations. In addition, we tested the effect of the choice of examples for training the transformation matrix. Our initial idea was to use frequent terms as training term pairs because of their expected low semantic drift over time. We extended this idea by selecting training term pairs based on the degree of their semantic evolution, which was computed from the neural network representations for each past year. Finally, we tested our approach on a long-term document collection. The results have been published in the TKDE journal.
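The global approach can be illustrated with a minimal sketch. It assumes a least-squares fit of the transformation matrix on anchor term pairs and cosine similarity for the final comparison; the report does not state the exact fitting procedure. The two-dimensional vectors, the anchor terms, and the helper names (`fit_mapping`, `nearest`) are hypothetical and chosen only for illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve a @ x = b for x by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_mapping(src, tgt, ridge=1e-6):
    """Least-squares matrix M with src_vec @ M close to tgt_vec (row vectors).

    The small ridge term is an assumption added for numerical stability.
    """
    src_t = [list(col) for col in zip(*src)]
    gram = matmul(src_t, src)
    for i in range(len(gram)):
        gram[i][i] += ridge
    return solve(gram, matmul(src_t, tgt))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest(vec, space):
    """Return the term in `space` whose vector is most similar to `vec`."""
    return max(space, key=lambda term: cosine(vec, space[term]))

# Hypothetical toy embeddings; real ones would come from models trained per period.
present = {"music": [1.0, 0.0], "phone": [0.0, 1.0], "city": [1.0, 1.0],
           "ipod": [0.9, 0.2]}
past = {"music": [0.0, 1.0], "phone": [-1.0, 0.0], "city": [-1.0, 1.0],
        "walkman": [-0.25, 0.85], "telegraph": [-0.9, 0.1]}

# Anchor terms assumed to be semantically stable across the two periods.
anchors = ["music", "phone", "city"]
M = fit_mapping([present[t] for t in anchors], [past[t] for t in anchors])

# Map the query into the past space and look up its closest counterpart.
mapped = matmul([present["ipod"]], M)[0]
print(nearest(mapped, past))
```

On this toy data the mapped query lands closest to walkman. With real embeddings the anchor set would be far larger, and its selection (frequent terms or terms with low measured drift) is the design choice discussed in the paragraph above.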
Another effort went into automatically providing explanations for term similarity across time. This is done by looking for pairs of explanatory terms in the neighbourhoods of the query and of its counterpart. With our approach, it is possible to understand why a given term, such as iPod in the 2010s, is considered similar to a term in another time period, such as walkman in the 1980s. This work has been published at the BIGDATA2016 conference.
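One way to picture the search for explanatory term pairs is the sketch below. It is not the published method; it assumes that the neighbours of the query have already been mapped into the vector space of the counterpart, and it simply ranks cross-period neighbour pairs by cosine similarity. The context terms, their vectors, and the function name `explanatory_pairs` are hypothetical.

```python
import math
from itertools import product

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def explanatory_pairs(query_ctx, counterpart_ctx, top_k=2):
    """Rank (query neighbour, counterpart neighbour) pairs by similarity.

    Both arguments map a term to its vector; the query-side vectors are
    assumed to be already transformed into the counterpart's space.
    """
    scored = [
        (cosine(q_vec, c_vec), q_term, c_term)
        for (q_term, q_vec), (c_term, c_vec)
        in product(query_ctx.items(), counterpart_ctx.items())
    ]
    scored.sort(reverse=True)
    return [(q_term, c_term) for _, q_term, c_term in scored[:top_k]]

# Hypothetical neighbours of "iPod" (2010s, already mapped) and "walkman" (1980s).
ipod_ctx = {"mp3": [1.0, 0.1], "apple": [0.1, 1.0]}
walkman_ctx = {"cassette": [0.9, 0.2], "sony": [0.3, 0.9]}

pairs = explanatory_pairs(ipod_ctx, walkman_ctx)
print(pairs)
```

On this toy data the two top-ranked pairs are (mp3, cassette) and (apple, sony), which reads as "iPod relates to mp3 and Apple as walkman relates to cassette and Sony".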
|
Current Status of Research Progress |
2: Research has progressed on the whole more than it was originally planned.
Reason
Our current effort focuses on improving across-time similarity computation through a clustering mechanism. The idea is to train transformation matrices on subsets of the training seed pairs rather than on the whole set.
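A minimal sketch of this idea follows, under several assumptions that the report does not confirm: seed pairs are grouped by a plain k-means over their source-side vectors, one least-squares matrix is fitted per cluster, and a query is mapped with the matrix of its nearest cluster centroid. All function names and toy vectors are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def solve(a, b):
    """Solve a @ x = b for x by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_mapping(src, tgt, ridge=1e-6):
    """Least-squares matrix M with src_vec @ M close to tgt_vec (row vectors)."""
    src_t = [list(col) for col in zip(*src)]
    gram = matmul(src_t, src)
    for i in range(len(gram)):
        gram[i][i] += ridge
    return solve(gram, matmul(src_t, tgt))

def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iters=20):
    """Plain k-means; evenly spaced initial centroids keep the run deterministic."""
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sqdist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def fit_local(src, tgt, k):
    """Fit one matrix per cluster of seed pairs (assumes no cluster is empty)."""
    centroids, labels = kmeans(src, k)
    mats = []
    for c in range(k):
        s = [x for x, lab in zip(src, labels) if lab == c]
        t = [y for y, lab in zip(tgt, labels) if lab == c]
        mats.append(fit_mapping(s, t))
    return centroids, mats

def map_local(vec, centroids, mats):
    """Map a query with the matrix of its nearest seed cluster."""
    c = min(range(len(centroids)), key=lambda i: sqdist(vec, centroids[i]))
    return matmul([vec], mats[c])[0]

# Hypothetical seed pairs: the first group is unchanged across time,
# the second group is scaled by 2, so one global matrix cannot fit both.
seed_src = [[1.0, 0.0], [1.0, 0.2], [0.0, 1.0], [0.2, 1.0]]
seed_tgt = [[1.0, 0.0], [1.0, 0.2], [0.0, 2.0], [0.4, 2.0]]

centroids, mats = fit_local(seed_src, seed_tgt, k=2)
near_first = map_local([1.0, 0.1], centroids, mats)
near_second = map_local([0.1, 1.0], centroids, mats)
print(near_first, near_second)
```

In this toy setting a query near the first group is left almost unchanged, while a query near the second group is roughly doubled, which a single global matrix could not reproduce. The number of clusters and the clustering method are open choices in this sketch.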
We are also developing and evaluating an interactive system for term evolution analysis. The system has been enriched with several non-standard functionalities, such as estimating and quantifying the evolution of a term's context. It can also evaluate the semantic change of entire documents.
|
Strategy for Future Research Activity |
In the future, we plan to write a survey paper on computational approaches for measuring semantic term change and to provide more detailed interactive services for analyzing and understanding such changes.
Another idea is to quantify the evolution of concepts rather than single words, and to study sentiment over time so that it can be compared with a term's semantic change.
|
Causes of Carryover |
Funding is needed to carry out research in the next fiscal year. The planned expenses cover travel to international conferences, the purchase of hardware, and the hiring of programmers. We will also buy several books on data mining and natural language analysis.
|
Expenditure Plan for Carryover Budget |
We plan to use the money first to purchase a fast machine for conducting the research and to present the research at international events. We also need to hire a person to support the project with programming.
|