Outline of Research |
The main goal of this research is to detect changes in high-dimensional data effectively in a model-free setting. Our original research plan anticipated two major challenges: a) the curse of dimensionality, and b) time-dependent data samples. During the first year, we developed a novel method that tackles challenge a) by using the structure of the variables; the same method is also applicable to challenge b). A conference paper and a journal paper describing this methodology were both published this year. In the research plan, we argued that dimensionality reduction provides a shortcut for handling high-dimensional data. However, instead of searching for a subspace that "compresses" the data, exploiting the internal structure of the variables can also offer a solution. The interactions between the random variables form a Markov network, which can be regarded as a structure of interactions. When such additional information is available (as it is in many applications), high-dimensional change detection can be performed more efficiently. During this year, we developed efficient algorithms that detect changes in high-dimensional Markov networks. This is a major step toward an algorithm that addresses challenges a) and b) directly and simultaneously, since time dependency can be regarded as a chain-shaped Markov network. Without introducing any additional steps (such as dimensionality reduction), we are able to detect changes in high-dimensional time series in a single step.
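The remark that time dependency can be regarded as a chain-shaped Markov network can be illustrated with a small sketch. This is not the method developed in this project; it only assumes a stationary Gaussian AR(1) series with unit noise variance, and all function names and parameter values are hypothetical. For such a series the inverse covariance (precision) matrix is tridiagonal, so each time point interacts only with its immediate neighbours.

```python
# Illustrative sketch only: a stationary Gaussian AR(1) series
#   x_t = phi * x_{t-1} + e_t,  e_t ~ N(0, 1)
# viewed as a chain-shaped Markov network. Names and values are assumptions.

def ar1_covariance(n, phi):
    """Covariance of n consecutive AR(1) samples: phi^|i-j| / (1 - phi^2)."""
    scale = 1.0 / (1.0 - phi ** 2)
    return [[scale * phi ** abs(i - j) for j in range(n)] for i in range(n)]

def chain_precision(n, phi):
    """Tridiagonal precision matrix of the same AR(1) samples (n >= 2)."""
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # End points have one neighbour, interior points have two.
        q[i][i] = 1.0 if i in (0, n - 1) else 1.0 + phi ** 2
        if i + 1 < n:
            q[i][i + 1] = q[i + 1][i] = -phi
    return q

def matmul(a, b):
    """Plain-Python matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def network_edges(q, tol=1e-12):
    """Edges of the Markov network: non-zero off-diagonal precision entries."""
    n = len(q)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(q[i][j]) > tol}

n, phi = 5, 0.6
sigma = ar1_covariance(n, phi)
q = chain_precision(n, phi)
product = matmul(q, sigma)      # should be the identity matrix
chain_edges = network_edges(q)  # only consecutive time points are linked
```

Because the product of the two matrices is the identity, the tridiagonal matrix is indeed the precision matrix of the series, and its non-zero off-diagonal entries link only consecutive time points, which is the chain structure referred to above.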
|
Current Status of Research Progress (Category) |
1: Progressing beyond the original plan
Reason
We anticipated two challenges in our plan. The work over the past year produced an elegant solution to both, which exceeded our expectations. We have also started to investigate applications of our methodology, which was the second-year task in our original plan. Overall, the project is nearly 65% complete.
|
Strategy for Future Research Activity |
As stated in the original plan, now that the methodology has been developed, we will investigate several applications in the second year so that the performance of our method can be evaluated. Since we have already begun experiments on gene expression and Twitter datasets, we will continue to examine further datasets in bioinformatics and social media. We will also scale up our method to big data with much higher dimensionality (e.g., >500) and develop an efficient algorithm that solves such problems within a reasonable amount of time.
|