2014 Fiscal Year Annual Research Report
Project/Area Number | 14J09896 |
Research Institution | The University of Tokyo |
Principal Investigator | KRISTIANTO GIOVANNIYOKO, The University of Tokyo, Graduate School of Information Science and Technology, JSPS Research Fellow (DC1) |
Project Period (FY) | 2014-04-25 – 2017-03-31 |
Keywords | Mathematical knowledge / Description / Dependency graph / Math formulae search / MathML indexing |
Outline of Annual Research Achievements |
The goal of this research is to design an intelligent browsing system for mathematical information that helps researchers explore mathematical concepts shared across scientific disciplines. To pursue this goal, the research attempts to establish a general framework for knowledge extraction based on semantic understanding of mathematical formulae. The following issues will be addressed during the research period: (1) Extracting and analyzing textual descriptions of math entities. (2) Capturing relationships between formulae within a document. (3) Finding related and similar formulae across documents. The third issue is important for utilizing external math resources to compensate for implicitly assumed domain-specific knowledge.
By the end of the first year, we had developed a method to automatically extract textual descriptions of mathematical expressions. We also developed a method to capture relationships between similar or related mathematical expressions using simple substring matching. We call the graph that depicts these relationships a "dependency graph". We then exploited the dependency graph to attach textual descriptions to symbols and sub-expressions inside mathematical expressions. Finally, we developed a mathematical search system to investigate how effective textual descriptions and the dependency graph are for retrieving mathematical expressions.
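The substring-matching idea can be illustrated with a minimal Python sketch. It is not the project's implementation: it assumes expressions are given as plain strings (the project indexes MathML), and the names `normalize` and `build_dependency_graph` are hypothetical.

```python
def normalize(expr: str) -> str:
    """Drop whitespace so that 'm c^2' and 'mc^2' compare equal (assumed normalization)."""
    return "".join(expr.split())


def build_dependency_graph(expressions):
    """Map each expression to the set of other expressions it contains as a substring."""
    normalized = {e: normalize(e) for e in expressions}
    graph = {e: set() for e in expressions}
    for outer, outer_norm in normalized.items():
        for inner, inner_norm in normalized.items():
            # An edge outer -> inner means "outer depends on inner".
            if inner_norm != outer_norm and inner_norm in outer_norm:
                graph[outer].add(inner)
    return graph


# Toy input: plain strings standing in for the expressions found in one document.
exprs = ["E", "m", "c^2", "E = m c^2"]
graph = build_dependency_graph(exprs)
for expr, parts in graph.items():
    print(expr, "->", sorted(parts))
```

In a graph of this kind, a description extracted for a small expression such as `m` can be propagated along the edges to every larger expression that contains it.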
|
Current Status of Research Progress |
2: On the whole, research has progressed more than originally planned.
Reason
The progress we achieved in the first year was consistent with the research plan. As we planned, we accomplished two tasks required to establish a math browsing system.
We have already developed a machine-learning-based method to automatically extract textual descriptions of mathematical expressions. This method was published in "The 3rd International Workshop on Mining Scientific Publications". We have also developed a method to capture relationships between similar or related mathematical expressions (the dependency graph of mathematical expressions). Subsequently, we built a mathematical search system that accepts both a mathematical expression and free text as a query. The experimental results showed that using descriptions and the dependency graph together delivered better retrieval performance than queries in which no text was provided. These results were published in "The 9th International Conference on Digital Information Management", where we won the "Best Paper Award".
Finally, we also evaluated our search system by participating in the NTCIR-11 Math-2 task. The results of our participation were published in the NTCIR-11 proceedings.
|
Strategy for Future Research Activity |
The research focus in the second year is to increase the number of mathematical expression relationships captured in the current dependency graph. To achieve this goal, we need a heuristic method to capture relationships that were overlooked by the substring matching method. We also need to detect the scope of math expressions, that is, to investigate whether the meaning of each mathematical expression stays the same throughout a document. Details of the plan for the next year are as follows. (a) Performing manual annotation to create a gold-standard dataset of relationships between mathematical expressions. (b) Developing a heuristic method to create dependency graphs. We are considering a heuristic that resembles the generalization technique commonly used in logic for automated reasoning. (c) Evaluating the heuristic method using the annotated data. The baseline method for the evaluation will be the substring matching method. (d) Detecting the semantic scope of mathematical expressions within a document.
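Step (c) could, for example, compare the relationships produced by a method against the gold-standard annotation. The Python sketch below assumes that relationships are represented as (expression, sub-expression) pairs and that precision, recall, and F1 are the measures of interest; the plan itself does not specify the metrics, and the function name `evaluate_edges` and the toy data are hypothetical.

```python
def evaluate_edges(predicted, gold):
    """Return (precision, recall, F1) of predicted edges against gold-standard edges."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy data (invented for illustration): manually annotated edges versus
# the edges found by a baseline method.
gold_edges = {
    ("E = m c^2", "E"),
    ("E = m c^2", "m"),
    ("F = m a", "m"),
    ("F = m a", "a"),
}
baseline_edges = {
    ("E = m c^2", "E"),
    ("E = m c^2", "m"),
}

precision, recall, f1 = evaluate_edges(baseline_edges, gold_edges)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Running the same function on the output of the heuristic method and of the substring matching baseline would give directly comparable scores.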
|
Research Products
(3 results)