2015 Fiscal Year Annual Research Report
Project/Area Number | 14J09896
Research Institution | The University of Tokyo
Principal Investigator | KRISTIANTO GIOVANNIYOKO, The University of Tokyo, Graduate School of Information Science and Technology, Research Fellow (DC1)
Project Period (FY) | 2014-04-25 – 2017-03-31
Keywords | Mathematical Knowledge / Dependency relationships / Math search system / MathML indexing / Learning-to-rank / Unification
Outline of Annual Research Achievements
The goal of this research is to design an intelligent browsing system for mathematical information that helps researchers explore mathematical concepts shared across different scientific disciplines. To this end, the research aims to establish a general framework for knowledge extraction based on a semantic understanding of mathematical expressions. The following issues will be addressed during the research period:
1. Extracting and analyzing textual descriptions of math entities.
2. Capturing dependency relationships between math expressions within a document.
3. Building a concept graph from the relationships between expressions obtained in (2).
The extraction of descriptions for mathematical expressions was completed in the first year. By the end of the second year, we had developed a heuristic method that captures dependency relationships between similar or related mathematical expressions. This method extracted the relationships with an accuracy of 82.44%, which is 17.03 percentage points higher than the baseline method (65.41%) and 9.06 points higher than our initial method developed in the first year (73.38%). Furthermore, we exploited these dependency relationships to assign textual descriptions to symbols and sub-expressions inside mathematical expressions. We then used this information in a mathematical search system to investigate whether exploiting dependency relationships between math expressions improves the retrieval of math expressions.
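The report does not spell out the dependency-extraction heuristic, so the following Python sketch is only a hypothetical stand-in meant to make the idea concrete. It assumes each expression is available as a flat token list (a real system would work on MathML trees), links a later expression to an earlier one when the earlier expression's tokens occur verbatim inside it, and then propagates the earlier expressions' descriptions along those links to label sub-expressions. The names `MathExpr`, `occurs_in`, `dependency_edges`, and `describe_subexpressions`, as well as the toy data, are illustrative and not the author's code.

```python
from dataclasses import dataclass


@dataclass
class MathExpr:
    """One math expression: an id, its flattened token sequence, and its extracted description."""
    expr_id: str
    tokens: list[str]
    description: str = ""


def occurs_in(needle: list[str], haystack: list[str]) -> bool:
    """True if `needle` appears as a contiguous run of tokens inside `haystack`."""
    n = len(needle)
    return any(haystack[k:k + n] == needle for k in range(len(haystack) - n + 1))


def dependency_edges(exprs: list[MathExpr]) -> list[tuple[str, str]]:
    """Hypothetical heuristic: a later expression depends on an earlier one when the
    earlier expression's tokens occur verbatim inside it. Returns (dependent, dependee)."""
    edges = []
    for j, later in enumerate(exprs):
        for earlier in exprs[:j]:
            if earlier.tokens != later.tokens and occurs_in(earlier.tokens, later.tokens):
                edges.append((later.expr_id, earlier.expr_id))
    return edges


def describe_subexpressions(exprs, edges) -> dict[str, list[tuple[str, str]]]:
    """For each expression, list (sub-expression, description) pairs inherited
    from the expressions it depends on."""
    by_id = {e.expr_id: e for e in exprs}
    inherited = {e.expr_id: [] for e in exprs}
    for dependent, dependee in edges:
        source = by_id[dependee]
        if source.description:
            inherited[dependent].append((" ".join(source.tokens), source.description))
    return inherited


# Toy document: expressions in order of appearance with their extracted descriptions.
doc = [
    MathExpr("e1", ["x"], "input vector"),
    MathExpr("e2", ["W"], "weight matrix"),
    MathExpr("e3", ["W", "x", "+", "b"], "pre-activation"),
    MathExpr("e4", ["sigma", "(", "W", "x", "+", "b", ")"], "hidden-layer output"),
]
edges = dependency_edges(doc)
for expr_id, pairs in describe_subexpressions(doc, edges).items():
    print(expr_id, pairs)
```

This naive version keeps transitive links (e4 is linked to e1 both directly and through e3) and would wrongly connect unrelated expressions that happen to reuse the same single-letter symbol; handling such cases is exactly where a real heuristic has to do more work.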
Current Status of Research Progress
2: Research has progressed on the whole more than it was originally planned.
Reason
The progress we achieved in the second year is consistent with the research plan. As planned, we improved the extraction accuracy of dependency relationships between math expressions and demonstrated their effectiveness in a math search system.
Our heuristic method for extracting dependency relationships between math expressions has been submitted to the "Information Retrieval Journal", and its review result is still pending. Meanwhile, we improved the accuracy of our math search system using the dependency relationships together with several additional techniques. We first investigated the effectiveness of applying machine learning (learning-to-rank) algorithms to our system. The experimental results showing the effectiveness of combining score normalization with learning-to-rank were published in "The 3rd International Workshop on Digitization and E-Inclusion in Mathematics and Science 2016".
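The report names the ingredients (score normalization plus learning-to-rank) but not the specific algorithms. As a hedged illustration only, the Python sketch below applies per-query min-max normalization to several raw similarity scores per candidate and then learns a linear combination of them with a pairwise hinge loss, in the style of RankSVM trained by stochastic gradient descent. The two-feature layout, the hyperparameters, the toy data, and all function names are assumptions, not the configuration used in the published experiments.

```python
import random


def min_max_normalize(rows: list[list[float]]) -> list[list[float]]:
    """Rescale each feature (column) to [0, 1] within one query's candidate list;
    a constant column maps to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def train_pairwise_ranker(queries, epochs=200, lr=0.1, reg=0.001, seed=0):
    """Linear ranker trained with a pairwise hinge loss (RankSVM-style) by SGD.
    `queries` is a list of (raw_feature_rows, relevance_labels), one entry per query."""
    pairs = []
    for rows, labels in queries:
        norm = min_max_normalize(rows)  # normalize per query before forming pairs
        for i in range(len(norm)):
            for j in range(len(norm)):
                if labels[i] > labels[j]:
                    # Preferred candidate minus less relevant one: we want w . diff >= 1.
                    pairs.append([a - b for a, b in zip(norm[i], norm[j])])
    if not pairs:
        raise ValueError("need at least one pair of candidates with different labels")
    rng = random.Random(seed)
    w = [0.0] * len(pairs[0])
    for _ in range(epochs):
        rng.shuffle(pairs)
        for diff in pairs:
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            hinge_active = margin < 1.0
            # Gradient of reg/2 * |w|^2 + max(0, 1 - w . diff) with respect to w.
            w = [wk - lr * (reg * wk - (dk if hinge_active else 0.0)) for wk, dk in zip(w, diff)]
    return w


def rank(rows, w):
    """Indices of one query's candidates, best first, under the learned weights."""
    norm = min_max_normalize(rows)
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in norm]
    return sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)


# Two toy queries whose raw scores live on very different scales (hypothetical data).
train = [
    ([[10.0, 5.0], [6.0, 9.0], [2.0, 1.0]], [2, 1, 0]),
    ([[0.9, 0.1], [0.5, 0.8], [0.1, 0.5]], [2, 1, 0]),
]
weights = train_pairwise_ranker(train)
print(weights, rank([[3.0, 7.0], [8.0, 2.0], [5.0, 5.0]], weights))
```

Normalizing within each query before forming pairs is what lets one weight vector serve queries whose raw scores differ by orders of magnitude, as in the two toy queries above.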
Finally, we evaluated our search system by participating in the NTCIR-12 MathIR Task, where our system was the top performer. The results of our participation will appear in the NTCIR-12 proceedings in June.
Strategy for Future Research Activity
To construct an intelligent browsing system that helps researchers explore math concepts, we need not only a math search system but also a knowledge/concept graph. The research focus in the third year is therefore to obtain a knowledge/concept graph from each scientific document. To achieve this goal, we plan to use the dependency graph of math expressions. We need to investigate whether such a concept graph can be obtained directly by applying graph clustering or community detection methods to the dependency graph of math expressions, and we also need to evaluate the accuracy of the resulting concept graph. The plan for the next academic year is as follows:
(a) developing a gold-standard concept graph from Wikipedia,
(b) performing graph clustering or community detection over the dependency graph of math expressions to obtain an initial, automatically constructed concept graph,
(c) depending on the result of step (b), exploiting other types of information, such as the surrounding text or topics of each math expression, to obtain a better concept graph,
(d) quantitatively evaluating the concept graph using the annotated data obtained in step (a), and
(e) qualitatively evaluating the concept graph.
The concept graph obtained for each document can be interpreted as the prerequisite knowledge needed to understand that document.
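Steps (b) and (d) of this plan can be prototyped in a few lines. The Python sketch below uses a deliberately simple graph clustering as a stand-in for whichever clustering or community detection method is eventually chosen: it ignores edge direction, drops edges whose endpoints share few neighbours (Jaccard similarity of their closed neighbourhoods below a threshold), and treats the connected components of the remaining graph as candidate concepts. It then scores the result against gold-standard clusters with pairwise precision, recall, and F1. Representing the gold concept graph as clusters, the threshold of 0.5, the toy graph, and the gold annotation are all hypothetical.

```python
from itertools import combinations


def closed_neighbourhoods(edges):
    """Map each node to itself plus its neighbours, ignoring edge direction."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, {u}).add(v)
        nbrs.setdefault(v, {v}).add(u)
    return nbrs


def structural_clusters(edges, threshold=0.5):
    """Keep an edge only if the Jaccard similarity of its endpoints' closed
    neighbourhoods reaches `threshold`; return connected components as clusters."""
    nbrs = closed_neighbourhoods(edges)
    adj = {v: set() for v in nbrs}
    for u, v in edges:
        if len(nbrs[u] & nbrs[v]) / len(nbrs[u] | nbrs[v]) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    seen, clusters = set(), []
    for start in sorted(adj):  # sorted for deterministic output
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative depth-first search over the pruned graph
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        clusters.append(comp)
    return clusters


def co_membership_pairs(clusters):
    """All unordered node pairs that share a cluster."""
    return {frozenset(p) for c in clusters for p in combinations(c, 2)}


def pairwise_prf(predicted, gold):
    """Pairwise precision, recall and F1 of predicted clusters against gold clusters."""
    pred, ref = co_membership_pairs(predicted), co_membership_pairs(gold)
    tp = len(pred & ref)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(ref) if ref else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Toy dependency graph: two tightly linked groups of expressions joined by one edge.
group_a = ["e1", "e2", "e3", "e4"]
group_b = ["e5", "e6", "e7", "e8"]
dep_edges = list(combinations(group_a, 2)) + list(combinations(group_b, 2)) + [("e5", "e4")]
clusters = structural_clusters(dep_edges)
gold = [{"e1", "e2", "e3"}, {"e4", "e5", "e6", "e7", "e8"}]  # hypothetical annotation
print(clusters, pairwise_prf(clusters, gold))
```

In this toy graph the single bridging edge between e4 and e5 has a neighbourhood similarity of only 2/8, so it is pruned and the two groups come out as separate concepts; the mismatch with the hypothetical gold annotation then shows up as a pairwise precision of 0.75 and recall of 9/13.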
Research Products
(3 results)