2014 Fiscal Year Annual Research Report

Research on Context-Aware Access to Mathematical Knowledge

Research Project

Project/Area Number	14J09896
Research Institution	The University of Tokyo
Principal Investigator	KRISTIANTO GIOVANNIYOKO The University of Tokyo, Graduate School of Information Science and Technology, Research Fellow (DC1)
Project Period (FY)	2014-04-25 – 2017-03-31
Keywords	Mathematical knowledge / Description / Dependency graph / Math formulae search / MathML indexing
Outline of Annual Research Achievements	The goal of this research is to design an intelligent browsing system for mathematical information that helps researchers explore mathematical concepts shared across different scientific disciplines. To pursue this goal, the research attempts to establish a general framework for knowledge extraction based on semantic understanding of mathematical formulae. The following issues will be addressed during the research period: (1) Extracting and analyzing textual descriptions of math entities. (2) Capturing relationships between formulae within a document. (3) Finding related and similar formulae across documents, which is important for using external math resources to compensate for implicitly assumed domain-specific knowledge. By the end of the first year, we had developed a method to automatically extract textual descriptions of mathematical expressions. We also developed a method to capture relationships between similar or related mathematical expressions using simple substring matching. We call the graph that depicts these relationships between mathematical expressions a "dependency graph". We then exploited the dependency graph to attach textual descriptions to symbols and sub-expressions inside mathematical expressions. Finally, we developed a mathematical search system to investigate how effective textual descriptions and the dependency graph are for retrieving mathematical expressions.
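The substring-matching idea behind the dependency graph can be illustrated with a minimal sketch. It assumes that expressions are plain strings and that an edge runs from an expression to every other expression contained in it. The report does not say which representation (for example MathML tokens) the actual method operates on, so the function name `build_dependency_graph` and the sample expressions are hypothetical.

```python
from collections import defaultdict


def build_dependency_graph(expressions):
    """Link each expression to the other expressions that occur inside it.

    Assumption: expressions are plain strings compared after removing
    whitespace. This is an illustration, not the report's implementation.
    """
    normalized = {expr: "".join(expr.split()) for expr in expressions}
    graph = defaultdict(set)
    for outer, outer_norm in normalized.items():
        for inner, inner_norm in normalized.items():
            if outer == inner or not inner_norm or inner_norm == outer_norm:
                continue  # skip self-matches, empty strings, and duplicates
            if inner_norm in outer_norm:
                graph[outer].add(inner)  # outer depends on its sub-expression
    return dict(graph)


# Hypothetical expressions taken from one document.
exprs = ["x", "f(x)", "y = f(x) + c", "c"]
graph = build_dependency_graph(exprs)
```

With such a graph, a textual description found for `f(x)` could also be offered for the occurrence of `f(x)` inside `y = f(x) + c`, which is the kind of description sharing the outline describes.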
Current Status of Research Progress	2: Research has progressed on the whole more than it was originally planned. Reason: The progress achieved in the first year was consistent with the research plan. As planned, we accomplished two tasks required to establish a math browsing system. First, we developed a machine-learning-based method to automatically extract textual descriptions of mathematical expressions. This method was published in "The 3rd International Workshop on Mining Scientific Publications". Second, we developed a method to capture relationships between similar or related mathematical expressions (the dependency graph of mathematical expressions). We then built a mathematical search system that accepts both a mathematical expression and free text as a query. The experimental results showed that using descriptions and the dependency graph together delivered better retrieval performance than when no text was provided in the queries. These results were published in "The 9th International Conference on Digital Information Management", where we won the "Best Paper Award". Finally, we evaluated our search system by participating in the NTCIR-11 Math-2 task. The results of our participation were published in the NTCIR-11 proceedings.
Strategy for Future Research Activity	The research focus in the second year is to increase the number of relationships between mathematical expressions captured in the current dependency graph. To achieve this, we need a heuristic method that captures relationships overlooked by the substring matching method. We also need to detect the scope of math expressions, that is, to investigate whether the meaning of each mathematical expression stays the same throughout a document. The plan for the next year is as follows. (a) Performing manual annotation to create a gold-standard dataset of relationships between mathematical expressions. (b) Developing a heuristic method to create dependency graphs. We are considering a heuristic that resembles the generalization technique commonly used in logic for automated reasoning. (c) Evaluating the heuristic method using the annotated data. The baseline for the evaluation will be the substring matching method. (d) Detecting the semantic scope of mathematical expressions within a document.
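Step (c) could be carried out by comparing the edges a method produces against the gold-standard edges. The sketch below assumes edge-level precision, recall, and F1 as the measures. The report does not name its evaluation metrics, so this choice, the function name `edge_scores`, and the sample edges are all hypothetical.

```python
def edge_scores(predicted, gold):
    """Return (precision, recall, F1) of predicted edges against gold edges.

    Assumption: an edge is an (expression, sub_expression) pair and only
    exact matches count as correct.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented data for illustration only, not results from the report.
gold_edges = {("f(x)", "x"), ("g(x)", "x"), ("h", "g(x)")}
baseline_edges = {("f(x)", "x"), ("g(x)", "x"), ("f(x)", "g(x)")}
precision, recall, f1 = edge_scores(baseline_edges, gold_edges)
```

Running the same function on the edges from the substring matching baseline and from the heuristic method would give directly comparable scores on the annotated data.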

Research Products
(3 results)

[Presentation] The MCAT Math Retrieval System for NTCIR-11 Math Track (2014)
- Author(s)
  Giovanni Yoko Kristianto
- Organizer
  The 11th NTCIR Conference
- Place of Presentation
  National Institute of Informatics, Tokyo
- Year and Date
  2014-12-09 – 2014-12-12
[Presentation] Exploiting Textual Descriptions and Dependency Graph for Searching Mathematical Expressions in Scientific Papers (2014)
- Author(s)
  Giovanni Yoko Kristianto
- Organizer
  The 9th International Conference on Digital Information Management
- Place of Presentation
  Bangkok, Thailand
- Year and Date
  2014-09-29 – 2014-10-01
[Presentation] Extracting Textual Descriptions of Mathematical Expressions in Scientific Papers (2014)
- Author(s)
  Giovanni Yoko Kristianto
- Organizer
  The 3rd International Workshop on Mining Scientific Publications
- Place of Presentation
  London, United Kingdom
- Year and Date
  2014-09-08 – 2014-09-12
