2014 Fiscal Year Final Research Report
Mining Numbers in Text for Various Kinds of Text Data
Project/Area Number | 24500162 |
Research Category | Grant-in-Aid for Scientific Research (C) |
Allocation Type | Multi-year Fund |
Section | General |
Research Field | Intelligent informatics |
Research Institution | The University of Tokushima (2013-2014) The University of Tokyo (2012) |
Principal Investigator | YOSHIDA Minoru, The University of Tokushima, Institute of Socio-Techno Science, Lecturer (40361688) |
Project Period (FY) | 2012-04-01 – 2015-03-31 |
Keywords | Numerical information extraction / Layout analysis |
Outline of Final Research Achievements |
We studied a method for extracting the contexts (i.e., attributes or topics) of numbers written in text. Our goal is to develop a system that accepts numbers as queries and returns appropriate data from various kinds of text data, such as Wikipedia and Twitter. To this end, we proposed a method for extracting numbers and their contexts that is applicable to both unstructured texts (e.g., sentences) and semi-structured texts (e.g., tables). The method uses unsupervised learning algorithms based on probabilistic generative models of text to extract attributes and hierarchical topics from Web documents. We also proposed a method for extracting corpus-specific number expressions from any kind of text data. Finally, we devised a coding scheme for number expressions that can be used both for indexing and in probabilistic generative models.
|
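The outline does not describe the extraction model or the number coding scheme in detail. The Python below is a purely illustrative sketch and is not the project's method. It shows the general idea of pairing each number in a sentence with nearby words, used here as a crude stand-in for the number's context. It also encodes each number as a discrete token built from its sign, decimal exponent, and leading digits, so that one token can act both as an index term and as a vocabulary symbol in a generative model.

Everything in the sketch is a hypothetical choice: the regular expression, the fixed word window, the `N±E…:…` token format, and the function names `encode_number` and `extract_numbers`.

```python
import re
from decimal import Decimal

# Hypothetical pattern: digits with optional thousands separators and an
# optional decimal part, not starting inside a word or another number.
NUMBER_RE = re.compile(r"(?<![\w.])(?:\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?)")
# Runs of letters (Unicode word characters minus digits and underscore).
WORD_RE = re.compile(r"[^\W\d_]+")


def encode_number(text: str, digits: int = 2) -> str:
    """Hypothetical coding: sign, decimal exponent, and leading digits.

    Numbers with the same sign and order of magnitude share the prefix
    before ':', so prefix matching groups them for indexing, while the
    whole token can be treated as one word by a generative model.
    """
    d = Decimal(text.replace(",", ""))
    if d == 0:
        return "N0"
    sign, coeff, _ = d.as_tuple()
    lead = "".join(map(str, coeff[:digits])).ljust(digits, "0")
    return f"N{'-' if sign else '+'}E{d.adjusted()}:{lead}"


def extract_numbers(sentence: str, window: int = 3) -> list[tuple[str, str, list[str]]]:
    """Return (surface form, code, nearby words) for each number found."""
    results = []
    for m in NUMBER_RE.finditer(sentence):
        left = WORD_RE.findall(sentence[: m.start()])[-window:]
        right = WORD_RE.findall(sentence[m.end():])[:window]
        results.append((m.group(), encode_number(m.group()), left + right))
    return results


if __name__ == "__main__":
    s = "Mount Fuji is 3,776 meters high and was climbed by 200000 people in 2019."
    for surface, code, context in extract_numbers(s):
        print(surface, code, context)
```

In this example, `3,776` is encoded as `N+E3:37` and `2019` as `N+E3:20`. Both tokens share the prefix `N+E3:`, so a prefix query on that prefix would retrieve both as numbers between 1,000 and 9,999. A fixed word window is far cruder than the generative models described in the outline, and it does not handle tables at all. It is used here only to keep the sketch short.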
Free Research Field | Text mining |