Mining Numbers in Text for Various Kinds of Text Data
Project/Area Number |
24500162
|
Research Category |
Grant-in-Aid for Scientific Research (C)
|
Allocation Type | Multi-year Fund |
Section | General |
Research Field |
Intelligent informatics
|
Research Institution | The University of Tokushima (2013-2014), The University of Tokyo (2012) |
Principal Investigator |
YOSHIDA Minoru The University of Tokushima, Institute of Technology and Science, Lecturer (40361688)
|
Project Period (FY) |
2012-04-01 – 2015-03-31
|
Project Status |
Completed (Fiscal Year 2014)
|
Budget Amount |
¥5,070,000 (Direct Cost: ¥3,900,000, Indirect Cost: ¥1,170,000)
Fiscal Year 2014: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000)
Fiscal Year 2013: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Fiscal Year 2012: ¥2,860,000 (Direct Cost: ¥2,200,000, Indirect Cost: ¥660,000)
|
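Keywords | Numerical information extraction / Layout analysis / Table analysis / Numerical expression analysis / Text mining / Numerical information / Tabular format / Numerical expressions |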
Outline of Final Research Achievements |
We studied a method for extracting the contexts (i.e., attributes or topics) of numbers written in text. Our goal was to develop a system that accepts numbers as queries and returns appropriate data from various kinds of text data, such as Wikipedia and Twitter. To achieve this goal, we proposed a method for extracting numbers and their contexts that is applicable to both unstructured text (e.g., sentences) and semi-structured text (e.g., tables). The method uses unsupervised learning algorithms based on probabilistic generative models of text to extract attributes and hierarchical topics from Web documents. We also proposed a method for extracting corpus-specific number expressions from any kind of text data. For number expressions, we found a coding scheme that can be used both for indexing and in probabilistic generative models.
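As a rough illustration of what "extracting numbers and their contexts" from unstructured text can mean, the following minimal Python sketch pairs each number in a sentence with its neighbouring words. It is not the project's method: the outline describes unsupervised probabilistic generative models, whereas this sketch substitutes a simple regular expression and a fixed word window. The names `NUMBER_RE`, `extract_numbers_with_context`, and `window` are hypothetical and introduced only for this example.

```python
import re

# Hypothetical pattern (an assumption, not from the project): integers or
# decimals, optionally written with thousands separators such as 2,800,000.
NUMBER_RE = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")


def extract_numbers_with_context(text, window=3):
    """Return (value, context_words) pairs for each number found in text.

    The context is simply the `window` whitespace-separated tokens on each
    side of the number; this stands in for the attribute/topic contexts
    that the project infers with generative models.
    """
    tokens = text.split()
    results = []
    for i, token in enumerate(tokens):
        match = NUMBER_RE.search(token)
        if match is None:
            continue
        value = float(match.group().replace(",", ""))
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        results.append((value, left + right))
    return results


if __name__ == "__main__":
    sentence = "The tower is 333 meters tall and cost 2,800,000 yen"
    for value, context in extract_numbers_with_context(sentence, window=2):
        print(value, context)
```

A number-as-query system could, under these assumptions, index such pairs by value and return the stored context words for a matching query number.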
|
Report
(4 results)
Research Products
(11 results)