Mining Numbers in Text for Various Kinds of Text Data

Research Project

Project/Area Number	24500162
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Multi-year Fund
Section	General
Research Field	Intelligent informatics
Research Institution	The University of Tokushima (2013-2014) The University of Tokyo (2012)
Principal Investigator	YOSHIDA Minoru The University of Tokushima, Institute of Technology and Science, Lecturer (40361688)
Project Period (FY)	2012-04-01 – 2015-03-31
Project Status	Completed (Fiscal Year 2014)
Budget Amount	¥5,070,000 (Direct Cost: ¥3,900,000, Indirect Cost: ¥1,170,000); Fiscal Year 2014: ¥650,000 (Direct Cost: ¥500,000, Indirect Cost: ¥150,000); Fiscal Year 2013: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000); Fiscal Year 2012: ¥2,860,000 (Direct Cost: ¥2,200,000, Indirect Cost: ¥660,000)
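Keywords	numerical information extraction / layout analysis / table structure analysis / numerical expression analysis / text mining / numerical information / tabular format / numerical expression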
Outline of Final Research Achievements	We studied methods for extracting the contexts (i.e., attributes or topics) of numbers written in text. Our goal was to develop a system that accepts numbers as queries and returns appropriate data from various kinds of text data, such as Wikipedia and Twitter. To this end, we proposed a method for extracting numbers and their contexts that applies to both unstructured text (e.g., sentences) and semi-structured text (e.g., tables). The method uses unsupervised learning algorithms based on probabilistic generative models of text to extract attributes and hierarchical topics from Web documents. We also proposed a method for extracting corpus-specific number expressions from any kind of text data. For number expressions, we found a coding scheme that can be used both for indexing and in probabilistic generative models.
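
The idea of a system that takes a number as a query and returns matching text can be illustrated with a minimal sketch. It is not the project's method: the project used unsupervised probabilistic generative models, whereas this sketch swaps in a plain regular expression and a fixed word window. The names `NUMBER_RE`, `WORD_RE`, `extract_numbers_with_context`, `build_index`, `query`, and the `window` parameter are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Assumption: numbers are integers or decimals, optionally with
# comma thousands separators (e.g. "1,234,567" or "3.14").
NUMBER_RE = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?")
# Assumption: context words are plain ASCII alphabetic tokens.
WORD_RE = re.compile(r"[A-Za-z]+")


def extract_numbers_with_context(text, window=3):
    """Return (value, context_words) for each number found in text.

    The context is up to `window` words before and after the number,
    lowercased. This is a crude stand-in for attribute/topic extraction.
    """
    results = []
    for match in NUMBER_RE.finditer(text):
        value = float(match.group().replace(",", ""))
        before = WORD_RE.findall(text[:match.start()])[-window:]
        after = WORD_RE.findall(text[match.end():])[:window]
        results.append((value, [w.lower() for w in before + after]))
    return results


def build_index(documents, window=3):
    """Map each numeric value to the (doc_id, context) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in enumerate(documents):
        for value, context in extract_numbers_with_context(text, window):
            index[value].append((doc_id, context))
    return index


def query(index, number):
    """Look up a number and return its occurrences (empty list if none)."""
    return index.get(float(number), [])


docs = [
    "The tower is 634 meters tall.",
    "Population reached 1,234,567 in 2010.",
]
idx = build_index(docs)
print(query(idx, 634))
```

An exact-match lookup like this cannot answer range or approximate queries; a number coding scheme suited to indexing, as mentioned in the outline, would be one way to address that, but its details are not given here.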

Report

(4 results)

2014 Annual Research Report, Final Research Report (PDF)
2013 Research-status Report
2012 Research-status Report

Research Products
(11 results)

Journal Article (2 results; of which Peer Reviewed: 2 results, Open Access: 1 result), Presentation (9 results)

[Journal Article] Extraction Japanese Slang from Weblog Data Based on Script Type and Stroke Count (2014)
- Author(s)
  Kazuyuki Matsumoto, Kyosuke Akita, Xielifuguli Keranmu, Minoru Yoshida and Kenji Kita
- Journal Title
  Procedia Computer Science, Volume: 35, Pages: 464-473
- DOI
  10.1016/j.procs.2014.08.127
- Related Report
  2014 Annual Research Report
- Peer Reviewed / Open Access
[Journal Article] Analysis of Long-term Market Trend by Text-Mining of News Articles (2013)
- Author(s)
  藏本貴久, 和泉潔, 吉村忍, 石田智也, 中嶋啓浩, 松井藤五郎, 吉田稔, 中川裕志
- Journal Title
  Transactions of the Japanese Society for Artificial Intelligence, Volume: 28, Issue: 3, Pages: 291-296
- DOI
  10.1527/tjsai.28.291
- NAID
  130003362333
- ISSN
  1346-0714, 1346-8030
- Related Report
  2012 Research-status Report
- Peer Reviewed
[Presentation] Reranking the Search Results for Lyric Retrieval Based on the Songwriters' Specific Usage of Words (2014)
- Author(s)
  Kazuyuki Matsumoto, Sasayama Manabu, Qingmei Xiao, Fujisawa Akira, Minoru Yoshida and Kenji Kita
- Organizer
  Proceedings of the 4th International Conference on Electronics, Communications and Networks (CECNet2014)
- Place of Presentation
  Sunworld Hotel Beijing (Beijing, China)
- Year and Date
  2014-12-14
- Related Report
  2014 Annual Research Report
[Presentation] Extracting Corpus-Specific Strings by Using Suffix Arrays Enhanced with Longest Common Prefix (2014)
- Author(s)
  Minoru Yoshida, Kazuyuki Matsumoto, Qingmei Xiao, Xielifuguli Keranmu, Kenji Kita and Hiroshi Nakagawa
- Organizer
  Proceedings of the 10th Asia Information Retrieval Society Conference (AIRS 2014), LNCS 8870
- Place of Presentation
  Grand Margherita Hotel (Kuching, Malaysia)
- Year and Date
  2014-12-05
- Related Report
  2014 Annual Research Report
[Presentation] Emotion Predicting Method Based on Emotion State Change of Personae according to the Other's Utterance (2014)
- Author(s)
  Kazuyuki Matsumoto, Fuji Ren, Qingmei Xiao, Minoru Yoshida and Kenji Kita
- Organizer
  Proceedings of the 3rd IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS2014)
- Place of Presentation
  The Hong Kong Polytechnic University (Hong Kong, China)
- Year and Date
  2014-11-29
- Related Report
  2014 Annual Research Report
[Presentation] Unsupervised Analysis of Web Page Semantic Structures by Hierarchical Bayesian Modeling
- Author(s)
  Minoru Yoshida, Kazuyuki Matsumoto, Kenji Kita and Hiroshi Nakagawa
- Organizer
  Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2014
- Place of Presentation
  Shangri-La's Far Eastern Plaza Hotel, Tainan (Tainan, Taiwan)
- Related Report
  2013 Research-status Report
[Presentation] Identifying who drew the illustration focusing on the eyes of the characters
- Author(s)
  Akira Fujisawa, Kazuyuki Matsumoto, Minoru Yoshida and Kenji Kita
- Organizer
  Proceedings of the 20th Korea-Japan Joint Workshop on Frontiers of Computer Vision
- Place of Presentation
  Okinawa National College of Technology (Okinawa Prefecture)
- Related Report
  2013 Research-status Report
[Presentation] Extraction of Region-Specific Expressions from Social Media
- Author(s)
  加藤宏紀, 荒牧英治, 宮部真衣, 吉田稔, 佐藤一誠, 中川裕志
- Organizer
  4th Symposium on Collective Intelligence
- Place of Presentation
  Tokyo
- Related Report
  2012 Research-status Report
[Presentation] Analysis of the Relationship between Product Repair Work Reports and Accompanying Numerical Data
- Author(s)
  山本忠, 吉田稔, 中川裕志, 渋谷久恵, 前田俊二
- Organizer
  15th Workshop on Information-Based Induction Sciences (IBIS2012)
- Place of Presentation
  Tokyo
- Related Report
  2012 Research-status Report
[Presentation] Mining Numerical Information in Text and Information Compilation: Insights from Participating in MuST
- Author(s)
  吉田稔, 杉浦隆博, 廣川敬真, 山田剛一, 増田英孝, 中川裕志
- Organizer
  26th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2012)
- Place of Presentation
  Yamaguchi
- Related Report
  2012 Research-status Report
[Presentation] Analysis of Long-term Market Trend by Text-Mining of News Articles
- Author(s)
  蔵本貴久, 和泉潔, 吉村忍, 石田智也, 中嶋啓浩, 松井藤五郎, 吉田稔, 中川裕志
- Organizer
  26th Annual Conference of the Japanese Society for Artificial Intelligence (JSAI 2012)
- Place of Presentation
  Yamaguchi
- Related Report
  2012 Research-status Report
