2008 Fiscal Year Annual Research Report

Research on Information Sharing Methods Using Time-Series Multiple Topic Models

Research Project

Project/Area Number	19300032
Research Institution	National Institute of Informatics
Principal Investigator	Atsuhiro Takasu, National Institute of Informatics, Digital Content and Media Sciences Research Division, Professor (90216648)
Keywords	Text processing / Topic model / Machine learning
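Research Abstract	This research aims to devise a method for building an information sharing system that lets the various kinds of information generated and collected in multi-person projects be shared and put to use. It focuses in particular on techniques for processing time-series documents that take temporal information into account. In fiscal year 2008 (Heisei 20), we first built, as a large-scale time-series document model, a model in which latent topics jointly generate both a document's timestamp and its vocabulary. Because it uses timestamp information, the model is a document generation model that accounts for time. To improve the model's accuracy, we also investigated methods for extracting proper names, which carry more information, so that they can be used in place of all the words in a document. As an application of the model, we studied methods for detecting spam blogs (splogs) in blog data. In this work, we first proposed a method for efficiently extracting relatively long substrings that occur repeatedly across blogs. Using these substrings as key features for spam blog detection, we then built a system that accurately filters splogs from large-scale data.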

Research Products
(4 results)

Journal Article: 1 result (of which Peer Reviewed: 1); Presentation: 3 results

[Journal Article] A Splog Filtering Method Based on Multiple-String Detection (2009)
- Author(s)
  Takaharu Takeda, Atsuhiro Takasu
- Journal Title
  IPSJ Transactions on Databases Vol. 2, No. 1
  Pages: 93-103
- Peer Reviewed
[Presentation] Information Organization System for Duplicated Information Sources (2009)
- Author(s)
  Takaharu Takeda, Atsuhiro Takasu
- Organizer
  IADIS International Conference on Information Systems
- Place of Presentation
  Barcelona, Spain
- Year and Date
  2009-02-25
[Presentation] Bibliographic Element Extraction from Scanned Documents Using Conditional Random Field (2008)
- Author(s)
  Manabu Ohta, Atsuhiro Takasu
- Organizer
  International Conference on Digital Information Management
- Place of Presentation
  London, UK
- Year and Date
  2008-11-13
[Presentation] CRF-based Authors' Name Tagging for Scanned Documents (2008)
- Author(s)
  Manabu Ohta, Atsuhiro Takasu
- Organizer
  ACM IEEE Joint Conference on Digital Libraries
- Place of Presentation
  Pittsburgh, USA
- Year and Date
  2008-06-18
