2009 Fiscal Year Annual Research Report

Research on Multilingual Web Text Mining for Cross-Lingual Comparative and Contrastive Analysis of Topic Characteristics

Research Project

Project/Area Number	20300032
Research Institution	University of Tsukuba
Principal Investigator	Takehito Utsuro, University of Tsukuba, Graduate School of Systems and Information Engineering, Associate Professor (90263433)
Co-Investigator (Kenkyū-buntansha)	Atsushi Fujii, Tokyo Institute of Technology, Graduate School of Information Science and Engineering, Associate Professor (30302433)
Keywords	Directory and Information Retrieval / Multilingual Processing / Text Mining / Topic Analysis / Blogs / News / Spam Blogs / Wikipedia
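Research Abstract	This research studied text mining techniques that support the process of analyzing multilingual news coverage, trends in public concerns, and the distribution of opinions, and of discovering what differences exist across countries, cultures, and languages. The information sources are documents that can be collected from the Web, such as multilingual news articles, blogs, and electronic bulletin boards. In fiscal year 2009 (Heisei 21), we carried out the following work. (1) To collect multilingual blogs on the same topic with high precision and to support the discovery of cross-cultural differences effectively, a method that indexes multilingual blogs using Wikipedia entries as a knowledge source must perform sufficiently well. We evaluated a method that uses the frequencies of related terms extracted from Wikipedia entries as a score, and a method that uses those frequencies as features for machine learning, and demonstrated that they can substantially improve on the performance of existing search engine APIs. (2) Using an interface that supports the process of discovering cross-cultural differences, we carried out a qualitative evaluation of that process over a large number of topics. We further showed, on sample topics, that topics can be broadly divided into those with large, moderate, and small differences between Japanese and English. (3) Within a framework for detecting spam blogs (splogs) by unsupervised learning, we devised a method that exploits the similarity of HTML structure across blog sites and verified its effectiveness on more than half of the 10 major blog hosts. This method makes it easier to discover splogs with similar HTML structure while keeping up with splog trends that change from year to year.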
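The related-term frequency scoring mentioned in item (1) of the abstract can be illustrated with a minimal sketch. The report does not give the actual scoring formula, so everything here is an assumption: the function names `related_term_score` and `rank_posts`, the sample related terms, and the choice of plain case-insensitive substring counting are all hypothetical and are not the authors' implementation.

```python
def related_term_score(post_text, related_terms):
    """Score a blog post by how often the related terms occur in it.

    Assumption: the score is the plain sum of case-insensitive,
    non-overlapping substring counts. The report does not specify
    the formula actually used.
    """
    text = post_text.lower()
    return sum(text.count(term.lower()) for term in related_terms)


def rank_posts(posts, related_terms):
    """Return (score, post) pairs ordered from highest to lowest score."""
    scored = [(related_term_score(post, related_terms), post) for post in posts]
    # sorted() is stable, so posts with equal scores keep their input order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical related terms, as if extracted from one Wikipedia entry.
related_terms = ["global warming", "carbon dioxide", "Kyoto Protocol"]

# Hypothetical blog posts to be indexed against that entry.
posts = [
    "My cat slept all day.",
    "Carbon dioxide levels rose again; carbon dioxide is a greenhouse gas.",
    "Global warming and carbon dioxide: notes on the Kyoto Protocol.",
]

for score, post in rank_posts(posts, related_terms):
    print(score, post)
```

The same per-term counts could instead be passed as a feature vector to a classifier, which corresponds to the second method the abstract mentions; that variant is not sketched here.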

Research Products
(5 results)

Journal Article: 1 result (of which Peer Reviewed: 1) / Presentation: 4 results

[Journal Article] Estimating Topic Correspondence between the Wikipedia Concept Hierarchy and the Blogosphere (2009)
- Author(s)
  Mariko Kawaba, Hiroyuki Nakasaki, Daisuke Yokomoto, Takehito Utsuro, Tomohiro Fukuhara
- Journal Title
  日本データベース学会論文誌 (journal of the Database Society of Japan) 8
  Pages: 17-22
- Peer Reviewed
[Presentation] Cross-Lingual Analysis of Concerns and Reports on Crimes in Blogs (2009)
- Author(s)
  Hiroyuki Nakasaki, Yusuke Abe, Takehito Utsuro, Yasuhide Kawada, Tomohiro Fukuhara, Noriko Kando, Masaharu Yoshioka, Hiroshi Nakagawa, Yoji Kiyota
- Organizer
  Workshop on Mining User-Generated Content for Security
- Place of Presentation
  Venice, Italy
- Year and Date
  2009-12-09
[Presentation] Towards Conceptual Indexing of the Blogosphere through Wikipedia Topic Hierarchy (2009)
- Author(s)
  Mariko Kawaba, Daisuke Yokomoto, Hiroyuki Nakasaki, Takehito Utsuro, Tomohiro Fukuhara
- Organizer
  23rd Pacific Asia Conference on Language, Information and Computation
- Place of Presentation
  Hong Kong, People's Republic of China
- Year and Date
  2009-12-05
[Presentation] Visualizing Cross-Lingual/Cross-Cultural Differences in Concerns in Multilingual Blogs (2009)
- Author(s)
  Hiroyuki Nakasaki, Mariko Kawaba, Sayuri Yamazaki, Takehito Utsuro, Tomohiro Fukuhara
- Organizer
  3rd International AAAI Conference on Weblogs and Social Media
- Place of Presentation
  San Jose, USA
- Year and Date
  2009-05-19
[Presentation] An Empirical Study on Selective Sampling in Active Learning for Splog Detection (2009)
- Author(s)
  Taichi Katayama, Yuuki Sato, Takehito Utsuro, Takayuki Yoshinaka, Yasuhide Kawada, Tomohiro Fukuhara
- Organizer
  5th International Workshop on Adversarial Information Retrieval on the Web
- Place of Presentation
  Madrid, Spain
- Year and Date
  2009-04-21
