Research on Word-Document Clustering Based on a Co-evolutionary Mechanism

Research Project

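Project/Area Number	13680473
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Intelligent informatics
Research Institution	National Institute of Informatics
Principal Investigator	Akiko Aizawa (相澤彰子), National Institute of Informatics, Research Division of Information Infrastructure, Associate Professor (90222447)
Project Period (FY)	2001 – 2002
Project Status	Completed (FY2002)
Budget Amount	¥4,100 thousand (Direct Cost: ¥4,100 thousand); FY2002: ¥2,200 thousand (Direct Cost: ¥2,200 thousand); FY2001: ¥1,900 thousand (Direct Cost: ¥1,900 thousand)
Keywords	information retrieval / dual clustering / automatic text categorization / probability-weighted amount of information / micro-clustering / co-evolutionary algorithm / evolutionary computation / conference presentation database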
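Outline of Research	This project proposed an information retrieval framework called "cluster-oriented indexing" and evaluated it empirically on several representative document collections. The method clusters terms and documents simultaneously, using as its criterion the "probability-weighted amount of information" previously proposed by the principal investigator. Its distinguishing feature is that it equates the grouping of related documents and terms, obtained by mining, with the indexing operation in information retrieval, with the goal of building and using retrieval resources automatically. To scale to document collections of realistic size, the method applies local optimization to randomly generated initial clusters, so it can be viewed as carrying the "co-evolutionary" approach from genetic algorithms over to information retrieval. The method was tested on collections of tens of thousands to hundreds of thousands of documents: conference paper abstracts extracted from NTCIR-1, the CD-ROM editions of the Mainichi Shimbun and Nihon Keizai Shimbun (Nikkei), and English-language news articles from sources such as Reuters and the Financial Times. An evaluation framed as a text categorization task showed that the method's classification recall, although slightly lower, was roughly on par with that of support vector machines, a machine learning method known for strong performance, and that the method can group documents lying on category boundaries, which conventional automatic classification has had difficulty handling.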
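
The clustering criterion and search strategy in the outline above can be illustrated with a small sketch. It assumes that the probability-weighted amount of information (PWI) of a block made of a term set T' and a document set D' is P(T', D') log[P(T', D') / (P(T') P(D'))], with probabilities estimated from raw term frequencies; summed over all single (t, d) pairs, these terms give the mutual information between terms and documents. The hill-climbing step that toggles one term or one document at a time is a stand-in for the project's local optimization, not the published algorithm, and `PwiModel`, `grow`, `random_clusters`, and the toy counts are all hypothetical.

```python
import math
import random
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical input format: raw frequency f(term, doc) for each observed pair.
Counts = Dict[Tuple[str, str], int]
Block = Tuple[FrozenSet[str], FrozenSet[str]]


class PwiModel:
    """Joint and marginal probabilities estimated from term-document counts."""

    def __init__(self, counts: Counts) -> None:
        total = sum(counts.values())
        self.p_td: Dict[Tuple[str, str], float] = {k: v / total for k, v in counts.items()}
        self.p_t: Dict[str, float] = {}
        self.p_d: Dict[str, float] = {}
        for (t, d), p in self.p_td.items():
            self.p_t[t] = self.p_t.get(t, 0.0) + p
            self.p_d[d] = self.p_d.get(d, 0.0) + p

    def block_pwi(self, terms: FrozenSet[str], docs: FrozenSet[str]) -> float:
        """Assumed PWI of a block, treating the term set and the document set as
        single merged events: P(T', D') * log(P(T', D') / (P(T') * P(D'))).
        For a single (t, d) pair this is P(t, d) * log(P(t, d) / (P(t) * P(d)))."""
        joint = sum(p for (t, d), p in self.p_td.items() if t in terms and d in docs)
        if joint == 0.0:
            return 0.0
        p_terms = sum(self.p_t.get(t, 0.0) for t in terms)
        p_docs = sum(self.p_d.get(d, 0.0) for d in docs)
        return joint * math.log(joint / (p_terms * p_docs))

    def grow(self, seed_doc: str) -> Block:
        """Hill-climb from one document and the terms it contains, toggling a
        single term or document per step while the block PWI strictly increases."""
        terms = frozenset(t for (t, d) in self.p_td if d == seed_doc)
        docs = frozenset([seed_doc])
        score = self.block_pwi(terms, docs)
        while True:
            best_score, best_terms, best_docs = score, terms, docs
            for t in sorted(self.p_t):
                cand = terms ^ {t}  # add t if absent, drop it if present
                if cand:
                    s = self.block_pwi(cand, docs)
                    if s > best_score + 1e-12:
                        best_score, best_terms, best_docs = s, cand, docs
            for d in sorted(self.p_d):
                cand = docs ^ {d}  # add d if absent, drop it if present
                if cand:
                    s = self.block_pwi(terms, cand)
                    if s > best_score + 1e-12:
                        best_score, best_terms, best_docs = s, terms, cand
            if best_score <= score:
                return terms, docs  # local optimum: no single toggle helps
            score, terms, docs = best_score, best_terms, best_docs


def random_clusters(model: PwiModel, n: int, rng: random.Random) -> List[Block]:
    """Draw n seed documents at random (the random initial clusters) and
    refine each one into a locally optimal block."""
    docs = sorted(model.p_d)
    return [model.grow(rng.choice(docs)) for _ in range(n)]


if __name__ == "__main__":
    # Hypothetical toy collection with two obvious topics.
    counts = {
        ("gene", "d1"): 3, ("protein", "d1"): 2,
        ("gene", "d2"): 2, ("protein", "d2"): 3,
        ("stock", "d3"): 3, ("market", "d3"): 2,
        ("stock", "d4"): 2, ("market", "d4"): 3,
    }
    model = PwiModel(counts)
    terms, docs = model.grow("d1")
    print(sorted(terms), sorted(docs))  # ['gene', 'protein'] ['d1', 'd2']
    for t_set, d_set in random_clusters(model, 3, random.Random(0)):
        print(sorted(t_set), sorted(d_set))
```

Starting from d1, the search adds d2 and then stops, because every further toggle lowers the score. The whole collection taken as one block scores exactly zero (all probabilities are 1), which is why this kind of criterion favors small, tightly associated term-document blocks over large, diffuse ones.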

Reports (3 items)

2002: Annual Research Report; Final Research Report Summary
2001: Annual Research Report

Research Outcomes (27 items)

[Bibliography] Akiko Aizawa: "An Information-Theoretic Perspective of Tf-idf Measures", Information Processing & Management, 39, 45-65 (2003)
- Description
  From the Final Research Report Summary (Japanese)
- Related report
  2002 Final Research Report Summary
[Bibliography] Akiko Aizawa: "A Method of Cluster-Based Indexing of Textual Data", Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), 1-7 (2002)
- Description
  From the Final Research Report Summary (Japanese)
- Related report
  2002 Final Research Report Summary
[Bibliography] Akiko Aizawa: "A Co-evolutionary Framework for Clustering in Information Retrieval Systems", Proc. of the IEEE 2002 Congress on Evolutionary Computation, 1787-1792 (2002)
- Description
  From the Final Research Report Summary (Japanese)
- Related report
  2002 Final Research Report Summary
[Bibliography] Akiko Aizawa: "An Approach to Microscopic Clustering of Terms and Documents", in M. Ishizuka, A. Sattar (Eds.), PRICAI 2002: Trends in Artificial Intelligence, LNAI 2417, Springer, 404-413 (2002)
- Description
  From the Final Research Report Summary (Japanese)
- Related report
  2002 Final Research Report Summary
[Bibliography] Akiko Aizawa: "Linguistic Techniques to Improve the Performance of Automatic Text Categorization", Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium (NLPRS2001), 307-314 (2001)
- Description
  From the Final Research Report Summary (Japanese)
- Related report
  2002 Final Research Report Summary
[Bibliography] Akiko Aizawa, Kyo Kageura: "Calculating Association between Technical Terms Based on Co-occurrences in Keyword Lists of Academic Papers", Systems and Computers in Japan, 34(3), 85-95 (2002)
- Description
  From the Final Research Report Summary (Japanese)
- Related report
  2002 Final Research Report Summary
[Bibliography] Akiko Aizawa: "Designed Sampling with Crossover Operators", in A. Ghosh and S. Tsutsui (Eds.), Advances in Evolutionary Computing, Springer, 413-439 (2003)
- Description
  From the Final Research Report Summary (Japanese)
- Related report
  2002 Final Research Report Summary
[Bibliography] Akiko Aizawa: "An Information-Theoretic Perspective of Tf-idf Measures", Information Processing & Management, 39, 45-65 (2003)
- Description
  From the Final Research Report Summary (English)
- Related report
  2002 Final Research Report Summary
[Bibliography] Akiko Aizawa: "A Method of Cluster-Based Indexing of Textual Data", Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), 1-7 (2002)
- Description
  From the Final Research Report Summary (English)
- Related report
  2002 Final Research Report Summary
[Bibliography] Akiko Aizawa: "A Co-evolutionary Framework for Clustering in Information Retrieval Systems", Proc. of the IEEE 2002 Congress on Evolutionary Computation, 1787-1792 (2002)
- Description
  From the Final Research Report Summary (English)
- Related report
  2002 Final Research Report Summary
[Bibliography] Akiko Aizawa: "Linguistic Techniques to Improve the Performance of Automatic Text Categorization", Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium (NLPRS2001), 307-314 (2001)
- Description
  From the Final Research Report Summary (English)
- Related report
  2002 Final Research Report Summary
[Bibliography] Akiko Aizawa, Kyo Kageura: "Calculating Association between Technical Terms Based on Co-occurrences in Keyword Lists of Academic Papers", Systems and Computers in Japan, 34(3), 85-95 (2002)
- Description
  From the Final Research Report Summary (English)
- Related report
  2002 Final Research Report Summary
[Bibliography] Akiko Aizawa: "An Approach to Microscopic Clustering of Terms and Documents", in M. Ishizuka, A. Sattar (Eds.), PRICAI 2002: Trends in Artificial Intelligence, LNAI 2417, Springer, 404-413 (2002)
- Description
  From the Final Research Report Summary (English)
- Related report
  2002 Final Research Report Summary
[Bibliography] Akiko Aizawa: "Designed Sampling with Crossover Operators", in A. Ghosh and S. Tsutsui (Eds.), Advances in Evolutionary Computing, Springer (2003)
- Description
  From the Final Research Report Summary (English)
- Related report
  2002 Final Research Report Summary
[Bibliography] Akiko Aizawa: "An Information-Theoretic Perspective of Tf-idf Measures", Information Processing & Management, 39, 45-65 (2003)
- Related report
  2002 Annual Research Report
[Bibliography] Akiko Aizawa: "A Method of Cluster-Based Indexing of Textual Data", Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002), 1-7 (2002)
- Related report
  2002 Annual Research Report
[Bibliography] Akiko Aizawa: "A Co-evolutionary Framework for Clustering in Information Retrieval Systems", Proc. of the IEEE 2002 Congress on Evolutionary Computation, 1787-1792 (2002)
- Related report
  2002 Annual Research Report
[Bibliography] Akiko Aizawa: "An Approach to Microscopic Clustering of Terms and Documents", in M. Ishizuka, A. Sattar (Eds.), PRICAI 2002: Trends in Artificial Intelligence, LNAI 2417, Springer, 404-413 (2002)
- Related report
  2002 Annual Research Report
[Bibliography] Akiko Aizawa: "A Study of Micro-Clustering of Text Documents" (in Japanese: "テキスト文書のマイクロクラスタリングに関する検討"), IPSJ SIG on Natural Language Processing, NL-150, 111-117 (2002)
- Related report
  2002 Annual Research Report
[Bibliography] Akiko Aizawa, Kyo Kageura: "Calculating Association between Technical Terms Based on Co-occurrences in Keyword Lists of Academic Papers", Systems and Computers in Japan, 34(3), 85-95 (2002)
- Related report
  2002 Annual Research Report
[Bibliography] Akiko Aizawa: "Designed Sampling with Crossover Operators", in A. Ghosh and S. Tsutsui (Eds.), Advances in Evolutionary Computing, Springer, 413-439 (2003)
- Related report
  2002 Annual Research Report
[Bibliography] Akiko Aizawa: "An Approach to Text Categorization Problems Using Naive Methods" (in Japanese: "Naive手法によるテキスト分類問題へのアプローチ"), Proceedings of the 2001 Workshop on Information-Based Induction Sciences (IBIS 2001), 123-128 (2001)
- Related report
  2001 Annual Research Report
[Bibliography] Akiko Aizawa: "Linguistic Techniques to Improve the Performance of Automatic Text Categorization", Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium (NLPRS2001), 307-314 (2001)
- Related report
  2001 Annual Research Report
[Bibliography] Akiko Aizawa: "An Approach to Large-Scale Text Categorization Problems Using Naive Methods" (in Japanese: "Naive手法による大規模テキスト分類問題へのアプローチ"), IPSJ SIG Technical Report on Natural Language Processing, 147-7, 41-46 (2002)
- Related report
  2001 Annual Research Report
[Bibliography] Akiko Aizawa: "An Attempt at Dual Clustering in Information Space" (in Japanese: "情報空間における双対的クラスタリングの試み"), JSAI SIG on Fundamental Problems in Artificial Intelligence (48th meeting), SIG-FAI-A104, 85-90 (2002)
- Related report
  2001 Annual Research Report
[Bibliography] Akiko Aizawa: "An Information-Theoretic Perspective of Tf-idf Measures", Information Processing & Management (accepted)
- Related report
  2001 Annual Research Report
[Bibliography] Akiko Aizawa: "A Co-evolutionary Framework for Clustering in Information Retrieval Systems", IEEE 2002 Congress on Evolutionary Computation (accepted)
- Related report
  2001 Annual Research Report
