Research on Fast Retrieval and Narrowing Techniques for Example Sentences in Large-Scale Document Databases

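Research Project

Project/Area Number	07680432
Research Category	Grant-in-Aid for Scientific Research (C)
Allocation Type	Single-year Grants
Section	General
Research Field	Information systems (including information and library science)
Research Institution	Tokushima University
Principal Investigator	Jun-ichi Aoe (青江順一), Tokushima University, Faculty of Engineering, Professor (90108853)
Project Period (FY)	1995 – 1997
Project Status	Completed (FY 1997)
Budget Amount *note	Total: 2,500 thousand yen (direct costs: 2,500 thousand yen); FY 1997: 800 thousand yen (direct costs: 800 thousand yen); FY 1996: 900 thousand yen (direct costs: 900 thousand yen); FY 1995: 800 thousand yen (direct costs: 800 thousand yen)
Keywords	example sentence retrieval / document database / narrowing search / document retrieval / information retrieval / document processing / keyword extraction / document management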
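Research Abstract	Searching a large-scale document database poses two requirements. First, when several varied search conditions on the keys are given at once (for example, noun + の + noun, place concept + を, or を + 飲む "to drink"), the target example sentences must be narrowed down quickly. Second, retrieval must stay equally fast even when the sentence-number list for a key becomes very long. This study therefore established the following points.
1. The sentence numbers of all examples are organized as a sentence-number vector, and the sentence-number list for each index entry is represented as a bit vector in which the bit at the position corresponding to each sentence number is set to 1.
2. Narrowing by comparing sentence numbers can be carried out as an efficient logical AND of sentence-number vectors. Because these vectors are very long and sparse, a multi-level compressed data structure was proposed together with a corresponding retrieval method.
3. Experiments confirmed that, for a document database of 100 million examples, retrieval and narrowing complete within a few seconds even when sentence-number lists of several million or more are present.
Previous studies compared the sentence-number lists for the keys sequentially, so narrowing time grew in proportion to the number of search conditions. In this study, by contrast, narrowing becomes faster as the number of search conditions increases, which yields a distinctive and original retrieval method that overturns this drawback of conventional methods. The proposed method also removes the need to transfer large sentence-number lists from secondary storage into main memory, so disk access is sped up at the same time; retrieval that previously took several minutes now takes a few seconds.
These results can be applied widely to fast retrieval and narrowing over large collections of electronic documents, for example patent documents, dictionary management, and internal corporate documents, so the social significance of the research is considered very large.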
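The bit-vector narrowing described in points 1 and 2 of the abstract can be illustrated with a minimal sketch. The abstract does not specify the actual multi-level compressed structure, so everything here is an assumption made for illustration: the class name `SparseBitVector`, the helper `narrow`, the use of only two levels, and the 64-bit block width. The upper level is a summary bitmap that marks which blocks contain any set bit; the lower level stores only the non-zero blocks. The logical AND first intersects the summaries and then touches only the blocks that are non-zero in both operands.

```python
BLOCK_BITS = 64  # assumed block width; the abstract does not state one


class SparseBitVector:
    """Two-level sentence-number vector (illustrative, not the authors' structure)."""

    def __init__(self):
        self.summary = 0   # upper level: bit b is 1 iff block b is non-zero
        self.blocks = {}   # lower level: block index -> non-zero 64-bit word

    @classmethod
    def from_sentence_numbers(cls, numbers):
        """Set the bit at the position of each sentence number."""
        vec = cls()
        for n in numbers:
            block, offset = divmod(n, BLOCK_BITS)
            vec.blocks[block] = vec.blocks.get(block, 0) | (1 << offset)
            vec.summary |= 1 << block
        return vec

    def intersect(self, other):
        """Logical AND; all-zero blocks are skipped via the summary level."""
        result = SparseBitVector()
        common = self.summary & other.summary
        while common:
            low = common & -common            # lowest set bit of the summary
            block = low.bit_length() - 1
            word = self.blocks[block] & other.blocks[block]
            if word:                          # keep only blocks that stay non-zero
                result.blocks[block] = word
                result.summary |= low
            common ^= low
        return result

    def sentence_numbers(self):
        """Decode the vector back into a sorted list of sentence numbers."""
        numbers = []
        for block in sorted(self.blocks):
            word = self.blocks[block]
            while word:
                low = word & -word
                numbers.append(block * BLOCK_BITS + low.bit_length() - 1)
                word ^= low
        return numbers


def narrow(vectors):
    """AND together the vectors of all search conditions."""
    result = vectors[0]
    for vec in vectors[1:]:
        result = result.intersect(vec)
    return result


# Hypothetical sentence-number lists for three search conditions.
cond_a = SparseBitVector.from_sentence_numbers([3, 70, 1000, 5_000_000])
cond_b = SparseBitVector.from_sentence_numbers([70, 1000, 4_999_999, 5_000_000])
cond_c = SparseBitVector.from_sentence_numbers([9, 1000, 5_000_000])
hits = narrow([cond_a, cond_b, cond_c]).sentence_numbers()
print(hits)
```

In this sketch each AND can only clear summary bits, so the number of blocks visited never grows as conditions are added. This is one plausible reading of why narrowing speeds up with more search conditions, not a reconstruction of the method evaluated in the study.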

Reports

(4 results)

Research Products
(27 results)

All Publications (27 results)

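[Publications] M.Shishibori: "Design of a Compact Data Structure for the Patricia Trie" IEICE Transactions on Information and Systems. (in press). (1998)
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  1997 Final Research Report Summary
[Publications] H.Mochizuki: "A Substring Search Algorithm in Extendible Hashing" International Journal of Information Sciences. (in press). (1998)
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  1997 Final Research Report Summary
[Publications] 有田健: "特徴ベクトルによる全文検索の一改善法" [An improved full-text retrieval method using feature vectors] Transactions of the Information Processing Society of Japan. (in press). (1998)
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  1997 Final Research Report Summary
[Publications] 小山雅史: "格構造解析における概念階層の効率的判定アルゴリズム" [An efficient algorithm for determining concept hierarchies in case structure analysis] Transactions of the Information Processing Society of Japan. Vol.39, No.3 (in press). (1998)
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  1997 Final Research Report Summary
[Publications] M.Fuketa: "An Efficient Algorithm for Retrieving Example Sentences" International Journal of Information Sciences. (in press). (1998)
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  1997 Final Research Report Summary
[Publications] 泓田正雄: "大規模文書データに対する用例文の効率的検索アルゴリズム" [An efficient retrieval algorithm for example sentences in large-scale document data] Transactions of the Information Processing Society of Japan. Vol.38, No.10. 2004-2013 (1997)
- Description
  From the "Final Research Report Summary (Japanese)"
- Related Report
  1997 Final Research Report Summary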
[Publications] J.Aoe, K.Morimoto, M.Shishibori and K-H.Park: "A Trie Compaction Algorithm for a Large Set of Keys" IEEE Transactions on Knowledge and Data Engineering. Vol.8, No.3. 476-491 (1996)
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  1997 Final Research Report Summary
[Publications] M.Shishibori and J.Aoe: "Fast Allocation of Diagrams without Backtracking Processes" International Journal of Information Sciences. Vol.92, No.1-4. 65-85 (1996)
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  1997 Final Research Report Summary
[Publications] T.Arita, M.Shishibori and J.Aoe: "An Efficient Algorithm for Full Text Retrieval for Multiple Keywords" International Journal of Information Sciences. Vol.104. 345-362 (1998)
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  1997 Final Research Report Summary
[Publications] Masao Fuketa and Jun-ichi Aoe: "A Fast Algorithm of Retrieving Common Sentences" International Journal of Information Sciences. (in press). (1998)
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  1997 Final Research Report Summary
[Publications] Masao Fuketa, Shoji Mizobuchi, Masami Shishibori and Jun-ichi Aoe: "An Efficient Algorithm for Retrieving Example Sentences" International Journal of Computer Mathematics. Vol.66, No.3-4 (in press). (1998)
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  1997 Final Research Report Summary
[Publications] Masao Fuketa, Shoji Mizobuchi, and Jun-ichi Aoe: "A Fast Method of Determining Weighted Indexes from Text Databases" An International Journal of Information Processing and Management. (in press). (1998)
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  1997 Final Research Report Summary
[Publications] H.Mochizuki, M.Koyama, M.Shishibori and J.Aoe: "A Substring Search Algorithm in Extendible Hashing" International Journal of Information Sciences. (in press). (1998)
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  1997 Final Research Report Summary
[Publications] M.Shishibori, M.Okada, T.Sumitomo and J.Aoe: "Design of a Compact Data Structure for the Patricia Trie" IEICE Transactions on Information and Systems. (in press). (1998)
- Description
  From the "Final Research Report Summary (English)"
- Related Report
  1997 Final Research Report Summary
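[Publications] M.Shishibori: "Design of a Compact Data Structure for the Patricia Trie" IEICE Transactions on Information and Systems. (in press). (1998)
- Related Report
  1997 Annual Research Report
[Publications] H.Mochizuki: "A Substring Search Algorithm in Extendible Hashing" International Journal of Information Sciences. (in press). (1998)
- Related Report
  1997 Annual Research Report
[Publications] 有田健: "特徴ベクトルによる全文検索の一改善法" [An improved full-text retrieval method using feature vectors] Transactions of the Information Processing Society of Japan. (in press). (1998)
- Related Report
  1997 Annual Research Report
[Publications] 小山雅史: "格構造解析における概念階層の効率的判定アルゴリズム" [An efficient algorithm for determining concept hierarchies in case structure analysis] Transactions of the Information Processing Society of Japan. Vol.39, No.3 (in press). (1998)
- Related Report
  1997 Annual Research Report
[Publications] M.Fuketa: "An Efficient Algorithm for Retrieving Example Sentences" International Journal of Information Sciences. (in press). (1998)
- Related Report
  1997 Annual Research Report
[Publications] 泓田正雄: "大規模文書データに対する用例文の効率的検索アルゴリズム" [An efficient retrieval algorithm for example sentences in large-scale document data] Transactions of the Information Processing Society of Japan. Vol.38, No.10. 2004-2013 (1997)
- Related Report
  1997 Annual Research Report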
[Publications] J.Aoe: "A Trie Compaction Algorithm for a Large Set of Keys" IEEE Transactions on Knowledge and Data Engineering. (1996)
- Related Report
  1996 Annual Research Report
[Publications] H.Iriguchi: "A Fast Retrieval Technique for Large Graph Structures" International Journal of Computer Mathematics. (1996)
- Related Report
  1996 Annual Research Report
[Publications] M.Shishibori: "An Order Searching Algorithm of Extendible Hashing" International Journal of Computer Mathematics. (1996)
- Related Report
  1996 Annual Research Report
[Publications] J.Aoe: "A Trie Compaction Algorithm for a Large Set of Keys" IEEE Transactions on Knowledge and Data Engineering. (to appear). (1996)
- Related Report
  1995 Annual Research Report
[Publications] H.Iriguchi: "A Fast Retrieval Technique for Large Graph Structures" International Journal of Computer Mathematics. (to appear). (1996)
- Related Report
  1995 Annual Research Report
[Publications] M.Shishibori: "An Order Searching Algorithm of Extensible Hashing" International Journal of Computer Mathematics. (to appear). (1996)
- Related Report
  1995 Annual Research Report
[Publications] K-H.Park: "An Automatic Selection Method of Key Search Algorithms" IEICE Transactions on Information and Systems. E78-D. 383-393 (1995)
- Related Report
  1995 Annual Research Report
