
2011 Fiscal Year Final Research Report

The Development of a Search Engine for Academic Papers in Web

Research Project

Project/Area Number 21300095
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Single-year Grants
Section General
Research Field Library and information science/Humanistic social informatics
Research Institution Keio University

Principal Investigator

UEDA Shuichi  Keio University, Faculty of Letters, Professor (50134218)

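Co-Investigator(Kenkyū-buntansha) AGATA Teru  Asia University, Faculty of International Relations, Associate Professor (80306505)
IKEUCHI Atsushi  University of Tsukuba, Graduate School of Library, Information and Media Studies, Associate Professor (80338607)
Co-Investigator(Renkei-kenkyūsha) ISHIDA Emi  Kyushu University, University Library, Associate Professor (50364815)
NOZUE Michiko  Railway Technical Research Institute, Other Departments, Researcher (40426044)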
Project Period (FY) 2009 – 2011
Keywords academic papers / search engine / web structure / information retrieval / automatic classification / machine learning
Research Abstract

Open access scientific papers available on the Web can be searched through several search engines. Google Scholar, for example, has high coverage of the literature, but it does not necessarily guarantee free access to the full text. We have developed and evaluated "Aletheia", a search engine for full-text academic papers. The system obtains PDF files on a broad range of topics and automatically detects academic papers using classifiers based on text and structure features. We built a PDF collection containing 3 million Japanese PDF files. To detect academic papers automatically, five types of Weka classifiers (AdaBoost, Decision Tree (C4.5), Naive Bayes, Random Forest, and Support Vector Machine) were trained separately on a 20,000-item test collection using 10-fold cross-validation. The features were generated using hand-built rules and consisted of three types: structure, URL, and content.
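
The classification setup described in the abstract can be illustrated with a minimal Python sketch. It is not the project's Weka pipeline: the feature names, rules, and thresholds below are hypothetical (the report does not list its actual rules), and a small Bernoulli Naive Bayes written with the standard library stands in for the five Weka classifiers. What it shows is the general shape of the approach: hand-built rules produce structure, URL, and content features, and a classifier is scored with 10-fold cross-validation.

```python
import math
import random
import re
from collections import defaultdict


def extract_features(url, text, page_count):
    """Map one PDF to binary features (hypothetical hand-built rules)."""
    lower = text.lower()
    return {
        # URL feature: file is hosted under an academic domain
        "url_academic_domain": bool(re.search(r"\.ac\.jp(/|$)", url)),
        # Structure features: reference list present, plausible page count
        "struct_has_references": "references" in lower or "参考文献" in text,
        "struct_page_range": 4 <= page_count <= 40,
        # Content feature: abstract heading present
        "content_has_abstract": "abstract" in lower or "抄録" in text,
    }


def train_naive_bayes(rows, labels):
    """Count class and per-feature frequencies for Bernoulli Naive Bayes."""
    class_counts = defaultdict(int)
    true_counts = defaultdict(lambda: defaultdict(int))
    for row, label in zip(rows, labels):
        class_counts[label] += 1
        for name, value in row.items():
            if value:
                true_counts[label][name] += 1
    names = sorted({name for row in rows for name in row})
    return class_counts, true_counts, names, len(labels)


def predict(model, row):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    class_counts, true_counts, names, total = model
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)
        for name in names:
            p_true = (true_counts[label][name] + 1) / (count + 2)
            score += math.log(p_true if row.get(name) else 1 - p_true)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cross_validate(rows, labels, k=10, seed=0):
    """Mean accuracy over k folds; each item is held out exactly once.

    Assumes len(rows) >= k so that no fold is empty.
    """
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in order if i not in held]
        model = train_naive_bayes(
            [rows[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        correct = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k


if __name__ == "__main__":
    # Toy, made-up documents purely for demonstration.
    docs = [("http://repo.example.ac.jp/a.pdf", "Abstract ... References", 12),
            ("http://shop.example.com/menu.pdf", "Lunch menu", 1)] * 10
    rows = [extract_features(*doc) for doc in docs]
    labels = ["paper", "other"] * 10
    print("mean accuracy:", cross_validate(rows, labels, k=10))
```

With a real labelled collection, `rows` would come from running the feature rules over each PDF and `labels` from manual judgments; swapping in another classifier only requires replacing `train_naive_bayes` and `predict`.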

  • Research Products

    (11 results)


Journal Article (1 result, of which Peer Reviewed: 1) / Presentation (10 results)

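  • [Journal Article] The Actual State of the Deep Web and Its Causes: A Survey Using Documents Registered in Institutional Repositories [深層ウェブの実態とその要因:機関リポジトリに登録された文献を用いた調査] (2012)

    • Author(s)
      Yosuke Miyata, Teru Agata, Atsushi Ikeuchi, Emi Ishita, Shuichi Ueda
    • Journal Title

      Journal of Japan Society of Library and Information Science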

      Volume: Vol.58, No.2

    • Peer Reviewed
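  • [Presentation] Components and Structure of Academic Papers [学術論文の構成要素と構造] (2012)

    • Author(s)
      Yosuke Miyata, Emi Ishita, Atsushi Ikeuchi, Teru Agata, Shuichi Ueda
    • Organizer
      2012 Spring Research Meeting of the Japan Society of Library and Information Science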
    • URL

      http://web.keio.jp/~uedas/papers/webir121.pdf

    • Place of Presentation
      Mie University
    • Year and Date
      2012-05-12
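  • [Presentation] Construction and Evaluation of a Search Engine Specialized for Academic Papers [学術論文に特化した検索エンジンの構築と評価] (2012)

    • Author(s)
      Emi Ishita, Teru Agata, Yosuke Miyata, Atsushi Ikeuchi, Shuichi Ueda
    • Organizer
      2012 Spring Research Meeting of the Japan Society of Library and Information Science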
    • URL

      http://web.keio.jp/~uedas/papers/webir122.pdf

    • Place of Presentation
      Mie University
    • Year and Date
      2012-05-12
  • [Presentation] Detecting Academic Papers on the Web (2011)

    • Author(s)
      Emi Ishita, Teru Agata, Atsushi Ikeuchi, Yosuke Miyata, Shuichi Ueda
    • Organizer
      JCDL11
    • URL

      http://web.keio.jp/~uedas/papers/webir112.pdf

    • Place of Presentation
      Ontario, Canada
    • Year and Date
      2011-06-13 to 2011-06-17
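  • [Presentation] Automatic Detection of Academic Papers in a Large-Scale Collection of Japanese PDF Files [大規模日本語PDFファイル集合からの学術論文の自動判定] (2011)

    • Author(s)
      Emi Ishita, Teru Agata, Yosuke Miyata, Atsushi Ikeuchi, Shuichi Ueda
    • Organizer
      2011 Spring Research Meeting of the Japan Society of Library and Information Science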
    • URL

      http://web.keio.jp/~uedas/papers/webir111.pdf

    • Place of Presentation
      Tokyo Gakugei University
    • Year and Date
      2011-05-14
  • [Presentation] The Deep Web in Institutional Repositories in Japan (2010)

    • Author(s)
      Teru Agata, Yosuke Miyata, Atsushi Ikeuchi, Shuichi Ueda
    • Organizer
      ASIST 2010
    • Place of Presentation
      Pittsburgh, Pennsylvania, USA
    • Year and Date
      2010-10-22 to 2010-10-27
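  • [Presentation] Development of a Search Engine Specialized for Academic Information: Automatic Detection of English Papers Using Machine Learning [学術情報に特化した検索エンジンの開発:機械学習による英語論文の自動判定] (2010)

    • Author(s)
      Teru Agata, Atsushi Ikeuchi, Emi Ishita, Yosuke Miyata, Shuichi Ueda
    • Organizer
      2009 Annual Conference of the Japan Society of Library and Information Science, Proceedings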
    • Place of Presentation
      Fuji Women's University
    • Year and Date
      2010-10-09 to 2010-10-10
  • [Presentation] A Search Engine for Japanese Academic Papers (2010)

    • Author(s)
      Emi Ishita, Teru Agata, Atsushi Ikeuchi, Michiko Nozue, Yosuke Miyata, Shuichi Ueda
    • Organizer
      JCDL 2010
    • Place of Presentation
      Gold Coast, Queensland, Australia
    • Year and Date
      2010-06-21 to 2010-06-25
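  • [Presentation] Automatic Detection of Academic Paper PDFs: The Effect of the Training Set on Detection Performance [学術論文PDFの自動判定:学習用集合が判定性能に与える影響] (2010)

    • Author(s)
      Yosuke Miyata, Teru Agata, Atsushi Ikeuchi, Emi Ishita, Shuichi Ueda
    • Organizer
      2010 Spring Research Meeting of the Japan Society of Library and Information Science
    • Place of Presentation
      Doshisha University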
    • Year and Date
      2010-05-29
  • [Presentation] Analyzing OPAC Use with Screen Views and Eye Tracking (2009)

    • Author(s)
      Ishita, Emi ; Mine, Shinji ; Koizumi, Masanori ; Miyata, Yosuke ; Kunimoto, Chihiro ; Shiozaki, Junko ; Kurata, Keiko ; Ueda, Shuichi
    • Organizer
      ACM/IEEE Joint Conference on Digital Libraries: Designing Tomorrow, Preserving the Past-Today (JCDL09)
    • Place of Presentation
      University of Texas
    • Year and Date
      2009-06-15 to 2009-06-19
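  • [Presentation] The Actual State of the Deep Web in Scholarly Communication: A Survey Using Documents Registered in Institutional Repositories [学術情報流通における深層ウェブの実態-機関リポジトリに登録された文献を用いた調査] (2009)

    • Author(s)
      Teru Agata, Yosuke Miyata, Atsushi Ikeuchi, Shuichi Ueda
    • Organizer
      2009 Annual Conference of the Mita Society for Library and Information Science, Proceedings
    • Place of Presentation
      Keio University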
    • Year and Date
      2009-09-26


Published: 2013-07-31  
