The Development of a Search Engine for Academic Papers in Web
Project/Area Number |
21300095
|
Research Category |
Grant-in-Aid for Scientific Research (B)
|
Allocation Type | Single-year Grants |
Section | General |
Research Field |
Library and information science/Humanistic social informatics
|
Research Institution | Keio University |
Principal Investigator |
UEDA Shuichi Keio University, Faculty of Letters, Professor (50134218)
|
Co-Investigator(Kenkyū-buntansha) |
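AGATA Teru Asia University, Faculty of International Relations, Associate Professor (80306505)
IKEUCHI Atsushi University of Tsukuba, Graduate School of Library, Information and Media Studies, Associate Professor (80338607)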
|
Co-Investigator(Renkei-kenkyūsha) |
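ISHIDA Emi Kyushu University, University Library, Associate Professor (50364815)
NOZUE Michiko Railway Technical Research Institute (incorporated foundation), Other Departments, Researcher (40426044)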
|
Project Period (FY) |
2009 – 2011
|
Project Status |
Completed (Fiscal Year 2011)
|
Budget Amount |
¥17,940,000 (Direct Cost: ¥13,800,000, Indirect Cost: ¥4,140,000)
Fiscal Year 2011: ¥5,070,000 (Direct Cost: ¥3,900,000, Indirect Cost: ¥1,170,000)
Fiscal Year 2010: ¥6,890,000 (Direct Cost: ¥5,300,000, Indirect Cost: ¥1,590,000)
Fiscal Year 2009: ¥5,980,000 (Direct Cost: ¥4,600,000, Indirect Cost: ¥1,380,000)
|
Keywords | academic papers / search engine / web structure / information retrieval / automatic classification / machine learning / scholarly information / Web |
Research Abstract |
Open-access scientific papers available on the Web can be searched through several search engines. For example, Google Scholar has high coverage of the literature, but it does not necessarily guarantee free access to the full text. We developed and evaluated "Aletheia", a search engine for full-text academic papers. The system collects PDF files on a broad range of topics and automatically detects academic papers using classifiers based on text and structure features. We built a PDF collection containing 3 million Japanese PDF files. To detect academic papers automatically, five types of Weka classifiers (AdaBoost, Decision Tree (C4.5), Naive Bayes, Random Forest, and Support Vector Machine) were trained separately on a 20,000-item test collection using 10-fold cross-validation. The features were generated using hand-built rules and consisted of three types: structure, URL, and content.
|
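The feature generation and evaluation setup described in the abstract can be illustrated with a minimal Python sketch. It is not the project's implementation: the abstract names Weka classifiers but does not list the hand-built rules, so every rule, feature name, and function below (`extract_features`, `k_fold_indices`, `SECTION_WORDS`) is a hypothetical stand-in chosen only to show what one feature of each type (structure, URL, content) and a 10-fold split might look like.

```python
import re
from urllib.parse import urlparse

# Hypothetical cue words; the project's actual hand-built rules are not given in the abstract.
SECTION_WORDS = ("abstract", "introduction", "references", "はじめに", "参考文献")


def extract_features(url: str, text: str, num_pages: int) -> dict:
    """Map one PDF (already converted to plain text) to illustrative features."""
    lowered = text.lower()
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Structure features (assumed examples)
        "num_pages": num_pages,
        "has_references_heading": int("references" in lowered or "参考文献" in text),
        # URL features (assumed examples)
        "host_is_ac_jp": int(host.endswith(".ac.jp")),
        "url_depth": parsed.path.count("/"),
        # Content features (assumed examples)
        "num_section_words": sum(word in lowered for word in SECTION_WORDS),
        "num_bracket_citations": len(re.findall(r"\[\d+\]", text)),
    }


def k_fold_indices(n_items: int, k: int = 10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    # Item i goes to fold i % k, so fold sizes differ by at most one.
    folds = [list(range(start, n_items, k)) for start in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test
```

In this sketch, each of the five classifiers would be trained on the `train` indices and scored on the `test` indices of every fold, so each labeled item is used for testing exactly once. A real collection would normally be shuffled before splitting; that step is omitted here for brevity.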
Report
(4 results)
Research Products
(24 results)
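[Presentation] Components and Structure of Academic Papers (2012)
Author(s)
Ueda Shuichi, Agata Teru, Ikeuchi Atsushi, Ishita Emi, Miyata Yosuke
Organizer
FY2011 Spring Research Meeting of the Japan Society of Library and Information Science
Place of Presentation
Mie University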
Year and Date
2012-05-12
[Presentation] Analyzing OPAC Use with Screen Views and Eye Tracking (2009)
Author(s)
Ishita, Emi; Mine, Shinji; Koizumi, Masanori; Miyata, Yosuke; Kunimoto, Chihiro; Shiozaki, Junko; Kurata, Keiko; Ueda, Shuichi
Organizer
ACM/IEEE Joint Conference on Digital Libraries: Designing tomorrow, preserving the past - today (JCDL09)
Place of Presentation
University of Texas