
Credibility Analysis of Web contents based on 10 billion Web pages

Research Project

Project/Area Number 17KT0085
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Multi-year Fund
Section Generative Research Fields
Research Field The Information Society and Trust
Research Institution Waseda University

Principal Investigator

YAMANA HAYATO  Waseda University, Faculty of Science and Engineering, Professor (40230502)

Project Period (FY) 2017-07-18 – 2022-03-31
Project Status Completed (Fiscal Year 2021)
Budget Amount
¥18,590,000 (Direct Cost: ¥14,300,000, Indirect Cost: ¥4,290,000)
Fiscal Year 2020: ¥2,080,000 (Direct Cost: ¥1,600,000, Indirect Cost: ¥480,000)
Fiscal Year 2019: ¥5,980,000 (Direct Cost: ¥4,600,000, Indirect Cost: ¥1,380,000)
Fiscal Year 2018: ¥6,110,000 (Direct Cost: ¥4,700,000, Indirect Cost: ¥1,410,000)
Fiscal Year 2017: ¥4,420,000 (Direct Cost: ¥3,400,000, Indirect Cost: ¥1,020,000)
Keywords Web content / credibility / reliability / phishing / Web crawler / search engine / big data
Outline of Final Research Achievements

This research project produced five results: (1) efficient web page crawlers (programs that gather pages); (2) methods for analyzing web page content; (3) methods for estimating the reliability of web content from URLs alone, without accessing the content; (4) an analysis of the problems of previous benchmarks, whose ground truth is usually based on human first-impression judgments; and (5) a published survey of research on web content reliability. In particular, the crawler improved efficiency by 10% over previous methods, and the URL-only credibility judgment method reached an accuracy of 99.4%, a significant result for future practical use because it requires no access to the content.
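As a rough illustration of what judging a page from its URL alone can involve, the sketch below computes a few lexical features of a URL string, including the Shannon entropy of its non-alphanumeric characters (an idea named in the title of one of the project's 2019 papers). It is not the project's method: the function names and the choice of features are assumptions made for this example, and a real detector would feed such features into a trained classifier.

```python
import math
from collections import Counter


def non_alnum_entropy(url: str) -> float:
    """Shannon entropy (in bits) of the non-alphanumeric characters in a URL.

    Hypothetical helper for illustration only; not taken from the project.
    """
    symbols = [ch for ch in url if not ch.isalnum()]
    if not symbols:
        return 0.0
    counts = Counter(symbols)
    total = len(symbols)
    # Sum of -p * log2(p) over the distinct symbols.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def url_features(url: str) -> dict:
    """Lexical features derived from the URL string only (no page access)."""
    symbols = [ch for ch in url if not ch.isalnum()]
    return {
        "length": len(url),
        "num_symbols": len(symbols),
        "symbol_entropy": non_alnum_entropy(url),
    }


if __name__ == "__main__":
    # Assumed example URL; any string works.
    print(url_features("http://example.com/a"))
```

Because every feature comes from the string itself, nothing is fetched from the network, which is the property the outline highlights for practical use.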

Academic Significance and Societal Importance of the Research Achievements

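Web content has become indispensable to daily life. By devising indicators (judgment methods) for assessing its reliability, we were able to build a mechanism that automatically identifies web content of low credibility and reliability, which is expected to become increasingly sophisticated. By building tools on the resulting foundational technology in the future, we have laid a foundation on which Internet users can use web content with confidence. In addition, we clarified the problems of the benchmarks that are indispensable to research in this field and made recommendations on how future research in the field should proceed.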

Report

(6 results)
  • 2021 Annual Research Report
  • Final Research Report (PDF)
  • 2020 Research-status Report
  • 2019 Research-status Report
  • 2018 Research-status Report
  • 2017 Research-status Report
  • Research Products

    (20 results)


Int'l Joint Research (1 result), Journal Article (8 results; of which Int'l Joint Research: 2, Peer Reviewed: 8, Open Access: 2), Presentation (11 results)

  • [Int'l Joint Research] Kasetsart University (Thailand)

    • Related Report
      2018 Research-status Report
  • [Journal Article] A Survey on Explainable Fake News Detection (2022)

    • Author(s)
      Ken MISHIMA, Hayato YAMANA
    • Journal Title

      IEICE Transactions on Information and Systems

      Volume: E105.D Issue: 7 Pages: 1249-1257

    • DOI

      10.1587/transinf.2021EDR0003

    • ISSN
      0916-8532, 1745-1361
    • Year and Date
      2022-07-01
    • Related Report
      2021 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Segmentation-based Phishing URL Detection (2021)

    • Author(s)
      Eint Sandi Aung, Hayato YAMANA
    • Journal Title

      Proceedings of WI-IAT '21: IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology

      Volume: 1 Pages: 550-556

    • DOI

      10.1145/3486622.3493983

    • Related Report
      2021 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] URL-based Phishing Detection using the Entropy of Non-Alphanumeric Characters (2019)

    • Author(s)
      Eint Sandi Aung, Hayato Yamana
    • Journal Title

      Proc. of the 21st International Conference on Information Integration and Web-based Applications & Services

      Volume: 1 Pages: 385-392

    • DOI

      10.1145/3366030.3366064

    • Related Report
      2019 Research-status Report
    • Peer Reviewed
  • [Journal Article] Effectiveness of Usability & Performance Features for Web Credibility Evaluation (2019)

    • Author(s)
      Kenta Yamada, Hayato Yamana
    • Journal Title

      Proc. of IEEE BigData 2019

      Volume: 1 Pages: 6257-6259

    • DOI

      10.1109/bigdata47090.2019.9006419

    • Related Report
      2019 Research-status Report
    • Peer Reviewed
  • [Journal Article] Efficient Topical Focused Crawling Through Neighborhood Feature (2018)

    • Author(s)
      Tanaphol Suebchua, Bundit Manaskasemsak, Arnon Rungsawang, Hayato Yamana
    • Journal Title

      New Generation Computing

      Volume: 36-2 Issue: 2 Pages: 95-118

    • DOI

      10.1007/s00354-017-0029-8

    • Related Report
      2018 Research-status Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] External Content-dependent Features for Web Credibility Evaluation (2018)

    • Author(s)
      Kazuyoshi Ootani, Hayato Yamana
    • Journal Title

      Proc. of IEEE BigData 2018

      Volume: 1 Pages: 5314-5416

    • DOI

      10.1109/bigdata.2018.8622398

    • Related Report
      2018 Research-status Report
    • Peer Reviewed
  • [Journal Article] History-enhanced Focused Website Segment Crawler (2018)

    • Author(s)
      Tanaphol Suebchua, Bundit Manaskasemsak, Arnon Rungsawang, Hayato YAMANA
    • Journal Title

      Proc. of IEEE the 32nd International Conference on Information Networking

      Volume: - Pages: 80-85

    • DOI

      10.1109/icoin.2018.8343090

    • Related Report
      2017 Research-status Report
    • Peer Reviewed / Int'l Joint Research
  • [Journal Article] A Variable-Length Motifs Discovery Method in Time Series using Hybrid Approach (2017)

    • Author(s)
      Chaw Zan, Hayato YAMANA
    • Journal Title

      Proc. of the 19th International Conference on Information Integration and Web-based Applications & Services

      Volume: - Pages: 49-57

    • DOI

      10.1145/3151759.3151781

    • Related Report
      2017 Research-status Report
    • Peer Reviewed
  • [Presentation] Phishing URL Detection using Information-rich Domain and Path Features (2021)

    • Author(s)
      Eint Sandi Aung, Hayato Yamana
    • Organizer
      13th Forum on Data Engineering and Information Management (DEIM2021) (DBSJ, IEICE, IPSJ)
    • Related Report
      2020 Research-status Report
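  • [Presentation] Estimating the Number of Authors of a Document Using Stylistic Similarity Based on Word Positions and Frequencies (2021)

    • Author(s)
      渡邉 充博, Eint Sandi Aung, 山名 早人
    • Organizer
      13th Forum on Data Engineering and Information Management (DEIM2021) (DBSJ, IEICE, IPSJ)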
    • Related Report
      2020 Research-status Report
  • [Presentation] Malicious URL detection: a survey (2020)

    • Author(s)
      Eint Sandi Aung, Hayato Yamana
    • Organizer
      12th Forum on Data Engineering and Information Management (DEIM2020)
    • Related Report
      2019 Research-status Report
  • [Presentation] A Proposed Credibility Evaluation Method Focusing on Website Usability and Performance (2020)

    • Author(s)
      山田健太, Eint Sandi Aung, 山名早人
    • Organizer
      12th Forum on Data Engineering and Information Management (DEIM2020)
    • Related Report
      2019 Research-status Report
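  • [Presentation] Estimating the Number of Authors of a Document Using Stylistic Change and Stylistic Similarity (2020)

    • Author(s)
      渡邉充博, Eint Sandi Aung, 山名早人
    • Organizer
      12th Forum on Data Engineering and Information Management (DEIM2020)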
    • Related Report
      2019 Research-status Report
  • [Presentation] Estimating the Number of Authors of Japanese Documents (2019)

    • Author(s)
      塩浦尚久, 山名早人
    • Organizer
      DEIM2019, 11th Forum on Data Engineering and Information Management
    • Related Report
      2018 Research-status Report
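  • [Presentation] A Method for Identifying Quoted Statements in Newspaper Articles and Classifying Quotation Styles: Application to Fact-Checking Support (2019)

    • Author(s)
      山田健太, 真鍋智紀, 山名早人
    • Organizer
      DEIM2019, 11th Forum on Data Engineering and Information Management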
    • Related Report
      2018 Research-status Report
  • [Presentation] Enhancing Focused Crawler through Genre Detection (2019)

    • Author(s)
      Qian Jiayi, Tanaphol Suebchua, Hayato Yamana
    • Organizer
      DEIM2019, 11th Forum on Data Engineering and Information Management
    • Related Report
      2018 Research-status Report
  • [Presentation] A Survey of URL-based Phishing Detection (2019)

    • Author(s)
      Eint Sandi Aung, Chaw Thet Zan, Hayato Yamana
    • Organizer
      DEIM2019, 11th Forum on Data Engineering and Information Management
    • Related Report
      2018 Research-status Report
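  • [Presentation] Proposal of CrRv, a Domain-Specific Word Importance Measure, and Its Application to Estimating Author Expertise from Short Japanese and English Texts (2018)

    • Author(s)
      滝川 真弘, 山名 早人
    • Organizer
      10th Forum on Data Engineering and Information Management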
    • Related Report
      2017 Research-status Report
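  • [Presentation] Proposal of a Domain-Specific Word Importance Calculation Method and Its Application to Estimating Author Expertise from Short Texts (2017)

    • Author(s)
      滝川 真弘, 山名 早人
    • Organizer
      233rd Meeting of the Special Interest Group on Natural Language Processing (IPSJ)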
    • Related Report
      2017 Research-status Report


Published: 2017-07-21   Modified: 2023-01-30  
