
Development of a Filter of Unsolicited Bulk E-mail based on language independent method

Research Project

Project/Area Number 18500072
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Media informatics/Database
Research Institution University of Tsukuba

Principal Investigator

SAKAGUCHI Tetsuo  University of Tsukuba, Graduate School of Library, Information and Media Studies, Associate Professor (10225790)

Co-Investigator (Kenkyū-buntansha)

SUGIMOTO Shigeo  University of Tsukuba, Graduate School of Library, Information and Media Studies, Professor (40154489)
NAGAMORI Mitsuharu  University of Tsukuba, Graduate School of Library, Information and Media Studies, Associate Professor (60272209)
Project Period (FY) 2006 – 2007
Project Status Completed (Fiscal Year 2007)
Budget Amount
¥2,870,000 (Direct Cost: ¥2,600,000, Indirect Cost: ¥270,000)
Fiscal Year 2007: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2006: ¥1,700,000 (Direct Cost: ¥1,700,000)
Keywords electronic mail / spam / multilingual text processing / automatic classification / unsolicited bulk e-mail / Unicode / multilingual processing
Research Abstract

In recent years, the growth of unsolicited bulk e-mail (UBE) has become a major problem on the Internet. One of the main ways to reduce UBE is a spam filter, which classifies e-mail automatically by learning the characteristics of messages. However, such filters are ordinarily language dependent, because they rely on morphological analyzers for specific languages to extract words from messages. As a result, their classification accuracy is weak for e-mail written in languages that their morphological analyzers do not support.
This research develops a spam filter based on a language-independent method. Instead of using morphological analyzers for specific languages, the filter uses a method for extracting message characteristics that does not depend on language.
In 2006, we developed methods that extract fixed-length character strings from messages. In evaluating their accuracy, we found that these methods are at a disadvantage for languages written with phonetic scripts, such as English. In 2007, we therefore developed a method that extracts variable-length character strings from messages based on the character properties defined in the Unicode standard. This method is more accurate than the previous methods, especially on an English e-mail corpus. Through this research, we also found a further problem in building corpora for evaluating spam filters. A corpus must contain both UBE and non-UBE, but non-UBE is hard to collect because it usually contains private information. This problem affects the evaluation of spam filters in the academic community working on anti-spam technologies.
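
The two extraction approaches named in the abstract can be illustrated with a minimal sketch. This is not the project's implementation, which the abstract does not describe in detail. The function names (`fixed_length_ngrams`, `char_class`, `variable_length_tokens`) are hypothetical, and the idea of grouping characters by Unicode general category plus the first word of the Unicode character name (as a rough stand-in for script) is an assumption made only for this example.

```python
import unicodedata
from itertools import groupby


def fixed_length_ngrams(text: str, n: int = 2) -> list[str]:
    """Return every overlapping run of n characters (character n-grams)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def char_class(ch: str) -> str:
    """Map a character to a coarse class using Unicode properties only.

    Assumption for this sketch: letters are grouped by the first word of
    their Unicode name (e.g. LATIN, HIRAGANA, CJK), digits form one class,
    and everything else (spaces, punctuation, symbols) is a separator.
    """
    category = unicodedata.category(ch)
    if category.startswith("L"):
        name = unicodedata.name(ch, "")
        return name.split(" ")[0] if name else "LETTER"
    if category.startswith("N"):
        return "NUMBER"
    return "OTHER"


def variable_length_tokens(text: str) -> list[str]:
    """Split text into maximal runs of characters sharing one class."""
    tokens = []
    for cls, run in groupby(text, key=char_class):
        if cls != "OTHER":  # drop separators
            tokens.append("".join(run))
    return tokens


print(fixed_length_ngrams("spam", 2))
print(variable_length_tokens("Buy now 今すぐ購入 100%"))
```

With fixed-length strings, an English word such as "spam" is cut into fragments like "sp", "pa", "am", which hints at why a fixed length may suit phonetic scripts poorly. The variable-length sketch keeps "Buy" and "now" whole while still splitting the Japanese text at changes of character class, without any language-specific dictionary.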

Report

(3 results)
  • 2007 Annual Research Report
  • 2007 Final Research Report Summary
  • 2006 Annual Research Report

Research Products

(3 results)

Journal Article (3 results, of which Peer Reviewed: 2 results)

  • [Journal Article] An Architecture of Institutional Web Archiving System that have functions to Merge and Split Archives (2008)

    • Author(s)
      Wasuke Hiiragi, Tetsuo Sakaguchi, Shigeo Sugimoto
    • Journal Title

      Journal of Japan Society of Information and Knowledge Vol. 18, No. 1

      Pages: 47-57

    • NAID

      110006647782

    • Description
      From "Final Research Report Summary (Japanese)"
    • Related Report
      2007 Final Research Report Summary
    • Peer Reviewed
  • [Journal Article] An Architecture of Institutional Web Archiving System that have functions to Merge and Split Archives (2008)

    • Author(s)
      Wasuke Hiiragi, Tetsuo Sakaguchi, Shigeo Sugimoto
    • Journal Title

      Journal of Japan Society of Information and Knowledge Vol. 18, No. 1

      Pages: 47-57

    • Description
      From "Final Research Report Summary (English)"
    • Related Report
      2007 Final Research Report Summary
  • [Journal Article] An Architecture of Institutional Web Archiving System that have functions to Merge and Split Archives (2008)

    • Author(s)
      Wasuke Hiiragi, Tetsuo Sakaguchi, Shigeo Sugimoto
    • Journal Title

      Journal of Japan Society of Information and Knowledge Vol. 18, No. 1

      Pages: 47-57

    • NAID

      110006647782

    • Related Report
      2007 Annual Research Report
    • Peer Reviewed


Published: 2006-04-01   Modified: 2016-04-21  
