
2012 Fiscal Year Final Research Report

Text Mining for Languages of All Ages and Countries

Research Project

Project/Area Number: 22500140
Research Category: Grant-in-Aid for Scientific Research (C)
Allocation Type: Single-year Grants
Section: General
Research Field: Intelligent informatics
Research Institution: Shonan Institute of Technology

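Principal Investigator: SUZUKI Makoto, Shonan Institute of Technology, Faculty of Engineering, Associate Professor (80339796)

Co-Investigators (Renkei-kenkyūsha):
OHSUGA Akihiko, The University of Electro-Communications, Graduate School of Information Systems, Professor (90393842)
GOTO Masayuki, Waseda University, School of Creative Science and Engineering, Department of Industrial and Management Systems Engineering, Professor (40287967)
SUKO Tota, Waseda University, Media Network Center, Assistant Professor (40409660)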
Project Period (FY): 2010–2012
Keywords: Multilingual processing / Machine learning / Modeling / Automatic document classification / N-gram
Research Abstract

We proposed the accumulation method, a language-independent text classification method based on character N-grams. Because the method uses character N-grams as index terms, it does not depend on the structure of any particular language: as long as the documents are encoded in Unicode, the same algorithm can classify them. To verify this, we classified English, Japanese, Korean, and Chinese text documents. The highest macro-averaged F-measures of the proposed method were 94.5% on the English Reuters-21578 collection, 88.5% on the Japanese CD-Mainichi 2002 data set, 90.2% on the Korean Hankyoreh 2008 data set, and 92.6% on the Chinese People's Daily 2009-2010 data set, so the method performed well in all four languages. We also constructed a mathematical model of the accumulation method and used it to clarify the method's mathematical meaning.
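The abstract does not state the accumulation method's scoring rule, so the Python sketch below only illustrates the ideas the abstract relies on. `char_ngrams` shows why character N-grams are language independent: it slices Unicode code points, so one function handles English, Japanese, Korean, and Chinese text without a word tokenizer. `AccumulationSketch` is a hypothetical classifier built on an assumed rule (for each class, sum the relative training frequency of every N-gram in the document, then pick the class with the highest total); it is not the authors' algorithm. `macro_f1` computes the macro-averaged F-measure under one common definition, the unweighted mean of per-class F1 scores. All names here are illustrative.

```python
from collections import Counter, defaultdict


def char_ngrams(text: str, n: int) -> list[str]:
    """Return the overlapping character N-grams of a Unicode string.

    No word segmentation is performed, so the same code works for
    English, Japanese, Korean, or Chinese text.
    """
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class AccumulationSketch:
    """Hypothetical N-gram classifier (an assumption, not the report's method).

    Each class is scored by accumulating the relative frequencies, in that
    class's training data, of the character N-grams found in the document.
    """

    def __init__(self, n: int = 2):
        self.n = n
        self.counts: dict[str, Counter] = defaultdict(Counter)  # class -> N-gram counts
        self.totals: Counter = Counter()  # class -> total number of N-grams

    def fit(self, docs: list[str], labels: list[str]) -> "AccumulationSketch":
        for doc, label in zip(docs, labels):
            grams = char_ngrams(doc, self.n)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
        return self

    def predict(self, doc: str) -> str:
        grams = char_ngrams(doc, self.n)

        def score(label: str) -> float:
            counts, total = self.counts[label], self.totals[label]
            # Accumulate each N-gram's relative frequency within this class.
            return sum(counts[g] / total for g in grams) if total else 0.0

        return max(self.counts, key=score)


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Macro-averaged F-measure: unweighted mean of per-class F1 scores."""
    labels = set(y_true) | set(y_pred)
    f1s = []
    for lab in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == lab and p == lab for t, p in pairs)
        fp = sum(t != lab and p == lab for t, p in pairs)
        fn = sum(t == lab and p != lab for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

For example, `char_ngrams("日本語", 2)` returns `["日本", "本語"]`, exactly as `char_ngrams` would split any English string, which is the property that lets a single algorithm serve all four languages. Note that some papers instead compute macro-F from macro-averaged precision and recall; the two definitions can give slightly different numbers.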

  • Research Products (11 results)


Journal Article (2 results), Presentation (7 results), Book (1 result), Remarks (1 result)

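  • [Journal Article] A Merge Sort That Operates with an Arbitrary External Memory Capacity (2013)

    • Author(s)
      N.Yamagishi, M.Suzuki and S.Watanabe
    • Journal Title

      IEICE Transactions on Information and Systems (Japanese Edition)

      Volume: Vol.J96-D, No.3 Pages: 441-451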

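  • [Journal Article] On Text Classification Problems with a High-Dimensional Vector Space Model: Asymptotic Analysis of Classification Performance and Distance Structure (2010)

    • Author(s)
      M.Goto, T.Ishida, M.Suzuki and S.Hirasawa
    • Journal Title

      Journal of Japan Industrial Management Association

      Volume: Vol.61, No.3 Pages: 97-106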

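  • [Presentation] A Study of Japanese Document Classification Using Word N-grams and Character N-grams (2012)

    • Author(s)
      M.Suzuki and N.Yamagishi
    • Organizer
      Proceedings of the 35th Symposium on Information Theory and its Applications
    • Place of Presentation
      Oita
    • Year and Date
      2012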
  • [Presentation] English and Japanese Text Categorization Using Word and Character N-grams (2012)

    • Author(s)
      M.Suzuki, N.Yamagishi, Y.C.Tsai and M.Goto
    • Organizer
      Proc. of Asia Pacific Industrial Engineering and Management Systems Conference (APIEMS2012)
    • Place of Presentation
      Thailand
    • Year and Date
      2012
  • [Presentation] Chinese Text Categorization Using the Character N-gram (2012)

    • Author(s)
      M.Suzuki, N.Yamagishi and Y.C.Tsai
    • Organizer
      Proc. of International Symposium on Information Theory and its Applications (ISITA 2012)
    • Place of Presentation
      United States
    • Year and Date
      2012
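  • [Presentation] Proposal of an Unsupervised Word Segmentation Method Based on Adaptation to a State Transition Model (2012)

    • Author(s)
      N.Yamagishi, M.Suzuki and S.Watanabe
    • Organizer
      Proceedings of the 12th Student Paper Presentation Meeting, West Kanto Branch, Japan Industrial Management Association
    • Place of Presentation
      Waseda University
    • Year and Date
      2012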
  • [Presentation] Korean Text Categorization Using the Character N-gram (2011)

    • Author(s)
      M.Suzuki, N.Yamagishi and M.Goto
    • Organizer
      Proc. of International Conference on Information Technology and Applications (ICITA 2011)
    • Place of Presentation
      Australia
    • Year and Date
      2011
  • [Presentation] English and Taiwanese Text Categorization Using N-gram Based on Vector Space Model (2010)

    • Author(s)
      M.Suzuki, N.Yamagishi, Y.C.Tsai, T.Ishida and M.Goto
    • Organizer
      Proc. of International Symposium on Information Theory and its Applications (ISITA 2010)
    • Place of Presentation
      Taiwan
    • Year and Date
      2010
  • [Presentation] On a New Model for Automatic Text Categorization Based on Vector Space Model (2010)

    • Author(s)
      M.Suzuki, N.Yamagishi, T.Ishida, M.Goto and S.Hirasawa
    • Organizer
      Proc. of IEEE International Conference on Systems, Man, and Cybernetics 2010 (SMC 2010)
    • Place of Presentation
      Turkey
    • Year and Date
      2010
  • [Book] Probability and Statistics (2010)

    • Author(s)
      T.Suko, M.Suzuki, Y.Ukita, M.Kobayashi and M.Goto
    • Publisher
      Ohmsha
  • [Remarks]

    • URL

      http://www.info.shonan-it.ac.jp/suzuki-lab/profile.html


Published: 2014-08-29  

