Text Mining for Languages of All Ages and Countries

Research Project

Project/Area Number 22500140
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Single-year Grants
Section General
Research Field Intelligent informatics
Research Institution Shonan Institute of Technology

Principal Investigator

SUZUKI Makoto  Shonan Institute of Technology, Faculty of Engineering, Associate Professor (80339796)

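Co-Investigator (Renkei-kenkyūsha) OHSUGA Akihiko  The University of Electro-Communications, Graduate School of Information Systems, Professor (90393842)
GOTO Masayuki  Waseda University, School of Creative Science and Engineering, Department of Industrial and Management Systems Engineering, Professor (40287967)
SUKO Tota  Waseda University, Media Network Center, Assistant Professor (40409660)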
Project Period (FY) 2010 – 2012
Project Status Completed (Fiscal Year 2012)
Budget Amount
¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000)
Fiscal Year 2012: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Fiscal Year 2011: ¥1,300,000 (Direct Cost: ¥1,000,000, Indirect Cost: ¥300,000)
Fiscal Year 2010: ¥1,430,000 (Direct Cost: ¥1,100,000, Indirect Cost: ¥330,000)
Keywords Multilingual processing / Machine learning / Modeling / Automatic document classification / N-gram / Text mining
Research Abstract

We proposed the accumulation method, a language-independent text classification method based on character N-grams. Because it forms index terms from character N-grams, the method does not depend on the structure of any particular language: as long as the text documents are encoded in Unicode, it can classify them with the same algorithm. We applied it to English, Japanese, Korean, and Chinese text documents. The highest macro-averaged F-measures of the proposed method were 94.5% for the English Reuters-21578 data set, 88.5% for the Japanese CD-Mainichi 2002 data set, 90.2% for the Korean Hankyoreh 2008 data set, and 92.6% for the Chinese People's Daily 2009-2010 data set, so the method performed well in all four languages. We also constructed a mathematical model of the accumulation method and clarified its mathematical meaning.
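
The two building blocks named in the abstract, character N-gram index terms and the macro-averaged F-measure, can be illustrated with a minimal Python sketch. This is not the accumulation method itself, whose scoring rule is not described on this page. The function names `char_ngrams` and `macro_f_measure` are hypothetical, and the sketch assumes the macro average is taken over the categories that occur in the gold labels.

```python
from collections import Counter
from typing import Sequence


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character N-grams.

    Python strings are sequences of Unicode code points, so the same
    slicing works for English, Japanese, Korean, and Chinese text
    without any word segmentation.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def macro_f_measure(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Average the per-category F-measure, weighting each category equally.

    Assumption: categories are taken from the gold labels only.
    """
    categories = sorted(set(gold))
    scores = []
    for c in categories:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == c and p == c)
        fp = sum(1 for g, p in pairs if g != c and p == c)
        fn = sum(1 for g, p in pairs if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `char_ngrams("テキスト", 2)` yields the index terms テキ, キス, and スト, one occurrence each, through the same code path that handles an English string.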

Report

(4 results)
  • 2012 Annual Research Report   Final Research Report (PDF)
  • 2011 Annual Research Report
  • 2010 Annual Research Report

Research Products

(23 results)

All Journal Article (8 results) (of which Peer Reviewed: 6 results) Presentation (9 results) Book (2 results) Remarks (4 results)

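  • [Journal Article] A Merge Sort That Operates with an Arbitrary External Storage Capacity (2013)

    • Author(s)
      N.Yamagishi, M.Suzuki, S.Watanabe
    • Journal Title

      IEICE Transactions on Information and Systems (Japanese Edition)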

      Volume: Vol.J96-D,No.3 Pages: 441-451

    • NAID

      110009593013

    • Related Report
      2012 Annual Research Report 2012 Final Research Report
  • [Journal Article] Chinese Text Categorization Using the Character N-gram (2012)

    • Author(s)
      M.Suzuki, N.Yamagishi and Y.C.Tsai
    • Journal Title

      Proc. of International Symposium on Information Theory and its Applications

      Volume: ISITA 2012 Pages: 722-726

    • Related Report
      2012 Annual Research Report
    • Peer Reviewed
  • [Journal Article] English and Japanese Text Categorization Using Word and Character N-grams (2012)

    • Author(s)
      M.Suzuki, N.Yamagishi, Y.C.Tsai and M.Goto
    • Journal Title

      Proc. of Asia Pacific Industrial Engineering and Management Systems Conference

      Volume: APIEMS 2012 Pages: 715-722

    • Related Report
      2012 Annual Research Report
    • Peer Reviewed
  • [Journal Article] Korean Text Categorization Using the Character N-gram (2011)

    • Author(s)
      M.Suzuki, N.Yamagishi, M.Goto
    • Journal Title

      Proc. of International Conference on Information Technology and Applications (ICITA 2011)

      Pages: 197-202

    • Related Report
      2011 Annual Research Report
    • Peer Reviewed
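  • [Journal Article] On Text Classification Problems Using a High-Dimensional Vector Space Model: Asymptotic Analysis of Classification Performance and Distance Structure (2010)

    • Author(s)
      M.Goto, T.Ishida, M.Suzuki, S.Hirasawa
    • Journal Title

      Journal of Japan Industrial Management Association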

      Volume: Vol.61,No.3 Pages: 97-106

    • Related Report
      2012 Final Research Report
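  • [Journal Article] On Text Classification Problems Using a High-Dimensional Vector Space Model: Asymptotic Analysis of Classification Performance and Distance Structure (2010)

    • Author(s)
      M.Goto, T.Ishida, M.Suzuki, S.Hirasawa
    • Journal Title

      Journal of Japan Industrial Management Association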

      Volume: Vol.61 Pages: 97-106

    • Related Report
      2010 Annual Research Report
    • Peer Reviewed
  • [Journal Article] On a New Model for Automatic Text Categorization Based on Vector Space Model (2010)

    • Author(s)
      M.Suzuki, N.Yamagishi, T.Ishida, M.Goto, S.Hirasawa
    • Journal Title

      Proc. of IEEE International Conference on Systems, Man, and Cybernetics 2010

      Pages: 3152-3159

    • Related Report
      2010 Annual Research Report
    • Peer Reviewed
  • [Journal Article] English And Taiwanese Text Categorization Using N-gram Based on Vector Space Model (2010)

    • Author(s)
      M.Suzuki, N.Yamagishi, Y.C.Tsai, T.Ishida, M.Goto
    • Journal Title

      Proc. of International Symposium on Information Theory and its Applications 2010

      Pages: 106-111

    • Related Report
      2010 Annual Research Report
    • Peer Reviewed
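  • [Presentation] A Proposal for an Unsupervised Word Segmentation Method Based on Adaptation to a State Transition Model (2012)

    • Author(s)
      N.Yamagishi, M.Suzuki, S.Watanabe
    • Organizer
      12th Student Paper Presentation Meeting, Nishi-Kanto Chapter, Japan Industrial Management Association
    • Place of Presentation
      Waseda University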
    • Year and Date
      2012-02-19
    • Related Report
      2011 Annual Research Report
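  • [Presentation] A Study on Japanese Document Classification Using Word N-grams and Character N-grams (2012)

    • Author(s)
      M.Suzuki, N.Yamagishi
    • Organizer
      Proceedings of the 35th Symposium on Information Theory and its Applications
    • Place of Presentation
      Oita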
    • Related Report
      2012 Final Research Report
  • [Presentation] English and Japanese Text Categorization Using Word and Character N-grams (2012)

    • Author(s)
      M.Suzuki, N.Yamagishi, Y.C.Tsai and M.Goto
    • Organizer
      Proc. of Asia Pacific Industrial Engineering and Management Systems Conference (APIEMS2012)
    • Place of Presentation
      Thailand
    • Related Report
      2012 Final Research Report
  • [Presentation] Chinese Text Categorization Using the Character N-gram (2012)

    • Author(s)
      M.Suzuki, N.Yamagishi and Y.C.Tsai
    • Organizer
      Proc. of International Symposium on Information Theory and its Applications (ISITA 2012)
    • Place of Presentation
      United States
    • Related Report
      2012 Final Research Report
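  • [Presentation] A Proposal for an Unsupervised Word Segmentation Method Based on Adaptation to a State Transition Model (2012)

    • Author(s)
      N.Yamagishi, M.Suzuki, S.Watanabe
    • Organizer
      Proceedings of the 12th Student Paper Presentation Meeting, Nishi-Kanto Chapter, Japan Industrial Management Association
    • Place of Presentation
      Waseda University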
    • Related Report
      2012 Final Research Report
  • [Presentation] Korean Text Categorization Using the Character N-gram (2011)

    • Author(s)
      M.Suzuki, N.Yamagishi and M.Goto
    • Organizer
      Proc. of International Conference on Information Technology and Applications (ICITA 2011)
    • Place of Presentation
      Australia
    • Related Report
      2012 Final Research Report
  • [Presentation] English And Taiwanese Text Categorization Using N-gram Based on Vector Space Model (2010)

    • Author(s)
      M.Suzuki, N.Yamagishi, Y.C.Tsai, T.Ishida and M.Goto
    • Organizer
      Proc. of International Symposium on Information Theory and its Applications (ISITA 2010)
    • Place of Presentation
      Taiwan
    • Related Report
      2012 Final Research Report
  • [Presentation] On a New Model for Automatic Text Categorization Based on Vector Space Model (2010)

    • Author(s)
      M.Suzuki, N.Yamagishi, T.Ishida, M.Goto and S.Hirasawa
    • Organizer
      Proc. of IEEE International Conference on Systems, Man, and Cybernetics 2010 (SMC 2010)
    • Place of Presentation
      Turkey
    • Related Report
      2012 Final Research Report
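  • [Presentation] A Study on Japanese Document Classification Using Word N-grams and Character N-grams

    • Author(s)
      M.Suzuki, N.Yamagishi
    • Organizer
      35th Symposium on Information Theory and its Applications
    • Place of Presentation
      Beppuwan Royal Hotel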
    • Related Report
      2012 Annual Research Report
  • [Book] Probability and Statistics (2010)

    • Author(s)
      T.Suko, M.Suzuki, Y.Ukita, M.Kobayashi, M.Goto
    • Publisher
      Ohmsha
    • Related Report
      2012 Final Research Report
  • [Book] Probability and Statistics (2010)

    • Author(s)
      T.Suko, M.Suzuki, Y.Ukita, M.Kobayashi, M.Goto
    • Total Pages
      251
    • Publisher
      Ohmsha
    • Related Report
      2010 Annual Research Report
  • [Remarks]

    • URL

      http://www.info.shonan-it.ac.jp/suzuki-lab/profile.html

    • Related Report
      2012 Final Research Report
  • [Remarks] Suzuki Laboratory

    • URL

      http://www.info.shonan-it.ac.jp/suzuki-lab/profile.html

    • Related Report
      2012 Annual Research Report
  • [Remarks]

    • URL

      http://www.info.shonan-it.ac.jp/suzuki-lab/profile.html

    • Related Report
      2011 Annual Research Report
  • [Remarks]

    • URL

      http://www.info.shonan-it.ac.jp/suzuki-lab/profile.html

    • Related Report
      2010 Annual Research Report

Published: 2010-08-23   Modified: 2019-07-29  
