Development of CEFR Can-do Language Learning Materials by FS2vec Processing of Large-scale Spoken Language Corpus

Research Project

Project/Area Number 15H02794
Research Category

Grant-in-Aid for Scientific Research (B)

Allocation Type Single-year Grants
Section General
Research Field Learning support system
Research Institution Tokyo University of Foreign Studies

Principal Investigator

Mochizuki Hajime  Tokyo University of Foreign Studies, Institute of Global Studies, Associate Professor (70313707)

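Co-Investigator (Kenkyū-buntansha) 芝野 耕司 (Shibano Kohji)  Tokyo University of Foreign Studies, Other Departments, Professor Emeritus (50216024)
佐野 洋  Tokyo University of Foreign Studies, Institute of Global Studies, Professor (30282776)
藤村 知子  Tokyo University of Foreign Studies, Institute of Japan Studies, Professor (20229040)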
Project Period (FY) 2015-04-01 – 2019-03-31
Project Status Completed (Fiscal Year 2018)
Budget Amount
¥15,340,000 (Direct Cost: ¥11,800,000, Indirect Cost: ¥3,540,000)
Fiscal Year 2018: ¥3,120,000 (Direct Cost: ¥2,400,000, Indirect Cost: ¥720,000)
Fiscal Year 2017: ¥3,640,000 (Direct Cost: ¥2,800,000, Indirect Cost: ¥840,000)
Fiscal Year 2016: ¥3,640,000 (Direct Cost: ¥2,800,000, Indirect Cost: ¥840,000)
Fiscal Year 2015: ¥4,940,000 (Direct Cost: ¥3,800,000, Indirect Cost: ¥1,140,000)
Keywords Learning content development support / e-learning / Japanese language education / natural language processing / Formulaic Sequences / Formulaic Sequence / learning content development
Outline of Final Research Achievements

We developed a method for extracting formulaic sequences from a corpus of Japanese closed-caption TV (CCTV) broadcasts. In this research, we extracted significant n-grams from the CCTV corpus as candidates for formulaic sequences of contiguous words. To calculate n-gram frequencies, we developed programs that sort, merge, and count based on the MapReduce algorithm. We examined the clustering of discourse segments by topic and scene, and confirmed that suitable can-do statements exist for them. We are continuing to build the CCTV corpus.
The corpus now exceeds 1,300 million morphemes in total. We presented the research results as peer-reviewed papers, mainly at international conferences such as AAAL, EDMEDIA, and E-Learn.
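
The sort, merge, and count steps mentioned above can be illustrated with a minimal single-machine sketch in Python. This is not the project's program: the function names, the sample sentences, and the `min_count` threshold are assumptions made for illustration, and the record does not state how "significant" n-grams were selected. The sketch assumes the captions have already been segmented into morphemes.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical, already-segmented caption sentences (illustrative data only).
SENTENCES = [
    ["よろしく", "お願い", "し", "ます"],
    ["よろしく", "お願い", "いたし", "ます"],
    ["お願い", "し", "ます"],
]


def map_ngrams(tokens, n):
    """Map step: emit (ngram, 1) for every run of n consecutive tokens."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n]), 1


def reduce_counts(pairs):
    """Sort the emitted pairs by key, then sum the counts for each key."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        key: sum(count for _, count in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }


def count_ngrams(sentences, n):
    """Run the map step over every sentence and reduce the merged output."""
    pairs = (pair for tokens in sentences for pair in map_ngrams(tokens, n))
    return reduce_counts(pairs)


def frequent_ngrams(counts, min_count):
    """Keep n-grams seen at least min_count times (an assumed criterion)."""
    return {ngram: c for ngram, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    bigram_counts = count_ngrams(SENTENCES, 2)
    for ngram, count in frequent_ngrams(bigram_counts, 2).items():
        print(" ".join(ngram), count)
```

On a corpus of the size reported here, the map and reduce steps would run as distributed jobs over partitioned files rather than in memory; the sketch only shows the shape of the computation.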

Academic Significance and Societal Importance of the Research Achievements

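We continued to build a large-scale Japanese conversational corpus of a kind that had not previously existed, and prepared closed-caption data covering more than six years of Japanese television programs. The corpus has reached 350,000 programs, 124 million sentences, and more than 1,336 million words. From this large-scale corpus, we extracted a large number of Formulaic Sequences (定型表現), multi-word units that carry a special meaning and can also be applied to Japanese language learning materials. We confirmed that useful learning materials can be created by taking conversation segments from the corpus with formulaic sequences as their core, and by mapping the functions expressed by the formulaic sequences in each segment, together with the topic and scene of each segment, to Can-do statements.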

Report

(5 results)
  • 2018 Annual Research Report   Final Research Report ( PDF )
  • 2017 Annual Research Report
  • 2016 Annual Research Report
  • 2015 Annual Research Report
  • Research Products

    (25 results)

Journal Article (1 result) (of which Peer Reviewed: 1 result, Acknowledgement Compliant: 1 result) Presentation (24 results) (of which Int'l Joint Research: 20 results, Invited: 1 result)

  • [Journal Article] Re-Mining Topics Popular in the Recent Past from a Large-Scale Closed Caption TV Corpus (2015)

    • Author(s)
      Hajime Mochizuki and Kohji Shibano
    • Journal Title

      International Journal of Future Computer and Communication

      Volume: 4 Pages: 98-103

    • Related Report
      2015 Annual Research Report
    • Peer Reviewed / Acknowledgement Compliant
  • [Presentation] Investigation of Words in Japanese Closed Caption TV Corpus (2019)

    • Author(s)
      Hajime Mochizuki
    • Organizer
      STEM & STEAM Education Conference, 2019
    • Related Report
      2018 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Analyzing Usefulness of Dialogues from Closed Caption TV Corpus as an Example of Can-do Statements for Language Learning (2018)

    • Author(s)
      Hajime Mochizuki and Kohji Shibano
    • Organizer
      2018 Hawaii University Conference, Arts, Humanities, Social Sciences & Education (AHSE)
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Modification of word2vec by Formulaic Sequences and Extraction of Useful Expressions for Language Learning from Closed Caption TV Corpus (2017)

    • Author(s)
      Hajime Mochizuki and Kohji Shibano
    • Organizer
      The IAFOR International Conference on Language Learning Hawaii
    • Place of Presentation
      Honolulu, USA
    • Year and Date
      2017-01-08
    • Related Report
      2016 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Developing Intimacy by Style-shifting in Japanese: A TV Subtitle Corpus-based Study (2017)

    • Author(s)
      XIAO Tingting and Kohji Shibano
    • Organizer
      The 2017 conference of the American Association for Applied Linguistics (AAAL 2017)
    • Related Report
      2017 Annual Research Report, 2016 Annual Research Report
    • Int'l Joint Research
  • [Presentation] The Acquisition of a Japanese Practical Formulaic Sequences List from a Closed Caption TV Corpus (2017)

    • Author(s)
      Hajime Mochizuki and Kohji Shibano
    • Organizer
      Hawaii University Conferences, STEM/STEAM Education Conference
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Augmented Reality Applications for Multilingual Learning with Intuitive Understanding (2017)

    • Author(s)
      Hajime Mochizuki
    • Organizer
      World Conference on Educational Media and Technology (EDMEDIA) 2017
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Analyzing formulaic sequences in spoken Japanese from a large Japanese TV closed caption corpus (2017)

    • Author(s)
      Kohji Shibano
    • Organizer
      The 18th World Congress of Applied Linguistics (AILA 2017)
    • Related Report
      2017 Annual Research Report, 2016 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Discourse Segment Clustering with Word Embedding based on Formulaic Sequences for Language Education (2017)

    • Author(s)
      Hajime Mochizuki and Kohji Shibano
    • Organizer
      2017 International Conference on Education and Multimedia Technology (ICEMT 2017)
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Building a Very Large Spoken Language Corpus from Closed Caption TV and Extracting Practical Formulaic Sequences for Language Learning (2017)

    • Author(s)
      Hajime Mochizuki
    • Organizer
      The 10th International Conference on Advanced Computer Theory and Engineering
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research / Invited
  • [Presentation] Searching Discourse Segments for Formulaic Sequences in a Closed Caption TV Corpus for Language Learning (2017)

    • Author(s)
      Hajime Mochizuki and Kohji Shibano
    • Organizer
      World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2017
    • Related Report
      2017 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Extracting Formulaic Sequences Containing Useful Expressions for Language Learning from Closed Caption TV Corpus (2016)

    • Author(s)
      Hajime Mochizuki and Kohji Shibano
    • Organizer
      World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education, E-Learn 2016
    • Place of Presentation
      Alexandria, USA
    • Year and Date
      2016-11-14
    • Related Report
      2016 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Development of a Closed Caption TV Corpus Retrieval System for Language Learning (2016)

    • Author(s)
      Hajime Mochizuki
    • Organizer
      8th International Conference on Education Technology and Computers (ICETC 2016)
    • Place of Presentation
      Singapore
    • Year and Date
      2016-09-28
    • Related Report
      2016 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Straightforward Expansion of word2vec by Formulaic Sequences in CCTV corpus (2016)

    • Author(s)
      Hajime Mochizuki
    • Organizer
      Ninth International Conference on Advanced Computer Theory and Engineering, ICACTE 2016
    • Place of Presentation
      Hong Kong
    • Year and Date
      2016-08-19
    • Related Report
      2016 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Development of AR Materials for Understanding Roles of Japanese Particles (2016)

    • Author(s)
      Hajime Mochizuki
    • Organizer
      2016 STEM & STEAM Education Conference
    • Place of Presentation
      Honolulu, USA
    • Year and Date
      2016-06-10
    • Related Report
      2016 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Japanese Language Learning System for Understanding a Sentence that has Correct Syntax but has Semantic Errors (2016)

    • Author(s)
      Hajime Mochizuki
    • Organizer
      the 2nd International Conference on Information Technology (ICIT 2016)
    • Place of Presentation
      Melbourne, Australia
    • Year and Date
      2016-03-03
    • Related Report
      2015 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Analyzing Attractiveness of Specific Location Names of Tourist Destination from a Closed Caption TV Corpus (2016)

    • Author(s)
      Hajime Mochizuki and Kohji Shibano
    • Organizer
      Hawaii University Conferences, Arts, Humanities, Social Sciences & Education (AHSE)
    • Place of Presentation
      Hawaii, USA
    • Year and Date
      2016-01-08
    • Related Report
      2015 Annual Research Report
    • Int'l Joint Research
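  • [Presentation] Proposal of Language Learning Materials Using Differences in Event Construal (2) [事態把握の違いを利用した語学教材の提案(2)] (2016)

    • Author(s)
      佐野洋
    • Organizer
      135th CE Research Meeting, Information Processing Society of Japan
    • Place of Presentation
      Shinshu University, Nagano Prefecture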
    • Related Report
      2016 Annual Research Report
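  • [Presentation] A Language Learning Method Using Differences in Event Construal (2) [事態把握の違いを用いた語学学習法(2)] (2016)

    • Author(s)
      佐野洋
    • Organizer
      Technical Committee on Thought and Language (TL), IEICE
    • Place of Presentation
      Waseda University, Tokyo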
    • Related Report
      2016 Annual Research Report
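  • [Presentation] A Language Learning Method Using Differences in Event Construal (3) [事態把握の違いを用いた語学学習法(3)] (2016)

    • Author(s)
      佐野洋
    • Organizer
      Technical Committee on Thought and Language (TL), IEICE
    • Place of Presentation
      Port Island, Hyogo Prefecture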
    • Related Report
      2016 Annual Research Report
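  • [Presentation] Proposal of Language Learning Materials Using Differences in Event Construal (3) [事態把握の違いを利用した語学教材の提案(3)] (2016)

    • Author(s)
      佐野洋
    • Organizer
      136th CE Research Meeting, Information Processing Society of Japan
    • Place of Presentation
      University of Nagasaki, Siebold Campus, Nagasaki Prefecture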
    • Related Report
      2016 Annual Research Report
  • [Presentation] Detecting Topics Popular in the Recent Past from a Closed Caption TV Corpus as a Categorized Chronicle data (2015)

    • Author(s)
      Hajime Mochizuki and Kohji Shibano
    • Organizer
      the 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (KMIS)
    • Place of Presentation
      Lisbon, Portugal
    • Year and Date
      2015-11-12
    • Related Report
      2015 Annual Research Report
    • Int'l Joint Research
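  • [Presentation] Construction of a Japanese Spoken Language Corpus and a Conversation Example Retrieval System [日本語話し言葉コーパスの構築と会話用例検索システム] (2015)

    • Author(s)
      Kohji Shibano (芝野耕司)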
    • Organizer
      6th CASTEL/J Hawaii 2015
    • Place of Presentation
      Hawaii, USA
    • Year and Date
      2015-08-07
    • Related Report
      2015 Annual Research Report
    • Int'l Joint Research
  • [Presentation] A Quantitative Formulaic Analysis of Large TV Closed Caption Corpus – Pragmatic Use of Utterance End in Japanese Animation Languages (2015)

    • Author(s)
      Kohji Shibano
    • Organizer
      14th International Pragmatics Conference
    • Place of Presentation
      Antwerp, Belgium
    • Year and Date
      2015-07-26
    • Related Report
      2015 Annual Research Report
    • Int'l Joint Research
  • [Presentation] Development of a Closed Caption TV Corpus Retrieval System to Seek Video Scenes Containing Useful Expressions for Language Learning (2015)

    • Author(s)
      Hajime Mochizuki and Kohji Shibano
    • Organizer
      World Conference on Educational Media and Technology (EDMEDIA)
    • Place of Presentation
      Montreal, Canada
    • Year and Date
      2015-06-22
    • Related Report
      2015 Annual Research Report
    • Int'l Joint Research

Published: 2015-04-16   Modified: 2020-03-30  
