
Machine Learning Methods for Cost Reduction in Label Collection by Crowdsourcing

Research Project

Project/Area Number 19K20277
Research Category

Grant-in-Aid for Early-Career Scientists

Allocation Type Multi-year Fund
Review Section Basic Section 60080: Database-related
Research Institution University of Yamanashi

Principal Investigator

Li Jiyi  University of Yamanashi, Graduate Faculty of Interdisciplinary Research, Assistant Professor (30726667)

Project Period (FY) 2019-04-01 – 2022-03-31
Project Status Completed (Fiscal Year 2021)
Budget Amount
¥4,160,000 (Direct Cost: ¥3,200,000, Indirect Cost: ¥960,000)
Fiscal Year 2021: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
Fiscal Year 2020: ¥520,000 (Direct Cost: ¥400,000, Indirect Cost: ¥120,000)
Fiscal Year 2019: ¥2,080,000 (Direct Cost: ¥1,600,000, Indirect Cost: ¥480,000)
Keywords crowdsourcing / label assignment / cost reduction / machine learning
Outline of Research at the Start

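This research proposes machine learning methods that reduce the cost of using crowdsourcing services for assigning high-accuracy labels to large-scale data, by focusing on the characteristics of the data to be labeled and of the workers. The research is challenging and original in that it pursues cost reduction and quality improvement at the same time, two goals that are in a trade-off relationship when labeling large-scale data with many labels. Because the results contribute directly to the creation of training data, an essential problem in supervised machine learning such as deep learning, which has attracted attention in recent years, they are expected to advance the practical use and development of artificial intelligence technology in diverse industrial fields.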

Outline of Final Research Achievements

The objective of this study was to propose machine learning methods that reduce the cost of using crowdsourcing services to accurately annotate large-scale data for various media processing tasks, such as text and images. We proposed methods for disambiguating label assignment by refining data collected through crowdsourcing, and methods for improving data quality by selecting instances and workers. To build models for various media, we incorporated the content of the instances and extended answer aggregation methods for categorical labels so that they can handle diverse data types such as sequences. We published 8 papers at international conferences, including top international conferences on artificial intelligence such as IJCAI, WWW, SIGIR, and MM.
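
As a point of reference for the answer aggregation with categorical labels mentioned above, the following is a minimal sketch of a generic baseline: majority voting, followed by one re-weighting pass based on each worker's agreement with the first result. It is not the method proposed in this project; the function names `majority_vote` and `worker_agreement` and the toy answers are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def majority_vote(answers, weights=None):
    """Aggregate (worker, instance, label) triples into one label per instance.

    weights maps worker -> vote weight; a worker without an entry
    (or weights=None) counts as 1.0.
    """
    weights = weights or {}
    votes = defaultdict(Counter)
    for worker, instance, label in answers:
        votes[instance][label] += weights.get(worker, 1.0)
    # Highest total wins; ties are broken by label order for determinism.
    return {
        inst: min(counts, key=lambda lab: (-counts[lab], str(lab)))
        for inst, counts in votes.items()
    }


def worker_agreement(answers, aggregated):
    """Fraction of each worker's answers that match the aggregated labels."""
    hits, totals = Counter(), Counter()
    for worker, instance, label in answers:
        totals[worker] += 1
        hits[worker] += int(aggregated[instance] == label)
    return {w: hits[w] / totals[w] for w in totals}


# Hypothetical toy data: three workers labeling three images.
answers = [
    ("w1", "img1", "cat"), ("w2", "img1", "cat"), ("w3", "img1", "dog"),
    ("w1", "img2", "dog"), ("w2", "img2", "dog"), ("w3", "img2", "cat"),
    ("w1", "img3", "cat"), ("w3", "img3", "dog"),
]
first_pass = majority_vote(answers)
reliability = worker_agreement(answers, first_pass)
second_pass = majority_vote(answers, weights=reliability)
```

In this toy data, `img3` is a one-to-one tie in the first pass and is settled only by the tie-breaking rule. In the second pass, the worker who disagreed with the first-pass result on every instance carries a lower weight, so the same instance is no longer a tie. Repeating the two steps until the labels stop changing is a common extension, again offered here only as an assumption about how such a baseline is typically built.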

Academic Significance and Societal Importance of the Research Achievements

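This research is challenging and original in that it pursues cost reduction and quality improvement at the same time, two goals that are in a trade-off relationship when labeling large-scale data with many labels. It proposes machine learning models that can be used at a practical level for media such as text and images, and it also takes on extensions to pairwise labels and sequence labels. Because the problems that arise in label assignment can be fed back into artificial intelligence fields such as machine learning and natural language processing, the academic significance is extremely high. Because the research contributes directly to the creation of training data, an essential problem in supervised machine learning such as deep learning, which has attracted attention in recent years, it is expected to advance the practical use and development of artificial intelligence technology in diverse industrial fields.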

Report

(4 results)
  • 2021 Annual Research Report   Final Research Report ( PDF )
  • 2020 Research-status Report
  • 2019 Research-status Report
  • Research Products

    (12 results)

All Presentation (9 results) (of which Int'l Joint Research: 6 results) Remarks (3 results)

  • [Presentation] Context-based Collective Preference Aggregation for Prioritizing Crowd Opinions in Social Decision-making (2022)

    • Author(s)
      Jiyi Li
    • Organizer
      Proceedings of the ACM Web Conference 2022 (WWW 2022)
    • Related Report
      2021 Annual Research Report
  • [Presentation] Label Aggregation for Crowdsourced Triplet Similarity Comparisons (2021)

    • Author(s)
      Jiyi Li, Lucas Ryo Endo and Hisashi Kashima
    • Organizer
      Proceedings of the 28th International Conference on Neural Information Processing (ICONIP 2021)
    • Related Report
      2021 Annual Research Report
  • [Presentation] Diverse, Reliable, and Efficient Affective Annotation of Images Using a Game (2021)

    • Author(s)
      Xingkun Zuo, Jiyi Li, Xiaoyang Mao
    • Organizer
      The 13th Forum on Data Engineering and Information Management (DEIM2021)
    • Related Report
      2020 Research-status Report
  • [Presentation] Performance as a Constraint: An Improved Wisdom of Crowds Using Performance Regularization (2020)

    • Author(s)
      Jiyi Li, Yasushi Kawase, Yukino Baba, Hisashi Kashima
    • Organizer
      Proceedings of the 29th International Joint Conference on Artificial Intelligence (IJCAI 2020)
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] Crowdsourced Text Sequence Aggregation based on Hybrid Reliability and Representation (2020)

    • Author(s)
      Jiyi Li
    • Organizer
      Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020)
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] AffectI: A Game for Diverse, Reliable, and Efficient Affective Image Annotation (2020)

    • Author(s)
      Xingkun Zuo, Jiyi Li, Qili Zhou, Jianjun Li, Xiaoyang Mao
    • Organizer
      Proceedings of the 28th ACM International Conference on Multimedia (MM 2020)
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] CrowDEA: Multi-view Idea Prioritization with Crowds (2020)

    • Author(s)
      Yukino Baba, Jiyi Li, Hisashi Kashima
    • Organizer
      Proceedings of the Eighth AAAI Conference on Human Computation and Crowdsourcing (HCOMP 2020)
    • Related Report
      2020 Research-status Report
    • Int'l Joint Research
  • [Presentation] A Dataset of Crowdsourced Word Sequences: Collections and Answer Aggregation for Ground Truth Creation (2019)

    • Author(s)
      Jiyi Li, Fumiyo Fukumoto
    • Organizer
      Workshop on Aggregating and analysing crowdsourced annotations for NLP (AnnoNLP 2019, in conjunction with EMNLP-IJCNLP 2019)
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Presentation] Budget Cost Reduction for Label Collection with Confusability based Exploration (2019)

    • Author(s)
      Jiyi Li
    • Organizer
      The 26th International Conference on Neural Information Processing of the Asia-Pacific Neural Network Society (ICONIP 2019)
    • Related Report
      2019 Research-status Report
    • Int'l Joint Research
  • [Remarks] Crowdsourced Triplet Similarity Comparison Dataset (CrowdTSC2021)

    • URL

      https://github.com/garfieldpigljy/CrowdTSC2021

    • Related Report
      2021 Annual Research Report
  • [Remarks] Our team's papers and publicly available datasets on crowdsourcing research

    • URL

      https://github.com/garfieldpigljy/ljycrowd

    • Related Report
      2021 Annual Research Report
  • [Remarks] Crowdsourced Word Sequence Aggregation 2019

    • URL

      https://github.com/garfieldpigljy/CrowdWSA2019

    • Related Report
      2019 Research-status Report

Published: 2019-04-18   Modified: 2023-01-30  
