
Development of evaluation techniques for machine learning systems for software bug prediction

Research Project

Project/Area Number 20K11749
Research Category

Grant-in-Aid for Scientific Research (C)

Allocation Type Multi-year Fund
Section General
Review Section Basic Section 60050: Software-related
Research Institution Okayama University

Principal Investigator

Monden Akito  Okayama University, Faculty of Environmental, Life, Natural Science and Technology, Professor (80311786)

Project Period (FY) 2020-04-01 – 2024-03-31
Project Status Completed (Fiscal Year 2023)
Budget Amount
¥4,290,000 (Direct Cost: ¥3,300,000, Indirect Cost: ¥990,000)
Fiscal Year 2023: ¥1,170,000 (Direct Cost: ¥900,000, Indirect Cost: ¥270,000)
Fiscal Year 2022: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2021: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
Fiscal Year 2020: ¥1,040,000 (Direct Cost: ¥800,000, Indirect Cost: ¥240,000)
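Keywords software development data / software bug prediction / software metrics / machine learning / evaluation measures / generative AI / AI chatbot / ChatGPT / data quality / data inconsistency / data generation / data quality evaluation / cross-validation / machine learning system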
Outline of Research at the Start

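In evaluating machine learning systems, it is important to (1) evaluate the quality of the training data on which learning is based and (2) evaluate the performance of the system's output for diverse inputs. For (1), this research aims to develop a method for evaluating training data that focuses on "inconsistencies" contained in the training data, which have received little attention so far. For (2), it aims to develop a method that applies the MAHAKIL oversampling method and confidential-data mimicking techniques as an extension of cross-validation, which is commonly used to evaluate machine learning systems.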

Outline of Final Research Achievements

In the evaluation of machine learning systems, it is important to (1) evaluate the quality of training data and (2) evaluate the performance of system output. For (1), we defined a data inconsistency measure, Similar Case Inconsistency Level (SCIL). Through evaluation experiments, we showed that the less inconsistent the dataset is, the better the prediction performance of the resulting machine learning model tends to be. For (2), we defined the expected values of performance measures for a two-class classification problem based on the neg/pos ratio of the dataset. Application experiments showed that there are cases in which conventional evaluation measures cannot correctly evaluate the prediction performance, indicating the usefulness of the proposed measures.
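
As a rough illustration of why the neg/pos ratio matters for (2), the sketch below computes chance-level scores for a random predictor whose output is independent of the true label. It is not the project's definition of the proposed measures: the function name `chance_level`, the parameter `predict_pos_rate`, and the choice to compute each score from expected confusion-matrix counts are all assumptions made for this example.

```python
from fractions import Fraction


def chance_level(neg: int, pos: int, predict_pos_rate: Fraction) -> dict:
    """Scores computed from the expected confusion matrix of a random predictor.

    Hypothetical helper: the predictor labels each module "positive" with
    probability predict_pos_rate, independently of the true label.
    Requires pos > 0 and 0 < predict_pos_rate <= 1.
    """
    q = Fraction(predict_pos_rate)
    n = neg + pos
    # Expected counts of the four confusion-matrix cells.
    tp = pos * q
    fp = neg * q
    fn = pos * (1 - q)
    tn = neg * (1 - q)
    precision = tp / (tp + fp)  # reduces to pos / (neg + pos)
    recall = tp / (tp + fn)     # reduces to q
    accuracy = (tp + tn) / n
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Same coin-flip predictor, two datasets with different neg/pos ratios.
imbalanced = chance_level(neg=900, pos=100, predict_pos_rate=Fraction(1, 2))
balanced = chance_level(neg=500, pos=500, predict_pos_rate=Fraction(1, 2))
print(imbalanced["precision"], imbalanced["f1"])  # 1/10 1/6
print(balanced["precision"], balanced["f1"])      # 1/2 1/2
```

Under these assumptions the same uninformative predictor scores an F1 of 1/6 on the imbalanced dataset and 1/2 on the balanced one, so raw scores from datasets with different neg/pos ratios are not directly comparable without a ratio-dependent baseline.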

Academic Significance and Societal Importance of the Research Achievements

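The results of this research make it possible, for machine learning systems that target software development data, to evaluate training data in advance and to evaluate performance more appropriately, and are expected to contribute to the further development of the software engineering field. The proposed methods are also expected to be applicable to various other fields that use machine learning.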

Report

(5 results)
  • 2023 Annual Research Report   Final Research Report ( PDF )
  • 2022 Research-status Report
  • 2021 Research-status Report
  • 2020 Research-status Report
  • Research Products

    (12 results)

Journal Article (6 results) (of which Peer Reviewed: 5 results, Open Access: 3 results) Presentation (6 results) (of which Int'l Joint Research: 1 result)

  • [Journal Article] Outlier Removal Based on Third-Party Data in Fault-prone Module Prediction (2023)

    • Author(s)
      Kinari Nishiura, Akito Monden
    • Journal Title

      Computer Software

      Volume: 40 Issue: 4 Pages: 4_22-4_28

    • DOI

      10.11309/jssst.40.4_22

    • ISSN
      0289-6540
    • Year and Date
      2023-10-25
    • Related Report
      2023 Annual Research Report
    • Peer Reviewed / Open Access
  • [Journal Article] Improvement and Evaluation of Data Consistency Metric CIL for Software Engineering Data Sets (2022)

    • Author(s)
      Maohua Gan, Zeynep Yucel, Akito Monden
    • Journal Title

      IEEE Access

      Volume: 10 Pages: 70053-70067

    • DOI

      10.1109/access.2022.3188246

    • Related Report
      2022 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Neg/pos-Normalized Accuracy Measures for Software Defect Prediction (2022)

    • Author(s)
      Maohua Gan, Zeynep Yucel, Akito Monden
    • Journal Title

      IEEE Access

      Volume: 10 Pages: 134580-134591

    • DOI

      10.1109/access.2022.3232144

    • Related Report
      2022 Research-status Report
    • Peer Reviewed / Open Access
  • [Journal Article] Application of auto-sklearn to Software Development Effort Prediction (2021)

    • Author(s)
      田中 和也, Akito Monden, Zeynep Yucel
    • Journal Title

      Computer Software

      Volume: 38

    • NAID

      130008132029

    • Related Report
      2021 Research-status Report
    • Peer Reviewed
  • [Journal Article] Association Metrics Between Two Continuous Variables for Software Project Data (2021)

    • Author(s)
      Takumi Kanehira, Akito Monden, Zeynep Yucel
    • Journal Title

      Proc. 22nd IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing

      Volume: 1 Pages: 1-6

    • Related Report
      2021 Research-status Report
  • [Journal Article] A Novel Approach to Address External Validity Issues in Fault Prediction Using Bandit Algorithms (2021)

    • Author(s)
      Teruki Hayakawa, Masateru Tsunoda, Koji Toda, Keitaro Nakasai, Amjed Tahir, Kwabena Ebo Bennin, Akito Monden, and Kenichi Matsumoto
    • Journal Title

      IEICE Transactions on Information and Systems

      Volume: E104-D Pages: 327-331

    • NAID

      130007979426

    • Related Report
      2020 Research-status Report
    • Peer Reviewed
  • [Presentation] An Attempt to Identify Security Bugs Using BERT (2023)

    • Author(s)
      横山 大貴, Kinari Nishiura, Akito Monden
    • Organizer
      Workshop on Foundations of Software Engineering (FOSE2023)
    • Related Report
      2023 Annual Research Report
  • [Presentation] A Cost-Effectiveness Metric for Association Rule Mining in Software Defect Prediction (2023)

    • Author(s)
      Kinari Nishiura, Takeki Kasagi, Akito Monden
    • Organizer
      2023 Congress in Computer Science, Computer Engineering & Applied Computing (CSCE)
    • Related Report
      2023 Annual Research Report
  • [Presentation] A Dynamic Model Selection Approach to Mitigate the Change of Balance Problem in Cross-Version Bug Prediction (2022)

    • Author(s)
      Hiroshi Demanou, Akito Monden, Masateru Tsunoda
    • Organizer
      10th International Workshop on Quantitative Approaches to Software
    • Related Report
      2022 Research-status Report
    • Int'l Joint Research
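  • [Presentation] Experimental Evaluation of Restoring Software Development Data from Data Fragments (2022)

    • Author(s)
      西脇将樹, Akito Monden, 笹倉万里子, Kinari Nishiura
    • Organizer
      IEICE Technical Committee on Software Science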
    • Related Report
      2021 Research-status Report
  • [Presentation] An Attempt to Restore Software Development Data from Data Fragments (2020)

    • Author(s)
      西脇 将樹, Akito Monden
    • Organizer
      27th Workshop on Foundations of Software Engineering
    • Related Report
      2020 Research-status Report
  • [Presentation] Experimental Evaluation of Data Smoothing in Software Development Effort Prediction (2020)

    • Author(s)
      伊永 健人, Akito Monden
    • Organizer
      27th Workshop on Foundations of Software Engineering
    • Related Report
      2020 Research-status Report

Published: 2020-04-28   Modified: 2025-01-30  
