Data balancing for regression using imbalanced dataset

Research Project

Project/Area Number	21K21297
Research Category	Grant-in-Aid for Research Activity Start-up
Allocation Type	Multi-year Fund
Review Section	1001:Information science, computer engineering, and related fields
Research Institution	Kyoto Tachibana University
Principal Investigator	Yoshikawa Hiroki Kyoto Tachibana University, Faculty of Engineering, Assistant Professor R (10905350)
Project Period (FY)	2021-08-30 – 2023-03-31
Project Status	Completed (Fiscal Year 2022)
Budget Amount	¥3,120,000 (Direct Cost: ¥2,400,000, Indirect Cost: ¥720,000); Fiscal Year 2022: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000); Fiscal Year 2021: ¥1,560,000 (Direct Cost: ¥1,200,000, Indirect Cost: ¥360,000)
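Keywords	machine learning / imbalanced data / data balancing / classification problem / regression problem / generative model / loss function / generative adversarial network / high-dimensional data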
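Outline of Research at the Start	As complex machine learning methods are increasingly packaged into ready-to-use libraries, users find it hard to notice problems that occur without any runtime error. Bias in estimated values caused by training on imbalanced data is one such problem. When imbalanced data are used for training, the resulting estimator may tend to predict the majority even for data that actually belong to the minority. This research targets regression problems, which estimate continuous values, and aims to solve the problem of biased estimates by applying data balancing, a technique generally used for classification problems. Data balancing evens out the distribution of the data through data generation and similar means. By using generative models based on deep learning, this research proposes methods that can handle high-dimensional data such as time series and images.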
Outline of Final Research Achievements	We proposed two methods that address bias in estimated values, one for regression problems and one for classification problems. The first is a data balancing technique for regression problems that use time series data as explanatory variables. It generates new samples by interpolating between the time series of two samples drawn from the dataset. Performance evaluation showed that it can improve estimation accuracy for minority data while limiting the increase in mean absolute error. The second is a data balancing technique for classification problems based on conditional generative adversarial networks. Performance evaluation on open datasets showed that the proposed method trains a well-balanced estimator.
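Academic Significance and Societal Importance of the Research Achievements	The societal significance of this research lies in proposing methods that reduce the bias in estimated values caused by imbalanced data, which users tend to overlook, and in making it possible to combine and apply these methods with a variety of machine learning techniques. In recent years in particular, sensing devices have become smaller and cheaper, and methods for applying machine learning to fields such as science and medicine have been developed, so machine learning is expected to be used more and more in the mobile and ubiquitous computing fields. The applicant believes that this research will play a major role in such applications.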
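
The first method in the Outline of Final Research Achievements generates new samples by interpolating the time series of two samples taken from the dataset. The following is a minimal sketch of that idea, not the project's implementation. The record does not say how pairs are selected or how the interpolation weight is chosen, so the linear blend, the uniform random weight, the inverse-bin-frequency pair sampling, and the names `interpolate_pair` and `balance_by_interpolation` are all assumptions made for illustration.

```python
import numpy as np


def interpolate_pair(x_a, y_a, x_b, y_b, lam):
    """Linearly blend two (time series, target) samples with weight lam in [0, 1]."""
    x_new = (1.0 - lam) * x_a + lam * x_b
    y_new = (1.0 - lam) * y_a + lam * y_b
    return x_new, y_new


def balance_by_interpolation(X, y, n_new, n_bins=10, seed=0):
    """Append n_new synthetic samples, favouring sparsely populated target ranges.

    X has shape (n_samples, n_timesteps); y has shape (n_samples,).
    Weighting pairs by inverse bin frequency is an assumption, not the
    selection rule used in the project.
    """
    rng = np.random.default_rng(seed)
    counts, edges = np.histogram(y, bins=n_bins)
    # Interior edges map each target to a bin index in 0..n_bins-1.
    bin_idx = np.digitize(y, edges[1:-1])
    weights = 1.0 / np.maximum(counts[bin_idx], 1)
    weights /= weights.sum()

    X_new = np.empty((n_new, X.shape[1]))
    y_new = np.empty(n_new)
    for i in range(n_new):
        # Draw two distinct samples; rare target values are more likely.
        a, b = rng.choice(len(y), size=2, replace=False, p=weights)
        lam = rng.uniform()  # assumed: uniform interpolation weight
        X_new[i], y_new[i] = interpolate_pair(X[a], y[a], X[b], y[b], lam)
    return np.vstack([X, X_new]), np.concatenate([y, y_new])
```

Because the target is interpolated together with the time series, every generated target lies between the targets of its two parent samples. Whether that fills sparse target ranges well enough depends on how pairs are chosen, which is why the sampling rule is flagged above as an assumption.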

Report

(3 results)

2022 Annual Research Report
Final Research Report (PDF)
2021 Research-status Report

Research Products
(4 results)


Presentation (4 results) (of which Int'l Joint Research: 3 results)

[Presentation] Privacy-preserving data augmentation for thermal sensation dataset based on variational autoencoder (2022)
- Author(s)
  Hiroki Yoshikawa, Akira Uchiyama, Teruo Higashino
- Organizer
  The 9th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation
- Related Report
  2022 Annual Research Report
- Int'l Joint Research
[Presentation] Data Balancing for Thermal Comfort Datasets Using Conditional Wasserstein GAN with a Weighted Loss Function (2021)
- Author(s)
  Hiroki Yoshikawa, Akira Uchiyama, Teruo Higashino
- Organizer
  The 8th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation (BuildSys 2021) Workshops
- Related Report
  2021 Research-status Report
- Int'l Joint Research
[Presentation] Time-Series Physiological Data Balancing for Regression (2021)
- Author(s)
  Hiroki Yoshikawa, Akira Uchiyama, Teruo Higashino
- Organizer
  The 2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA 2021)
- Related Report
  2021 Research-status Report
- Int'l Joint Research
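[Presentation] A Study of Loss Functions for Regression Problems Using Imbalanced Datasets (2021)
- Author(s)
  Hiroki Yoshikawa, Akira Uchiyama, Teruo Higashino
- Organizer
  The 99th Research Presentation Meeting of the IPSJ Special Interest Group MBL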
- Related Report
  2021 Research-status Report
