Cost-Sensitive Sparse Random Machines for Imbalanced High-Dimensional Data

Fei Yu1, Liu Wenbing1
1College of Information and Intelligent, Hunan Agricultural University, Changsha, China, 410128
DOI: https://doi.org/10.71448/bcds2563-3
Published: 30/09/2025
Cite this article as: Fei Yu, Liu Wenbing. Cost-Sensitive Sparse Random Machines for Imbalanced High-Dimensional Data. Bulletin of Computer and Data Sciences, Volume 6 Issue 3. Page: 34-49.

Abstract

Random Machines (RM) are ensemble models that combine bootstrap sampling with support vector machines (SVMs) trained using randomly sampled kernels. Prior work has shown that RM can outperform Random Forests on many classification and regression tasks, but also that performance deteriorates in two practically important regimes: strong class imbalance and high-dimensional, low-sample-size data. In this paper we propose Cost-Sensitive Sparse Random Machines (CS–SRM), an extension of RM designed specifically for these settings. Each base learner in CS–SRM is a cost-sensitive SVM trained on a random feature subspace with either sparse linear or nonlinear kernels, and ensemble weights are derived from imbalance-aware out-of-bag metrics such as Matthews correlation coefficient and \(F_1\)-score. We outline the methodology, describe a simulation and case-study evaluation design, and summarize the types of results such a study would yield. Conceptually, CS–SRM improves minority-class performance and stability in high-dimensional regimes while preserving the flexibility of multi-kernel ensembles.

Keywords: Random Machines, class imbalance, high-dimensional data, cost-sensitive SVM, sparse multi-kernel ensembles
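The ensemble construction summarized above can be sketched in a few lines of Python. This is an illustrative sketch only, assuming a scikit-learn-style implementation: the class name `CSSRMSketch`, the hyperparameters (`n_estimators`, `subspace_frac`, the kernel pool), and the use of `class_weight="balanced"` as the cost-sensitive mechanism are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of a CS-SRM-style ensemble: each base learner is a
# cost-sensitive SVM trained on a bootstrap sample and a random feature
# subspace with a randomly sampled kernel; ensemble weights come from an
# imbalance-aware out-of-bag metric (here, Matthews correlation).
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import matthews_corrcoef

class CSSRMSketch:
    def __init__(self, n_estimators=25, subspace_frac=0.5,
                 kernels=("linear", "rbf", "poly"), random_state=0):
        self.n_estimators = n_estimators
        self.subspace_frac = subspace_frac
        self.kernels = kernels
        self.rng = np.random.default_rng(random_state)

    def fit(self, X, y):
        n, p = X.shape
        k = max(1, int(self.subspace_frac * p))
        self.models_ = []
        for _ in range(self.n_estimators):
            # Bootstrap sample; the out-of-bag rows are held out for weighting.
            boot = self.rng.integers(0, n, size=n)
            oob = np.setdiff1d(np.arange(n), boot)
            # Random feature subspace (sparsity) and randomly sampled kernel.
            feats = self.rng.choice(p, size=k, replace=False)
            kernel = str(self.rng.choice(self.kernels))
            # Cost-sensitive SVM: "balanced" raises the misclassification
            # cost of the minority class in proportion to the imbalance.
            svm = SVC(kernel=kernel, class_weight="balanced")
            svm.fit(X[boot][:, feats], y[boot])
            # Imbalance-aware out-of-bag weight via Matthews correlation.
            mcc = matthews_corrcoef(y[oob], svm.predict(X[oob][:, feats])) \
                if oob.size else 0.0
            self.models_.append((svm, feats, max(mcc, 1e-6)))
        return self

    def predict(self, X):
        # MCC-weighted vote over the base learners.
        votes = {}
        for svm, feats, w in self.models_:
            pred = svm.predict(X[:, feats])
            for c in np.unique(pred):
                votes.setdefault(c, np.zeros(len(X)))
                votes[c] += w * (pred == c)
        classes = sorted(votes)
        scores = np.vstack([votes[c] for c in classes])
        return np.array(classes)[scores.argmax(axis=0)]
```

Weighting by out-of-bag MCC rather than accuracy means a base learner that ignores the minority class receives near-zero weight even when its raw accuracy is high, which is the design point the abstract emphasizes.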




Copyright © 2025 Fei Yu, Liu Wenbing. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
