Cluster based oversampling for imbalanced learning

Gioia Di Credico; Nicola Torelli

2022-01-01

Abstract

Oversampling is a widespread remedy for class imbalance in classification problems. Some oversampling techniques generate new minority-class cases that are similar to the observed ones. ROSE (Random OverSampling Examples) is an algorithm that generates new data in both the minority and the majority class by combining ideas from kernel density estimation and bootstrap resampling. In this paper, we show that a new strategy that couples density-based clustering methods with ROSE can improve the performance of supervised classification methods on imbalanced data. Evidence from simulation experiments shows that the new procedure is promising and solves some issues related to the use of ROSE.
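The abstract describes ROSE only at a high level (kernel density estimation plus bootstrap resampling, applied to both classes) and does not say how the clustering step is coupled with it. The Python sketch below illustrates both ideas under stated assumptions. `rose_like` picks a class label (the minority class with probability `p`), bootstraps a seed point from that class, and perturbs it with Gaussian kernel noise using an assumed rule-of-thumb bandwidth. `cluster_rose_minority` is one hypothetical way to couple a density-based clustering method (a plain DBSCAN here) with the same smoothed bootstrap, using cluster-specific bandwidths. All function names, the bandwidth rule, and the choice of DBSCAN are illustrative assumptions, not the authors' procedure.

```python
import math
import random


def bandwidths(points):
    # Assumed rule-of-thumb Gaussian bandwidth for each coordinate j:
    # h_j = (4 / ((d + 2) * n)) ** (1 / (d + 4)) * sd_j
    n, d = len(points), len(points[0])
    factor = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
    hs = []
    for j in range(d):
        col = [p[j] for p in points]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / (n - 1) if n > 1 else 0.0
        hs.append(factor * math.sqrt(var))
    return hs


def smoothed_draw(points, hs, rng):
    # Bootstrap step: pick one observed point at random (with replacement),
    # then add independent Gaussian kernel noise to each coordinate.
    seed = rng.choice(points)
    return [x + rng.gauss(0.0, h) for x, h in zip(seed, hs)]


def rose_like(X, y, n_new, p=0.5, minority=1, seed=0):
    # ROSE-style sample over both classes (binary labels assumed): the label is
    # the minority class with probability p, then a new point is drawn from the
    # kernel density estimate of that class.
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    majority = next(c for c in by_class if c != minority)
    hs = {c: bandwidths(pts) for c, pts in by_class.items()}
    X_new, y_new = [], []
    for _ in range(n_new):
        c = minority if rng.random() < p else majority
        X_new.append(smoothed_draw(by_class[c], hs[c], rng))
        y_new.append(c)
    return X_new, y_new


def dbscan(points, eps, min_pts):
    # Plain O(n^2) DBSCAN; returns one cluster id per point, -1 for noise.
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cid = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbours(i)
        if len(nb) < min_pts:
            labels[i] = -1  # may later be relabelled as a border point
            continue
        cid += 1
        labels[i] = cid
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            nb_j = neighbours(j)
            if len(nb_j) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nb_j)
    return labels


def cluster_rose_minority(points, n_new, eps, min_pts, seed=0):
    # Hypothetical coupling: cluster the minority class first, then run the
    # smoothed bootstrap inside each cluster with cluster-specific bandwidths.
    rng = random.Random(seed)
    labels = dbscan(points, eps, min_pts)
    clusters = {}
    for pt, lab in zip(points, labels):
        if lab != -1:  # noise points are not used as seeds in this sketch
            clusters.setdefault(lab, []).append(pt)
    if not clusters:  # no cluster found: fall back to a single group
        clusters = {0: list(points)}
    hs = {k: bandwidths(v) for k, v in clusters.items()}
    keys = list(clusters)
    weights = [len(clusters[k]) for k in keys]  # clusters picked by size
    out = []
    for _ in range(n_new):
        k = rng.choices(keys, weights=weights)[0]
        out.append(smoothed_draw(clusters[k], hs[k], rng))
    return out, labels
```

Estimating bandwidths per cluster keeps the kernel noise on the scale of each cluster rather than of the whole minority class, which is one plausible reason to cluster before resampling. The `eps` and `min_pts` values passed to `dbscan` are illustrative and would need tuning on real data.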


Year: 2022
ISBN: 9788891932310
URL: https://it.pearson.com/content/dam/region-core/italy/pearson-italy/pdf/Docenti/Università/Sis-2022-4c-low.pdf
Publication type: 4.1 Contribution in conference proceedings (Proceeding)

Files in this item:

Di Credico_Cluster based oversampling for imbalanced learning.pdf (closed access)
Description: contribution with title page and volume index
Type: publisher's version
License: Digital Rights Management not defined
Size: 2.3 MB
Format: Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/3030898
