The number of known, bright (i < 18), high-redshift (z > 2.5) QSOs in the Southern hemisphere is considerably lower than the corresponding number in the Northern hemisphere due to the lack of multiwavelength surveys at δ< 0. Recent works, such as the QUBRICS survey, successfully identified new, high-redshift QSOs in the South by means of a machine-learning approach applied on a large photometric data-set. Building on the success of QUBRICS, we present a new QSO selection method based on the Probabilistic Random Forest (PRF), an improvement of the classic Random Forest algorithm. The PRF takes into account measurement errors, treating input data as probability distribution functions: this allows us to obtain better accuracy and a robust predictive model. We applied the PRF to the same photometric data-set used in QUBRICS, based on the SkyMapper DR1, Gaia DR2, 2MASS, WISE, and GALEX databases. The resulting candidate list includes 626 sources with i < 18. We estimate for our proposed algorithm a completeness of ~84 per cent and a purity of ~78 per cent on the test data-sets. Preliminary spectroscopic campaigns allowed us to observe 41 candidates, of which 29 turned out to be z > 2.5 QSOs. The performances of the PRF, currently comparable to those of the CCA, are expected to improve as the number of high-z QSOs available for the training sample grows: results are however already promising, despite this being one of the first applications of this method to an astrophysical context.
The probabilistic random forest applied to the selection of quasar candidates in the QUBRICS survey
Guarneri F.
;Cristiani S.
;Cupani G.;D'Odorico V.
2021-01-01
Abstract
The number of known, bright (i < 18), high-redshift (z > 2.5) QSOs in the Southern hemisphere is considerably lower than the corresponding number in the Northern hemisphere due to the lack of multiwavelength surveys at δ< 0. Recent works, such as the QUBRICS survey, successfully identified new, high-redshift QSOs in the South by means of a machine-learning approach applied on a large photometric data-set. Building on the success of QUBRICS, we present a new QSO selection method based on the Probabilistic Random Forest (PRF), an improvement of the classic Random Forest algorithm. The PRF takes into account measurement errors, treating input data as probability distribution functions: this allows us to obtain better accuracy and a robust predictive model. We applied the PRF to the same photometric data-set used in QUBRICS, based on the SkyMapper DR1, Gaia DR2, 2MASS, WISE, and GALEX databases. The resulting candidate list includes 626 sources with i < 18. We estimate for our proposed algorithm a completeness of ~84 per cent and a purity of ~78 per cent on the test data-sets. Preliminary spectroscopic campaigns allowed us to observe 41 candidates, of which 29 turned out to be z > 2.5 QSOs. The performances of the PRF, currently comparable to those of the CCA, are expected to improve as the number of high-z QSOs available for the training sample grows: results are however already promising, despite this being one of the first applications of this method to an astrophysical context.File | Dimensione | Formato | |
---|---|---|---|
stab1867.pdf
accesso aperto
Tipologia:
Documento in Versione Editoriale
Licenza:
Copyright Editore
Dimensione
1.04 MB
Formato
Adobe PDF
|
1.04 MB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.