The probabilistic random forest applied to the selection of quasar candidates in the QUBRICS survey

Guarneri F.; Cristiani S.; Cupani G.; D'Odorico V.
2021-01-01

Abstract

The number of known, bright (i < 18), high-redshift (z > 2.5) QSOs in the Southern hemisphere is considerably lower than the corresponding number in the Northern hemisphere, owing to the lack of multiwavelength surveys at δ < 0. Recent works, such as the QUBRICS survey, have successfully identified new high-redshift QSOs in the South by applying a machine-learning approach to a large photometric data set. Building on the success of QUBRICS, we present a new QSO selection method based on the Probabilistic Random Forest (PRF), an improvement on the classic Random Forest algorithm. The PRF takes measurement errors into account by treating input data as probability distribution functions, which yields better accuracy and a more robust predictive model. We applied the PRF to the same photometric data set used in QUBRICS, based on the SkyMapper DR1, Gaia DR2, 2MASS, WISE, and GALEX databases. The resulting candidate list includes 626 sources with i < 18. We estimate a completeness of ~84 per cent and a purity of ~78 per cent for the proposed algorithm on the test data sets. Preliminary spectroscopic campaigns allowed us to observe 41 candidates, of which 29 turned out to be z > 2.5 QSOs. The performance of the PRF, currently comparable to that of the canonical correlation analysis (CCA) used in QUBRICS, is expected to improve as the number of high-z QSOs available for the training sample grows. The results are nevertheless already promising, even though this is one of the first applications of the method in an astrophysical context.
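As a rough illustration of two ideas mentioned in the abstract, the sketch below shows (a) how a feature treated as a probability distribution can be split probabilistically at a decision-tree node, and (b) how completeness and purity are computed from classification counts. It is not the authors' implementation: the Gaussian error model, the function names `prob_right` and `completeness_purity`, and all numerical inputs except the 29-of-41 spectroscopic result are assumptions made for the example.

```python
import math


def prob_right(value, sigma, threshold):
    """Probability that a feature with Gaussian uncertainty exceeds a split threshold.

    Assumes the measurement is distributed as N(value, sigma^2). A classic
    Random Forest would send the object fully left or right; a probabilistic
    split instead weights both branches by this probability.
    """
    if sigma <= 0:
        # No uncertainty: fall back to the deterministic split.
        return 1.0 if value > threshold else 0.0
    z = (threshold - value) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 - math.erf(z))


def completeness_purity(true_pos, false_neg, false_pos):
    """Completeness (recall) and purity (precision) from classification counts."""
    completeness = true_pos / (true_pos + false_neg)
    purity = true_pos / (true_pos + false_pos)
    return completeness, purity


# Hypothetical magnitude measurement near a hypothetical split at 18.0.
p_right = prob_right(value=17.8, sigma=0.2, threshold=18.0)
print(f"P(feature > threshold) = {p_right:.3f}")

# Made-up counts, chosen only to exercise the function (not from the paper).
completeness, purity = completeness_purity(true_pos=8, false_neg=2, false_pos=2)
print(f"completeness = {completeness:.2f}, purity = {purity:.2f}")

# Success fraction of the preliminary spectroscopic follow-up quoted in the abstract.
success_fraction = 29 / 41
print(f"confirmed z > 2.5 QSOs: {success_fraction:.1%}")
```

The spectroscopic success fraction of about 71 per cent (29 of 41) is computed from a small observed subset, so it is not directly comparable to the ~78 per cent purity estimated on the test data sets.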
Use this identifier to cite or link to this document: https://hdl.handle.net/11368/3057923