In this contribution we discuss data quality issues related to the application of web scraping techniques to the Cineca IRIS platform to derive co-authorship data among Italian university scholars. First, a semi-automatic tool is adopted to retrieve metadata from the platform, then a disambinguation network-based approach is considered to deal with author name disambiguation. This combined procedure is used to derive the co-authorship relations among Italian academic statisticians on the basis of the publications they inserted in the IRIS system until 2017.

Web-Based Data Collection and Quality Issues in Co-Authorship Network Analysis

Domenico De Stefano;Susanna Zaccarin
2019-01-01

Abstract

In this contribution we discuss data quality issues related to the application of web scraping techniques to the Cineca IRIS platform to derive co-authorship data among Italian university scholars. First, a semi-automatic tool is adopted to retrieve metadata from the platform, then a disambinguation network-based approach is considered to deal with author name disambiguation. This combined procedure is used to derive the co-authorship relations among Italian academic statisticians on the basis of the publications they inserted in the IRIS system until 2017.
File in questo prodotto:
File Dimensione Formato  
Zaccarin_Web-Based Data Collection and Quality Issues.pdf

Accesso chiuso

Descrizione: contributo con frontespizio e indice del volume
Tipologia: Documento in Versione Editoriale
Licenza: Copyright Editore
Dimensione 1.53 MB
Formato Adobe PDF
1.53 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11368/2946992
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact