Web-Based Data Collection and Quality Issues in Co-Authorship Network Analysis
Domenico De Stefano;Susanna Zaccarin
2019-01-01
Abstract
In this contribution we discuss data quality issues that arise when web scraping techniques are applied to the Cineca IRIS platform to derive co-authorship data among Italian university scholars. First, a semi-automatic tool is adopted to retrieve metadata from the platform; then a network-based approach is used to deal with author name disambiguation. This combined procedure is used to derive the co-authorship relations among Italian academic statisticians on the basis of the publications they entered in the IRIS system up to 2017.

Files in this product:
File | Description | Size | Format |
---|---|---|---|
Zaccarin_Web-Based Data Collection and Quality Issues.pdf (closed access) | Contribution with title page and table of contents of the volume. Type: publisher's version. License: publisher's copyright. | 1.53 MB | Adobe PDF |