
Quality issues in co-authorship data of a national scientific community

De Stefano, Domenico; Vitale, Maria Prosperina; Zaccarin, Susanna
2023-01-01

Abstract

One stream of research on co-authorship, used as a proxy for scholars’ collaborative behavior, focuses on the members of a scientific community defined by discipline and/or country, for whom co-authorship data must be retrieved. Recent literature has pointed out that international digital libraries cover only part of scholars’ scientific production and under-cover the scholars in the community. Bias in retrieving the co-authorship data of the community of interest can affect network construction and network measures in several ways, yielding a partial picture of actual collaboration among scholars in writing papers. In this contribution, we collected bibliographic records of Italian academic statisticians from an online platform (IRIS) available at most universities. Although the platform guarantees a high coverage rate of our population and its scientific production, some data quality issues must be addressed. We therefore propose a web scraping procedure based on a semi-automatic tool to retrieve publication metadata, together with data management tools to detect duplicate records and to reconcile authors. The results show that collaboration is an active and increasing practice among Italian academic statisticians, with some differences by gender, academic rank, and university location. The heuristic procedure for addressing data quality issues in the IRIS platform can serve as a working case report to be adapted to other bibliographic archives with similar characteristics.
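The data management steps named in the abstract (detecting duplicate records, reconciling authors, and deriving co-authorship ties) can be illustrated with a minimal Python sketch. This is not the authors' procedure: the record layout, the matching rule (normalized title plus year), the author key (surname plus first initial), and the sample records are all assumptions made up for illustration.

```python
import re
import unicodedata
from collections import Counter
from itertools import combinations


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def author_key(name):
    """Map 'Surname, Given names' to a 'surname initial' key (assumed rule)."""
    surname, _, given = name.partition(",")
    given = normalize(given)
    initial = given[0] if given else ""
    return (normalize(surname) + " " + initial).strip()


def deduplicate(records):
    """Keep one entry per (normalized title, year) and merge the author sets."""
    merged = {}
    for rec in records:
        key = (normalize(rec["title"]), rec["year"])
        authors = {author_key(a) for a in rec["authors"]}
        merged.setdefault(key, set()).update(authors)
    return merged


def coauthor_edges(merged):
    """Count how many distinct publications each pair of authors shares."""
    edges = Counter()
    for authors in merged.values():
        for pair in combinations(sorted(authors), 2):
            edges[pair] += 1
    return edges


# Hypothetical records, invented for illustration only.
records = [
    {"title": "A Study of Networks", "year": 2020,
     "authors": ["Rossi, Maria", "Bianchi, Luca"]},
    {"title": "A study of networks.", "year": 2020,
     "authors": ["Rossi, M.", "Bianchi, L."]},
    {"title": "Another Paper", "year": 2021,
     "authors": ["Rossi, Maria", "Verdi, Anna"]},
]
merged = deduplicate(records)
edges = coauthor_edges(merged)
```

A surname-plus-initial key is deliberately crude: it merges spelling variants of one author but can also conflate distinct scholars who share a surname and initial, which is one reason a semi-automatic procedure with manual checks may be preferable to a fully automatic one.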

Files in this item:
quality-issues-in-co-authorship-data-of-a-national-scientific-community.pdf (open access; type: publisher's version; license: Creative Commons; format: Adobe PDF; size: 457.57 kB)

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/3038630