Improving co-authorship network structure by combining different data sources: issues and practical considerations

De Stefano, Domenico; Zaccarin, Susanna
2015-01-01

Abstract

The bibliographic archives used to study scientific collaboration can affect both the derived bibliometric indicators and the structure of the co-authorship network. Indeed, the most widely used international databases may not cover all kinds of publications, especially in disciplines whose scientific production has a more national orientation. In such cases, integrating databases of high-impact journals with specialized and local bibliographic archives may be the best compromise for achieving good coverage of the whole research output of the scientists working in a given field. Two main challenges must be addressed to carry out this integration: 1) how to combine information by identifying and linking duplicate records, i.e. record linkage, and 2) how to deal with author name disambiguation, i.e. synonyms (one author recorded under different name variants) and polysemes (different authors sharing the same name). In this study, we discuss the main issues and practical considerations that arise when these two tasks are tackled in order to obtain higher-quality co-authorship data for network analysis. Specifically, the bibliographic archives used in De Stefano et al. [2013] are joined to obtain a unified co-authorship network, based on both the top international and the nationally oriented scientific production of Italian academic statisticians. To this end, in the first step a semi-automatic method was adopted to merge three bibliographic archives. Owing to the lack of training data, in the second step a modified version of the techniques described in Strotmann et al. [2009] was used for author name disambiguation, with promising results. Once we have assessed how well the two procedures perform in producing a high-quality co-authorship network, further statistical analyses will be devoted to identifying the co-authorship characteristics of the emerging groups of statisticians under analysis.
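As a rough illustration of the two steps named in the abstract, the following Python sketch merges toy records from two archives and then builds a weighted co-authorship edge list. It is not the procedure used by the authors: the record fields (`title`, `year`, `authors`), the similarity thresholds `accept` and `review`, the `Surname, Given` name format, and every helper function are hypothetical choices for illustration, and the records are invented.

```python
import re
import unicodedata
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Strip accents, lowercase, replace punctuation with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def title_similarity(a, b):
    """Similarity in [0, 1] between two records; 0 if publication years differ."""
    if a["year"] != b["year"]:
        return 0.0
    return SequenceMatcher(None, normalize(a["title"]), normalize(b["title"])).ratio()


def link_records(archives, accept=0.95, review=0.85):
    """Merge archives, dropping likely duplicates (hypothetical thresholds).

    Scores >= accept are treated as the same publication; scores in
    [review, accept) are kept but flagged for manual checking.
    """
    merged, to_review = [], []
    for archive in archives:
        for rec in archive:
            best, best_score = None, 0.0
            for kept in merged:
                score = title_similarity(rec, kept)
                if score > best_score:
                    best, best_score = kept, score
            if best_score >= accept:
                continue  # duplicate: keep the copy already merged
            if best_score >= review:
                to_review.append((rec, best))  # left for a human to decide
            merged.append(rec)
    return merged, to_review


def author_key(name):
    """Crude key, assuming 'Surname, Given' format: 'Rossi, Mario' -> 'rossi m'."""
    surname, _, given = name.partition(",")
    initial = normalize(given)[:1]
    return f"{normalize(surname).replace(' ', '')} {initial}".strip()


def coauthorship_edges(records):
    """Weighted undirected edges: number of merged records each author pair co-signed."""
    weights = defaultdict(int)
    for rec in records:
        keys = sorted({author_key(n) for n in rec["authors"]})
        for u, v in combinations(keys, 2):
            weights[(u, v)] += 1
    return dict(weights)


# Toy, invented records from two hypothetical archives.
archive_a = [
    {"title": "A study of network sampling", "year": 2010,
     "authors": ["Rossi, Mario", "Bianchi, Anna"]},
]
archive_b = [
    {"title": "A Study of Network Sampling.", "year": 2010,
     "authors": ["Rossi, M.", "Bianchi, A."]},
    {"title": "Survey weighting methods", "year": 2011,
     "authors": ["Rossi, M.", "Verdi, Luca"]},
]

merged, to_review = link_records([archive_a, archive_b])
edges = coauthorship_edges(merged)
print(len(merged), edges)
```

Flagged pairs in `to_review` would be inspected by hand, which is one way a record-linkage step can be made semi-automatic. The surname-plus-initial key merges simple name variants (synonyms) such as "Rossi, Mario" and "Rossi, M.", but it would wrongly conflate different authors who share a surname and first initial (polysemes); separating those needs additional evidence, for example co-authors or affiliations, which this sketch does not attempt.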
Files in this item:
De Stefano_Improving co-authorship network structure by combining different data sources.pdf

Closed access

Description: abstract
Type: Publisher's version
License: Publisher's copyright
Size: 68.71 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/2903979