Resampling is a class of statistical techniques (including bootstrapping) that estimate the sampling distribution of an estimator by approximating sampling from the true underlying population. In general, observations are drawn with replacement from a given dataset S of n observations in order to generate B pseudosamples S*, generally of size n. The statistic of interest θ is evaluated on each of the B pseudosamples, and the resulting values, denoted θ*, are used to approximate the sampling distribution of θ. Recently, the bootstrap has also been applied successfully in the social network framework (Borgatti and Snijders, 1999). In this context the bootstrap is mainly used to obtain the standard error of the estimator of a given network statistic η and to compute the related confidence intervals. In this contribution, we start from the consideration that, in their classical form, resampling techniques, and the bootstrap in particular, are designed to sample from independent and identically distributed (i.i.d.) data (Efron, 1979). This assumption often does not hold for relational data. The i.i.d. assumption fails for networks because of two characteristics common to many real networks: i) a skewed degree distribution; ii) autocorrelation. Characteristic i) occurs when some actors have a very high concentration of links compared to the others, whereas characteristic ii) can take several forms, for instance when two actors are connected because a third actor is tied to each of them. This complex structure causes problems for i.i.d. resampling techniques, mainly because these features decrease the effective observed sample size and increase the variance of the parameters estimated from network data (Jensen and Neville, 2005).
Several resampling procedures have been developed to deal with dependent data (Lahiri, 2003; Bühlmann, 2002; Carlstein et al., 1998), but very few of them have been used in the relational data framework (Eldardiry and Neville, 2008). In this paper we present a procedure for resampling from a network that takes into account both the presence of highly connected nodes and the preservation of the dependency structure of the network. The proposed resampling approach makes it possible to sample from a global structure that varies while the local dependency structure is preserved.
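As an illustration of the classical scheme summarized in the abstract (B pseudosamples of size n drawn with replacement, the statistic evaluated on each, and the replicates θ* used for a standard error and a confidence interval), a minimal Python sketch might look like the following. It is not the procedure proposed in the paper: it treats the observations as i.i.d., which is exactly the assumption the abstract argues is violated by network data. The function names, the example degree values, the choice of the mean as the statistic, and the percentile interval are all assumptions made for the illustration.

```python
import random
import statistics


def bootstrap(sample, statistic, B=1000, seed=0):
    """Classical i.i.d. bootstrap: return B replicates of `statistic`.

    Each replicate is computed on a pseudosample of size n drawn with
    replacement from `sample` (n = len(sample)).
    """
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(B):
        # Draw n observations with replacement from the original data.
        pseudosample = [sample[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(pseudosample))
    return replicates


def standard_error(replicates):
    """Bootstrap standard error: standard deviation of the replicates."""
    return statistics.stdev(replicates)


def percentile_interval(replicates, alpha=0.05):
    """Simple percentile interval read off the sorted replicates."""
    ordered = sorted(replicates)
    B = len(ordered)
    lower = ordered[int((alpha / 2) * B)]
    upper = ordered[min(B - 1, int((1 - alpha / 2) * B))]
    return lower, upper


# Hypothetical node degrees with one highly connected actor (assumed data).
degrees = [1, 1, 2, 2, 2, 3, 3, 4, 5, 12]

theta_star = bootstrap(degrees, statistics.mean, B=2000, seed=42)
se = standard_error(theta_star)
low, high = percentile_interval(theta_star)
print(f"SE of mean degree: {se:.3f}, 95% interval: ({low:.2f}, {high:.2f})")
```

Because each draw ignores the ties between actors, a sketch like this would likely understate the variance on real relational data, which is the motivation the abstract gives for a network-aware resampling procedure.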
A Resampling Procedure for the Estimation of Network Parameters
De Stefano, Domenico
2009-01-01