Towards flexible similarity analysis of XML data

CUZZOCREA, Alfredo Massimiliano
2015

Abstract

Supporting similarity analysis of XML data is a major problem in the data fusion research area. Several approaches have been proposed in the literature, but their lack of flexibility remains a hard challenge, especially in modern Cloud Computing environments. Motivated by this, we propose SemSynX, a novel technique for supporting similarity analysis of XML data via semantic and syntactic heterogeneity/homogeneity detection. SemSynX retrieves several similarity scores over input XML documents, thus enabling flexible management and “customization” of similarity tools over XML data. In particular, the proposed technique is highly customizable: it permits the specification of thresholds for the requested degree of similarity for paths and values, as well as for the degree of relevance for path and value matching. Selection of paths and semantics-based comparison of label content are also supported. It thus becomes possible to “adjust” the similarity analysis depending on the nature of the input XML documents.
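To make the idea of threshold-driven comparison concrete, the following is a minimal sketch of how path and value similarity scores with user-set thresholds and relevance weights might be combined. It is not the SemSynX algorithm, which the abstract does not specify. The function names (`leaf_paths`, `ratio`, `similarity`), the parameter names, and the use of `difflib.SequenceMatcher` as the string-similarity measure are all assumptions made for illustration, and the semantics-based comparison of labels is not modelled.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def leaf_paths(xml_text):
    """Map each root-to-leaf label path to the leaf's text value.

    Simplification (assumption): repeated paths overwrite each other.
    """
    out = {}

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        children = list(node)
        if not children:
            out[path] = (node.text or "").strip()
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return out


def ratio(a, b):
    """Syntactic string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a, b).ratio()


def similarity(doc_a, doc_b, path_threshold=0.8, value_threshold=0.8,
               path_weight=0.5, value_weight=0.5):
    """Hypothetical combined score in [0, 1] for two XML strings.

    A path of doc_a counts only if its best match in doc_b reaches
    path_threshold; its value counts only if it reaches value_threshold.
    The weights play the role of relevance for path and value matching.
    """
    paths_a, paths_b = leaf_paths(doc_a), leaf_paths(doc_b)
    total = 0.0
    for path, value in paths_a.items():
        # Best-matching path in the other document.
        best_path, best_score = max(
            ((p, ratio(path, p)) for p in paths_b), key=lambda t: t[1]
        )
        if best_score < path_threshold:
            continue  # paths too dissimilar: contributes nothing
        value_score = ratio(value, paths_b[best_path])
        if value_score < value_threshold:
            value_score = 0.0  # values too dissimilar: only the path counts
        total += path_weight * best_score + value_weight * value_score
    return total / (len(paths_a) * (path_weight + value_weight))
```

Lowering `path_threshold` or shifting weight from `value_weight` to `path_weight` makes the comparison more tolerant of differing content, which is the kind of adjustment to the nature of the input documents that the abstract describes.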
9783319261379
9783319261386
http://link.springer.com/book/10.1007/978-3-319-26138-6/page/4

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11368/2872401