In this paper we introduce and experimentally assess SemSynX, a novel technique for supporting similarity analysis of XML data via semantic and syntactic heterogeneity/homogeneity detection. Given two XML trees, SemSynX retrieves a list of semantic and syntactic heterogeneity/homogeneity matches of objects (i.e., elements, values, tags, attributes) occurring in certain paths of the trees. A local score that takes into account the path and value similarity is given for each heterogeneity/homogeneity found. A global score that summarizes the number of equal matches as well as the local scores globally is also provided. The proposed technique is highly customizable, and it permits the specification of thresholds for the requested degree of similarity for paths and values as well as for the degree of relevance for path and value matching. It thus makes possible to “adjust” the similarity analysis depending on the nature of the input XML trees. SemSynX has been implemented in terms of a XQuery library, as to enhance interoperability with other XML processing tools. To complete our analytical contributions, a comprehensive experimental assessment and evaluation of SemSynX over several classes of XML documents is provided.

SemSynX: Flexible similarity analysis of XML data via semantic and syntactic heterogeneity/homogeneity detection

CUZZOCREA, Alfredo Massimiliano
2016-01-01

Abstract

In this paper we introduce and experimentally assess SemSynX, a novel technique for supporting similarity analysis of XML data via semantic and syntactic heterogeneity/homogeneity detection. Given two XML trees, SemSynX retrieves a list of semantic and syntactic heterogeneity/homogeneity matches of objects (i.e., elements, values, tags, attributes) occurring in certain paths of the trees. A local score that takes into account the path and value similarity is given for each heterogeneity/homogeneity found. A global score that summarizes the number of equal matches as well as the local scores globally is also provided. The proposed technique is highly customizable, and it permits the specification of thresholds for the requested degree of similarity for paths and values as well as for the degree of relevance for path and value matching. It thus makes possible to “adjust” the similarity analysis depending on the nature of the input XML trees. SemSynX has been implemented in terms of a XQuery library, as to enhance interoperability with other XML processing tools. To complete our analytical contributions, a comprehensive experimental assessment and evaluation of SemSynX over several classes of XML documents is provided.
File in questo prodotto:
File Dimensione Formato  
Proceeding article.pdf

Accesso chiuso

Descrizione: proceeding article
Tipologia: Documento in Versione Editoriale
Licenza: Digital Rights Management non definito
Dimensione 5.66 MB
Formato Adobe PDF
5.66 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
Front cover - table of contents.pdf

Accesso chiuso

Descrizione: Front cover proceedings - Table of contents
Tipologia: Documento in Versione Editoriale
Licenza: Digital Rights Management non definito
Dimensione 5.31 MB
Formato Adobe PDF
5.31 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11368/2898318
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? 0
social impact