Knowledge discovery and data mining generally discovers implicit, previously unknown, and useful knowledge from data. As one of the popular knowledge discovery and data mining tasks, frequent itemset mining, in particular, discovers knowledge in the form of sets of frequently co-occurring items, events, or objects. On the one hand, in many real-life applications, users mine frequent patterns from traditional databases of precise data, in which users know certainly the presence of items in transactions. On the other hand, in many other real-life applications, users mine frequent itemsets from probabilistic sets of uncertain data, in which users are uncertain about the likelihood of the presence of items in transactions. Each item in these probabilistic sets of uncertain data is often associated with an existential probability expressing the likelihood of its presence in that transaction. To mine frequent itemsets from these probabilistic datasets, many existing algorithms capture lots of information to compute expected support. To reduce the amount of space required, algorithms capture some but not all information in computing or approximating expected support. The tradeoff is that the upper bounds to expected support may not be tight. In this paper, we examine several upper bounds and recommend to the user which ones consume less space while providing good approximation to expected support of frequent itemsets in mining probabilistic sets of uncertain data.

Approximation to expected support of frequent itemsets in mining probabilistic sets of uncertain data

CUZZOCREA, Alfredo Massimiliano;
2015

Abstract

Knowledge discovery and data mining generally discovers implicit, previously unknown, and useful knowledge from data. As one of the popular knowledge discovery and data mining tasks, frequent itemset mining, in particular, discovers knowledge in the form of sets of frequently co-occurring items, events, or objects. On the one hand, in many real-life applications, users mine frequent patterns from traditional databases of precise data, in which users know certainly the presence of items in transactions. On the other hand, in many other real-life applications, users mine frequent itemsets from probabilistic sets of uncertain data, in which users are uncertain about the likelihood of the presence of items in transactions. Each item in these probabilistic sets of uncertain data is often associated with an existential probability expressing the likelihood of its presence in that transaction. To mine frequent itemsets from these probabilistic datasets, many existing algorithms capture lots of information to compute expected support. To reduce the amount of space required, algorithms capture some but not all information in computing or approximating expected support. The tradeoff is that the upper bounds to expected support may not be tight. In this paper, we examine several upper bounds and recommend to the user which ones consume less space while providing good approximation to expected support of frequent itemsets in mining probabilistic sets of uncertain data.
http://www.sciencedirect.com/science/journal/18770509/60
File in questo prodotto:
File Dimensione Formato  
Approximation to expected support of frequent itemsets in mining probabilistic sets of uncertain data.pdf

accesso aperto

Descrizione: articolo principale
Tipologia: Documento in Versione Editoriale
Licenza: Digital Rights Management non definito
Dimensione 337.3 kB
Formato Adobe PDF
337.3 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11368/2872368
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 2
social impact