A Global Paradigm for Designing Parallel Relational Data Warehouses in Distributed Environments

CUZZOCREA, Alfredo Massimiliano
2014

Abstract

Designing a Parallel Relational Data Warehouse (PRDW) involves a sequence of tasks: (i) choosing the hardware architecture; (ii) fragmenting the data warehouse schema; (iii) allocating the generated fragments; (iv) replicating fragments to ensure high performance; and (v) defining the strategies for load balancing and query processing. The major drawback of this life cycle is that it ignores the inter-dependencies among the sub-problems of PRDW design and relies on heterogeneous metrics to evaluate the “quality” of the final design. In previous research, we introduced an analytical cost model for parallel OLAP query processing in cluster environments. In a second effort, we took into account the inter-dependency between fragmentation and allocation. In this paper, we propose a novel methodology, called F&A&R, that extends these results and defines an approach in which the main PRDW design phases (i.e., fragmentation, allocation, and replication) are performed simultaneously, in a global fashion. In particular, our approach determines whether the fragmentation pattern currently being generated is relevant to the allocation process. An original method for supporting data replication, based on fuzzy k-means clustering, is also proposed and integrated into the overall design framework. Finally, we experimentally assess the performance of F&A&R against a well-known data warehouse benchmark, with very promising results.
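
The abstract does not spell out how fuzzy k-means clustering drives replication, so the following is only a minimal sketch under stated assumptions, not the F&A&R procedure itself. It assumes each fragment is described by a hypothetical access profile (how often each query class touches it), that clusters correspond to processing nodes, and that a fragment is replicated on every node where its fuzzy membership reaches a hypothetical threshold. The names `fuzzy_c_means`, `replica_nodes`, `profiles`, and `threshold` are all illustrative.

```python
import math

def memberships_for(points, centers, m):
    """Fuzzy membership of every point in every cluster (each row sums to 1)."""
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if min(dists) == 0.0:
            # The point sits exactly on one or more centers: share it among those.
            hits = [d == 0.0 for d in dists]
            rows.append([1.0 / sum(hits) if h else 0.0 for h in hits])
        else:
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            rows.append([v / total for v in inv])
    return rows

def fuzzy_c_means(points, k, m=2.0, iters=100, tol=1e-9):
    """Standard fuzzy c-means; the first k points seed the centers (an assumption)."""
    centers = [list(p) for p in points[:k]]
    dim = len(points[0])
    for _ in range(iters):
        u = memberships_for(points, centers, m)
        new_centers = []
        for j in range(k):
            w = [row[j] ** m for row in u]
            wsum = sum(w)
            if wsum == 0.0:
                # No point belongs to this cluster at all: keep the old center.
                new_centers.append(centers[j])
                continue
            new_centers.append(
                [sum(wi * p[d] for wi, p in zip(w, points)) / wsum for d in range(dim)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships_for(points, centers, m)

def replica_nodes(memberships, threshold=0.3):
    """Place each fragment on its best node plus every node above the threshold."""
    placement = []
    for row in memberships:
        best = max(range(len(row)), key=row.__getitem__)
        nodes = {j for j, u in enumerate(row) if u >= threshold}
        nodes.add(best)
        placement.append(sorted(nodes))
    return placement

# Made-up access profiles for fragments F0..F4 over two query classes.
profiles = [(9, 1), (1, 9), (8, 2), (2, 8), (5, 5)]
centers, memberships = fuzzy_c_means(profiles, k=2)
placement = replica_nodes(memberships, threshold=0.3)
for name, nodes in zip(["F0", "F1", "F2", "F3", "F4"], placement):
    print(name, "-> nodes", nodes)
```

With these made-up profiles, the fragment used equally by both query classes (F4) has a split membership and is therefore the one placed on both nodes, while the others stay on a single node. A hard clustering would force F4 onto one node only, which is the motivation for using soft memberships in this sketch.
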
LECTURE NOTES IN COMPUTER SCIENCE

Use this identifier to cite or link to this document: http://hdl.handle.net/11368/2896374