A meta-learning configuration framework for graph-based similarity search indexes
Barbon S.
2023-01-01
Abstract
Similarity searches retrieve the elements of a dataset whose characteristics are similar to those of an input query element. Recent work shows that graph-based methods outperform alternatives in the literature, such as tree-based and hash-based methods. However, graphs are highly sensitive to their indexing and search parameters, so finding a suitable trade-off for specific user requirements usually takes extra time. Current approaches to parameter selection rely on published experimental results or on Grid Search procedures. The former offers no guarantee that settings that work well on one dataset will also perform well on another; the latter is computationally expensive and limited to a small range of values. In this work, we propose a meta-learning-based recommender framework that provides a suitable graph configuration according to the characteristics of the input dataset. We present two instantiations of the framework: a global instantiation that uses the whole meta-database to train meta-models, and a dataset-similarity-based instantiation that relies on clustering to generate meta-models tailored to datasets with similar characteristics. We also developed generic and tuned versions of both instantiations. The generic versions can satisfy user requirements orders of magnitude faster than the traditional Grid Search. The tuned versions provide more accurate predictions at a higher cost. Our results show that the tuned methods outperform Grid Search in most cases, providing recommendations close to the optimal one and offering a suitable alternative, particularly for more challenging datasets.

File | Access | Type | License | Size | Format
---|---|---|---|---|---
1-s2.0-S0306437922001016-main.pdf | Closed access | Publisher's version | Publisher copyright | 2.18 MB | Adobe PDF
1-s2.0-S0306437922001016-main-Post_print.pdf | Open access from 18/09/2023 | Final post-refereeing draft (post-print) | Creative Commons | 3.3 MB | Adobe PDF
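
The abstract describes the recommender only at a high level. As a minimal sketch of the general idea, and not the authors' implementation, the Python below contrasts a global meta-model with a dataset-similarity-based one. Everything named here is an assumption made for illustration: the two meta-features, the `(neighbors, restarts)` graph parameters, the toy meta-database values, the precomputed `cluster` labels, and the k-nearest-neighbor averaging that stands in for a trained meta-model.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class MetaExample:
    """One row of a hypothetical meta-database: one config measured on one dataset."""
    meta_features: tuple   # illustrative dataset descriptors, e.g. (log size, dimensionality)
    config: tuple          # hypothetical graph parameters (neighbors, restarts)
    recall: float          # recall observed with this config on that dataset
    query_time_ms: float   # query time observed with this config on that dataset
    cluster: int           # label assumed to come from a prior clustering step


def predict(rows, meta_features, config, k=2):
    """Stand-in meta-model: average the outcomes of `config` on the k most similar datasets."""
    candidates = sorted(
        (r for r in rows if r.config == config),
        key=lambda r: dist(r.meta_features, meta_features),
    )[:k]
    recall = sum(r.recall for r in candidates) / len(candidates)
    time_ms = sum(r.query_time_ms for r in candidates) / len(candidates)
    return recall, time_ms


def recommend(rows, meta_features, min_recall, k=2):
    """Return the config with the lowest predicted query time whose predicted
    recall meets the user requirement, or None if no config qualifies."""
    best = None
    for config in sorted({r.config for r in rows}):
        recall, time_ms = predict(rows, meta_features, config, k)
        if recall >= min_recall and (best is None or time_ms < best[1]):
            best = (config, time_ms)
    return None if best is None else best[0]


def same_cluster_rows(rows, meta_features):
    """Dataset-similarity variant: keep only rows from the cluster whose centroid
    is closest to the new dataset, so the meta-model sees similar datasets only."""
    centroids = {}
    for label in {r.cluster for r in rows}:
        members = [r.meta_features for r in rows if r.cluster == label]
        centroids[label] = tuple(sum(col) / len(col) for col in zip(*members))
    nearest = min(centroids, key=lambda c: dist(centroids[c], meta_features))
    return [r for r in rows if r.cluster == nearest]


# Toy meta-database; all values are made up for the example.
META_DB = [
    MetaExample((1.0, 1.0), (10, 1), 0.80, 1.0, 0),
    MetaExample((1.0, 1.0), (50, 1), 0.95, 3.0, 0),
    MetaExample((1.2, 0.9), (10, 1), 0.84, 1.2, 0),
    MetaExample((1.2, 0.9), (50, 1), 0.97, 3.4, 0),
    MetaExample((5.0, 5.0), (10, 1), 0.50, 2.0, 1),
    MetaExample((5.0, 5.0), (50, 1), 0.91, 6.0, 1),
]

if __name__ == "__main__":
    new_dataset = (1.1, 1.0)
    # Global instantiation: the meta-model uses the whole meta-database.
    print(recommend(META_DB, new_dataset, min_recall=0.9))
    # Similarity-based instantiation: the meta-model uses one cluster only.
    print(recommend(same_cluster_rows(META_DB, new_dataset), new_dataset, min_recall=0.9))
```

In this sketch a recommendation is a lookup over past measurements, whereas Grid Search would have to build and query an index for every candidate configuration on the new dataset. That difference is the cost argument the abstract makes for the generic versions.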