Abstract: | Interactive data exploration platforms in Web, business, and scientific domains are becoming increasingly popular. Typically, users without prior knowledge of the data interact with these platforms in an exploratory manner, hoping to retrieve the results they are looking for. One way to explore large-volume data is to pose aggregate queries, which combine the values of multiple rows through an aggregate operator to form a single value: an aggregated value. However, when a query fails, i.e., returns an undesired aggregated value, users must undertake a frustrating trial-and-error process to refine their queries until a desired result is attained. This data exploration process is becoming increasingly difficult because the underlying data is typically large in volume and high in dimensionality. While heuristic-based techniques are fairly successful in generating refined queries that meet specified requirements on the aggregated values, they are largely oblivious to the (dis)similarity between the input query and its corresponding refined version. Meanwhile, enforcing similarity-aware query refinement is a non-trivial challenge, as it requires a careful examination of the query space while maintaining a low processing cost. To address this challenge, we propose an innovative scheme for efficient Similarity-Aware Refinement of Aggregation Queries, called EAGER, which aims to balance the tradeoff between satisfying the aggregate and similarity constraints imposed on the refined query so as to maximize its overall benefit to the user. To achieve that goal, EAGER implements efficient strategies that minimize the cost of exploring the available search space: it uses similarity-based and monotonic-based pruning techniques to bound the search space and quickly find a refined query that meets users’ expectations. Our extensive experiments demonstrate the scalability of EAGER under various workload settings and the significant benefits it provides. |