1.

Efficient Processing of Top-k Queries in Uncertain Databases with x-Relations (total citations: 1; self-citations: 0; citations by others: 1)

Yi Ke, Li Feifei, Kollios George, Srivastava Divesh 《Knowledge and Data Engineering, IEEE Transactions on》2008,20(12):1669-1682

This work introduces new algorithms for processing top-k queries in uncertain databases, under the generally adopted model of x-relations. An x-relation consists of a number of x-tuples, and each x-tuple randomly instantiates into one tuple from one or more alternatives. Soliman et al. [soliman07] first introduced the problem of top-k query processing in uncertain databases and proposed various algorithms to answer such queries. Under the x-relation model, our new results significantly improve the state of the art, in terms of both running time and memory usage. In the single-alternative case, our new algorithms are 2 to 3 orders of magnitude faster than the previous algorithms. In the multi-alternative case, the improvement is even more dramatic: while the previous algorithms have exponential complexity in both time and space, our algorithms run in near-linear or low polynomial time. Our study covers both types of top-k queries proposed in [soliman07]. We provide both a theoretical analysis and an extensive experimental evaluation to demonstrate the superiority of the new approaches over existing solutions.

2.

Efficient biased sampling for approximate clustering and outlier detection in large data sets (total citations: 7; self-citations: 0; citations by others: 7)

Kollios G., Gunopulos D., Koudas N., Berchtold S. 《Knowledge and Data Engineering, IEEE Transactions on》2003,15(5):1170-1187

We investigate the use of biased sampling according to the density of the data set to speed up the operation of general data mining tasks, such as clustering and outlier detection in large multidimensional data sets. In density-biased sampling, the probability that a given point will be included in the sample depends on the local density of the data set. We propose a general technique for density-biased sampling that can factor in user requirements to sample for properties of interest and can be tuned for specific data mining tasks. This allows great flexibility and improved accuracy of the results over simple random sampling. We describe our approach in detail, evaluate it analytically, and show how it can be optimized for approximate clustering and outlier detection. Finally, we present a thorough experimental evaluation of the proposed method, applying density-biased sampling on real and synthetic data sets, and employing clustering and outlier detection algorithms, thus highlighting the utility of our approach.

3.

Hashing methods for temporal data (total citations: 2; self-citations: 0; citations by others: 2)

Kollios G., Tsotras V.J. 《Knowledge and Data Engineering, IEEE Transactions on》2002,14(4):902-919

External dynamic hashing has been used in traditional database systems as a fast method for answering membership queries. Given a dynamic set S of objects, a membership query asks whether an object with identity k is in (the most current state of) S. This paper addresses the more general problem of temporal hashing. In this setting, changes to the dynamic set are time-stamped and the membership query has a temporal predicate, as in: "Find whether object with identity k was in set S at time t". We present an efficient solution for this problem that takes an ephemeral hashing scheme and makes it partially persistent. Our solution, also termed partially persistent hashing, uses space that is linear in the total number of changes in the evolution of set S and has a small O(log_B(n/B)) query overhead. An experimental comparison of partially persistent hashing with various straightforward approaches (like external linear hashing, the multi-version B-tree and the R*-tree) shows that it provides the fastest membership query response time. Partially persistent hashing should be seen as an extension of traditional external dynamic hashing in a temporal environment. It is independent of the ephemeral dynamic hashing scheme used; while this paper concentrates on linear hashing, the methodology applies to other dynamic hashing schemes as well.
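To make the temporal membership query in the abstract above concrete, here is a minimal in-memory sketch. It is not the paper's partially persistent hashing: it is a naive baseline that keeps, per key, a time-ordered log of changes and binary-searches it. The class name `TemporalSet` and its methods are hypothetical, and the sketch assumes that changes to a given key arrive with strictly increasing timestamps.

```python
from bisect import bisect_right
from collections import defaultdict


class TemporalSet:
    """Naive time-stamped set answering 'was key k in S at time t?'.

    Illustrative baseline only; assumes changes to each key are
    recorded with strictly increasing timestamps.
    """

    def __init__(self):
        # key -> list of (timestamp, present_after_change), in time order
        self._history = defaultdict(list)

    def insert(self, key, t):
        """Record that key was added to S at time t."""
        self._history[key].append((t, True))

    def delete(self, key, t):
        """Record that key was removed from S at time t."""
        self._history[key].append((t, False))

    def contains_at(self, key, t):
        """Return True if key was in S at time t."""
        changes = self._history.get(key, [])
        # Position of the last change whose timestamp is <= t.
        i = bisect_right(changes, (t, True)) - 1
        return i >= 0 and changes[i][1]


s = TemporalSet()
s.insert("k", 1)
s.delete("k", 5)
s.insert("k", 8)
print(s.contains_at("k", 4), s.contains_at("k", 6))
```

Like the scheme in the abstract, this baseline uses space linear in the number of changes, but it lives in main memory and makes no attempt to bound disk accesses, which is the concern of the external-memory setting the paper addresses.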

4.

Mining frequent arrangements of temporal intervals (total citations: 3; self-citations: 3; citations by others: 0)

Panagiotis Papapetrou, George Kollios, Stan Sclaroff, Dimitrios Gunopulos 《Knowledge and Information Systems》2009,21(2):133-171

The problem of discovering frequent arrangements of temporal intervals is studied. It is assumed that the database consists of sequences of events, where an event occurs during a time interval. The goal is to mine temporal arrangements of event intervals that appear frequently in the database. The motivation of this work is the observation that in practice most events are not instantaneous but occur over a period of time, and different events may occur concurrently. Thus, there are many practical applications that require mining such temporal correlations between intervals, including the linguistic analysis of annotated data from American Sign Language as well as network and biological data. Three efficient methods to find frequent arrangements of temporal intervals are described; the first two are tree-based and use breadth-first and depth-first search to mine the set of frequent arrangements, whereas the third one is prefix-based. The above methods apply efficient pruning techniques that include a set of constraints that add user-controlled focus to the mining process. Moreover, based on the extracted patterns, a standard method for mining association rules is employed that applies different interestingness measures to evaluate the significance of the discovered patterns and rules. The performance of the proposed algorithms is evaluated and compared with other approaches on real (American Sign Language annotations and network data) and large synthetic datasets.

5.

Self-tuning management of update-intensive multidimensional data in clusters of workstations

Vassil Kriakov, George Kollios, Alex Delis 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(3):739-764

Contemporary applications continuously modify large volumes of multidimensional data that must be accessed efficiently and, more importantly, must be updated in a timely manner. Single-server storage approaches are insufficient when managing such volumes of data, while the high frequency of data modification renders classical indexing methods inefficient. To address these two problems we introduce a distributed storage manager for multidimensional data based on a Cluster-of-Workstations. The manager addresses the above challenges through a set of mechanisms that, through selective on-line data reorganization, collectively maintain a balanced load across a cluster of workstations. With the help of a highly efficient self-tuning mechanism, based on a new data structure called stat-index, and a query aggregation and clustering algorithm, our storage manager attains short query response times even in the presence of massive modifications and highly skewed access patterns. Furthermore, we provide a data migration cost model used to determine the best data redistribution strategy. Through extensive experimentation with our prototype, we establish that our storage manager can sustain significant update rates with minimal overhead.

6.

Elastic Translation Invariant Matching of Trajectories

Michail Vlachos, George Kollios, Dimitrios Gunopulos 《Machine Learning》2005,58(2-3):301-334

We investigate techniques for analysis and retrieval of object trajectories. We assume that a trajectory is a sequence of two- or three-dimensional points. Trajectory datasets are very common in environmental applications, mobility experiments, and video surveillance, and are especially important for the discovery of certain biological patterns. Such data usually contain a great amount of noise, which makes all previously used metrics fail. Therefore, here we formalize non-metric similarity functions based on the Longest Common Subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to the similar portions of the sequences. Stretching of sequences in time is allowed, as well as global translation of the sequences in space. Efficient approximate algorithms that compute these similarity measures are also provided. We compare these new methods to the widely used Euclidean and Dynamic Time Warping distance functions (for real and synthetic data) and show the superiority of our approach, especially under the strong presence of noise. We prove a weaker version of the triangle inequality and employ it in an indexing structure to answer nearest neighbor queries. Finally, we present experimental results that validate the accuracy and efficiency of our approach.
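The LCSS idea in the abstract above can be illustrated with a short dynamic-programming sketch. The abstract does not give the exact formulation, so the following is an assumption: two points match when every coordinate differs by at most `eps` and their sequence indices differ by at most `delta` (the stretching in time), and similarity is the match count divided by the shorter length. The function and parameter names are illustrative, and the global translation in space is not handled here.

```python
def lcss_length(a, b, eps, delta):
    """Length of the longest common subsequence of trajectories a and b.

    a, b  : sequences of equal-dimension points (tuples of floats)
    eps   : assumed per-coordinate matching threshold in space
    delta : assumed maximum index offset allowed between matched points
    """
    n, m = len(a), len(b)
    # dp[i][j] = LCSS length of the first i points of a and first j of b
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close_in_space = all(
                abs(p - q) <= eps for p, q in zip(a[i - 1], b[j - 1])
            )
            close_in_time = abs(i - j) <= delta
            if close_in_space and close_in_time:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def lcss_similarity(a, b, eps, delta):
    """Normalize the LCSS length by the shorter trajectory's length."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, eps, delta) / min(len(a), len(b))


clean = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
noisy = [(0.0, 0.1), (1.0, 1.1), (50.0, 50.0), (3.0, 3.1)]  # one outlier
print(lcss_similarity(clean, noisy, eps=0.5, delta=1))
```

In this example the single outlier is simply left unmatched, so it costs one match rather than dominating the score as it would under a Euclidean distance, which is the robustness to noise that the abstract emphasizes.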

7.

Spatio-temporal join selectivity

Jimeng Sun, Yufei Tao, Dimitris Papadias, George Kollios 《Information Systems》2006,31(8):793-813

Given two sets S₁, S₂ of moving objects, a future timestamp t_q, and a distance threshold d, a spatio-temporal join retrieves all pairs of objects that are within distance d at t_q. The selectivity of a join equals the number of retrieved pairs divided by the cardinality of the Cartesian product S₁×S₂. This paper develops a model for spatio-temporal join selectivity estimation based on rigorous probabilistic analysis, and reveals the factors that affect the selectivity. Initially, we solve the problem for 1D (point and rectangle) objects whose locations and velocities are uniformly distributed, and then extend the results to multi-dimensional spaces. Finally, we deal with non-uniform distributions using a specialized spatio-temporal histogram. Extensive experiments confirm that the proposed formulae are highly accurate (average error below 10%).
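The selectivity defined in the abstract above can be computed exactly by brute force, which is the quantity the paper's model estimates without running the join. The sketch below assumes point objects given as a (position, velocity) pair at time 0 that move linearly; the function names are hypothetical and this is not the paper's estimation model.

```python
from itertools import product
from math import dist


def position_at(obj, t_q):
    """Position of a linearly moving point object at time t_q.

    obj is a (position, velocity) pair of equal-length tuples,
    with the position given at time 0 (an assumption of this sketch).
    """
    pos, vel = obj
    return tuple(p + v * t_q for p, v in zip(pos, vel))


def join_selectivity(s1, s2, t_q, d):
    """Fraction of pairs in s1 x s2 within distance d at time t_q."""
    if not s1 or not s2:
        return 0.0
    hits = sum(
        1
        for a, b in product(s1, s2)
        if dist(position_at(a, t_q), position_at(b, t_q)) <= d
    )
    return hits / (len(s1) * len(s2))


# 1D example: each object is ((x,), (vx,)).
s1 = [((0.0,), (1.0,)), ((10.0,), (0.0,))]
s2 = [((4.0,), (0.0,)), ((20.0,), (-2.0,))]
print(join_selectivity(s1, s2, t_q=5.0, d=1.0))
```

The brute-force computation costs time proportional to |S₁|·|S₂|, which is why a closed-form or histogram-based estimate of the kind the abstract describes is useful for query optimization.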

8.

Changes in gonadotrophin response to gonadotrophin releasing hormone in normal women following bilateral ovariectomy

E Alexandris, S Milingos, G Kollios, K Seferiadis, D Lolis, IE Messinis 《Canadian Metallurgical Quarterly》1997,47(6):721-726

OBJECTIVE: Pituitary responsiveness to GnRH varies throughout the normal menstrual cycle. We have investigated whether there are differences in the ovarian mechanisms which regulate gonadotrophin secretion between the follicular and the luteal phase of the cycle. DESIGN: Normally ovulating women were studied during the first week following hysterectomy plus bilateral ovariectomy performed either in the mid- to late follicular phase (follicle size 16 mm) or in the early to midluteal phase (5 days post LH peak). The response of LH to a single dose of 10 micrograms GnRH was investigated 2 hours before the operation and every 12 hours after the operation until postoperative day 4 and every 24 hours until day 8. PATIENTS: Fourteen normally cycling premenopausal women with normal FSH (< 10 IU/l). Seven women were ovariectomized in the follicular and 7 in the luteal phase. MEASUREMENTS: Pituitary response to GnRH was calculated as the net increase in FSH (delta FSH) and LH (delta LH) at 30 minutes above the basal value. RESULTS: Basal levels of FSH and LH before the operation were significantly lower in the luteal than the follicular phase (P < 0.05), while those of oestradiol (E2) were similar. Delta LH and delta FSH values were also similar. Serum progesterone and immunoreactive inhibin (Ir-inhibin) concentrations before the operation were higher in the luteal than the follicular phase (P < 0.05). Following the operation, serum E2, progesterone and Ir-inhibin values declined dramatically, while basal FSH and LH as well as delta FSH values showed a gradual and significant increase. The percentage increase in FSH and LH values (mean ± SEM) on day 8 after the operation was similar in the follicular (453 ± 99% and 118 ± 35% respectively) and the luteal phase (480 ± 71% and 192 ± 45% respectively). In contrast to delta FSH, delta LH values, after a temporary increase 12 hours after the operation, remained stable in the follicular phase and declined significantly in the luteal phase up to day 4. CONCLUSIONS: Basal gonadotrophin secretion during the normal menstrual cycle is predominantly under a negative ovarian effect. It is suggested that in contrast to FSH, the secretion of LH in response to GnRH is controlled by different ovarian mechanisms during the two phases of the menstrual cycle.

9.

Leptin concentrations in the follicular phase of spontaneous cycles and cycles superovulated with follicle stimulating hormone

IE Messinis, S Milingos, K Zikopoulos, G Kollios, K Seferiadis, D Lolis 《Canadian Metallurgical Quarterly》1998,13(5):1152-1156

It has been reported that oestradiol may play a role in the production of leptin from adipocytes. To investigate this relationship further, nine normally ovulating women were studied during two menstrual cycles, i.e. an untreated spontaneous cycle and a cycle treated with follicle stimulating hormone (FSH) from cycle day 2 until the day of human chorionic gonadotrophin (HCG) injection. Serum leptin values on cycle day 2 did not differ significantly between the spontaneous and the FSH cycles. In the spontaneous cycles, leptin values declined gradually and significantly up to day 7 and then increased progressively up to the day of luteinizing hormone (LH) surge onset, at which point they achieved the highest values. In the FSH cycles, serum leptin values increased gradually and significantly up to day 6, remaining stable thereafter, and were in the midfollicular phase significantly higher than in the spontaneous cycles. Significant positive correlations were found between mean values of leptin and mean values of oestradiol during the second half of the follicular phase in the spontaneous cycles and during the first half in the FSH cycles. A significant negative correlation was found between these two parameters in the spontaneous cycles during the first half of the follicular phase. Serum leptin levels were significantly higher in the midluteal than in the follicular phase in both cycles. These results demonstrate for the first time significant changes in leptin values during the follicular phase of the human menstrual cycle and a significant increase during superovulation induction with FSH. It is suggested that oestradiol may be involved in the regulation of leptin production in women.

10.

BoostMap: an embedding method for efficient nearest neighbor retrieval

Athitsos V, Alon J, Sclaroff S, Kollios G 《IEEE transactions on pattern analysis and machine intelligence》2008,30(1):89-104

This paper describes BoostMap, a method for efficient nearest neighbor retrieval under computationally expensive distance measures. Database and query objects are embedded into a vector space, in which distances can be measured efficiently. Each embedding is treated as a classifier that predicts for any three objects X, A, B whether X is closer to A or to B. It is shown that a linear combination of such embedding-based classifiers naturally corresponds to an embedding and a distance measure. Based on this property, the BoostMap method reduces the problem of embedding construction to the classical boosting problem of combining many weak classifiers into an optimized strong classifier. The classification accuracy of the resulting strong classifier is a direct measure of the amount of nearest neighbor structure preserved by the embedding. An important property of BoostMap is that the embedding optimization criterion is equally valid in both metric and non-metric spaces. Performance is evaluated in databases of hand images, handwritten digits, and time series. In all cases, BoostMap significantly improves retrieval efficiency with small losses in accuracy compared to brute-force search. Moreover, BoostMap significantly outperforms existing nearest neighbor retrieval methods, such as Lipschitz embeddings, FastMap, and VP-trees.