Found 20 similar documents (search time: 15 ms)
1.
Traditional sampling-based estimators infer the actual selectivity of a query based purely on runtime information gathering, excluding the previously collected information, which underutilizes the information available. Table-based and parametric estimators extrapolate the actual selectivity of a query based only on the previously collected information, ignoring online information, which results in inaccurate estimation in a frequently updated environment. We propose a novel hybrid estimator that utilizes and optimally combines the online and previously collected information. A theoretical analysis demonstrates that the online and previously collected information is complementary, and that the comprehensive utilization of the online and previously collected information is of value for further performance improvement. Our theoretical results are validated by a comprehensive experimental study using a practical database, in the presence of insert, delete and update operations. The hybrid approach is very promising in the sense that it provides an adaptive mechanism that allows the optimal combination of information obtained from different sources in order to achieve a higher estimation accuracy and reliability.
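The abstract above does not state how the two sources are combined. As a rough illustration only, the following sketch assumes the textbook rule for merging two independent, unbiased estimates: weight each by the inverse of its variance. The function names `sample_selectivity` and `combine_estimates` are hypothetical and are not taken from the paper.

```python
def sample_selectivity(sample_hits, sample_size):
    """Online estimate from a uniform random sample, with its binomial variance."""
    p = sample_hits / sample_size
    return p, p * (1.0 - p) / sample_size


def combine_estimates(prior_est, prior_var, online_est, online_var):
    """Inverse-variance weighting (an assumed rule, not the paper's estimator).

    The source with the smaller variance receives the larger weight.
    """
    total = prior_var + online_var
    if total == 0.0:
        # Both sources claim to be exact; fall back to a plain average.
        return (prior_est + online_est) / 2.0, 0.0
    weight_prior = online_var / total
    combined = weight_prior * prior_est + (1.0 - weight_prior) * online_est
    combined_var = prior_var * online_var / total
    return combined, combined_var


# Usage: a stored (possibly stale) estimate merged with a fresh sample.
online_est, online_var = sample_selectivity(sample_hits=30, sample_size=200)
estimate, variance = combine_estimates(0.10, 0.0004, online_est, online_var)
```

Under this assumption the combined variance is never larger than either input variance, which is one simple way to see why the two sources can be complementary.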
2.
Cloud computing has become a promising paradigm as a next-generation computing model, providing computation, software, data access, and storage services whose users do not need to know the location of the physical resources, interconnected across the globe, that provide such services. In such an environment, important issues such as information sharing and resource/service discovery arise. In order to overcome critical limitations in centralized approaches for information sharing and resource/service discovery, this paper proposes a framework of a scalable multi-attribute hybrid overlay featured with decentralized information sharing, flexible resource/service discovery, fault tolerance and load balancing. Additionally, the proposed hybrid overlay integrates a structured P2P system with an unstructured one to support complex queries. Mechanisms such as load balancing and fault tolerance implemented in our proposed system to improve the overall system performance are also discussed. Experimental results show that the performance of the proposed approach is feasible and stable, as the proposed hybrid overlay improves system performance by reducing the number of routing hops and balancing the load by migrating requests.
3.
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously
materialized views over the database, rather than accessing the database relations. The problem has received significant attention
because of its relevance to a wide variety of data management problems, such as data integration, query optimization, and
the maintenance of physical data independence. To date, the performance of proposed algorithms has received very little attention,
and in particular, their scale up in the presence of a large number of views is unknown. We first analyze two previous algorithms,
the bucket algorithm and the inverse-rules, and show their deficiencies. We then describe the MiniCon, a novel algorithm for
finding the maximally-contained rewriting of a conjunctive query using a set of conjunctive views. We present the first experimental
study of algorithms for answering queries using views. The study shows that the MiniCon scales up well and significantly outperforms
the previous algorithms. We describe an extension of the MiniCon to handle comparison predicates, and show its performance
experimentally. Finally, we describe how the MiniCon can be extended to the context of query optimization.
Received: 15 October 2000 / Accepted: 15 April 2001 / Published online: 28 June 2001
4.
It has been observed that queries over XML data sources are often unsatisfiable. Unsatisfiability may stem from several different sources, e.g., the user may be insufficiently familiar with the labels appearing in the documents, or may not be intimately aware of the hierarchical structure of the documents. To deal with query and document mismatches, previous research has considered returning answers that maximally satisfy (in some sense) the query, instead of only returning strictly satisfying answers. However, this breaks the golden database rule that only strictly satisfying answers are returned when querying. Indeed, the relationship between the query and answers is no longer clear when unsatisfying answers are returned. To reinstate the golden database rule, this article proposes a framework for automatically correcting queries over XML. This framework generates similar satisfiable queries when the user query is unsatisfiable. The user can then choose a satisfiable query of interest, and receive exactly satisfying answers to this query.
5.
Modern applications requiring spatial network processing pose several interesting query optimization challenges. Spatial networks are usually represented as graphs, and therefore, queries involving a spatial network can be executed by using the corresponding graph representation. This means that the cost for executing a query is determined by graph properties such as the graph order and size (i.e., number of nodes and edges) and other graph parameters. In this paper, we present novel methods to estimate the number of nodes and edges in regions of interest in spatial networks, towards predicting the space and time requirements for range queries. The methods are evaluated by using real-life and synthetic data sets. Experimental results show that the number of nodes and edges can be estimated efficiently and accurately, with relatively small space requirements, thus providing useful information to the query optimizer.
6.
Searching XML data using keyword queries has attracted much attention because it enables Web users to easily access XML data without having to learn a structured query language or study possibly complex data schemas. Most of the current approaches identify the meaningful results of a given keyword query based on the semantics of lowest common ancestor (LCA) and its variants. However, given the fact that LCA candidates are usually numerous and of low relevance to the users' information need, how to effectively and efficiently identify the most relevant results from a large number of LCA candidates is still a challenging and unresolved issue. In this article, we introduce a novel semantics of relevant results based on mutual information between the query keywords. Then, we introduce a novel approach for identifying the relevant answers of a given query by adopting skyline semantics. We also recommend three different ranking criteria for selecting the top-k relevant results of the query. Efficient algorithms are proposed which rely on some provable properties of the dominance relationship between result candidates to rapidly identify the top-k dominant results. Extensive experiments were conducted to evaluate our approach and the results show that the proposed approach has a good performance compared with other existing approaches in different data sets and evaluation metrics.
7.
Twig query pattern matching is a core operation in XML query processing. Indexing XML documents for twig query processing
is of fundamental importance to supporting effective information retrieval. In practice, many XML documents on the web are
heterogeneous and have their own formats; documents describing relevant information can possess different structures. Therefore
some “user-interesting” documents having similar but non-exact structures against a user query are often missed out. In this
paper, we propose the RRSi, a novel structural index designed for structure-based query lookup on heterogeneous sources of XML documents supporting
proximate query answers. The index avoids the unnecessary processing of structurally irrelevant candidates that might show
good content relevance. An optimized version of the index, oRRSi, is also developed to further reduce both space requirements and computational complexity. To our knowledge, these structural
indexes are the first to support proximity twig queries on XML documents. The results of our preliminary experiments show
that RRSi and oRRSi based query processing significantly outperform previously proposed techniques in XML repositories with structural heterogeneity.
8.
Video-on-demand service in wireless networks is one important step to achieving the goal of providing video services anywhere
anytime. Typically, carrier mobile networks are used to deliver videos wirelessly. Since every video stream comes from the
base station, regardless of what bandwidth sharing techniques are being utilized, the media stream system is still limited
by the network capacity of the base station. The key to overcoming the scalability issue is to exploit resources available at
mobile clients in a peer-to-peer setting. We observe that it is common to have a carrier mobile network and a mobile peer-to-peer
network co-exist in a wireless environment. A feature of such a hybrid environment is that the former offers high availability
assurance, while the latter presents an opportunistic use of resources available at mobile clients. Our proposed video-on-demand
technique, PatchPeer, leverages this network characteristic to allow the video-on-demand system to scale beyond the bandwidth
capacity of the server. Mobile clients in PatchPeer are no longer passive receivers, but also active senders of video streams
to other mobile clients. Our extensive performance study shows that PatchPeer can accept more clients than the current state-of-the-art
technique, while maintaining the same Quality-of-Service to clients.
Tai T. Do
is a Ph.D. student in Computer Science at the University of Central Florida, working in the Data Systems Laboratory. He received
a B.S. degree in Electrical Engineering from the University of Oklahoma in 2001. His main research interests are Distributed
Systems and Databases (Peer-to-Peer Systems, Distributed Monitoring Queries), Communications and Networking (Video Delivery
Techniques, Wireless Communication Protocols), Decision Support Systems (Real-time Route Diversion Systems), and Security
and Privacy (Anonymity for Location-based Services). Tai T. Do is a recipient of the UCF Order of Pegasus, i.e. UCF Best Student
Award, class of 2008.
Kien A. Hua
received the B.S. degree in Computer Science, M.S. and Ph.D. degrees in Electrical Engineering, all from the University of
Illinois at Urbana-Champaign, in 1982, 1984, and 1987, respectively. From 1987 to 1990 he was with IBM Corporation. He joined
the University of Central Florida in 1990, and is currently a professor in the School of Computer Science. Dr. Hua has published
widely including several papers recognized as best papers at various international conferences. He has served as Conference
Chair, Vice-Chair, Associate Chair, Demo Chair, and Program Committee Member for numerous ACM and IEEE conferences. Currently,
he is on the editorial boards of Journal of Multimedia Tools and Applications and International Journal of Advanced Information
Technology. Dr. Hua is an IEEE Fellow.
Ning Jiang
received the Ph.D. degree in Computer Science from the University of Central Florida. Currently, he is working at the Office
Lab at Microsoft Corp. His main research interests are Mobile computing, Data mining, and Network security.
Fuyu Liu
is a Ph.D. student in Computer Science at the University of Central Florida, working in the Data Systems Laboratory. His main
research interests are Distributed Systems and Databases (Distributed Monitoring Queries, Mobile Computing), and Security
and Privacy (Anonymity for Location-based Services).
9.
Given a large directed graph, rapidly answering reachability queries between source and target nodes is an important problem. Existing methods for reachability trade off indexing time and space against query time performance. However, the biggest limitation of existing methods is that they do not scale to very large real-world graphs. We present a simple yet scalable reachability index, called GRAIL, that is based on the idea of randomized interval labeling and that can effectively handle very large graphs. Based on an extensive set of experiments, we show that while more sophisticated methods work better on small graphs, GRAIL is the only index that can scale to millions of nodes and edges. GRAIL has linear indexing time and space, and the query time ranges from constant time to being linear in the graph order and size. Our reference C++ implementations are open source and available for download at http://www.code.google.com/p/grail/.
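The idea of randomized interval labeling mentioned in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the authors' C++ implementation. It assumes the input is a DAG given as a dict in which every node appears as a key, and it uses recursion, so it is only suitable for small graphs. The names `build_labels` and `reachable` are hypothetical.

```python
import random


def build_labels(dag, num_traversals=3, seed=0):
    """Give each node one (low, rank) interval per randomized post-order DFS."""
    rng = random.Random(seed)
    children_seen = {v for kids in dag.values() for v in kids}
    roots = [u for u in dag if u not in children_seen]
    labels = {u: [] for u in dag}
    for _ in range(num_traversals):
        interval = {}
        counter = [0]

        def visit(u):
            if u in interval:
                return interval[u][0]
            kids = list(dag[u])
            rng.shuffle(kids)  # random child order makes each traversal differ
            lows = [visit(v) for v in kids]
            counter[0] += 1  # post-order rank of u
            interval[u] = (min(lows + [counter[0]]), counter[0])
            return interval[u][0]

        order = list(roots)
        rng.shuffle(order)
        for root in order:
            visit(root)
        for u in dag:
            labels[u].append(interval[u])
    return labels


def reachable(dag, labels, src, dst):
    """Exact reachability: labels prune the search, DFS confirms the rest."""

    def contains(u, v):
        # If u reaches v, every interval of v lies inside the matching one of u.
        return all(lu <= lv and rv <= ru
                   for (lu, ru), (lv, rv) in zip(labels[u], labels[v]))

    if src == dst:
        return True
    if not contains(src, dst):
        return False  # constant-time negative answer
    stack, seen = [src], {src}
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        for v in dag[u]:
            if v not in seen and contains(v, dst):
                seen.add(v)
                stack.append(v)
    return False


# Usage on a small DAG.
dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["d"]}
labels = build_labels(dag)
answer = reachable(dag, labels, "a", "d")
```

Interval containment is only a necessary condition for reachability, which is why a positive containment check still falls back to a pruned search. That fallback is consistent with the abstract's statement that query time ranges from constant to linear.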
10.
Multiple moving objects, partially occluded objects, or even a single object moving against the background gives rise to discontinuities in the optical flow field in corresponding image sequences. While uniform global regularization based moderately fast techniques cannot provide accurate estimates of the discontinuous flow field, statistical optimization based accurate techniques suffer from excessive solution time. A "weighted anisotropic" smoothness based numerically robust algorithm is proposed that can generate a discontinuous optical flow field with high speed and linear computational complexity. A weighted sum of the first-order spatial derivatives of the flow field is used for regularization. Less regularization is performed where strong gradient information is available. The flow field at any point is interpolated more from those at neighboring points along the weaker intensity gradient component. Such intensity gradient weighted regularization leads to Euler-Lagrange equations with strong anisotropies coupled with discontinuities in their coefficients. A robust multilevel iterative technique that recursively generates coarse-level problems based on intensity gradient weighted smoothing weights is employed to estimate the discontinuous optical flow field. Experimental results are presented to demonstrate the efficacy of the proposed technique.
11.
XML is a flexible and powerful tool that enables information and security sharing in heterogeneous environments. Scalable
technologies are needed to effectively manage the growing volumes of XML data. A wide variety of methods exist for storing
and searching XML data; the two most common techniques are conventional tree-based and relational approaches. Tree-based approaches
represent XML as a tree and use indexes and path join algorithms to process queries. In contrast, the relational approach
utilizes the power of a mature relational database to store and search XML. This method relationally maps XML queries to SQL
and reconstructs the XML from the database results. To date, the limited acceptance of the relational approach to XML processing
is due to the need to redesign the relational schema each time a new XML hierarchy is defined. We, in contrast, describe a
relational approach that uses a fixed schema, eliminating the need for schema redesign at the expense of potentially longer runtimes.
We show, however, that these potentially longer runtimes are still significantly shorter than those of the tree approach.
We use a popular XML benchmark to compare the scalability of both approaches. We generated large collections of heterogeneous
XML documents ranging in size from 500 MB to 8 GB using the XBench benchmark. The scalability of each method was measured
by running XML queries that cover a wide range of XML search features on each collection. We measure the scalability of each
method over different query features as the collection size increases. In addition, we examine the performance of each method
as the result size and the number of predicates increase. Our results show that our relational approach provides a scalable
approach to XML retrieval by leveraging existing relational database optimizations. Furthermore, we show that the relational
approach typically outperforms the tree-based approach while scaling consistently over all collections studied.
12.
We introduce the notion of XML Stream Attribute Grammars (XSAGs). XSAGs are the first scalable query language for XML streams
(running strictly in linear time with bounded memory consumption independent of the size of the stream) that allows for actual
data transformations rather than just document filtering. XSAGs are also relatively easy to use for humans. Moreover, the
XSAG formalism provides a strong intuition for which queries can or cannot be processed scalably on streams. We introduce
XSAGs together with the necessary language-theoretic machinery, study their theoretical properties such as expressiveness
and complexity, and discuss their implementation.
13.
XML stream applications bring the challenge of efficiently processing queries on sequentially accessible token-based data streams. The automata paradigm is naturally suited for pattern recognition on tokenized XML streams, but requires patches for fulfilling the filtering or restructuring functionalities in the XML query language. In contrast, the algebraic paradigm is a well-established technique for processing self-contained tuples. It however does not traditionally support token inputs. The Raindrop framework is the first to accommodate these two paradigms within one algebraic framework, taking advantage of both. This paper describes the overall framework, highlighting in particular three aspects. First, we describe how the tokens and tuples are modeled in one uniform query processing model. Second, we present the query rewriting that switches computations between these two data models. Third, we discuss strategies for the implementation and synchronization of the operators within the framework. We report experimental results that illustrate the unique optimization opportunities offered by this novel framework.
14.
This paper presents the scalable on-line execution (SOLE) algorithm for continuous and on-line evaluation of concurrent continuous spatio-temporal queries over data streams.
Incoming spatio-temporal data streams are processed in-memory against a set of outstanding continuous queries. The SOLE algorithm
utilizes the scarce memory resource efficiently by keeping track of only the significant objects. In-memory stored objects are expired (i.e., dropped) from memory once they become insignificant. SOLE is a scalable algorithm where all the continuous outstanding queries share the same buffer pool. In addition, SOLE
is presented as a spatio-temporal join between two input streams, a stream of spatio-temporal objects and a stream of spatio-temporal
queries. To cope with intervals of high arrival rates of objects and/or queries, SOLE utilizes a load-shedding approach where some of the stored objects are dropped from memory. SOLE is implemented as a pipelined query operator that
can be combined with traditional query operators in a query execution plan to support a wide variety of continuous queries.
Performance experiments based on a real implementation of SOLE inside a prototype of a data stream management system show
the scalability and efficiency of SOLE in highly dynamic environments.
This work was supported in part by the National Science Foundation under Grants IIS-0093116, IIS-0209120, and 0010044-CCR.
15.
This paper presents Araneola (Araneola means “little spider” in Latin), a scalable reliable application-level multicast system for highly dynamic wide-area environments. Araneola supports multi-point to multi-point reliable communication in a fully distributed manner, while incurring constant load (in terms of message and space complexity) on each node. For a tunable parameter k≥3, Araneola constructs and dynamically maintains a basic overlay structure in which each node’s degree is either k or k+1, and roughly 90% of the nodes have degree k. Empirical evaluation shows that Araneola’s basic overlay achieves three important mathematical properties of k-regular random graphs (i.e., random graphs in which each node has exactly k neighbors) with N nodes: (i) its diameter grows logarithmically with N; (ii) it is generally k-connected; and (iii) it remains highly connected following random removal of linear-size subsets of edges or nodes. The overlay is constructed and maintained at a low cost: each join, leave, or failure is handled locally, and entails the sending of only about 3k messages in total, independent of N. Moreover, this cost decreases as the churn rate increases. The low degree of Araneola’s basic overlay structure allows for allocating plenty of additional bandwidth for specific application needs. In this paper, we give an example of such a need: communicating with nearby nodes. We enhance the basic overlay with additional links chosen according to geographic proximity and available bandwidth. We show that this approach, i.e., a combination of random and nearby links, reduces the number of physical hops messages traverse without hurting the overlay’s robustness, as compared with completely random Araneola overlays (in which all the links are random) with the same average node degree. Given Araneola’s overlay, we sketch out several message dissemination techniques that can be implemented on top of this overlay.
We present a full implementation and evaluation of a gossip-based multicast scheme, with up to 10,000 nodes. We show that compared with a (non-overlay-based) gossip-based multicast protocol, gossiping over Araneola achieves substantial improvements in load, reliability, and latency.
16.
Optimizing queries using materialized views has not been addressed adequately in the context of XML due to the many limitations associated with the definition and usability of materialized views in traditional XML query evaluation models.
17.
The paper proposes a preprocessing scheme for efficient processing of XML queries in XML-based information retrieval systems. For the preprocessing, we use a signature-based approach. In the conventional (flat document-based) information retrieval systems, user queries consist of keywords and boolean operators, and thus signatures are structured in a flat manner. However, in XML-based information retrieval systems, the user queries have the form of path queries. Therefore, the flat signature cannot be effective for XML documents. In the paper, we propose two structured signature methods for XML documents. Through experiments, we evaluate the performance of the proposed methods.
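For readers unfamiliar with the conventional flat signatures that the abstract above contrasts against, here is a minimal sketch of superimposed coding. It illustrates only that flat baseline, not the two structured methods the paper proposes, which the abstract does not describe. The signature width, bits per word, hashing scheme, and function names are all assumptions chosen for illustration.

```python
import hashlib

SIG_BITS = 64       # assumed signature width
BITS_PER_WORD = 3   # assumed number of bits each keyword sets


def word_signature(word):
    """Hash a keyword to a bit mask with at most BITS_PER_WORD bits set."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def document_signature(words):
    """Superimpose (bitwise OR) the signatures of all keywords in a document."""
    sig = 0
    for word in words:
        sig |= word_signature(word)
    return sig


def may_contain(doc_sig, query_words):
    """Filter step: False means definitely absent, True means check the document."""
    query_sig = document_signature(query_words)
    return (doc_sig & query_sig) == query_sig


# Usage: the signature screens documents before the full text is examined.
doc_sig = document_signature(["xml", "query", "index"])
candidate = may_contain(doc_sig, ["xml", "index"])
```

A flat signature of this kind records which keywords occur but not where they occur in the element hierarchy, which is the limitation for path queries that the abstract points to.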
18.
The ever-increasing size of data emanating from mobile devices and sensors dictates the use of distributed systems for storing and querying these data. Typically, such data sources provide some spatio-temporal information, alongside other useful data. The RDF data model can be used to interlink and exchange data originating from heterogeneous sources in a uniform manner. For example, consider the case where vessels report their spatio-temporal position, on a regular basis, by using various surveillance systems. In this scenario, a user might be interested to know which vessels were moving in a specific area for a given temporal range. In this paper, we address the problem of efficiently storing and querying spatio-temporal RDF data in parallel. We specifically study the case of SPARQL queries with spatio-temporal constraints, by proposing the DiStRDF system, which is comprised of a Storage and a Processing Layer. The DiStRDF Storage Layer is responsible for efficiently storing large amounts of historical spatio-temporal RDF data of moving objects. On top of it, we devise our DiStRDF Processing Layer, which parses a SPARQL query and produces corresponding logical and physical execution plans. We use Spark, a well-known distributed in-memory processing framework, as the underlying processing engine. Our experimental evaluation, on real data from both aviation and maritime domains, demonstrates the efficiency of our DiStRDF system, when using various spatio-temporal range constraints.
19.
We have established a preprocessing method for determining the meaningfulness of a table to allow for information extraction from tables on the Internet. A table offers a preeminent clue in text mining because it contains meaningful data displayed in rows and columns. However, tables are used on the Internet for both knowledge structuring and document design. Therefore, we were interested in determining whether or not a table has meaningfulness that is related to the structural information provided at the abstraction level of the table head. Accordingly, we: 1) investigated the types of tables present in HTML documents, 2) established the features that distinguished meaningful tables from others, 3) constructed a training data set using the established features after having filtered any obvious decorative tables, and 4) constructed a classification model using a decision tree. Based on these features, we set up heuristics for table head extraction from meaningful tables, and obtained an F-measure of 95.0 percent in distinguishing meaningful tables from decorative tables and an accuracy of 82.1 percent in extracting the table head from the meaningful tables.
20.
Advances in Cloud computing technology and the availability of affordable and easy to use Cloud services are enabling a multitude of scientific applications to use these resources as primary or secondary computing infrastructure. The urban and built environment research domain is one area that can benefit greatly from Cloud computing. The global population growth and increase in the size and population of cities raise many challenges for governments, planners and researchers alike. The Australian Urban Research Infrastructure Network (AURIN, http://www.aurin.org.au) project has been tasked with developing an advanced platform (e-Infrastructure) across Australia to tackle these challenges. The platform leverages large-scale Cloud resources to provide federated data access to, at present, over 1100 data sets from major and often definitive government and industry data-rich organisations, and for scalable data processing and visualisation. The original AURIN tools were developed using the object modelling system (OMS) and supported integrated workflows to define and enact/re-enact scientific processes. More recently the work has evolved to focus more on delivery of a workbench offering a rich range of tools delivered through an extensible workflow environment. In this paper, we provide the background to AURIN including the scientific drivers that are shaping the work and the realisation of the Cloud-based AURIN environment. We focus in particular on the workflow environment and show how it seamlessly utilizes the Cloud for urban research processes focused especially on data-intensive spatial analysis. We illustrate the utilisation of this workflow environment across a range of case studies reflecting urban research activities.