Found 20 similar documents; search took 15 ms
1.
angular velocity control dynamic system guides the agent's direction angle, while another dynamic system selects the environmental input that will
be used in the control system. The agent interacts with the environment through its knowledge of the position of stationary
and moving objects. In our system agents automatically avoid stationary and moving obstacles to reach the desired target(s).
This approach allows us to prove the stability conditions that result in a principled methodology for the computation of the
system's dynamic parameters. We present a variety of real-time simulations that illustrate the power of our approach.
2.
Efficient similarity search for market basket data Cited by: 2 (self-citations: 0, by others: 2)
Alexandros Nanopoulos Yannis Manolopoulos 《The VLDB Journal The International Journal on Very Large Data Bases》2002,11(2):138-152
Several organizations have developed very large market basket databases for the maintenance of customer transactions. New
applications, e.g., Web recommendation systems, present the requirement for processing similarity queries in market basket
databases. In this paper, we propose a novel scheme for similarity search queries in basket data. We develop a new representation
method, which, in contrast to existing approaches, is proven to provide correct results. New algorithms are proposed for the
processing of similarity queries. Extensive experimental results, for a variety of factors, illustrate the superiority of
the proposed scheme over the state-of-the-art method.
Edited by R. Ng. Received: August 6, 2001 / Accepted: May 21, 2002 Published online: September 25, 2002
3.
Flip Korn Alexandros Labrinidis Yannis Kotidis Christos Faloutsos 《The VLDB Journal The International Journal on Very Large Data Bases》2000,8(3-4):254-266
Association Rule Mining algorithms operate on a data matrix (e.g., customers × products) to derive association rules [AIS93b, SA96]. We propose a new paradigm, namely, Ratio Rules, which are quantifiable in that we can measure the “goodness” of a set of discovered rules. We also propose the “guessing
error” as a measure of the “goodness”, that is, the root-mean-square error of the reconstructed values of the cells of the
given matrix, when we pretend that they are unknown. Another contribution is a novel method to guess missing/hidden values
from the Ratio Rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can “guess”
the amount spent on butter. Thus, unlike association rules, Ratio Rules can perform a variety of important tasks such as forecasting,
answering “what-if” scenarios, detecting outliers, and visualizing the data. Moreover, we show that we can compute Ratio Rules
in a single pass over the data set with small memory requirements (a few small matrices), in contrast to association rule mining methods
which require multiple passes and/or large memory. Experiments on several real data sets (e.g., basketball and baseball statistics,
biological data) demonstrate that the proposed method: (a) leads to rules that make sense; (b) can find large itemsets in
binary matrices, even in the presence of noise; and (c) consistently achieves a “guessing error” of up to 5 times less than
using straightforward column averages.
Received: March 15, 1999 / Accepted: November 1, 1999
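The "guessing error" described in this abstract can be illustrated with a minimal sketch: each cell is treated as hidden in turn, re-estimated, and the root-mean-square error over all cells is reported. Only the column-average baseline mentioned in the abstract is shown; the Ratio Rules estimator itself is not reproduced here. The names `column_means` and `guessing_error` are hypothetical, and for simplicity the column means are computed from the full matrix, hidden cell included, which is an assumption of this sketch.

```python
import math

def column_means(matrix):
    """Average of each column of a non-empty, rectangular list-of-lists matrix."""
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    return [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]

def guessing_error(matrix, guess):
    """RMSE between each cell and guess(row, j), the estimate for that cell
    when it is pretended to be unknown (hypothetical helper)."""
    squared_error = 0.0
    cells = 0
    for row in matrix:
        for j, actual in enumerate(row):
            squared_error += (guess(row, j) - actual) ** 2
            cells += 1
    return math.sqrt(squared_error / cells)

# Baseline: always guess the column average, ignoring the rest of the row.
data = [[1.0, 2.0], [3.0, 4.0]]
means = column_means(data)
baseline_error = guessing_error(data, lambda row, j: means[j])
```

Any other estimator can be compared against this baseline by passing a different `guess` function that uses the remaining values in `row`.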
4.
UnQL: a query language and algebra for semistructured data based on structural recursion Cited by: 5 (self-citations: 0, by others: 5)
Peter Buneman Mary Fernandez Dan Suciu 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(1):76-110
Abstract. This paper presents structural recursion as the basis of the syntax and semantics of query languages for semistructured data
and XML. We describe a simple and powerful query language based on pattern matching and show that it can be expressed using
structural recursion, which is introduced as a top-down, recursive function, similar to the way XSL is defined on XML trees.
On cyclic data, structural recursion can be defined in two equivalent ways: as a recursive function which evaluates the data
top-down and remembers all its calls to avoid infinite loops, or as a bulk evaluation which processes the entire data in parallel
using only traditional relational algebra operators. The latter makes it possible for optimization techniques in relational
queries to be applied to structural recursion. We show that the composition of two structural recursion queries can be expressed
as a single such query, and this is used as the basis of an optimization method for mediator systems. Several other formal
properties are established: structural recursion can be expressed in first-order logic extended with transitive closure; its
data complexity is PTIME; and over relational data it is a conservative extension of the relational calculus. The underlying
data model is based on value equality, formally defined with bisimulation. Structural recursion is shown to be invariant with
respect to value equality.
Received: July 9, 1999 / Accepted: December 24, 1999
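The first of the two evaluation strategies for cyclic data, a top-down recursive function that remembers its calls, can be sketched roughly as follows. This is not UnQL syntax or the paper's formal definition. It assumes a hypothetical representation of an edge-labelled graph as a dictionary mapping each node to a list of `(label, target)` pairs, and it applies a per-edge function `f` while visiting every node at most once.

```python
def relabel(graph, root, f):
    """Apply f to every edge label reachable from root.

    `graph` maps node -> list of (label, target) pairs (assumed layout).
    Nodes already present in `result` have been visited, so cycles terminate.
    """
    result = {}

    def visit(node):
        if node in result:      # this call was already made: stop here
            return
        result[node] = []       # remember the call before descending
        for label, target in graph.get(node, []):
            result[node].append((f(label), target))
            visit(target)

    visit(root)
    return result

# A two-node cycle: a --x--> b --y--> a
cyclic = {"a": [("x", "b")], "b": [("y", "a")]}
upper = relabel(cyclic, "a", str.upper)
```

Without the membership check on `result`, the recursion on `cyclic` would never terminate, which is the problem the remembered calls address.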
5.
Yasushi Sakurai Masatoshi Yoshikawa Shunsuke Uemura Haruhiko Kojima 《The VLDB Journal The International Journal on Very Large Data Bases》2002,11(2):93-108
We propose a novel index structure, the A-tree (approximation tree), for similarity searches in high-dimensional data. The
basic idea of the A-tree is the introduction of virtual bounding rectangles (VBRs) which contain and approximate MBRs or data
objects. VBRs can be represented quite compactly and thus affect the tree configuration both quantitatively and qualitatively.
First, since tree nodes can contain a large number of VBR entries, fanout becomes large, which increases search speed. More
importantly, we have a free hand in arranging MBRs and VBRs in the tree nodes. Each A-tree node contains an MBR and its children
VBRs. Therefore, by fetching an A-tree node, we can obtain information on the exact position of a parent MBR and the approximate
position of its children. We have performed experiments using both synthetic and real data sets. For the real data sets, the
A-tree outperforms the SR-tree and the VA-file in all dimensionalities up to 64 dimensions, which is the highest dimension
in our experiments. Additionally, we propose a cost model for the A-tree. We verify the validity of the cost model for synthetic
and real data sets.
Edited by T. Sellis. Received: December 8, 2000 / Accepted: March 20, 2002 Published online: September 25, 2002
6.
7.
Distance-based outliers: algorithms and applications Cited by: 20 (self-citations: 0, by others: 20)
Edwin M. Knorr Raymond T. Ng Vladimir Tucakov 《The VLDB Journal The International Journal on Very Large Data Bases》2000,8(3-4):237-253
This paper deals with finding outliers (exceptions) in large, multidimensional datasets. The identification of outliers can
lead to the discovery of truly unexpected knowledge in areas such as electronic commerce, credit card fraud, and even the
analysis of performance statistics of professional athletes. Existing methods that we have seen for finding outliers can only
deal efficiently with two dimensions/attributes of a dataset. In this paper, we study the notion of DB (distance-based) outliers. Specifically, we show that (i) outlier detection can be done efficiently for large datasets, and for k-dimensional datasets with large values of k (e.g., ); and (ii), outlier detection is a meaningful and important knowledge discovery task.
First, we present two simple algorithms, both having a complexity of , k being the dimensionality and N being the number of objects in the dataset. These algorithms readily support datasets with many more than two attributes.
Second, we present an optimized cell-based algorithm that has a complexity that is linear with respect to N, but exponential with respect to k. We provide experimental results indicating that this algorithm significantly outperforms the two simple algorithms for . Third, for datasets that are mainly disk-resident, we present another version of the cell-based algorithm that guarantees
at most three passes over a dataset. Again, experimental results show that this algorithm is by far the best for . Finally, we discuss our work on three real-life applications, including one on spatio-temporal data (e.g., a video surveillance
application), in order to confirm the relevance and broad applicability of DB outliers.
Received February 15, 1999 / Accepted August 1, 1999
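A simple pairwise algorithm of the kind this abstract mentions can be sketched as below. The abstract does not spell out the outlier definition, so the sketch assumes the common DB(p, D) formulation: an object is an outlier if at least a fraction p of the objects in the dataset lie farther than distance D from it. The function name `db_outliers`, the use of Euclidean distance, and the parameter names are assumptions of this sketch, not the paper's exact algorithms.

```python
import math

def db_outliers(points, p, d):
    """Return the points for which at least a fraction p of all points
    lie at Euclidean distance greater than d (assumed DB(p, D) definition).

    Compares every pair of points, so the cost grows quadratically with
    the number of points and linearly with the dimensionality.
    """
    n = len(points)
    outliers = []
    for a in points:
        far = sum(1 for b in points if math.dist(a, b) > d)
        if far >= p * n:
            outliers.append(a)
    return outliers

sample = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
found = db_outliers(sample, p=0.75, d=3.0)
```

The sketch works unchanged for points with more than two coordinates, since `math.dist` accepts sequences of any equal length.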
8.
Clustering categorical data sets using tabu search techniques Cited by: 2 (self-citations: 0, by others: 2)
Clustering methods partition a set of objects into clusters such that objects in the same cluster are more similar to each other than objects in different clusters according to some defined criteria. The fuzzy k-means-type algorithm is best suited for implementing this clustering operation because of its effectiveness in clustering data sets. However, working only on numeric values limits its use because data sets often contain categorical values. In this paper, we present a tabu search based clustering algorithm, to extend the k-means paradigm to categorical domains, and domains with both numeric and categorical values. Using tabu search based techniques, our algorithm can explore the solution space beyond local optimality in order to aim at finding a global solution of the fuzzy clustering problem. It is found that the clustering results produced by the proposed algorithm are very high in accuracy.
9.
Adaptive piggybacking: a novel technique for data sharing in video-on-demand storage servers Cited by: 17 (self-citations: 0, by others: 17)
Recent technology advances have made multimedia on-demand services, such as home entertainment and home-shopping, important
to the consumer market. One of the most challenging aspects of this type of service is providing access either instantaneously
or within a small and reasonable latency upon request. We consider improvements in the performance of multimedia storage servers
through data sharing between requests for popular objects, assuming that the I/O bandwidth is the critical resource in the system. We discuss a novel approach to data sharing,
termed adaptive piggybacking, which can be used to reduce the aggregate I/O demand on the multimedia storage server and thus
reduce latency for servicing new requests.
10.
Peter Muth Patrick O'Neil Achim Pick Gerhard Weikum 《The VLDB Journal The International Journal on Very Large Data Bases》2000,8(3-4):199-221
Numerous applications such as stock market or medical information systems require that both historical and current data be
logically integrated into a temporal database. The underlying access method must support different forms of “time-travel”
queries, the migration of old record versions onto inexpensive archive media, and high insertion and update rates. This paper
presents an access method for transaction-time temporal data, called the log-structured history data access method (LHAM)
that meets these demands. The basic principle of LHAM is to partition the data into successive components based on the timestamps
of the record versions. Components are assigned to different levels of a storage hierarchy, and incoming data is continuously
migrated through the hierarchy. The paper discusses the LHAM concepts, including concurrency control and recovery, our full-fledged
LHAM implementation, and experimental performance results based on this implementation. A detailed comparison with the TSB-tree,
both analytically and based on experiments with real implementations, shows that LHAM is highly superior in terms of insert
performance, while query performance is in almost all cases at least as good as for the TSB-tree; in many cases it is much
better.
Received March 4, 1999 / Accepted September 28, 1999
11.
Yueh-Min Huang Jen-Wen Ding Shiao-Li Tsao 《The VLDB Journal The International Journal on Very Large Data Bases》1999,8(1):44-54
To provide high accessibility of continuous-media (CM) data, CM servers generally stripe data across multiple disks. Currently,
the most widely used striping scheme for CM data is round-robin permutation (RRP). Unfortunately, when RRP is applied to variable-bit-rate
(VBR) CM data, load imbalance across multiple disks occurs, thereby reducing overall system performance. In this paper, the
performance of a VBR CM server with RRP is analyzed. In addition, we propose an efficient striping scheme called constant
time permutation (CTP), which takes the VBR characteristic into account and obtains a more balanced load than RRP. Analytic
models of both RRP and CTP are presented, and the models are verified via trace-driven simulations. Analysis and simulation
results show that CTP can substantially increase the number of clients supported, though it might introduce a few seconds/minutes
of initial delay.
Received June 9, 1998 / Accepted January 21, 1999
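The load imbalance that this abstract attributes to round-robin striping of VBR data can be illustrated with a small sketch. It assumes plain round-robin placement (block i goes to disk i mod the number of disks), which may differ in detail from the RRP scheme analyzed in the paper, and it does not attempt to reproduce CTP. The function name `round_robin_load` and the block sizes are hypothetical.

```python
def round_robin_load(block_sizes, num_disks):
    """Total bytes placed on each disk when block i is stored on
    disk i % num_disks (assumed plain round-robin placement)."""
    load = [0] * num_disks
    for i, size in enumerate(block_sizes):
        load[i % num_disks] += size
    return load

# Constant-bit-rate data: every block has the same size, so disks stay balanced.
cbr_load = round_robin_load([5, 5, 5, 5], num_disks=2)

# Variable-bit-rate data: large and small blocks alternate, so one disk
# receives far more data than the other.
vbr_load = round_robin_load([10, 1, 10, 1], num_disks=2)
```

In the second case the first disk holds ten times as much data as the second, which is the kind of skew a striping scheme that accounts for the VBR characteristic would aim to reduce.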
12.
Clustering consists of partitioning a set of objects into disjoint and homogeneous clusters. For many years, clustering methods have been applied in a wide variety of disciplines and scientific areas. Traditionally, clustering methods deal with numerical data, i.e. objects represented by a conjunction of numerical attribute values. However, commercial and scientific databases nowadays usually contain categorical data, i.e. objects represented by categorical attributes. In this paper we present a dissimilarity measure that is capable of dealing with tree-structured categorical data. Thus, it can be used for extending the various versions of the very popular k-means clustering algorithm to deal with such data. We discuss how such an extension can be achieved. Moreover, we show empirically that the proposed dissimilarity measure is accurate, compared to other well-known (dis)similarity measures for categorical data.
13.
Analysis of navigation behaviour in web sites integrating multiple information systems Cited by: 6 (self-citations: 0, by others: 6)
Bettina Berendt Myra Spiliopoulou 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(1):56-75
Abstract. The analysis of web usage has mostly focused on sites composed of conventional static pages. However, huge amounts of information
available in the web come from databases or other data collections and are presented to the users in the form of dynamically
generated pages. The query interfaces of such sites allow the specification of many search criteria. Their generated results
support navigation to pages of results combining cross-linked data from many sources. For the analysis of visitor navigation
behaviour in such web sites, we propose the web usage miner (WUM), which discovers navigation patterns subject to advanced
statistical and structural constraints. Since our objective is the discovery of interesting navigation patterns, we do not
focus on accesses to individual pages. Instead, we construct conceptual hierarchies that reflect the query capabilities used
in the production of those pages. Our experiments with a real web site that integrates data from multiple databases, the German
SchulWeb, demonstrate the appropriateness of WUM in discovering navigation patterns and show how those discoveries can help
in assessing and improving the quality of the site.
Received June 21, 1999 / Accepted December 24, 1999
14.
Oscar Díaz Arturo Jaime 《The VLDB Journal The International Journal on Very Large Data Bases》1997,6(4):282-295
Active database management systems (DBMSs) are a fast-growing area of research, mainly due to the large number of applications
which can benefit from this active dimension. These applications are far from being homogeneous, requiring different kinds
of functionalities. However, most of the active DBMSs described in the literature only provide a fixed, hard-wired execution model to support the active dimension. In object-oriented DBMSs, event-condition-action rules have been proposed
for providing active behaviour. This paper presents EXACT, a rule manager for object-oriented DBMSs which provides a variety
of options from which the designer can choose the one that best fits the semantics of the concept to be supported by rules.
Due to the difficulty of foreseeing future requirements, special attention has been paid to making rule management easily
extensible, so that the user can tailor it to suit specific applications. This has been borne out by an implementation in
ADAM, an object-oriented DBMS. An example is shown of how the default mechanism can be easily extended to support new requirements.
Edited by Y. Vassiliou. Received May 26, 1994 / Revised January 26, 1995, June 22, 1996 / Accepted November 4, 1996
15.
Summary. We set out a modal logic for reasoning about multilevel security of probabilistic systems. This logic contains expressions
for time, probability, and knowledge. Making use of the Halpern-Tuttle framework for reasoning about knowledge and probability,
we give a semantics for our logic and prove it is sound. We give two syntactic definitions of perfect multilevel security
and show that their semantic interpretations are equivalent to earlier, independently motivated characterizations. We also
discuss the relation between these characterizations of security and between their usefulness in security analysis.
16.
17.
18.
In this work a visual-based autonomous system capable of memorizing and recalling sensory-motor associations is presented.
The robot's behaviors are based on learned associations between its sensory inputs and its motor actions. Perception is divided
into two stages. The first one is functional: algorithmic procedures extract in real time visual features such as disparity
and local orientation from the input images. The second stage is mnemonic: the features produced by the different functional
areas are integrated with motor information and memorized or recalled. An efficient memory organization and fast information
retrieval enables the robot to learn to navigate and to avoid obstacles without need of an internal metric reconstruction
of the external environment.
Received: 22 November 1996 / Accepted: 18 November 1997
19.
In this paper, we present a novel approach for multimedia data indexing and retrieval that is machine independent and highly
flexible for sharing multimedia data across applications. Traditional multimedia data indexing and retrieval problems have
been attacked using the central data server as the main focus, and most of the indexing and query-processing for retrieval
are highly application dependent. This precludes the use of created indices and query processing mechanisms for multimedia
data which, in general, have a wide variety of uses across applications. The approach proposed in this paper addresses three
issues: 1. multimedia data indexing; 2. inference or query processing; and 3. combining indices and inference or query mechanism
with the data to facilitate machine independence in retrieval and query processing. We emphasize the third issue, as typically
multimedia data are huge in size and require intra-data indexing. We describe how the proposed approach addresses various
problems faced by the application developers in indexing and retrieval of multimedia data. Finally, we present two applications
developed based on the proposed approach: video indexing; and video content authorization for presentation.
20.
Integration – supporting multiple application classes with heterogeneous performance requirements – is an emerging trend
in networks, file systems, and operating systems. We evaluate two architectural alternatives – partitioned and integrated
– for designing next-generation file systems. Whereas a partitioned server employs a separate file system for each application
class, an integrated file server multiplexes its resources among all application classes; we evaluate the performance of the
two architectures with respect to sharing of disk bandwidth among the application classes. We show that although the problem
of sharing disk bandwidth in integrated file systems is conceptually similar to that of sharing network link bandwidth in
integrated services networks, the arguments that demonstrate the superiority of integrated services networks over separate
networks are not applicable to file systems. Furthermore, we show that: an integrated server outperforms the partitioned server
in a large operating region and has slightly worse performance in the remaining region; the capacity of an integrated server
is larger than that of the partitioned server; and an integrated server outperforms the partitioned server by a factor of
up to 6 in the presence of bursty workloads.