Found 20 similar documents (search took 10 ms)
1.
Zengyou He Xiaofei Xu Joshua Zhexue Huang Shengchun Deng 《Expert systems with applications》2004,27(4):681-697
Outliers, commonly referred to as exceptional cases, exist in many real-world databases. Detection of such outliers is important for many applications and has recently attracted much attention from the data mining research community. However, most existing methods are designed for mining outliers from a single dataset without considering the class labels of data objects. In this paper, we consider the class outlier detection problem: ‘given a set of observations with class labels, find those that arouse suspicions, taking into account the class labels’. By generalizing two pioneering contributions [Proc WAIM02 (2002); Proc SSTD03] in this field, we develop the notion of class outlier and propose practical solutions by extending existing outlier detection algorithms to this case. Furthermore, its potential applications in CRM (customer relationship management) are also discussed. Finally, experiments on real datasets show that our method can find interesting outliers and is of practical use.
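The class-outlier idea can be illustrated with a minimal sketch (an illustrative toy measure, not the paper's algorithm): score each observation by its distance to the centroid of its own class, normalized by that class's average spread, so a point sitting far from its labeled class stands out. The function name, the scoring formula, and the toy data are all assumptions.

```python
import math

def class_outlier_scores(points, labels):
    """Toy class-outlier score: distance of each point to its own class
    centroid, divided by that class's mean distance to the centroid.
    (Illustrative only; not the paper's definition.)"""
    groups = {}
    for p, c in zip(points, labels):
        groups.setdefault(c, []).append(p)
    # per-class centroid
    centroids = {c: tuple(sum(x) / len(ps) for x in zip(*ps))
                 for c, ps in groups.items()}
    # mean in-class distance to the centroid (guarded against zero)
    spread = {c: max(sum(math.dist(p, centroids[c]) for p in ps) / len(ps), 1e-9)
              for c, ps in groups.items()}
    return [math.dist(p, centroids[c]) / spread[c]
            for p, c in zip(points, labels)]

pts = [(0, 0), (1, 0), (0, 1), (10, 10), (9, 10), (10, 9), (10, 0)]
labs = ["a", "a", "a", "b", "b", "b", "a"]  # last point is labeled "a" but lies far away
scores = class_outlier_scores(pts, labs)
# the (10, 0) point scores highest: it arouses suspicion *given* its class label
```

Note that (10, 0) would not necessarily look anomalous in the unlabeled dataset as a whole; it is the class label that makes it suspicious, which is the point of the problem statement above.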
2.
Anat Eyal Tova Milo 《The VLDB Journal: The International Journal on Very Large Data Bases》2001,10(1):16-38
A broad spectrum of electronic commerce applications is currently available on the Web, providing services in almost any
area one can think of. As the number and variety of such applications grow, more business opportunities emerge for providing
new services based on the integration and customization of existing applications. (Web shopping malls and support for comparative
shopping are just a couple of examples.) Unfortunately, the diversity of applications in each specific domain and the disparity
of interfaces, application flows, actor roles in the business transaction, and data formats, render the integration and manipulation
of applications a rather difficult task. In this paper we present the Application Manifold system, aimed at simplifying the intricate task of integration and customization of e-commerce applications. The scope of
the work in this paper is limited to web-enabled e-commerce applications. We do not support the integration/customization
of proprietary/legacy applications. The wrapping of such applications as web services is complementary to our work. Based
on the emerging Web data standard, XML, and application modeling standard, UML, the system offers a novel declarative specification
language for describing the integration/customization task, supporting a modular approach where new applications can be added
and integrated at will with minimal effort. Then, acting as an application generator, the system generates a fully integrated/customized
e-commerce application, with the declarativity of the specification allowing for the optimization and verification of the
generated application. The integration here deals with the full profile of the given e-commerce applications: the various
services offered by the applications, the activities and roles of the different actors participating in the application (e.g.,
customers, vendors), the application flow, as well as with the data involved in the process. This is in contrast to previous
works on Web data integration that focused primarily on querying the data available in the applications, mostly ignoring the
additional aspects mentioned above.
Received: 30 October 2000 / Accepted: 14 March 2001 / Published online: 2 August 2001
3.
In packet audio applications, packets are buffered at a receiving site and their playout delayed in order to compensate for
variable network delays. In this paper, we consider the problem of adaptively adjusting the playout delay in order to keep
this delay as small as possible, while at the same time avoiding excessive “loss” due to the arrival of packets at the receiver
after their playout time has already passed. The contributions of this paper are twofold. First, given a trace of packet audio
receptions at a receiver, we present efficient algorithms for computing a bound on the achievable performance of any playout delay adjustment algorithm. More precisely, we compute upper and lower bounds (which are shown to be tight for the
range of loss and delay values of interest) on the optimum (minimum) average playout delay for a given number of packet losses
(due to late arrivals) at the receiver for that trace. Second, we present a new adaptive delay adjustment algorithm that tracks
the network delay of recently received packets and efficiently maintains delay percentile information. This information, together
with a “delay spike” detection algorithm based on (but extending) our earlier work, is used to dynamically adjust talkspurt
playout delay. We show that this algorithm outperforms existing delay adjustment algorithms over a number of measured audio
delay traces and performs close to the theoretical optimum over a range of parameter values of interest.
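The percentile-tracking idea can be sketched as follows; this is a toy estimator under assumed parameters (a sliding window and a fixed percentile), not the paper's efficient percentile-maintenance algorithm or its spike detector:

```python
from collections import deque

def make_playout_estimator(window=100, percentile=0.99):
    """Keep a sliding window of observed one-way packet delays and suggest
    a high percentile of them as the playout delay for the next talkspurt.
    (Toy sketch; parameters are assumptions.)"""
    recent = deque(maxlen=window)

    def observe(delay_ms):
        recent.append(delay_ms)
        ordered = sorted(recent)
        idx = min(int(percentile * len(ordered)), len(ordered) - 1)
        return ordered[idx]  # suggested playout delay, in ms

    return observe

observe = make_playout_estimator(window=50, percentile=0.9)
suggestion = 0
for d in [20, 22, 21, 25, 90, 23, 24, 22, 26, 21]:  # one 90 ms spike
    suggestion = observe(d)
# with 10 samples, the 0.9 index selects the largest delay (90 ms),
# trading extra buffering delay for fewer late-arrival losses
```

Lowering the percentile shrinks the suggested delay at the cost of more late packets, which is exactly the loss/delay trade-off the bounds in the abstract quantify.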
4.
Summary. Long-lived and adaptive implementations of mutual exclusion and renaming in the read/write shared memory model are presented.
An implementation of a task is adaptive if the step complexity of any operation in the implementation is a function of the number of processes that take steps concurrently
with the operation. The renaming algorithm assigns a new unique id in the range to any process whose initial unique name is taken from a set of size N, for an arbitrary N and where k is the number of processes that actually take steps or hold a name while the new name is being acquired. The step complexity
of acquiring a new name is , while the step complexity of releasing a name is 1. The space complexity of the algorithm is where n is an upper bound on the number of processes that may be active at the same time (acquiring or holding new names), which
could be N in the worst case.
Both the system response time and the worst case number of operations per process in the presented mutual-exclusion algorithm
are adaptive. Both algorithms rely on the basic building block of a long-lived and adaptive splitter. While the adaptive splitter
satisfies a slightly different set of properties than the Moir-Anderson splitter [MA95], it is adaptive and long-lived. In
addition, the new splitter properties enable the construction of a non-blocking long-lived (2k-1)-renaming algorithm (which is optimal in the size of the new name space). We believe that the mechanisms introduced in
our splitter implementation are interesting on their own, and might be used in other adaptive and long-lived constructions.
Received: March 2000 / Accepted: July 2001
5.
David Gibson Jon Kleinberg Prabhakar Raghavan 《The VLDB Journal: The International Journal on Very Large Data Bases》2000,8(3-4):222-236
We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical
data. By “categorical data,” we mean tables with fields that cannot be naturally ordered by a metric – e.g., the names of
producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method
for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure
arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types
of non-linear dynamical systems.
Received: February 15, 1999 / Accepted: August 15, 1999
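The assign-and-propagate iteration can be sketched with an additive combiner: each categorical value's weight is recomputed from the weights of the values it co-occurs with, then the weight vector is renormalized. The damping term, the combiner, and the toy table are assumptions, simplified from the paper's dynamical-systems formulation:

```python
import math

def propagate(rows, iters=20):
    """Iteratively propagate weights over categorical values; co-occurring
    values reinforce each other. (Simplified sketch, not the paper's method.)"""
    values = {v for row in rows for v in row}
    w = {v: 1.0 for v in values}
    for _ in range(iters):
        new = {v: w[v] for v in values}  # damping: keep part of the old weight
        for row in rows:
            total = sum(w[v] for v in row)
            for v in row:
                new[v] += total - w[v]  # add the weights of co-occurring values
        norm = math.sqrt(sum(x * x for x in new.values())) or 1.0
        w = {v: x / norm for v, x in new.items()}
    return w

# toy categorical table: car maker and country of origin
rows = [("ford", "usa"), ("gm", "usa"), ("bmw", "germany"), ("audi", "germany")]
w = propagate(rows)
# "ford" and "gm" converge to identical weights (they co-occur with "usa"),
# exposing similarity structure that a metric-free table otherwise hides
```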
6.
Surface approximation to scanned data (total citations: 6; self-citations: 0; other citations: 6)
A method to approximate scanned data points with a B-spline surface is presented. The data are assumed to be organized in
the form Q_{i,j}, i = 0,…,n; j = 0,…,m_i, i.e., in a row-wise fashion. The method produces a C^{(p-1,q-1)} continuous surface
(p and q are the required degrees) that does not deviate from the data by more than a user-specified tolerance. The parametrization
of the surface is not affected negatively by the distribution of the points in each row, and it can be influenced by a user-supplied
knot vector.
7.
8.
Zero-suppressed BDDs and their applications (total citations: 2; self-citations: 0; other citations: 2)
Shin-ichi Minato 《International Journal on Software Tools for Technology Transfer (STTT)》2001,3(2):156-170
In many real-life problems, we are often faced with manipulating sets of combinations. In this article, we study a special
type of ordered binary decision diagram (OBDD), called zero-suppressed BDDs (ZBDDs). This data structure represents sets of
combinations more efficiently than using original OBDDs. We discuss the basic data structures and algorithms for manipulating
ZBDDs in contrast with the original OBDDs. We also present some practical applications of ZBDDs, such as solving combinatorial
problems with unate cube set algebra, logic synthesis methods, Petri net processing, etc. We show that the ZBDD is a useful
option among OBDD techniques, well suited to this class of practical applications.
Published online: 15 May 2001 相似文献
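The core of the ZBDD data structure fits in a short sketch: nodes are shared through a unique table, and the zero-suppression rule deletes any node whose 1-edge leads to the empty family, so combinations that do not mention a variable cost nothing to represent. The minimal tuple-based implementation below, with a union operation and a combination counter, is an illustrative assumption, not Minato's package:

```python
ZERO, ONE = ("zero",), ("one",)  # empty family, and the family containing only {}
_unique = {}                     # unique table for node sharing

def mk(var, lo, hi):
    """Create or reuse a node. Zero-suppression rule: a node whose 1-edge
    is the empty family is redundant, so return its 0-edge instead."""
    if hi is ZERO:
        return lo
    key = (var, id(lo), id(hi))
    if key not in _unique:
        _unique[key] = (var, lo, hi)
    return _unique[key]

def top(f):
    return float("inf") if f is ZERO or f is ONE else f[0]

def singleton(combo):
    """The family containing exactly one combination (a set of variable ids)."""
    f = ONE
    for v in sorted(combo, reverse=True):  # smallest variable ends up at the root
        f = mk(v, ZERO, f)
    return f

def union(f, g):
    if f is ZERO: return g
    if g is ZERO: return f
    if f is g: return f
    tf, tg = top(f), top(g)
    if tf < tg:  # g has no combination mentioning f's top variable
        return mk(f[0], union(f[1], g), f[2])
    if tg < tf:
        return mk(g[0], union(g[1], f), g[2])
    return mk(f[0], union(f[1], g[1]), union(f[2], g[2]))

def count(f):
    """Number of combinations in the family."""
    if f is ZERO: return 0
    if f is ONE: return 1
    return count(f[1]) + count(f[2])

family = ZERO
for combo in [{1, 2}, {1, 3}, {2}, set()]:
    family = union(family, singleton(combo))
# the family now holds 4 combinations; shared suffixes are stored only once
```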
9.
In this paper, we investigate a playout scheduling framework for supporting the continuous and synchronized presentations
of multimedia streams in a distributed multimedia presentation system. We assume a situation in which the server and network
transmissions provide sufficient support for the delivery of media objects. In this context, major issues regarding the enforcement
of the smooth presentation of multimedia streams at client sites must be addressed to deal with rate variance of stream presentations
and delay variance of networks. We develop various playout-scheduling algorithms that are adaptable to quality-of-service
parameters. The proposed algorithms permit the local adjustment of unsynchronized presentations by gradually accelerating
or retarding presentation components, rather than abruptly skipping or pausing the presentation materials. A comprehensive
experimental analysis of the proposed algorithms demonstrates that our algorithms can effectively avoid playout gaps (or hiccups)
in the presentations. This scheduling framework can be readily used to support customized multimedia presentations.
10.
Flip Korn Alexandros Labrinidis Yannis Kotidis Christos Faloutsos 《The VLDB Journal: The International Journal on Very Large Data Bases》2000,8(3-4):254-266
Association Rule Mining algorithms operate on a data matrix (e.g., customers × products) to derive association rules [AIS93b, SA96]. We propose a new paradigm, namely, Ratio Rules, which are quantifiable in that we can measure the “goodness” of a set of discovered rules. We also propose the “guessing
error” as a measure of the “goodness”, that is, the root-mean-square error of the reconstructed values of the cells of the
given matrix, when we pretend that they are unknown. Another contribution is a novel method to guess missing/hidden values
from the Ratio Rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can “guess”
the amount spent on butter. Thus, unlike association rules, Ratio Rules can perform a variety of important tasks such as forecasting,
answering “what-if” scenarios, detecting outliers, and visualizing the data. Moreover, we show that we can compute Ratio Rules
in a single pass over the data set with small memory requirements (a few small matrices), in contrast to association rule mining methods
which require multiple passes and/or large memory. Experiments on several real data sets (e.g., basketball and baseball statistics,
biological data) demonstrate that the proposed method: (a) leads to rules that make sense; (b) can find large itemsets in
binary matrices, even in the presence of noise; and (c) consistently achieves a “guessing error” up to 5 times smaller than
straightforward column averages.
Received: March 15, 1999 / Accepted: November 1, 1999
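The “guessing” step can be sketched by treating a single Ratio Rule as the dominant direction of the data matrix: extract it by power iteration, scale it to fit a row's known cells by least squares, and read the hidden cells off the scaled rule. The extraction method, the helper names, and the toy basket data are assumptions, not the paper's exact derivation:

```python
def leading_component(rows, iters=200):
    """Power iteration for the dominant right singular direction of a
    data matrix (rows = customers, columns = products)."""
    ncols = len(rows[0])
    v = [1.0] * ncols
    for _ in range(iters):
        av = [sum(r[j] * v[j] for j in range(ncols)) for r in rows]  # A v
        v = [sum(av[i] * rows[i][j] for i in range(len(rows)))
             for j in range(ncols)]                                  # A^T (A v)
        norm = sum(x * x for x in v) ** 0.5 or 1.0
        v = [x / norm for x in v]
    return v

def guess_missing(known, rule):
    """Least-squares fit of the rule's scale to the known cells of one row,
    then predict the hidden cells. `known` maps column index -> value."""
    num = sum(known[j] * rule[j] for j in known)
    den = sum(rule[j] ** 2 for j in known) or 1.0
    scale = num / den
    return {j: scale * rule[j] for j in range(len(rule)) if j not in known}

# toy baskets: spending on (milk, bread, butter) follows a 10:3:5 ratio
rows = [[10, 3, 5], [20, 6, 10], [5, 1.5, 2.5], [40, 12, 20]]
rule = leading_component(rows)
guess = guess_missing({0: 10.0, 1: 3.0}, rule)  # butter column hidden
# guess[2] comes out near 5.0 — the "$10 milk, $3 bread -> butter" example above
```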
11.
Jeffrey A. Fayman Oded Sudarsky Ehud Rivlin Michael Rudzsky 《Machine Vision and Applications》2001,13(1):25-37
We present a new active vision technique called zoom tracking. Zoom tracking is the continuous adjustment of a camera's focal
length in order to keep a constant-sized image of an object moving along the camera's optical axis. Two methods for performing
zoom tracking are presented: a closed-loop visual feedback algorithm based on optical flow, and use of depth information obtained
from an autofocus camera's range sensor. We explore two uses of zoom tracking: recovery of depth information and improving
the performance of scale-variant algorithms. We show that the image stability provided by zoom tracking improves the performance
of algorithms that are scale variant, such as correlation-based trackers. While zoom tracking cannot totally compensate for
an object's motion, due to the effect of perspective distortion, an analysis of this distortion provides a quantitative estimate
of the performance of zoom tracking. Zoom tracking can be used to reconstruct a depth map of the tracked object. We show that
under normal circumstances this reconstruction is much more accurate than depth from zooming, and works over a greater range
than depth from axial motion while providing, in the worst case, only slightly less accurate results. Finally, we show how
zoom tracking can also be used in time-to-contact calculations.
Received: 15 February 2000 / Accepted: 19 June 2000
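Under a pinhole camera model the image size of an object is proportional to focal length divided by depth, which gives a one-line zoom update and explains why the focal-length history doubles as a depth map. The proportional update and the numbers below are illustrative assumptions (the paper's methods are driven by optical flow or an autofocus range sensor):

```python
def zoom_update(focal, image_size, target_size):
    """Rescale the focal length so the object's image returns to the
    target size: image_size is proportional to focal / depth."""
    return focal * (target_size / image_size)

def depth_from_zoom_track(initial_depth, initial_focal, current_focal):
    """If zoom tracking holds image size constant, focal/depth stays
    constant, so depth is recoverable from the focal-length history."""
    return initial_depth * (current_focal / initial_focal)

# object starts 2 m away at a 50 mm focal length; it then moves away and
# its image shrinks from 100 px to 50 px
f = zoom_update(50.0, image_size=50.0, target_size=100.0)
depth = depth_from_zoom_track(2.0, 50.0, f)
# f == 100.0 (focal length doubles) and depth == 4.0 (object is now 4 m away)
```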
12.
Computing systems are essential resources for both the business and public sectors. With the increasing interdependence of
integrated electronic commerce and business applications within the global computing environment, performance and reliability
are of great concern. Poor performance can mean lost cooperation, opportunity, and revenue. This paper describes performance
challenges that these applications face over the short and long term. We present an analytic technique that can predict the
performance of an e-commerce application over a given deployment period. This technique can be used to deduce performance
stress testing vectors over this period and for design and capacity planning exercises. A Web-based shopping server case study
is used as an example.
Published online: 22 August 2001
13.
Summary. In a shared-memory distributed system, n independent asynchronous processes communicate by reading and writing to shared variables. An algorithm is adaptive (to total contention) if its step complexity depends only on the actual number, k, of active processes in the execution; this number is unknown in advance and may change in different executions of the algorithm.
Adaptive algorithms are inherently wait-free, providing fault-tolerance in the presence of an arbitrary number of crash failures and arbitrary process speeds.
A wait-free adaptive collect algorithm with O(k) step complexity is presented, together with its applications in wait-free adaptive algorithms for atomic snapshots, immediate
snapshots and renaming.
Received: August 1999 / Accepted: August 2001
14.
J. Hu R.S. Kashi D. Lopresti G.T. Wilfong 《International Journal on Document Analysis and Recognition》2002,4(3):140-153
While techniques for evaluating the performance of lower-level document analysis tasks such as optical character recognition
have gained acceptance in the literature, attempts to formalize the problem for higher-level algorithms, while receiving a
fair amount of attention in terms of theory, have generally been less successful in practice, perhaps owing to their complexity.
In this paper, we introduce intuitive, easy-to-implement evaluation schemes for the related problems of table detection and
table structure recognition. We also present the results of several small experiments, demonstrating how well the methodologies
work and the useful sorts of feedback they provide. We first consider the table detection problem. Here algorithms can yield
various classes of errors, including non-table regions improperly labeled as tables (insertion errors), tables missed completely
(deletion errors), larger tables broken into a number of smaller ones (splitting errors), and groups of smaller tables combined
to form larger ones (merging errors). This leads naturally to the use of an edit distance approach for assessing the results
of table detection. Next we address the problem of evaluating table structure recognition. Our model is based on a directed
acyclic attribute graph, or table DAG. We describe a new paradigm, “graph probing,” for comparing the results returned by
the recognition system and the representation created during ground-truthing. Probing is in fact a general concept that could
be applied to other document recognition tasks as well.
Received: July 18, 2000 / Accepted: October 4, 2001
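The detection-error taxonomy above can be sketched with plain interval overlap in place of the full edit-distance machinery; representing table regions as (start, end) line spans and using a zero-threshold overlap test are simplifying assumptions:

```python
def classify_detections(truth, detected):
    """Bucket detection results into the four error classes by overlap.
    Regions are (start, end) spans. (Toy scoring, not the paper's metric.)"""
    def overlaps(a, b):
        return min(a[1], b[1]) > max(a[0], b[0])

    errors = {"deletion": 0, "split": 0, "insertion": 0, "merge": 0}
    for t in truth:
        matches = [d for d in detected if overlaps(t, d)]
        if not matches:
            errors["deletion"] += 1   # table missed completely
        elif len(matches) > 1:
            errors["split"] += 1      # one table broken into smaller ones
    for d in detected:
        matches = [t for t in truth if overlaps(d, t)]
        if not matches:
            errors["insertion"] += 1  # non-table region labeled as a table
        elif len(matches) > 1:
            errors["merge"] += 1      # several tables combined into one
    return errors

truth = [(0, 10), (20, 30), (40, 50)]
detected = [(0, 5), (6, 10), (18, 32)]
e = classify_detections(truth, detected)
# the first table was split in two, the last was missed, the middle one found:
# {"deletion": 1, "split": 1, "insertion": 0, "merge": 0}
```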
15.
In this paper, we present two novel disk failure recovery methods that utilize the inherent characteristics of video streams
for efficient recovery. Whereas the first method exploits the inherent redundancy in video streams (rather than error-correcting
codes) to approximately reconstruct data stored on failed disks, the second method exploits the sequentiality of video playback
to reduce the overhead of online failure recovery in conventional RAID arrays. For the former approach, we present loss-resilient
versions of JPEG and MPEG compression algorithms. We present an inherently redundant array of disks (IRAD) architecture that combines these loss-resilient compression algorithms with techniques for efficient placement of video streams
on disk arrays to ensure that on-the-fly recovery does not impose any additional load on the array. Together, they enhance
the scalability of multimedia servers by (1) integrating the recovery process with the decompression of video streams, and
thereby distributing the reconstruction process across the clients; and (2) supporting graceful degradation in the quality
of recovered images with increase in the number of disk failures. We present analytical and experimental results to show that
both schemes significantly reduce the failure recovery overhead in a multimedia server.
16.
We describe a collection of algorithms designed to support reliable synchronization and group membership services for distributed
multimedia applications. In particular, we consider those applications that require interactivity, isochronous rendering of
multimedia data, and high reliability. We show that the algorithms we propose (i) provide reliable support for the synchronization
of multimedia data streams, despite the occurrence of possible communication failures, (ii) maintain a consistent view of
the relative group membership of all the nonfaulty application components, (iii) guarantee time-bounded delay of component
failure detection and join, and (iv) effectively meet possible scalability requirements of the applications.
17.
Some simple distributed algorithms for sparse networks (total citations: 1; self-citations: 0; other citations: 1)
Summary. We give simple, deterministic, distributed algorithms for computing maximal matchings, maximal independent sets and colourings.
We show that edge colourings with at most 2Δ−1 colours, and maximal matchings, can be computed within O(Δ + log* n) deterministic rounds, where Δ is the maximum degree of the network. We also show how to find maximal independent sets and (Δ+1)-vertex colourings within O(Δ² + log* n) deterministic rounds. All hidden constants are very small and the algorithms are very simple.
Received: August 2000 / Accepted: November 2000
18.
In this paper, we present an efficient global illumination technique, and then we discuss the results of its extensive experimental
validation. The technique is a hybrid of cluster-based hierarchical and progressive radiosity techniques, which does not require
storing links between interacting surfaces and clusters. We tested our technique by applying a multistage validation procedure,
which we designed specifically for global illumination solutions. First, we experimentally validate the algorithm against
analytically derived and measured real-world data to check how calculation speed is traded for lighting simulation accuracy
for various clustering and meshing scenarios. Then we test the algorithm performance and rendering quality by directly comparing
the virtual and real-world images of a complex environment.
19.
Sérgio Vale Aguiar Campos Edmund Clarke 《International Journal on Software Tools for Technology Transfer (STTT)》1999,2(3):260-269
The task of checking if a computer system satisfies its timing specifications is extremely important. These systems are often
used in critical applications where failure to meet a deadline can have serious or even fatal consequences. This paper presents
an efficient method for performing this verification task. In the proposed method a real-time system is modeled by a state-transition
graph represented by binary decision diagrams. Efficient symbolic algorithms exhaustively explore the state space to determine
whether the system satisfies a given specification. In addition, our approach computes quantitative timing information such
as minimum and maximum time delays between given events. These results provide insight into the behavior of the system and
assist in the determination of its temporal correctness. The technique evaluates how well the system works or how seriously
it fails, as opposed to only whether it works or not. Based on these techniques a verification tool called Verus has been constructed. It has been used in the verification of several industrial real-time systems such as the robotics system
described below. This demonstrates that the method proposed is efficient enough to be used in real-world designs. The examples
verified show how the information produced can assist in designing more efficient and reliable real-time systems.
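The quantitative minimum-delay computation can be sketched in explicit-state form: a breadth-first search over the transition graph counts the fewest transitions from a start event to a target event. Verus does this symbolically over BDD-represented state spaces; the explicit search and the tiny hand-built graph below are assumptions for illustration:

```python
from collections import deque

def min_delay(transitions, start, targets):
    """Minimum number of transitions from any start state to any target
    state (None if unreachable) — an explicit-state stand-in for the
    symbolic minimum-delay computation."""
    frontier = deque((s, 0) for s in start)
    seen = set(start)
    while frontier:
        state, d = frontier.popleft()
        if state in targets:
            return d
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return None

# tiny request/grant protocol: a grant always takes at least two steps
g = {"idle": ["req"], "req": ["wait"], "wait": ["grant"], "grant": ["idle"]}
d = min_delay(g, start={"req"}, targets={"grant"})
# d == 2: the minimum delay between the request and grant events
```

The corresponding maximum-delay question needs care with cycles, which is one reason the symbolic algorithms in the paper are more involved than this sketch.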
20.
Rita Cucchiara 《Machine Vision and Applications》1998,11(1):1-6
The paper presents a genetic algorithm for clustering objects in images based on their visual features. In particular, a
novel solution code (named Boolean Matching Code) and a corresponding reproduction operator (the Single Gene Crossover) are defined specifically for clustering and are compared with other standard genetic approaches. The paper describes the
clustering algorithm in detail, in order to show the suitability of the genetic paradigm and underline the importance of
effective tuning of algorithm parameters to the application. The algorithm is evaluated on some test sets and an example of
its application in automated visual inspection is presented.
Received: 6 August 1996 / Accepted: 11 November 1997