Similar Documents
20 similar documents found.
1.
Mining class outliers: concepts, algorithms and applications in CRM
Outliers, commonly referred to as exceptional cases, exist in many real-world databases. Detecting such outliers is important for many applications and has recently attracted much attention from the data mining research community. However, most existing methods are designed for mining outliers from a single dataset without considering the class labels of data objects. In this paper, we consider the class outlier detection problem: ‘given a set of observations with class labels, find those that arouse suspicions, taking into account the class labels’. By generalizing two pioneering contributions in this field [Proc WAIM02 (2002); Proc SSTD03], we develop the notion of class outlier and propose practical solutions by extending existing outlier detection algorithms to this case. Furthermore, we discuss its potential applications in CRM (customer relationship management). Finally, experiments on real datasets show that our method finds interesting outliers and is of practical use.
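To make the notion concrete, the sketch below scores each labelled point by how strongly its nearest neighbours disagree with its class label. This is only a minimal illustration of "taking class labels into account", not the algorithms proposed in the paper; the function name, the Euclidean metric and the choice of k are assumptions.

```python
# Hedged sketch: a simple class-outlier score, not the paper's exact method.
# A point whose nearest neighbours mostly carry a different class label scores high.
import numpy as np

def class_outlier_scores(X, y, k=5):
    """X: (n, d) feature matrix, y: (n,) class labels. Returns (n,) scores in [0, 1]."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)            # a point is not its own neighbour
    scores = np.empty(n)
    for i in range(n):
        nn = np.argsort(dists[i])[:k]          # indices of the k nearest neighbours
        scores[i] = np.mean(y[nn] != y[i])     # fraction with a different class label
    return scores

# Toy usage: the last point sits inside the "a" cluster but is labelled "b".
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [0.2, 0.3]])
y = np.array(["a", "a", "a", "b", "b", "b"])
print(class_outlier_scores(X, y, k=3))         # last entry is close to 1
```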

2.
Integrating and customizing heterogeneous e-commerce applications
A broad spectrum of electronic commerce applications is currently available on the Web, providing services in almost any area one can think of. As the number and variety of such applications grow, more business opportunities emerge for providing new services based on the integration and customization of existing applications. (Web shopping malls and support for comparative shopping are just a couple of examples.) Unfortunately, the diversity of applications in each specific domain and the disparity of interfaces, application flows, actor roles in the business transaction, and data formats render the integration and manipulation of applications a rather difficult task. In this paper we present the Application Manifold system, aimed at simplifying the intricate task of integrating and customizing e-commerce applications. The scope of the work is limited to web-enabled e-commerce applications; we do not support the integration/customization of proprietary/legacy applications, and the wrapping of such applications as web services is complementary to our work. Based on the emerging Web data standard, XML, and the application modeling standard, UML, the system offers a novel declarative specification language for describing the integration/customization task, supporting a modular approach where new applications can be added and integrated at will with minimal effort. Then, acting as an application generator, the system generates a fully integrated/customized e-commerce application, with the declarativity of the specification allowing for the optimization and verification of the generated application. The integration here deals with the full profile of the given e-commerce applications: the various services offered by the applications, the activities and roles of the different actors participating in the application (e.g., customers, vendors), the application flow, as well as the data involved in the process. This is in contrast to previous work on Web data integration, which focused primarily on querying the data available in the applications, mostly ignoring the additional aspects mentioned above. Received: 30 October 2000 / Accepted: 14 March 2001 / Published online: 2 August 2001

3.
Packet audio playout delay adjustment: performance bounds and algorithms
In packet audio applications, packets are buffered at a receiving site and their playout delayed in order to compensate for variable network delays. In this paper, we consider the problem of adaptively adjusting the playout delay in order to keep this delay as small as possible, while at the same time avoiding excessive “loss” due to the arrival of packets at the receiver after their playout time has already passed. The contributions of this paper are twofold. First, given a trace of packet audio receptions at a receiver, we present efficient algorithms for computing a bound on the achievable performance of any playout delay adjustment algorithm. More precisely, we compute upper and lower bounds (which are shown to be tight for the range of loss and delay values of interest) on the optimum (minimum) average playout delay for a given number of packet losses (due to late arrivals) at the receiver for that trace. Second, we present a new adaptive delay adjustment algorithm that tracks the network delay of recently received packets and efficiently maintains delay percentile information. This information, together with a “delay spike” detection algorithm based on (but extending) our earlier work, is used to dynamically adjust talkspurt playout delay. We show that this algorithm outperforms existing delay adjustment algorithms over a number of measured audio delay traces and performs close to the theoretical optimum over a range of parameter values of interest.
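The percentile-tracking component can be sketched as follows. This is a minimal illustration only, omitting the paper's performance bounds and its delay-spike detection, and the window size and percentile are arbitrary illustrative choices.

```python
# Hedged sketch: percentile-based talkspurt playout delay (spike detection omitted).
from collections import deque
import numpy as np

class PlayoutDelayEstimator:
    def __init__(self, window=1000, percentile=99.0):
        self.delays = deque(maxlen=window)    # recent one-way network delays (seconds)
        self.percentile = percentile

    def record(self, send_time, recv_time):
        self.delays.append(recv_time - send_time)

    def playout_delay(self):
        # Delay applied at the next talkspurt boundary: a high percentile of recent
        # delays, so only roughly (100 - percentile)% of packets arrive too late.
        if not self.delays:
            return 0.1                        # arbitrary bootstrap value
        return float(np.percentile(list(self.delays), self.percentile))

# Usage: feed per-packet (send, receive) timestamps, query at each talkspurt boundary.
est = PlayoutDelayEstimator()
for s, r in [(0.00, 0.05), (0.02, 0.09), (0.04, 0.08), (0.06, 0.20)]:
    est.record(s, r)
print(round(est.playout_delay(), 3))
```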

4.
Summary. Long-lived and adaptive implementations of mutual exclusion and renaming in the read/write shared memory model are presented. An implementation of a task is adaptive if the step complexity of any operation in the implementation is a function of the number of processes that take steps concurrently with the operation. The renaming algorithm assigns a new unique id, from a range whose size depends only on k, to any process whose initial unique name is taken from a set of size N, for an arbitrary N, where k is the number of processes that actually take steps or hold a name while the new name is being acquired. The step complexity of acquiring a new name is a function of k, while the step complexity of releasing a name is 1. The space complexity of the algorithm is a function of n, where n is an upper bound on the number of processes that may be active at the same time (acquiring or holding new names), which could be N in the worst case. Both the system response time and the worst-case number of operations per process in the presented mutual-exclusion algorithm are adaptive. Both algorithms rely on the basic building block of a long-lived and adaptive splitter. While the adaptive splitter satisfies a slightly different set of properties than the Moir-Anderson splitter [MA95], it is adaptive and long-lived. In addition, the new splitter properties enable the construction of a non-blocking long-lived (2k-1)-renaming algorithm (which is optimal in the size of the new name space). We believe that the mechanisms introduced in our splitter implementation are interesting on their own, and might be used in other adaptive and long-lived constructions. Received: March 2000 / Accepted: July 2001

5.
We describe a novel approach for clustering collections of sets, and its application to the analysis and mining of categorical data. By “categorical data,” we mean tables with fields that cannot be naturally ordered by a metric – e.g., the names of producers of automobiles, or the names of products offered by a manufacturer. Our approach is based on an iterative method for assigning and propagating weights on the categorical values in a table; this facilitates a type of similarity measure arising from the co-occurrence of values in the dataset. Our techniques can be studied analytically in terms of certain types of non-linear dynamical systems. Received February 15, 1999 / Accepted August 15, 1999
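A rough sketch of the iterative weight-assignment idea follows. It uses plain summation as the combining rule and per-field normalization; these specific choices, and the function name, are assumptions for illustration rather than the exact iteration analysed in the paper.

```python
# Hedged sketch: iterative weight propagation over categorical values (additive variant).
import numpy as np
from collections import defaultdict

def propagate_weights(rows, iterations=20):
    """rows: list of tuples of categorical values, one value per field."""
    nfields = len(rows[0])
    w = [defaultdict(lambda: 1.0) for _ in range(nfields)]      # uniform initial weights
    for _ in range(iterations):
        new = [defaultdict(float) for _ in range(nfields)]
        for row in rows:
            for f, v in enumerate(row):
                # A value gains weight from the values it co-occurs with in the same row.
                new[f][v] += sum(w[g][row[g]] for g in range(nfields) if g != f)
        for f in range(nfields):
            norm = np.sqrt(sum(x * x for x in new[f].values())) or 1.0
            w[f] = defaultdict(float, {v: x / norm for v, x in new[f].items()})
    return w

# Toy usage: values that co-occur ("bmw"/"audi" with "sedan", "toyota"/"honda" with
# "truck") end up with correlated weights, the basis of the similarity measure.
rows = [("bmw", "sedan"), ("bmw", "coupe"), ("audi", "sedan"),
        ("toyota", "truck"), ("honda", "truck")]
print({v: round(x, 3) for v, x in propagate_weights(rows)[0].items()})
```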

6.
Surface approximation to scanned data
A method to approximate scanned data points with a B-spline surface is presented. The data are assumed to be organized in the form Q_{i,j}, i = 0,…,n; j = 0,…,m_i, i.e., in a row-wise fashion. The method produces a C^(p-1,q-1)-continuous surface (p and q are the required degrees) that does not deviate from the data by more than a user-specified tolerance. The parametrization of the surface is not affected negatively by the distribution of the points in each row, and it can be influenced by a user-supplied knot vector.
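For illustration, the sketch below fits a smoothing B-spline surface to scattered samples with SciPy's generic least-squares routine. This is not the paper's row-wise, tolerance-driven construction; the smoothing factor s is only a loose stand-in for a deviation tolerance, and the data are synthetic.

```python
# Hedged sketch: a generic least-squares B-spline surface fit with SciPy, standing in
# for (but not reproducing) the paper's tolerance-driven, row-wise approximation.
import numpy as np
from scipy.interpolate import bisplrep, bisplev

# Synthetic "scanned" data on a noisy surface z = sin(x) * cos(y).
rng = np.random.default_rng(0)
x = rng.uniform(0, 3, 400)
y = rng.uniform(0, 3, 400)
z = np.sin(x) * np.cos(y) + rng.normal(scale=0.01, size=x.size)

# kx, ky correspond to the surface degrees p and q; s trades fitting error against
# the number of knots and loosely plays the role of a deviation tolerance.
tck = bisplrep(x, y, z, kx=3, ky=3, s=0.1)

# Check how far the fitted surface deviates from the data points.
fitted = np.array([bisplev(xi, yi, tck) for xi, yi in zip(x, y)])
print("max deviation at the data points:", float(np.max(np.abs(fitted - z))))
```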

7.
8.
Zero-suppressed BDDs and their applications
In many real-life problems, we are often faced with manipulating sets of combinations. In this article, we study a special type of ordered binary decision diagram (OBDD), called zero-suppressed BDDs (ZBDDs). This data structure represents sets of combinations more efficiently than original OBDDs. We discuss the basic data structures and algorithms for manipulating ZBDDs in contrast with the original OBDDs. We also present some practical applications of ZBDDs, such as solving combinatorial problems with unate cube set algebra, logic synthesis methods, Petri net processing, etc. We show that ZBDDs are a useful option among OBDD techniques, well suited to certain classes of practical applications. Published online: 15 May 2001
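The zero-suppression rule that distinguishes ZBDDs from ordinary OBDDs can be illustrated with a tiny sketch: a node whose 1-child is the empty-set terminal is redundant and collapses to its 0-child, which is why sparse families of combinations stay compact. The node representation and function names below are illustrative assumptions, not the implementation discussed in the paper.

```python
# Hedged sketch: minimal ZBDD nodes, the zero-suppression rule, and set union.
EMPTY, BASE = "empty", "base"       # terminals: no combinations / only the empty combination
_unique = {}                        # unique table so structurally equal nodes are shared

def getnode(var, lo, hi):
    """Node for: family 'lo' where var is absent, family 'hi' where var is present."""
    if hi == EMPTY:                 # zero-suppression rule: a node whose 1-child is EMPTY
        return lo                   # is redundant and collapses to its 0-child
    return _unique.setdefault((var, lo, hi), (var, lo, hi))

def single(var):
    """The family containing exactly the one-element combination {var}."""
    return getnode(var, EMPTY, BASE)

def union(f, g):
    """Union of two families; variables are assumed ordered by name."""
    if f == EMPTY or f == g: return g
    if g == EMPTY:           return f
    if f == BASE or g == BASE:      # attach the empty combination to the other operand
        node = g if f == BASE else f
        return getnode(node[0], union(BASE, node[1]), node[2])
    if f[0] == g[0]:
        return getnode(f[0], union(f[1], g[1]), union(f[2], g[2]))
    if f[0] < g[0]:
        return getnode(f[0], union(f[1], g), f[2])
    return getnode(g[0], union(f, g[1]), g[2])

# Usage: build the family { {a}, {b}, {a, b} } from singletons and a pairwise set.
ab  = getnode("a", EMPTY, single("b"))          # { {a, b} }
fam = union(union(single("a"), single("b")), ab)
print(fam)
```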

9.
Dynamic playout scheduling algorithms for continuous multimedia streams
In this paper, we investigate a playout scheduling framework for supporting the continuous and synchronized presentations of multimedia streams in a distributed multimedia presentation system. We assume a situation in which the server and network transmissions provide sufficient support for the delivery of media objects. In this context, major issues regarding the enforcement of the smooth presentation of multimedia streams at client sites must be addressed to deal with rate variance of stream presentations and delay variance of networks. We develop various playout-scheduling algorithms that are adaptable to quality-of-service parameters. The proposed algorithms permit the local adjustment of unsynchronized presentations by gradually accelerating or retarding presentation components, rather than abruptly skipping or pausing the presentation materials. A comprehensive experimental analysis of the proposed algorithms demonstrates that our algorithms can effectively avoid playout gaps (or hiccups) in the presentations. This scheduling framework can be readily used to support customized multimedia presentations.

10.
Association Rule Mining algorithms operate on a data matrix (e.g., customers × products) to derive association rules [AIS93b, SA96]. We propose a new paradigm, namely, Ratio Rules, which are quantifiable in that we can measure the “goodness” of a set of discovered rules. We also propose the “guessing error” as a measure of this “goodness”, that is, the root-mean-square error of the reconstructed values of the cells of the given matrix, when we pretend that they are unknown. Another contribution is a novel method to guess missing/hidden values from the Ratio Rules that our method derives. For example, if somebody bought $10 of milk and $3 of bread, our rules can “guess” the amount spent on butter. Thus, unlike association rules, Ratio Rules can perform a variety of important tasks such as forecasting, answering “what-if” scenarios, detecting outliers, and visualizing the data. Moreover, we show that we can compute Ratio Rules in a single pass over the data set with small memory requirements (a few small matrices), in contrast to association rule mining methods which require multiple passes and/or large memory. Experiments on several real data sets (e.g., basketball and baseball statistics, biological data) demonstrate that the proposed method: (a) leads to rules that make sense; (b) can find large itemsets in binary matrices, even in the presence of noise; and (c) consistently achieves a “guessing error” up to 5 times smaller than that of straightforward column averages. Received: March 15, 1999 / Accepted: November 1, 1999
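A rough sketch of the reconstruction idea follows, under the explicit assumption that a single ratio rule behaves like the leading principal direction of the spending matrix; the data, column order and function name are made up for illustration, and this is not the paper's full method.

```python
# Hedged sketch: derive one "ratio rule" as the leading singular direction of the data
# matrix and use it to guess a hidden cell (illustrative, not the paper's full method).
import numpy as np

# Toy customer x product spending matrix; columns: milk, bread, butter (made-up numbers).
X = np.array([[10.0, 3.0, 2.0],
              [20.0, 6.0, 4.1],
              [ 5.0, 1.6, 1.0],
              [15.0, 4.4, 3.0]])

_, _, Vt = np.linalg.svd(X, full_matrices=False)
rule = Vt[0]                                         # dominant spending direction

def guess(known_values, known_cols, target_col):
    """Project the known part of a row onto the rule, then read off the hidden cell."""
    v = rule[known_cols]
    scale = np.dot(known_values, v) / np.dot(v, v)   # least-squares projection
    return scale * rule[target_col]

# "If somebody bought $10 of milk and $3 of bread, guess the amount spent on butter."
print(round(float(guess(np.array([10.0, 3.0]), [0, 1], 2)), 2))
```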

11.
We present a new active vision technique called zoom tracking. Zoom tracking is the continuous adjustment of a camera's focal length in order to keep a constant-sized image of an object moving along the camera's optical axis. Two methods for performing zoom tracking are presented: a closed-loop visual feedback algorithm based on optical flow, and use of depth information obtained from an autofocus camera's range sensor. We explore two uses of zoom tracking: recovery of depth information and improving the performance of scale-variant algorithms. We show that the image stability provided by zoom tracking improves the performance of algorithms that are scale variant, such as correlation-based trackers. While zoom tracking cannot totally compensate for an object's motion, due to the effect of perspective distortion, an analysis of this distortion provides a quantitative estimate of the performance of zoom tracking. Zoom tracking can be used to reconstruct a depth map of the tracked object. We show that under normal circumstances this reconstruction is much more accurate than depth from zooming, and works over a greater range than depth from axial motion while providing, in the worst case, only slightly less accurate results. Finally, we show how zoom tracking can also be used in time-to-contact calculations. Received: 15 February 2000 / Accepted: 19 June 2000
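A toy sketch of the underlying control loop follows: hold the object's image size constant by adjusting focal length proportionally to the size error, under a pinhole-camera assumption. It stands in for, rather than reproduces, the paper's optical-flow and range-sensor methods, and every number below is illustrative.

```python
# Hedged sketch: a proportional zoom-tracking loop under a pinhole camera model.

def image_size(focal_length, object_width, depth):
    return focal_length * object_width / depth        # pinhole projection

def zoom_track(depths, object_width=0.5, target_size=100.0, f=50.0, gain=0.8):
    """Adjust focal length each frame so the object's image size stays near target_size."""
    sizes = []
    for z in depths:
        s = image_size(f, object_width, z)             # measured image size this frame
        f *= 1.0 + gain * (target_size - s) / target_size   # proportional correction
        sizes.append(s)
    return sizes

# The object recedes along the optical axis; the image size should hover near 100.
depths = [0.25 + 0.01 * i for i in range(25)]
print([round(s, 1) for s in zoom_track(depths)])
```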

12.
Computing systems are essential resources for both the business and public sectors. With the increasing interdependence of integrated electronic commerce and business applications within the global computing environment, performance and reliability are of great concern. Poor performance can mean lost cooperation, opportunity, and revenue. This paper describes performance challenges that these applications face over the short and long term. We present an analytic technique that can predict the performance of an e-commerce application over a given deployment period. This technique can be used to deduce performance stress testing vectors over this period and for design and capacity planning exercises. A Web-based shopping server case study is used as an example. Published online: 22 August 2001

13.
Summary. In a shared-memory distributed system, n independent asynchronous processes communicate by reading and writing to shared variables. An algorithm is adaptive (to total contention) if its step complexity depends only on the actual number, k, of active processes in the execution; this number is unknown in advance and may change in different executions of the algorithm. Adaptive algorithms are inherently wait-free, providing fault-tolerance in the presence of an arbitrary number of crash failures and varying process speeds. A wait-free adaptive collect algorithm with O(k) step complexity is presented, together with its applications in wait-free adaptive algorithms for atomic snapshots, immediate snapshots and renaming. Received: August 1999 / Accepted: August 2001

14.
While techniques for evaluating the performance of lower-level document analysis tasks such as optical character recognition have gained acceptance in the literature, attempts to formalize the problem for higher-level algorithms, while receiving a fair amount of attention in terms of theory, have generally been less successful in practice, perhaps owing to their complexity. In this paper, we introduce intuitive, easy-to-implement evaluation schemes for the related problems of table detection and table structure recognition. We also present the results of several small experiments, demonstrating how well the methodologies work and the useful sorts of feedback they provide. We first consider the table detection problem. Here algorithms can yield various classes of errors, including non-table regions improperly labeled as tables (insertion errors), tables missed completely (deletion errors), larger tables broken into a number of smaller ones (splitting errors), and groups of smaller tables combined to form larger ones (merging errors). This leads naturally to the use of an edit distance approach for assessing the results of table detection. Next we address the problem of evaluating table structure recognition. Our model is based on a directed acyclic attribute graph, or table DAG. We describe a new paradigm, “graph probing,” for comparing the results returned by the recognition system and the representation created during ground-truthing. Probing is in fact a general concept that could be applied to other document recognition tasks as well. Received July 18, 2000 / Accepted October 4, 2001
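For illustration, the sketch below buckets detected table regions against ground truth into the error classes named above using area overlap; the region format, the 0.5 threshold and the matching rule are assumptions, and the paper's edit-distance and graph-probing schemes are more refined than this.

```python
# Hedged sketch: classify table detections into correct / split / merged / missed
# (deletion) / spurious (insertion) by area overlap; not the paper's exact protocol.
# Regions are axis-aligned boxes (x0, y0, x1, y1).

def _area(b):
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

def _inter(a, b):
    return _area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))

def _associated(a, b, thr=0.5):
    return _inter(a, b) >= thr * min(_area(a), _area(b))

def classify(ground_truth, detected, thr=0.5):
    gt_hits = [[j for j, d in enumerate(detected) if _associated(g, d, thr)]
               for g in ground_truth]
    det_hits = [[i for i, g in enumerate(ground_truth) if _associated(g, d, thr)]
                for d in detected]
    return {
        "missed":   sum(1 for h in gt_hits if not h),        # deletion errors
        "split":    sum(1 for h in gt_hits if len(h) > 1),   # one table, many regions
        "spurious": sum(1 for h in det_hits if not h),       # insertion errors
        "merged":   sum(1 for h in det_hits if len(h) > 1),  # many tables, one region
        "correct":  sum(1 for i, h in enumerate(gt_hits)
                        if len(h) == 1 and det_hits[h[0]] == [i]),
    }

# Toy usage: one exact match, one table split into two pieces, one spurious region.
gt  = [(0, 0, 10, 5), (0, 10, 10, 15)]
det = [(0, 0, 10, 5), (0, 10, 5, 15), (5, 10, 10, 15), (20, 20, 25, 25)]
print(classify(gt, det))
```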

15.
In this paper, we present two novel disk failure recovery methods that utilize the inherent characteristics of video streams for efficient recovery. Whereas the first method exploits the inherent redundancy in video streams (rather than error-correcting codes) to approximately reconstruct data stored on failed disks, the second method exploits the sequentiality of video playback to reduce the overhead of online failure recovery in conventional RAID arrays. For the former approach, we present loss-resilient versions of JPEG and MPEG compression algorithms. We present an inherently redundant array of disks (IRAD) architecture that combines these loss-resilient compression algorithms with techniques for efficient placement of video streams on disk arrays to ensure that on-the-fly recovery does not impose any additional load on the array. Together, they enhance the scalability of multimedia servers by (1) integrating the recovery process with the decompression of video streams, and thereby distributing the reconstruction process across the clients; and (2) supporting graceful degradation in the quality of recovered images with an increase in the number of disk failures. We present analytical and experimental results to show that both schemes significantly reduce the failure recovery overhead in a multimedia server.

16.
We describe a collection of algorithms designed to support reliable synchronization and group membership services for distributed multimedia applications. In particular, we consider those applications that require interactivity, isochronous rendering of multimedia data, and high reliability. We show that the algorithms we propose (i) provide reliable support for the synchronization of multimedia data streams, despite the occurrence of possible communication failures, (ii) maintain a consistent view of the relative group membership of all the nonfaulty application components, (iii) guarantee time-bounded delay of component failure detection and join, and (iv) effectively meet possible scalability requirements of the applications.

17.
Some simple distributed algorithms for sparse networks
Summary. We give simple, deterministic, distributed algorithms for computing maximal matchings, maximal independent sets and colourings. We show that edge colourings with at most 2Δ-1 colours, and maximal matchings, can be computed within O(Δ + log* n) deterministic rounds, where Δ is the maximum degree of the network. We also show how to find maximal independent sets and (Δ+1)-vertex colourings within O(Δ² + log* n) deterministic rounds. All hidden constants are very small and the algorithms are very simple. Received: August 2000 / Accepted: November 2000
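As a generic illustration of the flavour of such round-based algorithms (not the algorithms analysed in the paper, and with the synchronous rounds simulated centrally): in each round every uncoloured node that is a local maximum by id among its uncoloured neighbours picks the smallest colour unused in its neighbourhood, which yields a proper colouring with at most Δ+1 colours.

```python
# Hedged sketch: a simple round-based greedy (Delta + 1)-colouring, simulated centrally;
# illustrative of synchronous distributed colouring, not the paper's algorithms.

def greedy_round_colouring(adj):
    """adj: dict node -> set of neighbours. Returns (colour map, number of rounds)."""
    colour, uncoloured = {}, set(adj)
    rounds = 0
    while uncoloured:
        rounds += 1
        # One synchronous round: uncoloured local maxima (by id) decide simultaneously;
        # no two of them are adjacent, so their choices cannot conflict.
        deciders = [v for v in uncoloured
                    if all(v > u for u in adj[v] if u in uncoloured)]
        for v in deciders:
            used = {colour[u] for u in adj[v] if u in colour}
            c = 0
            while c in used:
                c += 1
            colour[v] = c
        uncoloured -= set(deciders)
    return colour, rounds

# Toy usage: a 5-cycle has Delta = 2 and gets coloured with at most 3 colours.
adj = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
col, rounds = greedy_round_colouring(adj)
assert all(col[u] != col[v] for u in adj for v in adj[u])
print(col, "in", rounds, "rounds")
```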

18.
In this paper, we present an efficient global illumination technique, and then we discuss the results of its extensive experimental validation. The technique is a hybrid of cluster-based hierarchical and progressive radiosity techniques, which does not require storing links between interacting surfaces and clusters. We tested our technique by applying a multistage validation procedure, which we designed specifically for global illumination solutions. First, we experimentally validate the algorithm against analytically derived and measured real-world data to check how calculation speed is traded for lighting simulation accuracy for various clustering and meshing scenarios. Then we test the algorithm performance and rendering quality by directly comparing the virtual and real-world images of a complex environment.

19.
The task of checking if a computer system satisfies its timing specifications is extremely important. These systems are often used in critical applications where failure to meet a deadline can have serious or even fatal consequences. This paper presents an efficient method for performing this verification task. In the proposed method a real-time system is modeled by a state-transition graph represented by binary decision diagrams. Efficient symbolic algorithms exhaustively explore the state space to determine whether the system satisfies a given specification. In addition, our approach computes quantitative timing information such as minimum and maximum time delays between given events. These results provide insight into the behavior of the system and assist in the determination of its temporal correctness. The technique evaluates how well the system works or how seriously it fails, as opposed to only whether it works or not. Based on these techniques a verification tool called Verus has been constructed. It has been used in the verification of several industrial real-time systems, such as the robotics system described in the paper. This demonstrates that the proposed method is efficient enough to be used in real-world designs. The examples verified show how the information produced can assist in designing more efficient and reliable real-time systems.

20.
The paper presents a genetic algorithm for clustering objects in images based on their visual features. In particular, a novel solution code (named Boolean Matching Code) and a corresponding reproduction operator (the Single Gene Crossover) are defined specifically for clustering and are compared with other standard genetic approaches. The paper describes the clustering algorithm in detail, in order to show the suitability of the genetic paradigm and to underline the importance of effective tuning of algorithm parameters to the application. The algorithm is evaluated on some test sets, and an example of its application in automated visual inspection is presented. Received: 6 August 1996 / Accepted: 11 November 1997
