Similar literature
 Found 20 similar documents; search time: 31 ms
1.
In learning a naive Bayes classifier, estimating probabilities from a given set of training samples is crucial. However, when the training samples are inadequate, probability estimation inevitably suffers from the zero-frequency problem. To avoid this problem, the Laplace-estimate and the M-estimate are the two main methods used to estimate probabilities. The choice of the two parameters m (an integer variable) and p (a probability variable) in these methods has a direct impact on the experimental results. In this paper, we study the existing probability estimation methods and carry out a parameter cross-test by experimentally analyzing the performance of the M-estimate with different settings of m and p. The results of this experiment show that the optimal parameter values vary across data sets. Motivated by this analysis, we propose an estimation model based on self-adaptive differential evolution. We then propose an approach that calculates the optimal m and p values for each conditional probability to avoid the zero-frequency problem. We test our approach in terms of classification accuracy on 36 benchmark data sets from a machine learning repository, and compare it to naive Bayes with the Laplace-estimate and with the M-estimate under a variety of parameter settings from the literature, as well as the possibly optimal settings found in our experimental analysis. The experimental results show that the estimation model is efficient and that our proposed approach significantly outperforms the traditional probability estimation approaches, especially for large data sets (those with large numbers of instances and attributes).
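The two estimators named in this abstract can be illustrated with a minimal sketch. It uses the textbook forms of the Laplace-estimate and the M-estimate; the toy attribute values and the choices m=2 and a uniform p are assumptions made for illustration, and the differential-evolution search for m and p described in the abstract is not reproduced here.

```python
from collections import Counter


def laplace_estimate(count, total, num_values):
    """Laplace-estimate: add one pseudo-count for each possible attribute value."""
    return (count + 1) / (total + num_values)


def m_estimate(count, total, m, p):
    """M-estimate: blend the observed frequency with a prior p weighted by m."""
    return (count + m * p) / (total + m)


# Hypothetical toy data: values of one attribute among the samples of one class.
values = ["sunny", "sunny", "rain"]
domain = ["sunny", "rain", "overcast"]  # "overcast" never occurs in this class
counts = Counter(values)

# Both estimators give the unseen value a non-zero probability.
unseen_laplace = laplace_estimate(counts["overcast"], len(values), len(domain))
unseen_m = m_estimate(counts["overcast"], len(values), m=2, p=1 / len(domain))
```

With a uniform prior p = 1/|domain| and m = |domain|, the M-estimate coincides with the Laplace-estimate, which is why m and p are the natural parameters to tune.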

2.
《Parallel Computing》1997,22(13):1733-1746
Replicated data consistency is a key issue in the design of distributed real-time groupware applications. In this paper, we propose a new protocol to cope with this problem. The proposed algorithm guarantees an optimal response time while ensuring data consistency at system quiescence. The originality of our proposal lies in the fact that neither locks, nor clocks, nor global information are required to establish data consistency. Instead, direct dependency relations between generated operations and an operation transformation mechanism are used. The coupling of these two mechanisms is shown to realize a good trade-off between the different requirements of groupware applications. The advantages of our approach are illustrated by comparing the algorithm to two well-known optimistic concurrency control protocols for groupware applications: dOPT and ORESTE.

3.
The field of reinforcement learning (RL) has been energized in the past few decades by elegant theoretical results indicating under what conditions, and how quickly, certain algorithms are guaranteed to converge to optimal policies. However, in practical problems, these conditions are seldom met. When we cannot achieve optimality, the performance of RL algorithms must be measured empirically. Consequently, in order to meaningfully differentiate learning methods, it becomes necessary to characterize their performance on different problems, taking into account factors such as state estimation, exploration, function approximation, and constraints on computation and memory. To this end, we propose parameterized learning problems, in which such factors can be controlled systematically and their effects on learning methods characterized through targeted studies. Apart from providing very precise control of the parameters that affect learning, our parameterized learning problems enable benchmarking against optimal behavior; their relatively small sizes facilitate extensive experimentation. Based on a survey of existing RL applications, in this article, we focus our attention on two predominant, "first order" factors: partial observability and function approximation. We design an appropriate parameterized learning problem, through which we compare two qualitatively distinct classes of algorithms: on-line value function-based methods and policy search methods. Empirical comparisons among various methods within each of these classes project Sarsa(λ) and Q-learning(λ) as winners among the former, and CMA-ES as the winner in the latter. Comparing Sarsa(λ) and CMA-ES further on relevant problem instances, our study highlights regions of the problem space favoring their contrasting approaches.
Short run-times for our experiments allow for an extensive search procedure that provides additional insights on relationships between method-specific parameters (such as eligibility traces, initial weights, and population sizes) and problem instances.

4.
Automatic protocol mining is a promising approach for inferring accurate and complete API protocols. However, just as with any data-mining technique, this approach requires sufficient training data (object usage scenarios). Existing approaches resolve the problem by analyzing more programs, which may cause significant runtime overhead. In this paper, we propose an inheritance-based oversampling approach for object usage scenarios (OUSs). Our technique is based on the inheritance relationship in object-oriented programs. Given an object-oriented program p, the OUSs that can be collected from a run of p are generally no more numerous than the objects used during the run. With our technique, a maximum of n times more OUSs can be achieved, where n is the average number of super-classes of all general OUSs. To investigate the effect of our technique, we implement it in our previous prototype tool, ISpecMiner, and use the tool to mine protocols from several real-world programs. Experimental results show that our technique can collect 1.95 times more OUSs than general approaches. Additionally, accurate and complete API protocols are more likely to be achieved. Furthermore, our technique can mine API protocols for classes never even used in programs, which are valuable for validating software architectures, program documentation, and understanding. Although our technique introduces some runtime overhead, the overhead is trivial and acceptable.

5.
Dealing with high-dimensional data has always been a major problem in many pattern recognition and machine learning applications. The trace ratio criterion is applicable to many dimensionality reduction methods, as it directly reflects the Euclidean distance between data points within or between classes. In this paper, we analyze the trace ratio problem and propose a new efficient algorithm to find the optimal solution. Based on the proposed algorithm, we derive an orthogonally constrained semi-supervised learning framework. The new algorithm incorporates unlabeled data into the training procedure so that it preserves both the discriminative structure and the geometrical structure embedded in the original dataset. Many existing semi-supervised dimensionality reduction methods, such as SDA, Lap-LDA, SSDR, and SSMMC, can be improved using our proposed framework, which can also be used to formulate a corresponding kernel framework for handling nonlinear problems. Theoretical analysis indicates that there are certain relationships between linear and nonlinear methods. Finally, extensive simulations on synthetic and real-world datasets are presented to show the effectiveness of our algorithms. The results demonstrate that our proposed algorithm substantially outperforms other state-of-the-art algorithms.

6.
Tracking the best hyperplane with a simple budget Perceptron   Cited by: 1 (self-citations: 0, citations by others: 1)
Shifting bounds for on-line classification algorithms ensure good performance on any sequence of examples that is well predicted by a sequence of changing classifiers. When proving shifting bounds for kernel-based classifiers, one also faces the problem of storing a number of support vectors that can grow unboundedly, unless an eviction policy is used to keep this number under control. In this paper, we show that shifting and on-line learning on a budget can be combined surprisingly well. First, we introduce and analyze a shifting Perceptron algorithm achieving the best known shifting bounds while using an unlimited budget. Second, we show that by applying to the Perceptron algorithm the simplest possible eviction policy, which discards a random support vector each time a new one comes in, we achieve a shifting bound close to the one we obtained with no budget restrictions. More importantly, we show that our randomized algorithm strikes the optimal trade-off $U = \Theta(\sqrt{B})$ between budget B and norm U of the largest classifier in the comparison sequence. Experiments are presented comparing several linear-threshold algorithms on chronologically-ordered textual datasets. These experiments support our theoretical findings in that they show to what extent randomized budget algorithms are more robust than deterministic ones when learning shifting target data streams.
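The random eviction policy described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact algorithm: the class name, the linear kernel, the fixed seed, and the toy stream are all hypothetical, and the shifting mechanism analyzed in the paper is omitted.

```python
import random


def dot(u, v):
    """Linear kernel, used here as the default for illustration."""
    return sum(a * b for a, b in zip(u, v))


class RandomBudgetPerceptron:
    """Kernel Perceptron that stores at most `budget` support vectors."""

    def __init__(self, budget, kernel=dot, seed=0):
        self.budget = budget
        self.kernel = kernel
        self.support = []  # list of (example, label) pairs
        self.rng = random.Random(seed)

    def score(self, x):
        return sum(y * self.kernel(sx, x) for sx, y in self.support)

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y):
        """Process one labeled example; return True if a mistake was made."""
        if self.predict(x) == y:
            return False
        if len(self.support) >= self.budget:
            # Budget exhausted: discard one stored support vector at random.
            self.support.pop(self.rng.randrange(len(self.support)))
        self.support.append((x, y))
        return True


# Hypothetical toy stream of (example, label) pairs.
stream = [((1.0, 0.0), 1), ((-1.0, 0.0), -1), ((0.5, 1.0), 1), ((-0.5, -1.0), -1)] * 5
model = RandomBudgetPerceptron(budget=3)
mistakes = sum(model.update(x, y) for x, y in stream)
```

Because eviction happens before the new support vector is appended, the stored set never exceeds the budget B, which is the quantity the abstract trades off against the norm U.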

7.
We study an on-line machine covering problem in which jobs arrive one by one, their processing times are known upon arrival, and jobs are allowed to migrate between machines when a new job is added to the system. However, the total processing time of the jobs migrated because of an incoming job is bounded by a constant factor β times the processing time of the incoming job. The objective is to maximize the minimum machine load. In this paper, we present an on-line algorithm with competitive ratio 6/5 for the case of two identical machines with β=1. Moreover, the presented on-line algorithm uses only local migration, that is, when a job is assigned to machine i, only the jobs on machine i are allowed to migrate. We also show that the provided algorithm is a best possible on-line algorithm in the sense of local migration.

8.
The multisearch problem is defined as follows. Given a data structure D modeled as a graph with n constant-degree nodes, perform O(n) searches on D. Let r be the length of the longest search path associated with a search process, and assume that the paths are determined "on-line." That is, the search paths may overlap arbitrarily. In this paper, we solve the multisearch problem for certain classes of graphs in O([formula] + r ([formula]/log n)) time on a [formula] × [formula]n mesh-connected computer. For many data structures, the search path traversed when answering one search query has length r = O(log n). For these cases, our algorithm processes O(n) such queries in asymptotically optimal Θ([formula]) time. The classes of graphs we consider contain many of the important data structures that arise in practice, ranging from simple trees to Kirkpatrick hierarchical search DAGs. Multisearch is a useful abstraction that can be used to implement parallel versions of standard sequential data structures on a mesh. As example applications, we consider a variety of parallel on-line tree traversals, as well as hierarchical representations of polyhedra and their many applications (line-polyhedron intersection queries, multiple tangent plane determination, intersecting convex polyhedra, and three-dimensional convex hull).

9.
An important problem in pervasive environments is detecting predicates on sensed variables in an asynchronous distributed setting to determine context and to respond. We do not assume the availability of synchronized physical clocks because they may not be available or may be too expensive for predicate detection in such environments with a (relatively) low event occurrence rate. We address the problem of detecting each occurrence of a global predicate, at the earliest possible instant, by proposing a suite of three on-line middleware protocols having varying degrees of accuracy. We analyze the degree of accuracy for the proposed protocols. The extent of false negatives and false positives is determined by the run-time message processing latencies.

10.
Backoff protocols are probably the most widely used protocols for contention resolution in multiple access channels. In this paper, we analyze the stochastic behavior of backoff protocols for contention resolution among a set of clients and servers, each server being a multiple access channel that deals with contention like an Ethernet channel. We use the standard model in which each client generates requests for a given server according to a Bernoulli distribution with a specified mean. The client–server request rate of a system is the maximum over all client–server pairs (i, j) of the sum of all request rates associated with either client i or server j. (Having a subunit client–server request rate is a necessary condition for stability for single-server systems.) Our main result is that any superlinear polynomial backoff protocol is stable for any multiple-server system with a subunit client–server request rate. Our result is the first proof of stability for any backoff protocol for contention resolution with multiple servers. (The multiple-server problem does not reduce to the single-server problem, because each client can only send a single message at any step.) Our result is also the first proof that any weakly acknowledgment-based protocol is stable for contention resolution with multiple servers and such high request rates. Two special cases of our result are of interest. Hastad, Leighton, and Rogoff have shown that for a single-server system with a subunit client–server request rate any modified superlinear polynomial backoff protocol is stable. These modified backoff protocols are similar to standard backoff protocols but require more random bits to implement. The special case of our result in which there is only one server extends the result of Hastad, Leighton, and Rogoff to standard (practical) backoff protocols. Finally, our result applies to dynamic routing in optical networks.
Specifically, a special case of our result demonstrates that superlinear polynomial backoff protocols are stable for dynamic routing in optical networks.

11.
In this paper we extend the control methodology based on Extended Markov Tracking (EMT) by providing the control algorithm with capabilities to calibrate and even partially reconstruct the environment’s model. This enables us to resolve the problem of performance deterioration due to model incoherence, a problem faced in all model-based control methods. The new algorithm, Ensemble Actions EMT (EA-EMT), utilises the initial environment model as a library of state transition functions and applies a variation of prediction with experts to assemble and calibrate a revised model. By so doing, this is the first hybrid control algorithm that enables on-line adaptation within the egocentric control framework which dictates the control of an agent’s perceptions, rather than an agent’s environment state. In our experiments, we performed a range of tests with increasing model incoherence induced by three types of exogenous environment perturbations: catastrophic (the environment becomes completely inconsistent with the model), deviating (some aspect of the environment behaviour diverges compared to that specified in the model), and periodic (the environment alternates between several possible divergences). The results show that EA-EMT resolved model incoherence and significantly outperformed its EMT predecessor by up to 95%.

12.
Speranza and Tuza [Ann. Oper. Res. 86 (1999) 494-506] studied the on-line problem of scheduling jobs on m identical machines with extendable working time. In this problem, each machine is assumed to have an identical regular working time, which can be extended if necessary. The working time of a machine is the larger of its regular working time and the total processing time of the jobs assigned to it. The objective is to minimize the total working time of the machines. They presented an on-line algorithm Hx, with a competitive ratio of at most 1.228 for any number of machines, obtained by choosing an appropriate parameter x. In this paper we consider a small number of machines. The best choices of x are given for m=2,3,4 and the tight bounds, 7/6, 11/9 and 19/16, respectively, are proved. Among them, the algorithm for m=2 is best possible. We then derive a new algorithm for m=3 with a competitive ratio of 7/6.
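The objective defined in this abstract (each machine's working time is the larger of its regular working time and its load, summed over machines) can be written down directly. The sketch below only evaluates that objective for a given assignment; the function name and the job sizes are hypothetical, and the on-line algorithm Hx itself is not specified in the abstract, so it is not reproduced.

```python
def total_working_time(assignment, regular_time):
    """Sum over machines of max(regular working time, total assigned processing time)."""
    return sum(max(regular_time, sum(jobs)) for jobs in assignment)


# Hypothetical instance: two machines, regular working time 10.
unbalanced = [[6, 3], [8, 5]]   # loads 9 and 13: working times 10 and 13
balanced = [[6, 5], [8, 3]]     # loads 11 and 11: working times 11 and 11

cost_unbalanced = total_working_time(unbalanced, regular_time=10)
cost_balanced = total_working_time(balanced, regular_time=10)
```

In this toy instance the balanced assignment is cheaper because idle regular time on one machine is paid for anyway, so pushing load onto an already extended machine wastes it.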

13.
To study data dependencies over heterogeneous data in dataspaces, we define a general dependency form, namely comparable dependencies (CDs), which specifies constraints on comparable attributes. It covers the semantics of a broad class of dependencies in databases, including functional dependencies (FDs), metric functional dependencies (MFDs), and matching dependencies (MDs). As we illustrate, comparable dependencies are useful in practical dataspace tasks such as semantic query optimization. Because data in dataspaces is heterogeneous, the first question, known as the validation problem, is to tell whether a dependency (almost) holds in a data instance. Unfortunately, as we prove, the validation problem with a certain error or confidence guarantee is generally hard. In fact, the confidence validation problem is also NP-hard to approximate to within any constant factor. Nevertheless, we develop several approaches for efficient approximation computation, such as greedy and randomized approaches with an approximation bound on the maximum number of violations that an object may introduce. Finally, through an extensive experimental evaluation on real data, we verify the superiority of our methods.
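To make the validation question concrete, the sketch below counts violating pairs for a classical functional dependency and for a tolerance-based variant in the spirit of metric functional dependencies. The abstract does not give the formal definition of comparable dependencies, so this is an illustration of the simpler special cases only; the function name, the `same` parameter, and the rows are hypothetical.

```python
from itertools import combinations


def violating_pairs(rows, lhs, rhs, same=lambda a, b: a == b):
    """Index pairs of rows that agree on all lhs attributes but not on all rhs attributes.

    `same` decides whether two rhs values count as agreeing; the default is
    strict equality (a classical FD), and a tolerance gives a metric-style check.
    """
    bad = []
    for (i, r), (j, s) in combinations(enumerate(rows), 2):
        agree_lhs = all(r[a] == s[a] for a in lhs)
        agree_rhs = all(same(r[b], s[b]) for b in rhs)
        if agree_lhs and not agree_rhs:
            bad.append((i, j))
    return bad


# Hypothetical rows.
rows = [
    {"zip": "10001", "city": "New York", "price": 100},
    {"zip": "10001", "city": "New York", "price": 103},
    {"zip": "10001", "city": "NYC", "price": 250},
]

fd_violations = violating_pairs(rows, ["zip"], ["city"])
mfd_violations = violating_pairs(
    rows, ["zip"], ["price"], same=lambda a, b: abs(a - b) <= 5
)
```

A dependency "almost" holds when the number of such violations is small relative to the data; this pairwise check is quadratic in the number of rows and is meant only to show what a violation is.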

14.
In a number of real-life applications, scientists do not have access to temporal data, since the budget for data acquisition is always limited. Here we tackle the problem of causal inference between groups of heterogeneous non-temporal observations obtained from multiple sources. We consider a family of probabilistic algorithms for causal inference based on the assumption that, when X causes Y, P(X) and P(Y|X) are statistically independent. For a number of real-world applications, deep learning methods were reported to achieve the most accurate empirical performance, which motivates us to use deep Boltzmann machines to approximate the marginal and conditional probabilities of heterogeneous observations as accurately as possible. We introduce a novel algorithm to infer causal relationships between blocks of variables. The proposed method was tested on a benchmark of multivariate cause-effect pairs. Our experiments show that our method achieves state-of-the-art empirical accuracy, and sometimes outperforms the state-of-the-art methods. An important part of our contribution is an application of the proposed algorithm to an original medical data set, where we explore relations between alimentary patterns, human gut microbiome composition, and health status.

15.
《Computer Networks》2005,47(1):47-61
In this work we consider the problem of routing bandwidth-guaranteed flows with time-variable bandwidth profiles on an MPLS network. We assume that each demand is routed in an explicitly routed LSP, and the amount of bandwidth that must be reserved along the LSP varies during the day according to a piece-wise mask, which is known in advance. The time-of-day bandwidth profiles can be explicitly declared by the VPN customers in the SLA, or alternatively predicted by the ISP based on past measurements. In this framework, we propose a simple on-line algorithm for optimal selection of LSP paths. We also provide an ILP formulation for the associated off-line problem, and adopt it as a reference performance bound for the on-line algorithm. Additionally, we compare the performance of fixed and variable routing in the presence of time-variable bandwidth profiles. The results presented here suggest that the a priori knowledge of the per-demand traffic profiles can be exploited to achieve a fixed routing configuration, which can be marginally improved by variable reconfigurations. We relate our findings to a couple of previous works that achieved similar results in different application contexts.

16.
In this paper we consider the problem of on-line scheduling of hard real-time tasks on multiple processors. For a given set of ready tasks, one can propose many schedules. These schedules, however, may not necessarily be suitable for on-line scheduling. A suitable on-line schedule is one which can accommodate any future task set when it arrives. The traditional approach to solve the on-line scheduling problem is to propose a heuristic, and then to prove its effectiveness by comparing it with existing heuristics using simulation. No attempt has, however, been made to obtain a condition on the current schedule which when satisfied will permit one to schedule an arbitrary future task. In this paper, we aim at developing such a condition on the current schedule for the set of ready tasks which when satisfied can guarantee an on-line schedule for any future feasible task set.

17.
Existing image fusion methods always use the same representations for medical images of different modalities. Alternatively, they solve the fusion problem by subjectively defining the characteristics to be preserved. However, this leads to the distortion of unique information and restricts the fusion performance. To address these limitations, this paper proposes an unsupervised enhanced medical image fusion network. We apply both surface-level and deep-level constraints for enhanced information preservation. The surface-level constraint is based on saliency and abundance measurements to preserve the subjectively defined and intuitive characteristics. In the deep-level constraint, the unique information is objectively defined based on the unique channels of a pre-trained encoder. Moreover, in our method, the chrominance information of the fusion results is also enhanced. This is because we use the high-quality details in structural images (e.g., MRI) to alleviate the mosaic in functional images (e.g., PET, SPECT). Both qualitative and quantitative experiments demonstrate the superiority of our method over the state-of-the-art fusion methods.

18.
Many computer vision and pattern recognition algorithms are very sensitive to the choice of an appropriate distance metric. Some recent research sought to address a variant of the conventional clustering problem called semi-supervised clustering, which performs clustering in the presence of some background knowledge or supervisory information expressed as pairwise similarity or dissimilarity constraints. However, existing metric learning methods for semi-supervised clustering mostly perform global metric learning through a linear transformation. In this paper, we propose a new metric learning method that performs nonlinear transformation globally but linear transformation locally. In particular, we formulate the learning problem as an optimization problem and present three methods for solving it. Through some toy data sets, we show empirically that our locally linear metric adaptation (LLMA) method can handle some difficult cases that cannot be handled satisfactorily by previous methods. We also demonstrate the effectiveness of our method on some UCI data sets. Besides applying LLMA to semi-supervised clustering, we have also used it to improve the performance of content-based image retrieval systems through metric learning. Experimental results based on two real-world image databases show that LLMA significantly outperforms other methods in boosting the image retrieval performance.

19.
Clustering problems are central to many knowledge discovery and data mining tasks. However, most existing clustering methods can only work with fixed-dimensional representations of data patterns. In this paper, we study the clustering of data patterns that are represented as sequences or time series possibly of different lengths. We propose a model-based approach to this problem using mixtures of autoregressive moving average (ARMA) models. We derive an expectation-maximization (EM) algorithm for learning the mixing coefficients as well as the parameters of the component models. To address the model selection problem, we use the Bayesian information criterion (BIC) to determine the number of clusters in the data. Experiments are conducted on a number of simulated and real datasets. Results from the experiments show that our method compares favorably with previously proposed methods for similar time series clustering tasks.
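The model selection step mentioned in this abstract can be sketched with the standard BIC formula, k·ln(n) − 2·ln(L), where lower is better. The log-likelihoods and parameter counts below are made-up placeholders standing in for fitted mixtures, not results from the paper, and the paper may use a different sign convention for BIC.

```python
import math


def bic(log_likelihood, num_params, num_samples):
    """Bayesian information criterion in the 'lower is better' convention."""
    return num_params * math.log(num_samples) - 2.0 * log_likelihood


# Hypothetical fits: number of clusters -> (maximized log-likelihood, free parameters).
candidates = {
    1: (-520.0, 4),
    2: (-480.0, 9),
    3: (-476.0, 14),
}
num_samples = 200  # assumed number of observed sequences

scores = {k: bic(ll, p, num_samples) for k, (ll, p) in candidates.items()}
best_k = min(scores, key=scores.get)
```

Here the third component improves the likelihood only slightly, so the penalty on its extra parameters outweighs the gain and the two-cluster model is selected.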

20.
In this paper we propose three metaheuristic approaches, namely a Tabu Search, an Evolutionary Computation and an Ant Colony Optimization approach, for the edge-weighted k-cardinality tree (KCT) problem. This problem is an NP-hard combinatorial optimization problem that generalizes the well-known minimum weight spanning tree problem. Given an edge-weighted graph G=(V,E), it consists of finding a tree in G with exactly k⩽|V|−1 edges, such that the sum of the weights is minimal. First, we show that our new metaheuristic approaches are competitive by applying them to a set of existing benchmark instances and comparing the results to two different Tabu Search methods from the literature. The results show that these benchmark instances are not challenging enough for our metaheuristics. Therefore, we propose a diverse set of benchmark instances that are characterized by different features such as density and variance in vertex degree. We show that the performance of our metaheuristics depends on the characteristics of the tackled instance, as well as on the cardinality. For example, for low cardinalities the Ant Colony Optimization approach is best, whereas for high cardinalities the Tabu Search approach has advantages.
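The KCT problem statement above can be illustrated with a simple baseline. The sketch below is a Prim-style greedy heuristic, tried from every start vertex; it is not one of the three metaheuristics proposed in the paper and gives no optimality guarantee. The function name and the small graph are hypothetical.

```python
def greedy_kct(num_vertices, edges, k):
    """Greedy heuristic for the k-cardinality tree problem.

    edges: list of (u, v, weight) for an undirected graph on vertices 0..num_vertices-1.
    Returns (total_weight, chosen_edges) for the best tree found, or None if
    no tree with exactly k edges was found.
    """
    best = None
    for start in range(num_vertices):
        in_tree = {start}
        chosen = []
        total = 0
        while len(chosen) < k:
            # Edges with exactly one endpoint in the tree keep the result acyclic.
            frontier = [
                (w, u, v) for u, v, w in edges if (u in in_tree) != (v in in_tree)
            ]
            if not frontier:
                break  # this component is exhausted before reaching k edges
            w, u, v = min(frontier)
            in_tree.update((u, v))
            chosen.append((u, v, w))
            total += w
        if len(chosen) == k and (best is None or total < best[0]):
            best = (total, chosen)
    return best


# Hypothetical 4-cycle with two cheap edges on opposite sides.
edges = [(0, 1, 1), (1, 2, 5), (2, 3, 1), (0, 3, 4)]
result = greedy_kct(4, edges, k=2)
```

With k = |V|−1 on a connected graph this reduces to Prim's algorithm for the minimum spanning tree, which matches the abstract's remark that KCT generalizes that problem.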
