Similar Documents

20 similar documents found.
1.
A Nearest-Neighbor Convex Hull Classifier Based on Sample Selection
The nearest neighbor convex hull classifier is a nearest-neighbor classification algorithm that takes the distance from a test point to the convex hull of each class's training samples as its classification measure. However, the high computational complexity of solving its convex quadratic programming problem limits its application to larger datasets. This paper proposes a sample selection method, the subclass convex hull growing method: iteratively select the point farthest from the convex hull of the samples already selected, until a termination condition is met, thereby effectively reducing the dataset. Experimental results on the ORL database and the MIT-CBCL face recognition training-synthetic set show that the convex hull generated from the few samples selected by the subclass convex hull growing method represents the training set well, greatly speeding up the algorithm without degrading the performance of the nearest neighbor convex hull classifier.
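A minimal sketch of the distance computation at the heart of this classifier, assuming the standard quadratic-programming formulation (minimize ||x - wX||^2 over convex weights w, where the rows of X are one class's samples); the function names are illustrative, not taken from the paper:

```python
# Hedged sketch: distance from a test point to the convex hull of a class's
# samples, posed as the convex QP  min_w ||x - wX||^2,  w >= 0, sum(w) = 1.
import numpy as np
from scipy.optimize import minimize

def dist_to_convex_hull(x, X):
    """Distance from point x (d,) to conv(rows of X (n, d))."""
    n = X.shape[0]
    w0 = np.full(n, 1.0 / n)                      # start at the centroid
    res = minimize(
        lambda w: np.sum((x - w @ X) ** 2),       # squared distance to hull point
        w0,
        bounds=[(0.0, None)] * n,                 # w_i >= 0
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        method="SLSQP",
    )
    return np.sqrt(res.fun)

def nn_convex_hull_classify(x, classes):
    """Assign x to the class whose sample convex hull is nearest."""
    return min(classes, key=lambda c: dist_to_convex_hull(x, classes[c]))

# toy usage
rng = np.random.default_rng(0)
classes = {0: rng.normal(0, 1, (20, 2)), 1: rng.normal(4, 1, (20, 2))}
print(nn_convex_hull_classify(np.array([3.5, 3.8]), classes))
```

The sample selection step would shrink each X before these QPs are solved, which is where the reported speedup comes from.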

2.
The quality and quantity of learning samples are crucial to intelligent data classification systems, yet there is no general, reliable method for discovering meaningful samples. Motivated by this, the concept of the convex boundary of a dataset is proposed, together with a fast method for finding meaningful sample sets. First, box functions are used to clean abnormal and incomplete samples from the learning set; next, the concept of a data cone is introduced and the normalized learning samples are partitioned into cones; finally, each cone-shaped sample subset is centered, and the samples whose distance to the convex boundary is minimal are extracted to form the convex boundary sample set. Experiments were conducted on 12 UCI datasets against six classical classification algorithms: Gaussian Naive Bayes (GNB), decision trees (CART), linear discriminant analysis (LDA), AdaBoost, random forests (RF), and logistic regression (LR). The results show that training time on the convex boundary sample sets is significantly reduced for all algorithms while classification performance is preserved. In particular, for datasets containing more noise, such as the Caesarian Section, Electrical Grid Stability, and Car Evaluation datasets, the convex boundary sample sets even improve classification performance. To better evaluate the efficiency of convex boundary sample sets, a sample cleaning efficiency is defined as the ratio of the rate of change in sample count to the rate of change in classification performance, and this index is used to assess the value of convex boundary samples objectively. A cleaning efficiency greater than 1 indicates that the method is effective, and higher values indicate better results. On the pulsar dataset, the proposed method achieves a cleaning efficiency above 68 for the GNB algorithm, demonstrating its superior performance.
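The paper's box-function cleaning and data-cone partitioning are not reproduced below; as a hedged stand-in, this sketch keeps only the convex hull vertices of a 2-D sample as the "boundary" training subset:

```python
# Hedged sketch: reduce a training set to the samples on its convex boundary.
# scipy's ConvexHull is used here as a simple stand-in for the paper's method.
import numpy as np
from scipy.spatial import ConvexHull

def convex_boundary_subset(X):
    """Return the rows of X that are vertices of its convex hull."""
    hull = ConvexHull(X)
    return X[hull.vertices]

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
Xb = convex_boundary_subset(X)
print(f"kept {len(Xb)} of {len(X)} samples")      # boundary samples only
```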

3.
Kernel Parameter Selection Based on Convex Hull Estimation
The selection of the kernel and its parameters is one of the core problems in support vector machine (SVM) research. Based on statistical learning theory, a method is proposed for selecting optimal kernel parameters by computing an approximate convex hull of the training samples through angle-based cutting of the sample set. The method avoids the high computational complexity of traditional approaches that solve an optimization problem, and it is applicable whether or not the data are dense or uniformly distributed. Numerical experiments demonstrate the feasibility and effectiveness of the proposed method.

4.
杨雪巍, 杨伟. 《测控技术》, 2018, 37(12): 17-21.
To meet the fault detection needs of an air refrigeration cycle system, a fault detection method based on a network envelope surface is proposed, drawing on data mining concepts. The method applies convex hull theory and multiple regression analysis to the fault detection process, and uses principal component analysis to reduce signal redundancy, enabling preprocessing and visualization of the test data. Convex points are then located from data samples of the system's normal operating state and fitted by multiple regression to delineate the system's normal operating region, from which the network envelope fault detection model is built. The model is validated by simulation with both faulty and normal data sample sets, and the simulation results demonstrate the feasibility and effectiveness of the method.
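A hedged analogue of the envelope idea, assuming PCA preprocessing and a convex hull of normal-operation data as the normal region; this is an illustration, not the paper's network envelope surface model:

```python
# Sketch: project normal-operation data with PCA, take the convex hull of the
# projected points as the "normal region", and flag points that fall outside.
import numpy as np
from scipy.spatial import Delaunay
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
normal = rng.normal(0.0, 1.0, size=(300, 5))      # normal-operation samples
faulty = rng.normal(4.0, 1.0, size=(20, 5))       # deviated samples

pca = PCA(n_components=2).fit(normal)             # reduce signal redundancy
hull = Delaunay(pca.transform(normal))            # triangulated normal region

def is_fault(x):
    """A point outside the hull of normal data is flagged as a fault."""
    return hull.find_simplex(pca.transform(x.reshape(1, -1)))[0] < 0

print(sum(is_fault(x) for x in faulty), "of", len(faulty), "flagged")
```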

5.
In this paper, we show that complete uniform spaces can be represented domain-theoretically. We introduce the notion of a uniform domain, which is an ω-algebraic domain with some uniform structure on the set K(D) of finite elements of D. It is proved that when (X,μ) is a complete uniform space of countable weight, there is a uniform domain D such that X is the retract of the set L(D) of limit elements of D. On the other hand, in every uniform domain D, there exists a minimal subspace M(D) of L(D) on which K(D) induces a uniformity structure. Thus, a uniform domain can be considered as a set with a particular kind of base of a uniformity. Since every infinite increasing sequence in K(D) identifies one element of M(D), we obtain, through a labelling of the edges of K(D), an admissible representation of a uniform space in a uniform domain. We also show that such a representation is a proper representation.

6.
Many applications require an estimate for the covariance matrix that is non-singular and well-conditioned. As the dimensionality increases, the sample covariance matrix becomes ill-conditioned or even singular. A common approach to estimating the covariance matrix when the dimensionality is large is that of Stein-type shrinkage estimation. A convex combination of the sample covariance matrix and a well-conditioned target matrix is used to estimate the covariance matrix. Recent work in the literature has shown that an optimal combination exists under mean-squared loss; however, it must be estimated from the data. In this paper, we introduce a new set of estimators for the optimal convex combination for three commonly used target matrices. A simulation study shows an improvement over those in the literature in cases of extreme high-dimensionality of the data. A data analysis shows the estimators are effective in a discriminant and classification analysis.
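A minimal numpy sketch of Stein-type shrinkage toward a scaled-identity target; the paper estimates the optimal weight from data, whereas here the weight lam is simply user-chosen:

```python
# Sketch: Sigma_hat = (1 - lam) * S + lam * mu * I, a convex combination of the
# sample covariance and a well-conditioned target (scaled identity).
import numpy as np

def shrink_covariance(X, lam):
    """Convex combination of the sample covariance and a scaled identity."""
    S = np.cov(X, rowvar=False)                   # sample covariance (p x p)
    mu = np.trace(S) / S.shape[0]                 # average eigenvalue of S
    return (1.0 - lam) * S + lam * mu * np.eye(S.shape[0])

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 50))                     # n << p: S is singular
Sig = shrink_covariance(X, lam=0.3)
print(np.linalg.cond(Sig))                        # finite: well-conditioned
```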

7.
Path-oriented Random Testing (PRT) aims at generating a uniformly spread out sequence of random test data that execute a single control flow path within a program. The main challenge of PRT lies in its ability to build such a test suite efficiently in order to minimize the number of rejects (test data that execute another control flow path). We address this problem with an original divide-and-conquer approach based on constraint reasoning over finite domains, a well-recognized Constraint Programming technique. Our approach first derives path conditions by using backward symbolic execution and computes a tight over-approximation of their associated subdomain by using constraint propagation and constraint refutation. Second, a uniform random test data generator is extracted from this approximated subdomain. We implemented this approach and obtained experimental results that show the practical benefits of PRT based on constraint reasoning. On average, we obtained a two-orders-of-magnitude CPU time improvement over standard Random Testing on a set of paths extracted from classical benchmark programs.
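A toy illustration of the reject problem PRT minimizes, with a made-up path condition and a loose box over-approximation of its subdomain (constraint propagation would tighten the box):

```python
# Sketch: draw test data uniformly from an over-approximated subdomain of a
# path condition and count rejects (inputs that take a different path).
import random

def path_condition(x, y):
    """Illustrative condition for the target path (not from the paper)."""
    return x > 0 and x + y < 10

# a loose box over-approximation of the path's input subdomain
X_DOM, Y_DOM = range(-5, 16), range(-5, 16)

random.seed(4)
accepted, rejects = [], 0
while len(accepted) < 100:
    x, y = random.choice(X_DOM), random.choice(Y_DOM)
    if path_condition(x, y):
        accepted.append((x, y))                   # executes the target path
    else:
        rejects += 1                              # wasted test datum
print(f"{rejects} rejects for 100 path-covering inputs")
```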

8.
We propose a general recurrent neural network (RNN) model for nonlinear optimization over a nonempty compact convex subset which includes the bound subset and spheroid subset as special cases. It is shown that the compact convex subset is a positive invariant and attractive set of the RNN system and that all the network trajectories starting from the compact convex subset converge to the equilibrium set of the RNN system. This equilibrium set coincides with the optimum set of the minimization problem over the compact convex subset when the objective function is convex. The analysis of these qualitative properties for the RNN model is conducted by employing the properties of the projection operator of Euclidean space onto the general nonempty closed convex subset. A numerical simulation example is also given to illustrate the qualitative properties of the proposed general RNN model for solving an optimization problem over various compact convex subsets.
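A discretized sketch of projected dynamics over a compact convex set, here a box, using Euler integration; it illustrates how the projection operator keeps trajectories in the set, and is not the paper's exact RNN model:

```python
# Sketch: Euler-discretized dynamics  x' = P_C(x - a*grad f(x)) - x  over a
# box C, where P_C is the Euclidean projection (np.clip for a box).
import numpy as np

lo, hi = np.array([-1.0, -1.0]), np.array([2.0, 2.0])   # box constraint C
grad = lambda x: 2.0 * (x - np.array([3.0, -4.0]))       # f(x) = ||x - c||^2

x = np.array([0.0, 0.0])
for _ in range(200):                              # Euler-integrate the dynamics
    x = x + 0.1 * (np.clip(x - 0.5 * grad(x), lo, hi) - x)
print(x)                                          # converges to [2, -1] on C
```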

9.
SVM is a kernel-based learning method, and the choice of kernel and related parameters strongly affects its performance. A data-dependent method for estimating the optimal kernel parameters is proposed: an approximate convex hull of the training samples is computed by angle-based cutting of the sample set, and the optimal kernel parameters are determined from it. Experimental results show that the algorithm is applicable whether or not the data are dense or uniformly distributed, and that the method has high feasibility and effectiveness.

10.
In this study, an Adaptive Neuro-Fuzzy Inference System (ANFIS) has been used to model local scouring depth and scouring pattern around concave and convex arch-shaped circular bed sills. The experimental part of this research includes seven sets of laboratory test cases performed in an experimental flume under different flow conditions. A data set consisting of 2754 scouring-depth data points was collected for use in the ANFIS model. The ratio of arch diameter, D, to flume width, W, is used as a non-dimensional variable in all test cases. The results from the ANFIS model were compared with the results of the ANN model obtained by Homayoon et al. [24] and previously presented models. The results indicated that for D/W equal to 1 and 1.2, the ANFIS models produced good performance for convex and concave bed sills. As a result, the ANFIS models can be used as an alternative to ANN for estimation of scour depth and scour pattern around a concave bed sill installed with a bridge pier.

11.
This paper addresses the uniform stability of switched linear systems, where uniformity refers to the convergence rate of the multiple solutions that one obtains as the switching signal ranges over a given set. We provide a collection of results that can be viewed as extensions of LaSalle's Invariance Principle to certain classes of switched linear systems. Using these results one can deduce asymptotic stability using multiple Lyapunov functions whose Lie derivatives are only negative semidefinite. Depending on the regularity assumptions placed on the switching signals, one may be able to conclude just asymptotic stability or (uniform) exponential stability. We show by counter-example that the results obtained are tight.
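A numerical illustration, not the paper's results: simulating a switched linear system x' = A_σ x whose two modes share a common quadratic Lyapunov function (both matrices are -I plus a skew-symmetric part), so trajectories decay under arbitrary switching:

```python
# Sketch: both modes are -I + (skew-symmetric), so V(x) = ||x||^2 is a common
# Lyapunov function; the simulation merely visualizes the uniform decay.
import numpy as np

A = [np.array([[-1.0,  1.0], [-1.0, -1.0]]),
     np.array([[-1.0, -3.0], [ 3.0, -1.0]])]

rng = np.random.default_rng(5)
x, dt, mode = np.array([1.0, 1.0]), 1e-3, 0
for step in range(20000):                         # total horizon T = 20
    if step % 500 == 0:                           # random dwell-time switching
        mode = int(rng.integers(0, 2))
    x = x + dt * (A[mode] @ x)                    # forward-Euler step
print(np.linalg.norm(x))                          # ~e^{-20}: decayed to ~0
```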

12.
Chao Sima. Pattern Recognition, 2006, 39(9): 1763-1780.
A cross-validation error estimator is obtained by repeatedly leaving out some data points, deriving classifiers on the remaining points, computing errors for these classifiers on the left-out points, and then averaging these errors. The 0.632 bootstrap estimator is obtained by averaging the errors of classifiers designed from points drawn with replacement and then taking a convex combination of this "zero bootstrap" error with the resubstitution error for the designed classifier. This gives a convex combination of the low-biased resubstitution and the high-biased zero bootstrap. Another convex error estimator suggested in the literature is the unweighted average of resubstitution and cross-validation. This paper treats the following question: given a feature-label distribution and classification rule, what is the optimal convex combination of two error estimators, i.e., what are the optimal weights for the convex combination? This problem is addressed by finding the weights that minimize the MSE of the convex estimator. Optimality under the constraint that the resulting estimator be unbiased is also considered. Owing to the large number of results coming from the various feature-label models and error estimators, a portion of the results is presented herein and the main body of results appears on a companion website. In the tabulated results, each table treats the classification rules considered for the model, various Bayes errors, and various sample sizes. Each table includes the optimal weights, mean errors and standard deviations for the relevant error measures, and the MSE and MAE for the optimal convex estimator. Many observations can be made by considering the full set of experiments, and some general trends are outlined in the paper. The general conclusion is that optimizing the weights of a convex estimator can provide substantial improvement, depending on the classification rule, data model, sample size, and component estimators. Optimal convex bootstrap estimators are applied to feature-set ranking to illustrate their potential advantage over non-optimized convex estimators.
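A hedged sketch of the optimization the paper studies: given Monte Carlo draws of two error estimators and the true error, the unconstrained MSE-minimizing weight has a simple closed form (clip it to [0,1] if a proper convex combination is required):

```python
# Sketch: MSE-optimal weight for  w*e1 + (1-w)*e2.  Writing a = e1 - err and
# b = e2 - err, minimizing E[(w*a + (1-w)*b)^2] gives
# w* = E[b*(b - a)] / E[(a - b)^2].
import numpy as np

def optimal_convex_weight(e1, e2, err):
    a, b = e1 - err, e2 - err                     # estimator deviations
    return np.mean(b * (b - a)) / np.mean((a - b) ** 2)

rng = np.random.default_rng(6)
err = 0.20                                        # true classification error
e_resub = err - 0.05 + 0.02 * rng.normal(size=5000)   # low-biased resubstitution
e_boot = err + 0.04 + 0.03 * rng.normal(size=5000)    # high-biased zero bootstrap
w = optimal_convex_weight(e_resub, e_boot, err)
mse = np.mean((w * e_resub + (1 - w) * e_boot - err) ** 2)
print(round(w, 3), mse)                           # beats either estimator alone
```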

13.
14.
Maximum likelihood set for estimating a probability mass function
We propose a new method for estimating the probability mass function (pmf) of a discrete and finite random variable from a small sample. We focus on the observed counts (the number of times each value appears in the sample) and define the maximum likelihood set (MLS) as the set of pmfs that put more mass on the observed counts than on any other set of counts possible for the same sample size. We characterize the MLS in detail in this article. We show that the MLS is a diamond-shaped subset of the probability simplex in [0,1]^k bounded by at most k × (k-1) hyperplanes, where k is the number of possible values of the random variable. The MLS always contains the empirical distribution, as well as a family of Bayesian estimators based on a Dirichlet prior, particularly the well-known Laplace estimator. We propose to select from the MLS the pmf that is closest to a fixed pmf that encodes prior knowledge. When using Kullback-Leibler distance for this selection, the optimization problem comprises finding the minimum of a convex function over a domain defined by linear inequalities, for which standard numerical procedures are available. We apply this estimate to language modeling using Zipf's law to encode prior knowledge and show that this method permits obtaining state-of-the-art results while being conceptually simpler than most competing methods.
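A sketch of this selection under the assumption that the MLS inequalities take the form p_i(n_j + 1) >= p_j n_i for i ≠ j (consistent with the k × (k-1) hyperplane count stated above, and satisfied by both the empirical and Laplace estimators), minimizing one direction of KL divergence to a prior pmf q:

```python
# Sketch: pick from the MLS the pmf closest in KL divergence to a prior q.
# The linear inequalities below are an assumed reading of the MLS definition.
import numpy as np
from scipy.optimize import minimize

n = np.array([5, 3, 0, 2])                        # observed counts, k = 4
q = np.array([0.4, 0.3, 0.2, 0.1])                # prior pmf (e.g., from Zipf's law)
k = len(n)

cons = [{"type": "eq", "fun": lambda p: p.sum() - 1.0}]
for i in range(k):
    for j in range(k):
        if i != j:                                # k*(k-1) half-space constraints
            cons.append({"type": "ineq",
                         "fun": lambda p, i=i, j=j: p[i] * (n[j] + 1) - p[j] * n[i]})

kl = lambda p: float(np.sum(p * np.log(np.maximum(p, 1e-12) / q)))
res = minimize(kl, np.full(k, 1.0 / k), bounds=[(0.0, 1.0)] * k,
               constraints=cons, method="SLSQP")
print(res.x)                                      # MLS pmf closest to q in KL
```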

15.
In this paper a new, abstract method for analysis and visualization of multidimensional data sets in pattern recognition problems is introduced. It can be used to determine the properties of an unknown, complex data set and to assist in finding the most appropriate recognition algorithm. Additionally, it can be employed to design layers of a feedforward artificial neural network or to visualize higher-dimensional problems in 2-D and 3-D without losing relevant data set information. The method is derived from convex set theory and works by considering convex subsets within the data and analyzing their respective positions in the original dimension. Its ability to describe certain set features that cannot be explicitly projected into lower dimensions sets it apart from many other visualization techniques. Two classical multidimensional problems are analyzed, and the results show the usefulness of the presented method and underline its strengths and weaknesses.

16.
Fuzzy sets and fuzzy state modeling require modifications of fundamental principles of statistical estimation and inference. These modifications trade increased computational effort for greater generality of data representation. For example, multivariate discrete response data of high (but finite) dimensionality present the problem of analyzing large numbers of cells with low event counts due to finite sample size. It would be useful to have a model based on an invariant metric to represent such data parsimoniously with a latent "smoothed" or low dimensional parametric structure. Determining the parameterization of such a model is difficult since multivariate normality (i.e., that all significant information is represented in the second order moments matrix), an assumption often used in fitting the most common types of latent variable models, is not appropriate. We present a fuzzy set model to analyze high dimensional categorical data where a metric for grades of membership in fuzzy sets is determined by latent convex sets, within which moments up to order J of a discrete distribution can be represented. The model, based on a fuzzy set parameterization, can be shown, using theorems on convex polytopes [1], to be dependent on only the enclosing linear space of the convex set. It is otherwise measure invariant. We discuss the geometry of the model's parameter space, the relation of the convex structure of model parameters to the dual nature of the case and variable spaces, how that duality relates to describing fuzzy set spaces, and modified principles of estimation.

17.
This Letter presents algorithms for computing a uniform sequence of n integer points in a given interval [0,m], where m and n are integers such that m > n > 0. The uniformity of a point set is measured by the ratio of the minimum gap over the maximum gap. We prove that we can insert n integral points one by one into the interval [0,m] while keeping the uniformity of the point set at least 1/2. If we require uniformity strictly greater than 1/2, such a sequence does not always exist, but we can prove a tight upper bound on the length of the sequence for given values of n and m.
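A sketch of the greedy idea, not necessarily the paper's exact sequence: insert each point at the integer midpoint of the current largest gap and track uniformity; with m a power of two this greedy never drops below 1/2 (for general m the floor midpoint can dip slightly lower):

```python
# Sketch: split the largest gap at its integer midpoint and record
# uniformity = min gap / max gap after each insertion.
def insert_uniform(m, n):
    pts = [0, m]
    ratios = []
    for _ in range(n):
        g, i = max((pts[j + 1] - pts[j], j) for j in range(len(pts) - 1))
        pts.insert(i + 1, pts[i] + g // 2)        # integer midpoint of largest gap
        gaps = [pts[j + 1] - pts[j] for j in range(len(pts) - 1)]
        ratios.append(min(gaps) / max(gaps))
    return pts, ratios

pts, ratios = insert_uniform(1024, 20)            # m = 1024: a power of two
print(min(ratios))                                # 0.5: uniformity stays >= 1/2
```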

18.
Perhaps the most flexible synopsis of a database is a uniform random sample of the data; such samples are widely used to speed up processing of analytic queries and data-mining tasks, enhance query optimization, and facilitate information integration. The ability to bound the maximum size of a sample can be very convenient from a system-design point of view, because the task of memory management is simplified, especially when many samples are maintained simultaneously. In this paper, we study methods for incrementally maintaining a bounded-size uniform random sample of the items in a dataset in the presence of an arbitrary sequence of insertions and deletions. For "stable" datasets whose size remains roughly constant over time, we provide a novel sampling scheme, called "random pairing" (RP), that maintains a bounded-size uniform sample by using newly inserted data items to compensate for previous deletions. The RP algorithm is the first extension of the 45-year-old reservoir sampling algorithm to handle deletions; RP reduces to the "passive" algorithm of Babcock et al. when the insertions and deletions correspond to a moving window over a data stream. Experiments show that, when dataset-size fluctuations over time are not too extreme, RP is the algorithm of choice with respect to speed and sample-size stability. For "growing" datasets, we consider algorithms for periodically resizing a bounded-size random sample upwards. We prove that any such algorithm cannot avoid accessing the base data, and provide a novel resizing algorithm that minimizes the time needed to increase the sample size. We also show how to merge uniform samples from disjoint datasets to obtain a uniform sample of the union of the datasets; the merged sample can be incrementally maintained. Our new RPMerge algorithm extends the HRMerge algorithm of Brown and Haas to effectively deal with deletions, thereby facilitating efficient parallel sampling.
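A hedged sketch of random pairing written from this abstract alone, so details may differ from the published algorithm: deletions are logged, and later insertions compensate them with probability c1/(c1+c2):

```python
# Sketch of random pairing (RP): with no uncompensated deletions it behaves
# like reservoir sampling; otherwise a new item "pairs" with a prior deletion.
import random

class RandomPairingSample:
    def __init__(self, capacity, seed=7):
        self.cap, self.sample = capacity, []
        self.size = 0          # current dataset size
        self.c1 = 0            # uncompensated deletions that hit the sample
        self.c2 = 0            # uncompensated deletions that missed it
        self.rng = random.Random(seed)

    def insert(self, item):
        self.size += 1
        if self.c1 + self.c2 == 0:                # plain reservoir step
            if len(self.sample) < self.cap:
                self.sample.append(item)
            elif self.rng.random() < self.cap / self.size:
                self.sample[self.rng.randrange(self.cap)] = item
        elif self.rng.random() < self.c1 / (self.c1 + self.c2):
            self.sample.append(item)              # compensates a sample deletion
            self.c1 -= 1
        else:
            self.c2 -= 1                          # compensates a non-sample deletion

    def delete(self, item):
        self.size -= 1
        if item in self.sample:
            self.sample.remove(item)
            self.c1 += 1
        else:
            self.c2 += 1

rp = RandomPairingSample(capacity=3)
for x in range(10):
    rp.insert(x)
rp.delete(rp.sample[0])                           # forces c1 = 1
rp.insert(99)                                     # pairs with the deletion
print(rp.sample)
```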

19.
Prototype-based methods are commonly used in cluster analysis, and the results may be highly dependent on the prototype used. We propose a two-level fuzzy clustering method that involves adaptively expanding and merging convex polytopes, where the convex polytopes are considered as a "flexible" prototype. Therefore, the dependency on the use of a specified prototype can be eliminated. Also, the proposed method makes it possible to effectively represent an arbitrarily distributed data set without a priori knowledge of the number of clusters in the data set. In the first level of our proposed method, each cluster is represented by a convex polytope which is described by its set of vertices. Specifically, nonlinear membership functions are utilized to determine whether an input pattern creates a new cluster or whether an existing cluster should be modified. In the second level, the expandable clusters that are selected by an intercluster distance measure are merged to improve clustering efficiency and to reduce the order dependency of the incoming input patterns. Several experimental results are given to show the validity of our method.

20.
Earlier clustering techniques such as the modified learning vector quantization (MLVQ) and the fuzzy Kohonen partitioning (FKP) techniques have focused on the derivation of a certain set of parameters so as to define the fuzzy sets in terms of an algebraic function. The fuzzy membership functions thus generated are uniform, normal, and convex. Since any irregular training data is clustered into uniform fuzzy sets (Gaussian, triangular, or trapezoidal), the clustering may not be exact and some amount of information may be lost. In this paper, two clustering techniques using a Kohonen-like self-organizing neural network architecture, namely, the unsupervised discrete clustering technique (UDCT) and the supervised discrete clustering technique (SDCT), are proposed. The UDCT and SDCT algorithms reduce this data loss by introducing nonuniform, normal fuzzy sets that are not necessarily convex. The training data range is divided into discrete points at equal intervals, and the membership value corresponding to each discrete point is generated. Hence, the fuzzy sets obtained contain pairs of values, each pair corresponding to a discrete point and its membership grade. Thus, it can be argued that fuzzy membership functions generated using this kind of a discrete methodology provide a more accurate representation of the actual input data. This fact has been demonstrated by comparing the membership functions generated by the UDCT and SDCT algorithms against those generated by the MLVQ, FKP, and pseudofuzzy Kohonen partitioning (PFKP) algorithms. In addition to these clustering techniques, a novel pattern classifying network called the Yager fuzzy neural network (FNN) is proposed in this paper. This network corresponds completely to the Yager inference rule and exhibits remarkable generalization abilities. A modified version of the pseudo-outer product (POP)-Yager FNN called the modified Yager FNN is introduced that eliminates the drawbacks of the earlier network and yields superior performance. Extensive experiments have been conducted to test the effectiveness of these two networks, using various clustering algorithms. It follows that the SDCT and UDCT clustering algorithms are particularly suited to networks based on the Yager inference rule.
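A minimal sketch of the discrete-membership idea: divide the data range into equally spaced points and attach a membership grade to each, here derived from a normalized histogram (so the set is normal but need not be convex); this illustrates the representation only, not the UDCT/SDCT training procedure:

```python
# Sketch: a discrete fuzzy set as (point, grade) pairs on an evenly spaced grid,
# with grades taken from a normalized histogram of the training data.
import numpy as np

rng = np.random.default_rng(8)
data = np.concatenate([rng.normal(2.0, 0.5, 400), rng.normal(5.0, 1.5, 200)])

edges = np.linspace(data.min(), data.max(), 21)   # 20 equal intervals
counts, _ = np.histogram(data, bins=edges)
grades = counts / counts.max()                    # normal: peak grade is 1
points = 0.5 * (edges[:-1] + edges[1:])           # discrete points of the set
fuzzy_set = list(zip(points, grades))
print(fuzzy_set[:3])
```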
