Similar literature
 20 similar documents found (search time: 718 ms)
1.
A two-class classification problem is considered where the objects to be classified are bags of instances in d-space. The classification rule is defined in terms of an open d-ball. A bag is labeled positive if it meets the ball and labeled negative otherwise. Determining the center and radius of the ball is modeled as an SVM-like margin optimization problem. Necessary optimality conditions are derived, leading to a polynomial algorithm in fixed dimension. A VNS-type heuristic is developed and experimentally tested. The methodology is extended to classification by several balls and to more than two classes.
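
The classification rule described above can be sketched in a few lines. This is only an illustration of the rule "a bag is positive if it meets the open ball"; the function name `classify_bag` and the sample data are hypothetical, and the margin optimization that chooses the center and radius is not shown.

```python
import math

def classify_bag(bag, center, radius):
    """Return +1 if any instance lies strictly inside the open ball, else -1."""
    for instance in bag:
        if math.dist(instance, center) < radius:
            return 1
    return -1

# Hypothetical data: a unit ball centered at the origin in 2-space.
center, radius = (0.0, 0.0), 1.0
positive_bag = [(2.0, 2.0), (0.5, 0.0)]   # one instance falls inside the ball
negative_bag = [(2.0, 2.0), (1.0, 0.0)]   # (1, 0) is on the boundary, so outside the open ball
```

Because the ball is open, an instance exactly on the boundary does not make its bag positive.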

2.
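The k-center clustering problem for balls in high-dimensional space   Total citations: 2, self-citations: 1, citations by others: 1
This paper proposes the k-center clustering problem for balls in high-dimensional space. Given a set B of balls in high-dimensional space, the problem is to construct k balls that together cover all balls in B while minimizing the largest radius among the k balls. A subset S of the balls in B, called a core set of B, is selected, and for a given ε this core set is used to obtain a 1-ε approximation algorithm for the problem whose running time is polynomial in the number of balls n and the dimension d. Moreover, the number of balls in S is O(1/ε^2), independent of the number of balls in B and of the dimension of the space.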

3.
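Tao Jianwen, Wang Shitong. Journal of Software (《软件学报》), 2012, 23(6): 1458-1471
To improve the classification performance of spherical classifiers, and inspired by methods such as support vector machines and the small sphere and large margin approach, a large margin and minimal reduced enclosing ball (LMMREB) learning machine is proposed. In the feature space induced by a Mercer kernel, it optimizes a minimum enclosing ball to find two concentric reduced enclosing balls, each containing the patterns of one of the two classes, such that the minimum margin between the patterns of each class and the reduced enclosing balls is maximized. In this way, the between-class margin and the within-class cohesion are maximized simultaneously. Experiments on artificial and real data show that the classification performance of LMMREB is better than or equal to that of related methods.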

4.
Searching for an effective dimension reduction space is an important problem in regression, especially for high-dimensional data such as microarray data. A major characteristic of microarray data is the small number of observations n and the very large number of genes p. This “large p, small n” paradigm makes discriminant analysis for classification difficult. One way to offset this dimensionality problem is to reduce the dimension. Supervised classification is treated as a regression problem with a small number of observations and a large number of covariates. A new approach for dimension reduction is proposed, based on a semi-parametric approach that uses local likelihood estimates for single-index generalized linear models. The asymptotic properties of this procedure are considered and its asymptotic performance is illustrated by simulations. Applications of this method to binary and multiclass classification of the three real data sets Colon, Leukemia and SRBCT are presented.

5.
Accurate diagnosis is a significant step in cancer treatment. Machine learning can support doctors in prognosis decision-making, but its performance is weakened by the high dimension and small quantity of genetic data. Deep learning can process high-dimensional data effectively. However, the problem of inadequate data remains unsolved and lowers the performance of deep learning. To this end, we propose a generative adversarial model that uses non-target cancer data to help train the target generator. We use the reconstruction loss to further stabilize model training and improve the quality of generated samples. We also present a cancer classification model to optimize classification performance. Experimental results show that the mean absolute error of the cancer gene data generated by our model is 19.3% lower than that of DC-GAN, and that the classification accuracy on our generated data is higher than on data created by GAN. The classification accuracy of our classification model reaches 92.6%, which is 7.6% higher than that of the model without any generated data.

6.
7.
In a statistical setting of the classification (pattern recognition) problem the number of examples required to approximate an unknown labelling function is linear in the VC dimension of the target learning class. In this work we consider the question of whether such bounds exist if we restrict our attention to computable classification methods, assuming that the unknown labelling function is also computable. We find that in this case the number of examples required for a computable method to approximate the labelling function not only is not linear, but grows faster (in the VC dimension of the class) than any computable function. No time or space constraints are put on the predictors or target functions; the only resource we consider is the training examples. The task of classification is considered in conjunction with another learning problem - data compression. An impossibility result for the task of data compression allows us to estimate the sample complexity for pattern recognition.

8.
This paper investigates the effect of partial least squares (PLS) in unbalanced pattern classification. Beyond dimension reduction, PLS is shown to be superior at generating favorable features for classification. The PLS classifier (PLSC) is shown to give much better prediction accuracy for the class with fewer samples. In this paper, an asymmetric PLS classifier (APLSC) is proposed to improve the poor performance of PLSC on the class with more samples. PLSC and APLSC are compared with five state-of-the-art algorithms: support vector machines (SVMs), unbalanced SVMs, asymmetric principal component and discriminant analysis (APCDA), SMOTE and Adaboost. Experimental results on six UCI data sets show that APLSC improves on PLSC in overall classification accuracy, and that both APLSC and PLSC perform better than the other five algorithms even under seriously unbalanced distributions.

9.
In the literature, very few studies have addressed the problem of recognizing digits placed on spherical surfaces, even though digit recognition has attracted extensive attention and has been approached from various directions. As a particular example of recognizing this kind of digit, in this paper we introduce a digit ball detection and recognition system that recognizes the digit appearing on a 3D ball. A digit ball is a ball carrying an Arabic numeral on its spherical surface. Our system works in a weakly controlled environment to detect and recognize digit balls for a practical application, which requires the system to keep working in real time without recognition errors. Two main challenges confront our system: how to detect the balls accurately, and how to deal with the arbitrary rotation of the balls. For the first, we develop a novel method to detect the balls appearing in a single image and demonstrate its effectiveness even when the balls are densely placed. To address the second, we represent the balls with spin images and polar images to achieve rotation invariance. Finally, we adopt a dictionary learning-based method for the recognition task. To evaluate our system, a series of experiments is performed on real-world digit ball images, and the results validate the effectiveness of our system, which achieves 100% accuracy in the experiments.

10.
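To address the problems of sample labeling and concept drift in data stream classification, a data stream classification mining model based on instance transfer is proposed. First, the model uses a support vector machine as the learner, builds the source domain from the support vectors of the resulting classification model, and treats the current data chunk to be classified as the target domain. Then, using the idea of mutual nearest neighbors, it selects from the source domain the true neighbors of the target-domain samples for instance transfer, which avoids negative transfer. Finally, the target domain and the transferred samples are merged into a training set, which increases the number of labeled samples and strengthens the generalization ability of the model. Theoretical analysis and experimental results show that the proposed method is feasible and has an advantage in classification accuracy over other learning methods.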

11.
Incremental feature extraction is effective for facilitating the analysis of large-scale streaming data. However, most current incremental feature extraction methods are not suitable for processing streaming data with high feature dimensions because only a few methods have low time complexity, that is, complexity linear in both the number of samples and the number of features. In addition, feature extraction methods need to improve the performance of subsequent classification. Therefore, incremental feature extraction methods need to be more efficient and effective. Partial least squares (PLS) is known to be an effective dimension reduction technique for classification. However, the application of PLS to streaming data is still an open problem. In this study, we propose a highly efficient and powerful dimension reduction algorithm called incremental PLS (IPLS), which comprises a two-stage extraction process. In the first stage, the PLS target function is made incremental by updating the historical mean to extract the leading projection direction. In the second stage, the other projection directions are calculated based on the equivalence between the PLS vectors and the Krylov sequence. We compared the performance of IPLS with other state-of-the-art incremental feature extraction methods such as incremental principal components analysis, incremental maximum margin criterion, and incremental inter-class scatter using real streaming datasets. Our empirical results showed that IPLS performed better than the other methods in terms of efficiency and subsequent classification accuracy.

12.
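Xu Min, Wang Shitong, Gu Xin, Yu Lin. Journal of Software (《软件学报》), 2013, 24(10): 2312-2326
Incomplete data collection can degrade prediction performance in regression problems. Based on the new theoretical results that support vector regression (SVR) is equivalent to the center-constrained minimum enclosing ball (CC-MEB), and that the difference between the probability distributions of similar domains depends only on the positions of the centers of the two domains' minimum enclosing balls, an adaptive-core vector regression (A-CVR) algorithm, a domain-adaptive core-set support vector regression method for large data sets, is proposed. The algorithm uses the CC-MEB center of the source domain to correct the CC-MEB center of the target domain, thereby improving regression prediction performance in the target domain. Experimental results show that this domain adaptation algorithm can compensate for missing data in the target domain and greatly improve regression prediction performance.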

13.
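The covering problem for a set of balls in high-dimensional space is as follows: given a set S of balls in high-dimensional space, construct a ball of minimum diameter that covers all balls in S. This paper introduces the concept of the diameter of a ball set and gives a 1/√3-approximation algorithm for computing it. This algorithm is used to obtain an initial core set of the ball-set instance S, which in turn yields a 1+ε approximation algorithm for the covering problem with time complexity O(nd/ε + d^2/ε^(3/2) (1/ε + d) lg(1/ε)). The algorithm guarantees that the number of balls in the core set is O(1/ε), independent of the number of balls in S and of the dimension of the space.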
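
As a rough illustration of the quantities involved in covering a set of balls, the sketch below computes the radius a covering ball needs at a given center and one plausible notion of the diameter of a ball set. The abstract does not give the paper's definition of the ball-set diameter, so `ball_set_diameter` is an assumed definition (the largest extent over all pairs of balls), and both function names are hypothetical.

```python
import math

def covering_radius(center, balls):
    """Smallest radius of a ball at `center` that contains every (c_i, r_i) ball."""
    return max(math.dist(center, c) + r for c, r in balls)

def ball_set_diameter(balls):
    """Assumed definition: max over pairs of |c_i - c_j| + r_i + r_j."""
    return max(
        math.dist(ci, cj) + ri + rj
        for ci, ri in balls
        for cj, rj in balls
    )

# Hypothetical instance: two unit balls whose centers are 4 apart.
balls = [((0.0, 0.0), 1.0), ((4.0, 0.0), 1.0)]
needed = covering_radius((2.0, 0.0), balls)
extent = ball_set_diameter(balls)
```

A ball with center c and radius R covers the ball (c_i, r_i) exactly when |c - c_i| + r_i <= R, which is the test `covering_radius` relies on.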

14.
In this paper, an algorithm is introduced that computes an arbitrarily fine approximation of the smallest enclosing ball of a point set in any dimension. This operation is important in, for example, classification, clustering, and data mining. The algorithm is very simple to implement, gives reliable results, and gracefully handles large problem instances in low and high dimensions, as confirmed by both theoretical arguments and empirical evaluation. For example, using a CPU with eight cores, it takes less than two seconds to compute a 1.001‐approximation of the smallest enclosing ball of one million points uniformly distributed in a hypercube in dimension 200. Furthermore, the presented approach extends to a more general class of input objects, such as ball sets.
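
The abstract does not describe the algorithm itself, so the sketch below is not the paper's method. It shows a different, well-known simple scheme, the Badoiu-Clarkson iteration, which moves the center a shrinking step toward the farthest point and is known to reach a (1+ε)-approximation after roughly 1/ε^2 steps. The function name and the test data are hypothetical.

```python
import math

def approx_enclosing_ball(points, iterations=10000):
    """Badoiu-Clarkson iteration for an approximate smallest enclosing ball."""
    center = list(points[0])
    for i in range(1, iterations + 1):
        # Find the point currently farthest from the center.
        farthest = max(points, key=lambda p: math.dist(p, center))
        # Step toward it; the step size 1/(i+1) shrinks so the center settles.
        center = [c + (f - c) / (i + 1) for c, f in zip(center, farthest)]
    # The radius is chosen so that the ball contains every input point.
    radius = max(math.dist(p, center) for p in points)
    return center, radius

# Hypothetical input: the corners of the unit square (optimal radius sqrt(0.5)).
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
center, radius = approx_enclosing_ball(square)
```

Each iteration costs one pass over the points, so the scheme trades many cheap passes for simplicity; faster methods exist for tight tolerances.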

15.
Dimensionality reduction is a very important tool in data mining. Intrinsic dimension of data sets is a key parameter for dimensionality reduction. However, finding the correct intrinsic dimension is a challenging task. In this paper, a new intrinsic dimension estimation method is presented. The estimator is derived by finding the exponential relationship between the radius of an incising ball and the number of samples included in the ball. The method is compared with the previous dimension estimation methods. Experiments have been conducted on synthetic and high dimensional image data sets and on data sets of the Santa Fe time series competition, and the results show that the new method is accurate and robust.
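
The general idea of relating a ball's radius to the number of samples it contains can be sketched as follows. This is not the paper's estimator: it assumes the count grows like r^d for a ball of radius r, so that d is roughly log(N2/N1) / log(r2/r1) for two concentric balls. The function name and the grid data are hypothetical.

```python
import math

def estimate_dimension(points, center, r_small, r_large):
    """Estimate intrinsic dimension from sample counts in two concentric balls.

    Assumes count(r) grows like r**d, so d ~ log(count ratio) / log(radius ratio).
    """
    n_small = sum(1 for p in points if math.dist(p, center) <= r_small)
    n_large = sum(1 for p in points if math.dist(p, center) <= r_large)
    return math.log(n_large / n_small) / math.log(r_large / r_small)

# A flat 2-D grid embedded in 3-D space: ambient dimension 3, intrinsic dimension 2.
grid = [(i, j, 0) for i in range(-20, 21) for j in range(-20, 21)]
d_hat = estimate_dimension(grid, (0, 0, 0), 8, 16)
```

On this grid the estimate comes out close to 2 rather than 3, since the samples fill a plane even though each point has three coordinates.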

16.
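When an SVM classifies an imbalanced data set, the results tend to be biased toward the majority class, which lowers the F1-measure of the minority class. This paper proposes a method that leaves the number of samples in the sample set unchanged and instead generates a weight from the total number of sample points, the number of support vectors in the classification process, and the accuracies of the minority and majority classes. The weight is used to optimize the parameter b of the separating hyperplane, thereby improving the classification accuracy for minority-class samples. Experiments demonstrate the effectiveness of the method.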

17.
18.
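The covering problem for a set of balls in high-dimensional space is as follows: given a set S of balls in high-dimensional space, construct a ball of minimum diameter that covers all balls in S. This paper introduces the concept of the diameter of a ball set and gives a 1/√3-approximation algorithm for computing it. This algorithm is used to obtain an initial core set of the ball-set instance S, which in turn yields a 1+ε approximation algorithm for the covering problem with time complexity O(nd/ε + d^2/ε^(3/2) (1/ε + d) lg(1/ε)). The algorithm guarantees that the number of balls in the core set is O(1/ε), independent of the number of balls in S and of the dimension of the space.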

19.
A mixed effects least squares support vector machine (LS-SVM) classifier is introduced to extend the standard LS-SVM classifier for handling longitudinal data. The mixed effects LS-SVM model contains a random intercept and allows the classification of highly unbalanced data, in the sense that there is an unequal number of observations for each case at non-fixed time points. The methodology consists of a regression modeling step and a classification step based on the obtained regression estimates. Regression and classification of new cases are performed in a straightforward manner by solving a linear system. It is demonstrated that the methodology can be generalized to deal with multi-class problems and can be extended to incorporate multiple random effects. The technique is illustrated on simulated data sets and real-life problems concerning human growth.

20.
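A topic classification method for Chinese blogs based on the approximate minimum enclosing ball principle is proposed. Following this principle, solving the support vector machine optimization problem is converted into solving an approximate minimum enclosing ball problem, so that only a core subset of a large-scale data set needs to take part in classifier training, which improves the ability to handle large training sets in blog topic classification. Feature selection and topic classification experiments for Chinese blogs are carried out on a relatively large blog data set. The experimental results show that the method achieves accuracy comparable to that of support vector machines while reducing training time, giving good blog topic classification results.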
