Similar literature
 20 similar documents found (search time: 15 ms)
1.
Supervised learning often requires a large number of labeled examples, which becomes a critical bottleneck when manually annotating class labels is costly. To mitigate this issue, a new framework called pairwise comparison (Pcomp) classification has been proposed, which allows training examples to be only weakly annotated with pairwise comparisons, i.e., which one of two examples is more likely to be positive. The previous study solves Pcomp problems by minimizing the classification error, which may lead to a less robust model due to its sensitivity to the class distribution. In this paper, we propose a robust learning framework for Pcomp data along with a pairwise surrogate loss called Pcomp-AUC. It provides an unbiased estimator to equivalently maximize AUC without accessing the precise class labels. Theoretically, we prove consistency with respect to AUC and further provide an estimation error bound for the proposed method. Empirical studies on multiple datasets validate the effectiveness of the proposed method.
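
For orientation, the AUC targeted in the abstract above can be read as the fraction of (positive, negative) pairs that a scorer ranks correctly. The sketch below computes that empirical quantity from fully labeled scores. It is not the Pcomp-AUC estimator from the paper, which works without precise labels; the function name `empirical_auc` and the sample data are illustrative assumptions.

```python
def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.

    Illustrative helper (hypothetical name), assuming labels are 0 or 1.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0   # positive scored above negative
            elif p == n:
                correct += 0.5   # tie counts as half a correct pair
    return correct / (len(pos) * len(neg))


# Three of the four positive/negative pairs are ordered correctly.
print(empirical_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

Because this quantity depends only on how pairs are ordered, a pairwise surrogate loss is a natural fit for optimizing it.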

2.
We present an extensive empirical comparison between nineteen prototypical supervised ensemble learning algorithms, including Boosting, Bagging, Random Forests, Rotation Forests, Arc-X4, Class-Switching and their variants, as well as more recent techniques like Random Patches. These algorithms were compared against each other in terms of threshold, ranking/ordering and probability metrics over nineteen UCI benchmark data sets with binary labels. We also examine the influence of two base learners, CART and Extremely Randomized Trees, on the bias–variance decomposition and the effect of calibrating the models via Isotonic Regression on each performance metric. The selected data sets were already used in various empirical studies and cover different application domains. The source code and the detailed results of our study are publicly available.

3.
Neural Computing and Applications - Feature Selection (FS) is an important preprocessing step that is involved in machine learning and data mining tasks for preparing data (especially...

4.
The paper presents a novel framework for large-class binary pattern classification problems based on a learning-based combination of multiple features. In particular, the class of binary patterns including characters/primitives and symbols is considered within the scope of this work. We demonstrate a novel binary multiple kernel learning-based classification architecture for fast and efficient performance in applications involving such problems. The character/primitive classification problem concentrates primarily on Gujarati and Bangla character recognition in both the analytical and the experimental context. A novel feature representation scheme for symbol images is introduced that has the necessary elastic and non-elastic deformation invariance properties. The experimental efficacy of the proposed framework for symbol classification is demonstrated on two public data sets.

5.
One of the simplest, and yet most consistently well-performing, sets of classifiers is the family of naïve Bayes models (a special class of Bayesian network models). However, these models rely on the (naïve) assumption that all the attributes used to describe an instance are conditionally independent given the class of that instance. To relax this independence assumption, we have in previous work proposed a family of models called latent classification models (LCMs). LCMs are defined for continuous domains and generalize the naïve Bayes model by using latent variables to model class-conditional dependencies between the attributes. In addition to providing good classification accuracy, the LCM has several appealing properties, including a relatively small parameter space that makes it less susceptible to over-fitting. In this paper we take a first step towards generalizing LCMs to hybrid domains by proposing an LCM for domains with binary attributes. We present algorithms for learning the proposed model, and we describe a variational approximation-based inference procedure. Finally, we empirically compare the accuracy of the proposed model to the accuracy of other classifiers for a number of different domains, including the problem of recognizing symbols in black and white images.

6.
The effectiveness of local binary pattern (LBP) features is well established in texture image classification and retrieval. This paper presents a more effective completed modeling of the LBP. The traditional LBP has the shortcoming that it may sometimes represent different structural patterns with the same LBP code. In addition, LBP lacks global information and is sensitive to noise. In this paper, binary patterns generated using a threshold equal to the sum of the center pixel value and the average local differences are proposed. The proposed local structure patterns (LSP) can classify different textural structures more accurately because they utilize both local and global information. The LSP can be combined with a simple LBP and a center pixel pattern to give a completed local structure pattern (CLSP) that achieves higher classification accuracy. To make CLSP insensitive to noise, a robust local structure pattern (RLSP) is also proposed. The proposed scheme is tested on three representative texture databases, namely Outex, CUReT, and UIUC. The experimental results indicate that the proposed method achieves higher classification accuracy while being more robust to noise.

7.
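Wang Quan 《计算机应用》 (Journal of Computer Applications) 2007,27(10):2372-2375
An incremental classification algorithm that adapts to abrupt concept changes in data streams is proposed. It uses a grid technique to quantize the feature vectors of the data set, exploits the multi-resolution data representation of the Haar wavelet, and finds an appropriate class label for each test point with a nearest-neighbor technique. Tests on real data sets show that, compared with existing data stream classification algorithms, the proposed algorithm achieves higher accuracy and has a very low update cost, which makes it suitable for data stream applications.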

8.
Several meta-learning techniques for multi-label classification (MLC), such as chaining and stacking, have already been proposed in the literature, mostly aimed at improving predictive accuracy through the exploitation of label dependencies. In this paper, we propose another technique of that kind, called dependent binary relevance (DBR) learning. DBR combines properties of both chaining and stacking. We provide a careful analysis of the relationship between these and other techniques, specifically focusing on the underlying dependency structure and the type of training data used for model construction. Moreover, we offer an extensive empirical evaluation, in which we compare different techniques on MLC benchmark data. Our experiments provide evidence for the good performance of DBR in terms of several evaluation measures that are commonly used in MLC.

9.
Pan  Zhibin  Wu  Xiuquan  Li  Zhengyi 《Multimedia Tools and Applications》2020,79(9-10):5477-5500
Multimedia Tools and Applications - Local binary pattern (LBP) has already been proved to be a powerful measure of image texture with fixed sampling scheme: all P neighbor pixels in a single-scale...

10.
11.
Recently, local binary patterns (LBP) have been widely used in texture classification. LBP methods obtain the binary pattern by comparing the gray levels of pixels on a small circular region with the gray level of the central pixel. Although many conventional LBP methods perform well in texture classification, they describe only the microstructures of texture images, such as edges, corners, and spots. This remains true even when LBP methods adopt multi-resolution analysis. Moreover, the circular sampling region limits the ability of conventional LBP methods to describe anisotropic features. In this paper, we change the shape of the sampling region to obtain an extended LBP operator. A multi-structure local binary pattern (Ms-LBP) operator is then obtained by applying the extended LBP operator to different layers of an image pyramid. Thus, the proposed method is a simple yet efficient way to describe four types of structures: isotropic microstructure, isotropic macrostructure, anisotropic microstructure and anisotropic macrostructure. We demonstrate the performance of our method on two public texture databases, Outex and CUReT. The experimental results show the advantages of the proposed method.
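
As a rough illustration of the comparison step described in this abstract, the sketch below computes a basic 8-neighbor LBP code for the center of a 3x3 patch. It is the conventional operator, not the extended or Ms-LBP operator from the paper. It also assumes a square 3x3 neighborhood instead of interpolated circular sampling; the function name `lbp_code` and the clockwise bit ordering are illustrative choices.

```python
def lbp_code(patch):
    """Basic 8-neighbor LBP code of the center pixel of a 3x3 gray-level patch.

    Assumption: square 3x3 neighborhood, bits assigned clockwise from the
    top-left neighbor. Other orderings are equally valid conventions.
    """
    center = patch[1][1]
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        # A neighbor at least as bright as the center sets its bit.
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
print(lbp_code(patch))  # 26: bits 1, 3 and 4 are set (neighbors 9, 7 and 8)
```

Adding a constant to every pixel leaves the code unchanged, because only the ordering relative to the center matters.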

12.
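A packet classification algorithm based on binary search on levels   Total citations: 3, Self-citations: 2, Citations by others: 1
Pan Deng  Zhang Dafang  Xie Kun  Zhang Ji 《计算机应用》 (Journal of Computer Applications) 2009,29(2):500-502
Binary search on levels (BSOL) is an efficient packet classification algorithm that extends easily to multi-dimensional packet classification and supports range-type rules. However, because its core structure builds a hash table at each level of a trie, its efficiency suffers when the hash load factor is large or hash collisions are frequent. After analyzing the strengths and weaknesses of BSOL, this paper introduces Bloom filters and proposes an improved algorithm that builds one Bloom filter for each level of the trie and performs a Bloom filter query before each hash lookup, so that good performance is maintained even when hash collisions are frequent. Simulation results show that when the packet hit rate is below 90% and the hash load factor is large, the new algorithm outperforms the previous algorithm in running time.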
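
The idea of querying a Bloom filter before the per-level hash lookup can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names `BloomFilter` and `Level`, the filter size, the number of hash functions, and the use of SHA-256 are all hypothetical choices. The binary search across trie levels is omitted.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class Level:
    """One trie level: a hash table of prefixes guarded by a Bloom filter."""

    def __init__(self):
        self.table = {}
        self.bloom = BloomFilter()
        self.hash_probes = 0  # lookups that actually reached the hash table

    def insert(self, prefix, rule):
        self.table[prefix] = rule
        self.bloom.add(prefix)

    def lookup(self, prefix):
        if not self.bloom.might_contain(prefix):
            return None  # definite miss: the hash table is never probed
        self.hash_probes += 1
        return self.table.get(prefix)


level = Level()
level.insert("1010", "rule-A")
print(level.lookup("1010"))  # rule-A
print(level.lookup("0000"))  # None
```

A Bloom filter never reports a stored key as absent, so skipping the hash table on a negative answer cannot lose a match. The saving comes from misses, which is consistent with the abstract reporting gains when the hit rate is below 90%.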

13.
Chu  Hong-Min  Huang  Kuan-Hao  Lin  Hsuan-Tien 《Machine Learning》2019,108(8-9):1193-1230

We study multi-label classification (MLC) with three important real-world issues: online updating, label space dimension reduction (LSDR), and cost-sensitivity. Current MLC algorithms have not been designed to address these three issues simultaneously. In this paper, we propose a novel algorithm, cost-sensitive dynamic principal projection (CS-DPP) that resolves all three issues. The foundation of CS-DPP is an online LSDR framework derived from a leading LSDR algorithm. In particular, CS-DPP is equipped with an efficient online dimension reducer motivated by matrix stochastic gradient, and establishes its theoretical backbone when coupled with a carefully-designed online regression learner. In addition, CS-DPP embeds the cost information into label weights to achieve cost-sensitivity along with theoretical guarantees. Experimental results verify that CS-DPP achieves better practical performance than current MLC algorithms across different evaluation criteria, and demonstrate the importance of resolving the three issues simultaneously.


14.
M  Vidhya  S  Aji 《Applied Intelligence》2022,52(12):14164-14177
Applied Intelligence - The challenges raised by the massive data are being managed by the community through the advancements of infrastructure and algorithms, and now the processing of fast data is...

15.
Minimum classification error training for online handwriting recognition   Total citations: 1, Self-citations: 0, Citations by others: 1
This paper describes an application of the minimum classification error (MCE) criterion to the problem of recognizing online unconstrained-style characters and words. We describe an HMM-based, character and word-level MCE training aimed at minimizing the character or word error rate while enabling flexibility in writing style through the use of multiple allographs per character. Experiments on a writer-independent character recognition task covering alpha-numerical characters and keyboard symbols show that the MCE criterion achieves more than 30 percent character error rate reduction compared to the baseline maximum likelihood-based system. Word recognition results, on vocabularies of 5k to 10k, show that MCE training achieves around 17 percent word error rate reduction when compared to the baseline maximum likelihood system.

16.

In this paper, an extended complete LBP (ECLBP) for texture classification is proposed, in which the local feature vectors are composed of the ratio of the central pixel and its neighborhood pixels to a specific threshold. ECLBP_C represents the gray level of the image and is obtained by comparing the center pixel with the global threshold. ECLBP_S and ECLBP_M represent the symbol component and the magnitude component, respectively, of the 3-neighbor region of the center pixel; they are obtained by calculating two binary codes with the original LBP algorithm over that region. To make the proposed algorithm scalable, in addition to the 3-neighbor pixels of the central pixel, the algorithm uses the center pixel as the center and r as the radius of a circle, with α as the filter's radius, to generate extended binary codes such as ECLBP_ES_r,α and ECLBP_EM_r,α. To describe the local region feature vector in detail, specific ECLBP_ES_r,α and ECLBP_EM_r,α codes can be obtained by setting the number of extensions according to actual needs; all ECLBP gray histograms are then built and concatenated for statistics. In the experimental part, we analyze the performance of the proposed algorithm in detail and show that it has good scalability and robustness. The experimental results show that the classification accuracy of the proposed algorithm reaches up to 99% after 3 expansions (Table 2). The source code of the proposed algorithm can be downloaded from https://github.com/zenqiang/ECLBP.git.


17.
This paper proposes a cellular automata-based solution to a binary classification problem. The proposed method is based on a two-dimensional, three-state cellular automaton (CA) with the von Neumann neighborhood. Since the number of possible CA rules (potential CA-based classifiers) is huge, the search for efficient rules is conducted with a genetic algorithm (GA). Experiments show excellent performance of the discovered rules in solving the classification problem. The best rules found perform better than the heuristic CA rule designed by a human and also better than one of the most widely used statistical methods, the k-nearest neighbors algorithm (k-NN). Experiments also show that CA rules can be successfully reused in the process of searching for new rules.

18.
Adaptive binary tree for fast SVM multiclass classification   Total citations: 1, Self-citations: 0, Citations by others: 1
Jin  Cheng  Runsheng   《Neurocomputing》2009,72(13-15):3370
This paper presents an adaptive binary tree (ABT) to reduce the test-time computational complexity of multiclass support vector machines (SVMs). It achieves fast classification by: (1) reducing the number of binary SVMs needed for one classification by using the separating planes of some binary SVMs to discriminate other binary problems; (2) selecting the binary SVMs with the fewest average number of support vectors (SVs). The average number of SVs is proposed as a measure of the computational complexity of excluding one class. Compared with five well-known methods, experiments on many benchmark data sets demonstrate that our method can speed up the test phase while retaining the high accuracy of SVMs.

19.
20.
We propose a scoring criterion, named mixture-based factorized conditional log-likelihood (mfCLL), which allows for efficient hybrid learning of mixtures of Bayesian networks in binary classification tasks. The learning procedure is decoupled into foreground and background learning, the foreground being the single concept of interest that we want to distinguish from a highly complex background. The overall procedure is hybrid, as the foreground is learned discriminatively whereas the background is learned generatively. The learning algorithm is shown to run in polynomial time for network structures such as trees and consistent κ-graphs. To gauge the performance of the mfCLL scoring criterion, we carry out a comparison with state-of-the-art classifiers. Results obtained with a large suite of benchmark datasets show that mfCLL-trained classifiers are a competitive alternative and should be taken into consideration.
