In this paper, classification on dissimilarity representations is applied to medical imaging data, with the task of discriminating between normal images and images with signs of disease. We show that dissimilarity-based classification is a beneficial approach for dealing with weakly labeled data, i.e. when the location of disease in an image is unknown and local feature-based classifiers therefore cannot be trained. A modification to the standard dissimilarity-based approach is proposed that makes the dissimilarity measure multi-valued and hence able to retain more information. A multi-valued dissimilarity between an image and a prototype becomes an image representation vector in classification. Several classification outputs with respect to different prototypes are further integrated into a final image decision. Both the standard and the proposed methods are evaluated on data sets of chest radiographs with textural abnormalities and compared to several feature-based region classification approaches applied to the same data. On a tuberculosis data set, the multi-valued dissimilarity-based classification performs as well as the best region classification method applied to the fully labeled data, with an area under the receiver operating characteristic (ROC) curve (Az) of 0.82. The standard dissimilarity-based classification yields Az=0.80. On a data set with interstitial abnormalities, both dissimilarity-based approaches achieve Az=0.98, which is close behind the best region classification method.
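
The standard (single-valued) dissimilarity representation mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the Euclidean distance, the nearest-mean classifier, the toy feature vectors, and all function names are hypothetical choices, not the measure or classifier used in the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity_representation(samples, prototypes, dist=euclidean):
    """Map each sample to its vector of dissimilarities to the prototypes."""
    return [[dist(s, p) for p in prototypes] for s in samples]


def nearest_mean_fit(rep, labels):
    """Compute one mean vector per class in the dissimilarity space."""
    sums, counts = {}, {}
    for row, y in zip(rep, labels):
        acc = sums.setdefault(y, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def nearest_mean_predict(means, row):
    """Assign the class whose mean is closest in the dissimilarity space."""
    return min(means, key=lambda y: euclidean(means[y], row))


# Toy data: each "image" is summarized by a 2-D vector (hypothetical).
train = [[0, 0], [0, 1], [5, 5], [6, 5]]
labels = ["normal", "normal", "abnormal", "abnormal"]
prototypes = [[0, 0], [5, 5]]

means = nearest_mean_fit(dissimilarity_representation(train, prototypes), labels)
test_rep = dissimilarity_representation([[1, 0], [5, 6]], prototypes)
predictions = [nearest_mean_predict(means, r) for r in test_rep]
```

Only image-level labels are needed to train the classifier in this space, which is why the representation suits weakly labeled data.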

The extended morphological profile (EMP) is an important mathematical tool for extracting structural information from hyperspectral images. However, the accuracy of EMP-based classification is greatly influenced by the choice of structuring element (SE). In this article, two supervised classification frameworks, the multiclassifier system with morphological profiles (MCSMP) and MCSMP2, are proposed. They exploit the rich spectral and structural information of hyperspectral images using EMPs and a multiclassifier system to achieve better classification than conventional methods. EMPs with SEs of multiple shapes are used instead of one particular shape to better detect the response from the structures in the image. The EMPs created from SEs of different shapes are classified independently, followed by decision fusion to generate the final classification map. The classification results are compared with those of conventional pixelwise and other EMP-based methods. The experimental results from three different types of hyperspectral data sets demonstrate that the proposed methods significantly improve on the spectral approach and outperform the other studied methods in terms of classification accuracy. The new methods are more robust to noise and produce good classification accuracy with very limited training samples. Various decision fusion techniques are evaluated, and they perform differently in the tested scenarios. Two different classifiers, Support Vector Machine (SVM) and random forest, are used in the experiments. It is shown that the proposed methods perform better with the random forest classifier.
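
The decision fusion step can be illustrated with a per-pixel majority vote over the label maps produced for each SE shape. The abstract does not say which fusion rules were evaluated, so majority voting is an assumed example, and the function name and toy labels are hypothetical.

```python
from collections import Counter


def majority_vote(label_maps):
    """Fuse several flattened label maps by per-pixel majority vote.

    label_maps holds one sequence of labels per classifier (for example,
    one per SE shape), all of the same length. Ties go to the label that
    appears first among the tied labels in classifier order.
    """
    fused = []
    for votes in zip(*label_maps):
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Three hypothetical classification maps over the same four pixels.
disk_map = ["water", "soil", "crop", "crop"]
square_map = ["water", "crop", "crop", "soil"]
line_map = ["soil", "soil", "crop", "soil"]

fused_map = majority_vote([disk_map, square_map, line_map])
```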

The Journal of Supercomputing - Extended profiles are an important technique for modelling the spatial information of hyperspectral images at different levels of detail. They are used extensively...

Gene selection is one of the important issues in cancer classification based on gene expression profiles. Filter and wrapper approaches are widely used for gene selection, but the former struggles to capture the relationships between genes and the latter requires a great deal of computation. We present a novel method, called gene boosting, that selects relevant gene subsets by integrating the filter and wrapper approaches. It repeatedly selects a set of top-ranked informative genes by a filtering algorithm with respect to a temporal training dataset constructed according to the classification result for the original training dataset. Empirical results on three microarray benchmark datasets show that the proposed method is effective and efficient in finding a relevant and concise gene subset. It achieved competitive performance with fewer genes in a reasonable time, and it also identified some genes that are frequently selected.

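With the development of large-scale gene expression profiling technology, cancer diagnosis based on gene expression profiles is becoming a fast and effective diagnostic method in clinical medicine. However, because gene expression data are high-dimensional, small in sample size, and noisy, correctly extracting the feature genes related to cancer becomes the key issue. Taking gene expression profile data of colon cancer tumors as an example, a hybrid feature gene extraction method is proposed that combines the Fisher weight function, the discrete Fourier transform, and principal component analysis, with multivariate logistic regression analysis and Bayesian decision used as classifiers for tumor classification. Experimental results show that the method achieves a CV recognition accuracy of up to 96.80% on the colon cancer data set.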

Using the multitemporal multispectral data acquired by Landsat satellites and a physical model describing this behavior, new features that are crop specific have been derived. The new feature space is two-dimensional irrespective of the number of Landsat observations. A feasibility study, over 40 sites, has been performed to classify the segment pixels into those of corn, soybeans, and others using these new features and a linear classifier. The results compare very favorably with other existing methods. The results also indicate where additional accuracy gains can be made.

Nowadays, the use of hyperspectral sensors has been extended to a variety of applications such as the classification of remote-sensing images. Recently, a spectral–spatial classification scheme (ELM-EMP) has been introduced, based on Extreme Learning Machine (ELM) and Extended Morphological Profiles (EMPs) computed using Principal Component Analysis (PCA) and morphological operations. In this work, an efficient implementation of this scheme on commodity Graphics Processing Units (GPUs) is shown. Additionally, several techniques and optimizations are introduced to improve the accuracy of the classification. In particular, a scheme using a kernel-based ELM classifier (KELM) and EMP is presented (KELM-EMP). Similar schemes adding a spatial regularization process (KELM-EMP-S and ELM-EMP-S) are also proposed. Moreover, two PCA algorithms have been compared in terms of both accuracy and speed. Regarding the GPU projection, different techniques and optimizations have been applied, such as the use of optimized Compute Unified Device Architecture (CUDA) libraries and a block-asynchronous execution technique. As a result, the accuracy obtained by the two proposed schemes (ELM-EMP-S and KELM-EMP-S) is better than that of the original ELM-EMP scheme, and the execution time has been significantly reduced.

Typically, the fundamental assumption in non-linear regression models is the normality of the errors. Even though this model offers great flexibility for modeling these effects, it suffers from the same lack of robustness against departures from distributional assumptions as other statistical models based on the Gaussian distribution. It is of practical interest, therefore, to study non-linear models which are less sensitive to departures from normality, as well as related assumptions. Thus the current methods proposed for linear regression models need to be extended to non-linear regression models. This paper discusses non-linear regression models for longitudinal data with errors that follow a skew-elliptical distribution. Additionally, we discuss Bayesian statistical methods for the classification of observations into two or more groups based on skew-models for non-linear longitudinal profiles. Parameter estimation for a discriminant model that classifies individuals into distinct predefined groups or populations uses appropriate posterior simulation schemes. The methods are illustrated with data from a study involving 173 pregnant women. The main objective in this study is to predict normal versus abnormal pregnancy outcomes from beta human chorionic gonadotropin data available at early stages of pregnancy.

This paper presents a new approach to the analysis of hyperspectral images, a new class of image data that is mainly used in remote sensing applications. The method is based on the generalization of concepts from mathematical morphology to multi-channel imagery. A new vector organization scheme is described, and fundamental morphological vector operations are defined by extension. Theoretical definitions of extended morphological operations are used in the formal definition of the concept of extended morphological profile, which is used for multi-scale analysis of hyperspectral data. This approach is particularly well suited for the analysis of image scenes where most of the pixels collected by the sensor are characterized by their mixed nature, i.e. they are formed by a combination of multiple underlying responses produced by spectrally distinct materials. Experimental results demonstrate the applicability of the proposed technique in mixed pixel analysis of simulated and real hyperspectral data collected by the NASA/Jet Propulsion Laboratory Airborne Visible/Infrared Imaging Spectrometer and the DLR Digital Airborne (DAIS 7915) and Reflective Optics System Imaging Spectrometers. The proposed method works effectively in the presence of noise and low spatial resolution. A quantitative and comparative performance study with regard to other standard hyperspectral analysis methodologies reveals that the combined use of spatial and spectral information in the proposed technique produces classification results that are superior to those obtained using the spectral information alone.

Properly designing a wavelet neural network (WNN) is crucial for achieving the optimal generalization performance. In this paper, two different approaches are proposed for improving the predictive capability of WNNs. First, the types of activation functions used in the hidden layer of the WNN were varied. Second, the proposed enhanced fuzzy c-means clustering algorithm, specifically the modified point symmetry-based fuzzy c-means (MSFCM) algorithm, was employed in selecting the locations of the translation vectors of the WNN. The modified WNN was then applied to heterogeneous cancer classification using four different microarray benchmark datasets. The comparative experimental results showed that the proposed methodology achieved almost 100% classification accuracy in multiclass cancer prediction, leading to superior performance with respect to other clustering algorithms. Subsequently, performance comparisons with other classifiers were made. An assessment analysis showed that the proposed approach outperformed most of the other classifiers.

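Starting from the analysis of cancer gene expression profiles, and addressing their high dimensionality and small sample sizes, a classification method for cancer based on neighborhood rough sets and an ensemble of probabilistic neural networks is proposed. First, genes are ranked using the Relief algorithm; then neighborhood rough sets are used to select the feature genes for classification; finally, cancer classification is performed with the probabilistic neural network ensemble classification model. Experimental results show that the method can select cancer feature genes quickly and effectively and can achieve better classification...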

Consider a supervised learning problem in which examples contain both numerical- and text-valued features. To use traditional feature-vector-based learning methods, one could treat the presence or absence of a word as a Boolean feature and use these binary-valued features together with the numerical features. However, using a text-classification system on such data is more problematic: in the most straightforward approach, each number would be considered a distinct token and treated as a word. This paper presents an alternative approach for the use of text classification methods for supervised learning problems with numerical-valued features, in which the numerical features are converted into bag-of-words features, thereby making them directly usable by text classification methods. We show that even on purely numerical-valued data, the results of text classification on the derived text-like representation outperform the more naive numbers-as-tokens representation and, more importantly, are competitive with mature numerical classification methods such as C4.5, Ripper, and SVM. We further show that on mixed-mode data, adding numerical features using our approach can improve performance over not adding those features.
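
One simple way to turn numerical features into word-like tokens is sketched below. The abstract does not describe the paper's actual conversion, so the equal-width binning, the token format `name_binK`, and all function names here are assumptions made purely for illustration.

```python
def make_bins(values, n_bins):
    """Return (lower bound, bin width) for equal-width binning of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for constant features
    return lo, width


def to_token(name, value, lo, width, n_bins):
    """Convert one numeric value to a token such as 'age_bin2'."""
    idx = int((value - lo) / width)
    idx = max(0, min(n_bins - 1, idx))  # clamp values outside the training range
    return f"{name}_bin{idx}"


def record_to_tokens(record, bins, n_bins):
    """Convert a dict of numeric features to a sorted bag of tokens."""
    return [
        to_token(name, value, *bins[name], n_bins)
        for name, value in sorted(record.items())
    ]


# Bin boundaries are estimated from (hypothetical) training values.
N_BINS = 4
bins = {
    "age": make_bins([20, 30, 40, 60], N_BINS),
    "dose": make_bins([0.0, 1.0], N_BINS),
}

tokens = record_to_tokens({"age": 45, "dose": 1.0}, bins, N_BINS)
```

The resulting tokens can be appended to the words of a text feature, so a single bag-of-words classifier sees both kinds of information.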

Classification using the l2-norm-based representation is usually computationally efficient and is able to obtain high accuracy in the recognition of faces. Among l2-norm-based representation methods, linear regression classification (LRC) and collaborative representation classification (CRC) have been widely used. LRC and CRC produce residuals in very different ways, but they both use residuals to perform classification. Therefore, by combining the residuals of these two methods, better performance for face recognition can be achieved. In this paper, a simple weighted sum based fusion scheme is proposed to integrate LRC and CRC for more accurate recognition of faces. The rationale of the proposed method is analyzed. Face recognition experiments illustrate that the proposed method outperforms LRC and CRC.
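
The fusion step can be sketched as a weighted sum of per-class residuals, assuming the LRC and CRC residuals have already been computed. The min-max normalization, the weight `w`, the function names, and the toy residual values are assumptions for illustration; the abstract does not give the paper's exact formula.

```python
def normalize(residuals):
    """Min-max scale a {class: residual} dict to [0, 1] (assumed step)."""
    lo, hi = min(residuals.values()), max(residuals.values())
    span = hi - lo
    if span == 0:
        return {c: 0.0 for c in residuals}
    return {c: (r - lo) / span for c, r in residuals.items()}


def fuse_and_classify(lrc_res, crc_res, w=0.5):
    """Fuse two residual dicts by weighted sum; the smallest fused value wins."""
    a, b = normalize(lrc_res), normalize(crc_res)
    fused = {c: w * a[c] + (1 - w) * b[c] for c in a}
    return min(fused, key=fused.get), fused


# Hypothetical residuals of one test face with respect to three subjects.
lrc_residuals = {"A": 0.2, "B": 0.1, "C": 0.9}
crc_residuals = {"A": 1.0, "B": 3.0, "C": 5.0}

label_equal, _ = fuse_and_classify(lrc_residuals, crc_residuals, w=0.5)
label_lrc_heavy, _ = fuse_and_classify(lrc_residuals, crc_residuals, w=0.9)
```

In this toy case the two methods disagree (LRC prefers B, CRC prefers A), and the weight decides which one dominates the fused decision.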

The quantitative evaluation of clusters has lagged far behind the development of clustering algorithms. This paper introduces a new procedure, based on probability profiles, for judging the validity of clusters established from rank-order proximity data. Probability profiles furnish a comprehensive picture of the compactness and isolation of a cluster, scaled in probability units. Given a rank-order proximity matrix and a cluster to be examined, profiles compare the cluster's upper bounds on the best compactness and isolation indices one would expect in a randomly chosen graph. After reviewing the pertinent literature, this paper explains the background from graph theory and cluster analysis needed to treat cluster validity. The probabilities and bounds needed to form cluster profiles are derived and strategies for using profiles are suggested. Special attention is given to the underlying probability models. Profiles are demonstrated on four artificially generated data sets, two of which have good hierarchical structure, and on data from a speaker recognition project. They reject spurious clusters and accept apparently valid clusters. Since profiles quantify the interaction between a cluster and its environment, they provide a much richer source of information on cluster structure than single-number indices proposed in the literature.

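Patents and journals belong to different knowledge organization systems. To enable cross-browsing and cross-retrieval of patent and journal literature, the mapping between two classification schemes, the Chinese Library Classification (CLC) and the International Patent Classification (IPC), must be solved. Based on a survey of existing methods for mapping classification categories, this paper discusses a machine-learning-based method for mapping categories between the CLC and the IPC. A classifier for a given CLC category is obtained by training on a corpus labeled with that category; it is then used to classify a corpus labeled with IPC codes, and the classification results are analyzed to derive the mapping relationships between categories. Comparative experiments demonstrate the effectiveness of the method.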

In this paper, we make a comparative study of the effectiveness of the ensemble technique for sentiment classification. The ensemble framework is applied to sentiment classification tasks, with the aim of efficiently integrating different feature sets and classification algorithms to synthesize a more accurate classification procedure. First, two types of feature sets are designed for sentiment classification, namely the part-of-speech based feature sets and the word-relation based feature sets. Second, three well-known text classification algorithms, namely naïve Bayes, maximum entropy and support vector machines, are employed as base classifiers for each of the feature sets. Third, three types of ensemble methods, namely the fixed combination, weighted combination and meta-classifier combination, are evaluated for three ensemble strategies. A wide range of comparative experiments is conducted on five widely used datasets in sentiment classification. Finally, some in-depth discussion is presented and conclusions are drawn about the effectiveness of the ensemble technique for sentiment classification.
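
The fixed and weighted combination rules can be sketched as an average, or a weighted average, of the class probabilities produced by the base classifiers. The averaging rule, the weights, the function name, and the toy probabilities below are assumptions for illustration; the abstract does not specify the exact rules used.

```python
def combine(prob_list, weights=None):
    """Combine per-classifier {class: probability} dicts.

    With weights=None every classifier counts equally (a fixed rule);
    otherwise the probabilities are averaged using the given weights.
    """
    if weights is None:
        weights = [1.0] * len(prob_list)
    total = sum(weights)
    classes = prob_list[0].keys()
    scores = {
        c: sum(w * p[c] for w, p in zip(weights, prob_list)) / total
        for c in classes
    }
    return max(scores, key=scores.get), scores


# Hypothetical outputs of three base classifiers for one review.
naive_bayes = {"pos": 0.60, "neg": 0.40}
max_entropy = {"pos": 0.45, "neg": 0.55}
svm = {"pos": 0.30, "neg": 0.70}

fixed_label, fixed_scores = combine([naive_bayes, max_entropy, svm])
weighted_label, _ = combine([naive_bayes, max_entropy, svm], weights=[3, 1, 1])
```

A meta-classifier combination would instead train a further classifier on these base outputs, which is not shown here.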

