Found 20 similar articles; search time: 15 ms
1.
In this paper, classification on dissimilarity representations is applied to medical imaging data, with the task of discriminating between normal images and images with signs of disease. We show that dissimilarity-based classification is a beneficial approach in dealing with weakly labeled data, i.e. when the location of disease in an image is unknown and therefore local feature-based classifiers cannot be trained. A modification to the standard dissimilarity-based approach is proposed that makes a dissimilarity measure multi-valued and hence able to retain more information. A multi-valued dissimilarity between an image and a prototype becomes an image representation vector in classification. Several classification outputs with respect to different prototypes are further integrated into a final image decision. Both standard and proposed methods are evaluated on data sets of chest radiographs with textural abnormalities and compared to several feature-based region classification approaches applied to the same data. On a tuberculosis data set the multi-valued dissimilarity-based classification performs as well as the best region classification method applied to the fully labeled data, with an area under the receiver operating characteristic (ROC) curve (Az) of 0.82. The standard dissimilarity-based classification yields Az=0.80. On a data set with interstitial abnormalities both dissimilarity-based approaches achieve Az=0.98, which is close behind the best region classification method.
2.
Jin-Hyuk Hong 《Pattern recognition》2009,42(9):1761-1767
Gene selection is one of the important issues for cancer classification based on gene expression profiles. Filter and wrapper approaches are widely used for gene selection, but the former have difficulty measuring the relationships between genes and the latter require a lot of computation. We present a novel method, called gene boosting, to select relevant gene subsets by integrating filter and wrapper approaches. It repeatedly selects a set of top-ranked informative genes by a filtering algorithm with respect to a temporal training dataset constructed according to the classification result for the original training dataset. Empirical results on three microarray benchmark datasets have shown that the proposed method is effective and efficient in finding a relevant and concise gene subset. It achieved competitive performance with fewer genes in a reasonable time, and led to the identification of some genes that were frequently selected.
3.
Bascoy Pedro G. Quesada-Barriuso Pablo Heras Dora B. Argüello Francisco Demir Begüm Bruzzone Lorenzo 《The Journal of supercomputing》2019,75(3):1565-1579
Extended profiles are an important technique for modelling the spatial information of hyperspectral images at different levels of detail. They are used extensively...
4.
Using the multitemporal multispectral data acquired by Landsat satellites and a physical model describing this behavior, new features that are crop specific have been derived. The new feature space is two-dimensional irrespective of the number of Landsat observations. A feasibility study, over 40 sites, has been performed to classify the segment pixels into those of corn, soybeans, and others using these new features and a linear classifier. The results compare very favorably with other existing methods. The results also indicate where additional accuracy gains can be made.
5.
6.
Properly designing a wavelet neural network (WNN) is crucial for achieving the optimal generalization performance. In this paper, two different approaches were proposed for improving the predictive capability of WNNs. First, the types of activation functions used in the hidden layer of the WNN were varied. Second, the proposed enhanced fuzzy c-means clustering algorithm, specifically the modified point symmetry-based fuzzy c-means (MSFCM) algorithm, was employed in selecting the locations of the translation vectors of the WNN. The modified WNN was then applied to heterogeneous cancer classification using four different microarray benchmark datasets. The comparative experimental results showed that the proposed methodology achieved an almost 100% classification accuracy in multiclass cancer prediction, leading to superior performance with respect to other clustering algorithms. Subsequently, performance comparisons with other classifiers were made. An assessment analysis showed that this proposed approach outperformed most of the other classifiers.
7.
Rolando De la Cruz 《Computational statistics & data analysis》2008,53(2):436-449
Typically, the fundamental assumption in non-linear regression models is the normality of the errors. Even though this model offers great flexibility for modeling these effects, it suffers from the same lack of robustness against departures from distributional assumptions as other statistical models based on the Gaussian distribution. It is of practical interest, therefore, to study non-linear models which are less sensitive to departures from normality, as well as related assumptions. Thus the current methods proposed for linear regression models need to be extended to non-linear regression models. This paper discusses non-linear regression models for longitudinal data with errors that follow a skew-elliptical distribution. Additionally, we discuss Bayesian statistical methods for the classification of observations into two or more groups based on skew-models for non-linear longitudinal profiles. Parameter estimation for a discriminant model that classifies individuals into distinct predefined groups or populations uses appropriate posterior simulation schemes. The methods are illustrated with data from a study involving 173 pregnant women. The main objective in this study is to predict normal versus abnormal pregnancy outcomes from beta human chorionic gonadotropin data available at early stages of pregnancy.
8.
Antonio Plaza Pablo Martinez Javier Plaza 《Pattern recognition》2004,37(6):1097-1116
This paper presents a new approach to the analysis of hyperspectral images, a new class of image data that is mainly used in remote sensing applications. The method is based on the generalization of concepts from mathematical morphology to multi-channel imagery. A new vector organization scheme is described, and fundamental morphological vector operations are defined by extension. Theoretical definitions of extended morphological operations are used in the formal definition of the concept of extended morphological profile, which is used for multi-scale analysis of hyperspectral data. This approach is particularly well suited for the analysis of image scenes where most of the pixels collected by the sensor are characterized by their mixed nature, i.e. they are formed by a combination of multiple underlying responses produced by spectrally distinct materials. Experimental results demonstrate the applicability of the proposed technique in mixed pixel analysis of simulated and real hyperspectral data collected by the NASA/Jet Propulsion Laboratory Airborne Visible/Infrared Imaging Spectrometer and the DLR Digital Airborne (DAIS 7915) and Reflective Optics System Imaging Spectrometers. The proposed method works effectively in the presence of noise and low spatial resolution. A quantitative and comparative performance study with regard to other standard hyperspectral analysis methodologies reveals that the combined utilization of spatial and spectral information in the proposed technique produces classification results which are superior to those found by using the spectral information alone.
9.
Consider a supervised learning problem in which examples contain both numerical- and text-valued features. To use traditional feature-vector-based learning methods, one could treat the presence or absence of a word as a Boolean feature and use these binary-valued features together with the numerical features. However, the use of a text-classification system on such data is more problematic: in the most straightforward approach, each number would be considered a distinct token and treated as a word. This paper presents an alternative approach for the use of text classification methods for supervised learning problems with numerical-valued features, in which the numerical features are converted into bag-of-words features, thereby making them directly usable by text classification methods. We show that even on purely numerical-valued data the results of text classification on the derived text-like representation outperform the more naive numbers-as-tokens representation and, more importantly, are competitive with mature numerical classification methods such as C4.5, Ripper, and SVM. We further show that on mixed-mode data adding numerical features using our approach can improve performance over not adding those features.
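The abstract above does not say how numerical features are turned into bag-of-words features. As a minimal sketch of one plausible scheme (not necessarily the paper's), each value can be mapped to a token naming its feature and the bin it falls into. The function name `numeric_to_tokens`, the token format, and the hand-picked cut points are all assumptions made for illustration.

```python
from bisect import bisect_right


def numeric_to_tokens(row, cut_points):
    """Turn one numeric example into a list of word-like tokens.

    row        -- list of numeric feature values
    cut_points -- hypothetical per-feature sorted thresholds (an assumption,
                  not taken from the abstract)
    """
    tokens = []
    for i, (value, cuts) in enumerate(zip(row, cut_points)):
        # bisect_right counts how many thresholds are <= value,
        # which serves as the bin index for this feature.
        bin_index = bisect_right(cuts, value)
        tokens.append(f"f{i}_bin{bin_index}")
    return tokens


# Two features, each with two thresholds (three bins per feature).
example = numeric_to_tokens([0.5, 12.0], [[1.0, 2.0], [5.0, 10.0]])
print(example)
```

The resulting tokens can be fed to any bag-of-words text classifier alongside real words, since nearby values share a token instead of each number becoming its own rare word.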
10.
Hongzhi Zhang Faqiang Wang Yan Chen Dapeng Zhang Kuanquan Wang Jingdong Liu 《Neural computing & applications》2014,25(3-4):833-838
Classification using the l2-norm-based representation is usually computationally efficient and is able to obtain high accuracy in the recognition of faces. Among l2-norm-based representation methods, linear regression classification (LRC) and collaborative representation classification (CRC) have been widely used. LRC and CRC produce residuals in very different ways, but they both use residuals to perform classification. Therefore, by combining the residuals of these two methods, better performance for face recognition can be achieved. In this paper, a simple weighted sum based fusion scheme is proposed to integrate LRC and CRC for more accurate recognition of faces. The rationale of the proposed method is analyzed. Face recognition experiments illustrate that the proposed method outperforms LRC and CRC.
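The weighted-sum fusion described above can be sketched as follows, assuming the per-class residuals from LRC and CRC have already been computed. The function name `fuse_and_classify`, the sum-to-one normalization of each residual vector, and the default weight are illustrative assumptions; the abstract does not state how the residuals are scaled or how the weight is chosen.

```python
def fuse_and_classify(lrc_residuals, crc_residuals, weight=0.5):
    """Return (predicted class index, fused residuals).

    lrc_residuals, crc_residuals -- one residual per class, same class order
    weight -- share given to the LRC residuals (hypothetical parameter)
    """
    def normalize(values):
        # Assumption: scale each method's residuals to sum to 1 so that
        # the two methods are comparable before they are combined.
        total = sum(values)
        return [v / total for v in values] if total > 0 else list(values)

    lrc = normalize(lrc_residuals)
    crc = normalize(crc_residuals)
    fused = [weight * a + (1.0 - weight) * b for a, b in zip(lrc, crc)]
    # As in LRC and CRC themselves, the smallest residual wins.
    label = min(range(len(fused)), key=fused.__getitem__)
    return label, fused


label, fused = fuse_and_classify([0.2, 0.5, 0.3], [0.5, 0.1, 0.4])
print(label, fused)
```

Setting `weight` to 1.0 or 0.0 reduces the rule to the decision of LRC or CRC alone, which makes it easy to compare the fused result against each individual method.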
11.
The quantitative evaluation of clusters has lagged far behind the development of clustering algorithms. This paper introduces a new procedure, based on probability profiles, for judging the validity of clusters established from rank-order proximity data. Probability profiles furnish a comprehensive picture of the compactness and isolation of a cluster, scaled in probability units. Given a rank-order proximity matrix and a cluster to be examined, profiles compare the cluster's upper bounds on the best compactness and isolation indices one would expect in a randomly chosen graph. After reviewing the pertinent literature this paper explains the background from graph theory and cluster analysis needed to treat cluster validity. The probabilities and bounds needed to form cluster profiles are derived and strategies for using profiles are suggested. Special attention is given to the underlying probability models. Profiles are demonstrated on four artificially generated data sets, two of which have good hierarchical structure, and on data from a speaker recognition project. They reject spurious clusters and accept apparently valid clusters. Since profiles quantify the interaction between a cluster and its environment, they provide a much richer source of information on cluster structure than single-number indices proposed in the literature.
12.
Petra Perner 《Applied Intelligence》2008,28(3):238-246
Image-based diagnostic tools are important tools for the determination of diseases in many medical applications. The interpretation
of these images is often done manually, based on prototypical images. Consequently, only a few images collected into an image
catalogue are initially available as a basis for the development of an automatic image-interpretation system. In this paper
we study the question if it is possible to build up an image-interpretation system based on such an image catalogue. We call
the system catalogue-based image classifier. The system is provided with feature-subset selection, feature weighting, and
prototype selection. The performance of the catalogue-based classifier is assessed by studying the accuracy and the reduction
of the prototypes after applying a prototype-selection algorithm. We describe the results that could be achieved and give
an outlook for further developments on a catalogue-based classifier.
13.
In this paper, we make a comparative study of the effectiveness of ensemble technique for sentiment classification. The ensemble framework is applied to sentiment classification tasks, with the aim of efficiently integrating different feature sets and classification algorithms to synthesize a more accurate classification procedure. First, two types of feature sets are designed for sentiment classification, namely the part-of-speech based feature sets and the word-relation based feature sets. Second, three well-known text classification algorithms, namely naïve Bayes, maximum entropy and support vector machines, are employed as base-classifiers for each of the feature sets. Third, three types of ensemble methods, namely the fixed combination, weighted combination and meta-classifier combination, are evaluated for three ensemble strategies. A wide range of comparative experiments are conducted on five widely-used datasets in sentiment classification. Finally, some in-depth discussion is presented and conclusions are drawn about the effectiveness of ensemble technique for sentiment classification.
14.
15.
In this paper, we investigate a new problem, misleading classification, in which each test instance is associated with an original class and a misleading class. The goal for the data owner is to form the training set out of candidate instances such that the data miner will be misled into classifying those test instances to their misleading classes rather than their original classes. We discuss two cases of misleading classification. For the case where the classification algorithm is unknown to the data owner, a KNN-based Ranking Algorithm (KRA) is proposed to rank all candidate instances based on the similarities between candidate instances and test instances. For the case where the classification algorithm is known, we propose a Greedy Ranking Algorithm (GRA) which evaluates each candidate instance by building up a classifier to predict the test set. In addition, we also show how to accelerate GRA in an incremental way when naive Bayes is employed as the classification algorithm. Experiments on 16 UCI data sets indicated that the candidate instances ranked by KRA can achieve promising leaking and misleading rates. When the classification algorithm is known, GRA can dramatically outperform KRA in terms of leaking and misleading rates, though more running time is required.
16.
Luisa Massari 《Information Systems Frontiers》2010,12(4):361-367
Online social networks have attracted millions of users, who have integrated social network web sites into their daily life.
Users participate in the changes and the evolution of these sites because they are producers and reviewers of content
that helps them to maintain existing social relationships, make new friends, collaborate and enrich experiences. This paper
presents a study of the characteristics of the users of the MySpace web site, with the objective of studying relationships and
interactions among users and deriving hints about their behavior. The analysis relies on data collected by monitoring the
web site for 12 weeks. Typical user behaviors have been derived and classes of users characterized by different levels of
participation in the social network have been identified. In particular, the analysis reveals that most of the users actively
participate in the social network and specify many personal details. Social network web sites allow access to such details;
the sharing of information about users and their relationships can lead to unethical online activities, which threaten the privacy
and the security of the users themselves.
17.
A. Schechter 《Computer aided design》1978,10(2):101-109
A technique is presented for blending curvature profiles and creating a set of intermediate curves which gradually change their shape from that of one boundary curve to that of a second boundary curve. The use of this technique for calculating a 3D shape and its extension to blending both curvature and torsion profiles are discussed.
18.
《传感器与微系统》2019,(9)
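To address the problem that the imbalance of the data returned by human posture monitoring sensors degrades classification performance, a human posture classification algorithm based on imbalanced data classification is proposed. Based on the characteristics of the data returned by posture monitoring sensors and on the idea of K-means, a noisy-sample identification algorithm is proposed. To deal with the imbalance of the sample set, the classical oversampling algorithm SMOTE is introduced and applied to the minority-class samples. Exploiting the advantages of the Adaboost learning framework, the balanced sample set is then used for training to obtain the final classification model. G-mean, F-value and AUC are selected as evaluation metrics for the classification model, and the method is compared on the ARe Mr human posture data set with three classical imbalanced classification models: CUS-Boost, SMOTEBoost and RUS-Boost. The comparison verifies the effectiveness and accuracy of the proposed human posture classification algorithm based on imbalanced data classification.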
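The oversampling step mentioned above can be illustrated with a minimal SMOTE-style sketch in plain Python. It covers only the interpolation idea behind SMOTE (a synthetic point is placed on the segment between a minority sample and one of its nearest minority neighbours), not the paper's noise-identification or Adaboost stages. The function name `smote_like` and its parameters are assumptions for illustration, and at least two minority samples are required.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolation.

    minority -- list of feature vectors (lists of floats), at least two
    k        -- number of nearest minority neighbours to choose from
    seed     -- seed for the random generator, for reproducibility
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of base (excluding base itself),
        # ranked by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
new_points = smote_like(minority, n_new=5)
print(new_points)
```

Because every synthetic point lies between two existing minority samples, the minority class is enlarged without duplicating samples exactly; the balanced set would then be passed to the boosting stage.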
19.
Dash Jatindra Kumar Mukhopadhyay Sudipta Gupta Rahul Das 《Multimedia Tools and Applications》2017,76(2):2535-2556
This paper proposes a simple yet effective novel classifier fusion strategy for multi-class texture classification. The resulting classification framework is...
20.
In order to effectively optimize complex programs built in a layered or recursive fashion (possibly from reused general components), the programmer has a critical need for performance information connected directly to the design decisions and other optimization opportunities present in the code. Call path refinement profiles are novel tools for guiding the optimization of such programs that: (1) provide detailed performance information about arbitrarily nested (direct or indirect) function call sequences, and (2) focus the user's attention on performance bottlenecks by limiting and aggregating the information presented. This paper discusses the motivation for such profiles, describes in detail their implementation in the CPPROF profiler, and relates them to previous profilers, showing how most widely available profilers can be expressed simply and efficiently in terms of call path refinements.