Similar Documents
20 similar documents retrieved (search time: 31 ms)
1.
In this paper we use genetic programming (GP) to change the representation of the input data for machine learners. In particular, the topic of interest here is feature construction in the learning-from-examples paradigm, where new features are built from the original set of attributes. The paper first introduces the general framework for GP-based feature construction. Then, an extended approach is proposed in which useful components of the representation (features) are preserved during an evolutionary run, as opposed to the standard approach, where valuable features are often lost during search. Finally, we present and discuss the results of an extensive computational experiment carried out on several reference data sets. The outcomes show that classifiers induced using the representation enriched by the GP-constructed features attain better classification accuracy on the test set. In particular, the extended approach proposed in the paper proved able to outperform the standard approach on some benchmark problems at a statistically significant level.
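The abstract does not specify the paper's GP setup (operators, fitness, evolutionary loop). As a minimal, hedged sketch of the underlying idea, the Python below constructs candidate features as expression trees over the original attributes and screens them by a crude class-separability score; the toy data, the operator set, the mean-distance fitness, and random screening in place of a full evolutionary run are all our own assumptions.

```python
import random

random.seed(0)

# Toy data: two original attributes, binary class; class 1 is shifted by +1 on both.
data = ([([random.gauss(0, 1), random.gauss(0, 1)], 0) for _ in range(50)]
        + [([random.gauss(1, 1), random.gauss(1, 1)], 1) for _ in range(50)])

OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def random_tree(depth=2):
    """Random expression tree: leaves are attribute indices, nodes are operators."""
    if depth == 0 or random.random() < 0.3:
        return random.randrange(2)
    return (random.choice(sorted(OPS)), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    """Value of the constructed feature `tree` on example x."""
    if isinstance(tree, int):
        return x[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree):
    """Crude separability score: distance between the per-class means of the feature."""
    by_class = {0: [], 1: []}
    for x, y in data:
        by_class[y].append(evaluate(tree, x))
    return abs(sum(by_class[0]) / 50 - sum(by_class[1]) / 50)

# Screen random candidates and keep the most class-separating constructed feature.
best = max((random_tree() for _ in range(200)), key=fitness)
print(round(fitness(best), 3))
```

The kept tree can then be appended to each example as an extra feature before training any classifier.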

2.
Many modern computer vision algorithms are built atop a set of low-level feature operators (such as SIFT [23,24], HOG [8,3], or LBP [1,2]) that transform raw pixel values into a representation better suited to subsequent processing and classification. While the choice of feature representation is often not central to the logic of a given algorithm, the quality of the feature representation can have critically important implications for performance. Here, we demonstrate a large-scale feature search approach to generating new, more powerful feature representations, in which a multitude of complex, nonlinear, multilayer neuromorphic feature representations are randomly generated and screened to find those best suited for the task at hand. In particular, we show that a brute-force search can generate representations that, in combination with standard machine learning blending techniques, achieve state-of-the-art performance on the Labeled Faces in the Wild (LFW) [19] unconstrained face recognition challenge set. These representations outperform previous state-of-the-art approaches, in spite of requiring less training data and using a conceptually simpler machine learning backend. We argue that such large-scale-search-derived feature sets can play a synergistic role with other computer vision approaches by providing a richer base of features with which to work.

3.
For the classification of hyperspectral data in remote sensing images, a classification method based on deep feature representations learned by stacked sparse autoencoders (SSAE) is proposed. First, the spectral data samples are preprocessed and normalized. They are then fed into an SSAE for feature representation learning, with grid search used to find the optimal network parameters and thus obtain effective feature representations. Finally, a support vector machine (SVM) classifier classifies the input image features, yielding per-pixel classification of the remote sensing image. Experimental results on two standard data sets show that the method achieves accurate hyperspectral land-cover classification.

4.
孙林, 赵婧, 徐久成, 王欣雅. 《计算机应用》 (Journal of Computer Applications), 2022, 42(5): 1355-1366
To address the inability of the classical monarch butterfly optimization (MBO) algorithm to handle continuous data well, and the limited capacity of rough set models for large-scale, high-dimensional, complex data, a feature selection algorithm based on neighborhood rough sets (NRS) and MBO is proposed. First, local disturbance and population-partition strategies are combined with MBO, and a transfer mechanism is constructed, forming a binary MBO (BMBO) algorithm. Second, a mutation operator is introduced to strengthen the algorithm's exploration ability, yielding a mutation-operator-based BMBO (BMBOM) algorithm. A fitness function is then constructed from the NRS neighborhood dependency degree, and the fitness values of the initialized feature subsets are evaluated and ranked. Finally, BMBOM iteratively searches for the optimal feature subset, giving a metaheuristic feature selection algorithm. The optimization performance of BMBOM was evaluated on benchmark functions, and the classification ability of the proposed feature selection algorithm was assessed on UCI data sets. Experimental results show that, on five benchmark functions, BMBOM is clearly superior to MBO and particle swarm optimization (PSO) in best value, worst value, mean, and standard deviation. On the UCI data sets, compared with optimized feature selection algorithms based on rough sets, feature selection algorithms combining rough sets with optimization, feature selection algorithms combining NRS with optimization, and a feature selection algorithm based on binary grey wolf optimization, the proposed algorithm performs well on classification accuracy, number of selected features, and fitness value, and can select optimal feature subsets with few features and high classification accuracy.

5.
Input feature selection for classification problems
Feature selection plays an important role in classifying systems such as neural networks (NNs). We consider sets of attributes that may be relevant, irrelevant, or redundant; from the viewpoint of managing a data set, which can be huge, it is desirable to reduce the number of attributes by selecting only the relevant ones, with higher performance at lower computational effort expected as a result. In this paper, we propose two feature selection algorithms. The limitation of the mutual information feature selector (MIFS) is analyzed and a method to overcome it is studied. One of the proposed algorithms makes more considered use of the mutual information between input attributes and output classes than MIFS does; we demonstrate that it can match the performance of the ideal greedy selection algorithm when information is distributed uniformly, at nearly the same computational load as MIFS. In addition, another feature selection algorithm using the Taguchi method is proposed, advanced as a solution to the question of how to identify good features with as few experiments as possible. The proposed algorithms are applied to several classification problems and compared with MIFS. The two algorithms can also be combined to complement each other's limitations; the combined algorithm performed well in several experiments and should prove to be a useful method for selecting features in classification problems.
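As a hedged illustration of the greedy criterion MIFS uses (select the feature maximizing I(f;C) minus a β-weighted sum of its mutual information with already-selected features), here is a stdlib-only Python sketch on a toy discrete data set; the data, the β value, and the helper names are our own assumptions, not the paper's implementation.

```python
from collections import Counter
from math import log2

def mutual_info(xs, ys):
    """I(X;Y) in bits for two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def mifs(features, labels, k, beta=0.5):
    """Greedy MIFS-style selection: at each step pick the feature maximizing
    I(f;C) - beta * sum of I(f;s) over already-selected features s."""
    remaining, selected = list(range(len(features))), []
    for _ in range(k):
        best = max(remaining,
                   key=lambda f: mutual_info(features[f], labels)
                   - beta * sum(mutual_info(features[f], features[s]) for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data: class = f0 AND f2; f1 is a redundant copy of f0.
labels   = [0, 0, 0, 1, 0, 0, 0, 1]
features = [[0, 0, 1, 1, 0, 0, 1, 1],   # f0: informative
            [0, 0, 1, 1, 0, 0, 1, 1],   # f1: redundant copy of f0
            [0, 1, 0, 1, 0, 1, 0, 1]]   # f2: informative, independent of f0
print(mifs(features, labels, 2))        # → [0, 2]: the redundancy penalty skips f1
```

With β = 0 the criterion degenerates to ranking by I(f;C) alone and would pick the redundant copy; the penalty term is what makes the second pick f2.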

6.
Qinghua Hu, Jinfu Liu, Daren Yu. 《Knowledge》, 2008, 21(4): 294-304
Feature subset selection presents a common challenge for applications where data with tens or hundreds of features are available. Existing feature selection algorithms are mainly designed for dealing with numerical or categorical attributes; however, data usually comes in a mixed format in real-world applications. In this paper, we generalize Pawlak's rough set model into a δ neighborhood rough set model and a k-nearest-neighbor rough set model, where objects with numerical attributes are granulated with δ neighborhood relations or k-nearest-neighbor relations, while objects with categorical features are granulated with equivalence relations. The induced information granules are then used to approximate the decision with lower and upper approximations. We compute the lower approximations of the decision to measure the significance of attributes. Based on the proposed models, we give a definition of the significance of mixed features and construct a greedy attribute reduction algorithm. We compare the proposed algorithm with others in terms of the number of selected features and classification performance. Experiments show the proposed technique is effective.
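To make the δ-neighborhood construction concrete, the sketch below computes the dependency (positive-region fraction) of attribute subsets and runs a greedy forward reduction on a toy numerical data set. It covers only the numerical, δ-neighborhood half of the model (categorical attributes, handled in the paper with equivalence relations, are omitted), and the δ value, the data, and the function names are our own assumptions.

```python
def neighborhood(i, attrs, data, delta=0.2):
    """Objects within Chebyshev distance delta of object i on the given attributes."""
    return {j for j in range(len(data))
            if max(abs(data[i][a] - data[j][a]) for a in attrs) <= delta}

def dependency(attrs, data, labels, delta=0.2):
    """gamma(attrs): fraction of objects whose delta-neighborhood is pure in label,
    i.e. the size of the positive region over |U|."""
    pos = [i for i in range(len(data))
           if len({labels[j] for j in neighborhood(i, attrs, data, delta)}) == 1]
    return len(pos) / len(data)

def greedy_reduct(data, labels, delta=0.2):
    """Forward greedy attribute reduction: add the attribute raising gamma most."""
    reduct, best = [], 0.0
    while True:
        candidates = [a for a in range(len(data[0])) if a not in reduct]
        if not candidates:
            return reduct
        g, a = max((dependency(reduct + [a], data, labels, delta), a)
                   for a in candidates)
        if g <= best:
            return reduct
        best, reduct = g, reduct + [a]

# Toy data: attribute 0 separates the classes, attribute 1 is noise.
data   = [(0.0, 0.5), (0.1, 0.9), (0.1, 0.1), (0.9, 0.5), (1.0, 0.1), (0.9, 0.9)]
labels = [0, 0, 0, 1, 1, 1]
print(greedy_reduct(data, labels))  # → [0]: the noisy attribute is dropped
```

Here every δ-neighborhood under attribute 0 alone is label-pure (γ = 1), so adding attribute 1 brings no gain and the reduct stops at a single attribute.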

7.
Upon a change of input data, one usually wants the output computed from the data to be updated rather than recomputed from scratch. In Formal Concept Analysis, the concept lattice of input data can be updated when new objects are introduced by any of the so-called incremental algorithms for computing concept lattices. These algorithms use and update the lattice while introducing new objects to the input data one by one; the present concept lattice of the input data without the new objects is thus required by the computation. However, the lattice can be large and may not fit into memory. In this paper, we propose an efficient algorithm for updating the lattice from the present and new objects only, not requiring the possibly large concept lattice of the present objects. The algorithm results as a modification of the Close-by-One algorithm for computing the set of all formal concepts, or of its variants such as Fast Close-by-One, Parallel Close-by-One, or Parallel Fast Close-by-One, to compute only the new and modified formal concepts and the changes to the lattice order relation. The algorithm can be used not only for updating the lattice when new objects are introduced but also when existing objects are removed from the input data or the attributes of objects are changed. We describe the algorithm, discuss efficiency issues, and present an experimental evaluation of its performance together with a comparison with the AddIntent incremental algorithm for computing concept lattices.
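For readers unfamiliar with formal concepts, the brute-force baseline below enumerates all formal concepts of a tiny context by testing every attribute subset for closedness; Close-by-One and the paper's modification compute the same set far more efficiently (and incrementally), so this is only a reference sketch with hypothetical names.

```python
from itertools import combinations

def extent(attrs, context):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, has in context.items() if attrs <= has)

def intent(objs, context, all_attrs):
    """Attributes shared by every object in objs."""
    shared = set(all_attrs)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)

def all_concepts(context):
    """Brute force: an attribute set B yields a formal concept iff B is closed,
    i.e. intent(extent(B)) == B. Close-by-One enumerates exactly these
    concepts without testing every subset."""
    all_attrs = frozenset().union(*context.values())
    concepts = set()
    for k in range(len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), k):
            b = frozenset(combo)
            e = extent(b, context)
            if intent(e, context, all_attrs) == b:
                concepts.add((e, b))
    return concepts

# Tiny formal context: objects 1-3 with their attribute sets.
context = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"a", "c"}}
print(len(all_concepts(context)))  # → 8
```

An incremental algorithm's job is to update this concept set, and the order relation over it, when an object like `4: {"a"}` is added, without recomputing all eight concepts.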

8.
Image and Vision Computing, 2002, 20(9-10): 631-638
In this paper, a novel approach to classification is presented. Discriminant functions are constructed by combining selected features from the feature set with simple mathematical functions such as +, −, ×, ÷, max, and min. These discriminant functions are capable of forming non-linear, discontinuous hypersurfaces. For multimodal data, more than one discriminant function may be combined with logical operators before classification is performed. An algorithm is developed that decides whether a combination of discriminant functions is needed to classify a data sample or whether a single discriminant function will suffice. The algorithms used to perform classification are not written by a human; they are learnt, or rather evolved, using evolutionary computing techniques.

9.
For the classification of high-dimensional data that does not satisfy the faithfulness assumption, a new Markov blanket feature selection method based on particle swarm optimization is proposed. By effectively extracting relevant features and removing redundant ones, it produces better classification results. In the feature preprocessing stage, the algorithm analyzes feature relevance and redundancy with the maximal information coefficient measure, obtaining a Markov blanket representative set for the class attribute and a suboptimal feature subset. In the search-and-evaluation stage, a new fitness function is used with particle swarm optimization to select the optimal feature subset, and the resulting model is used to predict the test set. Experimental results show that the algorithm has clear advantages on 12 data sets.

10.
To address the low matching efficiency and high time complexity and computational cost of image matching algorithms, an image matching algorithm combining sparse representation and topological similarity is proposed. The algorithm first performs feature detection on the images, computes contour similarity, and finds the most similar large contour regions in the images to be matched. The features inside the contours are then represented with sparse coding: a sparse model is built that turns complex features into simple ones without affecting how they are classified, features of the same class or attribute are grouped into a single feature set, and sparse representation is combined with label-specific feature learning based on neighborhood mutual information. A transformation matrix is computed to represent the image, and structured topological similarity is used to refine the associated points inside and outside the contours. Finally, the algorithm is analyzed by both subjective and objective evaluation; the results show that, compared with other image matching algorithms, the proposed algorithm achieves clearly better matching accuracy and offers advantages in matching efficiency and complexity.

11.
周靖, 刘晋胜. 《计算机应用》 (Journal of Computer Applications), 2011, 31(7): 1785-1788
The poor classification generalization of feature parameters and the heavy computational cost of classification degrade the performance of K-nearest neighbors (KNN). An improved KNN algorithm based on joint entropy under dimensionality reduction is proposed. The idea is to measure how strongly features affect classification by computing the joint entropy of the feature parameters under any two condition attributes, establishing an intrinsic link between the classification characteristics of features and the concrete classification process, and to give a method for reducing the condition attributes according to the set of feature joint entropies. Theoretical analysis and simulation experiments show that, compared with classical KNN and related algorithms, the proposed algorithm achieves higher classification performance.
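The joint-entropy quantity that the method ranks attribute pairs by can be computed in a few lines; the stdlib-only sketch below illustrates it on toy discretized attributes (the data and the function name are our own assumptions, not the paper's code). A pair of mutually redundant attributes scores lower than a non-redundant pair, since the redundant pair carries no extra information.

```python
from collections import Counter
from math import log2

def joint_entropy(xs, ys):
    """H(X, Y) in bits for two discrete sequences of equal length."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(zip(xs, ys)).values())

# Toy discretized condition attributes.
a = [0, 0, 1, 1, 0, 0, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]   # fully redundant with a
c = [0, 1, 0, 1, 0, 1, 0, 1]   # independent of a
print(joint_entropy(a, b))      # → 1.0  (no information beyond a alone)
print(joint_entropy(a, c))      # → 2.0  (the pair carries two full bits)
```

Ranking attribute pairs by this score and keeping only the most informative condition attributes is the kind of reduction step the abstract describes applying before KNN.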

12.
This paper describes an approach that uses multi-label classification methods to search learning objects (LOs) tagged with Learning Object Metadata (LOM). Specifically, the model offers a methodology that casts the task as a multi-label mapping of LOs to query types through an emergent multi-label space, which can improve the first choice presented to learners or teachers. To build the model, the paper also proposes and preliminarily investigates a multi-label classification algorithm that uses only the LO features. Since many LOs include textual material that can be indexed, and such indexes can also be used to filter objects by matching them against user-provided keywords, we then ran experiments using web classification with text features to compare its accuracy with the results obtained from metadata (LO features).

13.
Rough sets are a mathematical tool that can effectively handle imprecise, incomplete, and uncertain information, and rough set attribute reduction can reduce the sentiment-word features of a text while preserving its sentiment classification ability. To address the high dimensionality of the sentiment-word feature space and the lack of semantic information in sentiment-word feature representations, this paper proposes RS-WvGv, a method for representing Chinese text sentiment-word features. The whole corpus is modeled as a rough set decision table, which is simplified with Johnson's rough set attribute reduction algorithm to retain a minimal set of sentiment-word feature attributes; word embeddings are then learned for all the sentiment feature words in that set, and a logistic regression classifier is used to verify the effectiveness of RS-WvGv. The paper also defines the coverage power of a sentiment-word feature attribute set, which expresses how well the attribute set covers the corpus. Finally, in comparative experiments, statistical tests further confirm the effectiveness of the method.

14.
We propose an alternative approach to classification that differs from known approaches in that, instead of comparing the tuple of a test object's feature values with the corresponding tuples for objects in the training set, we make independent pairwise comparisons of every pair of feature values for the objects being compared. Instead of using the notion of "nearest neighbors" for a test object, we introduce the notion of "admissible proximity" for each feature value of the test object. In this approach, we propose an alternative classification algorithm that has a number of significant practical features. The algorithm's quality was evaluated on sample problems taken from the well-known UCI repository and related to various aspects of human activity. The results show that the algorithm is competitive with known classification algorithms.

15.
吴晟, 李星. 《计算机应用》 (Journal of Computer Applications), 2008, 28(9): 2345-2348
Distributed search is an effective solution for searching the deep web, and the index size of each node is an important parameter when a distributed search engine describes and selects nodes. To estimate node index sizes in an uncooperative environment, two algorithms are proposed and implemented: a high-frequency resampling algorithm based on resampling high-frequency terms, and a heterogeneous-capture algorithm based on the assumption that documents have different capture probabilities. The high-frequency resampling algorithm resamples, after an initial random sample, based on the high-frequency terms in the sample set, while the heterogeneous-capture algorithm estimates node index size using a logistic function and a conditional likelihood method. Experiments on real web data show that these algorithms outperform the existing sample-resample and capture-recapture algorithms.

16.
To address the loss of discriminative information caused by the specific classification of geometric surfaces over image sets in image recognition algorithms, an improved iterative deep learning algorithm (IIDLA) incorporating convolutional neural networks is proposed. The algorithm uses a hybrid convolutional network (PCL) to learn translation-invariant low-level features, and iteratively applies convolutional neural networks (CNNs) in a hierarchical fashion to learn the different nonlinear features of the input image sets. The gallery and query instances of the algorithm include image sets of faces or objects under varying viewpoints, backgrounds, facial expressions, resolutions, and illumination. The proposed algorithm is evaluated against other algorithms on these data sets; experimental results show that it performs best on the data sets tested.

17.
Ultrasound imaging is the most suitable method for early detection of prostate cancer, but it is very difficult to distinguish the benign and malignant nature of the affliction in the early stage; this is reflected in the high percentage of unnecessary biopsies performed and in the many deaths caused by late detection or misdiagnosis. A computer-based classification system can provide a second opinion to radiologists. In pattern recognition, objects are generally described in terms of a set of measurable features, and the selection and quality of the features representing each pattern have a considerable bearing on the success of subsequent classification. Feature selection is the process of selecting the most relevant or dominant features from the original feature set in order to reduce the cost of data visualization and to increase classification efficiency and accuracy. The region of interest (ROI) is identified from transrectal ultrasound (TRUS) images using DBSCAN clustering with morphological operators after image enhancement with an M3-filter, and 22 grey-level co-occurrence matrix features are extracted from the ROIs. Feature selection algorithms based on soft computing models, namely the genetic algorithm (GA), ant colony optimization (ACO), and QR, are studied. In this paper, QR-ACO (a hybrid of rough-set-based QR and ACO) and GA-ACO (a hybrid of GA and ACO) are proposed for reducing the feature set in order to increase the accuracy and efficiency of prostate cancer classification. The selected features may have the best discriminatory power for classifying prostate cancer from TRUS images. A support vector machine is tailored to evaluate the proposed feature selection methods through classification, and a comparative analysis is then performed among these methods. Experimental results show that the proposed QR-ACO method produces significant results: the number of features it selects is minimal, and it achieves high detection accuracy.

18.
gMLC: a multi-label feature selection framework for graph classification
Graph classification is of critical importance in a wide variety of applications, e.g. drug activity prediction and toxicology analysis. Current research on graph classification focuses on single-label settings. However, in many applications each graph can be assigned a set of multiple labels simultaneously, and extracting good features using the multiple labels of the graphs becomes an important step before graph classification. In this paper, we study the problem of multi-label feature selection for graph classification and propose a novel solution, called gMLC, to efficiently search for optimal subgraph features for graph objects with multiple labels. Different from existing feature selection methods in vector spaces, which assume the feature set is given, we perform multi-label feature selection for graph data progressively, together with the subgraph feature mining process. We derive an evaluation criterion to estimate the dependence between subgraph features and the multiple labels of graphs. Then, a branch-and-bound algorithm is proposed to efficiently search for optimal subgraph features by judiciously pruning the subgraph search space using the multiple labels. Empirical studies demonstrate that our feature selection approach can effectively boost multi-label graph classification performance and is made more efficient by this pruning of the subgraph search space.

19.
Principal component analysis (PCA), when used to reduce the dimensionality of EEG features extracted by common spatial patterns (CSP), considers only how well the principal components represent the input variables and ignores how well they explain the output variable. To overcome this shortcoming, partial least squares (PLS) regression is proposed for dimensionality reduction: features are extracted by CSP from data-augmented signals, PLS then reduces the dimensionality so that the extracted principal components include the features that explain the dependent variable well, and these are used as the feature vector; classification is performed with PSO-SVM. Classification tests on data set IIIa of the 2005 BCI Competition give an average motor imagery classification accuracy of 91.71% over the three subjects, the highest average accuracy in comparisons with the CSP-LDS, WL-CSP, and CSP algorithms, verifying the effectiveness of the method.

20.
Given a large set of potential features, it is usually necessary to find a small subset with which to classify. The task of finding an optimal feature set is inherently combinatoric and therefore suboptimal algorithms are typically used to find feature sets. If feature selection is based directly on classification error, then a feature-selection algorithm must base its decision on error estimates. This paper addresses the impact of error estimation on feature selection using two performance measures: comparison of the true error of the optimal feature set with the true error of the feature set found by a feature-selection algorithm, and the number of features among the truly optimal feature set that appear in the feature set found by the algorithm. The study considers seven error estimators applied to three standard suboptimal feature-selection algorithms and exhaustive search, and it considers three different feature-label model distributions. It draws two conclusions for the cases considered: (1) depending on the sample size and the classification rule, feature-selection algorithms can produce feature sets whose corresponding classifiers possess errors far in excess of the classifier corresponding to the optimal feature set; and (2) for small samples, differences in performances among the feature-selection algorithms are less significant than performance differences among the error estimators used to implement the algorithms. Moreover, keeping in mind that results depend on the particular classifier-distribution pair, for the error estimators considered in this study, bootstrap and bolstered resubstitution usually outperform cross-validation, and bolstered resubstitution usually performs as well as or better than bootstrap.
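As a minimal illustration of why the choice of error estimator matters, the sketch below contrasts resubstitution (optimistically biased) with leave-one-out cross-validation for a 1-D nearest-class-mean classifier on synthetic data; the classifier, the data, and the seed are our own assumptions, and the paper's bolstered resubstitution and bootstrap estimators are not implemented here.

```python
import random

random.seed(1)

def nearest_mean_error(train, test):
    """Error of the nearest-class-mean classifier trained on `train`, scored on `test`."""
    means = {}
    for label in (0, 1):
        pts = [x for x, y in train if y == label]
        means[label] = sum(pts) / len(pts)
    wrong = sum(1 for x, y in test
                if min(means, key=lambda c: abs(x - means[c])) != y)
    return wrong / len(test)

# Two overlapping 1-D Gaussian classes.
data = ([(random.gauss(0, 1), 0) for _ in range(25)]
        + [(random.gauss(1, 1), 1) for _ in range(25)])

# Resubstitution: train and test on the same sample (optimistically biased).
resub = nearest_mean_error(data, data)

# Leave-one-out cross-validation: nearly unbiased but higher variance.
loo = sum(nearest_mean_error(data[:i] + data[i + 1:], [data[i]])
          for i in range(len(data))) / len(data)
print(resub, loo)
```

A feature-selection algorithm driven by the first estimate can rank feature sets quite differently from one driven by the second, which is the effect the study quantifies.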


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.). 京ICP备09084417号