Similar literature
 20 similar documents found (search time: 806 ms)
1.
After high-dimensional data are projected onto a two-dimensional map via the SOM, users can easily view the inner structure of the data on the 2-D map. In the early stage of data mining, it is useful to inspect the inner structure of any kind of data. However, few studies apply the SOM to transactional data and the related categorical domain, which are usually accompanied by concept hierarchies. Concept hierarchies contain information about the data but are almost ignored in such studies. This may cause mistakes in mapping. In this paper, we propose an extended SOM model, the SOMCD, which can map the varied kinds of data in the categorical domain onto a 2-D map and visualize the inner structure on the map. By using tree structures to represent the different kinds of data objects and the neurons' prototypes, a newly devised distance measure that takes the information embedded in concept hierarchies into consideration can properly find the similarity between the data objects and the neurons. Besides the distance measure, we base the SOMCD on a tree-growing adaptation method and integrate the U-Matrix for visualization. Users can hierarchically separate the trained neurons on the SOMCD's map into different groups and eventually cluster the data objects. In experiments on synthetic and real datasets, the SOMCD performs better than other SOM variants and clustering algorithms in visualization, mapping and clustering.

2.
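A hybrid method for nodal load interpolation in coupled-field collaborative simulation   Total citations: 2, self-citations: 0, citations by others: 2
宋少云 (Song Shaoyun), 李世其 (Li Shiqi). 《计算机仿真》 (Computer Simulation), 2006, 23(8): 73-75, 125
Nodal load interpolation is a key step when a collaborative design approach is used for coupled-field simulation. The paper first introduces the node-mapping methods used for multi-field coupling inside finite element software, then reduces the interpolation problem for multi-field coupling under collaborative design to the interpolation of scattered points. It analyzes the strengths and weaknesses of the mapping method, natural neighbor interpolation, moving least squares, and the inverse-distance moving average method for scattered-point interpolation, and on that basis proposes a hybrid method for coupled-field collaborative simulation. The hybrid method first applies natural neighbor interpolation to most interior points, then applies moving least squares to exterior points, and finally applies the inverse-distance moving average method to the remaining exterior points. Moving least squares is modified so that it interpolates exactly. A thermal stress example verifies that the algorithm has good accuracy, speed, and stability.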
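
As a rough illustration of scattered-point interpolation by inverse distance, the sketch below computes a plain inverse-distance-weighted average of nodal values. It is a generic textbook form, not the paper's hybrid method or its exact moving-average variant; the function name `idw_interpolate`, the `power` parameter, and the sample nodes and temperatures are all assumptions made for the example.

```python
import math

def idw_interpolate(points, values, query, power=2.0):
    """Inverse-distance-weighted estimate at `query` from scattered samples."""
    num = 0.0
    den = 0.0
    for p, v in zip(points, values):
        d = math.dist(p, query)
        if d == 0.0:
            return v  # query coincides with a sample node: reproduce it exactly
        w = 1.0 / d ** power  # closer nodes get larger weights
        num += w * v
        den += w
    return num / den

# Hypothetical source-mesh nodes (x, y) and their nodal temperatures.
source_nodes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
temperatures = [100.0, 200.0, 200.0, 300.0]

# Value transferred to a target-mesh node at the centre of the square.
center = idw_interpolate(source_nodes, temperatures, (0.5, 0.5))
```

Because the target node here is equidistant from all four source nodes, the estimate reduces to their arithmetic mean; a query that coincides with a source node returns that node's value unchanged.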

3.
Cancer, in virtually all of its forms, is a major cause of death worldwide. In cancer analysis, classifying the different tumor types is of the greatest importance. Analysis of microarray gene expression datasets appears to provide a successful framework for studying tumors and genetic diseases. Although standard machine learning (ML) strategies have been used effectively to identify significant genes and to classify new cases, common limitations of DNA microarray data, such as the small number of instances and the very large number of features, still restrict its investigative, medical, and scientific use. Improving the interpretability of prediction approaches while retaining high accuracy would help to analyze gene expression profiles in DNA microarray datasets more reasonably and efficiently. This paper presents a new methodology based on gene expression profiles to classify human cancers. The proposed methodology combines Information Gain (IG) and the Standard Genetic Algorithm (SGA). It first uses Information Gain for feature selection, then uses the Genetic Algorithm (GA) for feature reduction, and finally uses Genetic Programming (GP) to classify cancer types. The proposed system is evaluated by classifying cancers in seven cancer datasets, and the results are compared with the most recent approaches. Applying the proposed system to cancer datasets and comparing it with other machine learning methodologies shows that no classification technique consistently outperforms all the others; however, the Genetic Algorithm generally improves the classification performance of other classifiers.

4.
Extended Naive Bayes classifier for mixed data   Total citations: 2, self-citations: 0, citations by others: 2
The Naive Bayes induction algorithm is very popular in the classification field. The traditional way of dealing with numeric data is to discretize numeric attributes into symbols. Differences among discretization criteria have a significant effect on performance. Moreover, several studies have recently employed the normal distribution to handle numeric data, but using only one value to estimate the population easily leads to incorrect estimation. Therefore, research on classifying mixed data with Naive Bayes classifiers has not been very successful. In this paper, we propose a classification method, Extended Naive Bayes (ENB), which is capable of handling mixed data. The experimental results demonstrate the efficiency of our algorithm in comparison with other classification algorithms such as CART, DT and MLP.

5.
Even though Self-Organizing Maps (SOMs) constitute a powerful and essential tool for pattern recognition and data mining, the common SOM algorithm is not apt for processing categorical data, which is present in many real datasets. For this reason, categorical values are commonly converted into a binary code, a solution that unfortunately distorts the network training and the subsequent analysis. The present work proposes a SOM architecture that directly processes the categorical values, without the need for any prior transformation. This architecture is also capable of properly mixing numerical and categorical data, in such a manner that all the features carry the same weight. The proposed implementation is scalable and the corresponding learning algorithm is described in detail. Finally, we demonstrate the effectiveness of the presented algorithm by applying it to several well-known datasets.

6.
The problem of finding inner and outer estimations of the range of values of a real function is considered. An overview is given of the interval arithmetics used to improve outer estimations. The twin arithmetic is introduced for simultaneous inner and outer estimations, and directed twin arithmetic is proposed as a generalization of the directed interval arithmetic.
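
To make the notion of inner and outer estimations concrete, here is a minimal sketch of classical interval arithmetic producing an outer estimate of a function's range, with a sampled inner estimate for comparison. It does not implement the twin or directed twin arithmetic of the abstract, and it ignores outward rounding of floating-point results; the `Interval` class and the test function f(x) = x(1 - x) are assumptions chosen for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __sub__(self, other):
        # Smallest result: own lower bound minus the other's upper bound.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # The product range is spanned by the four endpoint products.
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

def f(x):
    return x * (1.0 - x)

x = Interval(0.0, 1.0)

# Outer estimate: evaluate f with interval operands (guaranteed to enclose the range).
outer = x * (Interval(1.0, 1.0) - x)

# Inner estimate: values actually attained at a few sample points lie inside the range.
samples = [f(t) for t in (0.0, 0.5, 1.0)]
inner = Interval(min(samples), max(samples))
```

Here the outer estimate is [0, 1] while the sampled inner estimate is [0, 0.25]; the true range lies between the two, and the gap comes from the variable x appearing twice in the expression.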

7.
Development of classification methods using case-based reasoning systems is an active area of research. In this paper, two new case-based reasoning systems are proposed, with two similarity measures that support mixed categorical and numerical data as well as purely categorical data. The principal difference between these two measures lies in how distance is calculated for categorical data. The first one, named distance in unsupervised learning (DUL), is derived from the co-occurrence of values, and the other one, named distance in supervised learning (DSL), calculates the distance between two values of the same feature with respect to every other feature for a given class. The distance between numerical data, by contrast, is computed using the Euclidean distance. Furthermore, the importance of numeric features is determined by linear discriminant analysis (LDA), and the weight assigned to categorical features depends on the co-occurrence of feature values when calculating the similarity between a new case and an old one. The performance of the proposed case-based reasoning systems has been investigated on the University of California, Irvine (UCI) data sets by 5-fold cross validation. The results indicate that these case-based reasoning systems perform well in terms of predictive accuracy and interpretability.

8.
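Research on formula discovery based on genetic modeling   Total citations: 1, self-citations: 0, citations by others: 1
This paper analyzes genetic modeling and formula discovery, proposes a formula discovery algorithm based on genetic modeling, and describes it in detail. It gives a concrete design of the data structures for the key objects in the algorithm and, finally, presents run results for sequence induction and formula discovery.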

9.
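To address the very high cost of computing mutual information for large-scale categorical data, a parallel mutual information computation method for categorical data is proposed on the Spark in-memory computing platform. The algorithm first uses a column transformation to convert the dataset into multiple data subsets. It then caches intermediate results in two variable-length arrays, which addresses the heavy and highly repetitive computation of mutual information between pairs of categorical features. Finally, the algorithm is validated on synthetic and real datasets on a Spark cluster with 24 compute nodes. The experimental results show that the algorithm achieves high performance in efficiency, scalability, and extensibility.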
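
For reference, the quantity computed for each pair of categorical features is the standard mutual information, I(X;Y) = Σ p(x,y) log[ p(x,y) / (p(x) p(y)) ]. The sketch below evaluates it on a single machine from value counts; it does not reproduce the paper's Spark parallelization, column transformation, or caching arrays, and the function name and the toy columns are assumptions for illustration.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information, in bits, between two equally long categorical columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) value pair
    px = Counter(xs)              # marginal counts of x values
    py = Counter(ys)              # marginal counts of y values
    mi = 0.0
    for (x, y), c in joint.items():
        # (c/n) * log2( (c/n) / ((px/n) * (py/n)) ), simplified to use raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

# Toy columns: `shape` is fully determined by `color`; `size` is independent of it.
color = ["red", "red", "blue", "blue"]
shape = ["round", "round", "square", "square"]
size = ["s", "l", "s", "l"]

mi_dependent = mutual_information(color, shape)
mi_independent = mutual_information(color, size)
```

The dependent pair carries one bit of shared information and the independent pair carries none. Because every feature pair needs its own joint count table, the work grows quadratically with the number of features, which is the cost the abstract sets out to reduce.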

10.
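The traditional K-modes algorithm is widely used for clustering on categorical attributes, but it does not distinguish ordinal categorical attributes from nominal (unordered) ones. Building on this distinction, a new distance formula is proposed and the algorithm flow is optimized. Based on the distance values of nominal attributes, a reasonable range is determined for the distance between adjacent values of an ordinal attribute. Using the order relation inherent in ordinal attributes, a distance formula for ordinal attributes is constructed. When the distance between a sample and a centroid is computed, the proportion of each attribute value within the cluster is introduced as an important parameter of the overall distance formula. In sum, the new distance formula captures the distance between ordinal attribute values well and balances the differences between the distance formulas of the two kinds of categorical attributes. Experimental results show that, on real UCI datasets, the proposed improved algorithm and distance formula significantly outperform the original K-modes algorithm and its improved variants.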

11.
Almost all subspace clustering algorithms proposed so far are designed for numeric datasets. In this paper, we present a k-means type clustering algorithm that finds clusters in data subspaces in mixed numeric and categorical datasets. In this method, we compute the contributions of attributes to different clusters. We propose a new cost function for a k-means type algorithm. One of the advantages of this algorithm is its complexity, which is linear with respect to the number of data points. This algorithm is also useful in describing the cluster formation in terms of the contributions of attributes to different clusters. The algorithm is tested on various synthetic and real datasets to show its effectiveness. The clustering results are explained by using the attribute weights in the clusters. The clustering results are also compared with published results.

12.
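Time series data usually refers to a series of real-valued data with time intervals and is widespread in fields such as coal mining, finance, and healthcare. To address the heavy noise, low prediction accuracy, and poor generalization found in existing time series classification, a weighted ensemble classification method for time series data based on the regularized extreme learning machine (RELM) is proposed. First, to deal with the noise contained in the time series data, the wavelet packet transform is used to denoise the data...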

13.
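To address the problem that the number of hidden nodes affects classification accuracy in the regularized extreme learning machine (RELM), a sensitivity-based regularized extreme learning machine (SRELM) algorithm is proposed. First, from the outputs of the hidden-layer activation function and the corresponding output-layer weight coefficients, a formula is derived for the sensitivity, with respect to each hidden node, of the residual between the actual value and the hidden-node output value. The hidden nodes are then ranked by sensitivity, and the less important hidden nodes are pruned according to the classification accuracy on the optimization samples, which effectively improves the classification accuracy of SRELM. Experiments on the MNIST handwritten digit database show that, compared with conventional SVM and RELM, the running time of SRELM is close to that of RELM and both are clearly lower than that of SVM, while SRELM achieves the highest recognition accuracy for handwritten digits.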

14.
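For multispectral image classification, a multi-class pattern recognition problem, binary error-correcting codes are combined with the GP (Genetic Programming) algorithm, and an improved coding matrix replaces the original binary coding matrix to classify the images. This yields a new GP-based multispectral image classification algorithm, and an example of classifying ground objects in a multispectral image with the method is given. The results show that the method achieves higher classification performance than earlier GP-based classification methods and offers another feasible way to apply genetic programming to multi-class pattern recognition problems.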

15.
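A deep-learning-based method for anomalous data detection is proposed that accurately detects anomalous wireless sensor data and presents the detection results intuitively. Based on the clustering principle of the wireless sensor network model, an intra-cluster data fusion mechanism driven by anomalous data removes invalid data from the wireless sensor network and yields a fusion result of the valid data. A deep convolutional neural network with 4 hidden layers is constructed; the preprocessed wireless sensor network data serve as the model input, the hidden layers perform feature extraction and mapping, and the output layer outputs the anomaly detection result. Experiments show that the method effectively fuses different types of data with low average energy consumption per network node. The deep convolutional neural network with 4 hidden layers reaches an average classification accuracy of 98.44%, the training loss of the hidden layers approaches 0 after 1000 iterations, and real-time, intuitive, and accurate detection of anomalous wireless sensor data is achieved.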

16.
Evolving heterogeneous networks, which contain different types of nodes and links that change over time, appear in many domains including protein–protein interactions, scientific collaborations, and telecommunications. In this paper, we aim to discover temporal information from a heterogeneous evolving network in order to improve node classification. We propose a framework, Genetic Algorithm enhanced Time Varying Relational Classifier for evolving Heterogeneous Networks (GA-TVRC-Het), to extract the effects of different relationship types in different time periods in the past. These effects are discovered adaptively by utilizing genetic algorithms. A relational classifier is extended as the classification method in order to be able to work with different types of nodes. The proposed framework is tested on two real world data sets. It is shown that using the optimal time effect improves the classification performance to a large extent. It is observed that the optimal time effect does not necessarily follow a certain functional trend, for example linear or exponential decay in time. Another observation is that the optimal time effect may be different for each type of interaction. Both observations reveal the reason why GA-TVRC-Het outperforms methods that rely on a predefined form of time effect or the same time effect for each link type.

17.

In this paper, we propose a domain learning process built on a machine learning-based process that, starting from plan traces with (partially known) intermediate states, returns a planning domain with numeric predicates and expressive logical/arithmetic relations between domain predicates, written in the planning domain definition language (PDDL). The novelty of our approach is that it can discover relations with little information about the ontology of the target domain to be learned. This is achieved by applying a selection of preprocessing, regression, and classification techniques to infer information from the input plan traces. These techniques are used to prepare the planning data, discover relational/numeric expressions, or extract the preconditions and effects of the domain's actions. Our solution was evaluated using several metrics from the literature, taking as experimental data plan traces obtained from several domains from the International Planning Competition. The experiments demonstrate that our proposal, even with high levels of incompleteness, correctly learns a wide variety of domains, discovering relational/arithmetic expressions, showing F-Score values above 0.85 and obtaining valid domains in most of the experiments.


18.
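罗会兰 (Luo Huilan), 危辉 (Wei Hui). 《计算机科学》 (Computer Science), 2010, 37(11): 234-238
CBEST, an algorithm for clustering mixed data based on ensemble techniques and spectral clustering, is proposed. It uses cluster ensemble techniques to produce the similarity between mixed data objects; this similarity measure makes no assumption about the distribution model of the feature values. Spectral clustering is then applied to the similarity matrix obtained from this measure to produce the clustering of the mixed data. Experiments on a large number of real and synthetic datasets confirm the effectiveness of CBEST and its robustness to noise. A comparative study with other mixed-data clustering algorithms also demonstrates its superior performance. CBEST can additionally incorporate prior knowledge effectively, by tuning parameters that set the weights of different attributes in the clustering.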
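
One common way to turn a cluster ensemble into a similarity matrix is the co-association matrix, in which the similarity of two objects is the fraction of base clusterings that place them in the same cluster. The abstract does not state which construction CBEST uses, so the sketch below is an assumed stand-in rather than the authors' procedure; the function name and the three toy labelings are hypothetical.

```python
def co_association(labelings):
    """Fraction of base clusterings that put each pair of objects in one cluster."""
    m = len(labelings)      # number of base clusterings
    n = len(labelings[0])   # number of objects
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Normalise the agreement counts to similarities in [0, 1].
    return [[c / m for c in row] for row in counts]

# Three hypothetical base clusterings of four objects (cluster labels are arbitrary).
base_clusterings = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
similarity = co_association(base_clusterings)
```

Objects 0 and 1 are grouped together in every base clustering and so get similarity 1, while objects 0 and 3 never share a cluster and get 0. Such a matrix depends only on cluster memberships, not on feature distributions, and can be passed to a spectral clustering routine as a precomputed affinity.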

19.
Concrete is a composite construction material made primarily with aggregate, cement, and water. In addition to the basic ingredients used in conventional concrete, high-performance concrete incorporates supplementary cementitious materials, such as fly ash and blast furnace slag, and chemical admixtures, such as superplasticizers. Hence, high-performance concrete is a highly complex material, and modeling its behavior is a difficult task. In this paper, we propose an intelligent system based on Genetic Programming for the prediction of high-performance concrete strength. The system we propose is called Geometric Semantic Genetic Programming, and it is based on recently defined geometric semantic genetic operators for Genetic Programming. Experimental results show the suitability of the proposed system for the prediction of concrete strength. In particular, the new method provides significantly better results than the ones produced by standard Genetic Programming and other machine learning methods, both on training and on out-of-sample data.

20.
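Clustering of categorical data is widely used across real-world domains such as medical science and computer science. Categorical data clustering is usually studied on the basis of dissimilarity measures, and for datasets with different characteristics the clustering results are affected by the characteristics of the dataset itself and by noise. Moreover, categorical data clustering based on representation learning is complex to implement, and its results depend heavily on the learned representation. Based on the co-occurrence matrix, this paper proposes a clustering method that directly considers the association relationships in the original information of categorical data: categorical data clustering based on associations extracted from the co-occurrence matrix (CDCBCM). The co-occurrence matrix can be viewed as a summary of how information is associated in the original data space. The paper builds the co-occurrence matrix by computing the co-occurrence frequency of different objects in each attribute subspace, removes some noise from the matrix, and then applies normalized cut to obtain the clustering. The method is tested on 16 public datasets from different domains, compared with 8 existing methods, and evaluated with the F1-score. The experimental results show that the method performs best on 7 datasets and has the highest average rank, so it handles the clustering of categorical data better.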
