Found 20 similar documents (search time: 0 ms)
1.
Almost all subspace clustering algorithms proposed so far are designed for numeric datasets. In this paper, we present a k-means type clustering algorithm that finds clusters in data subspaces in mixed numeric and categorical datasets. In this method, we compute the contribution of each attribute to the different clusters. We propose a new cost function for a k-means type algorithm. One advantage of this algorithm is that its complexity is linear in the number of data points. The algorithm is also useful for describing cluster formation in terms of attribute contributions to the different clusters. The algorithm is tested on various synthetic and real datasets to show its effectiveness. The clustering results are explained using the attribute weights in the clusters and are also compared with published results.
2.
This work focuses on the usage analysis of a citizen web portal, Infoville XXI (http://www.infoville.es), by means of Self-Organizing Maps (SOM). In this paper, a variant of the classical SOM has been used, the so-called Growing Hierarchical SOM (GHSOM). The GHSOM is able to find an optimal architecture of the SOM in a few iterations. Other variants can also find an optimal architecture, but they tend to need a long training time, especially for complex data sets. Another relevant contribution of the paper is a new visualization of the patterns in the hierarchical structure. Results show that GHSOM is a powerful and versatile tool for extracting relevant and straightforward knowledge from the vast amount of information involved in a real citizen web portal.
3.
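Reconstructing curves and surfaces from scattered point sets is widely used in reverse engineering and computer vision. This paper proposes a cubic B-spline curve reconstruction algorithm based on the SOM network. Given a scattered point set of a curve and an initial neural network, the positions of the neurons in the SOM network are optimized so that the network approximates the scattered points and maps the spatial features of the point set. The feature points are then used to solve inversely for the control points of a cubic B-spline curve, and the curve is reconstructed from these control points. Experimental results show that the algorithm achieves good curve reconstruction.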
4.
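This paper studies the mining of approximate periods under multi-granularity time in temporal databases. Building on multi-granularity time and multi-granularity time formats, it introduces the definition of multi-granularity time intervals and their related properties, constructs a multi-granularity approximate-period model, and proposes an algorithm based on SOM clustering for mining multi-granularity approximate periods. Experiments on the high-frequency stock data 580000 Baosteel JBT1 demonstrate the effectiveness of the algorithm.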
5.
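This paper studies the mining of approximate periods under multi-granularity time in temporal databases. Building on multi-granularity time and multi-granularity time formats, it introduces the definition of multi-granularity time intervals and their related properties, constructs a multi-granularity approximate-period model, and proposes an algorithm based on SOM clustering for mining multi-granularity approximate periods. Experiments on the high-frequency stock data 580000 Baosteel JBT1 demonstrate the effectiveness of the algorithm.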
6.
WEBSOM is a recently developed neural method for exploring full-text document collections, for information retrieval, and for information filtering. In WEBSOM the full-text documents are encoded as vectors in a document space, somewhat as in earlier information retrieval methods, but in WEBSOM the document space is formed in an unsupervised manner using the Self-Organizing Map algorithm. In this article the document representations that WEBSOM creates are shown to be computationally efficient approximations of the results of a certain probabilistic model. The probabilistic model incorporates information about the similarity of use of different words to take into account their semantic relations.
7.
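This paper proposes a speech training method that exploits the visualizable output layer of the SOM network. The SOM network maps input vectors onto a two-dimensional plane or surface, and the subject uses the position information provided as visual feedback to guide his or her pronunciation. To improve the SOM clustering, the SOM additionally undergoes reinforced training, and the effect of the number of output-layer neurons on clustering is discussed. Experimental results show that the proposed SOM-based speech training method is intuitive and simple and can effectively realize "speaking by looking at pictures".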
8.
The prevention of subscriber churn through customer retention is a core issue of Customer Relationship Management (CRM). By minimizing customer churn a company maximizes its profit. This paper proposes a hybridized architecture to deal with customer retention problems. It does so not only through predicting churn probability but also by proposing retention policies. The architecture works in two modes: learning and usage. In the learning mode, the churn model learner seeks potential associations from the subscriber database. This historical information is used to form a churn model. This mode also calls for a policy model constructor to use the attributes identified in the churn model to divide all ‘churners’ into distinct groups. The policy model constructor is also responsible for developing a policy model for each churner group. In the usage mode, a churn predictor uses the churn model to predict the churn probability of a given subscriber. When the churn model finds that the subscriber has a high churn probability the policy model is used to suggest specific retention policies. This study’s experiments show that the churn model has an evaluation accuracy of approximately eighty-five percent. This suggests that policy model construction represents an interesting and important technique in investigating the characteristics of churner groups. Furthermore, this study indicates that understanding the relationships between churns is essential in creating effective retention policy models for dealing with ‘churners’.
9.
K-means type clustering algorithms for mixed data consisting of numeric and categorical attributes suffer from the cluster center initialization problem: the final clustering results depend on the initial cluster centers. Random cluster center initialization is a popular initialization technique. However, clustering results are not consistent across different cluster center initializations. The K-Harmonic means clustering algorithm tries to overcome this problem for pure numeric data. In this paper, we extend the K-Harmonic means clustering algorithm to mixed datasets. We propose a definition for a cluster center and a distance measure. These cluster centers and the distance measure are used with the cost function of the K-Harmonic means clustering algorithm in the proposed algorithm. Experiments were carried out with pure categorical datasets and mixed datasets. Results suggest that the proposed clustering algorithm is quite insensitive to the cluster center initialization problem. Comparative studies with other clustering algorithms show that the proposed algorithm produces better clustering results.
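As a rough illustration of the cost function mentioned in the abstract above, the following Python sketch evaluates the standard K-Harmonic means performance function for pure numeric data: for each point, the harmonic mean of its p-th power distances to all centers, summed over the points. It does not reproduce the paper's mixed-data cluster center or distance measure, which the abstract does not specify. The function name `khm_cost`, the exponent default `p=2.0`, and the small guard `eps` are assumptions made for this example.

```python
import math


def khm_cost(points, centers, p=2.0, eps=1e-12):
    """K-Harmonic means performance function for numeric data (sketch).

    points and centers are sequences of equal-length numeric tuples.
    eps is a hypothetical guard against division by zero when a point
    coincides with a center.
    """
    k = len(centers)
    total = 0.0
    for x in points:
        inverse_sum = 0.0
        for c in centers:
            d_p = math.dist(x, c) ** p
            inverse_sum += 1.0 / max(d_p, eps)
        # Harmonic mean of the k distances (up to the factor k).
        total += k / inverse_sum
    return total
```

Because every center contributes to every point's term, a point lying far from all centers raises the cost, which is commonly cited as the reason the method is less sensitive to initialization than plain k-means.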
10.
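Because mixed-attribute sets in networks contain a large amount of redundant data, which lowers the recall of data detection, this paper proposes a method for detecting mixed-attribute features of network big data that incorporates rough set theory. First, a four-tuple is constructed, and its length function is computed from arbitrary neighborhood information of the four-tuple in order to judge the similarity of information features. Rough set theory is then used to compute the neighborhood entropy of similar information features so as to detect and classify duplicate data attributes. To improve classification efficiency, the idea of support vector machine classification is introduced, transforming the classification of big-data mixed attributes into a linearly separable problem and thereby achieving detection and classification of mixed-attribute features of network big data. Experimental results show that the proposed method can effectively filter out irrelevant data according to data features. The reduced feature set is classified with the trained classifier, and a comparison with a network attribute feature detection method based on joint optimization of features and classifier parameters confirms the effectiveness of the proposed method, which offers a new and effective approach to mixed-feature detection for network big data.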
11.
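This paper first improves the self-organizing map learning and classification algorithms. By introducing the user-defined variables matching degree, reduction rate, and quantization error of reduced samples, it proposes a new intrusion detection model and algorithm based on multi-layer self-organizing maps and principal component analysis. The model applies principal component analysis to reduce the features of the input samples and uses a layered approach to subdivide, layer by layer, clusters with low classification accuracy, which addresses the imprecise classification of single-layer self-organizing maps. Experimental results show that the model performs well for intrusion detection: it accurately distinguishes attacks from normal activity and can further identify the specific attack type.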
12.
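To quickly discover audit clues in massive data, the FMA data mining algorithm is used to rapidly extract association rules from the audited data and from the audit expert experience base. The CLARANS algorithm, improved with a self-organizing neural network, is then used to partition the rules extracted from the audit expert experience base into groups of similar rules. Finally, the association rule set of the audited entity is compared with the similar-rule groups of expert experience in terms of relative strength, approach rate, and value rate, yielding the set of audit clues.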
13.
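This paper proposes a fuzzy entropy criterion and a composite learning criterion based on it, which can effectively guide the learning process of self-organizing feature map networks.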
14.
This work presents a strategy for the classification of astronomical objects based on spectrophotometric data and the use of unsupervised neural networks and statistical classification algorithms. Our strategy constitutes an essential part of the preparation phase of the automatic classification and parameterization algorithms for the data that are to be collected by the Gaia satellite of the European Space Agency (ESA), whose launch is foreseen for the spring of 2012. The proposed algorithm is based on a hierarchical structure of neural networks composed of various tree-structured SOM networks. The classification of possible astronomical objects (stars, galaxies, quasars, multiple objects, etc.) basically consists of the iterative segmentation of the input space, the ensuing generation of initial classifications, and an increase in classification precision by means of a refining process. Apart from providing a classification, our technique also measures the quality and precision of the classifications and segments the objects for which it cannot determine whether or not they belong to a pre-established class of astronomical objects (outliers).
15.
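For open-circuit and short-circuit faults of the diodes inside the rectifier of an armored vehicle power supply system, a diagnosis method combining SOM and Elman neural networks is proposed. A simulation model of the rectifier is built, the fast Fourier transform (FFT) is used to extract the harmonic orders and amplitudes of each fault mode, and an SOM network performs the mode classification. Because the specific fault types within each mode differ in phase, the voltage values within one period are sampled and an Elman network is then used to identify the specific fault. The simulation results show that mode classification and fault identification of the rectifier are achieved, verifying the feasibility and correctness of the method.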
16.
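This paper proposes a remote sensing image classification method based on the Growing Hierarchical Self-Organizing Map (GHSOM). It first analyzes the basic principles and algorithm of GHSOM in detail and then applies it to remote sensing image classification. Experimental results show that GHSOM, through its hierarchical classification, effectively resolves the class-confusion problem of SOM classification and greatly improves classification accuracy and efficiency, making it a new and effective unsupervised method for remote sensing image classification.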
17.
After projecting high-dimensional data onto a two-dimensional map via the SOM, users can easily view the inner structure of the data on the 2-D map. In the early stage of data mining, inspecting the inner structure is useful for any kind of data. However, few studies apply the SOM to transactional data and the related categorical domain, which are usually accompanied by concept hierarchies. Concept hierarchies contain information about the data but are almost ignored in such studies, which may cause mistakes in mapping. In this paper, we propose an extended SOM model, the SOMCD, which can map the varied kinds of data in the categorical domain onto a 2-D map and visualize the inner structure on the map. By using tree structures to represent the different kinds of data objects and the neurons’ prototypes, a newly devised distance measure that takes the information embedded in concept hierarchies into consideration can properly find the similarity between the data objects and the neurons. Besides the distance measure, we base the SOMCD on a tree-growing adaptation method and integrate the U-Matrix for visualization. Users can hierarchically separate the trained neurons on the SOMCD's map into different groups and eventually cluster the data objects. In experiments on synthetic and real datasets, the SOMCD performs better than other SOM variants and clustering algorithms in visualization, mapping and clustering.
18.
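The self-organizing map (SOM) algorithm, an unsupervised learning algorithm for clustering and high-dimensional visualization, provides a powerful tool for clustering Chinese Web documents. However, SOM is inherently sensitive to the initial network weights, which affects clustering quality. To address this, a genetic algorithm is introduced to optimize the SOM network, and a text clustering algorithm based on a genetic-algorithm-optimized SOM network (GSTCA) is proposed. Comparative experiments show that GSTCA achieves higher accuracy than SOM in clustering Chinese Web documents, with the F-measure improving by 14% on average, and that GSTCA is insensitive to the initial network weights, which improves the stability of the algorithm.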
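To make the role of the initial weights concrete, here is a minimal Python sketch of a plain one-dimensional SOM. It only illustrates the baseline algorithm and where the random initialization enters; it does not implement the genetic-algorithm optimization of GSTCA, whose details the abstract does not give. The function names, the linear decay schedules, and all default parameters are assumptions chosen for this example.

```python
import math
import random


def train_som(data, n_units, epochs=20, lr0=0.5, seed=0):
    """Train a 1-D SOM on tuples of floats and return the unit weights."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random initial weights: the trained map depends on this choice.
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    radius0 = max(n_units / 2.0, 1.0)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighborhood
        for x in data:
            bmu = best_matching_unit(x, weights)
            for j in range(n_units):
                # Gaussian neighborhood on the 1-D grid of unit indices.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    weights[j][d] += lr * h * (x[d] - weights[j][d])
    return weights


def best_matching_unit(x, weights):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: math.dist(x, weights[j]))
```

Calling `train_som` with different `seed` values on the same data gives one simple way to observe how much the final map varies with initialization.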
19.
Traditional field-based lithological mapping can be a time-consuming, costly and challenging endeavour when large areas need to be investigated, where terrain is remote and difficult to access and where the geology is highly variable over short distances. Consequently, rock units are often mapped at coarse scales, resulting in lithological maps that have generalised contacts which in many cases are inaccurately located. Remote sensing data, such as aerial photographs and satellite imagery, are commonly incorporated into geological mapping programmes to obtain geological information that is best revealed by overhead perspectives. However, spatial and spectral limitations of the imagery and dense vegetation cover can limit the utility of traditional remote sensing products. The advent of Airborne Light Detection And Ranging (LiDAR) as a remote sensing tool offers the potential to provide a novel solution to these problems because accurate and high-resolution topographic data can be acquired in either forested or non-forested terrain, allowing discrimination of individual rock types that typically have distinct topographic characteristics. This study assesses the efficacy of airborne LiDAR as a tool for detailed lithological mapping in the upper section of the Troodos ophiolite, Cyprus. Morphometric variables (including slope, curvature and surface roughness) were derived from a 4 m digital terrain model in order to quantify the topographic characteristics of four principal lithologies found in the area. An artificial neural network (the Kohonen Self-Organizing Map) was then employed to classify the lithological units based upon these variables. The algorithm presented here was used to generate a detailed lithological map which defines lithological contacts much more accurately than the best existing geological map. In addition, a separate map of classification uncertainty highlights potential follow-up targets for ground-based verification.
The results of this study demonstrate the significant potential of airborne LiDAR for lithological discrimination and rapid generation of detailed lithological maps, as a contribution to conventional geological mapping programmes.
20.
Data clustering is an important data mining technique which partitions data according to some similarity criterion. Abundant algorithms have been proposed for clustering numerical data, and some recent research tackles the problem of clustering categorical or mixed data. Unlike the subtraction scheme used for numerical attributes, there is no standard for measuring distance between categorical values. In this article, we propose a distance representation scheme, distance hierarchy, which facilitates expressing the similarity between categorical values and also unifies distance measuring of numerical and categorical values. We then apply the scheme to mixed data clustering, in particular by integrating it with a hierarchical clustering algorithm. Consequently, this integrated approach can uniformly handle numerical data and categorical data, and also enables one to take the similarity between categorical values into consideration. Experimental results show that the proposed approach produces better clustering results than conventional clustering algorithms when categorical attributes are present and their values have different degrees of similarity.
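The idea of measuring distance between categorical values through a hierarchy can be sketched in Python as follows. This is a simplified reading, not the paper's full scheme: it assumes a tree with weighted links stored as a child-to-parent mapping and defines the distance between two values as the total link weight on the path joining them through their nearest common ancestor. The function name `hierarchy_distance`, the `parent` mapping layout, and the example categories are all hypothetical.

```python
def hierarchy_distance(a, b, parent):
    """Path length between nodes a and b in a weighted tree.

    parent maps each non-root node to (parent_node, link_weight).
    """
    # Accumulated distance from a to each of its ancestors.
    dist_from_a = {a: 0.0}
    node, acc = a, 0.0
    while node in parent:
        node, weight = parent[node][0], parent[node][1]
        acc += weight
        dist_from_a[node] = acc
    # Climb from b until reaching a node that is also an ancestor of a.
    node, acc = b, 0.0
    while node not in dist_from_a:
        if node not in parent:
            raise ValueError("nodes share no common ancestor")
        node, weight = parent[node][0], parent[node][1]
        acc += weight
    return dist_from_a[node] + acc


# Hypothetical hierarchy: Any -> {Drink, Appliance}, Drink -> {Coke, Pepsi}.
example_parent = {
    "Drink": ("Any", 1.0),
    "Appliance": ("Any", 1.0),
    "Coke": ("Drink", 1.0),
    "Pepsi": ("Drink", 1.0),
    "Oven": ("Appliance", 1.0),
}
```

With this example hierarchy, two values under the same parent (such as "Coke" and "Pepsi") end up closer than values in different branches (such as "Coke" and "Oven"), which is the kind of graded similarity that a flat match/mismatch measure cannot express.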