Similar Documents
20 similar documents found.
1.
Neural Computing and Applications - Naive Bayes makes an assumption regarding conditional independence, but this assumption rarely holds true in real-world applications, so numerous attempts have...

2.
Due to its simplicity, efficiency and efficacy, naive Bayes (NB) continues to be one of the top 10 data mining algorithms. Many improved approaches to NB have been proposed to weaken its conditional independence assumption. However, there has been little work, up to the present, on instance weighting filter approaches to NB. In this paper, we propose a simple, efficient, and effective instance weighting filter approach to NB. We call it attribute (feature) value frequency-based instance weighting and denote the resulting improved model as attribute value frequency weighted naive Bayes (AVFWNB). In AVFWNB, the weight of each training instance is defined as the inner product of its attribute value frequency vector and the attribute value number vector. The experimental results on 36 widely used classification problems show that AVFWNB significantly outperforms NB, yet at the same time maintains the computational simplicity that characterizes NB.
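The instance weight described above can be sketched in a few lines. This is a minimal reading of the definition in the abstract (inner product of the instance's attribute-value frequency vector and the per-attribute distinct-value-count vector); the paper may additionally normalize the weights, which is omitted here.

```python
from collections import Counter

def avfwnb_weights(X):
    """AVFWNB-style instance weights: for each training instance, the inner
    product of its attribute-value frequency vector and the vector of
    distinct-value counts per attribute. X: list of categorical instances."""
    n_attrs = len(X[0])
    # frequency of each value per attribute, and number of distinct values
    freq = [Counter(row[j] for row in X) for j in range(n_attrs)]
    n_values = [len(freq[j]) for j in range(n_attrs)]
    return [sum(freq[j][row[j]] * n_values[j] for j in range(n_attrs))
            for row in X]
```

Frequent attribute values thus pull an instance's weight up, so typical instances count more when the NB probability tables are estimated.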

3.
《Knowledge》2007,20(2):120-126
The naive Bayes classifier continues to be a popular learning algorithm for data mining applications due to its simplicity and linear run-time. Many enhancements to the basic algorithm have been proposed to help mitigate its primary weakness – the assumption that attributes are independent given the class. All of them improve the performance of naive Bayes at the expense (to a greater or lesser degree) of execution time and/or simplicity of the final model. In this paper we present a simple filter method for setting attribute weights for use with naive Bayes. Experimental results show that naive Bayes with attribute weights rarely degrades the quality of the model compared to standard naive Bayes and, in many cases, improves it dramatically. The main advantages of this method compared to other approaches for improving naive Bayes are its low run-time complexity and the fact that it maintains the simplicity of the final model.
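A common formulation of attribute-weighted naive Bayes raises each conditional likelihood to its attribute's weight; the filter in the paper supplies the weights, which are taken as given in this hedged sketch (function and parameter names are illustrative, not the paper's).

```python
import math
from collections import Counter

def wnb_predict(X, y, weights, query, laplace=1.0):
    """Attribute-weighted naive Bayes prediction for categorical data:
    score(c) = log P(c) + sum_j w_j * log P(x_j | c), with Laplace smoothing.
    X: training instances; y: class labels; weights: per-attribute weights."""
    classes, n = Counter(y), len(y)
    best, best_lp = None, -math.inf
    for c, nc in classes.items():
        lp = math.log(nc / n)
        for j, (v, w) in enumerate(zip(query, weights)):
            count_v = sum(1 for xi, yi in zip(X, y) if yi == c and xi[j] == v)
            n_vals = len({xi[j] for xi in X})
            lp += w * math.log((count_v + laplace) / (nc + laplace * n_vals))
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

Setting every weight to 1 recovers standard naive Bayes; a weight of 0 effectively drops the attribute, which is why weighting subsumes attribute selection.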

4.
Due to data sparseness and attribute redundancy in high-dimensional data, clusters of objects often exist in subspaces rather than in the entire space. To effectively address this issue, this paper presents a new optimization algorithm for clustering high-dimensional categorical data, which is an extension of the k-modes clustering algorithm. In the proposed algorithm, a novel weighting technique for categorical data is developed to calculate two weights for each attribute (or dimension) in each cluster and use the weight values to identify the subsets of important attributes that categorize different clusters. The convergence of the algorithm under an optimization framework is proved. The performance and scalability of the algorithm are evaluated experimentally on both synthetic and real data sets. The experimental studies show that the proposed algorithm is effective in clustering categorical data sets and also scalable to large data sets owing to its linear time complexity with respect to the number of data objects, attributes or clusters.
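The core of such weighted k-modes variants is a weighted matching dissimilarity. The paper maintains two weights per attribute per cluster; the single-weight version below is only an illustrative sketch of how attribute weights (raised to an exponent `beta`, a name assumed here) steer cluster assignment.

```python
def weighted_mode_distance(x, mode, w, beta=2.0):
    """Weighted matching dissimilarity for k-modes-style clustering:
    a mismatch on attribute j costs w[j]**beta, so attributes with large
    weights dominate the assignment of objects to clusters."""
    return sum(w[j] ** beta for j in range(len(x)) if x[j] != mode[j])
```

Within each iteration the weights are typically re-estimated per cluster so that attributes whose values concentrate around the cluster mode receive larger weights, which is what singles out the subspace of important attributes.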

5.
This paper represents another step in overcoming a drawback of K-Means, its lack of defense against noisy features, by using feature weights in the criterion. The Weighted K-Means method by Huang et al. (2008, 2004, 2005) [5], [6], [7] is extended to the corresponding Minkowski metric for measuring distances. Under the Minkowski metric the feature weights become intuitively appealing feature rescaling factors in a conventional K-Means criterion. To see how this can be used in addressing another issue of K-Means, the initial setting, a method to initialize K-Means with anomalous clusters is adapted. The Minkowski metric based method is experimentally validated on datasets from the UCI Machine Learning Repository and generated sets of Gaussian clusters, both as they are and with additional uniform random noise features, and appears to be competitive in comparison with other K-Means based feature weighting algorithms.
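The "feature weights as rescaling factors" observation can be made concrete: in the Minkowski Weighted K-Means criterion, the weight enters at the same power p as the metric, which is equivalent to rescaling each feature by its weight before running plain K-Means. A minimal sketch of that distance:

```python
def minkowski_weighted(x, c, w, p):
    """Minkowski weighted distance between a point x and a center c:
    sum_v (w[v] * |x[v] - c[v]|) ** p. Because w[v] multiplies the
    coordinate difference inside the power, it acts as a per-feature
    rescaling factor in an otherwise conventional K-Means criterion."""
    return sum((w[v] * abs(x[v] - c[v])) ** p for v in range(len(x)))
```

With p = 2 this reduces to the familiar weighted squared Euclidean distance; noisy features end up with small weights and thus contribute little to the criterion.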

6.
Artificial Intelligence Review - Attribute weighting is a task of paramount relevance in multi-attribute decision-making (MADM). Over the years, different approaches have been developed to face...

7.

In the fields of pattern recognition and machine learning, data preprocessing algorithms have been used increasingly in recent years to achieve high classification performance. In particular, applying a data preprocessing method before the classification algorithm has become essential when classifying medical datasets with nonlinear and imbalanced data distributions. In this study, a new data preprocessing method is proposed for classifying the Parkinson, hepatitis, Pima Indians, single proton emission computed tomography (SPECT) heart, and thoracic surgery medical datasets, all of which have nonlinear and imbalanced data distributions. These datasets were taken from the UCI Machine Learning Repository. The proposed data preprocessing method consists of three steps. In the first step, the cluster centers of each attribute are calculated using k-means, fuzzy c-means, and mean shift clustering algorithms. In the second step, the absolute differences between the data in each attribute and the cluster centers are calculated, and then the average of these differences is computed for each attribute. In the final step, the weighting coefficients are calculated from the mean value of the differences to the cluster centers, and then weighting is performed by multiplying the obtained weight coefficients by the attribute values in the dataset. Three attribute weighting methods are proposed: (1) similarity-based attribute weighting in k-means clustering, (2) similarity-based attribute weighting in fuzzy c-means clustering, and (3) similarity-based attribute weighting in mean shift clustering. The aim is to aggregate the data in each class with the proposed attribute weighting methods and to reduce the within-class variance. By reducing the variance in each class, the data in each class are pulled together while the discrimination between classes is further increased. For comparison with other methods in the literature, random subsampling is used to handle imbalanced dataset classification. After the attribute weighting process, four classification algorithms, linear discriminant analysis, the k-nearest neighbor classifier, the support vector machine, and the random forest classifier, are used to classify the imbalanced medical datasets. To evaluate the performance of the proposed models, classification accuracy, precision, recall, area under the ROC curve, κ value, and F-measure are used. In training and testing the classifier models, three methods are used: the 50–50% train–test holdout, the 60–40% train–test holdout, and tenfold cross-validation. The experimental results show that the proposed attribute weighting methods obtain higher classification performance than the random subsampling method in classifying imbalanced medical datasets.
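The three steps above can be sketched as follows. The abstract leaves the exact coefficient formula ambiguous; this sketch assumes one plausible reading, namely that each attribute's weight is the ratio of the attribute mean to the mean absolute distance from that attribute's nearest cluster center, and it takes the 1-D centers (from k-means, fuzzy c-means, or mean shift) as given.

```python
import statistics

def similarity_weights(X, centers_per_attr):
    """Step 2 and 3 of the similarity-based weighting, under the assumption
    stated in the lead-in. centers_per_attr[j] lists the 1-D cluster centers
    computed for attribute j in step 1."""
    n_attrs = len(X[0])
    weights = []
    for j in range(n_attrs):
        col = [row[j] for row in X]
        # step 2: mean absolute difference to the nearest center
        diffs = [min(abs(v - c) for c in centers_per_attr[j]) for v in col]
        mean_diff = statistics.mean(diffs)
        # step 3: weighting coefficient (guard against zero spread)
        weights.append(statistics.mean(col) / mean_diff if mean_diff else 1.0)
    return weights

def apply_weights(X, weights):
    """Weighting: multiply each attribute value by its coefficient,
    pulling the data within each class closer together."""
    return [[v * w for v, w in zip(row, weights)] for row in X]
```

Attributes whose values sit tightly around their cluster centers get large coefficients, which is how the method shrinks within-class variance.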

8.
The quantifier-guided aggregation is used for aggregating the multiple-criteria input. Therefore, the selection of appropriate quantifiers is crucial in multicriteria aggregation since the weights for the aggregation are generated from the selected quantifier. Since Yager proposed a method for obtaining the ordered weighted averaging (OWA) vector via the three relative quantifiers used for the quantifier-guided aggregation, limited efforts have been devoted to developing new quantifiers that are suitable for use in multicriteria aggregation. In this correspondence, we propose some new quantifier functions that are based on the weighting functions characterized by showing a constant value of orness independent of the number of criteria aggregated. The proposed regular increasing monotone and regular decreasing monotone quantifiers produce the same orness as the weighting functions from which each quantifier function originates. Further, the quantifier orness rapidly converges into the value of orness of the weighting functions having a constant value of orness. This result indicates that a quantifier-guided OWA aggregation will result in a similar aggregate in case the number of criteria is not too small.
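Yager's quantifier-guided construction, which this work builds on, derives the OWA weight vector from a regular increasing monotone (RIM) quantifier Q via w_i = Q(i/n) - Q((i-1)/n). A minimal sketch, together with the standard orness measure discussed in the abstract:

```python
def owa_weights_from_rim(q, n):
    """OWA weights from a RIM quantifier Q (q: [0,1] -> [0,1], q(0)=0,
    q(1)=1, non-decreasing): w_i = Q(i/n) - Q((i-1)/n)."""
    return [q(i / n) - q((i - 1) / n) for i in range(1, n + 1)]

def orness(w):
    """Degree of orness of an OWA weight vector:
    (1/(n-1)) * sum_i (n-i) * w_i."""
    n = len(w)
    return sum((n - i) * w[i - 1] for i in range(1, n + 1)) / (n - 1)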

9.
To address the shortcomings of the K-Modes algorithm, this paper proposes TrustCCluster, a trust-value-based algorithm for clustering categorical attributes. The algorithm does not require the number of clusters to be given in advance, and its clustering results are stable and do not depend on the choice of initial values. TrustCCluster was validated on real data and compared with the K-Modes and P-Modes algorithms; the experimental results show that TrustCCluster is effective and feasible.

10.
11.
An Efficient Attribute-Based Encryption Scheme with Attribute Revocation
王锦晓  张旻  陈勤 《计算机应用》2012,32(Z1):39-43
Ciphertext-policy attribute-based encryption (CP-ABE) enables data-owner-defined access control over outsourced encrypted data, which makes it promising for fine-grained access control in data sharing; however, in practical systems the problem of attribute revocation still urgently needs to be solved. Building on a scheme that combines proxy re-encryption with CP-ABE, this paper introduces Shamir's secret-sharing technique and a tree access structure, combining threshold and Boolean operations while shortening the key and ciphertext lengths. Compared with previous schemes, it offers clear improvements in efficiency and expressiveness; moreover, the scheme is secure under the decisional bilinear assumption.
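The Shamir secret sharing that the scheme borrows can be sketched on its own: split a secret into n shares so that any t of them reconstruct it. This is a standalone illustration over a prime field; real CP-ABE constructions embed the shares in bilinear pairing groups, which is beyond a sketch.

```python
import random

P = 2**61 - 1  # a Mersenne prime field; illustrative choice, not the paper's

def share(secret, t, n):
    """Split `secret` into n Shamir shares with threshold t: sample a random
    degree-(t-1) polynomial with constant term `secret`, evaluate at 1..n."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Recover the secret from >= t shares via Lagrange interpolation at 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret
```

In a tree access structure, each threshold gate distributes its parent's secret to its children this way, which is how threshold and Boolean operations combine.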

12.
刘军  贾宏慧 《计算机工程与设计》2006,27(21):4115-4116,4119
Assessing the threat level of air-raid targets involves many uncertain factors, some of which may be impossible to obtain. How to evaluate target threat when the attribute values of aerial targets are not fully determined is a thorny problem. This paper proposes a target threat assessment method based on multi-attribute decision making with incomplete information, which provides a new way to solve such problems; a worked example demonstrates the feasibility of the method.

13.
This paper is concerned with a method for multiple attribute decision making under a fuzzy environment, in which the preference values take the form of triangular fuzzy numbers. Based on the idea that an attribute with a larger deviation value among alternatives should be assigned a larger weight, a linear programming model about the maximal deviation of weighted attribute values is established. Therefore, an approach to deal with attribute weights which are completely unknown is developed by using the expected value operator of fuzzy variables. Furthermore, in order to make a decision or choose the optimum alternative, an expected value method is presented under the assumption that attribute weights are fully known. The method not only avoids complex comparisons of fuzzy numbers, but also has the advantages of simple operation and easy calculation. Finally, a numerical example is used to illustrate the proposed approach at the end of this paper.
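The maximizing-deviation idea has a well-known normalized closed form: an attribute's weight is proportional to the total pairwise deviation of its values across alternatives. This sketch assumes the fuzzy scores have already been defuzzified (e.g. by the expected value operator); the paper itself solves a linear program directly over triangular fuzzy numbers.

```python
def maximizing_deviation_weights(A):
    """Weights proportional to total pairwise deviation per attribute.
    A[i][j]: (defuzzified) score of alternative i on attribute j."""
    m, n = len(A), len(A[0])
    dev = [sum(abs(A[i][j] - A[k][j]) for i in range(m) for k in range(m))
           for j in range(n)]
    total = sum(dev)
    return [d / total for d in dev]
```

An attribute on which every alternative scores the same gets weight 0, matching the intuition that it cannot help discriminate between alternatives.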

14.
Attribute grammars (AGs) are a suitable formalism for the development of language processing systems. However, for languages including unrestricted labeled jumps, such as "goto" in C, the optimizers in compilers are difficult to write in AGs. This is due to two problems that few previous researchers could deal with simultaneously, i.e., references to attribute values on distant nodes and circularity in attribute dependency. This paper proposes circular remote attribute grammars (CRAGs), an extension of AGs that allows (1) direct relations between two distant attribute instances through pointers referring to other nodes in the derivation tree, and (2) circular dependencies, under certain conditions including those that arise from remote references. This extension gives AG programmers a natural means of describing language processors and programming environments for languages that include any type of jump structure. We also show a method of constructing an efficient evaluator for CRAGs called a mostly static evaluator. The performance of the proposed evaluator has been measured and compared with dynamic and static evaluators. Akira Sasaki: He is a research fellow of the Advanced Clinical Research Center in the Institute of Medical Science at the University of Tokyo. He received his BSc and MSc from Tokyo Institute of Technology, Japan, in 1994 and 1996, respectively. His research interests include programming languages, programming language processors and programming environments, especially compiler compilers, attribute grammars and systematic debugging. He is a member of the Japan Society for Software Science and Technology. Masataka Sassa, D.Sc.: He is Professor of Computer Science at Tokyo Institute of Technology. He received his BSc, MSc and DSc from the University of Tokyo, Japan, in 1970, 1972 and 1978, respectively. His research interests include programming languages, programming language processors and programming environments; currently he is focusing on compiler optimization, compiler infrastructure, attribute grammars and systematic debugging. He is a member of the ACM, IEEE Computer Society, Japan Society for Software Science and Technology, and Information Processing Society of Japan.

15.
16.
As k-anonymity privacy-preserving strategies have developed, the data quality and the security of a published table have come to constrain each other, and balancing this trade-off has also been a key focus in research on privacy protection for tables with diverse sensitive values. However, when tables whose sensitive attribute values are identical are protected by generalization, the existing evaluation theory is not suited to measuring the utility and security of such data. To address this shortcoming, this paper proposes an entropy-based scheme for evaluating generalization algorithms on tables with identical sensitive values. The scheme introduces the concepts of weighted attribute entropy and linkage-matching entropy: weighted attribute entropy computes the data loss according to the importance of each attribute, while linkage-matching entropy takes the amount of information a linkage attack on the table must expend to correctly match tuples as the security measure. Finally, the proposed scheme is used to evaluate tables processed by two generalization algorithms, enriching the evaluation framework for generalization algorithms under identical sensitive values.
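The weighted attribute entropy described above can be sketched as importance-scaled Shannon entropy per attribute. This is a hedged reading of the concept: the paper's exact loss formula may combine the per-attribute terms differently, and the importance weights are taken as given.

```python
import math
from collections import Counter

def weighted_attribute_entropy(table, importance):
    """Sum over attributes of importance[j] * H(attribute j), where H is
    the Shannon entropy of the attribute's value distribution in the table."""
    n = len(table)
    total = 0.0
    for j in range(len(table[0])):
        counts = Counter(row[j] for row in table)
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += importance[j] * h
    return total
```

Generalization collapses distinct values together, lowering each attribute's entropy; the drop in this weighted sum is then a natural measure of information loss that respects attribute importance.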

17.
Research on an Attribute Value Reduction Algorithm Based on Rough Sets
This paper analyzes the essence and process of attribute value reduction from a logical perspective, constructs a discernibility matrix on that basis, proposes a new rough-set-based attribute value reduction algorithm, and proves its correctness. The algorithm is simpler and more intuitive than previous ones, easier to implement, and makes the essence and process of attribute value reduction easier to understand; moreover, it does not destroy the information carried by the inconsistent rules in the decision system. A worked example shows that the algorithm is effective and feasible.
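The discernibility matrix mentioned above is a standard rough-set construction: for each pair of objects with different decisions, record the condition attributes that tell them apart. A minimal sketch (reduction itself, i.e. simplifying the resulting discernibility function, is not shown):

```python
def discernibility_matrix(objects, decisions):
    """For each pair of objects (i, k) with different decision values,
    return the set of condition-attribute indices on which they differ."""
    m = len(objects)
    matrix = {}
    for i in range(m):
        for k in range(i + 1, m):
            if decisions[i] != decisions[k]:
                matrix[(i, k)] = {j for j in range(len(objects[i]))
                                  if objects[i][j] != objects[k][j]}
    return matrix
```

Each entry is a clause "at least one of these attributes must be kept"; a reduct is any minimal attribute set hitting every non-empty entry.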

18.
Resolving domain incompatibility among independently developed databases often involves uncertain information. DeMichiel (1989) showed that uncertain information can be generated by the mapping of conflicting attributes to a common domain, based on some domain knowledge. We show that uncertain information can also arise when the database integration process requires information not directly represented in the component databases, but which can be obtained through some summary of data. We therefore propose an extended relational model based on the Dempster-Shafer theory of evidence to incorporate such uncertain knowledge about the source databases. The extended relation uses evidence sets to represent uncertainty in information, which allow probabilities to be attached to subsets of possible domain values. We also develop a full set of extended relational operations over the extended relations. In particular, an extended union operation has been formalized to combine two extended relations using Dempster's rule of combination. The closure and boundedness properties of our proposed extended operations are formulated. We also illustrate the use of extended operations by some query examples.
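Dempster's rule of combination, which the extended union relies on, merges two basic mass assignments by multiplying masses, intersecting focal sets, and renormalizing away the conflicting mass. A minimal sketch with focal sets as frozensets:

```python
def dempster_combine(m1, m2):
    """Dempster's rule: m12(A) = sum over B & C == A of m1(B)*m2(C),
    renormalized by 1 - K, where K is the mass assigned to empty
    intersections (the conflict)."""
    combined, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            a = b & c
            if a:
                combined[a] = combined.get(a, 0.0) + mb * mc
            else:
                conflict += mb * mc
    return {a: v / (1.0 - conflict) for a, v in combined.items()}
```

The rule is undefined when the conflict K reaches 1 (totally contradictory sources), which the sketch leaves unguarded for brevity.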

19.
20.
Estimation by analogy (EBA) predicts effort for a new project by aggregating effort information of similar projects from a given historical data set. Existing research results have shown that a careful selection and weighting of attributes may improve the performance of estimation methods. This paper continues along that research line and considers weighting of attributes in order to improve estimation accuracy. More specifically, the impact of weighting (and selection) of attributes is studied as extensions to our former EBA method AQUA, which has shown promising results and also allows estimation for data sets that have non-quantitative attributes and missing values. The new resulting method is called AQUA+. For attribute weighting, a qualitative analysis pre-step using rough set analysis (RSA) is performed. RSA is a proven machine learning technique for the classification of objects. We exploit the RSA results in different ways and define four heuristics for attribute weighting. AQUA+ was evaluated in two ways: (1) comparison between AQUA+ and AQUA, along with a comparative analysis of the proposed four heuristics for AQUA+, and (2) comparison of AQUA+ with other EBA methods. The main evaluation results are: (1) better estimation accuracy was obtained by AQUA+ compared to AQUA over all six data sets; and (2) AQUA+ obtained results better than, or very close to, those of other EBA methods for the three data sets applied to all the EBA methods. In conclusion, the proposed attribute weighting method using RSA can improve the estimation accuracy of the EBA method AQUA+ according to the empirical studies over six data sets. Testing more data sets is necessary to obtain results that are more statistically significant.
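The core EBA loop can be sketched in a few lines: score similarity with an attribute-weighted distance and average the effort of the k most similar projects. The weights are taken as given here (AQUA+ derives them from rough-set analysis); the distance, aggregation, and all names are illustrative simplifications, since AQUA also handles non-quantitative attributes and missing values, which this sketch does not.

```python
def estimate_by_analogy(history, query, weights, k=2):
    """Weighted estimation by analogy: rank historical projects by a
    weight-scaled absolute-difference distance to the query project, then
    return the mean effort of the k nearest.
    history: list of {'attrs': [...], 'effort': float}; query: attr list."""
    def dist(p):
        return sum(w * abs(a - b)
                   for w, a, b in zip(weights, p['attrs'], query))
    nearest = sorted(history, key=dist)[:k]
    return sum(p['effort'] for p in nearest) / k
```

Attribute weighting changes which historical projects count as "nearest", which is exactly where the accuracy gains reported above come from.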
