Similar articles
1.
In data mining applications, it is important to develop evaluation methods for selecting quality and profitable rules. This paper utilizes a non-parametric approach, Data Envelopment Analysis (DEA), to estimate and rank the efficiency of association rules with multiple criteria. The interestingness of association rules is conventionally measured based on support and confidence. For specific applications, domain knowledge can be further designed as measures to evaluate the discovered rules. For example, in market basket analysis, the product value and cross-selling profit associated with the association rule can serve as essential measures of rule interestingness. In this paper, these domain measures are also included in the rule ranking procedure for selecting valuable rules for implementation. An example of market basket analysis is used to illustrate the DEA-based methodology for measuring the efficiency of association rules with multiple criteria.
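The two conventional measures named in this abstract, support and confidence, can be sketched as follows. This is only an illustrative sketch: the function names and the toy baskets are hypothetical, and the DEA-based ranking itself is not reproduced here.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], items: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    itemset = frozenset(items)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """support(antecedent and consequent together) / support(antecedent)."""
    antecedent_set = frozenset(antecedent)
    base = support(transactions, antecedent_set)
    if base == 0.0:
        return 0.0
    both = support(transactions, antecedent_set | frozenset(consequent))
    return both / base


# Hypothetical toy market baskets, for illustration only.
BASKETS = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
```

For the rule bread -> milk on these baskets, the support of {bread, milk} is 2/4 and the confidence is (2/4) / (3/4) = 2/3.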

2.
A two-class classification problem is considered where the objects to be classified are bags of instances in d-space. The classification rule is defined in terms of an open d-ball. A bag is labeled positive if it meets the ball and labeled negative otherwise. Determining the center and radius of the ball is modeled as an SVM-like margin optimization problem. Necessary optimality conditions are derived, leading to a polynomial algorithm in fixed dimension. A VNS-type heuristic is developed and experimentally tested. The methodology is extended to classification by several balls and to more than two classes.

3.
In this paper we introduce a method called CL.E.D.M. (CLassification through ELECTRE and Data Mining), which employs aspects of the methodological framework of the ELECTRE I outranking method and aims at increasing the accuracy of existing data mining classification algorithms. In particular, the method chooses the best decision rules extracted from the training process of the data mining classification algorithms, and then assigns the classes that correspond to these rules to the objects that must be classified. Three well-known data mining classification algorithms are tested on five different widely used databases to verify the robustness of the proposed method.

4.
The fuzzy c partition of a set of qualitative data is the problem of selecting the optimal c centroids that are the most representative of the whole population. Moreover, a set of weights $w_{ij}$ must be determined, describing the fuzzy membership function of pattern i to the cluster represented by centroid j. Both problems are formulated as a single mathematical programming problem that is an extension of the classic p-median models often used for clustering. The new objective function is neither concave nor convex, and the application requires the clustering of many thousands of data points, so heuristic methods must be developed to find the best fuzzy partition.

5.
Elicitation of classification rules by fuzzy data mining   Cited by: 1 (self-citations: 0, citations by others: 1)
Data mining techniques can be used to find potentially useful patterns from data and to ease the knowledge acquisition bottleneck in building prototype rule-based systems. Based on the partition methods presented in the simple-fuzzy-partition-based method (SFPBM) proposed by Hu et al. (Comput. Ind. Eng. 43(4) (2002) 735), the aim of this paper is to propose a new fuzzy data mining technique consisting of two phases to find fuzzy if–then rules for classification problems: one to find frequent fuzzy grids by using a pre-specified simple fuzzy partition method to divide each quantitative attribute, and the other to generate fuzzy classification rules from frequent fuzzy grids. To improve the classification performance of the proposed method, we incorporate the adaptive rules proposed by Nozaki et al. (IEEE Trans. Fuzzy Syst. 4(3) (1996) 238) into our method to adjust the confidence of each classification rule. Regarding classification generalization ability, the simulation results on the iris data demonstrate that the proposed method may effectively derive fuzzy classification rules from training samples.

6.
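Huo Weigang (霍纬纲), Gao Xiaoxia (高小霞). Control and Decision (《控制与决策》), 2012, 27(12): 1833-1838
A fuzzy associative classification method suited to multi-class imbalanced distributions is proposed. As its genetic optimization objectives, the method minimizes the weighted classification error rate of the training samples during the iterations of AdaBoost.M1W ensemble learning, the number of fuzzy associative classification rules in each sub-classifier, and the number of fuzzy items contained in the rules, thereby achieving a good integration of AdaBoost.M1W with the fuzzy associative classification modeling process. Experimental comparisons on five multi-class imbalanced UCI benchmark data sets against existing data preprocessing methods for imbalanced classification show that the proposed method can significantly improve the classification performance of fuzzy associative classification models under multi-class imbalance.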

7.
In the supervised classification framework, human supervision is required for labeling a set of learning data which are then used for building the classifier. However, in many applications, human supervision is either imprecise, difficult or expensive. In this paper, the problem of learning a supervised multi-class classifier from data with uncertain labels is considered and a model-based classification method is proposed to solve it. The idea of the proposed method is to confront an unsupervised modeling of the data with the supervised information carried by the labels of the learning data in order to detect inconsistencies. The method is then able to build a robust classifier that takes into account the detected inconsistencies in the labels. Experiments on artificial and real data are provided to highlight the main features of the proposed method, as well as an application to object recognition under weak supervision.

8.
The graph is a powerful representation formalism that has been widely employed in machine learning and data mining. In this paper, we present a graph-based classification method, consisting of the construction of a special graph referred to as the K-associated graph, which is capable of representing similarity relationships among data cases and the proportion of class overlap. The main properties of K-associated graphs as well as the classification algorithm are described. Experimental evaluation indicates that the proposed technique captures the topological structure of the training data and leads to good results on classification tasks, particularly for noisy data. In comparison to other well-known classification techniques, the proposed approach shows the following interesting features: (1) A new measure, called purity, is introduced not only to characterize the degree of overlap among classes in the input data set, but also to construct the K-associated optimal graph for classification. (2) It performs nonlinear classification with automatic local adaptation according to the input data. In contrast to the K-nearest neighbor classifier, which uses a fixed K, the proposed algorithm is able to automatically consider different values of K, in order to best fit the corresponding overlap of classes in different data subspaces, revealing both the local and global structure of the input data. (3) The proposed classification algorithm is nonparametric, implying high efficiency and no need for model selection in practical applications.

9.
Repeatable approaches for mapping saltcedar (Tamarix spp.) at regional scales, with the ability to detect low-density stands, are crucial for the species' effective control and management, as well as for an improved understanding of its current and potential future dynamics. This study had the objective of testing subpixel classification techniques based on linear and nonlinear spectral mixture models in order to identify the best possible classification technique for repeatable mapping of saltcedar canopy cover along the Forgotten River reach of the Rio Grande. The suite of methods tested was meant to represent various levels of constraints imposed on the solution as well as varying levels of classification detail (species level and landscape level), sources for endmembers (space-borne multispectral image, airborne hyperspectral image and in situ spectral measurements) and mixture modes (linear and nonlinear). A multiple scattering approximation (MSA) model was proposed as a means to represent canopy (image) reflectance spectra as a nonlinear combination of subcanopy (field) reflectance spectra. The accuracy of subpixel canopy cover was assessed through a 1-m spatial-resolution hyperspectral image and field measurements. Results indicated that: 1) When saltcedar was represented by one single image spectrum (endmember), the unconstrained linear spectral unmixing with post-classification normalization produced comparable accuracy (OA = 72%) to that delivered by partially and fully constrained linear spectral unmixing (63-72%) and even by nonlinear spectral unmixing (73%). 2) The accuracy of the fully constrained linear spectral unmixing method increased (from 67% to 77%) when the classes were represented with several image spectra. 3) Saltcedar canopy reflectance showed the strongest nonlinear relationship with respect to subcanopy reflectance, as indicated through a range of estimated canopy recollision probabilities. 4) Despite the consideration of these effects on canopy reflectance, the inversion of the nonlinear spectral mixing model with subcanopy reflectance (field) measurements yielded slightly lower accuracy (73%) than the linear counterpart (77%). Implications of these results for region-wide monitoring of saltcedar invasion are also discussed.

10.
The k nearest neighbor (k-NN) classifier has been a widely used nonparametric technique in Pattern Recognition because of its simplicity and good performance. In order to decide the class of a new prototype, the k-NN classifier performs an exhaustive comparison between the prototype to classify and the prototypes in the training set T. However, when T is large, the exhaustive comparison is expensive. For this reason, many fast k-NN classifiers have been developed. Some of them are based on a tree structure, which is created during a preprocessing phase using the prototypes in T. Then, in a search phase, the tree is traversed to find the nearest neighbor. The speedup is obtained because the exploration of some parts of the tree is avoided using pruning rules, which are usually based on the triangle inequality. However, in soft sciences such as Medicine, Geology, Sociology, etc., the prototypes are usually described by numerical and categorical attributes (mixed data), and sometimes the comparison function for computing the similarity between prototypes does not satisfy metric properties. Therefore, in this work an approximate fast k most similar neighbor classifier based on a tree structure (Tree k-MSN), for mixed data and similarity functions that do not satisfy metric properties, is proposed. Some experiments with synthetic and real data are presented.

11.
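The weighted KNN (k-nearest neighbor) method uses only the class information provided by the k nearest training samples and does not consider the contribution of the test samples, which often leads to misclassifications. To address this shortcoming, a semi-supervised KNN classification method is proposed. The method classifies both sequential and non-sequential samples well. When making the classification decision, it also takes into account the contribution of the c nearest test samples, which improves classification accuracy. On the Cohn-Kanade face database, the recognition rate for sequential images improved by 5.95%; on the CMU-AMP face database, the recognition rate for non-sequential images improved by 7.98%. The experimental results show that the method runs efficiently and classifies well.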
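The baseline that this abstract starts from, weighted KNN using only the k nearest training samples, can be sketched as below. The inverse-distance weighting, the function name, and the parameters are assumptions made for illustration; the abstract does not specify the weighting scheme, and the semi-supervised extension with the c nearest test samples is not reproduced.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def weighted_knn_predict(
    train: List[Tuple[Sequence[float], str]],
    query: Sequence[float],
    k: int = 3,
    eps: float = 1e-9,
) -> str:
    """Predict a label by inverse-distance-weighted voting among the
    k nearest training samples (Euclidean distance)."""
    if not train:
        raise ValueError("training set must not be empty")
    # Pair each training label with its distance to the query, nearest first.
    by_distance = sorted(
        ((math.dist(x, query), label) for x, label in train),
        key=lambda pair: pair[0],
    )
    votes: Dict[str, float] = defaultdict(float)
    for distance, label in by_distance[:k]:
        # Closer neighbors contribute larger votes; eps avoids division by zero.
        votes[label] += 1.0 / (distance + eps)
    return max(votes, key=lambda label: votes[label])
```

With training points 0.0 (class "a"), 3.0 and 3.5 (class "b"), a query at 0.5 and k = 3, the single close "a" neighbor outweighs the two distant "b" neighbors, so the weighted vote differs from a plain majority vote.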

12.
We propose an efficient approach, FSKNN, which employs a fuzzy similarity measure (FSM) and k nearest neighbors (KNN), for multi-label text classification. One of the problems associated with KNN-like approaches is their demanding computational cost in finding the k nearest neighbors from all the training patterns. For FSKNN, FSM is used to group the training patterns into clusters. Then only the training documents in those clusters whose fuzzy similarities to the document exceed a predesignated threshold are considered in finding the k nearest neighbors for the document. An unseen document is labeled based on its k nearest neighbors using the maximum a posteriori estimate. Experimental results show that our proposed method can work more effectively than other methods.

13.
In this paper, we introduce a method for the identification of fuzzy measures from sample data. It is implemented using genetic algorithms and is flexible enough to allow the use of different subfamilies of fuzzy measures for the learning, such as k-additive or p-symmetric measures. The experiments performed to test the algorithm suggest that it is robust in situations where there is noise in the data considered. We also explore some possibilities for the choice of the initial population, which lead to the study of the extremes of some subfamilies of fuzzy measures, as well as the proposal of a method for random generation of fuzzy measures.

14.
Extracting fuzzy classification rules from partially labeled data   Cited by: 1 (self-citations: 1, citations by others: 0)
The interpretability and flexibility of fuzzy if-then rules make them a popular basis for classifiers. It is common to extract them from a database of examples. However, the data available in many practical applications are often unlabeled, and must be labeled manually by the user or by expensive analyses. The idea of semi-supervised learning is to use as much labeled data as available and try to additionally exploit the information in the unlabeled data. In this paper we describe an approach to learn fuzzy classification rules from partially labeled datasets.

15.
In recent years, there have been numerous attempts to extend the k-means clustering protocol for a single database to a distributed multiple-database setting while preserving the privacy of each data site. Current solutions for multiparty k-means clustering (with two or more parties), built on one or more secure two-party computation algorithms, are not equally contributory; in other words, the parties do not contribute equally to the k-means clustering. This may lead to a perfidious attack in which a party who learns the outcome before the other parties lies to them about the outcome. In this paper, we present an equally contributory multiparty k-means clustering protocol for vertically partitioned data, in which each party contributes equally to the k-means clustering. Our protocol is built on ElGamal's encryption scheme, Jakobsson and Juels's plaintext equivalence test protocol, and mix networks, and protects privacy in the sense that each iteration of k-means clustering can be performed without revealing the intermediate values.

16.
In fuzzy rule-based classification systems, rule weights have often been used to improve classification accuracy. In past research, a number of heuristic methods for rule weight specification have been proposed. In this paper, a method of fuzzy rule weight specification using Receiver Operating Characteristic (ROC) analysis is proposed. In order to specify the weight of a fuzzy rule, 2-class ROC analysis is used to find the threshold at which the rule achieves its maximum accuracy. This threshold is used as the weight of the rule. The proposed method is compared with existing ones through computer simulations on some well-known classification problems with continuous attributes. Simulation results show that the proposed method performs better than existing methods of fuzzy rule weight specification.
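The threshold search described in this abstract can be illustrated with a minimal sketch. It assumes that each training pattern already has a score for the rule (for example, its compatibility grade with the rule's antecedent) and a binary label; the function name and the scoring convention are hypothetical and not taken from the paper.

```python
from typing import Optional, Sequence, Tuple


def best_accuracy_threshold(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, float]:
    """Return (threshold, accuracy) for the candidate threshold that maximizes
    accuracy when a pattern is predicted positive if score >= threshold.
    Candidates are the distinct observed scores; ties keep the lowest one."""
    if len(scores) != len(labels) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and equal in length")
    best_threshold: Optional[float] = None
    best_accuracy = -1.0
    for threshold in sorted(set(scores)):
        # Count patterns whose thresholded prediction matches the label.
        correct = sum(
            1 for s, y in zip(scores, labels) if (s >= threshold) == (y == 1)
        )
        accuracy = correct / len(scores)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    assert best_threshold is not None
    return best_threshold, best_accuracy
```

For scores [0.1, 0.4, 0.35, 0.8] with labels [0, 0, 1, 1], the thresholds 0.35 and 0.8 both reach an accuracy of 0.75, and the sketch returns the lower one, 0.35.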

17.
This paper proposes a novel approach to ranking fuzzy numbers based on the left and right deviation degree (L-R deviation degree). In the approach, the maximal and minimal reference sets are defined to measure the L-R deviation degree of a fuzzy number, and then the transfer coefficient is defined to measure the relative variation of the L-R deviation degree of a fuzzy number. Furthermore, the ranking index value is obtained based on the L-R deviation degree and relative variation of fuzzy numbers. Additionally, five numerical examples are used to compare the proposed approach with existing approaches. The comparative results illustrate that the approach proposed in this paper is simpler and better.

18.
After the introduction of fuzzy sets by Zadeh, there have been a number of generalizations of this fundamental concept. The notion of intuitionistic fuzzy sets introduced by Atanassov is one among them. In this paper, we apply the concept of an intuitionistic fuzzy set to Hv-modules. The notion of an intuitionistic fuzzy Hv-submodule of an Hv-module is introduced, and some related properties are investigated. Characterizations of intuitionistic fuzzy Hv-submodules are given.

19.
Background: Sedentary behaviors are associated with the development of noncommunicable diseases (NCD) such as cardiovascular diseases (CVD), type 2 diabetes, and cancer. Accelerometers and inclinometers have been used to estimate sedentary behaviors; however, a major limitation is that these devices do not provide enough contextual information to recognize the specific sedentary behavior performed, e.g., sitting or lying watching TV, using the PC, sitting at work, driving, etc. Objective: To propose and evaluate the precision of a mobile system for objectively measuring six sedentary behaviors using accelerometer and location data. Results: The system is implemented as an Android mobile app, which identifies an individual's sedentary behaviors based on accelerometer data taken from a smartphone or a smartwatch, and symbolic location data obtained from Bluetooth Low Energy (BLE) beacons. The system infers sedentary behaviors by means of a supervised machine learning classifier. The precision of the classification of five of the six studied sedentary behaviors exceeded 95% using accelerometer data from a smartwatch worn on the wrist and 98% using accelerometer data from a smartphone carried in the pocket. A statistically significant improvement in the average classification precision due to the use of BLE beacons was found by comparing classification using accelerometer data only with classification that also uses BLE beacon localization. Conclusions: The proposed system provides contextual information on specific sedentary behaviors by inferring with very high precision the physical location where the sedentary event occurs. Moreover, it was found that the classification precision improves when the accelerometer is carried in the user's pocket instead of on the wrist, and when symbolic location is inferred using BLE beacons. In practice, the proposed system has the potential to contribute to the understanding of the context and determinants of sedentary behaviors, which is necessary for the implementation and monitoring of personalized noncommunicable disease prevention programs, for instance by sending sedentary behavior alerts or providing personalized recommendations on physical activity. The system could be used at work to promote active breaks and healthy habits.

20.
To protect individual privacy in data mining, when a miner collects data from respondents, the respondents should remain anonymous. The existing technique of Anonymity-Preserving Data Collection partially solves this problem, but it assumes that the data do not contain any identifying information about the corresponding respondents. On the other hand, the existing technique of Privacy-Enhancing k-Anonymization can make the collected data anonymous by eliminating the identifying information. However, it assumes that each respondent submits her data through an unidentified communication channel. In this paper, we propose k-Anonymous Data Collection, which has the advantages of both Anonymity-Preserving Data Collection and Privacy-Enhancing k-Anonymization but does not rely on their assumptions described above. We give rigorous proofs for the correctness and privacy of our protocol, and experimental results for its efficiency. Furthermore, we extend our solution to the fully malicious model, in which a dishonest participant can deviate from the protocol and behave arbitrarily.
