首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 78 毫秒
1.
分类问题是数据挖掘中的基本问题之一,时间序列的特征表示及相似性度量是时间序列数据挖掘中分类、聚类及模式发现等任务的基础。SAX方法是一种典型的时间序列符号化表示方法,在采用该方法的基础上对时间序列进行分类,不仅可以有效地降维、降噪,而且具有简单、直观等特点,但是该方法有可能造成信息损失并影响到分类结果的准确性。为了弥补信息损失对分类结果的影响,采用了集成学习中大多数投票方法来弥补BOP表示后的信息损失,从而提高整个分类器的效率。针对一些样本在BOP表示中都损失了相似的重要信息,以至于大多数投票无法进一步提高分类效率的问题,进一步提出了结合集成学习中AdaBoost算法,通过对训练样本权重的调整,从而达到以提高分类器性能来弥补信息损失的效果。实验结果表明,将BOP方法与集成学习相结合的方法框架,不仅能很好地处理SAX符号化表示中的信息损失问题,而且与已有方法相比,在分类准确度方面也有显著的提高。  相似文献   

2.
3.
Multimedia data mining refers to pattern discovery, rule extraction and knowledge acquisition from multimedia database. Two typical tasks in multimedia data mining are of visual data classification and clustering in terms of semantics. Usually performance of such classification or clustering systems may not be favorable due to the use of low-level features for image representation, and also some improper similarity metrics for measuring the closeness between multimedia objects as well. This paper considers a problem of modeling similarity for semantic image clustering. A collection of semantic images and feed-forward neural networks are used to approximate a characteristic function of equivalence classes, which is termed as a learning pseudo metric (LPM). Empirical criteria on evaluating the goodness of the LPM are established. A LPM based k-Mean rule is then employed for the semantic image clustering practice, where two impurity indices, classification performance and robustness are used for performance evaluation. An artificial image database with 11 semantics is employed for our simulation studies. Results demonstrate the merits and usefulness of our proposed techniques for multimedia data mining.  相似文献   

4.
关联分类是一项重要的分类技术,目前普遍采用基于支持度和置信度的关联分类模式。但是,用支持度度量项集的分类能力过于简单,且置信度不能度量项集与类的相关性,所以利用支持度和置信度容易产生质量不好的规则。提出改进的关联分类算法—ACSER。ACSER不仅考虑项集到本类的支持度,也考虑项集到补类的支持度。首先,提取频繁增比模式作为分类候选规则集;其次,利用置信度和增比率度量规则的强度,按照其强度进行排序和剪枝;最后,选择k条最优的规则进行预测。在16个UCI数据集上的实验结果表明,改进的分类算法ACSER与传统的分类算法相比有更高的分类准确率。  相似文献   

5.
Privacy preservation is becoming a critical issue to data‐mining processes. In practice, a data transformation process is often needed to preserve privacy. However, data transformation would introduce a data quality issue. In this case, the impact on data quality due to the data transformation should be estimated and made clear to the user of the data transformation process. In this article, we consider the problem of k‐anonymization transformation in associative classification. The privacy preservation and data quality issues are considered in twofold. First, we propose a frequency‐based data quality metric to represent the data quality for associative classification. Second, a novel heuristic algorithm, namely minimum classification correction rate transformation, is proposed. The algorithm is guided by the classification correction rate of the given datasets. We validate our proposed metric and algorithm with University of California–Irvine repository datasets. The experiment results have shown that our proposed metric can effectively demonstrate the data quality for associative classification. The results also show that the proposed algorithm is not only efficient but also highly effective.  相似文献   

6.
In the database retrieval and nearest neighbor classification tasks, the two basic problems are to represent the query and database objects, and to learn the ranking scores of the database objects to the query. Many studies have been conducted for the representation learning and the ranking score learning problems, however, they are always learned independently from each other. In this paper, we argue that there are some inner relationships between the representation and ranking of database objects, and try to investigate their relationships by learning them in a unified way. To this end, we proposed the Unified framework for Representation and Ranking (UR2) of objects for the database retrieval and nearest neighbor classification tasks. The learning of representation parameter and the ranking scores are modeled within one single unified objective function. The objective function is optimized alternately with regard to representation parameter and the ranking scores. Based on the optimization results, iterative algorithms are developed to learn the representation parameter and the ranking scores on a unified way. Moreover, with two different formulas of representation (feature selection and subspace learning), we give two versions of UR2. The proposed algorithms are tested on two challenging tasks – MRI image based brain tumor retrieval and nearest neighbor classification based protein identification. The experiments show the advantage of the proposed unified framework over the state-of-the-art independent representation and ranking methods.  相似文献   

7.
增量式关联分类方法在病毒检测中的应用   总被引:2,自引:2,他引:0       下载免费PDF全文
传统关联规则挖掘算法主要基于支持度一可信度构架,时空开销的限制使其无法深入挖掘非频繁项集。171前对带类属性的关联分类增量学习研究较少,该文提出一种新的增量式关联分类方法,解决了带类属性数据的增量学习问题,在数据频繁更新时,实现有限时空开销下关联规则的快速提取和维护。实验结果表明,该方法能有效维护并更新关联规则,避免重复学习历史样本,保证分类模型的预测能力。  相似文献   

8.
Text categorization is one of the most common themes in data mining and machine learning fields. Unlike structured data, unstructured text data is more difficult to be analyzed because it contains complicated both syntactic and semantic information. In this paper, we propose a two-level representation model (2RM) to represent text data, one is for representing syntactic information and the other is for semantic information. Each document, in syntactic level, is represented as a term vector where the value of each component is the term frequency and inverse document frequency. The Wikipedia concepts related to terms in syntactic level are used to represent document in semantic level. Meanwhile, we designed a multi-layer classification framework (MLCLA) to make use of the semantic and syntactic information represented in 2RM model. The MLCLA framework contains three classifiers. Among them, two classifiers are applied on syntactic level and semantic level in parallel. The outputs of these two classifiers will be combined and input to the third classifier, so that the final results can be obtained. Experimental results on benchmark data sets (20Newsgroups, Reuters-21578 and Classic3) have shown that the proposed 2RM model plus MLCLA framework improves the text classification performance by comparing with the existing flat text representation models (Term-based VSM, Term Semantic Kernel Model, Concept-based VSM, Concept Semantic Kernel Model and Term + Concept VSM) plus existing classification methods.  相似文献   

9.
We propose a novel framework for generating classification rules from relational data. This is a specialized version of the general framework intended for mining relational data and is defined in granular computing theory. In the framework proposed in this paper we define a method for deriving information granules from relational data. Such granules are the basis for generating relational classification rules. In our approach we follow the granular computing idea of switching between different levels of granularity of the universe. Thanks to this a granule-based relational data representation can easily be replaced by another one and thereby adjusted to a given data mining task, e.g. classification. A generalized relational data representation, as defined in the framework, can be treated as the search space for generating rules. On account of this the size of the search space may significantly be limited. Furthermore, our framework, unlike others, unifies not only the way the data and rules to be derived are expressed and specified, but also partially the process of generating rules from the data. Namely, the rules can be directly obtained from the information granules or constructed based on them.  相似文献   

10.
Concept drift constitutes a challenging problem for the machine learning and data mining community that frequently appears in real world stream classification problems. It is usually defined as the unforeseeable concept change of the target variable in a prediction task. In this paper, we focus on the problem of recurring contexts, a special sub-type of concept drift, that has not yet met the proper attention from the research community. In the case of recurring contexts, concepts may re-appear in future and thus older classification models might be beneficial for future classifications. We propose a general framework for classifying data streams by exploiting stream clustering in order to dynamically build and update an ensemble of incremental classifiers. To achieve this, a transformation function that maps batches of examples into a new conceptual representation model is proposed. The clustering algorithm is then applied in order to group batches of examples into concepts and identify recurring contexts. The ensemble is produced by creating and maintaining an incremental classifier for every concept discovered in the data stream. An experimental study is performed using (a) two new real-world concept drifting datasets from the email domain, (b) an instantiation of the proposed framework and (c) five methods for dealing with drifting concepts. Results indicate the effectiveness of the proposed representation and the suitability of the concept-specific classifiers for problems with recurring contexts.  相似文献   

11.
应用于空间关联规则挖掘的ILP方法   总被引:2,自引:0,他引:2  
李宏  蔡之华 《计算机工程与应用》2003,39(16):188-191,197
文章介绍了应用于空间关联规则挖掘的ILP方法。ILP方法全称为归纳逻辑程序设计,这种方法有利于从空间领域发现有价值的知识,系统地研究地理层的层次结构,处理诸多空间对象的空间特性。这种方法已在一个ILP系统SPADA中实现,该文将通过SPADA应用空间数据的一些实例来说明ILP方法的特点。  相似文献   

12.
In this paper we introduce a method called CL.E.D.M. (CLassification through ELECTRE and Data Mining), that employs aspects of the methodological framework of the ELECTRE I outranking method, and aims at increasing the accuracy of existing data mining classification algorithms. In particular, the method chooses the best decision rules extracted from the training process of the data mining classification algorithms, and then it assigns the classes that correspond to these rules, to the objects that must be classified. Three well known data mining classification algorithms are tested in five different widely used databases to verify the robustness of the proposed method.  相似文献   

13.
《Applied Soft Computing》2007,7(3):1102-1111
Classification and association rule discovery are important data mining tasks. Using association rule discovery to construct classification systems, also known as associative classification, is a promising approach. In this paper, a new associative classification technique, Ranked Multilabel Rule (RMR) algorithm is introduced, which generates rules with multiple labels. Rules derived by current associative classification algorithms overlap in their training objects, resulting in many redundant and useless rules. However, the proposed algorithm resolves the overlapping between rules in the classifier by generating rules that does not share training objects during the training phase, resulting in a more accurate classifier. Results obtained from experimenting on 20 binary, multi-class and multi-label data sets show that the proposed technique is able to produce classifiers that contain rules associated with multiple classes. Furthermore, the results reveal that removing overlapping of training objects between the derived rules produces highly competitive classifiers if compared with those extracted by decision trees and other associative classification techniques, with respect to error rate.  相似文献   

14.
Inducing Multi-Level Association Rules from Multiple Relations   总被引:4,自引:0,他引:4  
  相似文献   

15.
王省  康昭 《计算机科学》2021,48(3):124-129
近年来,基于图的半监督分类是机器学习与数据挖掘领域的研究热点之一。该类方法一般通过构造图来挖掘数据中隐含的信息,并利用图的结构信息来对无标签样本进行分类。因此,半监督分类的效果严重依赖于图的质量。文中提出了一种基于光滑表示的半监督分类算法。具体来说,此方法通过应用一个低通滤波器来实现数据的平滑,然后将光滑数据用于半监督分类。此外,所提方法将常见的图构造和标签传播集成到一个统一的优化框架中,使它们互相促进,从而避免低质量图导致的次优解。对人脸和物品数据集进行大量实验,结果表明,所提SRSSC算法在大部分情况下都优于其他算法,从而证明了光滑表示的重要性。  相似文献   

16.
Classification, a data mining technique, has widespread applications including medical diagnosis, targeted marketing, and others. Knowledge discovery from databases in the form of association rules is one of the important data mining tasks. An integrated approach, classification based on association rules, has drawn the attention of the data mining community over the last decade. While attention has been mainly focused on increasing classifier accuracies, not much efforts have been devoted towards building interpretable and less complex models. This paper discusses the development of a compact associative classification model using a hill-climbing approach and fuzzy sets. The proposed methodology builds the rule-base by selecting rules which contribute towards increasing training accuracy, thus balancing classification accuracy with the number of classification association rules. The results indicated that the proposed associative classification model can achieve competitive accuracies on benchmark datasets with continuous attributes and lend better interpretability, when compared with other rule-based systems.  相似文献   

17.
关联分类及较多的改进算法很难同时既具有较高的整体准确率又有较好的小类分类性能。针对此问题,提出了一种基于类支持度阈值独立挖掘的关联分类改进算法—ACCS。ACCS算法的主要特点是:(1)根据训练集中各类数量大小给出每个类类支持度阈值的设定方法,并基于各类的类支持度阈值独立挖掘该类的关联分类规则,尽量使小类生成更多高置信度的规则;(2)采用类支持度对置信度相同的规则排序,提高小类规则的优先级;(3)用综合考虑置信度和提升度的新的规则度量预测未知实例。在多个数据集上的实验结果表明,相比多种关联分类改进算法,ACCS算法有更高的整体分类准确率,且在不平衡数据上也能取得较好的小类分类性能。  相似文献   

18.
多关系数据分类是多关系数据挖掘重要任务之一,它能够直接从多关系数据表中发现有效模式,比命题分类方法具有更大优势。根据知识表示形式及相关策略的不同将多关系数据分类分为归纳逻辑程序设计关系分类方法、图的关系分类方法和基于关系数据库的关系分类方法。着重论述了它们所采用的具体关系分类技术及其特点,对这些方法进行了对比,最后讨论了它们当前所面临的挑战性问题。  相似文献   

19.
Using Wikipedia knowledge to improve text classification   总被引:7,自引:7,他引:0  
Text classification has been widely used to assist users with the discovery of useful information from the Internet. However, traditional classification methods are based on the “Bag of Words” (BOW) representation, which only accounts for term frequency in the documents, and ignores important semantic relationships between key terms. To overcome this problem, previous work attempted to enrich text representation by means of manual intervention or automatic document expansion. The achieved improvement is unfortunately very limited, due to the poor coverage capability of the dictionary, and to the ineffectiveness of term expansion. In this paper, we automatically construct a thesaurus of concepts from Wikipedia. We then introduce a unified framework to expand the BOW representation with semantic relations (synonymy, hyponymy, and associative relations), and demonstrate its efficacy in enhancing previous approaches for text classification. Experimental results on several data sets show that the proposed approach, integrated with the thesaurus built from Wikipedia, can achieve significant improvements with respect to the baseline algorithm.
Pu WangEmail:
  相似文献   

20.
基于Rough Set的空间数据分类方法   总被引:17,自引:1,他引:17  
石云  孙玉芳  左春 《软件学报》2000,11(5):673-678
近来,数据采掘的研究已从关系型和事务型数据库扩展到空间数据库.空间数据采掘是一个很有发展前景的领域,其中空间数据分类的研究尚处在起步阶段.该文分析和比较了现有的几个空间数据分类方法的利和弊,提出利用Rough Set的三阶段空间分类过程.实验结果表明,该算法对于解决包含不完整空间信息的问题是有效的.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号