Similar literature
Found 20 similar documents; search took 828 ms
1.
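ReCA: an attribute reduction method for continuous-valued attributes   Total citations: 1, self-citations: 1, citations by others: 0
Attribute reduction is one of the main applications and research topics of rough set theory. Most existing attribute reduction methods are suited to discrete-valued attributes. For data with continuous-valued attributes, the usual practice is to discretize the data first. This up-front processing loses some information and can easily lead to erroneous reductions. For continuous-valued information systems, a new attribute reduction method, ReCA, is proposed. The method integrates the discretization of continuous-valued attributes with the attribute reduction process, uses an information-entropy-based uncertainty measure as the fitness function, and obtains the reduced attribute set and the set of discretization cut points simultaneously through evolutionary computation. Experiments show that the method not only performs attribute reduction effectively but also, compared with the Rough set and C4.5 methods, yields fewer attributes and higher test accuracy.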

2.
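Research and application of medical data mining based on rough sets   Total citations: 1, self-citations: 0, citations by others: 1
Medical data mining can automatically analyze the data in existing medical record databases and provide valuable medical knowledge. Clinical medical record databases contain large numbers of duplicate samples and redundant attributes, which impair the accuracy and speed of medical diagnosis. To address this problem, a relationship is established between the information-theoretic rough set model and the SQL language, and a conditional-information-entropy attribute reduction algorithm based on SQL is proposed. Data cleaning, core computation, and attribute reduction are implemented with the database query language. Experimental results show that the algorithm is simple to implement and runs efficiently, providing a way for rough set theory to be applied more widely to concrete medical data mining.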
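As an illustration of the conditional information entropy that this abstract relies on, the following is a minimal Python sketch of H(D | C), the entropy of a decision attribute D given a set of condition attributes C. It is not the paper's SQL implementation; the function name `conditional_entropy` and the toy `rows` table are assumptions made for this example. The two counters play the role that grouped counts would play in a database query.

```python
from collections import Counter
from math import log2

def conditional_entropy(rows, cond_attrs, decision):
    """Return H(decision | cond_attrs) for a list of dict rows (hypothetical helper)."""
    n = len(rows)
    # Size of each block of rows that agree on all condition attributes.
    block_sizes = Counter(tuple(r[a] for a in cond_attrs) for r in rows)
    # Size of each (block, decision value) combination.
    joint = Counter((tuple(r[a] for a in cond_attrs), r[decision]) for r in rows)
    h = 0.0
    for (block, _value), count in joint.items():
        h -= (count / n) * log2(count / block_sizes[block])
    return h

# Toy table (made up for illustration): "fever" alone determines "flu".
rows = [
    {"fever": "yes", "cough": "yes", "flu": "yes"},
    {"fever": "yes", "cough": "no", "flu": "yes"},
    {"fever": "no", "cough": "yes", "flu": "no"},
    {"fever": "no", "cough": "no", "flu": "no"},
]
h_all = conditional_entropy(rows, ["fever", "cough"], "flu")
h_without_cough = conditional_entropy(rows, ["fever"], "flu")
```

In this toy table, dropping `cough` leaves the conditional entropy unchanged, which is the kind of test an entropy-based reduction could use to treat an attribute as redundant.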

3.
We examine the issue of mining association rules among items in a large database of sales transactions. Mining association rules means discovering, in a given database of sales transactions, all associations among items such that the presence of some items in a transaction implies the presence of other items in the same transaction. The mining of association rules can be mapped into the problem of discovering large itemsets, where a large itemset is a group of items that appear in a sufficient number of transactions. The problem of discovering large itemsets can be solved by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally, this is done iteratively for each large k-itemset in increasing order of k, where a large k-itemset is a large itemset with k items. Determining large itemsets from a huge number of candidate sets in early iterations is usually the dominating factor for the overall data mining performance. To address this issue, we develop an effective algorithm for candidate set generation. It is a hash-based algorithm and is especially effective for the generation of a candidate set for large 2-itemsets. Explicitly, the number of candidate 2-itemsets generated by the proposed algorithm is orders of magnitude smaller than that generated by previous methods, thus resolving the performance bottleneck. Note that the generation of smaller candidate sets enables us to effectively trim the transaction database size at a much earlier stage of the iterations, thereby reducing the computational cost for later iterations significantly. The advantage of the proposed algorithm also provides us with the opportunity to reduce the amount of disk I/O required. An extensive simulation study is conducted to evaluate the performance of the proposed algorithm.
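The hash-based idea for candidate 2-itemsets can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: items are small integers, and the bucket function `(x * 10 + y) % num_buckets`, the default `num_buckets=7`, and the name `candidate_pairs` are all hypothetical choices. While counting single items, every pair in a transaction is also hashed into a bucket; a pair becomes a candidate only if both of its items are large and its bucket count reaches the minimum support.

```python
from collections import Counter
from itertools import combinations

def candidate_pairs(transactions, min_support, num_buckets=7):
    """Return candidate 2-itemsets after hash-bucket pruning (illustrative sketch)."""
    item_counts = Counter()
    bucket_counts = [0] * num_buckets

    def bucket(pair):
        # Hypothetical hash function for integer items.
        x, y = pair
        return (x * 10 + y) % num_buckets

    # Single pass: count items and hash every pair of each transaction.
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket(pair)] += 1

    large_items = sorted(i for i, c in item_counts.items() if c >= min_support)
    # A bucket count is an upper bound on the support of every pair hashed to it,
    # so pairs in light buckets cannot be large and are pruned.
    return {
        pair
        for pair in combinations(large_items, 2)
        if bucket_counts[bucket(pair)] >= min_support
    }

transactions = [[1, 2, 3], [1, 2], [2, 3], [1, 2, 4]]
pairs = candidate_pairs(transactions, min_support=2)
```

In this toy run the large items are 1, 2, and 3, which would give three candidate pairs without the hash filter; the bucket counts remove (1, 3), whose support is only 1. Because different pairs can collide in a bucket, the filter may keep some pairs that are not large, so the candidates still have to be counted in a later pass.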

4.
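Existing knowledge discovery models for hybrid information systems mostly cover symbolic and numerical condition attributes together with symbolic decision attributes, and most of them focus on attribute reduction or feature selection; rule extraction has received comparatively little study. A dynamic rule extraction model is constructed for hybrid information systems that cover more data types. First, the existing formula for the distance between attribute values is revised and a definition is given for the distance between multi-level (错层型) attribute values, which yields a new hybrid distance. Second, three methods are proposed for inducing decision classes from a numerical decision attribute. Then a generalized neighborhood rough set model is constructed, upper and lower approximations under dynamic granularity and a rule extraction algorithm are proposed, and a dynamic rule extraction model based on neighborhood granulation is built. The model can be used for rule extraction in information systems with the following characteristics: (1) the condition attribute set may include single-level symbolic, multi-level symbolic, numerical, interval-valued, set-valued, and unknown-type attributes; (2) the decision attribute set may include symbolic and numerical attributes. Comparative experiments were carried out on data sets from the UCI repository, and the classification accuracy demonstrates the effectiveness of the rule extraction algorithm.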

5.
In recent years, some methods have been proposed to estimate values in relational database systems. However, the estimation accuracy of the existing methods is not good enough. In this paper, we present a new method to generate weighted fuzzy rules from relational database systems for estimating values using genetic algorithms (GAs), where the attributes appearing in the antecedent part of the generated fuzzy rules have different weights. After a predefined number of evolutions of the GA, the best chromosome contains the optimal weights of the attributes, and they can be translated into a set of rules to be used for estimating values. The proposed method achieves a higher average estimation accuracy rate than the methods we presented in two previous papers.

6.
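Conflict detection and resolution strategies for mobile replicated database systems   Total citations: 9, self-citations: 0, citations by others: 9
Replication is a key technique for improving the performance of mobile database systems. This paper proposes a new model for mobile replicated database systems, the Transaction-Level Result-Set Propagation (TLRSP) mobile replication model, focuses on the conflict detection and resolution strategies in the model, and gives concrete implementation algorithms. The TLRSP model allows mobile users to access the local replica of the database and commit transactions while the system is disconnected. On reconnection, conflicts are detected and resolved, the transaction result sets are merged, and synchronization is finally performed by incremental refresh, so that the system eventually converges to a consistent state. In addition, by introducing simplified transaction logs, data version numbers, and access control, the TLRSP model effectively reduces the resource consumption of the mobile database system and ensures database consistency, thereby offering a feasible solution for replication in mobile database systems.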

7.
LEARNING IN RELATIONAL DATABASES: A ROUGH SET APPROACH   Total citations: 49, self-citations: 0, citations by others: 49
Knowledge discovery in databases, or data mining, is an important direction in the development of data and knowledge-based systems. Because of the huge amount of data stored in large numbers of existing databases, and because the amount of data generated in electronic forms is growing rapidly, it is necessary to develop efficient methods to extract knowledge from databases. An attribute-oriented rough set approach has been developed for knowledge discovery in databases. The method integrates the machine-learning paradigm, especially learning-from-examples techniques, with rough set techniques. An attribute-oriented concept tree ascension technique is first applied in generalization, which substantially reduces the computational complexity of database learning processes. Then the cause-effect relationship among the attributes in the database is analyzed using rough set techniques, and the unimportant or irrelevant attributes are eliminated. Thus concise and strong rules with little or no redundant information can be learned efficiently. Our study shows that attribute-oriented induction combined with rough set theory provides an efficient and effective mechanism for knowledge discovery in database systems.

8.
Mining association rules and mining sequential patterns both aim to discover customer purchasing behaviors from a transaction database, so that the quality of business decisions can be improved. However, the transaction database can be very large. It is very time consuming to find all the association rules and sequential patterns in a large database, and users may be interested in only some of the information.

Moreover, the criteria that discovered association rules and sequential patterns must satisfy may differ from one user requirement to another. Much uninteresting information can be generated when traditional mining methods are applied. Hence, a data mining language needs to be provided so that users can query only the knowledge of interest to them from a large database of customer transactions. In this paper, a data mining language is presented. With this language, users can specify the items of interest and the criteria for the association rules or sequential patterns to be discovered. Efficient data mining techniques are also proposed to extract the association rules and the sequential patterns according to the user requirements.


9.
印勇  田逢春 《计算机测量与控制》2002,10(11):759-761,770
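Using rough set theory, the causal relationships among attributes in a relational database are analyzed, and a method for mining rules from relational databases is studied. The simplification of condition attributes and the minimal simplification strategy for rule extraction in this method are discussed in detail, and the corresponding algorithms are given. This provides a new approach to knowledge acquisition from databases.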
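The simplification of condition attributes can be illustrated with the standard rough set notion of the positive region: the set of objects whose indiscernibility class agrees on the decision attribute. The Python sketch below uses the textbook definitions and is not taken from the paper; the helper names `positive_region` and `dispensable` and the toy table are assumptions made for this example. An attribute is reported as dispensable when removing it leaves the positive region unchanged.

```python
from collections import defaultdict

def positive_region(rows, cond_attrs, decision):
    """Indices of rows whose indiscernibility class is consistent on the decision."""
    blocks = defaultdict(list)
    for i, r in enumerate(rows):
        blocks[tuple(r[a] for a in cond_attrs)].append(i)
    pos = set()
    for members in blocks.values():
        # A block is consistent when all its rows share one decision value.
        if len({rows[i][decision] for i in members}) == 1:
            pos.update(members)
    return pos

def dispensable(rows, cond_attrs, decision):
    """Condition attributes whose individual removal keeps the positive region."""
    full = positive_region(rows, cond_attrs, decision)
    return [
        a for a in cond_attrs
        if positive_region(rows, [b for b in cond_attrs if b != a], decision) == full
    ]

# Toy decision table (made up): "d" depends on "a" only.
rows = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "no"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 1, "d": "yes"},
]
redundant = dispensable(rows, ["a", "b"], "d")
```

Here `b` can be dropped without losing any consistently classified object, while dropping `a` empties the positive region, so `a` must be kept. Each attribute is tested on its own, so two attributes that are each dispensable are not necessarily removable together.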

10.
佟强  周园春  吴开超    阎保平 《计算机工程》2007,33(10):34-35,69
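A new method for mining quantitative association rules is proposed. The method uses a clustering algorithm to partition the transaction records in the database into clusters and projects the clusters onto the domains of the numerical attributes, forming overlapping, meaningful intervals. Experimental results show that the method mines quantitative association rules effectively and can discover important rules that previous algorithms may miss.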

11.
Set-valued ordered information systems   Total citations: 2, self-citations: 0, citations by others: 2
Set-valued ordered information systems can be classified into two categories: disjunctive and conjunctive systems. By introducing two new dominance relations to set-valued information systems, we first introduce the conjunctive/disjunctive set-valued ordered information systems and develop an approach to queuing problems for objects in the presence of multiple attributes and criteria. Then, we present a dominance-based rough set approach for these two types of set-valued ordered information systems, which is mainly based on substitution of the indiscernibility relation by a dominance relation. Through the lower/upper approximation of a decision, some certain/possible decision rules can be extracted from a so-called set-valued ordered decision table. Finally, we present attribute reduction (also called criteria reduction in ordered information systems) approaches to these two types of ordered information systems and ordered decision tables, which can be used to simplify a set-valued ordered information system and find decision rules directly from a set-valued ordered decision table. These criteria reduction approaches can eliminate those criteria that are not essential from the viewpoint of the ordering of objects or decision rules.

12.
Landslide incidence can be affected by a variety of environmental factors. Past studies have focused on the identification of these environmental factors, but most are based on statistical analysis. In this paper, spatial information techniques were applied to a case study of landslide occurrence in China by combining remote sensing and geographical information systems with an innovative data mining approach (rough set theory) and statistical analyses. Core and reducts of data attributes were obtained by data mining based on rough set theory. Rules for the impact factors, which can contribute to landslide occurrence, were generated from the landslide knowledge database. It was found that all 11 rules can be classified as both exact and approximate rules. In terms of importance, three main rules were then extracted as the key decision-making rules for landslide predictions. Meanwhile, the relationship between landslide occurrence and environmental factors was statistically analyzed to validate the accuracy of rules extracted by the rough set-based method. It was shown that the rough set-based approach is of use in analyzing environmental factors affecting landslide occurrence, and thus facilitates the decision-making process for landslide prediction.

13.
Mining association rules with multi-valued attributes   Total citations: 45, self-citations: 1, citations by others: 45
张朝晖  陆玉昌  张钹 《软件学报》1998,9(11):801-805
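Attribute values may be Boolean or multi-valued. Mature systems and methods already exist for mining association rules from data described by Boolean values, but not for multi-valued data. Converting multi-valued data into Boolean data is a convenient and effective approach. An algorithm is proposed that determines the partition of multi-valued attributes according to the data itself and then maps the resulting segments to Boolean values. On this basis, easily understood, general, and effective association rules can be mined.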
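The mapping from multi-valued attributes to Boolean items can be sketched in Python as follows. The abstract does not say how the partition is derived from the data, so this sketch simply assumes the cut points are already given; the function name `to_boolean_items`, the `cuts` dictionary, and the item label format are hypothetical. Each numeric value is replaced by a Boolean item naming the left-closed interval it falls into, after which an ordinary Boolean association rule miner could be applied.

```python
from bisect import bisect_right

def to_boolean_items(record, cuts):
    """Map a record to a set of Boolean items (illustrative sketch).

    cuts maps an attribute name to a sorted list of assumed cut points;
    attributes without cut points are kept as attribute=value items.
    """
    items = set()
    for attr, value in record.items():
        if attr in cuts:
            bounds = cuts[attr]
            k = bisect_right(bounds, value)  # index of the interval containing value
            lo = "-inf" if k == 0 else bounds[k - 1]
            hi = "+inf" if k == len(bounds) else bounds[k]
            items.add(f"{attr}=[{lo},{hi})")
        else:
            items.add(f"{attr}={value}")
    return items

# Made-up record and cut points for illustration.
items = to_boolean_items({"age": 34, "married": "yes"}, {"age": [30, 40]})
```

With the assumed cut points 30 and 40, an age of 34 becomes the single Boolean item `age=[30,40)`, so a transaction either contains that item or does not.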

14.
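A rough-set-based information extraction algorithm for target recognition   Total citations: 3, self-citations: 0, citations by others: 3
The raw information available for target recognition is often too coarse to be used directly in computation. Rough set theory is a theory of uncertain systems for processing and mining data; based on it, an algorithm for extracting information from raw data is proposed. The raw information is stored in a relational table, and redundant information is removed by simplifying the table, so that the useful information is extracted. The method exploits the strong attribute reduction and rule generation capabilities of rough set theory, and the generated rules are simple and accurate. Compared with other computational methods, rough sets offer low computational cost, good robustness to interference, and good transitivity when handling coarse information.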

15.
《Information Systems》2001,26(1):1-14
In this paper, we examine the two issues of mining association rules and mining sequential patterns in a large database of sales transactions. The problems of mining association rules and mining sequential patterns focus on discovering large itemsets and large sequences, respectively. We present PSI and PSI_seq for efficient generation of large itemsets and large sequences, respectively. The main idea of these two algorithms is to use prestored information to minimize the numbers of candidate itemsets and candidate sequences counted in each database scan. The prestored information for PSI and PSI_seq includes the itemsets and the sequences, respectively, along with their support counts found in the last mining. Typically a user may need to tune the value of the minimum support many times before a set of useful association rules can be obtained from the transaction database. Using prestored information, the total computation time is reduced effectively. Empirical results show that our approaches outperform previous methods by an order of magnitude, using little storage space for the prestored information.

16.
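A coordinated checkpointing method supporting distributed cooperative real-time transaction processing   Total citations: 1, self-citations: 0, citations by others: 1
During the execution of real-time transactions, transaction failures or data contention can cause transactions to restart. To reduce the work lost on restart, checkpointing can be used to preserve the timing correctness of transactions. In one class of distributed real-time database applications, transactions at different nodes form cooperative relationships through message exchange; to guarantee global consistency among cooperating transactions, when one transaction takes a checkpoint, the related transactions must also take checkpoints. Traditional coordinated checkpointing methods do not consider the timing constraints of the application and cannot adequately support distributed cooperative real-time transaction processing. This paper proposes a graph-theoretic coordinated checkpointing method that uses a local directed graph, maintained at each computing node for each set of cooperating transactions, together with a graph-based computation procedure to identify the transactions that should take checkpoints. The method has the minimal coordinated checkpointing property and also minimizes the latency of the global checkpoint. Experiments show that the algorithm reduces global checkpoint latency and helps real-time transactions meet their deadlines.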

17.
18.
Classical data mining algorithms require expensive passes over the entire database to generate frequent items and hence to generate association rules. As databases grow, it is becoming very difficult to handle large amounts of data for computation. One solution to this problem is to generate a sample from the database that acts as a representative of the entire database for finding association rules, in such a way that the distance of the sample from the complete database is minimal. Choosing a correct sample that can represent the data is not an easy task. Many algorithms have been proposed in the past. Some of them are computationally fast while others give better accuracy. In this paper, we present an algorithm for generating a sample from the database that can replace the entire database for generating association rules and is aimed at keeping a balance between accuracy and speed. The proposed algorithm takes into account the average number of small, medium, and large 1-itemsets in the database and the average weight of the transactions to define a threshold condition for the transactions. The set of transactions that satisfy the threshold condition is chosen as the representative for the entire database. The effectiveness of the proposed algorithm has been tested over several runs on databases generated by the IBM synthetic data generator. A comparative performance evaluation of the proposed technique against existing sampling techniques, comparing accuracy and speed, has also been carried out.

19.
Parallel Algorithms for Discovery of Association Rules   Total citations: 2, self-citations: 0, citations by others: 2
Discovery of association rules is an important data mining task. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. Almost all of these algorithms make repeated passes over the database to determine the set of frequent itemsets (a subset of database items), thus incurring high I/O overhead. In the parallel case, most algorithms perform a sum-reduction at the end of each pass to construct the global counts, also incurring high synchronization cost. In this paper we describe new parallel association mining algorithms. The algorithms use novel itemset clustering techniques to approximate the set of potentially maximal frequent itemsets. Once this set has been identified, the algorithms make use of efficient traversal techniques to generate the frequent itemsets contained in each cluster. We propose two clustering schemes based on equivalence classes and maximal hypergraph cliques, and study two lattice traversal techniques based on bottom-up and hybrid search. We use a vertical database layout to cluster related transactions together. The database is also selectively replicated so that the portion of the database needed for the computation of associations is local to each processor. After the initial set-up phase, the algorithms do not need any further communication or synchronization. The algorithms minimize I/O overheads by scanning the local database portion only twice: once in the set-up phase, and once when processing the itemset clusters. Unlike previous parallel approaches, the algorithms use simple intersection operations to compute frequent itemsets and do not have to maintain or search complex hash structures. Our experimental testbed is a 32-processor DEC Alpha cluster inter-connected by the Memory Channel network. We present results on the performance of our algorithms on various databases, and compare it against a well known parallel algorithm. The best new algorithm outperforms it by an order of magnitude.
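The vertical database layout and the intersection-based support counting mentioned in this abstract can be illustrated with a small sequential Python sketch. It covers only the tid-list idea, not the clustering, lattice traversal, or parallel parts of the algorithms; the function names `vertical_layout` and `support` and the toy transactions are assumptions made for this example. Each item is mapped to the set of transaction ids that contain it, and the support of a non-empty itemset is the size of the intersection of its items' tid-lists.

```python
def vertical_layout(transactions):
    """Map each item to the set of transaction ids (tid-list) containing it."""
    tidlists = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists

def support(itemset, tidlists):
    """Support of a non-empty itemset via tid-list intersection."""
    tids = set.intersection(*(tidlists.get(i, set()) for i in itemset))
    return len(tids)

# Made-up transactions for illustration.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}]
tidlists = vertical_layout(transactions)
ab = support(("a", "b"), tidlists)
abc = support(("a", "b", "c"), tidlists)
```

Once the tid-lists are built, no further scan of the transactions is needed to count an itemset, and no hash structure of candidates has to be maintained or searched.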

20.
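Research and implementation of visual management for cluster systems: VisualNPC   Total citations: 2, self-citations: 0, citations by others: 2
This paper proposes an implementation scheme for a cluster management system, VisualNPC. First, a distributed relational database system is used to store management data, resource status, checkpoints, and other information, so that data retrieval and access are faster than with file-system storage. Second, a Web user interface is used to operate the cluster system, so that the cluster can be controlled and accessed through a Web browser from any computer on a network connected to it. Finally, an independent management server is used so that management operations have minimal impact on the cluster's own computation. This management server is mirrored for fault tolerance, which is better in cost and efficiency than mirroring every compute node.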
