Similar Documents
20 similar documents found (search time: 203 ms)
1.
Measuring the correlation of association rules   (Total citations: 1; self-citations: 0; by others: 1)
Association rules generated by the Apriori algorithm include useless and even misleading rules. To make the generated rules more effective, the chi-square test from statistics is introduced to check, in a statistical sense, whether a rule reflects a real association; the quantitative relationship between the chi-square statistic and the correlation coefficient is derived, unifying the two approaches, and a correlation-coefficient-based algorithm is then used to generate association rules.
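Not part of the cited paper; the sketch below only illustrates, on synthetic binary transaction indicators, the quantitative link the abstract refers to: for the 2x2 contingency table of a rule A -> B, the chi-square statistic equals N times the square of the phi (correlation) coefficient. The 0.05 significance threshold is an assumption.

```python
import numpy as np
from scipy.stats import chi2

def rule_correlation(has_A, has_B):
    """Chi-square statistic and phi coefficient for a candidate rule A -> B.

    has_A, has_B: boolean arrays marking which transactions contain the
    antecedent / consequent itemsets. For a 2x2 table, chi_sq == N * phi**2.
    """
    has_A = np.asarray(has_A, dtype=bool)
    has_B = np.asarray(has_B, dtype=bool)
    N = len(has_A)
    # Cell counts of the 2x2 contingency table.
    n11 = np.sum(has_A & has_B)
    n10 = np.sum(has_A & ~has_B)
    n01 = np.sum(~has_A & has_B)
    n00 = np.sum(~has_A & ~has_B)
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    denom = np.sqrt(row1 * row0 * col1 * col0)
    # Phi (correlation) coefficient of the two binary indicators.
    phi = 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom
    chi_sq = N * phi ** 2              # the quantitative link between the two measures
    p_value = chi2.sf(chi_sq, df=1)    # 1 degree of freedom for a 2x2 table
    return chi_sq, phi, p_value

# Example: keep the rule only if the association is significant and positive.
rng = np.random.default_rng(0)
A = rng.random(1000) < 0.4
B = A & (rng.random(1000) < 0.7) | (rng.random(1000) < 0.1)
chi_sq, phi, p = rule_correlation(A, B)
print(chi_sq, phi, p, "keep" if p < 0.05 and phi > 0 else "prune")
```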

2.
Association pattern mining is an important branch of data mining research, aimed at discovering associations or correlations among itemsets. However, traditional mining methods based on the support-confidence framework have some shortcomings: first, they produce too many patterns (frequent itemsets and rules); second, some of the mined rules are uninteresting to users, useless, or even wrong. It is therefore necessary to prune useless patterns effectively during mining. This work introduces chi-square analysis into the correlation measurement of patterns; using the chi-square test to measure the correlation among itemsets and between the antecedent and consequent of a rule is an effective pruning method. Analysis of the experimental results shows that adding the chi-square test on top of the support measure effectively prunes uncorrelated patterns and thus reduces the number of frequent itemsets and rules.

3.
Attribute reduction algorithms based on neighborhood rough sets consider only the influence of a single attribute on the decision attribute and ignore correlations among attributes. To address this, a neighborhood rough set attribute reduction algorithm based on the chi-square test (ChiS-NRS) is proposed. First, the chi-square test is used to compute correlations, so that the influence of correlated attributes is taken into account when selecting important attributes, reducing time complexity while improving classification accuracy. The improved algorithm is then combined with the gradient boosting decision tree (GBDT) algorithm to build a classification model, which is validated on UCI data sets. Finally, the model is applied to predicting the occurrence of microvascular invasion in hepatocellular carcinoma. Experimental results show that, compared with no reduction, neighborhood rough set reduction, and several other reduction algorithms, the improved algorithm achieves the highest classification accuracy on some UCI data sets. For predicting microvascular invasion, compared with prediction models such as convolutional neural networks (CNN), support vector machines (SVM), and random forests (RF), the proposed model reaches a prediction accuracy of 88.13% on the test set, with sensitivity of 87.10%, specificity of 89.29%, and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.90, the best result on every metric. The proposed model can therefore better predict microvascular invasion in hepatocellular carcinoma and help physicians make more precise diagnoses.
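A loose sketch of the general pipeline the abstract describes (chi-square-based attribute screening followed by a boosted-tree classifier) using scikit-learn; it is not the ChiS-NRS algorithm itself, since the neighborhood rough set step is omitted, and the data set and k=10 are placeholders.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)   # stand-in for a UCI data set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = make_pipeline(
    MinMaxScaler(),                 # chi2 scoring requires non-negative inputs
    SelectKBest(chi2, k=10),        # keep the 10 attributes most related to the label
    GradientBoostingClassifier(random_state=0),
)
model.fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
print("AUC:", roc_auc_score(y_te, proba))
```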

4.
A test suite for sequence equiprobability based on the chi-square goodness-of-fit test   (Total citations: 1; self-citations: 0; by others: 1)
Equiprobability is an important property of randomness, and the frequency test is the most common way to test it. This paper analyzes the one-sidedness of the traditional frequency test and, using the chi-square goodness-of-fit test, proposes a test suite for comprehensive equiprobability testing: a symbol frequency test, an equal-length subsequence frequency test, and a frequency test on the test-result data. The suite is applied to the binary expansions of internationally used random samples; experimental results show that it has strong equiprobability-testing capability.
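A minimal sketch, not the paper's exact test suite, of two chi-square goodness-of-fit checks on a binary sequence: a bit-frequency test and an equal-length subsequence (m-bit block) frequency test, using SciPy; the block length m and sequence length are arbitrary choices.

```python
import numpy as np
from scipy.stats import chisquare

def bit_frequency_test(bits):
    """Chi-square test that 0 and 1 each occur with probability 1/2."""
    counts = np.bincount(np.asarray(bits, dtype=int), minlength=2)
    stat, p = chisquare(counts)          # default expected frequencies are uniform
    return stat, p

def subsequence_frequency_test(bits, m=3):
    """Chi-square test that all 2**m non-overlapping m-bit blocks are equally likely."""
    bits = np.asarray(bits, dtype=int)
    n_blocks = len(bits) // m
    blocks = bits[: n_blocks * m].reshape(n_blocks, m)
    codes = blocks.dot(1 << np.arange(m - 1, -1, -1))   # each block -> integer code
    counts = np.bincount(codes, minlength=2 ** m)
    stat, p = chisquare(counts)
    return stat, p

rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=100_000)
print(bit_frequency_test(bits))          # large p-value -> bits look equiprobable
print(subsequence_frequency_test(bits))  # large p-value -> blocks look equiprobable
```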

5.
To determine the structural parameters of radial basis function (RBF) neural networks, a new method is proposed that optimizes the RBF network parameters with the shuffled frog leaping algorithm. The RBF network parameters are packed into a multidimensional vector and optimized as the variables of the frog leaping algorithm, which searches the feasible solution space for the optimum according to a fitness function; the frog leaping algorithm itself is also improved. Experiments on nonlinear function approximation show that, compared with the standard genetic algorithm and particle swarm optimization, the proposed algorithm yields a smaller mean squared error and better approximation ability.
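Not the paper's improved frog leaping algorithm; the sketch below only shows, assuming Gaussian basis functions, how RBF network parameters (centers, widths, output weights) can be packed into one vector and scored by a mean-squared-error fitness function, which is the objective such a metaheuristic would minimize. The layout of the parameter vector is an assumption.

```python
import numpy as np

def unpack(theta, n_hidden, n_inputs):
    """Split a flat parameter vector into centers, widths and output weights."""
    c_end = n_hidden * n_inputs
    centers = theta[:c_end].reshape(n_hidden, n_inputs)
    widths = np.abs(theta[c_end:c_end + n_hidden]) + 1e-6   # keep widths positive
    weights = theta[c_end + n_hidden:]                       # n_hidden weights + 1 bias
    return centers, widths, weights

def rbf_predict(theta, X, n_hidden):
    centers, widths, weights = unpack(theta, n_hidden, X.shape[1])
    # Gaussian activation of each hidden unit for each sample.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    H = np.exp(-d2 / (2.0 * widths ** 2))
    return H @ weights[:-1] + weights[-1]

def fitness(theta, X, y, n_hidden):
    """Mean squared error: the quantity the optimizer tries to minimize."""
    return np.mean((rbf_predict(theta, X, n_hidden) - y) ** 2)

# Toy nonlinear approximation target.
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X[:, 0]) + 0.1 * X[:, 0] ** 2
n_hidden = 8
dim = n_hidden * X.shape[1] + n_hidden + n_hidden + 1   # centers + widths + weights + bias
theta0 = np.random.default_rng(0).normal(size=dim)
print("initial MSE:", fitness(theta0, X, y, n_hidden))
```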

6.
关联模式挖掘研究是数据挖掘研究领域的重要分支之一,旨在发现项集之间存在的关联或相关关系.然而,传统的基于支持度-可信度框架的挖掘方法存在着一些不足:一是会产生过多的模式(包括频繁项集和规则);二是挖掘出来的规则有些是用户不感兴趣的、无用的,甚至是错误的.所以,在挖掘过程中有效地对无用模式进行剪枝是必要的.将卡方分析引入到模式的相关性度量中,利用卡方检验对项集之间、规则前件与后件之间的相关性进行度量是一种有效的剪枝方法.结果分析表明,在支持度度量的基础上引入卡方检验可以有效地对非相关模式进行剪枝,从而缩小频繁项集和规则的规模.  相似文献   

7.
Feature selection is an extremely important research topic in automatic Chinese text classification; its purpose is to resolve the conflict between the high dimensionality of the feature space and the sparsity of document representation vectors. Commonly used feature selection methods include document frequency, information gain, mutual information, expected cross entropy, the chi-square statistic, and the weight of evidence for text. These methods are compared on a KNN text classifier, the strengths and weaknesses of each feature evaluation function are analyzed, and their performance is examined as the feature dimensionality varies.
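As an illustration of one of the listed measures, here is a minimal sketch of the chi-square (CHI) statistic for scoring a term against a category from document counts; the cell names follow the usual 2x2 formulation, and the counts are assumed toy values, not the paper's experiments.

```python
def chi_square_term(N, A, B, C, D):
    """CHI(t, c) from a 2x2 document contingency table.

    N: total documents; A: docs in c containing t; B: docs not in c containing t;
    C: docs in c without t; D: docs not in c without t.
    """
    num = N * (A * D - C * B) ** 2
    den = (A + C) * (B + D) * (A + B) * (C + D)
    return num / den if den else 0.0

# Example: score two candidate terms for one category and keep the higher one.
scores = {
    "term_1": chi_square_term(N=1000, A=80, B=20, C=120, D=780),
    "term_2": chi_square_term(N=1000, A=40, B=60, C=160, D=740),
}
print(max(scores, key=scores.get), scores)
```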

8.
Feature selection is an extremely important research topic in automatic Chinese text classification; its purpose is to resolve the conflict between the high dimensionality of the feature space and the sparsity of document representation vectors. Commonly used feature selection methods include document frequency, information gain, mutual information, expected cross entropy, the chi-square statistic, and the weight of evidence for text. These methods are compared on a KNN text classifier, the strengths and weaknesses of each feature evaluation function are analyzed, and their performance is examined as the feature dimensionality varies.

9.
Feature selection improves the performance of learning algorithms by removing irrelevant and redundant features; it is essentially a combinatorial optimization problem. The black widow optimization algorithm is a metaheuristic that simulates the life cycle of black widow spiders and has advantages in convergence speed and fitness optimization. Since the black widow algorithm cannot perform feature selection directly, five optimization strategies are designed: a binary strategy, an "OR gate" strategy, a population-limiting strategy, a fast-procreation strategy, and a fitness-first strategy, yielding the black widow optimization feature selection algorithm (BWOFS) and the procreation controlled black widow optimization feature selection algorithm (PCBWOFS), which search the feature space for effective feature subsets. The new methods are validated on several public classification and regression data sets. Experimental results show that, compared with the other methods considered (the full feature set, AMB, SFS, SFFS, FSFOA), BWOFS and PCBWOFS find feature subsets with higher prediction accuracy and deliver competitive, promising results; moreover, compared with BWOFS, PCBWOFS requires less computation and performs better.

10.
While developing a multimedia cultural-relic data management system with FoxPro for Windows, the author found that archiving the card photographs, rubbings, inscriptions, drawings, sound recordings, and other materials in relic archives item by item is rather tedious. Through practice, the author found a way to archive multimedia information automatically: 1. Build a relic collection card database ck.dbf (not involved in this program) containing a master registration number field, and build a separate database gen.dbf dedicated to card photographs, rubbings, inscriptions, drawings, sound recordings, and other materials; it also contains a master registration number field, linking it to the collection card database. 2. Build a database file filell.dbf to store the subdirectories and file names of the bitmap and sound files. To distinguish the card…

11.
In breast cancer studies, researchers often use clustering algorithms to investigate similarity/dissimilarity among different cancer cases. The clustering algorithm design becomes a key factor in providing intrinsic disease information. However, traditional algorithms do not simultaneously meet the latest multiple requirements for breast cancer objects. The Variable parameters, Variable densities, Variable weights, and Complicated Objects Clustering Algorithm (V3COCA) presented in this paper handles these problems very well. The V3COCA (1) enables alternative inputs of none or a series of objects for disease research and computer-aided diagnosis; (2) proposes an automatic parameter calculation strategy to create clusters with different densities; (3) enables noise recognition and generates arbitrarily shaped clusters; and (4) defines a flexibly weighted distance for measuring the dissimilarity between two complicated medical objects, which emphasizes certain medically relevant issues in the objects. Experimental results with 10,000 patient cases from the SEER database show that V3COCA not only meets the various requirements of complicated-object clustering but is also as efficient as traditional clustering algorithms.

12.
13.
Supersaturated designs (SSDs) are widely researched because they can greatly reduce the number of experiments. However, analyzing data from SSDs is not easy, as their run size is not large enough to estimate all the main effects. This paper introduces the contrast-orthogonality cluster and the anticontrast-orthogonality cluster to reflect the inner structure of SSDs, which helps experimenters assign factors to the columns of SSDs. A new strategy for screening active factors is proposed, named the contrast-orthogonality cluster analysis (COCA) method. Simulation studies demonstrate that this method performs well compared to most existing methods. Furthermore, the COCA method has lower type II errors and is easy to understand and implement.

14.
Communication services that provide enhanced Quality of Service (QoS) guarantees related to dependability and real time are important for many applications in distributed systems. This paper presents real-time dependable (RTD) channels, a communication-oriented abstraction that can be configured to meet the QoS requirements of a variety of distributed applications. This customization ability is based on CactusRT, a system that supports the construction of middleware services out of software modules called micro-protocols. Each micro-protocol implements a different semantic property or property variant and interacts with other micro-protocols using an event-driven model supported by the CactusRT runtime system. In addition to RTD channels, CactusRT and its implementation are described. The prototype executes on a cluster of Pentium PCs running the OpenGroup/RI MK 7.3 Mach real-time operating system and CORDS, a system for building network protocols based on the x-kernel.

15.
Today, the development of e-commerce has provided many transaction databases containing useful information for investigators exploring dependencies among items. In data mining, the dependencies among different items can be expressed as association rules. The new fuzzy-genetic (FG) approach is designed to mine fuzzy association rules from a quantitative transaction database. Three important advantages are associated with the FG approach: (1) association rules can be extracted from a transaction database with quantitative values; (2) extracting proper membership functions and support threshold values with the genetic algorithm has a positive effect on the mining results; (3) expressing association rules in a fuzzy representation is more understandable for humans. In this paper, we design a comprehensive and fast algorithm that mines level-crossing fuzzy association rules on multiple concept levels, learning support threshold values and membership functions with the cluster-based master–slave integrated FG approach. Mining fuzzy association rules on multiple concept levels helps find more important, useful, accurate, and practical information.
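A minimal sketch, with assumed triangular membership functions and toy linguistic terms, of how a quantitative item value is fuzzified and how the fuzzy support of an itemset can be accumulated; it is not the paper's cluster-based master-slave FG algorithm (there is no genetic learning of membership functions or thresholds here).

```python
def triangular(x, a, b, c):
    """Triangular membership function with peak at b over [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Assumed linguistic terms for the quantity of an item in one transaction.
TERMS = {"low": (0, 0, 5), "middle": (2, 5, 8), "high": (5, 10, 10)}

def fuzzify(qty):
    return {term: triangular(qty, *abc) for term, abc in TERMS.items()}

def fuzzy_support(transactions, item_terms):
    """Fuzzy support of an itemset like {(milk, high), (bread, low)}:
    per transaction, take the minimum membership over its items, then average."""
    total = 0.0
    for t in transactions:
        degrees = [fuzzify(t.get(item, 0)).get(term, 0.0) for item, term in item_terms]
        total += min(degrees) if degrees else 0.0
    return total / len(transactions)

# Toy quantitative transactions: item -> purchased quantity.
transactions = [{"milk": 7, "bread": 2}, {"milk": 3}, {"milk": 9, "bread": 1}]
print(fuzzy_support(transactions, [("milk", "high"), ("bread", "low")]))
```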

16.
According to Pedraz-Delhaes, users evaluate both the product and the vendor on the basis of the provided documentation. Thus, a question arises as to which quality characteristics should be taken into account when deciding whether to accept a given user manual. Some proposals exist (e.g., ISO Std. 26513 and 26514), but they contain too many quality characteristics and lack orthogonality. The goal of this paper is to propose a simple quality model for user documentation, along with acceptance methods based on it. The model is intended to be orthogonal and complete. As a result, the COCA quality model is presented, comprising four orthogonal quality characteristics: Completeness, Operability, Correctness, and Appearance. To check completeness, the proposed quality model has been compared with many other quality models that are directly or indirectly concerned with user documentation. Moreover, two acceptance methods are described in the paper: a pure review based on ISO Std. 1028:2008, and a documentation evaluation test (a type of browser evaluation test) aimed at assessing the operability of user documentation. Initial quality profiles have been empirically collected for both methods; they can be used when interpreting evaluation results obtained for a given user manual.

17.
罗兰, 曾斌. 《计算机工程》, 2010, 36(19): 110-112
To address the difficulty of partitioning time regions for cyclic association rules and the low efficiency of the underlying algorithms, a discovery algorithm for cyclic association rules (CARDSATSV) is proposed. Time-series vectors composed of item supports are used as feature points of the time-domain data for clustering, and the DB Index criterion is used to control the number of clusters so as to obtain the best clustering result. A CFP-tree algorithm is given to discover cyclic association rules, and a periodicity-pruning technique based on conditional FP-trees is used to improve efficiency. Experiments show that, compared with existing cyclic association rule discovery algorithms, CARDSATSV discovers more useful cyclic association rules with some improvement in time and space efficiency.
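Not the CARDSATSV algorithm; the sketch below only illustrates its time-segmentation idea, building an item-support vector for each time period and choosing the number of clusters with the Davies-Bouldin (DB) index, using scikit-learn. The transaction layout, period count, and candidate k range are assumptions, and the data is random toy data just to exercise the pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

def support_vectors(period_transactions, items):
    """One row per period: the support of each item within that period."""
    rows = []
    for transactions in period_transactions:
        n = max(len(transactions), 1)
        rows.append([sum(item in t for t in transactions) / n for item in items])
    return np.array(rows)

# Toy data: 8 periods, 3 items, 50 random transactions per period.
rng = np.random.default_rng(0)
items = ["a", "b", "c"]
period_transactions = [
    [set(rng.choice(items, size=2, replace=False)) for _ in range(50)]
    for _ in range(8)
]
X = support_vectors(period_transactions, items)

best_k, best_db = None, np.inf
for k in range(2, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    db = davies_bouldin_score(X, labels)     # lower DB index is better
    if db < best_db:
        best_k, best_db = k, db
print("chosen number of time segments:", best_k)
```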

18.
In existing Active Access Control (AAC) models, the scalability and flexibility of security policy specification should be well balanced; in particular: (1) authorizations for large numbers of tasks should be simplified; (2) team workflows should be enabled; (3) fine-grained constraints should be enforced. To address this issue, a family of Association-Based Active Access Control (ABAAC) models is proposed. In the minimal model ABAAC0, users are assigned to roles while permissions are assigned to task-role associations. In a workflow case, to execute such an association, some users assigned to its component role are allocated; the association's assigned permissions can be performed by them while the task is running in the case. In ABAAC1, a generalized association is employed to extract common authorizations from multiple associations. In ABAAC2, fine-grained separation of duty (SoD) is enforced among associations. In the maximal model ABAAC3, all these features are integrated, and similar constraints can be specified more concisely. Validation is performed on a software workflow case. Comparison with a representative association-based AAC model and the most scalable AAC model so far indicates that: (1) sufficient scalability is achieved; (2) without decomposing a task, different permissions can be authorized to multiple roles within it; (3) separation of duties more fine-grained than roles and tasks can be enforced.

19.
Research on a new association rule mining algorithm   (Total citations: 1; self-citations: 0; by others: 1)
By analyzing the characteristics of data associations and existing association rule mining algorithms, further research is carried out on the accuracy of quantitative description and on algorithmic efficiency, and more accurate quantitative descriptions of support and confidence, as well as of the strength of associations, are proposed. The FP-growth mining algorithm is also improved and applied to mining experiments on a clinical case database of tongue diagnosis in traditional Chinese medicine, successfully and accurately extracting tongue diagnosis rules. Test results show that the algorithm is fast and highly accurate.
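For reference, a minimal sketch of the standard support, confidence, and lift definitions for a rule X -> Y over a transaction list, with made-up tongue-diagnosis item names as toy data; the paper's refined quantitative measures and improved FP-growth are not reproduced here.

```python
def rule_measures(transactions, X, Y):
    """Support, confidence and lift of the rule X -> Y over a list of set-valued transactions."""
    X, Y = set(X), set(Y)
    n = len(transactions)
    n_x = sum(X <= t for t in transactions)            # transactions containing X
    n_y = sum(Y <= t for t in transactions)            # transactions containing Y
    n_xy = sum((X | Y) <= t for t in transactions)     # transactions containing both
    support = n_xy / n
    confidence = n_xy / n_x if n_x else 0.0
    lift = confidence / (n_y / n) if n_y else 0.0      # >1 suggests a positive association
    return support, confidence, lift

transactions = [
    {"pale tongue", "thin coating", "qi deficiency"},
    {"pale tongue", "qi deficiency"},
    {"red tongue", "yellow coating"},
    {"pale tongue", "thin coating"},
]
print(rule_measures(transactions, X={"pale tongue"}, Y={"qi deficiency"}))
```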

20.
Mining optimized association rules with categorical and numeric attributes   (Total citations: 1; self-citations: 0; by others: 1)
Mining association rules on large data sets has received considerable attention in recent years. Association rules are useful for determining correlations between attributes of a relation and have applications in the marketing, financial, and retail sectors. Furthermore, optimized association rules are an effective way to focus on the most interesting characteristics involving certain attributes. Optimized association rules are permitted to contain uninstantiated attributes, and the problem is to determine instantiations such that either the support or the confidence of the rule is maximized. In this paper, we generalize the optimized association rules problem in three ways: (1) association rules are allowed to contain disjunctions over uninstantiated attributes, (2) association rules are permitted to contain an arbitrary number of uninstantiated attributes, and (3) uninstantiated attributes can be either categorical or numeric. Our generalized association rules enable us to extract more useful information about seasonal and local patterns involving multiple attributes. We present effective techniques for pruning the search space when computing optimized association rules for both categorical and numeric attributes. Finally, we report the results of our experiments, which indicate that our pruning algorithms are efficient for a large number of uninstantiated attributes, disjunctions, and values in the domain of the attributes.
