Similar Documents
20 similar documents found (search time: 328 ms)
1.
Measuring the Correlation of Association Rules   Cited by: 1 (self-citations: 0, citations by others: 1)
Association rules generated by the Apriori algorithm include useless and even misleading rules. To make the generated rules more effective, the chi-square test from statistics is introduced to check, in a statistical sense, whether a rule reflects a real correlation. A quantitative relationship between the chi-square value and the correlation coefficient is derived, unifying the two methods, and a correlation-coefficient-based algorithm is then used to generate the association rules.
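The quantitative link between the chi-square value and the correlation coefficient that the abstract mentions can be sketched for the 2×2 contingency table of a rule A → B, where χ² = n·φ² (the counts below are hypothetical):

```python
import math

def chi_square_2x2(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 contingency table for a rule A -> B.
    n11 = transactions with A and B, n10 = A without B,
    n01 = B without A, n00 = neither."""
    n = n11 + n10 + n01 + n00
    row1, row0 = n11 + n10, n01 + n00
    col1, col0 = n11 + n01, n10 + n00
    cells = [(n11, row1, col1), (n10, row1, col0),
             (n01, row0, col1), (n00, row0, col0)]
    # sum over cells of (observed - expected)^2 / expected
    return sum((obs - r * c / n) ** 2 / (r * c / n) for obs, r, c in cells)

def phi_coefficient(n11, n10, n01, n00):
    """Correlation (phi) coefficient of the same 2x2 table."""
    num = n11 * n00 - n10 * n01
    den = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return num / den

chi2 = chi_square_2x2(30, 20, 10, 40)   # hypothetical counts, n = 100
phi = phi_coefficient(30, 20, 10, 40)
assert abs(chi2 - 100 * phi ** 2) < 1e-9   # the quantitative link: chi2 = n * phi^2
# chi2 ≈ 16.67 > 3.84 (95% critical value, 1 d.o.f.): A and B are correlated
```

A rule whose χ² falls below the critical value would be treated as statistically uncorrelated and pruned.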

2.
Association pattern mining, an important branch of data mining, aims to discover associations or correlations among itemsets. However, traditional mining methods based on the support-confidence framework have some shortcomings: first, they produce too many patterns (both frequent itemsets and rules); second, some of the mined rules are uninteresting to the user, useless, or even wrong. Effectively pruning useless patterns during mining is therefore necessary. Introducing chi-square analysis into the correlation measurement of patterns, and using the chi-square test to measure the correlation between itemsets and between the antecedent and consequent of a rule, is an effective pruning method. Experimental analysis shows that adding the chi-square test on top of the support measure effectively prunes uncorrelated patterns, reducing the number of frequent itemsets and rules.

3.
Attribute reduction algorithms based on neighborhood rough sets consider only the influence of a single attribute on the decision attribute and ignore correlations among attributes. To address this, a chi-square-based neighborhood rough set attribute reduction algorithm (ChiS-NRS) is proposed. First, the chi-square test is used to compute correlations, so that interactions among related attributes are taken into account when selecting important attributes, reducing time complexity while improving classification accuracy. The improved algorithm is then combined with gradient boosted decision trees (GBDT) to build a classification model, which is validated on UCI datasets. Finally, the model is applied to predicting the occurrence of microvascular invasion in liver cancer. Experiments show that, compared with several reduction schemes including no reduction and plain neighborhood rough set reduction, the improved algorithm achieves the highest classification accuracy on several UCI datasets. In predicting microvascular invasion in liver cancer, the proposed model reaches 88.13% accuracy on the test set, with a sensitivity of 87.10%, a specificity of 89.29%, and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.90, the best on every metric compared with convolutional neural network (CNN), support vector machine (SVM), and random forest (RF) predictors. The model can therefore better predict microvascular invasion in liver cancer and help physicians make more precise diagnoses.

4.
A Test Suite for Sequence Equiprobability Based on the Chi-Square Goodness-of-Fit Test   Cited by: 1 (self-citations: 0, citations by others: 1)
Equiprobability is an important property of randomness, and the frequency test is the most common way to test it. The one-sidedness of the traditional frequency test is analyzed, and a test suite for comprehensive equiprobability testing is proposed based on the chi-square goodness-of-fit test, consisting of a bit-frequency test, an equal-length-subsequence frequency test, and a frequency test on the resulting test statistics. Applying the suite to the binary expansions of internationally common random samples shows that it has strong power to detect departures from equiprobability.
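The first two tests in the suite are both instances of the chi-square goodness-of-fit statistic against a uniform expectation; a minimal sketch (the test sequence below is our own illustration, not one of the paper's samples):

```python
from collections import Counter

def chi2_uniform(counts, total):
    """Goodness-of-fit statistic of observed counts against a uniform expectation."""
    exp = total / len(counts)
    return sum((c - exp) ** 2 / exp for c in counts)

def monobit_chi2(bits):
    """Bit-frequency test: compare the counts of 0s and 1s."""
    c = Counter(bits)
    return chi2_uniform([c.get('0', 0), c.get('1', 0)], len(bits))

def block_chi2(bits, m):
    """Equal-length-subsequence frequency test over non-overlapping
    length-m blocks (2**m cells)."""
    blocks = [bits[i:i + m] for i in range(0, len(bits) - m + 1, m)]
    c = Counter(blocks)
    counts = [c.get(format(v, f'0{m}b'), 0) for v in range(2 ** m)]
    return chi2_uniform(counts, len(blocks))

seq = '0110100110010110' * 64   # a highly regular 1024-bit sequence
print(monobit_chi2(seq))       # 0.0: bits are perfectly balanced
print(block_chi2(seq, 2))      # 512.0: length-2 blocks are far from equiprobable
```

This illustrates the one-sidedness the abstract criticizes: the sequence passes the plain bit-frequency test yet fails badly at the subsequence level.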

5.
To determine the structural parameters of radial basis function (RBF) neural networks, a new method is proposed that optimizes the RBF network parameters with the shuffled frog leaping algorithm. The RBF parameters are packed into a multidimensional vector and optimized as the algorithm's decision variables; guided by a fitness function, the algorithm searches the feasible solution space for the optimum, and the shuffled frog leaping algorithm itself is also improved. Experiments on nonlinear function approximation show that the optimized network attains a smaller mean squared error than standard genetic optimization and particle swarm optimization, i.e., better approximation ability.

7.
Feature selection is an extremely important research topic in automatic Chinese text classification; its purpose is to resolve the conflict between the high dimensionality of the feature space and the sparsity of document representation vectors. Common feature selection methods include document frequency, information gain, mutual information, expected cross entropy, the chi-square statistic, and the weight of evidence for text. These methods are compared on a KNN text classifier, the strengths and weaknesses of each feature evaluation function are analyzed, and their performance is examined as the feature dimensionality varies.
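Of the listed measures, the chi-square statistic can be computed per term and class directly from document counts using the standard 2×2 closed form; a sketch with hypothetical counts:

```python
def chi2_term(N, A, B, C):
    """Chi-square score of term t for class c, from document counts:
    A = docs of c containing t, B = docs outside c containing t,
    C = docs of c without t; N = total docs (so D = N - A - B - C)."""
    D = N - A - B - C
    num = N * (A * D - C * B) ** 2
    den = (A + C) * (B + D) * (A + B) * (C + D)
    return num / den if den else 0.0

# Hypothetical counts: 1000 documents, a term concentrated in one class
score = chi2_term(N=1000, A=90, B=10, C=110)
print(score)  # ≈ 340.3: far above the 3.84 critical value, so keep the term
```

Ranking all terms by this score and keeping the top k is the usual way the chi-square statistic is used as a feature selector.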

9.
Feature selection improves the performance of learning algorithms by removing irrelevant and redundant features; it is in essence a combinatorial optimization problem. The black widow optimization algorithm, a metaheuristic that simulates the life cycle of black widow spiders, has advantages in convergence speed and fitness optimization. Since the original algorithm cannot perform feature selection, five optimization strategies are designed: a binary strategy, an "OR-gate" strategy, a population-limiting strategy, a fast procreation strategy, and a fitness-first strategy. On this basis, the black widow optimization feature selection algorithm (BWOFS) and the procreation controlled black widow optimization feature selection algorithm (PCBWOFS) are proposed to search the feature space for effective feature subsets. The new methods are validated on several public classification and regression datasets. Experiments show that, compared with the other approaches considered (the full feature set, AMB, SFS, SFFS, and FSFOA), BWOFS and PCBWOFS find feature subsets with higher prediction accuracy and deliver competitive, promising results; moreover, PCBWOFS requires less computation and performs better than BWOFS.

10.
To solve the knapsack problem with a single continuous variable (KPC) efficiently, a novel S-shaped transfer function based on the Gaussian error function is first proposed, together with a new way of using it to map a real vector to a 0-1 vector, yielding a new binary particle swarm optimization (NBPSO) algorithm. Then, using the second mathematical model of KPC and combining NBPSO with an effective algorithm for handling infeasible KPC solutions, a new KPC solver is obtained. To evaluate its performance, NBPSO is run on four classes of large-scale KPC instances, and its results are compared with binary particle swarm optimization (BPSO) algorithms based on other S-shaped and V-shaped transfer functions, the single-population binary differential evolution algorithm with hybrid encoding (S-HBDE), and the bi-population binary differential evolution algorithm with hybrid encoding (B-HBDE). The comparison shows that NBPSO is superior in both average results and stability, a significant performance improvement over the other algorithms.
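The paper's exact transfer function is not reproduced here; one natural S-shaped function built from the Gaussian error function is the standard normal CDF, and the real-to-binary mapping can be sketched as follows (an assumption, not the paper's definition):

```python
import math
import random

def s_transfer(x):
    """S-shaped transfer function built from the Gaussian error function:
    the standard normal CDF, mapping a real value to a probability in (0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binarize(real_vector, rng=random):
    """Map a real vector to a 0-1 vector: bit i is 1 with
    probability s_transfer(x_i)."""
    return [1 if rng.random() < s_transfer(x) else 0 for x in real_vector]

assert s_transfer(0.0) == 0.5       # undecided position -> fair coin
assert s_transfer(4.0) > 0.999      # strongly positive -> almost surely 1
assert s_transfer(-4.0) < 0.001     # strongly negative -> almost surely 0
print(binarize([-10.0, 0.0, 10.0]))  # e.g. [0, ?, 1]: middle bit is random
```

In a binary PSO, each particle's real-valued position is passed through such a transfer function every iteration to obtain the 0-1 solution that is actually evaluated.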

11.
In breast cancer studies, researchers often use clustering algorithms to investigate similarity/dissimilarity among different cancer cases. The clustering algorithm design becomes a key factor in providing intrinsic disease information. However, traditional algorithms do not simultaneously meet the latest multiple requirements for breast cancer objects. The Variable parameters, Variable densities, Variable weights, and Complicated Objects Clustering Algorithm (V3COCA) presented in this paper handles these problems very well. The V3COCA (1) enables alternative inputs of none or a series of objects for disease research and computer aided diagnosis; (2) proposes an automatic parameter calculation strategy to create clusters with different densities; (3) enables noise recognition and generates arbitrarily shaped clusters; and (4) defines a flexibly weighted distance for measuring the dissimilarity between two complicated medical objects, which emphasizes certain medically relevant issues in the objects. Experimental results with 10,000 patient cases from the SEER database show that V3COCA not only meets the various requirements of complicated object clustering, but is also as efficient as traditional clustering algorithms.

12.
13.
Supersaturated designs (SSDs) are widely researched because they can greatly reduce the number of experiments. However, analyzing the data from SSDs is not easy, as their run size is not large enough to estimate all the main effects. This paper introduces the contrast-orthogonality cluster and the anticontrast-orthogonality cluster to reflect the inner structure of SSDs, which are helpful for experimenters in assigning factors to the columns of SSDs. A new strategy for screening active factors is proposed, named the contrast-orthogonality cluster analysis (COCA) method. Simulation studies demonstrate that this method performs well compared to most existing methods. Furthermore, the COCA method has lower type II error rates and is easy to understand and implement.

14.
Communication services that provide enhanced Quality of Service (QoS) guarantees related to dependability and real time are important for many applications in distributed systems. This paper presents real-time dependable (RTD) channels, a communication-oriented abstraction that can be configured to meet the QoS requirements of a variety of distributed applications. This customization ability is based on using CactusRT, a system that supports the construction of middleware services out of software modules called micro-protocols. Each micro-protocol implements a different semantic property or property variant and interacts with other micro-protocols using an event-driven model supported by the CactusRT runtime system. In addition to RTD channels, CactusRT and its implementation are described. This prototype executes on a cluster of Pentium PCs running the OpenGroup/RI MK 7.3 Mach real-time operating system and CORDS, a system for building network protocols based on the x-kernel.

15.
According to Pedraz-Delhaes, users evaluate both the product and the vendor on the basis of the provided documentation. Thus, a question arises as to what quality characteristics should be taken into account when deciding whether to accept a given user manual. There are some proposals (e.g., ISO Std. 26513 and 26514), but they contain too many quality characteristics and lack orthogonality. The goal of this paper is to propose a simple quality model for user documentation, along with acceptance methods based on it. The model is to be orthogonal and complete. As a result, the COCA quality model is presented, which comprises four orthogonal quality characteristics: Completeness, Operability, Correctness, and Appearance. To check completeness, the proposed quality model has been compared with many other quality models that are directly or indirectly concerned with user documentation. Moreover, two acceptance methods are described in the paper: pure review based on ISO Std. 1028:2008, and a documentation evaluation test (a type of browser evaluation test) aimed at assessing the operability of user documentation. Initial quality profiles have been empirically collected for both methods; they can be used when interpreting evaluation results obtained for a given user manual.

16.
Today, the development of e-commerce has provided many transaction databases with useful information for investigators exploring dependencies among the items. In data mining, the dependencies among different items can be shown using an association rule. The new fuzzy-genetic (FG) approach is designed to mine fuzzy association rules from a quantitative transaction database. Three important advantages are associated with using the FG approach: (1) the association rules can be extracted from the transaction database with a quantitative value; (2) extracting proper membership functions and support threshold values with the genetic algorithm exerts a positive effect on the mining results; (3) expressing the association rules in a fuzzy representation is more understandable for humans. In this paper, we design a comprehensive and fast algorithm that mines level-crossing fuzzy association rules on multiple concept levels, learning support threshold values and membership functions using the cluster-based master–slave integrated FG approach. Mining fuzzy association rules on multiple concept levels helps find more important, useful, accurate, and practical information.
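How a quantitative transaction value maps to fuzzy linguistic terms can be illustrated with triangular membership functions (the terms and breakpoints below are hypothetical, not the ones the genetic algorithm would learn):

```python
def triangular(x, a, b, c):
    """Triangular membership function with support (a, c) and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical fuzzy sets for a quantitative item attribute "amount"
fuzzy_sets = {'low': (0, 2, 4), 'middle': (3, 5, 7), 'high': (6, 8, 10)}

def fuzzify(amount):
    """Degree of membership of a quantitative value in each linguistic term."""
    return {name: triangular(amount, *abc) for name, abc in fuzzy_sets.items()}

print(fuzzify(6.5))  # amount 6.5 is partly 'middle' (0.25) and partly 'high' (0.25)
```

A fuzzy rule such as "amount is high → buys accessory" would then be supported by each transaction in proportion to these membership degrees, which is what makes the mined rules readable for humans.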

17.
罗兰, 曾斌. 《计算机工程》, 2010, 36(19): 110-112
To address the difficulty of partitioning time regions in periodic association rule mining and the low efficiency of the underlying algorithms, a discovery algorithm for periodic association rules (CARDSATSV) is proposed. Time-series vectors composed of item supports are used as the feature points of time-domain data for clustering, and the DB Index criterion controls the number of clusters to achieve the best clustering. A CFP-tree algorithm is given to discover periodic association rules, and a periodicity-pruning technique based on conditional FP-trees improves efficiency. Experiments show that, compared with existing periodic association rule discovery algorithms, CARDSATSV discovers more useful periodic association rules and improves time and space efficiency.

18.
In existing Active Access Control (AAC) models, the scalability and flexibility of security policy specification should be well balanced; in particular: (1) authorizations for large numbers of tasks should be simplified; (2) team workflows should be enabled; (3) fine-grained constraints should be enforced. To address this issue, a family of Association-Based Active Access Control (ABAAC) models is proposed. In the minimal model ABAAC0, users are assigned to roles while permissions are assigned to task-role associations. In a workflow case, executing such an association allocates some of the users assigned to its component role; the association's assigned permissions can be exercised by those users while the task is running in the case. In ABAAC1, a generalized association is employed to extract common authorizations from multiple associations. In ABAAC2, fine-grained separation of duty (SoD) is enforced among associations. In the maximal model ABAAC3, all these features are integrated, and similar constraints can be specified more concisely. Validation is performed using a software workflow case. Comparison with a representative association-based AAC model and the most scalable AAC model to date indicates that: (1) sufficient scalability is achieved; (2) without decomposing a task, different permissions can be authorized to multiple roles within it; (3) duties more fine-grained than roles and tasks can be separated.

19.
Research on a New Association Rule Mining Algorithm   Cited by: 1 (self-citations: 0, citations by others: 1)
By analyzing the characteristics of data associations and existing association rule mining algorithms, further work is done on the accuracy of quantitative description and on algorithmic efficiency: more accurate quantitative definitions of support and confidence are proposed, along with a quantitative measure of association strength. The FP-growth mining algorithm is also improved and applied to mining a clinical case database of tongue diagnosis in traditional Chinese medicine (TCM), where it successfully and accurately extracts TCM tongue-diagnosis rules. Test results show that the algorithm is fast and accurate.
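The support and confidence quantities that the paper refines can be sketched on a toy transaction set (the tongue-diagnosis items below are illustrative, not drawn from the paper's database):

```python
# Hypothetical transactions: each is the set of findings for one clinical case
transactions = [
    {'pale tongue', 'thin coating', 'qi deficiency'},
    {'pale tongue', 'qi deficiency'},
    {'red tongue', 'yellow coating', 'heat'},
    {'pale tongue', 'thin coating'},
]

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    s = set(itemset)
    return sum(s <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """support(A ∪ B) / support(A): the strength of the rule A -> B."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

print(support({'pale tongue'}))                        # 0.75
print(confidence({'pale tongue'}, {'qi deficiency'}))  # ≈ 0.67
```

FP-growth computes the same support counts, but from a compressed prefix-tree of the transactions instead of repeated database scans, which is where the efficiency gain comes from.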

20.
谢皝, 张平伟, 罗晟. 《计算机工程》, 2011, 37(19): 44-46
In mining fuzzy association rules, it is hard to know in advance the appropriate fuzzy sets for each attribute. To address this problem, a fuzzy association rule mining algorithm based on rival penalized competitive learning is proposed; without prior knowledge, it finds the fuzzy sets matching the nature of each attribute and determines their number. Experiments show that, compared with similar algorithms, it mines more interesting association rules.
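A minimal one-dimensional sketch of rival penalized competitive learning, the mechanism named above (the learning rates, epoch count, and drop-out-of-range heuristic are our assumptions): the winning centre moves toward each sample while the runner-up is pushed slightly away, so surplus centres tend to drift out of the data range and the number of surviving centres need not be fixed in advance.

```python
import random

def rpcl(data, k_max=6, eta=0.05, gamma=0.002, epochs=200, seed=0):
    """Rival Penalized Competitive Learning on 1-D data: start with k_max
    candidate centres; for each sample, attract the nearest centre (winner)
    and repel the second-nearest (rival) with a much smaller rate."""
    rng = random.Random(seed)
    centres = [rng.uniform(min(data), max(data)) for _ in range(k_max)]
    for _ in range(epochs):
        for x in data:
            order = sorted(range(len(centres)), key=lambda i: abs(x - centres[i]))
            w, r = order[0], order[1]
            centres[w] += eta * (x - centres[w])      # winner attracted
            centres[r] -= gamma * (x - centres[r])    # rival penalized
    # keep only centres that remain inside the data range
    lo, hi = min(data), max(data)
    return sorted(c for c in centres if lo <= c <= hi)

# Two clearly separated value clusters of a quantitative attribute
data = [1.0, 1.1, 0.9, 1.05, 5.0, 5.1, 4.9, 5.05]
print(rpcl(data))  # surviving centres, inside the data range
```

Each surviving centre can then serve as the peak of one fuzzy set for the attribute, which is how the technique determines both the fuzzy sets and their number without prior knowledge.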


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号