Similar literature
1.
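An Apriori-Based Algorithm for Frequent Subgraph Discovery   (Total citations: 1; self-citations: 0; citations by others: 1)
Association rule techniques are now applied in many non-traditional domains, where many existing frequent itemset search methods no longer apply. One solution is to represent the transactions of these domains as graphs and then use graph-theoretic data mining techniques to discover frequent subgraphs. This paper proposes SLAGM, a frequent subgraph discovery algorithm based on the Apriori idea, which can effectively mine frequent subgraphs in simple graphs. Experiments show that the algorithm outperforms another subgraph mining algorithm, AGM.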

2.
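Many frequent itemset mining algorithms based on the Apriori idea have been proposed. They can effectively mine short frequent itemsets from a transaction database, but their performance degrades markedly when mining long frequent itemsets. To address this, a frequent closed itemset mining algorithm, MFCIA, is proposed, which can effectively mine all frequent itemsets in a transaction database. The update problem is also studied, and a corresponding incremental update algorithm for frequent closed itemsets, UMFCIA, is proposed; it makes full use of previous mining results to save the time needed to discover new frequent closed itemsets. Experimental results show that MFCIA is effective and feasible.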
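As a rough illustration of the two notions this abstract relies on, the sketch below runs a generic level-wise Apriori search and then keeps only the closed itemsets (those with no proper superset of equal support). It is not MFCIA or UMFCIA, whose details the abstract does not give; the function names and the toy transactions are assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                current[c] = support
        frequent.update(current)
        k += 1
    return frequent

def closed_itemsets(frequent):
    """Keep the itemsets that have no proper superset with the same support."""
    return {s: c for s, c in frequent.items()
            if not any(s < other and c == frequent[other] for other in frequent)}

# Hypothetical toy database, used only for illustration.
transactions = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
frequent = apriori(transactions, min_support=2)
closed = closed_itemsets(frequent)
```

On this toy input, `{a}` and `{b}` are frequent but not closed, because `{a, b}` has the same support; the closed set is therefore smaller than the full frequent set while still determining every support count.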

3.
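Most current frequent subtree algorithms mine the complete set of frequent subtrees, so both the memory overhead of their search space and their output result sets are very large. To reduce the result set, CSMTreeMiner, a maximal frequent subtree algorithm based on subtree constraints, is proposed. It enumerates frequent subtrees by vertical and level-wise extension and uses the cover relation to delete patterns that cannot generate maximal frequent subtrees. Experimental results confirm the effectiveness and stability of CSMTreeMiner.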

4.
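Algorithms for mining closed frequent itemsets over data streams have been widely studied, and NewMoment is a representative one. To address the low time efficiency of NewMoment caused by its large search space, an improved data stream closed frequent itemset mining algorithm, A-NewMoment, is proposed. It designs a data structure that combines a binary bit representation of items with an extended frequent item list to record data stream information and closed frequent itemsets. In the window initialization phase, it first mines the longest closed frequent itemsets with maximum support generated from the frequent 1-itemsets, and then applies new "no-extension" and "downward-extension" strategies to avoid generating large numbers of intermediate results and to quickly discover the remaining closed frequent itemsets, greatly reducing the search space. In the window sliding phase, a "dynamic infrequent pruning strategy" quickly deletes non-closed frequent itemsets from those already generated, and a "dynamic no-search strategy" dynamically maintains the generation of all closed frequent itemsets, lowering the maintenance cost and improving efficiency. Theoretical analysis and experimental results show that A-NewMoment performs well.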

5.
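A Multi-Space Clustering Algorithm   (Total citations: 1; self-citations: 0; citations by others: 1)
CLARANS is a classic partitioning clustering algorithm whose core idea is to search for medoids using local search with random restarts. Because the search space is full of local-optimum "traps", it has difficulty reaching the global optimum, which degrades clustering quality. To address this shortcoming, this paper combines the multi-space idea with CLARANS and proposes CABMS (CLARANS Algorithm Based on Multi-Space). The basic idea is to use a space transformation strategy to construct a series of search spaces of differing smoothness, run CLARANS in each of them, and use the clustering result from the previous search space to guide clustering in the current one. CABMS can skip over local-optimum "traps" and increase the probability of reaching the global optimum, thereby improving clustering quality. The paper presents an equidistant multi-space construction strategy and compares the clustering quality of CLARANS and CABMS experimentally. The results show that CABMS considerably improves clustering quality over CLARANS.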

6.
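This paper discusses how to efficiently update frequent itemsets in a distributed database system when the minimum support changes, and proposes ULFS, an algorithm for updating local frequent itemsets, and UGFS, an algorithm for updating global frequent itemsets, both based on changes in minimum support. The algorithms make full use of previously mined results and generate relatively few candidate frequent itemsets; when computing the global frequent itemsets, the communication volume for the support counts of candidate local frequent itemsets is O(n). The algorithms were implemented in Java and their performance was studied. Experimental results show that they are feasible, effective and fast.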

7.
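Fast Updating of Maximum Frequent Itemsets   (Total citations: 29; self-citations: 0; citations by others: 29)
Mining maximum frequent itemsets is a key problem in many data mining applications. To overcome the shortcomings of Apriori-based maximum frequent itemset mining algorithms, DMFIA uses the FP-tree storage structure and a top-down search strategy, which effectively improves mining efficiency. However, when there are many frequent items but the maximum frequent itemsets have relatively low dimensionality, DMFIA must search through many levels and generates a large number of candidate itemsets at each level, which hurts its execution efficiency. This paper therefore proposes IDMFIA (the Improved algorithm of DMFIA). IDMFIA uses a bidirectional search strategy that combines top-down and bottom-up search, so that supersets of shorter maximum frequent itemsets and subsets of longer maximum frequent itemsets can be pruned as early as possible. The paper also proposes a maximum frequent itemset update algorithm, FUMFIA (Fast Updating Maximum Frequent Itemsets Algorithm), which makes full use of the existing FP-tree and the already mined maximum frequent itemsets to maintain them efficiently. Experimental results show that IDMFIA and FUMFIA effectively improve the efficiency of mining and updating maximum frequent itemsets.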

8.
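This paper proposes a frequent pattern mining algorithm based on an ascending-order FP-tree. The algorithm builds the ascending FP-tree in ascending order of support and mines frequent patterns by searching it for extended frequent sets and merging subtrees. Experiments show that, compared with the FP-growth algorithm, the mining speed increases by nearly two times, and the new algorithm also scales fairly well.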

9.
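ESPM: A Frequent Subtree Mining Algorithm   (Total citations: 15; self-citations: 2; citations by others: 13)
With the development of the Internet, frequent pattern mining has extended from frequent itemsets to structured data: trees and graphs. Mining on these structures is applied in more complex domains such as bioinformatics, web logs and XML documents. A novel algorithm, ESPM, is proposed to mine frequent subtrees in ordered labeled trees. Unlike previous work, it defers tree isomorphism checking to a late stage of the algorithm, which reduces the time cost of the whole mining process. Experiments on synthetic and real datasets demonstrate the superiority of ESPM over other algorithms. Some possible improvements are also proposed.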

10.
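Zhang Xin, Liao Pin, Guo Bo. 《计算机应用》 (Journal of Computer Applications), 2010, 30(3): 806-809
Mining frequent closed itemsets is an important problem in many data mining applications. To reduce the number of candidate itemsets and the cost of support computation, a new algorithm for depth-first search of frequent closed itemsets (DFFCI) is proposed. The dataset information represented by an improved compressed frequent pattern tree (CFP-Tree) is projected onto partition matrices, and support is computed with logical operations on binary vectors, which simplifies the computation and reduces the time cost. Global 2-itemset pruning based on support pre-computation, together with local extension pruning, effectively reduces the search space. Experimental results show that the algorithm outperforms other mainstream depth-first algorithms.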
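The idea of computing support with logical operations on binary vectors can be sketched as follows. This is only a minimal illustration of the AND-and-count step, assuming one bit per transaction; it does not reproduce the CFP-Tree projection, the partition matrices or the pruning rules of DFFCI, and all names are hypothetical.

```python
def build_bit_vectors(transactions):
    """Map each item to an int whose bit i is set when transaction i contains it."""
    vectors = {}
    for i, t in enumerate(transactions):
        for item in t:
            vectors[item] = vectors.get(item, 0) | (1 << i)
    return vectors

def support(itemset, vectors, n_transactions):
    """Support count = number of set bits in the AND of the items' bit vectors."""
    mask = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        mask &= vectors.get(item, 0)  # an unknown item matches no transaction
    return bin(mask).count("1")

# Hypothetical toy database, used only for illustration.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
vectors = build_bit_vectors(transactions)
print(support({"a", "b"}, vectors, len(transactions)))  # prints 2
```

Because each support query reduces to a few bitwise ANDs and one bit count, no rescan of the transactions is needed once the vectors are built.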

11.
Currently, there are large collections of drawings from which users can select the desired ones to insert in their documents. However, to locate a particular drawing among thousands is not easy. In our prior work we proposed an approach to index and retrieve vector drawings by content, using topological and geometric information automatically extracted from figures. In this paper, we present a new approach to enrich the topological information by integrating spatial proximity in the topology graph, through the use of weights in adjacency links. Additionally, we developed a web search engine for clip art drawings, where we included the new technique. Experimental evaluation reveals that the use of topological proximity results in better retrieval results than topology alone. However, the increase in precision was not as high as we expected. To understand why, we analyzed sketched queries performed by users in previous experimental sessions and we present here the achieved conclusions.

12.
The web is nowadays one of the main information sources, and information search is an important area in which many advances have been registered. One approach to improve web search results is to consider contextual information. Usually, information about context has been provided through user logs on previous searches or the monitoring of clicks on first results, but different approaches can be used in specific environments. In a web-based learning environment, existing documents and exchanged messages could provide contextual information. So, the main goal of this work is to provide a contextual web search engine based on shared documents and messages posted in a social network used for collaborative learning. Contextual search is provided through query expansion using learning documents (material provided by the teacher) and discussion messages (posts, links and comments that result from the participants’ interactions). A prototype was implemented and used in a learning scenario to acquire the context in a learning community. The proposed approach makes the context acquisition faster and more dynamic, as it relies on automatic text processing of documents and discussions. In addition, the results of the query engine with and without the contextual information were compared, and the proposed approach using contextual information showed improvements in the precision of the results.
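One simple way to picture query expansion from context is to append the most frequent non-stopword terms found in the shared documents and messages. The sketch below assumes that frequency-based term selection; the abstract does not say how the prototype chooses expansion terms, so the function name, the stopword list and the sample texts are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "of", "and", "in", "to", "is", "for"}

def expand_query(query, context_texts, k=2):
    """Append the k most frequent context terms that are not already in the query."""
    query_terms = re.findall(r"[a-z]+", query.lower())
    counts = Counter()
    for text in context_texts:
        for term in re.findall(r"[a-z]+", text.lower()):
            if term not in STOPWORDS and term not in query_terms:
                counts[term] += 1
    extra = [term for term, _ in counts.most_common(k)]
    return query_terms + extra

# Hypothetical course material and discussion posts.
context = [
    "Sorting algorithms: quicksort and mergesort",
    "quicksort partition",
    "mergesort merge, quicksort pivot",
]
expanded = expand_query("sorting", context)
print(expanded)  # prints ['sorting', 'quicksort', 'mergesort']
```

The expanded term list would then be sent to the underlying search engine in place of the original query.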

13.
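Clustering Web Search Results Using Link Analysis   (Total citations: 3; self-citations: 0; citations by others: 3)
With the rapid growth of information on the web, how to effectively obtain high-quality web information has become a hot research topic in many fields, such as databases and information retrieval. In information retrieval, web search engines are the most commonly used tools, yet today's search engines are still far from satisfactory. Using link analysis, a new method for clustering web search results is proposed. Unlike clustering algorithms in information retrieval that are based on keywords or terms shared between texts, this method applies citation and matching analysis from the study of literature references, working from the common links that two web pages share and match. It also extends the standard K-means clustering algorithm to make it better suited to handling noisy pages, and applies it to clustering web result pages. Preliminary experiments were conducted to verify its effectiveness, and the results show that clustering web search results through link analysis achieves the expected effect.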
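To make the notion of link-based similarity concrete, the sketch below scores two pages by the links they have in common, using cosine similarity over link sets. The choice of cosine similarity is an assumption for illustration; the abstract states only that similarity is based on shared and matched links, and the page and URL names are hypothetical.

```python
import math

def link_similarity(links_a, links_b):
    """Cosine similarity between two pages represented as sets of links."""
    if not links_a or not links_b:
        return 0.0  # a page without links shares nothing with any other page
    shared = len(links_a & links_b)
    return shared / math.sqrt(len(links_a) * len(links_b))

# Hypothetical out-link sets for three result pages.
out_links = {
    "p1": {"u1", "u2", "u3"},
    "p2": {"u2", "u3", "u4"},
    "p3": {"u9"},
}
sim_12 = link_similarity(out_links["p1"], out_links["p2"])  # 2 shared links out of 3 each
sim_13 = link_similarity(out_links["p1"], out_links["p3"])  # no shared links
```

A K-means-style procedure could then assign each page to the cluster whose representative it is most similar to, and pages whose similarity to every cluster is very low could be treated as noise.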

14.
Association Rule Mining (ARM) can be considered a combinatorial problem whose purpose is to extract the correlations between items in sizeable datasets. The numerous polynomial exact algorithms already proposed for ARM are ill-suited to large databases, especially those existing on the web. Treating such datasets as a large search space, intelligent algorithms have been used to find high-quality rules and solve the ARM problem. This paper deals with a cooperative multi-swarm bat algorithm for association rule mining. It is based on the bat-inspired algorithm adapted to the rule discovery problem (BAT-ARM). The latter suffers from an absence of communication between bats in the population, which lessens the exploration of the search space. However, it has a powerful rule generation process which leads to perfect local search. Therefore, to maintain a good trade-off between diversification and intensification, our proposed approach introduces cooperative strategies between the swarms that have already proved their efficiency in multi-swarm optimization algorithms (Ring, Master-slave). Furthermore, we introduce a new topology called Hybrid that merges the Ring strategy with the Master-slave plan previously developed in our earlier work [23]. A series of experiments is carried out on nine well-known datasets in the ARM field, and the performance of the proposed approach is evaluated and compared with that of other recently published methods. The results show a clear superiority of our proposal over similar approaches in terms of time and rule quality. The analysis also shows competitive outcomes in terms of quality against multi-objective optimization methods.

15.
In this paper, a novel approach is proposed for generating the optimal ranked clicked URLs using a genetic algorithm (GA) based on clustered web query sessions for effective personalized web search. An experimental study was conducted on a data set of web query sessions captured in the domains of academics, entertainment and sports to test the effectiveness of clusterwise optimal ranked clicked URLs for personalized web search (PWS). The results, which are verified statistically, show an improvement in the average precision of personalized web search based on optimal ranked clicked URLs over both Classic IR and personalized web search without optimal ranked clicked URLs. Thus the effectiveness of personalized web search using optimal ranked clicked URLs is confirmed for better customizing the web search according to the information need of the user.

16.
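Web Data Mining Techniques in Search Engines   (Total citations: 4; self-citations: 0; citations by others: 4)
The World Wide Web contains a huge amount of information and grows ever more complex as it rapidly expands, which makes it increasingly difficult for users to locate relevant, high-quality information. Applying web data mining techniques to search engines can greatly improve both their search efficiency and their search quality. A specific algorithm is proposed, and its application in search engines is described.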

17.
Many network services which process a large quantity of data and knowledge are available in the distributed network environment, and provide applications to users based on Service-Oriented Architecture (SOA) and Web services technology. Therefore, a useful web service discovery approach for the data and knowledge discovery process in the complex network environment is a very significant issue. Using the traditional keyword-based search method, users find it difficult to choose the best web services from those with similar functionalities. In addition, in an untrustworthy real-world environment, the QoS-based service discovery approach cannot verify the correctness of the web services’ Quality of Service (QoS) values, since the values guaranteed by a service provider may differ from the real ones. This work proposes a trustworthy two-phase web service discovery mechanism based on QoS and collaborative filtering, which discovers and recommends the needed web services effectively for users in the distributed environment, and also solves the problem of services with incorrect QoS information. Theoretical analysis and simulation results show that the proposed method can accurately recommend the needed services to users and improve the recommendation quality.

18.
19.
Traditional search engines have become the most useful tools to search the World Wide Web. Even though they are good for certain search tasks, they may be less effective for others, such as satisfying ambiguous or synonym queries. In this paper, we propose an algorithm that, with the help of Wikipedia and collaborative semantic annotations, improves the quality of web search engines in the ranking of returned results. Our work is supported by (1) the logs generated after query searching, (2) semantic annotations of queries and (3) semantic annotations of web pages. The algorithm makes use of this information to elaborate an appropriate ranking. To validate our approach we have implemented a system that can apply the algorithm to a particular search engine. Evaluation results show that the number of relevant web resources obtained after executing a query with the algorithm is higher than the one obtained without it.

20.
Currently, web spamming is a serious problem for search engines. It not only degrades the quality of search results by intentionally boosting undesirable web pages to users, but also causes the search engine to waste a significant amount of computational and storage resources in manipulating useless information. In this paper, we present a novel ensemble classifier for web spam detection which combines the clonal selection algorithm for feature selection and under-sampling for data balancing. This web spam detection system is called USCS. The USCS ensemble classifiers can automatically sample and select sub-classifiers. First, the system converts the imbalanced training dataset into several balanced datasets using the under-sampling method. Second, the system automatically selects several optimal feature subsets for each sub-classifier using a customized clonal selection algorithm. Third, the system builds several C4.5 decision tree sub-classifiers from these balanced datasets based on their specified features. Finally, these sub-classifiers are used to construct an ensemble decision tree classifier, which is applied to classify the examples in the testing data. Experiments on the WEBSPAM-UK2006 dataset show that our proposed approach, the USCS ensemble web spam classifier, delivers a significant improvement in classification performance over several baseline systems and state-of-the-art approaches.
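The first step, turning one imbalanced training set into several balanced ones, might look like the sketch below. It assumes the majority class is shuffled and cut into disjoint chunks the size of the minority class, each paired with the full minority class; the abstract does not specify the exact sampling scheme used by USCS, and the function name and sample data are hypothetical.

```python
import random

def balanced_subsets(majority, minority, seed=0):
    """Split the majority class into minority-sized chunks, each paired with the minority."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = majority[:]     # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    size = len(minority)
    subsets = []
    # Leftover majority samples that cannot fill a whole chunk are dropped.
    for start in range(0, len(shuffled) - size + 1, size):
        chunk = shuffled[start:start + size]
        subsets.append(chunk + minority)
    return subsets

# Hypothetical labeled examples: 10 non-spam pages and 3 spam pages.
majority = [("ham", i) for i in range(10)]
minority = [("spam", i) for i in range(3)]
subsets = balanced_subsets(majority, minority)
```

Each balanced subset would then train one sub-classifier, and the ensemble combines their predictions on the test data.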
