Similar literature
1.
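Most existing distributed parallel reasoning algorithms for Resource Description Framework (RDF) data need to launch multiple MapReduce jobs, and some of them reason inefficiently over RDFS/OWL rules whose antecedents contain instance triples, so overall reasoning efficiency is low. To address this problem, the paper proposes a distributed parallel reasoning algorithm for RDF data combined with Rete (DRRM). It first uses the ontology of the RDF data to build a list of schema triples and a rule-marking model. In the RDFS/OWL reasoning phase, it implements the alpha and beta stages of the Rete algorithm with MapReduce. It then deduplicates the reasoning results, completing one pass of reasoning over all RDFS/OWL rules. Experiments show that the algorithm performs parallel reasoning over large-scale data efficiently and correctly.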
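The alpha/beta split described in this abstract can be illustrated with a minimal single-process sketch. It is not the DRRM algorithm itself: it simulates map and reduce steps in plain Python, handles only the RDFS rule rdfs9 (an instance of a class is also an instance of its superclass), and the names `alpha_map`, `beta_reduce` and `infer_rdfs9` are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def alpha_map(triple, subclass_index):
    """Alpha stage (map): keep instance triples (?x rdf:type ?c) whose class ?c has a superclass."""
    s, p, o = triple
    if p == RDF_TYPE and o in subclass_index:
        yield o, s  # key by the join variable (the class)

def beta_reduce(cls, subjects, subclass_index):
    """Beta stage (reduce): join the matched instances with the schema triples for cls."""
    for superclass in subclass_index[cls]:
        for s in subjects:
            yield (s, RDF_TYPE, superclass)

def infer_rdfs9(triples):
    """One pass of rule rdfs9; returns only triples that are new (deduplicated)."""
    # Schema triples are assumed small enough to hold in memory.
    subclass_index = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS:
            subclass_index[s].add(o)
    # Simulated shuffle: group alpha-stage output by key.
    grouped = defaultdict(set)
    for t in triples:
        for key, value in alpha_map(t, subclass_index):
            grouped[key].add(value)
    derived = set()
    for cls, subjects in grouped.items():
        derived.update(beta_reduce(cls, subjects, subclass_index))
    return derived - set(triples)
```

A single call performs one pass only; reaching a fixpoint over a class hierarchy would require repeating the pass until no new triples appear.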

2.
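Most existing distributed parallel reasoning algorithms for RDF data need to launch multiple MapReduce jobs, and some of them reason inefficiently over OWL rules whose antecedents contain multiple instance triples, which lowers their overall reasoning efficiency. To address these problems, the paper proposes a Spark-based distributed parallel reasoning algorithm combined with TREAT (DPRS). The algorithm first uses the ontology of the RDF data to build the alpha registers corresponding to the schema triples and a rule-marking model. In the OWL reasoning phase, it implements the alpha stage of the TREAT algorithm with MapReduce. It then deduplicates the reasoning results, completing one pass of reasoning over all OWL rules. Experiments show that the DPRS algorithm performs parallel reasoning over large-scale data efficiently and correctly.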

3.
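Yuan Liu, Zhang Longbo. Computer Science (《计算机科学》), 2015, 42(10): 266-270, 296
Managing and exploiting ever-growing volumes of RDF data effectively is one of the challenges facing Web data management today. Clustering a large-scale RDF dataset to obtain an effective partitioning is a strategy commonly adopted when storing and using RDF data. Existing RDF clustering processes ignore the pattern features of the RDF triples themselves. To address this, and based on an in-depth analysis of the form of RDF clustering results, three different types of clustering pattern are defined and a pattern-based clustering method is proposed. By re-describing the RDF dataset, the method automatically generates clustering patterns suited to the dataset's features and performs the clustering on that basis. Experimental results on different test sets verify the correctness and effectiveness of the proposed method.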

4.
The Linked Hypernyms Dataset (LHD) provides entities described by Dutch, English and German Wikipedia articles with types in the DBpedia namespace. The types are extracted from the first sentences of Wikipedia articles using Hearst pattern matching over part-of-speech annotated text and disambiguated to DBpedia concepts. The dataset covers 1.3 million RDF type triples from English Wikipedia, out of which 1 million RDF type triples were found not to overlap with DBpedia, and 0.4 million with YAGO2s. There are about 770 thousand German and 650 thousand Dutch Wikipedia entities assigned a novel type, which exceeds the number of entities in the localized DBpedia for the respective language. RDF type triples from the German dataset have been incorporated into the German DBpedia. Quality assessment was based on a total of 16,500 human ratings and annotations. For the English dataset, the average accuracy is 0.86, for German 0.77 and for Dutch 0.88. The accuracy of raw plain text hypernyms exceeds 0.90 for all languages. The LHD release described and evaluated in this article targets DBpedia 3.8; an LHD version for DBpedia 3.9, containing approximately 4.5 million RDF type triples, is also available.
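As a rough illustration of the Hearst-style extraction mentioned in this abstract, the sketch below pulls a hypernym out of an English first sentence of the form "X is a ... Y". It is a strong simplification and an assumption throughout: it works on plain text with a regular expression rather than on part-of-speech annotated text, performs no disambiguation to DBpedia concepts, and `PATTERN` and `extract_hypernym` are hypothetical names.

```python
import re

# Hypothetical simplified pattern: "is/was a/an/the <up to 4 words>", ending
# before a preposition, a relative pronoun, or punctuation.
PATTERN = re.compile(
    r"\b(?:is|was)\s+(?:a|an|the)\s+"
    r"((?:[\w-]+\s+){0,3}?[\w-]+)"
    r"(?=\s+(?:in|of|from|that|which|who|by)\b|[,.;])",
    re.IGNORECASE,
)

def extract_hypernym(first_sentence):
    """Return the head noun of the matched phrase (its last word), or None."""
    match = PATTERN.search(first_sentence)
    if match is None:
        return None
    return match.group(1).split()[-1].lower()
```

For example, `extract_hypernym("Prague is the capital of the Czech Republic.")` returns `"capital"`. Taking the last word as the head noun is a heuristic that fits simple English noun phrases only.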

5.
6.
7.
8.
In real-world applications, transactions usually consist of quantitative values. Many fuzzy data mining approaches have thus been proposed for finding fuzzy association rules with a predefined minimum support from the given quantitative transactions. However, the common problems of those approaches are that an appropriate minimum support is hard to set, and that the derived rules usually expose common-sense knowledge which may not be interesting from a business point of view. In this paper, an algorithm for mining fuzzy coherent rules is proposed for overcoming those problems with the properties of propositional logic. It first transforms quantitative transactions into fuzzy sets. Then, the generated fuzzy sets are collected to generate candidate fuzzy coherent rules. Finally, contingency tables are calculated and used for checking whether each candidate fuzzy coherent rule satisfies the four criteria. If it does, it is a fuzzy coherent rule. Experiments on the foodmart dataset are also made to show the effectiveness of the proposed algorithm.

9.
Discovery of unapparent association rules based on extracted probability   (Cited by: 1; self-citations: 0; citations by others: 1)
Association rule mining is an important task in data mining. However, not all of the generated rules are interesting, and some unapparent rules may be ignored. We have introduced an “extracted probability” measure in this article. Using this measure, three models are presented to modify the confidence of rules. An efficient method based on the support-confidence framework is then developed to generate rules of interest. The adult dataset from the UCI machine learning repository and a database of occupational accidents are analyzed in this article. The analysis reveals that the proposed methods can effectively generate interesting rules from a variety of association rules.
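For readers unfamiliar with the support-confidence framework that this abstract builds on, here is a minimal sketch of the two baseline measures. It does not implement the article's "extracted probability" measure, which the abstract does not define; the function names and the representation of transactions as Python sets are assumptions made for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = supp(A and C) / supp(A)."""
    supp_a = support(transactions, antecedent)
    if supp_a == 0:
        return 0.0  # the rule never fires; treat its confidence as 0
    return support(transactions, set(antecedent) | set(consequent)) / supp_a
```

A rule A → C is usually kept when both its support and its confidence reach user-chosen thresholds.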

10.
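With the development of the Semantic Web, more and more RDF data is being published on the Web, and a data management system offering storage and query functions is needed to manage this massive volume of RDF data. To address this need, a distributed storage scheme for large-scale RDF semantic data is designed and implemented. Through RDF data loading and preprocessing, the scheme manages massive RDF data effectively, and by building indexes it supports efficient queries over large-scale RDF data. The work covers the design and implementation of the underlying RDF storage scheme as well as data preprocessing and loading. A series of experiments is also designed to evaluate and compare the performance of Cassandra clusters with different numbers of nodes, using a dataset of 13 million lines of RDF obtained from DBpedia. The experimental results show that the scheme has a performance advantage for storing and querying large-scale RDF semantic data.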

11.
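In view of the big-data characteristics of linked data collections and the semantic information they contain, a method is proposed that first establishes schema-level links between linked datasets and then mines association rules. RDF data item patterns are defined over RDF datasets from the same domain, together with rules for generating those patterns. RDF query techniques are used to obtain sets of RDF data items from the data item patterns, from which association rules within a specific domain are then derived. The proposed method, based on RDF data item patterns of linked data, extends association rule mining to collections of datasets within one domain instead of limiting it to a single dataset. A Hadoop-based implementation scheme for association rule mining over large-scale RDF datasets is also given. Experimental results verify the value of schema-level links for association rule mining and the effectiveness of the proposed method.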

12.

Association rules mining is a popular data mining modeling tool. It discovers interesting associations or correlation relationships among a large set of data items, showing attribute values that occur frequently together in a given dataset. Despite their great potential benefit, current association rules modeling tools are far from optimal. This article studies how visualization techniques can be applied to facilitate the association rules modeling process, particularly what visualization elements should be incorporated and how they can be displayed. Original designs for visualization of rules, integration of data and rule visualizations, and visualization of rule derivation process for supporting interactive visual association rules modeling are proposed in this research. Experimental results indicated that, compared to an automatic association rules modeling process, the proposed interactive visual association rules modeling can significantly improve the effectiveness of modeling, enhance understanding of the applied algorithm, and bring users greater satisfaction with the task. The proposed integration of data and rule visualizations can significantly facilitate understanding rules compared to their nonintegrated counterpart.

13.
Wang Ling, Gui Lingpeng, Zhu Hui. Applied Intelligence, 2022, 52(2): 1389-1405

Traditional temporal association rules mining algorithms cannot dynamically update the temporal association rules within the valid time interval as data accumulate. In this paper, a new algorithm called incremental fuzzy temporal association rule mining using fuzzy grid table (IFTARMFGT) is proposed by combining the advantages of Boolean matrices with incremental mining. First, multivariate time series data are transformed into discrete fuzzy values that contain the time intervals and fuzzy membership. Second, in order to improve the mining efficiency, the concept of Boolean matrices is introduced into the fuzzy membership to generate a fuzzy grid table for mining the frequent itemsets. Finally, building on the Fast UPdate (FUP) algorithm, fuzzy temporal association rules are incrementally mined and updated without repeatedly scanning the original database by considering the lifespan of each item and inheriting the information from previous mining results. The experiments show that our algorithm provides better efficiency and interpretability in mining temporal association rules than other algorithms.


14.
Classification plays an important role in decision support systems. Many methods for mining classification rules have been developed in recent years, such as C4.5 and ILA. These methods are, however, based on heuristics and greedy approaches and generate rule sets that are either too general or overfitted for a given dataset. They thus often yield high error rates. Recently, a new method for classification from data mining, called Classification Based on Associations (CBA), has been proposed for mining class-association rules (CARs). This method has advantages over the heuristic and greedy methods in that it can easily remove noise, and its accuracy is thus higher. It can additionally generate a rule set that is more complete than those of C4.5 and ILA. One of the weaknesses of mining CARs is that it consumes more time than C4.5 and ILA because it has to check each generated rule against the set of other rules. We thus propose an efficient pruning approach to build a classifier quickly. Firstly, we design a lattice structure and propose an algorithm for fast mining of CARs using this lattice. Secondly, we develop some theorems and propose an algorithm for pruning redundant rules quickly based on these theorems. Experimental results also show that the proposed approach is more efficient than those used previously.

15.
Association rule mining has contributed to many advances in the area of knowledge discovery. However, the quality of the discovered association rules is a big concern and has drawn more and more attention recently. One problem with the quality of the discovered association rules is the huge size of the extracted rule set. Often for a dataset, a huge number of rules can be extracted, but many of them can be redundant with respect to other rules and thus useless in practice. Mining non-redundant rules is a promising approach to solve this problem. In this paper, we first propose a definition for redundancy, then propose a concise representation, called a Reliable basis, for representing non-redundant association rules. The Reliable basis contains a set of non-redundant rules which are derived using frequent closed itemsets and their generators instead of using frequent itemsets that are usually used by traditional association rule mining approaches. An important contribution of this paper is that we propose to use the certainty factor as the criterion to measure the strength of the discovered association rules. Using this criterion, we can ensure the elimination of as many redundant rules as possible without reducing the inference capacity of the remaining extracted non-redundant rules. We prove that the redundancy elimination, based on the proposed Reliable basis, does not reduce the strength of belief in the extracted rules. We also prove that all association rules, their supports and confidences, can be retrieved from the Reliable basis without accessing the dataset. Therefore the Reliable basis is a lossless representation of association rules. Experimental results show that the proposed Reliable basis can significantly reduce the number of extracted rules. We also conduct experiments on the application of association rules to the area of product recommendation. The experimental results show that the non-redundant association rules extracted using the proposed method retain the same inference capacity as the entire rule set. This result indicates that using only the non-redundant rules is sufficient to solve real problems, without the need to use the entire rule set.
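The certainty factor used as the strength criterion in this abstract is commonly defined for a rule X → Y in terms of the rule's confidence and the support of the consequent. The sketch below follows that common definition; whether the paper uses exactly this form is an assumption, since the abstract does not state the formula, and the function name is hypothetical.

```python
def certainty_factor(conf_xy, supp_y):
    """Certainty factor of X -> Y from conf(X -> Y) and supp(Y), in [-1, 1].

    Positive when X raises the probability of Y, negative when X lowers it,
    and 0 when X and Y appear independent.
    """
    if conf_xy > supp_y:
        # supp_y < 1 here, so the denominator is non-zero.
        return (conf_xy - supp_y) / (1.0 - supp_y)
    if conf_xy < supp_y:
        # supp_y > 0 here, so the denominator is non-zero.
        return (conf_xy - supp_y) / supp_y
    return 0.0
```

Unlike raw confidence, this measure discounts rules whose consequent is frequent anyway: a rule with confidence 0.8 and a consequent support of 0.8 scores 0.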

16.
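To resolve ontology heterogeneity and enable interoperability and data integration between applications that use different ontologies, an improved similarity-propagation matching algorithm based on RDF graphs is proposed. Initial seed pairs of similar elements are first discovered through WordNet, and preprocessing represents the ontologies as RDF triples. Exploiting the characteristics of RDF graphs, the conditions for similarity propagation are extended to triples in order to discover possible similar pairs. Similarity is then computed with a method that combines element features. Similarity propagation, discovery of possible similar seed pairs, and similarity computation form an iterative loop that runs until a convergence condition is met. Experiments demonstrate the effectiveness of the algorithm and show an improvement in running time.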

17.
Classification With Ant Colony Optimization   (Cited by: 2; self-citations: 0; citations by others: 2)
Ant colony optimization (ACO) can be applied to the data mining field to extract rule-based classifiers. The aim of this paper is twofold. On the one hand, we provide an overview of previous ant-based approaches to the classification task and compare them with state-of-the-art classification techniques, such as C4.5, RIPPER, and support vector machines in a benchmark study. On the other hand, a new ant-based classification technique is proposed, named AntMiner+. The key differences between the proposed AntMiner+ and previous AntMiner versions are the usage of the better performing MAX-MIN ant system, a clearly defined and augmented environment for the ants to walk through, with the inclusion of the class variable to handle multiclass problems, and the ability to include interval rules in the rule list. Furthermore, the commonly encountered problem in ACO of setting system parameters is dealt with in an automated, dynamic manner. Our benchmarking experiments show an AntMiner+ accuracy that is superior to that obtained by the other AntMiner versions, and competitive with or better than the results achieved by the compared classification techniques.

18.
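Traditional rule mining algorithms usually reduce attributes first and then attribute values. This approach involves redundant computation, and its complexity grows sharply as the sample set grows. A granular-computing-based algorithm for mining minimal decision rules is therefore proposed. First, the granular relation matrix between condition granules and decision granules is computed in different granularity spaces. Then, the information H1 and H2 implicit in the granular relation matrix is used as heuristic operators to reduce attribute values by information granule. Finally, redundant attributes are removed and a termination condition is set, enabling fast mining of decision rules. Theoretical analysis and experimental results show that the proposed algorithm obtains more concise rules with stronger generalization ability.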

19.
20.
Efficient Adaptive-Support Association Rule Mining for Recommender Systems   (Cited by: 25; self-citations: 0; citations by others: 25)
Collaborative recommender systems allow personalization for e-commerce by exploiting similarities and dissimilarities among customers' preferences. We investigate the use of association rule mining as an underlying technology for collaborative recommender systems. Association rules have been used with success in other domains. However, most currently existing association rule mining algorithms were designed with market basket analysis in mind. Such algorithms are inefficient for collaborative recommendation because they mine many rules that are not relevant to a given user. Also, it is necessary to specify the minimum support of the mined rules in advance, often leading to either too many or too few rules; this negatively impacts the performance of the overall system. We describe a collaborative recommendation technique based on a new algorithm specifically designed to mine association rules for this purpose. Our algorithm does not require the minimum support to be specified in advance. Rather, a target range is given for the number of rules, and the algorithm adjusts the minimum support for each user in order to obtain a ruleset whose size is in the desired range. Rules are mined for a specific target user, reducing the time required for the mining process. We employ associations between users as well as associations between items in making recommendations. Experimental evaluation of a system based on our algorithm reveals performance that is significantly better than that of traditional correlation-based approaches.
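The idea of adjusting the minimum support until the rule count lands in a target range can be sketched with a simple bisection. This is not the authors' algorithm, whose details the abstract does not give; it assumes a caller-supplied function `count_rules(min_support)` whose result never increases as the threshold rises, and all names here are hypothetical.

```python
def adapt_min_support(count_rules, min_rules, max_rules, max_iters=30):
    """Bisect on the minimum support until the number of mined rules
    falls within [min_rules, max_rules].

    count_rules: hypothetical callable mapping a support threshold in
    [0, 1] to the number of rules mined for the target user.
    Returns the last threshold tried if no threshold hits the range.
    """
    low, high = 0.0, 1.0
    threshold = 0.5
    for _ in range(max_iters):
        threshold = (low + high) / 2
        n = count_rules(threshold)
        if n > max_rules:
            low = threshold    # too many rules: raise the threshold
        elif n < min_rules:
            high = threshold   # too few rules: lower the threshold
        else:
            return threshold
    return threshold
```

In practice each call to `count_rules` is a mining run, so a real system would want to reuse work between iterations rather than mine from scratch each time.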
