Fuzzy utility mining is an emerging research topic because of its simplicity and comprehensibility. Unlike traditional fuzzy data mining, fuzzy utility mining considers not only the quantities of items in transactions but also their profits when deriving high fuzzy utility itemsets. In this paper, we introduce a new fuzzy utility measure that uses the fuzzy minimum operator to evaluate the fuzzy utilities of itemsets. In addition, an effective fuzzy utility upper-bound model based on the proposed measure is designed to provide the downward-closure property in fuzzy sets, thus reducing the search space for finding high fuzzy utility itemsets. A two-phase fuzzy utility mining algorithm, named TPFU, is also proposed for solving the fuzzy utility mining problem. Finally, experimental results on both synthetic and real datasets show that the proposed algorithm performs well.
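
A minimal sketch of how a minimum-operator fuzzy utility could be evaluated for one transaction. The triangular membership functions, the region boundaries in `REGIONS`, the `PROFIT` table and the exact formula (minimum membership degree times the summed quantity-profit products) are assumptions made for illustration; the abstract does not spell out the measure.

```python
from typing import Dict

def triangular(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership function; returns a degree in [0, 1]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy regions for purchased quantities (left, peak, right).
REGIONS = {"Low": (0, 1, 6), "Middle": (1, 6, 11), "High": (6, 11, 16)}

# Hypothetical unit profits.
PROFIT = {"A": 3.0, "B": 10.0}

def fuzzy_utility(transaction: Dict[str, int], fuzzy_itemset: Dict[str, str]) -> float:
    """Fuzzy utility of a fuzzy itemset (item -> region name) in one transaction.

    Assumed measure: the minimum membership degree over the items,
    multiplied by the sum of quantity * profit over the same items.
    """
    if not fuzzy_itemset:
        return 0.0
    degrees = []
    total = 0.0
    for item, region in fuzzy_itemset.items():
        if item not in transaction:
            return 0.0  # the itemset does not occur in this transaction
        quantity = transaction[item]
        degrees.append(triangular(quantity, *REGIONS[region]))
        total += quantity * PROFIT[item]
    return min(degrees) * total
```

For the transaction `{"A": 3, "B": 6}` and the fuzzy itemset `{"A": "Low", "B": "Middle"}`, the degrees are 0.6 and 1.0, so the sketch returns roughly 0.6 × 69 ≈ 41.4.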

Recently, high utility sequential pattern mining has become a popular emerging topic because it considers the quantities, profits and time order of items. In the existing approach, the utilities of subsequences in sequences are difficult to calculate because three kinds of utility calculation are involved. To simplify the utility calculation, this work presents a maximum utility measure, derived from the principle in traditional sequential pattern mining that a subsequence is counted only once per sequence. The maximum measure thus simplifies the utility calculation for subsequences during mining. In addition, an effective upper-bound model is designed to avoid information loss in mining, and an effective projection-based pruning strategy is designed to obtain more accurate sequence-utility upper bounds for subsequences. An indexing strategy is also developed to quickly find the relevant sequences for prefixes, which reduces unnecessary search time. Finally, experimental results on several datasets show that the proposed approach performs well in both pruning effectiveness and execution efficiency.
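
The maximum utility idea can be sketched as follows: when a pattern occurs several times in one quantitative sequence, only the occurrence with the largest utility is counted. The data layout (a sequence as a list of `item -> quantity` dicts, a pattern as a list of item sets) and the function name `max_utility` are assumptions for illustration, not the paper's implementation.

```python
from typing import Dict, List, Optional, Set

def max_utility(sequence: List[Dict[str, int]],
                pattern: List[Set[str]],
                profit: Dict[str, float]) -> Optional[float]:
    """Largest utility over all occurrences of `pattern` in `sequence`.

    Returns None when the pattern does not occur at all.
    best[j] holds the best utility found so far for the first j pattern elements.
    """
    best: List[Optional[float]] = [0.0] + [None] * len(pattern)
    for element in sequence:
        # Go from the last pattern element to the first so that one sequence
        # element is matched to at most one pattern element.
        for j in range(len(pattern), 0, -1):
            if best[j - 1] is None:
                continue
            wanted = pattern[j - 1]
            if all(item in element for item in wanted):
                gain = sum(element[item] * profit[item] for item in wanted)
                candidate = best[j - 1] + gain
                if best[j] is None or candidate > best[j]:
                    best[j] = candidate
    return best[len(pattern)]
```

With `sequence = [{"a": 2, "b": 1}, {"a": 5}, {"b": 3}]` and profits `a = 1`, `b = 2`, the pattern `[{"a"}, {"b"}]` has two occurrences (utilities 8 and 11), and the sketch keeps the larger one.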

Truong Tin, Duong Hai, Le Bac, Fournier-Viger Philippe, Yun Unil 《Applied Intelligence》2022, 52(6): 6106-6128
High utility sequence mining is a popular data mining task, which aims at finding sequences having a high utility (importance) in a quantitative sequence database. Though it...

On-shelf utility mining has recently attracted interest in the data mining field because of its practical considerations. It considers not only the profits and quantities of items in transactions but also the time periods during which they are on the shelf in stores. Traditional on-shelf utility mining assumes that the profit values of items are positive. In real-world applications, however, items may be associated with negative profit values. This paper proposes a three-scan mining approach to efficiently find high on-shelf utility itemsets with negative profit values in temporal databases. In particular, an effective itemset generation method is developed to avoid generating a large number of redundant candidates and to reduce the number of data scans in mining. Experimental results on several synthetic and real datasets show that the proposed approach performs well in pruning effectiveness and execution efficiency.

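To address the repeated computation in the high utility itemset mining algorithm with multiple minimum utility thresholds (MHUI), and the fact that the itemsets it mines are not necessarily frequent, two new fast mining algorithms, FMHUI and SFMHUI, are proposed. When computing the minimum utility threshold of an itemset, FMHUI reuses the result of the previous computation, which avoids repeated comparisons between items. In addition, a table of the minimum utility thresholds of the extension items of each item, the EMMU-table, is defined so that the minimum utility threshold of extension items can be computed quickly, which improves running efficiency. SFMHUI adds a support constraint on top of FMHUI, so that the mined itemsets are both high-utility and frequent. Simulation experiments verify the efficiency and feasibility of the proposed algorithms.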
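
A rough sketch of the two ideas in this abstract, under stated assumptions: the threshold of an itemset is taken to be the smallest per-item minimum utility threshold, so extending a prefix only needs `min(prefix_threshold, mmu[item])` instead of re-comparing all items; and a support check is applied alongside the utility check. The function name `mine`, the data layout and the exhaustive search without pruning are illustrative choices, and the EMMU-table is not modelled.

```python
from typing import Dict, List, Tuple

def mine(db: List[Dict[str, int]],
         profit: Dict[str, float],
         mmu: Dict[str, float],
         min_support: int) -> Dict[Tuple[str, ...], Tuple[float, int, float]]:
    """Return itemset -> (utility, support, threshold) for itemsets that are
    both high-utility (w.r.t. their own threshold) and frequent.

    Toy depth-first enumeration without pruning; only meant for tiny inputs.
    """
    items = sorted(mmu)
    results: Dict[Tuple[str, ...], Tuple[float, int, float]] = {}

    def utility_and_support(itemset: Tuple[str, ...]) -> Tuple[float, int]:
        utility, support = 0.0, 0
        for transaction in db:
            if all(item in transaction for item in itemset):
                support += 1
                utility += sum(transaction[i] * profit[i] for i in itemset)
        return utility, support

    def dfs(prefix: Tuple[str, ...], prefix_threshold: float, start: int) -> None:
        for k in range(start, len(items)):
            item = items[k]
            itemset = prefix + (item,)
            # Reuse the prefix's threshold: one comparison per extension.
            threshold = min(prefix_threshold, mmu[item])
            utility, support = utility_and_support(itemset)
            if utility >= threshold and support >= min_support:
                results[itemset] = (utility, support, threshold)
            dfs(itemset, threshold, k + 1)

    dfs((), float("inf"), 0)
    return results
```

Setting `min_support=1` reduces the sketch to the utility-only check, which makes the effect of the added support constraint easy to compare on a small database.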

High utility pattern mining has been studied as an essential topic in pattern mining in order to satisfy the requirements of many real-world applications, such as market analysis, that need to process non-binary databases with item importance. In this paper, we propose an efficient algorithm with a novel indexed list-based data structure for mining high utility patterns. Previous approaches first generate an enormous number of candidate patterns using overestimation methods and then identify the actual high utility patterns among the candidates through an additional database scan, which leads to high computational overhead. Although several list-based algorithms that discover high utility patterns without candidate generation have been suggested in recent years, they require a large number of comparison operations. Our method facilitates efficient mining of high utility patterns with the proposed indexed list by effectively reducing the total number of such operations. Moreover, we develop two techniques based on this data structure to further enhance the mining performance of the proposed method. Experimental results on real and synthetic datasets show that the proposed algorithm mines high utility patterns more efficiently than state-of-the-art algorithms.
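
The overestimation-based workflow that this abstract contrasts itself with can be sketched as below: candidates are kept when their transaction-weighted utilization (TWU, an upper bound on utility) reaches the threshold, and a second pass computes exact utilities. This illustrates the earlier two-phase style, not the indexed-list algorithm proposed in the paper; all function names and the data layout are assumptions, and item utilities are assumed to be non-negative.

```python
from typing import Dict, List, Tuple

Transaction = Dict[str, int]

def transaction_utility(t: Transaction, profit: Dict[str, float]) -> float:
    return sum(q * profit[i] for i, q in t.items())

def utility(itemset: Tuple[str, ...], db: List[Transaction],
            profit: Dict[str, float]) -> float:
    """Exact utility: sum of the itemset's own utility in supporting transactions."""
    return sum(sum(t[i] * profit[i] for i in itemset)
               for t in db if all(i in t for i in itemset))

def twu(itemset: Tuple[str, ...], db: List[Transaction],
        profit: Dict[str, float]) -> float:
    """Overestimate: sum of whole-transaction utilities of supporting transactions."""
    return sum(transaction_utility(t, profit)
               for t in db if all(i in t for i in itemset))

def two_phase(db: List[Transaction], profit: Dict[str, float], min_util: float):
    items = sorted({i for t in db for i in t})
    # Phase 1: level-wise candidate generation driven by the TWU overestimate.
    candidates: List[Tuple[str, ...]] = []
    level = [(i,) for i in items if twu((i,), db, profit) >= min_util]
    while level:
        candidates.extend(level)
        next_level = []
        for cand in level:
            for item in items:
                if item > cand[-1]:
                    extended = cand + (item,)
                    if twu(extended, db, profit) >= min_util:
                        next_level.append(extended)
        level = next_level
    # Phase 2: an additional pass keeps only candidates whose exact utility qualifies.
    high = {c: utility(c, db, profit) for c in candidates
            if utility(c, db, profit) >= min_util}
    return candidates, high
```

On a small database the gap between `len(candidates)` and `len(high)` shows the cost of overestimation that list-based methods aim to avoid.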

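Mining high utility itemsets (HUIs) with negative items is one of the emerging data mining tasks. To mine a result set of HUIs with negative items that meets users' needs, a top-k high utility itemset mining algorithm with negative items (THN) is proposed. To improve the time and space performance of THN, a strategy for automatically raising the minimum utility threshold is proposed, and a pattern-growth method is used for depth-first search; the redefined subtree utility and the redefined local utility are used to prune the search space; transaction merging...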
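
One common way to raise the minimum utility threshold automatically in top-k mining is to keep the k best utilities seen so far in a min-heap and use the smallest of them as the current threshold. The sketch below shows only that generic bookkeeping; the class name `TopKThreshold` and the starting threshold of 0 are assumptions, and THN's own raising strategy may differ.

```python
import heapq
from typing import List, Tuple

class TopKThreshold:
    """Keeps the k highest-utility itemsets and the threshold they imply."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.heap: List[Tuple[float, Tuple[str, ...]]] = []  # min-heap on utility
        self.min_util = 0.0  # assumed starting threshold

    def offer(self, itemset: Tuple[str, ...], utility: float) -> None:
        if utility < self.min_util:
            return  # cannot enter the current top-k
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (utility, itemset))
        elif utility > self.heap[0][0]:
            heapq.heapreplace(self.heap, (utility, itemset))
        if len(self.heap) == self.k:
            # Once k itemsets are known, the weakest one sets the threshold.
            self.min_util = self.heap[0][0]

    def results(self) -> List[Tuple[float, Tuple[str, ...]]]:
        return sorted(self.heap, reverse=True)
```

A search procedure would call `offer` for every itemset it evaluates and prune branches whose upper bound falls below the current `min_util`.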

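To address the large search space and high computational complexity of high utility pattern mining over sequential patterns, a distributed high utility sequential pattern mining algorithm based on multiple utility thresholds is proposed. An array structure is used to store the utility information of patterns, which overcomes the high memory consumption caused by utility matrices. Depth pruning strategies for 1-itemsets and 2-itemsets are designed to further shrink the search space of candidate patterns and to reduce search time and caching costs. A distributed implementation of the mining algorithm is proposed, which further reduces mining time through parallel processing. Experiments were conducted on medium-scale and large-scale sequence datasets; the results show that the algorithm effectively reduces the number of candidate patterns, lowers the time and storage costs of mining, and exhibits good scalability and stability on large datasets.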

A new definition is given for the average growth of a function f: Σ* → N with respect to a probability measure on Σ*. This allows us to define meaningful average distributional complexity classes for arbitrary time bounds (previously, one could not guarantee arbitrarily good precision). It is shown that, basically, only the ranking of the inputs by decreasing probabilities is of importance. To compare the average and worst case complexity of problems, we study average complexity classes defined by a time bound and a bound on the complexity of possible distributions. Here, the complexity is measured by the time needed to compute the rank functions of the distributions. We obtain tight and optimal separation results between these average classes. The worst case classes can also be embedded into this hierarchy: they are shown to be identical to average classes with respect to distributions of exponential complexity.

Digital images are normally taken by focusing on an object, resulting in defocused background regions. A popular approach to producing an all-in-focus image without defocused regions is to capture several input images at varying focus settings and then fuse them into one image using offline image processing software. This paper describes an all-in-focus imaging method that can operate on digital cameras. The proposed method consists of an automatic focus-bracketing algorithm that determines at which focus settings to capture images and an image-fusion algorithm that computes a high-quality all-in-focus image. While most previous methods use a focus measure calculated independently for each input image, the proposed method calculates a relative focus measure between a pair of input images. We note that a well-focused region in an image shows better contrast, sharpness, and detail than the corresponding region that is defocused in another image. A new focus measure is proposed based on the observation that the average-filtered version of a well-focused region in an image shows a higher correlation with the corresponding defocused region in another image than the original well-focused version does. Experimental results on various sample image sequences show the superiority of the proposed measure in both objective and subjective evaluation. The proposed method allows users to capture all-in-focus images directly on their digital cameras without using offline image processing software.

The high utility itemset mining problem uses the notion of utilities to discover interesting and actionable patterns. Several data structures and heuristic methods have been proposed in the literature to mine high utility itemsets efficiently. This paper advances the state of the art and presents HMiner, a high utility itemset mining method. HMiner introduces a compact utility list and a virtual hyperlink data structure for storing itemset information. It also uses several pruning strategies to mine high utility itemsets efficiently. The proposed ideas were evaluated on a set of benchmark sparse and dense datasets. The execution time improvements ranged from a modest thirty percent to three orders of magnitude across several benchmark datasets. Memory consumption also improved by up to an order of magnitude over state-of-the-art methods. In general, HMiner was found to work well in the dense regions of both sparse and dense benchmark datasets.
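
For orientation, here is a sketch of the plain utility list that compact variants build on: for each item and transaction it stores the item's utility (`iutil`) and the utility of the items that follow it in a fixed order (`rutil`). This is the basic structure known from earlier list-based miners, not HMiner's compact utility list or virtual hyperlinks; alphabetical item order, non-negative utilities and all names are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# tid -> (iutil, rutil)
UtilityList = Dict[int, Tuple[float, float]]

def build_utility_lists(db: List[Dict[str, int]],
                        profit: Dict[str, float]) -> Dict[str, UtilityList]:
    """One utility list per item, assuming alphabetical processing order."""
    lists: Dict[str, UtilityList] = {}
    for tid, transaction in enumerate(db):
        remaining = 0.0
        # Walk items from last to first so `remaining` is the utility of later items.
        for item in sorted(transaction, reverse=True):
            item_utility = transaction[item] * profit[item]
            lists.setdefault(item, {})[tid] = (item_utility, remaining)
            remaining += item_utility
    return lists

def join(ul_x: UtilityList, ul_y: UtilityList) -> UtilityList:
    """Utility list of the 2-itemset {x, y}, where x precedes y in the order."""
    return {tid: (ix + ul_y[tid][0], ul_y[tid][1])
            for tid, (ix, _) in ul_x.items() if tid in ul_y}

def total_utility(ul: UtilityList) -> float:
    return sum(iutil for iutil, _ in ul.values())

def upper_bound(ul: UtilityList) -> float:
    """If this is below the threshold, no extension of the itemset can qualify."""
    return sum(iutil + rutil for iutil, rutil in ul.values())
```

Because exact utilities come straight from the lists, no separate candidate-verification scan is needed; the `upper_bound` check is what allows pruning.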

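A spatial co-location pattern is a subset of a set of spatial features whose instances frequently co-occur within a spatial neighborhood. Existing interestingness measures for spatial co-location pattern mining do not fully consider the differences between features or between different instances of the same feature. In addition, the results of traditional data-driven spatial co-location pattern mining methods often contain a large amount of knowledge that is useless or of no interest to the user. To address these problems, a more general object of study, the spatial instance with a utility value, is proposed, and a new utility participation index (UPI) is defined as the interestingness measure for high utility co-location patterns. Domain knowledge is formalized as three kinds of semantic rules and applied in the mining process, and a domain-driven, multi-iteration mining framework is proposed. Finally, extensive experiments compare the mining results under different interestingness measures in terms of utility ratio and frequency, and examine how the mining results change before and after introducing the semantic rules based on domain knowledge. The experimental results show that the proposed UPI is a more reasonable measure that takes both frequency and utility into account, and that the domain-driven mining method can effectively mine the patterns users are really interested in.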

Techniques for mining rare patterns have been researched in the association rule mining area because traditional frequent pattern mining methods have to generate a large number of unnecessary patterns in order to find rare patterns in large databases. One such technique, the multiple minimum support threshold framework, was devised to extract rare patterns by using a different minimum item support threshold for each item in a database. Nevertheless, this framework cannot sufficiently reflect real-world environments, because its mining process does not consider the weights of items, such as the market prices of products or the fatality rates of diseases. Therefore, an algorithm has been proposed to mine rare patterns with utilities exceeding a user-specified minimum utility by considering the rarity and utility information of items. However, since this algorithm employs the concept of traditional high utility pattern mining, pattern lengths are not considered when determining the utilities of patterns. If a pattern is sufficiently long, it is more likely to have enough utility to become a high utility pattern regardless of the utilities of the items within it. Therefore, the algorithm cannot guarantee that all items in a mined pattern have high utilities. In this paper, we propose a novel algorithm that effectively reduces this dependency of patterns on their lengths by considering lengths in the mining process, in order to mine more meaningful rare patterns than those mined by previous algorithms. Experimental results demonstrate that our algorithm extracts fewer but more meaningful patterns and consumes fewer computational resources than state-of-the-art algorithms.
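
The length dependency described above can be illustrated with a small sketch. Dividing a pattern's utility by its length (an average-utility style measure) is used here only as one plausible way to take length into account; the abstract does not state which length-aware measure the proposed algorithm uses, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

def utility(itemset: Tuple[str, ...], db: List[Dict[str, int]],
            profit: Dict[str, float]) -> float:
    """Plain utility: quantity * profit summed over supporting transactions."""
    return sum(sum(t[i] * profit[i] for i in itemset)
               for t in db if all(i in t for i in itemset))

def length_normalized_utility(itemset: Tuple[str, ...], db: List[Dict[str, int]],
                              profit: Dict[str, float]) -> float:
    """Assumed length-aware measure: plain utility divided by pattern length."""
    return utility(itemset, db, profit) / len(itemset)

# Three identical transactions of four cheap items (hypothetical data).
db = [{"a": 1, "b": 1, "c": 1, "d": 1} for _ in range(3)]
profit = {"a": 2, "b": 2, "c": 2, "d": 2}

long_pattern = ("a", "b", "c", "d")
plain = utility(long_pattern, db, profit)                         # grows with length
normalized = length_normalized_utility(long_pattern, db, profit)  # per-item level
```

In this toy data the four-item pattern accumulates a plain utility of 24 even though each item contributes only 6, which is the effect a length-aware measure is meant to dampen.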

Processing changing data streams in real time is one of the most important issues in the data mining field because of its broad applications, such as retail market analysis, wireless sensor networks, and stock market prediction. Dealing with stream data is also an interesting and challenging problem, since the data are unbounded, continuous, and fast, and the environments that process them have limited resources. High utility pattern mining, meanwhile, is one of the essential research topics in pattern mining; it overcomes a major drawback of the traditional frequent pattern mining framework, which considers only binary databases and identical item importance. This approach conducts mining by reflecting the characteristics of real-world databases, namely non-binary quantities and the relative importance of items. Although algorithms have been proposed for finding high utility patterns in stream environments, they suffer from level-wise candidate generation-and-test and from the large number of candidates produced by their overestimation techniques. As a result, they consume a huge amount of execution time, which is a significant performance issue because stream data analysis requires rapid processing. In this paper, we propose an algorithm for mining high utility patterns in resource-limited environments through efficient processing of data streams, in order to solve the problems of the overestimation-based methods. To improve mining performance with fewer candidates and a smaller search space than previous methods, we develop two techniques for reducing overestimated utilities. Moreover, we suggest a tree-based data structure to maintain information about stream data and high utility patterns. Whenever the current window slides, the proposed tree is restructured by our updating method with decreased overestimation utilities to keep the stream information up to date. Our approach also has an important effect on expert and intelligent systems, in that it can provide users with more meaningful information than traditional analysis methods by reflecting the characteristics of real-world non-binary databases in stream environments and by emphasizing recent data. Comprehensive experimental results show that our algorithm outperforms the existing sliding window-based one in terms of runtime efficiency and scalability.
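
The sliding-window bookkeeping mentioned above can be sketched independently of any tree structure: each batch's contribution to the per-item overestimated utility (TWU) is remembered, so that it can be subtracted again when the batch leaves the window. The class name `SlidingWindowTWU`, the batch-based window and integer utilities are assumptions for illustration; the paper's tree and its two overestimation-reducing techniques are not modelled.

```python
from collections import Counter, deque
from typing import Deque, Dict, List

class SlidingWindowTWU:
    """Per-item TWU over the most recent `window_size` batches of a stream."""

    def __init__(self, window_size: int, profit: Dict[str, int]) -> None:
        self.window_size = window_size
        self.profit = profit
        self.batches: Deque[Counter] = deque()  # one contribution record per batch
        self.twu: Counter = Counter()

    def add_batch(self, batch: List[Dict[str, int]]) -> None:
        contribution: Counter = Counter()
        for transaction in batch:
            tu = sum(q * self.profit[i] for i, q in transaction.items())
            for item in transaction:
                contribution[item] += tu
        self.batches.append(contribution)
        self.twu.update(contribution)  # add the new batch
        if len(self.batches) > self.window_size:
            expired = self.batches.popleft()
            self.twu.subtract(expired)  # remove the oldest batch
            for item in expired:
                if self.twu[item] == 0:
                    del self.twu[item]

    def promising_items(self, min_util: int) -> List[str]:
        """Items whose overestimated utility still reaches the threshold."""
        return sorted(i for i, v in self.twu.items() if v >= min_util)
```

Keeping one small record per batch makes each slide cost proportional to the expired batch rather than to the whole window.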

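In big data environments, an excessive number of candidate itemsets greatly reduces the time and space efficiency of high utility itemset mining algorithms. A data stream high utility itemset mining algorithm that reduces the number of candidate itemsets is proposed. First, a global tree is built with a single scan of the current window of the data stream, and the redundant utility values in the header table entries and nodes of the global tree are reduced. Then, candidate patterns are generated from the global tree, and the candidate itemset utilities of the local trees are reduced based on a growth algorithm. Finally, the high utility patterns are selected from the candidate patterns. Experimental results on real data streams show that the proposed algorithm outperforms other high utility pattern mining algorithms for data streams in both time and space efficiency and memory usage ratio.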

张妮, 韩萌, 王乐, 李小娟, 程浩东 《计算机应用》2022, 42(4): 999-1010
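High utility pattern mining (HUPM) is one of the emerging research topics in data science; it extracts more useful information by considering the unit profit and quantity of items in a transaction database. Traditional HUPM methods assume that the utility values of all items are positive, but in practical applications the utility values of some items may be negative (for example, the profit of a product is negative when it is sold at a loss), and mining patterns that contain negative items is as important as mining patterns that contain only positive items. First, the concepts related to HUPM are explained, and examples of positive and negative utilities are given. Then, HUPM methods are divided from the positive and negative perspectives: pattern mining methods with positive utility are further divided from the novel perspective of dynamic versus static databases, while pattern mining methods with negative utility include key techniques based on Apriori, trees, utility lists and arrays; these methods are discussed and summarized from different aspects. Finally, the shortcomings of existing HUPM methods and directions for future research are given.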
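
A short sketch of why negative utilities need separate treatment: when a transaction contains a loss-making item, the ordinary transaction utility can fall below the utility of one of its itemsets, so it no longer works as an upper bound for pruning. A fix commonly used in methods for negative items is to sum only the positive utilities when computing the bound. The function names and the `positive_only` flag are illustrative assumptions.

```python
from typing import Dict, List, Tuple

def utility(itemset: Tuple[str, ...], db: List[Dict[str, int]],
            profit: Dict[str, float]) -> float:
    """Exact utility of an itemset; may include negative item profits."""
    return sum(sum(t[i] * profit[i] for i in itemset)
               for t in db if all(i in t for i in itemset))

def twu(itemset: Tuple[str, ...], db: List[Dict[str, int]],
        profit: Dict[str, float], positive_only: bool) -> float:
    """Transaction-weighted utilization of an itemset.

    With positive_only=True, each transaction utility sums only items with
    positive profit, which keeps the value an upper bound on the utility of
    the itemset and of its supersets.
    """
    total = 0.0
    for t in db:
        if all(i in t for i in itemset):
            for item, quantity in t.items():
                value = quantity * profit[item]
                if value > 0 or not positive_only:
                    total += value
    return total

# Hypothetical data: item "b" is sold at a loss.
db = [{"a": 2, "b": 1}]
profit = {"a": 5, "b": -6}
```

Here `utility(("a",), db, profit)` is 10, the ordinary TWU is only 4, and the positive-only version is 10, so only the latter is safe to prune with.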

Effective data mining using neural networks   Total citations: 4, self-citations: 0, citations by others: 4
Classification is one of the data mining problems that has recently received great attention in the database community. The paper presents an approach to discovering symbolic classification rules using neural networks. Neural networks have not been thought suitable for data mining because how their classifications are made is not explicitly stated as symbolic rules that humans can verify or interpret. With the proposed approach, concise symbolic rules with high accuracy can be extracted from a neural network. The network is first trained to achieve the required accuracy rate. Redundant connections of the network are then removed by a network pruning algorithm. The activation values of the hidden units in the network are analyzed, and classification rules are generated using the result of this analysis. The effectiveness of the proposed approach is demonstrated by experimental results on a set of standard data mining test problems.

