首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 93 毫秒
1.
分档布鲁姆过滤器的查询算法   总被引:8,自引:0,他引:8  
布鲁姆过滤器是一种能够简洁地表示集合并支持集合查询的数据结构,广泛应用于数据库、网络和分布式系统中.针对现有的布鲁姆过滤器没有考虑查询失效代价这一缺陷,文中提出一种新的代价敏感的分档布鲁姆过滤器查询算法.它将元素根据不同的查询代价分为不同的子集,通过考查每档子集最低查询失效率的关系,建立由每档子集合最低查询失效假阳性概率表示的集合最低查询失效总代价目标函数,使用类目标函数梯度遗传算法获得每档的最优Hash函数个数ki,完成集合到向量的映射与查找.仿真实验结果表明,使用新结构的查询算法和标准布鲁姆过滤器算法相比,所用的查询计算时间基本相同,因为区分对待集合元素,查询失效总代价仅为标准算法的27%.  相似文献   

2.
随着网络的发展,越来越多的场景需要在不完整数据下进行近似成员查询,传统成员查询的布鲁姆过滤器不能满足上述要求。提出面向缺失数据的布鲁姆近似查询算法,先对高维不完整数据的缺失部分进行预填充,通过PCA算法,将高维数据转换到低维数据,使用局部敏感哈希函数与标准哈希函数结合的方式将低维数据存储到布鲁姆过滤器中。使用两个真实数据集验证了所提算法的功能,所提面向缺失数据的布鲁姆近似查询算法,能有效地解决存在缺失数据的近似成员查询问题。  相似文献   

3.
文中探讨计数布鲁姆过滤器的代数运算和集合运算的一致性关系,研究使用计数布鲁姆过滤器代数运算进行集合成员查询的性能.理论分析和实验结果表明,计数布鲁姆过滤器的并、交、补、减、异或运算产生的新过滤器依然保持计数布鲁姆过滤器的特征,支持元素的删除操作,不会出现假阴性,能用于集合并集、交集、补集、差集及对称差的成员查询;当使用两个原始的计数布鲁姆过滤器查询补集、差集及对称差元素时,会存在部分本来属于补集、差集或对称差的元素被判为不属于补集、差集或对称差的问题,而使用计数布鲁姆过滤器代数运算后的过滤器进行补集、差集及对称差成员查询,则不存在上述问题,空间效率能提高一倍,时间效率亦能显著地得到改善.计数布鲁姆过滤器代数运算的使用有利于进一步扩展计数布鲁姆过滤器的应用范围.譬如计数布鲁姆过滤器减运算可用作一种新的集合调和方法,用于分布式系统中大型文件的分发.  相似文献   

4.
探讨双布鲁姆过滤器查询法查询集合并集、交集、补集、差集或对称差成员的性能问题。理论分析和实验结果表明,双布鲁姆过滤器查询法能够较好地支持集合并集、交集、补集、差集及对称差的成员查询问题,其中双布鲁姆过滤器并集及交集查询不会产生假阴性,仅有少量假阳性的存在,而双布鲁姆过滤器补集、差集及对称差查询则除存在少量假阳性外,还存在少量假阴性。  相似文献   

5.
针对文件级单布鲁姆过滤器排重算法只能以文件为单位进行数据排重,数据块级单布鲁姆过滤器排重算法耗时过多的缺点,采用2个布鲁姆过滤器,创建文件级和数据块级2级数据排重的算法结构。实验结果表明,双布鲁姆过滤器排重算法可以以数据块为单位对数据排重,在保持低假阳性误判率的同时,相比数据块级单布鲁姆过滤器排重算法耗时缩短了43%~68%。  相似文献   

6.
布鲁姆过滤器(Bloom filter)对数据集合采用一个位串表示并能有效支持元素的哈希查找,是一种精简的信息表示方案,广泛应用于数据库、网络和分布式系统中.本文研究布鲁姆过滤器的序列分析方法,通过定义布鲁姆过滤器距离,用概率统计方法分析动态数据集合元素增加和删除的变化对布鲁姆过滤器的影响,提出了基于计数式布鲁姆过滤器距离的集合变动定量评估算法.理论分析和仿真实验表明,该评估算法评估准确率高达90%以上.  相似文献   

7.
针对目前重复数据处理技术的低效性和不可靠性,本文提出了一种基于MD5算法和布鲁姆过滤器的重复数据删除算法。新算法采用两级布鲁姆过滤器并有效结合MDS算法的方式,在发挥布鲁姆过滤器空间效率的同时汲取了MD5算法的可靠性,使得文件级别和数据块级别的重复数据删除策略交替工作。测试分析表明,新算法性能稳定并且实现了高效且可靠的重复数据删除功能。  相似文献   

8.
静态取证时效性不足,动态取证则可获得更为真实、实时的证据。动态取证最关键的是证据识别。证据识别本质上是对网络数据流进行分类,以判断其行为是合法还是非法。布鲁姆过滤器采用一个位串表示数据集合并有效支持元素的哈希查询,从而被广泛应用于包分类算法中。首先对标准布鲁姆过滤器算法进行分析,并对算法存在的缺陷进行改进。在此基础上,设计一个基于布鲁姆过滤器的分布式计算机动态取证系统,并采取安全有效措施对证据进行传送。实验表明:该系统能对网络攻击行为进行动态地检测,且检测准确率高、误判率低。  相似文献   

9.
在分布式系统中,覆盖查询对于保持文件的完整性以及数据的一致性有重要作用。虽然布鲁姆过滤器可以支持快速的元素从属查询,但是布鲁姆过滤器只能存储和表示离散的数据集合。为此,用前缀集合表示范围规则,并提出一个前缀编码的转化函数,将每一个前缀码转化为唯一对应的二进制串。为了支持覆盖查询,将计数布鲁姆过滤器与一组链表相结合,设计一个BFrange系统来存储包含规则标识以及具体存储元素的二元组。通过BFrange进行覆盖查询,使查询时间与存储的规则个数无关,复杂度仅为O(1)。仿真实验结果验证了BFrange能实现高效和准确的覆盖查询。  相似文献   

10.
一种基于折半层次搜索的包分类算法   总被引:3,自引:2,他引:1  
潘登  张大方  谢鲲  张继 《计算机应用》2009,29(2):500-502
折半层次搜索(BSOL)算法是一种高效的包分类算法,容易拓展至多维包分类,并支持range类型的规则。但由于其核心结构是在特里树(Trie)的每一层创建hash表,因此当hash装载因子较大或hash冲突较大时,会影响其效率。分析折半层次搜索算法的优缺点,引入布鲁姆过滤器,提出了一种新的改进算法,为Trie树的每一层建立了一个布鲁姆过滤器,在进行hash查找之前先进行一次布鲁姆查询运算,能够在hash冲突较大的情况下依然具有良好的性能。仿真实验结果表明,在数据包的命中率低于90%并且hash装载因子较大的情况下,新算法在运行时间上要优于以前的算法。  相似文献   

11.
One widely used mechanism for representing membership of a set of items is the simple space-efficient randomized data structure known as Bloom filters. Yet, Bloom filters are not entirely suitable for many new network applications that support network services like the representation and querying of items that have multiple attributes as opposed to a single attribute. In this paper, we present an approach to the accurate and efficient representation and querying of multiattribute items using Bloom filters. The approach proposes three variant structures of Bloom filters: Parallel Bloom Filter (referred as PBF) structure, PBF with a hash table (PBF-HT), and PBF with a Bloom filter (PBF-BF). PBF stores multiple attributes of an item in parallel Bloom filters. The auxiliary HT and BF provide functions to capture the inherent dependency of all attributes of an item. Compared to standard Bloom filters to represent items with multiple attributes, the proposed PBF facilitates much faster query service and both PBF-HT and PBF-BF structures achieve much lower false positive probability with a result to save storage space. Simulation and experimental results demonstrate that the new space-efficient Bloom filter structures can efficiently and accurately represent multiattribute items and quickly respond queries at the cost of a relatively small false positive probability.  相似文献   

12.
Synchronization between two sets is an important requirement for many distributed applications. A basic prerequisite is to find out which elements of set A are not in set B and vice versa. A very space efficient data structure for such membership queries that has been used a lot in networking applications is the Bloom filter. Unfortunately, the Bloom filter owes its high efficiency to the fact that there is a chance of false positives when querying the filter. This precludes the adoption of Bloom filters in applications that cannot tolerate such errors. In this paper we present an approach that augments Bloom filters with a trie-based mechanism that deterministically and efficiently finds the false positives after using the Bloom filter to synchronize two sets. We show that the added communication overhead for our approach is negligible compared to the overhead of a plain Bloom filter.  相似文献   

13.
Bloom filter (BF) is a space-efficient data structure that represents a large set of items and supports efficient membership queries. It has been widely proposed to employ Bloom filters in the routing entries so as to facilitate data-centric routing in network applications. The existing designs of Bloom filters, however, cannot effectively support in-network queries. Given a query for a data item at a node in the network, the noise in unrelated routing entries very likely equals to the useful information of the item in the right routing entries. Consequently, the majority of queries are routed towards many wrong nodes besides those destinations, wasting large quantities of network traffic. To address this issue, we classified the existing designs as CUBF (Cumulative Bloom filters) and ABF (Aggregated Bloom filters), and then evaluate their performance in routing queries under the noisy environments. Based on the evaluation results, we propose a receiver-oriented design of Bloom filters to sufficiently restrict the probability of a wrong routing decision. Moreover, we significantly decrease the delay of a routing decision in the case of CUBF by using the bit slice approach, and reduce the transmission size of each BF in the case of ABF by using the compression approach. Both the theoretical analysis and experimental results demonstrate that our receiver-oriented design of Bloom filters apparently outperforms the existing approaches in terms of the success probability of routing and network traffic cost.  相似文献   

14.
A Bloom filter is a simple but powerful data structure that can check membership to a static set. As Bloom filters become more popular for network applications, a membership query for a dynamic set is also required. Some network applications require high-speed processing of packets. For this purpose, Bloom filters should reside in a fast and small memory, SRAM. In this case, due to the limited memory size, stale data in the Bloom filter should be deleted to make space for new data. Namely the Bloom filter needs aging like LRU caching. In this paper, we propose a new aging scheme for Bloom filters. The proposed scheme utilizes the memory space more efficiently than double buffering, the current state of the art. We prove theoretically that the proposed scheme outperforms double buffering. We also perform experiments on real Internet traces to verify the effectiveness of the proposed scheme.  相似文献   

15.
On the false-positive rate of Bloom filters   总被引:1,自引:0,他引:1  
Bloom filters are a randomized data structure for membership queries dating back to 1970. Bloom filters sometimes give erroneous answers to queries, called false positives. Bloom analyzed the probability of such erroneous answers, called the false-positive rate, and Bloom's analysis has appeared in many publications throughout the years. We show that Bloom's analysis is incorrect and give a correct analysis.  相似文献   

16.
Invertible Bloom Lookup Tables (IBLTs) have been recently introduced as an extension of traditional Bloom filters. IBLTs store key-value pairs. Unlike traditional Bloom filters, IBLTs support both a lookup operation (given a key, return a value) and an operation that lists out all the key-value pairs stored. One issue with IBLTs is that there is a probability that a lookup operation will return “not found” for a key. In this paper, a technique to reduce this probability without affecting the storage requirement and only moderately increasing the search time is presented and evaluated. The results show that it can significantly reduce the probability of not returning a value that is actually stored in the IBLT. The overhead of the modified search procedure, compared to the standard IBLT search procedure, is small and has little impact on the average search time.  相似文献   

17.
Where distributed agents must share voluminous set membership information, Bloom filters provide a compact, though lossy, way for them to do so. Numerous recent networking papers have examined the trade-offs between the bandwidth consumed by the transmission of Bloom filters, and the error rate, which takes the form of false positives. This paper is about the retouched Bloom filter (RBF). An RBF is an extension that makes the Bloom filter more flexible by permitting the removal of false positives, at the expense of introducing false negatives, and that allows a controlled trade-off between the two. We analytically show that creating RBFs through a random process decreases the false positive rate in the same proportion as the false negative rate that is generated. We further provide some simple heuristics that decrease the false positive rate more than the corresponding increase in the false negative rate, when creating RBFs. These heuristics are more effective than the ones we have presented in prior work. We further demonstrate the advantages of an RBF over a Bloom filter in a distributed network topology measurement application. We finally discuss several networking applications that could benefit from RBFs instead of standard Bloom filters.  相似文献   

18.
19.
Membership determination of text strings has been an important procedure for analyzing textual data of a tremendous amount, especially when time is a crucial factor. Bloom filter has been a well-known approach for dealing with such a problem because of its succinct structure and simple determination procedure. As determination of membership with classification is becoming increasingly desirable, parallel Bloom filters are often implemented for facilitating the additional classification requirement. The parallel Bloom filters, however, tend to produce additional false-positive errors since membership determination must be performed on each of the parallel layers. We propose a scheme based on CMAC, a neural network mapping, which only requires a single-layer calculation to simultaneously obtain information of both the membership and classification. A hash function specifically designed for text strings is also proposed. The proposed scheme could effectively reduce false-positive errors by converging the range of membership acceptance to the minimum for each class during the neural network mapping. Simulation results show that the proposed scheme committed significantly less errors than the benchmark, parallel Bloom filters, with limited and identical memory usage at different classification levels.  相似文献   

20.
Bloom filters provide space-efficient storage of sets at the cost of a probability of false positives on membership queries. The size of the filter must be defined a priori based on the number of elements to store and the desired false positive probability, being impossible to store extra elements without increasing the false positive probability. This leads typically to a conservative assumption regarding maximum set size, possibly by orders of magnitude, and a consequent space waste. This paper proposes Scalable Bloom Filters, a variant of Bloom filters that can adapt dynamically to the number of elements stored, while assuring a maximum false positive probability.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号