Similar Documents
 20 similar documents found (search time: 15 ms)
1.
Vector similarity join, which finds similar pairs of vector objects, is a computationally expensive process: as the number of vectors grows, the time needed for the join operation grows with the square of that number. Various filtering techniques have been proposed to reduce its computational load, and MapReduce algorithms have been studied to manage large datasets, but recent approaches still suffer in computation time and scalability. In this paper, we propose a MapReduce algorithm, FACET (FAst and sCalable maprEduce similariTy join), to efficiently solve the vector similarity join problem on large datasets. FACET is an all-pairs exact join algorithm composed of two stages. In the first stage, we use novel filtering techniques to eliminate dissimilar pairs and generate non-redundant candidate pairs. The second stage matches candidate pairs against the vector data so that similar pairs are produced as output. Both stages exploit the parallelism offered by MapReduce. The algorithm is currently designed for cosine similarity and the self-join case; extensions to other similarity measures and the R-S join case are also discussed. We provide an I/O analysis of the algorithm and evaluate its performance on multiple real-world datasets. The experimental results show that our algorithm performs, on average, 40% to 800% better than previous state-of-the-art MapReduce algorithms.
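The all-pairs join this abstract describes can be illustrated by a single-machine sketch of the quadratic baseline that FACET's filtering and MapReduce stages are designed to beat; this is not the FACET algorithm itself, and the function names and threshold value are illustrative, not from the paper.

```python
import math
from itertools import combinations

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_self_join(vectors, threshold):
    # Naive all-pairs self join: emit every index pair whose cosine
    # similarity meets the threshold. The cost is quadratic in the
    # number of vectors, which is exactly what candidate filtering
    # plus MapReduce parallelism is meant to avoid.
    return [(i, j)
            for (i, u), (j, v) in combinations(enumerate(vectors), 2)
            if cosine(u, v) >= threshold]
```

For example, `similarity_self_join([[1, 0], [0.9, 0.1], [0, 1]], 0.9)` keeps only the near-parallel pair of the first two vectors.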

2.
F. G., Computer Networks, 2003, 42(6): 717–735
Packet filters provide rules for classifying packets based on header fields. High-speed packet classification has received much study. However, the twin problems of fast updates and fast conflict detection have not received much attention. A conflict occurs when two filters overlap, potentially creating ambiguity for packets that match both. For example, if Rule 1 specifies that all packets going to CNN be rate controlled and Rule 2 specifies that all packets coming from Walmart be given high priority, the rules conflict for traffic from Walmart to CNN. There has been prior work on efficient conflict detection for two-dimensional classifiers. However, the best known algorithm for conflict detection for general classifiers is the naive O(N²) algorithm of comparing each pair of rules for a conflict. In this paper, we describe an efficient and scalable conflict detection algorithm for the general case that is significantly faster. For example, for a database of 20,000 rules, our algorithm is 40 times faster than the naive implementation. Even without considering conflicts, our algorithm also provides a packet classifier with fast updates and fast lookups that can be used for stateful packet filtering.
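The naive O(N²) pairwise comparison the paper takes as its baseline can be sketched as follows. Rules are modeled here as tuples of per-field `(lo, hi)` ranges, and two rules overlap whenever their ranges intersect on every field, so that some packet matches both; all names are illustrative, not from the paper.

```python
from itertools import combinations

def overlaps(r1, r2):
    # Two rules can both match some packet iff their ranges intersect
    # on every header field.
    return all(lo1 <= hi2 and lo2 <= hi1
               for (lo1, hi1), (lo2, hi2) in zip(r1, r2))

def naive_conflicts(rules):
    # The O(N^2) baseline the paper improves on: compare each pair.
    return [(i, j)
            for (i, r1), (j, r2) in combinations(enumerate(rules), 2)
            if overlaps(r1, r2)]

# Each rule = (src range, dst range), mirroring the CNN/Walmart example:
rules = [((0, 255), (50, 50)),   # any source  -> CNN (dst 50)
         ((10, 10), (0, 255))]   # Walmart (src 10) -> any destination
```

Here `naive_conflicts(rules)` reports the single conflicting pair, corresponding to Walmart-to-CNN traffic matching both rules.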

3.
4.
The main contributions of this paper are in designing fast and scalable parallel algorithms for selection and median filtering. Based on the radix-ω representation of data and the prune-and-search approach, we first design a fast and scalable selection algorithm on arrays with reconfigurable optical buses (AROB). To the authors' knowledge, this is the most time-efficient algorithm yet published, especially compared to the algorithms proposed by Han et al. (2002) and Pan (1994). Then, given an N × N image and a W × W window, several scalable median filtering algorithms based on the proposed selection algorithm are developed on the AROB model with various numbers of processors. In terms of the product of time and the number of processors used, most of the proposed algorithms are time or cost optimal.
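The W × W median filtering task the paper parallelizes can be stated in a few lines of sequential code; the paper's contribution is running the inner median-selection step in parallel on the AROB model, which this illustrative sketch does not attempt.

```python
def median_filter(image, w):
    # Sequential W x W median filter over a 2-D list of pixels.
    # For each pixel, collect the window clipped at the borders,
    # then select the median by sorting (the selection step the
    # paper accelerates with prune-and-search on the AROB).
    n, m = len(image), len(image[0])
    r = w // 2
    out = [row[:] for row in image]
    for i in range(n):
        for j in range(m):
            window = [image[x][y]
                      for x in range(max(0, i - r), min(n, i + r + 1))
                      for y in range(max(0, j - r), min(m, j + r + 1))]
            window.sort()
            out[i][j] = window[len(window) // 2]
    return out
```

On a 3 × 3 image with a single bright outlier at the center, a 3 × 3 window replaces the outlier with the surrounding median.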

5.
Clustering plays an important role in mining big data, both as a modeling technique and as a preprocessing step in many data mining pipelines. Fuzzy clustering provides more flexibility than non-fuzzy methods by allowing each data record to belong to more than one cluster to some degree. However, a serious challenge in fuzzy clustering is its lack of scalability. Massive datasets in emerging fields such as geosciences, biology, and networking require parallel and distributed computation to solve real-world problems. Although some clustering methods have been adapted to run on big data platforms, their execution time grows sharply on gigantic datasets. In this paper, a scalable Fuzzy C-Means (FCM) clustering method named BigFCM is proposed and designed for the Hadoop distributed data platform. Based on the MapReduce programming model, the proposed algorithm exploits several mechanisms, including an efficient caching design, to achieve several orders of magnitude reduction in execution time. BigFCM's performance was compared with Apache Mahout K-Means and Fuzzy K-Means through an evaluation framework developed in this research. Extensive evaluation on multi-gigabyte datasets, including SUSY and HIGGS, shows that BigFCM is scalable while preserving clustering quality.
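The iteration that BigFCM distributes over MapReduce is the standard Fuzzy C-Means alternation of membership and centroid updates; a minimal single-machine sketch on 1-D data is shown below, with a deterministic initialization (an assumption for reproducibility, requiring c ≥ 2) rather than anything from the paper.

```python
def fcm(points, c, m=2.0, iters=30):
    # Minimal single-machine Fuzzy C-Means on 1-D data. BigFCM's
    # contribution is distributing these two steps with MapReduce
    # and caching; this sketch only shows the underlying math.
    lo, hi = min(points), max(points)
    # Deterministic init (illustrative): spread c centers over the range.
    centers = [lo + k * (hi - lo) / (c - 1) for k in range(c)]
    for _ in range(iters):
        u = []
        for x in points:
            d = [max(abs(x - ck), 1e-12) for ck in centers]
            # Membership of x in cluster k: inverse distance ratios
            # raised to 2/(m-1), where m is the fuzzifier.
            u.append([1.0 / sum((dk / dj) ** (2 / (m - 1)) for dj in d)
                      for dk in d])
        # Centroid update: mean of the points weighted by u^m.
        centers = [sum(ui[k] ** m * x for ui, x in zip(u, points)) /
                   sum(ui[k] ** m for ui in u) for k in range(c)]
    return centers
```

On two well-separated 1-D clusters the centers converge close to the cluster means.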

6.
This paper reports a diary study of the use of mobile telephones for rendezvousing by young adults (aged 18–30) and mature adults (aged 31–45) in the UK. A number of age differences were found. Specifically, the 31–45s more frequently: (1) attributed problems rendezvousing to the overrunning of previous activities and to the spontaneous performance of additional tasks ('side-stepping'); (2) reported that 'problem' rendezvous resulted in unnecessary sacrifices; and (3) changed plans for the rendezvous. These differences arose because additional family commitments encouraged the 31–45s to pack their daily programme of activities more tightly than the 18–30s. Mobile phones might better target the 31–45s if, for example, they enhanced To Do lists with context-sensitive reminders, in the first instance reminders triggered by location (GSM network cell ID) and by logging off from PCs.

7.
8.
To increase the encoding speed of quality-scalable High Efficiency Video Coding (SHVC), an intra prediction algorithm for quality SHVC is proposed. First, inter-layer correlation is used to predict the likely coding depths and exclude unlikely ones. Second, each candidate depth is coded with the inter-layer reference (ILR) mode, and a goodness-of-fit test is applied to the resulting residual coefficients: if they fit a Laplacian distribution, the intra modes are skipped. Finally, the depth residual coefficients are checked against an early-termination condition; if it is satisfied, depth traversal terminates early to speed up encoding. Experimental results show that the proposed algorithm increases encoding speed by 79% with only a very small loss in coding efficiency.

9.
10.
To address the large increase in encoding complexity caused by inter-layer residual prediction (ILRP) in Scalable Video Coding (SVC), a fast algorithm is proposed. Because inter prediction both with and without residual prediction uses full search, encoding complexity is high. The proposed algorithm analyzes the rate-distortion cost (RDCost) difference between enhancement-layer inter prediction with and without residual, and dynamically decides whether the enhancement layer needs inter-layer residual prediction, reducing the ILRP process. In addition, exploiting the correlation of prediction modes across layers, the optimal inter prediction mode of the base layer is used to guide mode selection in the enhancement layer, further saving encoding time. Experimental results show that, compared with the algorithm in the JSVM reference model, the improved fast algorithm saves about 50% of encoding time on average, with a quality loss below 0.01 dB and a bit-rate increase of no more than 3%, effectively reducing encoding complexity; it offers both theoretical reference value and practical significance for encoder optimization.

11.
A fast intra mode selection algorithm for the SVC enhancement layer is proposed. It first exploits the correlation between the mode distributions of the base and enhancement layers to streamline enhancement-layer intra prediction, and it further optimizes the INTRA_BL mode selection process of SVC. Experimental results show that the algorithm reduces encoding time by 59.6% while keeping bit rate and PSNR values nearly unchanged.

12.
Although scalable video coding can achieve coding efficiency comparable with single-layer video coding, its computational complexity is higher due to the additional inter-layer prediction process. This paper presents a fast adaptive termination algorithm for mode selection that increases computation speed while attempting to maintain coding efficiency. The developed algorithm consists of three main steps, applied not only to the enhancement layer but also to the base layer: a prediction step based on neighboring macroblocks, a first-round check step, and a second-round check or refinement step if the first-round check fails. Comparison results with existing algorithms are provided. The results obtained on various video sequences show that the introduced algorithm reduces computation time by about one third while generating more or less the same video quality.
Jianfeng Ren
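The prediction / first-round check / second-round refinement pattern described in this abstract can be sketched as a tiny mode-selection routine; the function, mode names, and threshold below are purely illustrative, not the paper's actual decision rules.

```python
def select_mode(costs_by_mode, predicted_mode, threshold):
    # Adaptive-termination sketch: evaluate the mode predicted from
    # neighboring macroblocks first (prediction step); if its
    # rate-distortion cost is already below the threshold, terminate
    # early (first-round check). Otherwise fall back to a search over
    # all modes (second-round check / refinement).
    pred_cost = costs_by_mode[predicted_mode]
    if pred_cost <= threshold:
        return predicted_mode, pred_cost
    return min(costs_by_mode.items(), key=lambda kv: kv[1])
```

A loose threshold accepts the predicted mode without further search; a tight one triggers the refinement pass.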

13.
14.
15.
16.
To further improve the coding efficiency of scalable video coding based on a scalable motion-information model, and to lower the minimum bit rate required for motion scalability, the two-dimensional multi-directionality of the motion-scalability model is studied and improved: the motion estimation flow is updated, a consistency principle is proposed for the scalability levels of the two motion-scalable dimensions across directions, and a progressive storage-and-access structure for the two-dimensional data is introduced, realizing two-dimensional multi-directional motion scalability. Experimental results show that the two-dimensional multi-directional motion-information model outperforms non-scalable motion information and effectively improves the efficiency of the scalable video coding system.

17.

This paper proposes a novel, human-vision-system-based spread spectrum method for scalable image watermarking. A scalable decomposition of the watermark is spread over all frequency sub-bands of the wavelet-decomposed image. In each wavelet sub-band, the watermark data are inserted into selected coefficients of the sub-band in such a way that the visual artifacts of watermark embedding occur in the highly textured, highly contrasted, and very dark or bright areas of the image. In the lowest-frequency sub-band of the wavelet transform, coefficients are selected by independent analysis of texture, contrast, and luminance information. In the high-frequency sub-bands, coefficient selection is done by analyzing coefficient amplitude and local entropy. The experimental results show that the watermarked test images are highly transparent and robust against scalable wavelet-based image coding, even at very low bit-rate coding. The proposed approach can guarantee content authentication for scalable coded images, especially over heterogeneous networks where users with different processing capabilities and network access bandwidths consume the same multimedia sources.

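The spread-spectrum embedding the abstract builds on can be sketched as additive chip modulation of selected coefficients. This is a generic textbook sketch, not the paper's scheme: the paper's actual novelty, choosing which wavelet coefficients to use via texture, contrast, luminance, and entropy analysis, is not modeled, and all names are illustrative.

```python
import random

def embed(coeffs, bits, alpha=0.05, seed=7):
    # Additive spread-spectrum embedding: each selected coefficient
    # carries one watermark bit modulated by a pseudo-random +/-1
    # chip, scaled by a perceptual strength alpha.
    rng = random.Random(seed)
    chips = [rng.choice((-1, 1)) for _ in coeffs]
    marked = [c + alpha * (1 if b else -1) * p
              for c, b, p in zip(coeffs, bits, chips)]
    return marked, chips

def detect(marked, original, chips):
    # Non-blind detection: correlate the embedding residual with the
    # known chip sequence; a positive product recovers bit 1.
    return [(m - o) * p > 0 for m, o, p in zip(marked, original, chips)]
```

Embedding and then detecting against the original coefficients recovers the bit pattern exactly in this noiseless sketch.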

18.
19.
《Computer》2002,35(9):93-95
Over the past decade, many observers have claimed that the Internet brings the information revolution's components together in a way that will rival the industrial revolution's effects on human productivity and quality of life. The paper discusses an economically scalable Internet, including bandwidth resources, quality of service, users' quality of experience, scalability, and multicast-enabled distribution.

20.
RDF Site Summaries (RSS) constitute an application of RDF on the Web that has grown considerably in popularity. However, the way RSS systems operate today limits their scalability: current RSS feed aggregators follow a pull-based architecture model, which will not scale with the increasing number of RSS feeds becoming available on the Web. In this paper, we introduce G-ToPSS, a scalable publish/subscribe system for selective information dissemination. G-ToPSS sends only newly updated information to interested users and follows a push-based architecture model. It is particularly well suited to applications that deal with large-volume content distribution from diverse sources, and it allows the use of an ontology to provide additional information about the disseminated data. We have implemented and experimentally evaluated G-ToPSS, and we provide results demonstrating its scalability compared with alternative approaches. In addition, we describe an application of G-ToPSS and RSS to a Web-based content management system that provides an expressive, efficient, and convenient update notification dissemination system.
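The push-based model the abstract contrasts with feed polling can be illustrated with a toy broker that holds predicate subscriptions and delivers each new item only to matching subscribers. G-ToPSS itself matches graph-structured RDF data with ontology support; this sketch uses flat attribute/value predicates purely for illustration, and all names are invented.

```python
class PushMatcher:
    # Toy push-based publish/subscribe broker: subscribers register
    # predicates once, and each published event is delivered only to
    # the subscribers whose predicates it satisfies, instead of every
    # client repeatedly pulling every feed.
    def __init__(self):
        self.subs = {}

    def subscribe(self, sub_id, predicate):
        # predicate: dict mapping attribute -> required value.
        self.subs[sub_id] = predicate

    def publish(self, event):
        # Return the ids of subscribers whose predicate the event meets.
        return [sid for sid, pred in self.subs.items()
                if all(event.get(k) == v for k, v in pred.items())]
```

A news update reaches only the subscriber interested in news, not the sports subscriber.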


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号