Similar articles
1.
We present the design and implementation of a parallel exact inference algorithm on the Cell Broadband Engine (Cell BE) processor, a heterogeneous multicore architecture. Exact inference is a key problem in exploring probabilistic graphical models, where the computational complexity increases dramatically with the network structure and clique size. In this paper, we exploit parallelism in exact inference at multiple levels. We propose a rerooting method to minimize the critical path for exact inference, and an efficient scheduler to dynamically allocate synergistic processing elements (SPEs). In addition, we explore potential table representation and layout to optimize DMA transfers between the local store and main memory. We implemented the proposed method and conducted experiments on the Cell BE processor in the IBM QS20 Blade, achieving a speedup of up to 10× on the Cell compared to state-of-the-art processors. The methodology proposed in this paper can be used for online scheduling of directed acyclic graph (DAG) structured computations.
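To make the rerooting idea concrete, here is a minimal Python sketch under our own simplifying assumptions: the junction tree is an undirected adjacency dict, each clique carries a hypothetical cost (`weight`) for updating its potential table, and the critical path for a given root is taken to be the heaviest root-to-leaf path. The root is chosen by trying every clique. This illustrates the objective only; it is not the authors' algorithm or cost model.

```python
from typing import Dict, List


def critical_path(tree: Dict[str, List[str]], weight: Dict[str, float], root: str) -> float:
    """Heaviest root-to-leaf path when the junction tree is rooted at `root`."""
    best = 0.0
    stack = [(root, None, weight[root])]  # (node, parent, accumulated cost)
    while stack:
        node, parent, acc = stack.pop()
        best = max(best, acc)
        for nb in tree[node]:
            if nb != parent:
                stack.append((nb, node, acc + weight[nb]))
    return best


def reroot(tree: Dict[str, List[str]], weight: Dict[str, float]) -> str:
    """Pick the clique whose use as root minimizes the critical path (O(n^2))."""
    return min(tree, key=lambda r: critical_path(tree, weight, r))


if __name__ == "__main__":
    # A chain of five unit-cost cliques: the middle clique is the best root.
    chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
    cost = {k: 1.0 for k in chain}
    root = reroot(chain, cost)
    print(root, critical_path(chain, cost, root))  # c 3.0
```

A linear-time version would compute all rooted path lengths with two tree traversals; the quadratic search keeps the sketch short.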

2.
As CMOS feature sizes continue to shrink and traditional microarchitectural methods for delivering high performance (e.g., deep pipelining) become too expensive and power-hungry, chip multiprocessors (CMPs) have become an exciting new direction by which system designers can deliver increased performance. Exploiting parallelism in such designs is the key to high performance, and we find that parallelism must be exploited at multiple levels of the system: the thread-level parallelism that has become popular in many designs fails to exploit all the levels of parallelism available in many workloads for CMP systems. We describe the Cell Broadband Engine and the multiple levels at which its architecture exploits parallelism: data-level, instruction-level, thread-level, memory-level, and compute-transfer parallelism. By taking advantage of opportunities at all levels of the system, this CMP revolutionizes parallel architectures to deliver previously unattained levels of single-chip performance. We describe how the heterogeneous cores achieve this performance by parallelizing and offloading computation-intensive application code onto the Synergistic Processor Element (SPE) cores using a heterogeneous thread model. We also give an example of scheduling SPE code to tolerate memory latency using software pipelining techniques. This paper is based in part on “Chip multiprocessing and the Cell Broadband Engine”, ACM Computing Frontiers 2006.
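The latency-tolerance idea, overlapping the transfer of the next data block with computation on the current one (double buffering), can be sketched in Python. Here a single worker thread stands in for the SPE's DMA engine, and `fetch`/`compute` are hypothetical placeholders. On real SPE code the transfers would be MFC DMA commands (for example `mfc_get`) into two local-store buffers, which this sketch does not model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
D = TypeVar("D")
R = TypeVar("R")


def double_buffered(chunks: Sequence[T],
                    fetch: Callable[[T], D],
                    compute: Callable[[D], R]) -> List[R]:
    """Process chunks in order while the next chunk is being fetched."""
    results: List[R] = []
    if not chunks:
        return results
    with ThreadPoolExecutor(max_workers=1) as dma:   # stand-in for the DMA engine
        pending = dma.submit(fetch, chunks[0])       # prime the first buffer
        for i in range(len(chunks)):
            data = pending.result()                  # wait for buffer i
            if i + 1 < len(chunks):
                pending = dma.submit(fetch, chunks[i + 1])  # start the next transfer
            results.append(compute(data))            # runs while that transfer proceeds
    return results


if __name__ == "__main__":
    print(double_buffered([1, 2, 3], lambda c: c * 10, lambda d: d + 1))  # [11, 21, 31]
```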

3.
In this paper, the author makes a comprehensive comparison of different parallelizations of a sequential number-theoretic algorithm with large memory requirements. Brunotte’s algorithm is one of the best currently known methods for deciding the canonical number system (or, more generally, shift radix system) property. Still, it can be very space-consuming in some cases. Pushing the algorithm to its limits may shed light on mathematical patterns that would otherwise not be discernible. The algorithm contains many n-dimensional vector operations and set operations such as insert, find, and clear. The parallel algorithms encounter two different kinds of concurrency problems: first, they need computationally intensive arithmetic vector operations; second, the set implementations require a huge amount of memory and general-purpose processors. The algorithms described in this article are designed for two platforms: a generic symmetric multiprocessing (SMP) architecture without any vector processor extension, and the Cell Broadband Engine. The SMP platforms have several general-purpose processors, whereas the Cell Broadband Engine has synergistic vector processors.
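For orientation, the shift radix system map is usually defined as τ_r(a) = (a_2, ..., a_d, -⌊r·a⌋) for an integer vector a, and r has the SRS property when every orbit reaches the zero vector. The Python sketch below is a brute-force bounded search, not Brunotte’s algorithm: it can only refute the property by finding an orbit that never reaches zero, but it shows the vector arithmetic and the set insert/find/clear operations the abstract mentions. The function names are our own; exact `Fraction` coefficients are assumed so that the floor is computed without rounding error.

```python
import math
from fractions import Fraction
from itertools import product
from typing import Optional, Sequence, Tuple


def tau(r: Sequence[Fraction], a: Tuple[int, ...]) -> Tuple[int, ...]:
    """One SRS step: shift the vector left and append -floor(r . a)."""
    dot = sum(ri * ai for ri, ai in zip(r, a))
    return a[1:] + (-math.floor(dot),)


def find_counterexample(r: Sequence[Fraction], bound: int,
                        max_steps: int = 10_000) -> Optional[Tuple[int, ...]]:
    """Return a start vector in [-bound, bound]^d whose orbit was not observed
    to reach zero, or None if every orbit in the box reached zero."""
    d = len(r)
    zero = (0,) * d
    seen = set()
    for start in product(range(-bound, bound + 1), repeat=d):
        seen.clear()                                   # set "clear"
        a, steps = start, 0
        while a != zero and a not in seen and steps < max_steps:  # set "find"
            seen.add(a)                                # set "insert"
            a = tau(r, a)
            steps += 1
        if a != zero:
            return start   # nonzero cycle found, or step limit reached
    return None


if __name__ == "__main__":
    print(find_counterexample((Fraction(1, 2),), 10))   # None: no counterexample found
    print(find_counterexample((Fraction(-1, 2),), 10))  # a start vector stuck in a cycle
```

A `None` result only means no counterexample exists in the searched box; deciding the property in general is what Brunotte’s algorithm does.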

4.
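This paper describes the composition of an integrated broadband network management system. The system supports real-time billing for all types of access and applications, and it implements a new operating model in which revenue is shared among network operators, service operators and content providers. The paper focuses on the design of the system’s real-time billing engine.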
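The abstract gives neither the tariffs nor the revenue-sharing rules, so the following Python sketch is purely illustrative: it rates one usage record and splits the charge among the three parties named above. All tariffs, share ratios and field names are hypothetical. `Decimal` is used so that money amounts round predictably, and any rounding remainder goes to the network operator so the shares always add up to the charge.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Tuple

# Hypothetical tariffs (per MB) and revenue shares; not taken from the paper.
TARIFF_PER_MB = {"video": Decimal("0.10"), "web": Decimal("0.02")}
SHARES = {
    "network_operator": Decimal("0.5"),
    "service_operator": Decimal("0.3"),
    "content_provider": Decimal("0.2"),
}
CENT = Decimal("0.01")


@dataclass
class UsageRecord:
    user: str
    service: str
    bytes_used: int


def rate_and_split(rec: UsageRecord) -> Tuple[Decimal, Dict[str, Decimal]]:
    """Rate one usage record and split the charge among the three parties."""
    mb = Decimal(rec.bytes_used) / Decimal(1_000_000)
    charge = (mb * TARIFF_PER_MB[rec.service]).quantize(CENT)
    split = {party: (charge * share).quantize(CENT)
             for party, share in SHARES.items() if party != "network_operator"}
    # The network operator absorbs any rounding remainder.
    split["network_operator"] = charge - sum(split.values())
    return charge, split


if __name__ == "__main__":
    print(rate_and_split(UsageRecord("u1", "video", 10_000_000)))
```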

5.
We report the results of the bottom-up implementation of one MILC lattice quantum chromodynamics (QCD) application on the Cell Broadband Engine™ processor. In our implementation, we preserve MILC’s framework for scaling the application to run on a large number of compute nodes and accelerate computationally intensive kernels on the Cell’s synergistic processor elements. Speedups of 3.4 × for the 8 × 8 × 16 × 16 lattice and 5.7 × for the 16 × 16 × 16 × 16 lattice are obtained when comparing our implementation of the MILC application executed on a 3.2 GHz Cell processor to the standard MILC code executed on a quad-core 2.33 GHz Intel Xeon processor. We provide an empirical model to predict application performance for a given lattice size. We also show that performance of the compute-intensive part of the application on the Cell processor is limited by the bandwidth between main memory and the Cell’s synergistic processor elements, whereas performance of the application’s parallel execution framework is limited by the bandwidth between main memory and the Cell’s power processor element.
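The abstract attributes the kernel performance to main-memory bandwidth. A first-order, bandwidth-bound estimate of that kind is simply the bytes moved divided by the sustained bandwidth. The Python sketch below is our own illustration, not the authors’ empirical model; `bytes_per_site` and the bandwidth are hypothetical inputs.

```python
import math
from typing import Sequence


def lattice_sites(dims: Sequence[int]) -> int:
    """Number of sites in a 4-D lattice such as 8x8x16x16."""
    return math.prod(dims)


def bandwidth_bound_seconds(dims: Sequence[int], bytes_per_site: float,
                            bandwidth_bytes_per_s: float) -> float:
    """Lower bound on kernel time if each site's data crosses the memory bus once."""
    return lattice_sites(dims) * bytes_per_site / bandwidth_bytes_per_s


if __name__ == "__main__":
    print(lattice_sites((8, 8, 16, 16)), lattice_sites((16, 16, 16, 16)))
```

Under such a model the 16 × 16 × 16 × 16 lattice has four times as many sites as the 8 × 8 × 16 × 16 lattice, so a purely bandwidth-bound kernel would take about four times as long.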

6.
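To enable license plate recognition on smartphones, this paper studies the application of OpenCV to license plate recognition. It first introduces OpenCV and the license plate recognition workflow, and then discusses how OpenCV is applied to plate localization, character segmentation and character recognition. Experimental results show good recognition performance, laying a foundation for further development of vehicle information management systems on Android and other smartphones.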
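Character segmentation on a binarized plate is commonly done with a vertical projection profile: columns that contain no foreground pixels separate one character from the next. The abstract does not say which segmentation method the authors used, so the pure-Python sketch below (no OpenCV dependency, image given as rows of 0/1 values, `min_width` a hypothetical noise filter) only illustrates the projection technique.

```python
from typing import List, Sequence, Tuple


def segment_columns(binary: Sequence[Sequence[int]],
                    min_width: int = 1) -> List[Tuple[int, int]]:
    """Return [start, end) column ranges of characters in a binary plate image."""
    width = len(binary[0])
    # Vertical projection: number of foreground pixels in each column.
    profile = [sum(row[x] for row in binary) for x in range(width)]
    segments: List[Tuple[int, int]] = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                          # a character begins
        elif count == 0 and start is not None:
            if x - start >= min_width:         # drop specks narrower than min_width
                segments.append((start, x))
            start = None
    if start is not None and width - start >= min_width:
        segments.append((start, width))        # character touching the right edge
    return segments


if __name__ == "__main__":
    print(segment_columns([[1, 1, 0, 1, 0], [1, 0, 0, 1, 1]]))  # [(0, 2), (3, 5)]
```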

7.
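This paper presents a method for image threshold segmentation using the open-source vision library OpenCV in the Visual Studio 2005 environment. First, bicubic interpolation is used to estimate the illumination distribution of the image and remove non-uniform illumination; then Otsu’s method is used to segment the image by thresholding. Experiments show that the method produces good segmentation results on non-uniformly illuminated images that traditional threshold segmentation methods handle poorly.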
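Otsu’s method picks the threshold t that maximizes the between-class variance ω0·ω1·(μ0 - μ1)² of the gray-level histogram. The pure-Python sketch below computes it for 8-bit pixels. In OpenCV the same step is available as `cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`. The bicubic illumination correction that precedes it in the paper is not shown.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return t maximizing the between-class variance; class 0 is pixels <= t."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0       # number of pixels at or below t
    sum0 = 0     # sum of their gray levels
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(pixels: Sequence[int], t: int) -> List[int]:
    """Same convention as THRESH_BINARY: above t becomes 255, otherwise 0."""
    return [255 if p > t else 0 for p in pixels]


if __name__ == "__main__":
    img = [10, 10, 10, 200, 200, 200]
    t = otsu_threshold(img)
    print(t, binarize(img, t))
```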

8.
SCAN is a special purpose context-free language which describes and generates a wide range of array accessing algorithms from a short set of simple ones. These algorithms may represent scan techniques for image processing, but at the same time they stand as generic data accessing strategies. In this paper we present two schemes (one sequential and one parallel) which implement the SCAN language and compare their memory requirements and execution time.
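SCAN builds complex access orders by combining simple ones. The language’s actual grammar is not given in the abstract, so the Python sketch below only illustrates the underlying idea with generators: two primitive scans and one combinator (transposition) that derives column-major order from row-major order. All names are hypothetical and are not SCAN syntax.

```python
from typing import Callable, Iterator, Tuple

Coord = Tuple[int, int]
Scan = Callable[[int, int], Iterator[Coord]]


def raster(rows: int, cols: int) -> Iterator[Coord]:
    """Row-major order, left to right on every row."""
    for r in range(rows):
        for c in range(cols):
            yield (r, c)


def snake(rows: int, cols: int) -> Iterator[Coord]:
    """Boustrophedon order: the direction alternates on each row."""
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        for c in cs:
            yield (r, c)


def transposed(scan: Scan) -> Scan:
    """Derive a new scan by running `scan` on the transposed array."""
    def wrapped(rows: int, cols: int) -> Iterator[Coord]:
        for r, c in scan(cols, rows):
            yield (c, r)
    return wrapped


column_major = transposed(raster)

if __name__ == "__main__":
    print(list(snake(2, 3)))
    print(list(column_major(2, 3)))
```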

9.
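Visited-network access is an effective way to relieve the bandwidth bottleneck of inter-provincial roaming. This paper discusses the key technologies for converting roaming users from home-network access to visited-network access: target users are identified through a combined evaluation of specific subscription information, which reduces the latency of data access while roaming and improves the user experience.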

10.
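This paper proposes an object detection module that combines Java and OpenCV, and describes in detail the steps and key processes for invoking OpenCV’s object detection methods through JNI. The module can easily be integrated into Java video systems in research, industry and other fields. Experimental results show that Java video systems integrating this module achieve high detection rates and processing speeds.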

11.
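A machine vision system is a highly automated system that integrates mechanical, optoelectronic, control, computer and digital image processing technologies; it can effectively improve product quality and output and is now widely used in industry. As technology develops, the demands on welding quality and efficiency keep rising, so an effective real-time weld seam detection and processing system is needed that can detect and process the weld seam online during automatic seam tracking. In the VC++ 6.0 development environment, this paper designs an OpenCV-based real-time weld seam detection and processing algorithm that uses Otsu’s method for automatic threshold selection and the Canny operator for edge detection. Experimental results show that the system based on this algorithm has good real-time performance and detects the seam accurately and quickly.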
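Canny edge detection starts from an image gradient, usually computed with Sobel kernels, and then applies non-maximum suppression and hysteresis thresholding. As a minimal pure-Python illustration of the first stage only, the sketch below computes the Sobel gradient magnitude with the |Gx| + |Gy| approximation. It is not the paper’s code; in OpenCV the full detector is `cv2.Canny`.

```python
from typing import List, Sequence


def sobel_magnitude(img: Sequence[Sequence[int]]) -> List[List[int]]:
    """Sobel gradient magnitude (|Gx| + |Gy|); border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column, center row weighted 2.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row, center column weighted 2.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


if __name__ == "__main__":
    step = [[0, 0, 9, 9] for _ in range(3)]  # a vertical step edge
    print(sobel_magnitude(step))
```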

12.
Efficient high-performance implementation of JPEG-LS encoder   Total citations: 1 (self-citations: 0, by others: 1)
A new design approach for an efficient, high-performance JPEG-LS encoder is proposed in this paper. The proposed implementation compresses image data using the lossless mode of JPEG-LS. When valuable image content must be acquired in real time, lossless compression is essential; it is important in critical applications such as the acquisition of medical images and the transmission of high-definition, high-resolution images from space (satellites). The contribution of the paper is an efficient pipelined JPEG-LS encoder that requires significantly less encoding time than any other available JPEG-LS hardware or software implementation. The experimental results show that encoding is performed at high speed, as expected, so the encoder can serve real-time applications. This is the first time that a JPEG-LS implementation has offered such high-speed encoding.
Athanasios P. Kakarountas
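In its lossless mode, JPEG-LS (based on LOCO-I) predicts each pixel from its left (a), upper (b) and upper-left (c) neighbors with the median edge detector (MED) and then encodes the prediction residual; context modeling, bias correction and Golomb-Rice coding follow and are omitted here. The Python sketch below shows only the predictor and residual; it is not the paper’s pipelined hardware design.

```python
def med_predict(a: int, b: int, c: int) -> int:
    """Median edge detector used by JPEG-LS / LOCO-I.

    a = left neighbor, b = upper neighbor, c = upper-left neighbor.
    """
    if c >= max(a, b):
        return min(a, b)      # edge detected: take the smaller neighbor
    if c <= min(a, b):
        return max(a, b)      # edge detected: take the larger neighbor
    return a + b - c          # smooth region: planar prediction


def residual(x: int, a: int, b: int, c: int) -> int:
    """Prediction error passed on to the entropy coder (before modulo reduction)."""
    return x - med_predict(a, b, c)


if __name__ == "__main__":
    print(med_predict(10, 20, 25), med_predict(10, 20, 5), med_predict(10, 20, 15))
```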

13.
Liu Jie, Feng Guiyu, Zhang Hanling. Computer Simulation (《计算机仿真》), 2006, 23(11): 305-307, 344
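This paper discusses OpenCV, a powerful class library for image processing and computer vision programming. The library is very convenient to use, and its digital image processing and computer vision functions make related problems simple to solve. The paper first introduces OpenCV’s capabilities and the significance of studying it, then describes some features of the new version of OpenCV and discusses software configuration in the VC++ environment, and finally gives typical image processing examples. As computer vision and digital image processing spread into more and more fields, OpenCV greatly simplifies digital image processing in VC++ and has broad application prospects; this paper should be a useful reference for application design, research and development in image processing and computer vision.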

14.
Optimized design of a wideband filter and its MATLAB simulation   Total citations: 1 (self-citations: 0, by others: 1)
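This paper presents an optimized design method for wideband filters. Constant-k and m-derived sections from the image parameter method are combined to design the corresponding low-pass and high-pass filters, which are cascaded to obtain a preliminary wideband band-pass filter. MATLAB is then used for simulation and tuning: the filter’s response is compared with the design requirements and the LC parameters are adjusted repeatedly to obtain the best filtering performance. Band-pass filters designed this way have wide bandwidth, low noise and a near-ideal frequency response, and the design is simple, easy to calculate and easy to implement. The method is particularly suitable for wideband filters whose calculations are inherently complex. The wideband LC filter designed in this paper performed well in practical use.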
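For reference, the classical image-parameter design equations behind constant-k and m-derived sections fit in a short Python helper. It assumes a nominal impedance R0 and cutoff frequency fc and returns full-section values (a T-section splits the series element into two equal halves: L/2 each, or 2C each for the high-pass capacitors). These are textbook formulas, not the paper’s component values, and m = 0.6 is only a common default.

```python
import math
from typing import Tuple


def constant_k_lowpass(r0: float, fc: float) -> Tuple[float, float]:
    """Total series L and shunt C, from R0 = sqrt(L/C) and fc = 1/(pi*sqrt(L*C))."""
    return r0 / (math.pi * fc), 1.0 / (math.pi * r0 * fc)


def constant_k_highpass(r0: float, fc: float) -> Tuple[float, float]:
    """Shunt L and total series C, from R0 = sqrt(L/C) and fc = 1/(4*pi*sqrt(L*C))."""
    return r0 / (4 * math.pi * fc), 1.0 / (4 * math.pi * r0 * fc)


def m_derived_lowpass(r0: float, fc: float,
                      m: float = 0.6) -> Tuple[float, float, float, float]:
    """m-derived low-pass T-section built from the constant-k prototype.

    Returns (total series L, shunt C, shunt-branch L, frequency of infinite attenuation).
    """
    L, C = constant_k_lowpass(r0, fc)
    series_l = m * L                        # split as m*L/2 in each series arm
    shunt_c = m * C                         # in series with shunt_l
    shunt_l = (1 - m * m) / (4 * m) * L
    f_inf = fc / math.sqrt(1 - m * m)       # the shunt branch resonates here
    return series_l, shunt_c, shunt_l, f_inf


if __name__ == "__main__":
    print(constant_k_lowpass(50.0, 1e6))
    print(m_derived_lowpass(50.0, 1e6))
```

Cascading a low-pass section with cutoff at the upper band edge and a high-pass section with cutoff at the lower band edge gives the preliminary band-pass filter that the paper then tunes in MATLAB.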

15.
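In the Visual C++ 6.0 environment, this paper presents an algorithm that combines background subtraction with temporal (frame) differencing to detect moving targets in video sequences, and gives the concrete procedure and partial code using OpenCV. The algorithm uses temporal differencing to obtain the contours of moving targets in the current frame and leaves the region inside those contours unchanged when updating the background model, which avoids background-model errors caused by the moving targets. When a moving object is detected, an audible alarm is raised automatically.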
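The selective background update can be sketched in a few lines of Python. Frames are flat lists of gray values, and the threshold and learning rate `alpha` are hypothetical parameters. As a simplification, the sketch skips the update pixel by pixel wherever the frame difference flags motion, instead of filling the interior of each moving-target contour as the paper does.

```python
from typing import List, Sequence, Tuple


def detect_and_update(background: Sequence[float], prev: Sequence[float],
                      curr: Sequence[float], thresh: float = 25.0,
                      alpha: float = 0.05) -> Tuple[List[bool], List[float], bool]:
    """One step of combined frame differencing and background subtraction.

    Returns (foreground mask, updated background, alarm flag).
    """
    # Temporal difference: pixels that changed since the previous frame.
    moving = [abs(c - p) > thresh for c, p in zip(curr, prev)]
    # Background subtraction: pixels that differ from the background model.
    foreground = [abs(c - bg) > thresh for c, bg in zip(curr, background)]
    # Selective update: pixels flagged as moving keep their old background value.
    new_background = [bg if m else (1 - alpha) * bg + alpha * c
                      for bg, c, m in zip(background, curr, moving)]
    return foreground, new_background, any(foreground)


if __name__ == "__main__":
    fg, bg, alarm = detect_and_update([10.0] * 3, [10.0] * 3, [10.0, 100.0, 10.0])
    print(fg, alarm)  # [False, True, False] True
```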

16.
As parallel machines become more widely available, many existing algorithms are being converted to take advantage of the improved speed that such computers offer. However, the way an algorithm is distributed is crucial to obtaining the speedups required for many real-time tasks. This paper presents three parallel implementations of the Douglas-Peucker line simplification algorithm on a Sequent Symmetry computer and compares the performance of each with the original sequential algorithm.
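For reference, the sequential Douglas-Peucker algorithm keeps the point farthest from the chord between the endpoints if its distance exceeds a tolerance, then recurses on the two halves. The Python sketch below is a standard formulation. The abstract does not describe the three parallel schemes, but the two recursive calls are independent, which is one natural place to introduce parallelism.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the line through a and b (or to a if a == b)."""
    (x0, y0), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x0 - x1, y0 - y1)
    return abs(dy * x0 - dx * y0 + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points: Sequence[Point], epsilon: float) -> List[Point]:
    """Simplify a polyline so no removed point deviates more than epsilon."""
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    # The two halves are independent and could be simplified in parallel.
    left = douglas_peucker(points[: index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


if __name__ == "__main__":
    print(douglas_peucker([(0, 0), (1, 0.05), (2, -0.05), (3, 2), (4, 2.02), (5, 2)], 0.1))
```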

17.
This paper describes the FPGA implementation of FastCrypto, which extends a general-purpose processor with a crypto coprocessor for encrypting and decrypting data. It also studies the trade-offs between FastCrypto performance and design parameters, including the number of stages per round, the number of parallel Advanced Encryption Standard (AES) pipelines, and the size of the queues. In addition, it shows the effect of memory latency on FastCrypto performance. FastCrypto is implemented in VHDL on a Xilinx Virtex-5 FPGA. A throughput of 222 Gb/s at 444 MHz can be achieved with four parallel AES pipelines. To reduce power consumption, the clock frequency of the four parallel AES pipelines is reduced to 100 MHz while the other components run at 400 MHz. In this case, our results show a FastCrypto performance of 61.725 bits per clock cycle (b/cc) when a 128-bit single-port L2 cache memory is used. Increasing the memory bus width to 256 bits, or using a 128-bit dual-port memory, improves the performance to 112.5 b/cc (45 Gb/s at 400 MHz), which is 88% of the ideal performance (128 b/cc).
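The figures in the last sentence follow from throughput = (bits per clock cycle) × (clock frequency) and efficiency = achieved b/cc ÷ ideal b/cc. A small Python check reproduces the 45 Gb/s and 88% values quoted for the 256-bit or dual-port configuration; the helper names are our own.

```python
def throughput_gbps(bits_per_cycle: float, clock_hz: float) -> float:
    """Throughput in Gb/s from bits per clock cycle and clock frequency."""
    return bits_per_cycle * clock_hz / 1e9


def efficiency(bits_per_cycle: float, ideal_bits_per_cycle: float = 128.0) -> float:
    """Fraction of the ideal rate of one 128-bit AES block per cycle."""
    return bits_per_cycle / ideal_bits_per_cycle


print(throughput_gbps(112.5, 400e6))    # 45.0
print(round(100 * efficiency(112.5)))   # 88
```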

18.
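To meet the video playback requirements of a DMD, this paper proposes an OpenCV-based method for developing a video playback program. The user interface is designed with the object-oriented development platform VC++ 6.0. Ready-made OpenCV functions are called to read and preprocess AVI video files and to convert each frame into the specific grayscale image format that DMD playback requires. The converted images are sent over USB to the DMD player, and the video is played back synchronously in the user interface.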
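The abstract does not specify the grayscale format the DMD player expects. As an illustration only, the Python sketch below converts RGB pixels to 8-bit gray with the ITU-R BT.601 luma weights (the weights OpenCV’s `cvtColor` uses for RGB-to-gray conversion) and then splits the gray image into binary bit planes, one common way DMDs display gray levels. Both the target format and the bit-plane step are our assumptions.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def to_gray(pixels: Sequence[Pixel]) -> List[int]:
    """BT.601 luma, 0.299 R + 0.587 G + 0.114 B, rounded with integer arithmetic."""
    return [(299 * r + 587 * g + 114 * b + 500) // 1000 for r, g, b in pixels]


def bit_planes(gray: Sequence[int], bits: int = 8) -> List[List[int]]:
    """Plane k holds bit k of every pixel (plane 0 is the least significant)."""
    return [[(v >> k) & 1 for v in gray] for k in range(bits)]


if __name__ == "__main__":
    g = to_gray([(255, 255, 255), (255, 0, 0), (0, 0, 0)])
    print(g, bit_planes(g, bits=3))
```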

19.
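Based on the requirements of a print quality inspection system, this paper analyzes the characteristics and functions of OpenCV, an open-source computer vision library for digital image processing, and assesses the feasibility of using it in such a system. It concludes that OpenCV can be used to develop print quality inspection systems.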

20.
Electron tomography (ET) combines electron microscopy and the principles of tomographic imaging in order to reconstruct the three-dimensional structure of complex biological specimens at molecular resolution. Weighted back-projection (WBP) has long been the method of choice because its reconstructions are very fast. It is well known that iterative methods produce better images, but at a substantial cost in computation time. In this work, it is shown that efficient parallel implementations of iterative methods, based primarily on data decomposition, can speed up such methods to the extent that they become viable alternatives to WBP. Precomputation of the coefficient matrix has also proved important for substantially improving performance, regardless of the number of processors used. Matrix precomputation has made it possible to speed up the block-iterative component averaging (BICAV) algorithm, which has been studied before in the context of computerized tomography (CT) and ET, by a factor of more than 3.7. Component-averaged row projections (CARP) is a recently introduced block-parallel algorithm that was shown to be a robust method for solving sparse systems arising from partial differential equations. It is shown that this algorithm is also suitable for single-axis ET and outperforms BICAV in both runtime and image quality. The experiments were carried out on several ET datasets of various sizes, using the blob model to represent the reconstructed object.
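Component averaging (CAV) updates all unknowns from all equations at once, scaling each equation’s contribution by a sparsity-weighted row norm, the sum over j of s_j·a_ij², where s_j counts the nonzeros in column j. The block-iterative variant (BICAV) applies this update to one block of rows at a time, recounting s_j within the block. Below is a dense, pure-Python sketch on a toy system under these assumptions; it is our own illustration, not the paper’s parallel implementation, and it omits matrix precomputation and the CARP scheme.

```python
from typing import List, Optional, Sequence


def bicav(A: Sequence[Sequence[float]], b: Sequence[float],
          blocks: Sequence[Sequence[int]], iterations: int = 100,
          relax: float = 1.0, x0: Optional[Sequence[float]] = None) -> List[float]:
    """Block-iterative component averaging for A x = b (dense toy version)."""
    n = len(A[0])
    x = list(x0) if x0 is not None else [0.0] * n
    for _ in range(iterations):
        for block in blocks:
            # s[j] = number of nonzeros in column j within this block
            s = [sum(1 for i in block if A[i][j] != 0) for j in range(n)]
            step = [0.0] * n
            for i in block:
                row = A[i]
                denom = sum(s[j] * row[j] ** 2 for j in range(n))
                if denom == 0:
                    continue
                r = b[i] - sum(row[j] * x[j] for j in range(n))
                for j in range(n):
                    step[j] += relax * r * row[j] / denom
            # All rows of the block contribute simultaneously (the parallel part).
            x = [xj + dj for xj, dj in zip(x, step)]
    return x


if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 3.0]]
    b = [3.0, 5.0]                          # exact solution: x = (0.8, 1.4)
    print(bicav(A, b, [[0, 1]], 200))       # one block: plain CAV
    print(bicav(A, b, [[0], [1]], 200))     # single-row blocks
```

With one block containing every row the method reduces to CAV; with single-row blocks, s_j becomes 1 wherever a_ij is nonzero and the update becomes a relaxed row-action (Kaczmarz-style) projection.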

