Found 20 similar articles (search time: 15 ms)
1.
We present the design and implementation of a parallel exact inference algorithm on the Cell Broadband Engine (Cell BE) processor, a heterogeneous multicore architecture. Exact inference is a key problem in exploring probabilistic graphical models, where the computational complexity increases dramatically with the network structure and clique size. In this paper, we exploit parallelism in exact inference at multiple levels. We propose a rerooting method to minimize the critical path for exact inference, and an efficient scheduler to dynamically allocate SPEs. In addition, we explore potential table representation and layout to optimize DMA transfer between local store and main memory. We implemented the proposed method and conducted experiments on the Cell BE processor in the IBM QS20 Blade. We achieved a speedup of up to 10× on the Cell, compared to state-of-the-art processors. The methodology proposed in this paper can be used for online scheduling of directed acyclic graph (DAG) structured computations.
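The rerooting idea in this abstract can be illustrated with a minimal sketch. It assumes the junction tree is an undirected tree with one cost per clique and that the critical path is the heaviest root-to-leaf path; the function names and the brute-force search over all candidate roots are illustrative assumptions, not the algorithm from the paper.

```python
from collections import defaultdict


def critical_path(adj, weight, root):
    """Heaviest root-to-leaf path (sum of node weights) for a given root."""
    best = 0
    stack = [(root, None, weight[root])]
    while stack:
        node, parent, cost = stack.pop()
        children = [n for n in adj[node] if n != parent]
        if not children:
            # Reached a leaf of the rooted tree: record the path cost.
            best = max(best, cost)
        for child in children:
            stack.append((child, node, cost + weight[child]))
    return best


def best_root(edges, weight):
    """Try every node as root and keep the one with the shortest critical path."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return min(sorted(weight), key=lambda r: critical_path(adj, weight, r))


# Hypothetical example: a chain of five cliques with unit cost.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
weight = {name: 1 for name in "abcde"}
print(best_root(edges, weight))
```

Trying every root costs quadratic time in the number of cliques, which is acceptable for a sketch but would not be for large trees.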
2.
The Cell Broadband Engine: Exploiting Multiple Levels of Parallelism in a Chip Multiprocessor (cited by: 1; self-citations: 0; citations by others: 1)
Michael Gschwind 《International journal of parallel programming》2007,35(3):233-262
As CMOS feature sizes continue to shrink and traditional microarchitectural methods for delivering high performance (e.g.,
deep pipelining) become too expensive and power-hungry, chip multiprocessors (CMPs) become an exciting new direction by which
system designers can deliver increased performance. Exploiting parallelism in such designs is the key to high performance,
and we find that parallelism must be exploited at multiple levels of the system: the thread-level parallelism that has become
popular in many designs fails to exploit all the levels of available parallelism in many workloads for CMP systems. We describe
the Cell Broadband Engine and the multiple levels at which its architecture exploits parallelism: data-level, instruction-level,
thread-level, memory-level, and compute-transfer parallelism. By taking advantage of opportunities at all levels of the system,
this CMP revolutionizes parallel architectures to deliver previously unattained levels of single chip performance. We describe
how the heterogeneous cores make it possible to achieve this performance by parallelizing and offloading computation-intensive application
code onto the Synergistic Processor Element (SPE) cores using a heterogeneous thread model with SPEs. We also give an example
of scheduling code to be memory latency tolerant using software pipelining techniques in the SPE.
This paper is based in part on “Chip multiprocessing and the Cell Broadband Engine”, ACM Computing Frontiers 2006.
3.
Antal Tátrai 《Journal of Parallel and Distributed Computing》2011,71(4):565-572
In this paper the author makes a comprehensive comparison of different parallelizations of a sequential number-theoretic algorithm with large memory requirements. Brunotte’s algorithm is one of the best currently known methods for deciding the canonical number system (or, more generally, shift radix system) property. Still, it can be very space-consuming in some cases. Pushing the algorithm to its limits may shed light on mathematical patterns that would otherwise not be discernible. The algorithm contains many n-dimensional vector operations and set operations such as insert, find, and clear. The parallel algorithms encounter two different kinds of concurrency problems: first, they need computationally intensive arithmetic vector operations; second, the set implementations require a huge amount of memory and general-purpose processors. The algorithms described in this article are designed for two platforms. The first is a generic symmetric multiprocessing (SMP) architecture without any vector processor extension; the second is the Cell Broadband Engine. The SMP platforms have several general-purpose processors, in contrast with the Cell Broadband Engine, which has synergistic vector processors.
4.
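This paper introduces the composition of an integrated broadband network management system. The system supports real-time billing for various access methods and applications, and implements a new operating model with revenue sharing among network operators, service operators, and content providers. The paper focuses on the design of the system's real-time billing engine.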
5.
Guochun Shi Volodymyr Kindratenko Steven Gottlieb 《International journal of parallel programming》2009,37(5):488-507
We report the results of the bottom-up implementation of one MILC lattice quantum chromodynamics (QCD) application on the
Cell Broadband Engine™ processor. In our implementation, we preserve MILC’s framework for scaling the application to run on
a large number of compute nodes and accelerate computationally intensive kernels on the Cell’s synergistic processor elements.
Speedups of 3.4 × for the 8 × 8 × 16 × 16 lattice and 5.7 × for the 16 × 16 × 16 × 16 lattice are obtained when comparing
our implementation of the MILC application executed on a 3.2 GHz Cell processor to the standard MILC code executed on a quad-core
2.33 GHz Intel Xeon processor. We provide an empirical model to predict application performance for a given lattice size.
We also show that performance of the compute-intensive part of the application on the Cell processor is limited by the bandwidth
between main memory and the Cell’s synergistic processor elements, whereas performance of the application’s parallel execution
framework is limited by the bandwidth between main memory and the Cell’s power processor element.
6.
7.
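This paper presents a method for image threshold segmentation using the open-source vision library OpenCV in the Visual Studio 2005 environment. Bicubic interpolation is first used to estimate the illumination distribution of the image and remove uneven illumination; the Otsu method is then used to threshold the image. Experiments show good segmentation results on unevenly illuminated images that traditional thresholding methods handle poorly.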
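The Otsu step named in this abstract can be sketched in a few lines of plain Python. This is a minimal illustration that assumes 8-bit grayscale pixels supplied as a flat list; it does not reproduce the paper's bicubic illumination correction or its OpenCV code, and the function name is hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the bright class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count of the dark class
    sum0 = 0  # intensity sum of the dark class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # dark class still empty
        if w1 == 0:
            break     # bright class empty from here on
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# Hypothetical bimodal image: half dark pixels, half bright pixels.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
binary = [p > t for p in pixels]
```

A single global threshold like this is exactly what fails under uneven illumination, which is why the abstract applies an illumination correction first.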
8.
9.
Nikolaos G. Bourbakis Christos Alexopoulos Allen Klinger 《Computer Languages, Systems and Structures》1989,14(4):239-254
SCAN is a special-purpose context-free language which describes and generates a wide range of array accessing algorithms from a short set of simple ones. These algorithms may represent scan techniques for image processing, but at the same time they stand as generic data accessing strategies. In this paper we present two schemes (one sequential and one parallel) which implement the SCAN language and compare their memory requirements and execution time.
10.
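Visited-location access is an effective way to resolve the bandwidth bottleneck of inter-provincial roaming. This paper discusses the key techniques for migrating from home-location access to visited-location access: target users are identified effectively through a combined evaluation of specific subscription information, which reduces roaming internet access latency and improves user experience.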
11.
12.
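A machine vision system is a highly automated system that integrates mechanical, optoelectronic, control, computer, and digital image processing technologies. It can effectively improve product quality and output and is now widely used in industry. As technology advances, the demands on welding quality and efficiency keep rising, so there is a need for an effective real-time weld seam detection and processing system that can detect and process the weld seam online as part of automatic seam tracking. This paper designs an OpenCV-based real-time weld seam detection and processing algorithm in the VC++ 6.0 development environment, using Otsu automatic threshold selection and the Canny operator for edge detection. Experimental results show that a system based on this algorithm runs in real time, performs well, and detects seams quickly and correctly.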
13.
This paper describes the design concepts behind implementations of mixed‐precision linear algebra routines targeted for the Cell processor. It describes in detail the implementation of code to solve a linear system of equations using Gaussian elimination in single precision with iterative refinement of the solution to full double‐precision accuracy. By utilizing this approach the algorithm achieves close to an order of magnitude higher performance on the Cell processor than the performance offered by the standard double‐precision algorithm. The code is effectively an implementation of the high‐performance LINPACK benchmark, as it meets all of the requirements concerning the problem being solved and the numerical properties of the solution. Copyright © 2007 John Wiley & Sons, Ltd.
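The mixed-precision scheme described in this abstract can be sketched as follows. The sketch assumes single precision can be emulated by rounding every intermediate result through `struct`, and it repeats the elimination for each correction instead of reusing one factorization as a tuned implementation would; all function names are hypothetical and none of this is the paper's Cell code.

```python
import struct


def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def solve_single(A, b):
    """Gaussian elimination with partial pivoting, rounding each result to float32."""
    n = len(A)
    M = [[f32(v) for v in row] + [f32(b[i])] for i, row in enumerate(A)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            factor = f32(M[r][k] / M[k][k])
            for c in range(k, n + 1):
                M[r][c] = f32(M[r][c] - f32(factor * M[k][c]))
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n]
        for j in range(i + 1, n):
            s = f32(s - f32(M[i][j] * x[j]))
        x[i] = f32(s / M[i][i])
    return x


def refine(A, b, iterations=5):
    """Low-precision solve followed by residual corrections in double precision."""
    n = len(b)
    x = solve_single(A, b)
    for _ in range(iterations):
        # The residual is computed in double precision (plain Python floats).
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # The correction is solved cheaply in (emulated) single precision.
        d = solve_single(A, r)
        x = [xi + di for xi, di in zip(x, d)]
    return x


# Hypothetical 2x2 system whose exact solution is (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = refine(A, b)
```

The expensive work stays in low precision, while the cheap residual computation in double precision is what recovers the extra digits.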
14.
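This design covers the design and implementation of an OpenCV-based underwater robot for pipeline inspection. The robot surveys underwater pipelines with its OpenCV camera; it is mainly used to track the route of underwater or subsea pipelines, and it can also inspect the outer wall for damage and clear attached debris. The camera images the pipeline to be inspected and feeds the images to an STM32 controller, which identifies the pipeline's tracking route; the motors in the drive module then provide smooth tracking and turning. Finally, the robot's tracking and image processing functions process the captured images of the pipeline's outer contour to highlight problems on the outer wall and attached debris. The design combines the robot's drive mode with its image processing functions, so that the outer wall is inspected while the pipeline is being tracked.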
15.
Efficient high-performance implementation of JPEG-LS encoder (cited by: 1; self-citations: 0; citations by others: 1)
Markos E. Papadonikolakis Athanasios P. Kakarountas Costas E. Goutis 《Journal of Real-Time Image Processing》2008,3(4):303-310
A new design approach to create an efficient high-performance JPEG-LS encoder is proposed in this paper. The proposed implementation
compresses the image data with the lossless mode of JPEG-LS. When the acquisition of precious content (image) is specified
to occur in real-time, then lossless compression is essential. Lossless compression is important to critical applications,
such as the acquisition of medical images and transmission of high-definition high-resolution images from space (satellite).
The contribution of the paper is to introduce an efficient pipelined JPEG-LS encoder, which requires significantly lower encoding
time than any other available JPEG-LS hardware or software implementation. The experimental results show that encoding is
performed at high speed, as expected, and can serve real-time applications. This is the first time that a JPEG-LS implementation
has offered such high-speed encoding.
16.
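This paper discusses OpenCV, a powerful library for image processing and computer vision programming. The library is very convenient to use, and its digital image processing and computer vision functions make related problems simple to handle. The paper first introduces OpenCV's capabilities and the significance of studying it, then describes some features of the new OpenCV version and discusses software configuration in the VC++ environment, and finally gives typical image processing examples. As computer vision and digital image processing spread into more fields, OpenCV greatly simplifies digital image processing in VC++ and has broad application prospects; the paper is intended as a useful reference for application design and research and development in image processing and computer vision.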
17.
Optimized Design of a Wideband Filter and Its MATLAB Simulation (cited by: 1; self-citations: 0; citations by others: 1)
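This paper presents an optimized design method for wideband filters. Using the image parameter method, constant-k and m-derived sections are combined to design a low-pass and a high-pass filter, which are cascaded to obtain a preliminary wideband band-pass filter. The design is then simulated and tuned in MATLAB: the filter response is compared against the design requirements and the LC parameters are adjusted repeatedly to obtain the best filtering performance. Band-pass filters designed this way have wide bandwidth, low noise, and a near-ideal frequency response, and the method is simple to design, easy to calculate, and easy to implement. It is particularly suited to wideband filters whose design is otherwise computationally complex. The wideband LC filter designed in this paper performed well in practical use.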
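As a rough illustration of the starting point of such a design, the sketch below computes element values for constant-k low-pass and high-pass sections from the standard image-parameter formulas. The impedance and cutoff values are hypothetical, the function names are assumptions, and the m-derived sections and MATLAB tuning described in the abstract are not covered.

```python
import math


def constant_k_lowpass(r0, fc):
    """Series inductance L and shunt capacitance C of a constant-k low-pass section.

    r0 is the nominal image impedance in ohms, fc the cutoff frequency in hertz.
    """
    inductance = r0 / (math.pi * fc)
    capacitance = 1.0 / (math.pi * fc * r0)
    return inductance, capacitance


def constant_k_highpass(r0, fc):
    """Shunt inductance L and series capacitance C of a constant-k high-pass section."""
    inductance = r0 / (4.0 * math.pi * fc)
    capacitance = 1.0 / (4.0 * math.pi * fc * r0)
    return inductance, capacitance


# Hypothetical band-pass target: 50 ohm system, passband from 1 MHz to 30 MHz.
# The high-pass section sets the lower edge and the low-pass section the upper edge.
hp_l, hp_c = constant_k_highpass(50.0, 1.0e6)
lp_l, lp_c = constant_k_lowpass(50.0, 30.0e6)
print(f"high-pass: L = {hp_l:.3e} H, C = {hp_c:.3e} F")
print(f"low-pass:  L = {lp_l:.3e} H, C = {lp_c:.3e} F")
```

In both sections the ratio L/C equals the square of the nominal impedance, which is the defining property of a constant-k design.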
18.
Multicore accelerators are used today to supplement traditional superscalar processors in massively parallel computer nodes with extra floating‐point computation power. This paper presents our parallelization, performance enhancement, and evaluation of the conjugate gradient (CG) linear equation solver with enhanced matrix multiplication on the Cell Broadband Engine accelerator. The paper also compares the CG performance results on the Cell with two CG implementations on a computer with two quad-core Xeon processors, one with OpenMP and the other with OpenMPI. We also report the enhancements made to the CG code, in particular to matrix multiplication and synchronization, and a performance analysis of CG for heptadiagonal matrices on single and dual Cell Broadband Engine packages with 8 and 16 synergistic processing elements and on Xeon. We also report the communication and computation time breakdowns and the floating point operations per second ratio. Our parallel CG solver is shown to scale well with data size, grid dimensionality, and number of cores. Copyright © 2011 John Wiley & Sons, Ltd.
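For readers unfamiliar with the solver named in this abstract, here is a minimal serial sketch of the conjugate gradient method. It assumes a symmetric positive definite matrix supplied only through a matrix-vector product, and it uses a tridiagonal test matrix rather than the heptadiagonal matrices of the paper; all function names are hypothetical and none of the Cell-specific parallelization is reproduced.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)  # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        Ap = matvec(p)  # the matrix-vector product dominates the cost
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def laplacian_1d(v):
    """Tridiagonal (-1, 2, -1) matrix applied to v without storing the matrix."""
    n = len(v)
    return [
        2.0 * v[i]
        - (v[i - 1] if i > 0 else 0.0)
        - (v[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


# Right-hand side chosen so that the exact solution is a vector of ones.
b = laplacian_1d([1.0] * 5)
x = conjugate_gradient(laplacian_1d, b)
```

Each iteration needs one matrix-vector product and two inner products, which is consistent with the abstract's focus on matrix multiplication and synchronization.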
19.
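Working in the Visual C++ 6.0 environment, this paper presents an algorithm that identifies moving targets in video sequences by combining background differencing with temporal (inter-frame) differencing, and gives the detailed procedure and part of the code using OpenCV. The algorithm uses temporal differencing to obtain the contour of the moving target in the current frame and, when updating the background model, leaves the region inside that contour unchanged, which avoids background update errors caused by the moving target. When a moving object is detected, a warning sound is issued automatically.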
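The selective background update described in this abstract can be sketched as follows. The sketch assumes grayscale frames stored as nested lists, a running-average background model, and a per-pixel motion mask in place of a filled contour region; the thresholds, the blending factor, and the function names are all hypothetical, and this is not the paper's OpenCV code.

```python
def update_background(background, prev_frame, frame, alpha=0.1, motion_thresh=20):
    """Blend the new frame into the background, skipping pixels that are moving.

    A pixel counts as moving when it changed by more than motion_thresh between
    the previous frame and the current one (temporal differencing).
    """
    new_background, motion_mask = [], []
    for bg_row, prev_row, cur_row in zip(background, prev_frame, frame):
        moving = [abs(c - p) > motion_thresh for c, p in zip(cur_row, prev_row)]
        new_background.append([
            bg if is_moving else (1 - alpha) * bg + alpha * cur
            for bg, cur, is_moving in zip(bg_row, cur_row, moving)
        ])
        motion_mask.append(moving)
    return new_background, motion_mask


def foreground(background, frame, thresh=30):
    """Background differencing: flag pixels that differ strongly from the model."""
    return [
        [abs(cur - bg) > thresh for cur, bg in zip(cur_row, bg_row)]
        for cur_row, bg_row in zip(frame, background)
    ]


# Hypothetical 1x3 frames: a bright object appears at the middle pixel.
background = [[10, 10, 10]]
prev_frame = [[10, 10, 10]]
frame = [[10, 200, 10]]
background, motion = update_background(background, prev_frame, frame)
alarm = any(any(row) for row in foreground(background, frame))
```

Because the moving pixel is excluded from the update, the bright object does not bleed into the background model, and the `alarm` flag stands in for the warning sound mentioned in the abstract.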
20.
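LMDS: A New-Generation Broadband Wireless Access Technology for Broadband Metropolitan Area Networks (cited by: 1; self-citations: 0; citations by others: 1)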
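1. Introduction. With the rapid development of the international telecommunications market, core network construction in most countries has reached an initial scale and can largely meet current communication needs. The most prominent bottleneck is the access network, that is, the part connecting users to the core network. This problem is key to the evolution of communications toward broadband, intelligent, and personalized services. With the rapid rise of the network economy and the fast growth of the Internet, demand for data services and for multimedia communication involving voice, data, and images keeps increasing. Traditional copper wires and cables can no longer meet transmission requirements, and the cost of replacing them is also …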