19 similar documents found (search time: 203 ms)
1.
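Template matching is a basic and effective method for filtering, edge detection, object recognition, and image matching. However, it is computationally intensive and slow on a uniprocessor, while a parallel array computer raises hardware and software costs accordingly. Fortunately, Intel processors provide the MMX/SSE/SSE2 instruction sets, which support instruction-level SIMD operations. The main computational part of template matching can therefore be parallelized with SIMD, achieving parallel processing on a single processor; this was implemented on the Linux platform. Test results show that SIMD greatly speeds up template matching.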
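The abstract does not say which similarity measure or data layout is used. A minimal sketch, assuming 8-bit grayscale images and the sum of absolute differences (SAD) as the measure, shows how the inner loop maps onto SSE2: `_mm_sad_epu8` compares 16 pixels per instruction. The names `row_sad` and `match_template` are hypothetical, and the early exit once a partial sum exceeds the best score is an added optimization, not something the abstract describes.

```c
#include <emmintrin.h> /* SSE2: _mm_sad_epu8, _mm_loadu_si128, ... */
#include <stdint.h>
#include <stddef.h>

/* SAD between m pixels of an image row and a template row, 16 pixels at a time. */
static uint32_t row_sad(const uint8_t *img, const uint8_t *tpl, size_t m)
{
    __m128i acc = _mm_setzero_si128();
    size_t i = 0;
    for (; i + 16 <= m; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(img + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(tpl + i));
        /* _mm_sad_epu8 leaves one partial sum in each 64-bit lane */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(a, b));
    }
    uint32_t sum = (uint32_t)_mm_cvtsi128_si32(acc)
                 + (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
    for (; i < m; ++i) /* scalar tail for widths that are not multiples of 16 */
        sum += (uint32_t)(img[i] > tpl[i] ? img[i] - tpl[i] : tpl[i] - img[i]);
    return sum;
}

/* Exhaustive search over a w x h image for the m x n template (both row-major);
   writes the top-left corner of the lowest-SAD window to (*bx, *by). */
static void match_template(const uint8_t *img, int w, int h,
                           const uint8_t *tpl, int m, int n, int *bx, int *by)
{
    uint32_t best = UINT32_MAX;
    *bx = *by = 0;
    for (int y = 0; y + n <= h; ++y)
        for (int x = 0; x + m <= w; ++x) {
            uint32_t s = 0;
            /* stop summing rows once this window is already no better */
            for (int r = 0; r < n && s < best; ++r)
                s += row_sad(img + (size_t)(y + r) * w + x, tpl + (size_t)r * m, (size_t)m);
            if (s < best) { best = s; *bx = x; *by = y; }
        }
}
```

Cutting a template out of an image and searching for it should return the position it was cut from.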
2.
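Accelerating Streamline Computation on 3D Curvilinear Grids with Streaming SIMD Extensions  Total citations: 4, self-citations: 0, citations by others: 4
Streamlines are a basic flow-field visualization technique, and computing them takes a great deal of time. Intel processors (Pentium III, Pentium 4) provide the Streaming SIMD Extensions (SSE), which support instruction-level SIMD operations. Streamline computation on 3D curvilinear grids consists of main subprocesses such as velocity interpolation, numerical integration, and point location, and has high inherent SIMD parallelism. By organizing the data according to SSE data types and parallelizing the main subprocesses with SIMD, an SSE algorithm for streamline computation was designed. The SSE algorithm was implemented in two SSE coding styles, a vector class library and inline assembly, and the code was optimized for the processor architecture. Test results show that SSE greatly accelerates streamline computation on 3D curvilinear grids: the vector-class-library version improves performance by about 55% over conventional computation, and the inline-assembly version by about 75%.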
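One common way to "organize the data according to SSE data types" is a structure-of-arrays layout in which each 128-bit register holds one coordinate of four streamlines. The sketch below assumes that layout and advances four streamlines by one midpoint (RK2) step. The velocity field `(-y, x, 0.1)` is a hypothetical analytic stand-in for the trilinear interpolation on a curvilinear grid, and point location is omitted entirely; the paper's actual integrator is not stated in the abstract.

```c
#include <xmmintrin.h> /* SSE: __m128, _mm_add_ps, _mm_mul_ps, ... */

/* Structure of arrays: lane k of x, y, z is the position of streamline k. */
typedef struct { __m128 x, y, z; } Vec4x3;

/* Hypothetical velocity field v = (-y, x, 0.1), standing in for grid interpolation. */
static Vec4x3 velocity(Vec4x3 p)
{
    Vec4x3 v;
    v.x = _mm_sub_ps(_mm_setzero_ps(), p.y);
    v.y = p.x;
    v.z = _mm_set1_ps(0.1f);
    return v;
}

/* One midpoint (RK2) step of size h for four streamlines at once. */
static Vec4x3 rk2_step(Vec4x3 p, float h)
{
    const __m128 half_h = _mm_set1_ps(0.5f * h), full_h = _mm_set1_ps(h);
    Vec4x3 k1 = velocity(p), mid, out;
    mid.x = _mm_add_ps(p.x, _mm_mul_ps(half_h, k1.x));
    mid.y = _mm_add_ps(p.y, _mm_mul_ps(half_h, k1.y));
    mid.z = _mm_add_ps(p.z, _mm_mul_ps(half_h, k1.z));
    Vec4x3 k2 = velocity(mid); /* slope at the midpoint */
    out.x = _mm_add_ps(p.x, _mm_mul_ps(full_h, k2.x));
    out.y = _mm_add_ps(p.y, _mm_mul_ps(full_h, k2.y));
    out.z = _mm_add_ps(p.z, _mm_mul_ps(full_h, k2.z));
    return out;
}
```

Every arithmetic instruction here does the work of four scalar operations, which is where the inherent SIMD parallelism of the integration step comes from.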
3.
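The algebraic reconstruction technique (ART) is an effective method for reconstruction from incomplete projection data and has become the key enabling technology for nondestructive testing, especially of large parts that exceed the detector's field of view; however, previous algorithms are computationally heavy and slow. To perform algebraic reconstruction quickly, a fast parallel algorithm based on the single instruction multiple data (SIMD) technology of Intel processors [2] is proposed. Based on a thorough analysis of the characteristics of the algebraic reconstruction formula, a data structure and computation flow suited to parallelization were designed; they load several packed data items at once and use MMX (multimedia extension), SSE (streaming SIMD extension), and SSE2 instructions to compute in SIMD fashion. Simulation experiments show that, at the same accuracy, the algorithm not only increases reconstruction speed (a speedup of 4), removing the speed bottleneck of conventional algebraic reconstruction, but also reconstructs projection images with partially missing data well. The algorithm has significant theoretical and engineering value for nondestructive testing of large aerospace components.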
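The abstract does not give its update formula, so this sketch assumes the standard ART (Kaczmarz) row update, x ← x + λ(pᵢ − aᵢ·x)/(aᵢ·aᵢ)·aᵢ, where aᵢ is one row of the projection matrix, pᵢ the measured ray sum, and λ a relaxation factor. Both the dot products and the update vectorize naturally over four floats with SSE. The function names are hypothetical, and the paper's own data structures and MMX/SSE2 integer paths are not reproduced.

```c
#include <xmmintrin.h> /* SSE single-precision intrinsics */
#include <stddef.h>

/* Dot product of two float arrays, four lanes at a time plus a scalar tail. */
static float dot_sse(const float *a, const float *b, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    size_t j = 0;
    for (; j + 4 <= n; j += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + j), _mm_loadu_ps(b + j)));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; j < n; ++j) s += a[j] * b[j];
    return s;
}

/* One ART row update: x += lambda * (p - a.x) / (a.a) * a. */
static void art_row_update(float *x, const float *a, float p, float lambda, size_t n)
{
    float norm2 = dot_sse(a, a, n);
    if (norm2 == 0.0f) return; /* ray crosses no pixel: nothing to correct */
    float c = lambda * (p - dot_sse(a, x, n)) / norm2;
    __m128 cv = _mm_set1_ps(c);
    size_t j = 0;
    for (; j + 4 <= n; j += 4)
        _mm_storeu_ps(x + j, _mm_add_ps(_mm_loadu_ps(x + j),
                                        _mm_mul_ps(cv, _mm_loadu_ps(a + j))));
    for (; j < n; ++j) x[j] += c * a[j];
}
```

With λ = 1, a single update makes the current estimate satisfy the chosen ray equation exactly; a full reconstruction sweeps all rays repeatedly.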
4.
5.
6.
7.
8.
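I. The Pentium III and Its SSE Instructions. This year Intel launched its new-generation processor, the Pentium III. Also known as Katmai, it uses the same 100 MHz bus, the same 512 KB half-speed L2 cache, and the same Slot 1 package as the latest Pentium II, which gives the impression that the Pentium III merely raises the clock frequency to 500 MHz (50 MHz above the fastest Pentium II). In ordinary applications, moreover, the Pentium III does not seem much faster than the Pentium II, a few percent at most. Does the Pentium III, then, have no clear advantage over the Pentium II? In fact it does: its advantage lies in its new feature, Intel SSE (Streaming SIMD Extensions, originally called KNI, Katmai New Instructions). The instruction-set and microarchitecture extensions that SSE provides make software that supports SSE run more efficiently, especially software that relies heavily on floating-point arithmetic. In my view, Intel SSE is essentially a floating-point MMX, an enhanced single-instruction, multiple-data instruction set. SSE can greatly increase the Pentium III's floating-point efficiency, and floating-point arithmetic plays a key role in 3D geometry and other high-end multimedia processing.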
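The "floating-point MMX" description can be made concrete with a small sketch: SSE registers hold four single-precision floats, and one instruction operates on all four. The function `saxpy_sse` below (a hypothetical name) computes y = a·x + y using the SSE intrinsics declared in `<xmmintrin.h>`; it illustrates the instruction style only and is not taken from the article.

```c
#include <xmmintrin.h> /* SSE: packed single-precision intrinsics */
#include <stddef.h>

/* y[i] = a * x[i] + y[i], four floats per SSE instruction; scalar tail for n % 4. */
static void saxpy_sse(float a, const float *x, float *y, size_t n)
{
    __m128 av = _mm_set1_ps(a); /* broadcast a into all four lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 xv = _mm_loadu_ps(x + i); /* unaligned 128-bit loads */
        __m128 yv = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(av, xv), yv));
    }
    for (; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```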
9.
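Parallel Algorithms for Image Template Matching on SIMD Computers with K-ary 2-cube Networks  Total citations: 5, self-citations: 0, citations by others: 5
Template matching is a basic and effective method for filtering, edge detection, object recognition, and image matching. For an N×N image and an M×N (M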
10.
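A Fast Background Extraction Method Based on Pentium SIMD Instructions  Total citations: 3, self-citations: 0, citations by others: 3
This paper proposes a fast background extraction method based on Intel Pentium SIMD instructions. In an improved Gaussian mixture background model, computing the Jeffrey value and updating the background model have high inherent SIMD parallelism; by organizing the data according to SSE data types, a SIMD algorithm for the Gaussian mixture background model was implemented. Experimental results show that the method with embedded Pentium SIMD instructions improves performance by about 75% over conventional computation, speeds up background extraction, meets real-time processing requirements, and has considerable practical value.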
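The abstract does not define the "Jeffrey value" or say how the improved model uses it. A minimal sketch, assuming it is the Jeffreys divergence J = KL(p‖q) + KL(q‖p) between two 1-D Gaussians p = N(m₁, v₁) and q = N(m₂, v₂) (v = variance), which simplifies to J = ((v₁ − v₂)² + (m₁ − m₂)²(v₁ + v₂)) / (2v₁v₂). Because the expression has no logarithm, four pixel/component pairs can be evaluated with a handful of SSE instructions. The function name `jeffreys4` is hypothetical.

```c
#include <xmmintrin.h> /* SSE single-precision intrinsics */

/* Jeffreys divergence between N(m1, v1) and N(m2, v2), four pairs per call:
   J = ((v1 - v2)^2 + (m1 - m2)^2 * (v1 + v2)) / (2 * v1 * v2). */
static __m128 jeffreys4(__m128 m1, __m128 v1, __m128 m2, __m128 v2)
{
    __m128 dv = _mm_sub_ps(v1, v2);
    __m128 dm = _mm_sub_ps(m1, m2);
    __m128 num = _mm_add_ps(_mm_mul_ps(dv, dv),
                            _mm_mul_ps(_mm_mul_ps(dm, dm), _mm_add_ps(v1, v2)));
    __m128 den = _mm_mul_ps(_mm_set1_ps(2.0f), _mm_mul_ps(v1, v2));
    return _mm_div_ps(num, den);
}
```

For example, two unit-variance Gaussians whose means differ by 2 give KL = 2 in each direction, so J = 4.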
11.
Prasanna K.V.K. Krishnan V. 《IEEE transactions on pattern analysis and machine intelligence》1989,11(6):665-669
Efficient parallel algorithms developed on hypercube SIMD (single-instruction multiple data-stream) machines for image template matching are presented. Most of these parallel algorithms are asymptotically optimal in their time complexities. These results improve the known bounds in the literature.
12.
Allen R. Cinque L. Tanimoto S. Shapiro L. Yasuda D. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(5):490-501
Search of discrete spaces is important in combinatorial optimization. Such problems arise in artificial intelligence, computer vision, operations research, and other areas. For realistic problems, the search spaces to be processed are usually huge, necessitating long computation times, pruning heuristics, or massively parallel processing. We present an algorithm that reduces the computation time for graph matching by employing both branch-and-bound pruning of the search tree and massively parallel search of the as-yet-unpruned portions of the space. Most research on parallel search has assumed that a multiple-instruction-stream/multiple-data-stream (MIMD) parallel computer is available. Since massively parallel single-instruction-stream/multiple-data-stream (SIMD) computers are much less expensive than MIMD systems with equal numbers of processors, the question arises as to whether SIMD systems can efficiently handle state-space search problems. We demonstrate that the answer is yes, and in particular, that graph matching has a natural and efficient implementation on SIMD machines.
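The abstract combines branch-and-bound pruning with SIMD search of the unpruned subtrees. A minimal serial sketch of the branch-and-bound half only, under assumptions not stated in the abstract: both graphs are undirected with at most `MAXV` vertices, model vertices are assigned to distinct data vertices in order, and the (hypothetical) cost is the number of vertex pairs whose adjacency disagrees under the mapping. A partial assignment whose cost already reaches the best complete cost is pruned. Distributing the surviving subtrees across SIMD processing elements is not shown.

```c
#include <limits.h>
#include <stdbool.h>

#define MAXV 8 /* hypothetical size limit for this sketch */

typedef struct { int n; bool adj[MAXV][MAXV]; } Graph;

typedef struct {
    const Graph *model, *data;
    int map[MAXV];   /* map[u] = data vertex assigned to model vertex u */
    bool used[MAXV]; /* data vertices already taken */
    int best_cost, best_map[MAXV];
} Search;

/* Mismatches added by mapping model vertex u to data vertex d,
   counted against the already-assigned model vertices 0..u-1. */
static int incremental_cost(const Search *s, int u, int d)
{
    int c = 0;
    for (int w = 0; w < u; ++w)
        c += s->model->adj[u][w] != s->data->adj[d][s->map[w]];
    return c;
}

static void branch(Search *s, int u, int cost)
{
    if (cost >= s->best_cost) return; /* bound: this subtree cannot improve */
    if (u == s->model->n) {           /* complete assignment found */
        s->best_cost = cost;
        for (int i = 0; i < u; ++i) s->best_map[i] = s->map[i];
        return;
    }
    for (int d = 0; d < s->data->n; ++d) {
        if (s->used[d]) continue;
        s->used[d] = true;
        s->map[u] = d;
        branch(s, u + 1, cost + incremental_cost(s, u, d));
        s->used[d] = false;
    }
}

/* Minimum mismatch cost (INT_MAX if model->n > data->n); fills best_map. */
static int match_graphs(const Graph *model, const Graph *data, int best_map[MAXV])
{
    Search s = { .model = model, .data = data, .best_cost = INT_MAX };
    branch(&s, 0, 0);
    for (int i = 0; i < model->n; ++i) best_map[i] = s.best_map[i];
    return s.best_cost;
}
```

Matching a path against a relabeled copy of itself costs 0, and the middle vertex must map to the middle vertex; matching a triangle against a path costs 1.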
13.
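When the number of processing elements equals the number of memory banks, a nonlinear storage scheme lets an SIMD machine access memory in multiple patterns without conflicts, improving its overall performance. This paper proposes a general method for designing SIMD machines using a linear storage scheme: for a given storage scheme and a finite set of templates, it designs a parallel structure that provides both conflict-free memory access and a matching interconnection network. First, templates are represented in a Boolean vector space, and the correspondence between templates and LC permutations is established. On this basis, a method is proposed for designing the local address generation logic and an enhanced indirect binary n-cube network. Since any access pattern in the template set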
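The abstract does not give its storage scheme. To illustrate what "conflict-free access for a set of templates" means, this sketch uses the classic XOR skewing scheme, which is linear over the Boolean vector space GF(2)^K: element (i, j) of an NB × NB array is stored in bank i XOR j at local address i, with NB banks (assumed here to be 8, equal to the number of PEs). A template is conflict-free if its NB elements fall in NB distinct banks. Rows and columns pass; the main diagonal does not, which is the kind of limitation richer (for example nonlinear) schemes aim to remove. This is a stand-in, not the paper's scheme.

```c
#include <stdbool.h>

#define K 3
#define NB (1 << K) /* banks == processing elements (assumed 8) */

/* XOR skewing: element (i, j) lives in bank i ^ j (local address i). */
static unsigned bank_of(unsigned i, unsigned j) { return (i ^ j) & (NB - 1); }

/* True if the NB elements (rows[p], cols[p]) occupy NB distinct banks,
   i.e. all PEs can fetch their element in one memory cycle. */
static bool conflict_free(const unsigned rows[NB], const unsigned cols[NB])
{
    bool hit[NB] = { false };
    for (int p = 0; p < NB; ++p) {
        unsigned b = bank_of(rows[p], cols[p]);
        if (hit[b]) return false; /* two PEs need the same bank */
        hit[b] = true;
    }
    return true;
}
```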
14.
This correspondence presents several parallel algorithms for image template matching on an SIMD array processor with a hypercube interconnection network. For an N × N image and an M × M window, the time complexity is reduced from O(N²M²) for the serial algorithm to O(M²/K² + M·log₂N/K + log₂N·log₂K) for the N²K²-PE system (1 ≤ K ≤ M), or to O(N²M²/L²) for the L²-PE system (L ≤ N). With efficient use of the inter-PE communication network, each PE requires only a small local memory, many unnecessary data transmissions are eliminated, and the time complexity is greatly reduced.
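Logarithmic terms like those above typically come from combining partial results across hypercube dimensions. A minimal simulation, assuming 16 PEs that each hold one partial correlation sum: in step k every PE exchanges its value with the neighbour whose ID differs in bit k and adds it (recursive doubling), so after log₂P steps every PE holds the total. This shows the communication pattern only; the paper's data distribution and its exact algorithms are not reproduced, and `hypercube_allreduce` is a hypothetical name.

```c
#include <string.h>

#define DIM 4
#define P (1 << DIM) /* 16 simulated processing elements */

/* Recursive-doubling all-reduce on a simulated SIMD hypercube. */
static void hypercube_allreduce(long v[P])
{
    long next[P];
    for (int k = 0; k < DIM; ++k) {
        /* all PEs act in lock-step: read the neighbour across dimension k */
        for (int pe = 0; pe < P; ++pe)
            next[pe] = v[pe] + v[pe ^ (1 << k)];
        memcpy(v, next, sizeof next);
    }
}
```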
15.
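Modern compilers increasingly rely on SIMD instructions to improve performance through vectorization, but complex control flow severely hinders SIMD vectorization. Existing control-flow vectorization methods work well for single-level control flow but give unsatisfactory results for nested and other complex control flow. This paper therefore proposes a control-flow vectorization method based on condition classification. For control flow whose condition is loop-invariant, the method performs IF hoisting in level-order traversal; for control flow whose condition is loop-variant, it recursively performs IF conversion, combined with statement matching and condition merging, and generates the corresponding SIMD instructions, thereby vectorizing nested control flow. Experimental results show that the method effectively eliminates nested control flow in loops, improves the compiler's ability to find vectorization opportunities, and effectively improves the performance of the test programs.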
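IF conversion for loop-variant conditions can be sketched by hand: compute both branches, turn each nested condition into a lane mask, and combine the results with bitwise selects. The kernel below is a hypothetical example with two nested, loop-variant conditions, written once as scalar code and once IF-converted with SSE masks. It illustrates the general transformation only, not the paper's condition-classification, statement-matching, or condition-merging steps, and it assumes n is a multiple of 4.

```c
#include <xmmintrin.h> /* SSE: _mm_cmpgt_ps, _mm_and_ps, _mm_andnot_ps, ... */
#include <stddef.h>

/* Scalar reference with nested, loop-variant control flow. */
static void kernel_scalar(const float *a, const float *b, float *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (a[i] > 0.0f) {
            if (b[i] > a[i]) c[i] = b[i];
            else             c[i] = 2.0f * a[i];
        } else {
            c[i] = 0.0f;
        }
    }
}

/* Lane-wise mask ? x : y using SSE1 bitwise operations. */
static __m128 select_ps(__m128 mask, __m128 x, __m128 y)
{
    return _mm_or_ps(_mm_and_ps(mask, x), _mm_andnot_ps(mask, y));
}

/* IF-converted kernel: branches become masks and selects. Assumes n % 4 == 0. */
static void kernel_ifconv(const float *a, const float *b, float *c, size_t n)
{
    const __m128 zero = _mm_setzero_ps(), two = _mm_set1_ps(2.0f);
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i), vb = _mm_loadu_ps(b + i);
        __m128 outer = _mm_cmpgt_ps(va, zero); /* a[i] > 0    */
        __m128 inner = _mm_cmpgt_ps(vb, va);   /* b[i] > a[i] */
        __m128 then_val = select_ps(inner, vb, _mm_mul_ps(two, va));
        _mm_storeu_ps(c + i, select_ps(outer, then_val, zero));
    }
}
```

The price of IF conversion is that both arms are always evaluated; it pays off when the arms are cheap compared with a mispredicted branch or a lost vectorization opportunity.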
16.
The aim of the work reported here is to build a useful toolset for 3D model-based vision on an SIMD parallel machine, the AMT DAP. Included in the toolset are facilities for model specification, manipulation and rendering using a ray-tracing approach as well as model recognition and validation using a geometrical-matching approach. In particular, an SIMD parallel version of a ray tracer and an SIMD parallel version of a bottom-up geometrical matcher are described. The ray tracer can render constructive solid geometry models and incorporates spatial subdivision of the scene. The matcher uses edge primitives recovered from scenes to match to model edges using local constraints and deals with spurious data using bin assignments. The overall toolset is illustrated by its use in closed-form testing and refinement, where the models, camera geometry and frame-to-frame motion in an image sequence generated by the ray tracer are known, but are checked and validated using geometrical matching, recognition and localisation.
17.
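Drawing on the design and programming of image matching algorithms for the LS MPP embedded computer, this paper analyzes and discusses several design issues for SIMD computers aimed at image-understanding applications: reducing memory-access instructions and their execution cycles, the number of control registers required, protecting the contents of base-address control registers, the design of buffers, and the introduction of an image-array data type. The design ideas proposed offer considerable reference value for designing similar SIMD computers for image understanding.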
18.
Hussein A. H. Ibrahim John R. Kender David Elliot Shaw 《Journal of Parallel and Distributed Computing》1987,4(6)
This paper examines the applicability of fine-grained tree-structured SIMD machines, which are amenable to highly efficient VLSI implementation, to several low-level image understanding tasks. Algorithms are presented for histogramming, thresholding, image correlation, connected component labeling, and computing Euler number. A particular massively parallel machine called NON-VON is used for purposes of explication and performance evaluation. Only NON-VON's tree-structured communication capabilities and its SIMD mode of execution are considered in this paper. Novel algorithmic techniques are described, such as vertical pipelining, subproblem partitioning, associative matching, and data duplication, that effectively exploit the massive parallelism available in fine-grained SIMD tree machines while avoiding communication bottlenecks. Simulation results are presented and compared with results obtained or forecast for other highly parallel machines. The relative advantages and limitations of the class of machines under consideration are outlined; except for some types of image correlation, the fine-grained SIMD tree is exceptionally fast.
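Histogramming on a tree machine can be sketched as a simulation that combines two of the techniques named above in a generic form: for each bin the root broadcasts the bin value, every leaf compares it with its pixel in lock-step (associative matching), and the resulting 0/1 flags are summed up a binary tree, one level per step. The sketch assumes a complete binary tree with 16 leaves holding one pixel each and ignores NON-VON's actual instruction set; it costs roughly nbins × (1 + log₂ leaves) steps. `tree_histogram` is a hypothetical name.

```c
#define LEVELS 4
#define LEAVES (1 << LEVELS) /* 16 simulated leaf PEs, one pixel each */

/* Simulated tree-machine histogram of pixel values 0..nbins-1. */
static void tree_histogram(const unsigned char pix[LEAVES], int nbins, int hist[])
{
    int node[LEAVES];
    for (int bin = 0; bin < nbins; ++bin) {
        /* root broadcasts bin; all leaves match against it in parallel */
        for (int leaf = 0; leaf < LEAVES; ++leaf)
            node[leaf] = (pix[leaf] == bin);
        /* one tree level per step; parent k reads children 2k and 2k+1,
           which have not yet been overwritten in this sequential simulation */
        for (int width = LEAVES / 2; width >= 1; width /= 2)
            for (int k = 0; k < width; ++k)
                node[k] = node[2 * k] + node[2 * k + 1];
        hist[bin] = node[0]; /* count arrives at the root */
    }
}
```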
19.
In the last decade, the volume of unstructured data that Internet and enterprise applications create and consume has been
growing at impressive rates. The tools we use to process these data are search engines, business analytics suites, natural-language
processors and XML processors. These tools rely on tokenization, a form of regular expression matching aimed at extracting
words and keywords in a character stream. The further growth of unstructured data-processing paradigms depends critically
on the availability of high-performance tokenizers. Despite the impressive amount of parallelism that the multi-core revolution
has made available (in terms of multiple threads and wider SIMD units), most applications employ tokenizers that do not exploit
this parallelism. I present a technique to design tokenizers that exploit multiple threads and wide SIMD units to process
multiple independent streams of data at a high throughput. The technique benefits indefinitely from any future scaling in
the number of threads or SIMD width. I show the approach’s viability by presenting a family of tokenizer kernels optimized
for the Cell/B.E. processor that deliver a performance seen, so far, only on dedicated hardware. These kernels deliver a peak
throughput of 14.30 Gbps per chip, and a typical throughput of 9.76 Gbps on Wikipedia input. Also, they achieve almost-ideal
resource utilization (99.2%). The approach is applicable to any SIMD-enabled processor and fits well with the trend toward wider
SIMD units in contemporary architecture design.
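The kernels in the abstract target the Cell/B.E. and process many independent streams; the sketch below instead uses x86 SSE2 on a single stream, only to show how a wide SIMD unit classifies 16 input bytes per instruction. Delimiters (assumed here to be space, tab, and newline) are found with `_mm_cmpeq_epi8`, packed into a 16-bit mask with `_mm_movemask_epi8`, and token starts are counted with bit operations. `__builtin_popcount` is a GCC/Clang builtin, and the function names are hypothetical.

```c
#include <emmintrin.h> /* SSE2 byte compares and movemask */
#include <stddef.h>

/* Bit k is set if p[k] is a space, tab, or newline (16 bytes examined). */
static unsigned delim_mask16(const unsigned char *p)
{
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    __m128i m = _mm_or_si128(_mm_cmpeq_epi8(v, _mm_set1_epi8(' ')),
                _mm_or_si128(_mm_cmpeq_epi8(v, _mm_set1_epi8('\t')),
                             _mm_cmpeq_epi8(v, _mm_set1_epi8('\n'))));
    return (unsigned)_mm_movemask_epi8(m);
}

/* Counts tokens (maximal runs of non-delimiter bytes) in s[0..n). */
static size_t count_tokens(const unsigned char *s, size_t n)
{
    size_t count = 0, i = 0;
    unsigned prev_delim = 1; /* position before s[0] counts as a delimiter */
    for (; i + 16 <= n; i += 16) {
        unsigned d = delim_mask16(s + i);
        unsigned word = ~d & 0xFFFFu;                   /* 1 = token byte */
        unsigned prev = ((d << 1) | prev_delim) & 0xFFFFu; /* bit k = byte k-1 is a delimiter */
        count += (size_t)__builtin_popcount(word & prev);  /* token starts */
        prev_delim = (d >> 15) & 1u;                    /* carry into next chunk */
    }
    for (; i < n; ++i) { /* scalar tail */
        int is_delim = s[i] == ' ' || s[i] == '\t' || s[i] == '\n';
        if (!is_delim && prev_delim) ++count;
        prev_delim = (unsigned)is_delim;
    }
    return count;
}
```

Carrying `prev_delim` across chunks ensures that a token spanning a 16-byte boundary is counted once.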