
1.

陈辉, 龚浩, 张燕忠《Computer Measurement & Control》2004,12(12):1222-1225

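Template matching is a basic and effective method for filtering, edge detection, object recognition and image matching. However, it is computationally intensive. It is time-consuming on a single processor, and using a parallel array computer instead raises hardware and software costs. Fortunately, Intel processors provide the MMX/SSE/SSE2 instruction sets, which support instruction-level SIMD operations. The main computational part of template matching can therefore be parallelized with SIMD, giving parallel processing on a single processor; this was implemented on the Linux platform. Test results show that SIMD greatly speeds up template matching.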
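The abstract does not give the kernel itself. A common way to apply SSE2 to template matching is the sum of absolute differences (SAD) criterion, because `_mm_sad_epu8` compares 16 8-bit pixels in one instruction.

The sketch below makes several assumptions: SAD as the matching measure, 8-bit grayscale images, and a template exactly 16 pixels wide. `sad_row16` and `match_template16` are hypothetical names, and this is not the authors' code. It falls back to scalar C when SSE2 is unavailable.

```c
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* SAD between 16 consecutive 8-bit pixels of a and b. */
static uint32_t sad_row16(const uint8_t *a, const uint8_t *b) {
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* psadbw: one partial sum per 64-bit half, in the low 16 bits of each half */
    __m128i s = _mm_sad_epu8(va, vb);
    return (uint32_t)_mm_cvtsi128_si32(s) + (uint32_t)_mm_extract_epi16(s, 4);
#else
    uint32_t s = 0;
    for (int i = 0; i < 16; ++i)
        s += (uint32_t)abs((int)a[i] - (int)b[i]);
    return s;
#endif
}

/* Exhaustive search: slides a 16 x th template over a w x h row-major image
   and returns the minimum SAD; its top-left corner is written to *bx, *by. */
static uint32_t match_template16(const uint8_t *img, int w, int h,
                                 const uint8_t *tpl, int th, int *bx, int *by) {
    uint32_t best = UINT32_MAX;
    for (int y = 0; y + th <= h; ++y) {
        for (int x = 0; x + 16 <= w; ++x) {
            uint32_t s = 0;
            for (int r = 0; r < th; ++r)
                s += sad_row16(img + (size_t)(y + r) * (size_t)w + (size_t)x,
                               tpl + (size_t)r * 16);
            if (s < best) {
                best = s;
                *bx = x;
                *by = y;
            }
        }
    }
    return best;
}
```

Wider templates can be handled by summing several 16-pixel chunks per row. Correlation-based criteria would instead use multiply-add instructions such as `_mm_madd_epi16`.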

2.

Accelerating Streamline Computation on 3D Curvilinear Grids with Streaming SIMD Extensions

张文, 李晓梅《Chinese Journal of Computers》2001,24(8):785-790

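Streamlines are a basic flow-field visualization technique, and computing them is time-consuming. Intel processors (Pentium III, Pentium 4) provide Streaming SIMD Extensions (SSE), which support instruction-level SIMD operations. Streamline computation on 3D curvilinear grids consists of three main subprocesses: velocity interpolation, numerical integration and point location. Together they have a high degree of inherent SIMD parallelism. By organizing the data as SSE data types and parallelizing the main subprocesses with SIMD, an SSE algorithm for streamline computation was designed. The algorithm was implemented in two SSE coding styles, a vector class library and inline assembly, and the code was optimized for the processor architecture. Test results show that SSE substantially accelerates streamline computation on 3D curvilinear grids. Compared with conventional computation, the vector class library version improves performance by about 55% and the inline assembly version by about 75%.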
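This is a minimal sketch of the data organization the abstract describes. It assumes a structure-of-arrays layout in which four streamlines occupy the four lanes of an SSE register. Only one explicit Euler integration step is shown. The authors' integrator, velocity interpolation and point location in curvilinear cells are not reproduced, and `Points3` and `euler_step` are hypothetical names.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Structure of arrays: x[i], y[i], z[i] hold the current point of streamline i,
   so four consecutive streamlines fill one SSE register per coordinate. */
typedef struct {
    float *x, *y, *z;
} Points3;

/* One explicit Euler step p_i += h * (u_i, v_i, w_i) for n streamlines.
   The velocities are assumed to be already interpolated at each point. */
static void euler_step(Points3 p, const float *u, const float *v,
                       const float *w, float h, size_t n) {
    size_t i = 0;
#if defined(__SSE__)
    const __m128 vh = _mm_set1_ps(h);
    for (; i + 4 <= n; i += 4) {
        _mm_storeu_ps(p.x + i, _mm_add_ps(_mm_loadu_ps(p.x + i),
                                          _mm_mul_ps(vh, _mm_loadu_ps(u + i))));
        _mm_storeu_ps(p.y + i, _mm_add_ps(_mm_loadu_ps(p.y + i),
                                          _mm_mul_ps(vh, _mm_loadu_ps(v + i))));
        _mm_storeu_ps(p.z + i, _mm_add_ps(_mm_loadu_ps(p.z + i),
                                          _mm_mul_ps(vh, _mm_loadu_ps(w + i))));
    }
#endif
    for (; i < n; ++i) { /* scalar tail (or the whole loop without SSE) */
        p.x[i] += h * u[i];
        p.y[i] += h * v[i];
        p.z[i] += h * w[i];
    }
}
```

The point of the layout is that no shuffling is needed: each coordinate of four streamlines is loaded with a single `_mm_loadu_ps`.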

3.

A Fast Parallel Algebraic Reconstruction Algorithm Based on SIMD Technology


刘远, 张定华, 赵歆波, 毛海鹏, 刘晓鹏《Journal of Image and Graphics》2007,12(1):73-77

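The algebraic reconstruction technique (ART) is an effective way to reconstruct from incomplete projection data. It has become a key technique for nondestructive testing of large parts that exceed the detector's field of view, but existing algorithms are computationally expensive and slow. To speed up algebraic reconstruction, this paper proposes a fast parallel algorithm based on the single instruction multiple data (SIMD) technology of Intel processors [2]. After analyzing the characteristics of the algebraic reconstruction formula, the authors designed a data structure and computation flow suited to parallel computation. The design loads several packed data items at once and performs SIMD computation with MMX (multimedia extension), SSE (streaming SIMD extension) and SSE2 instructions. Simulation experiments show that, at the same accuracy, the algorithm increases reconstruction speed by a factor of 4, removing the speed bottleneck of conventional algebraic reconstruction. It also reconstructs projection images with partially missing data well. The algorithm has theoretical significance and engineering value for nondestructive testing of large aerospace components.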
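Each ART (Kaczmarz) update for ray $i$ is

$x_j \leftarrow x_j + \lambda\,(p_i - \mathbf{a}_i\cdot\mathbf{x})\,a_{ij}/\lVert\mathbf{a}_i\rVert^2$

That is a dot product followed by a scaled vector addition, and both map directly onto packed SSE arithmetic. The sketch below assumes a dense row of `float` weights per ray and a relaxation factor `lambda`. `dot`, `art_update` and `art_sweep` are hypothetical names. The paper's packed data structure, which also uses MMX and SSE2, is not reproduced.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Dot product of two float arrays, four lanes at a time plus a scalar tail. */
static float dot(const float *a, const float *b, size_t n) {
    size_t i = 0;
    float s = 0.0f;
#if defined(__SSE__)
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    s = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    for (; i < n; ++i) s += a[i] * b[i];
    return s;
}

/* One ART update for a ray with weight row a (length n) and measured value p. */
static void art_update(float *x, const float *a, float p, float lambda, size_t n) {
    float norm2 = dot(a, a, n);
    if (norm2 == 0.0f) return; /* ray does not intersect any pixel */
    float c = lambda * (p - dot(a, x, n)) / norm2;
    size_t i = 0;
#if defined(__SSE__)
    const __m128 vc = _mm_set1_ps(c);
    for (; i + 4 <= n; i += 4) /* x += c * a, four pixels per instruction */
        _mm_storeu_ps(x + i, _mm_add_ps(_mm_loadu_ps(x + i),
                                        _mm_mul_ps(vc, _mm_loadu_ps(a + i))));
#endif
    for (; i < n; ++i) x[i] += c * a[i];
}

/* One sweep over m rays; A is an m x n row-major weight matrix. */
static void art_sweep(float *x, const float *A, const float *p,
                      size_t m, size_t n, float lambda) {
    for (size_t r = 0; r < m; ++r)
        art_update(x, A + r * n, p[r], lambda, n);
}
```

A full reconstruction would repeat `art_sweep` until the residual stops decreasing. The SSE and scalar paths may differ in the last bits, because the dot product sums in a different order.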

4.

σ-LFSR Based on SIMD Technology

曾光, 王政, 韩文报《Application Research of Computers》2008,25(8):2434-2427

σ线性反馈移位寄存器(σLFSR)是一类适合软件快速实现的新型反馈移位寄存器。结合第二代单指令多数据流扩展指令集SSE2,设计了一类基于SIMD技术的σLFSR。这类σLFSR充分利用SSE2提供的128 bit整数数据结构及其操作,获得了非常高的软件实现效率,同时其输出序列达到了最大周期并具有良好的随机性。所得结论表明这类基于SIMD技术的σLFSR可以作为适合软件实现的高速序列密码驱动部分。相似文献

6.

Research and Application of Software SIMD


卜士喜, 竺红卫《Computer Engineering》2010,36(19):53-55

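This paper introduces software SIMD, a technique for processors without SIMD support. It operates in parallel on the high-order and low-order parts of a register, increasing processing speed. Software SIMD covers basic addition and subtraction, multiplication and dot products. Building on existing research, the paper solves the problems of applying software SIMD to dot products involving negative numbers and to complex-number arithmetic. This allows the technique to be used widely in digital signal processing and related fields.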
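Software SIMD, often called SWAR (SIMD within a register), packs two 16-bit values into one 32-bit integer. The sketch below shows two standard tricks, assuming unsigned 16-bit lanes:

- Lane-wise add/subtract that masks the lane MSBs so that carries and borrows cannot cross from one lane into the other.
- A "dual" dot product that evaluates two dot products sharing one coefficient vector with a single 32-bit multiply per term.

Handling negative operands and complex arithmetic, which is the paper's contribution, is not reproduced, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define H16 0x80008000u /* most significant bit of each 16-bit lane */

static uint32_t pack16(uint16_t hi, uint16_t lo) { return ((uint32_t)hi << 16) | lo; }

/* Lane-wise addition modulo 2^16: the lane MSBs are masked off so the low
   lane cannot carry into the high lane, then restored with XOR. */
static uint32_t swar_add16(uint32_t a, uint32_t b) {
    return ((a & ~H16) + (b & ~H16)) ^ ((a ^ b) & H16);
}

/* Lane-wise subtraction modulo 2^16; borrows do not cross lanes. */
static uint32_t swar_sub16(uint32_t a, uint32_t b) {
    return ((a | H16) - (b & ~H16)) ^ ((a ^ ~b) & H16);
}

/* Computes dx = sum x[i]*c[i] and dy = sum y[i]*c[i] with one multiply per term:
   (x*2^16 + y) * c = x*c*2^16 + y*c. Exact only for results below 2^16. */
static void swar_dual_dot(const uint16_t *x, const uint16_t *y, const uint16_t *c,
                          size_t n, uint16_t *dx, uint16_t *dy) {
    uint32_t acc = 0;
    for (size_t i = 0; i < n; ++i)
        acc += pack16(x[i], y[i]) * (uint32_t)c[i];
    *dx = (uint16_t)(acc >> 16);
    *dy = (uint16_t)(acc & 0xFFFFu);
}
```

The dual dot product only works while the low-lane sum stays below 2^16. Once it reaches 2^16, the excess spills into the high lane, which is exactly the kind of interference that signed operands make harder to control.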

7.

A Formal Description of the SIMD Instruction Set Design Space

李春江, 徐颖, 黄娟娟, 杨灿群《Computer Science》2013,40(6):32-36

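SIMD (single-instruction multiple-data) parallel architectures play a very important role in modern processor architecture, and SIMD instruction sets have become an important subset of processor instruction sets. SIMD architectures and instruction sets provide short-vector parallel processing and support multiple data types and operation modes. This paper uses a formal method to describe the design space of SIMD instruction sets, characterizing SIMD instruction set design along multiple orthogonal dimensions. On this basis, it discusses SIMD instruction set design issues in detail. The formal method is useful for analyzing and designing SIMD instruction set architectures.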

8.

Pentium III SSE Instructions and Their Application in Speech Coding

郭武, 刘庆锋《Digital Community & Smart Home》1999,(8):44-45

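1. The Pentium III and its SSE instructions. This year Intel released its new-generation processor, the Pentium III, also known as Katmai. It uses the same 100 MHz bus as the latest Pentium II, the same 512 KB half-speed L2 cache and the same Slot 1 package. It can therefore seem that the Pentium III merely raises the clock frequency to 500 MHz, 50 MHz higher than the fastest Pentium II. In ordinary applications it also appears to be at most a few percent faster than the Pentium II. Does the Pentium III then have no clear advantage over the Pentium II? In fact it does, and the advantage lies in a new feature: Intel SSE (Streaming SIMD Extensions), originally called KNI (Katmai New Instructions). The instructions and microarchitectural extensions that SSE provides make SSE-enabled software run more efficiently, especially software that depends heavily on floating-point arithmetic. In my view, Intel SSE is essentially a floating-point counterpart of MMX, an enhanced SIMD instruction set. SSE can greatly improve the floating-point throughput of the Pentium III, and floating-point arithmetic plays an important role in 3D geometry and other high-end multimedia processing.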
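The excerpt stops before the speech-coding part, so the following is an assumed example of where SSE floating-point instructions help. Linear-prediction speech coders typically begin by computing the autocorrelation of a frame:

$R(k)=\sum_{n=k}^{N-1} x[n]\,x[n-k]$

This inner loop can be evaluated four products at a time with SSE. `autocorr` is a hypothetical name, and the windowing and Levinson-Durbin recursion that would follow are omitted.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* r[k] = sum_{n=k}^{len-1} x[n] * x[n-k] for k = 0..order. */
static void autocorr(const float *x, size_t len, float *r, size_t order) {
    for (size_t k = 0; k <= order; ++k) {
        size_t n = k;
        float s = 0.0f;
#if defined(__SSE__)
        __m128 acc = _mm_setzero_ps();
        for (; n + 4 <= len; n += 4) /* four lagged products per iteration */
            acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(x + n),
                                             _mm_loadu_ps(x + n - k)));
        float lanes[4];
        _mm_storeu_ps(lanes, acc);
        s = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
        for (; n < len; ++n) s += x[n] * x[n - k]; /* scalar remainder */
        r[k] = s;
    }
}
```

Unaligned loads (`_mm_loadu_ps`) are used because the lagged pointer `x + n - k` is generally not 16-byte aligned.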

9.

Parallel Image Template Matching Algorithms for k-ary 2-cube Network SIMD Computers

李俊山, 沈绪榜《Chinese Journal of Computers》2001,24(11):1196-1201

Template matching is a basic and effective method for filtering, edge detection, object recognition and image matching. For an N × N image and an M × N (M …

10.

A Fast Background Extraction Method Based on Pentium SIMD Instructions

周西汉, 刘勃, 周荷琴, 袁非牛《Computer Engineering and Applications》2004,40(27):81-83

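This paper proposes a fast background extraction method based on Intel Pentium SIMD instructions. In an improved Gaussian mixture background model, computing the Jeffrey value and updating the background model both have high inherent SIMD parallelism. By organizing the data according to SSE data types, the authors implement a SIMD algorithm for the Gaussian mixture background model. Experimental results show that embedding Pentium SIMD instructions improves performance by about 75% over conventional computation. This speeds up background extraction enough to meet real-time processing requirements, giving the method considerable practical value.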
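The abstract does not describe the Jeffrey-value computation of the authors' improved model. The sketch below therefore swaps in a simplified form of the usual per-pixel Gaussian match-and-update rule from Gaussian mixture background models. A pixel matches if $(x-\mu)^2 < k^2\sigma^2$, and only matching pixels update $\mu$ and $\sigma^2$ with learning rate $\alpha$. The variance update uses the pre-update difference.

The example shows how the per-pixel branch becomes an SSE comparison mask so that four pixels are processed per instruction. One Gaussian per pixel and the name `gauss_update` are assumptions.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Updates one Gaussian (mean mu, variance var) per pixel for n pixels.
   A pixel matches if (x - mu)^2 < k2 * var; only matching pixels are updated.
   Returns the number of matching (background) pixels. */
static size_t gauss_update(const float *x, float *mu, float *var, size_t n,
                           float alpha, float k2) {
    size_t i = 0, matched = 0;
#if defined(__SSE__)
    const __m128 va = _mm_set1_ps(alpha), vk2 = _mm_set1_ps(k2);
    for (; i + 4 <= n; i += 4) {
        __m128 m = _mm_loadu_ps(mu + i), v = _mm_loadu_ps(var + i);
        __m128 d = _mm_sub_ps(_mm_loadu_ps(x + i), m);
        __m128 d2 = _mm_mul_ps(d, d);
        __m128 mask = _mm_cmplt_ps(d2, _mm_mul_ps(vk2, v)); /* all-ones if matched */
        /* masked increments are 0.0f for unmatched pixels, leaving them unchanged */
        __m128 dm = _mm_and_ps(mask, _mm_mul_ps(va, d));
        __m128 dv = _mm_and_ps(mask, _mm_mul_ps(va, _mm_sub_ps(d2, v)));
        _mm_storeu_ps(mu + i, _mm_add_ps(m, dm));
        _mm_storeu_ps(var + i, _mm_add_ps(v, dv));
        int bits = _mm_movemask_ps(mask); /* one bit per lane */
        matched += (size_t)((bits & 1) + ((bits >> 1) & 1) +
                            ((bits >> 2) & 1) + ((bits >> 3) & 1));
    }
#endif
    for (; i < n; ++i) { /* scalar remainder with the same rule */
        float d = x[i] - mu[i], d2 = d * d;
        if (d2 < k2 * var[i]) {
            mu[i] += alpha * d;
            var[i] += alpha * (d2 - var[i]);
            ++matched;
        }
    }
    return matched;
}
```

A mixture model would run this for each of its K Gaussians and then update the component weights. That part is omitted here.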

11.

Efficient parallel algorithms for image template matching on hypercube SIMD machines

Prasanna K.V.K. Krishnan V. 《IEEE transactions on pattern analysis and machine intelligence》1989,11(6):665-669

Efficient parallel algorithms developed on hypercube SIMD (single-instruction multiple data-stream) machines for image template matching are presented. Most of these parallel algorithms are asymptotically optimal in their time complexities. These results improve the known bounds in the literature.

12.

A parallel algorithm for graph matching and its MasPar implementation

Allen R. Cinque L. Tanimoto S. Shapiro L. Yasuda D. 《Parallel and Distributed Systems, IEEE Transactions on》1997,8(5):490-501

Search of discrete spaces is important in combinatorial optimization. Such problems arise in artificial intelligence, computer vision, operations research, and other areas. For realistic problems, the search spaces to be processed are usually huge, necessitating long computation times, pruning heuristics, or massively parallel processing. We present an algorithm that reduces the computation time for graph matching by employing both branch-and-bound pruning of the search tree and massively-parallel search of the as-yet-unpruned portions of the space. Most research on parallel search has assumed that a multiple-instruction-stream/multiple-data-stream (MIMD) parallel computer is available. Since massively parallel single-instruction-stream/multiple-data-stream (SIMD) computers are much less expensive than MIMD systems with equal numbers of processors, the question arises as to whether SIMD systems can efficiently handle state-space search problems. We demonstrate that the answer is yes, and in particular, that graph matching has a natural and efficient implementation on SIMD machines.

13.

A General Method for Designing SIMD Computers with Nonlinear Storage Schemes

佟冬, 方滨兴, 胡铭曾《Journal of Computer Research and Development》2000,37(2):194-200

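When the number of processing elements equals the number of memory modules, a nonlinear storage scheme allows an SIMD computer to access memory without conflicts under multiple access patterns, improving its overall performance. This paper presents a general method for designing SIMD computers with nonlinear storage schemes. Given a storage scheme, the method designs, for a finite set of templates, a parallel structure in which both memory accesses and the interconnection network are conflict-free. First, templates are represented with Boolean vector spaces, and the correspondence between templates and LC permutations is established. On this basis, the paper presents methods for designing the local address generation logic and an enhanced indirect binary n-cube network. Because any access pattern in the template set …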
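To make "conflict-free access" concrete, the sketch below uses an XOR-based scheme. Element $(i, j)$ of an $N \times N$ array goes to module $i \oplus j$ at local address $i$. This is a common example of a storage scheme that is not a linear $(a\,i + b\,j) \bmod N$ skewing. We do not know whether it matches the paper's scheme.

A template is fetchable in one memory cycle only if its elements fall into distinct modules. $N = 8$ and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define LOG2N 3u
#define NB (1u << LOG2N) /* N = 8 memory modules and 8 PEs (hypothetical) */

typedef struct { unsigned i, j; } Elem; /* element (i, j) of an N x N array */

/* XOR-based storage scheme: module i XOR j, local address i.
   The map (i, j) -> (module, address) is one-to-one because j = module ^ i. */
static unsigned module_of(Elem e) { return (e.i ^ e.j) & (NB - 1u); }
static unsigned address_of(Elem e) { return e.i; }

/* A template of up to N elements can be fetched in one memory cycle
   only if no two of its elements fall into the same module. */
static bool conflict_free(const Elem *t, size_t n) {
    bool used[NB] = {false};
    for (size_t k = 0; k < n; ++k) {
        unsigned m = module_of(t[k]);
        if (used[m])
            return false;
        used[m] = true;
    }
    return true;
}
```

Under this particular scheme, rows and columns are conflict-free but the main diagonal is not, since every diagonal element lands in module 0. Which templates a machine supports therefore depends on the storage scheme chosen.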

14.

Parallel Algorithms for Image Template Matching on Hypercube SIMD Computers

Fang Z Li X Ni LM 《IEEE transactions on pattern analysis and machine intelligence》1987,(6):835-841

This correspondence presents several parallel algorithms for image template matching on an SIMD array processor with a hypercube interconnection network. For an N by N image and an M by M window, the time complexity is reduced from $O(N^2M^2)$ for the serial algorithm to $O(M^2/K^2 + M\log_2 N/K + \log_2 N\,\log_2 K)$ for the $N^2K^2$-PE system ($1 \le K \le M$), or to $O(N^2M^2/L^2)$ for the $L^2$-PE system ($L \le N$). With efficient use of the inter-PE communication network, each PE requires only a small local memory, many unnecessary data transmissions are eliminated, and the time complexity is greatly reduced.

15.

Control-Flow Vectorization Based on Condition Classification

孙回回, 赵荣彩, 高伟, 李雁冰《Computer Science》2015,42(11):240-247

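Modern compilers rely increasingly on SIMD instructions to improve performance through vectorization, but complex control flow severely hinders SIMD vectorization. Existing control-flow vectorization methods work well for single-level control flow but give unsatisfactory results for complex control flow such as nested branches. This paper therefore proposes a control-flow vectorization method based on condition classification. For control flow whose condition is loop-invariant, the method hoists the IF out of the loop, processing branches in level order. For control flow whose condition varies across iterations, it recursively applies IF-conversion combined with statement matching and condition merging, and generates the corresponding SIMD instructions. Together these steps vectorize nested control flow. Experimental results show that the method effectively eliminates nested control flow in loops, exposes more vectorization opportunities, and improves the performance of the test programs.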
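IF-conversion, the transformation the abstract applies to loop-variant conditions, replaces a branch by computing both arms and selecting the result per lane with a comparison mask. The sketch below applies it by hand to a single, non-nested hypothetical loop. With plain SSE, which predates SSE4.1's `_mm_blendv_ps`, the select is written as and/andnot/or. The paper's statement matching, condition merging and handling of nested branches are not shown.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Reference loop with a data-dependent branch (hypothetical example). */
static void branchy(const float *a, float *b, size_t n, float t) {
    for (size_t i = 0; i < n; ++i) {
        if (a[i] > t)
            b[i] = a[i] * 2.0f;
        else
            b[i] = -a[i];
    }
}

/* Same loop after IF-conversion: both arms are computed for four lanes and
   merged with the comparison mask, so the vector loop contains no branch. */
static void if_converted(const float *a, float *b, size_t n, float t) {
    size_t i = 0;
#if defined(__SSE__)
    const __m128 vt = _mm_set1_ps(t), two = _mm_set1_ps(2.0f);
    const __m128 sign = _mm_set1_ps(-0.0f); /* only the sign bit set */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 mask = _mm_cmpgt_ps(va, vt);   /* all-ones where a[i] > t */
        __m128 then_v = _mm_mul_ps(va, two);
        __m128 else_v = _mm_xor_ps(va, sign); /* -a[i] by flipping the sign bit */
        _mm_storeu_ps(b + i, _mm_or_ps(_mm_and_ps(mask, then_v),
                                       _mm_andnot_ps(mask, else_v)));
    }
#endif
    for (; i < n; ++i) /* scalar remainder, written as a select */
        b[i] = (a[i] > t) ? a[i] * 2.0f : -a[i];
}
```

A loop-invariant condition would instead be hoisted out of the loop (loop unswitching), leaving two branch-free loops that vectorize directly.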

16.

Using visualisation as a tool for model-based recognition

Martin Usoh Hilary Buxton 《The Visual computer》1993,9(7):381-400

The aim of the work reported here is to build a useful toolset for 3D model-based vision on an SIMD parallel machine, the AMT DAP. Included in the toolset are facilities for model specification, manipulation and rendering using a ray-tracing approach as well as model recognition and validation using a geometrical-matching approach. In particular, an SIMD parallel version of a ray tracer and an SIMD parallel version of a bottom-up geometrical matcher are described. The ray tracer can render constructive solid geometry models and incorporates spatial subdivision of the scene. The matcher uses edge primitives recovered from scenes to match to model edges using local constraints and deals with spurious data using bin assignments. The overall toolset is illustrated by its use in closed-form testing and refinement, where the models, camera geometry and frame-to-frame motion in an image sequence generated by the ray tracer are known, but are checked and validated using geometrical matching, recognition and localisation.

17.

Design Techniques for SIMD Computers for Image Understanding

李俊山, 李莉, 沈绪榜, 焦康《Journal of Chinese Computer Systems》2002,23(9):1129-1132

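Drawing on the design and programming of image matching algorithms for the LS MPP embedded computer, this paper analyzes design issues for SIMD computers aimed at image understanding. The issues discussed are:

- reducing memory-access instructions and their execution cycles;
- the number of control registers required;
- protecting the contents of base-address control registers;
- buffer design;
- introducing an image-array data type.

The proposed design ideas are a useful reference for designing similar SIMD computers for image understanding.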

18.

Low-level image analysis tasks on fine-grained tree-structured SIMD machines

Hussein A. H. Ibrahim John R. Kender David Elliot Shaw 《Journal of Parallel and Distributed Computing》1987,4(6)

This paper examines the applicability of fine-grained tree-structured SIMD machines, which are amenable to highly efficient VLSI implementation, to several low-level image understanding tasks. Algorithms are presented for histogramming, thresholding, image correlation, connected component labeling, and computing Euler number. A particular massively parallel machine called NON-VON is used for purposes of explication and performance evaluation. Only NON-VON tree-structured communication capabilities and its SIMD mode of execution are considered in this paper. Novel algorithmic techniques are described, such as vertical pipelining, subproblem partitioning, associative matching, and data duplication, that effectively exploit the massive parallelism available in fine-grained SIMD tree machines while avoiding communication bottlenecks. Simulation results are presented and compared with results obtained or forecast for other highly parallel machines. The relative advantages and limitations of the class of machines under consideration are outlined; except for some types of image correlation, the fine-grained SIMD tree is exceptionally fast.

19.

Top-Performance Tokenization and Small-Ruleset Regular Expression Matching

Daniele Paolo Scarpazza 《International journal of parallel programming》2011,39(1):3-32

In the last decade, the volume of unstructured data that Internet and enterprise applications create and consume has been growing at impressive rates. The tools we use to process these data are search engines, business analytics suites, natural-language processors and XML processors. These tools rely on tokenization, a form of regular expression matching aimed at extracting words and keywords in a character stream. The further growth of unstructured data-processing paradigms depends critically on the availability of high-performance tokenizers. Despite the impressive amount of parallelism that the multi-core revolution has made available (in terms of multiple threads and wider SIMD units), most applications employ tokenizers that do not exploit this parallelism. I present a technique to design tokenizers that exploit multiple threads and wide SIMD units to process multiple independent streams of data at a high throughput. The technique benefits indefinitely from any future scaling in the number of threads or SIMD width. I show the approach's viability by presenting a family of tokenizer kernels optimized for the Cell/B.E. processor that deliver a performance seen, so far, only on dedicated hardware. These kernels deliver a peak throughput of 14.30 Gbps per chip, and a typical throughput of 9.76 Gbps on Wikipedia input. Also, they achieve almost-ideal resource utilization (99.2%). The approach is applicable to any SIMD enabled processor and matches well the trend toward wider SIMD units in contemporary architecture design.
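The paper's kernels target the Cell/B.E., and the abstract does not describe their internals. The sketch below is only a rough x86 analogue of using SIMD units in tokenization, not Scarpazza's technique. It works in three steps:

1. Classify 16 characters at once with SSE2 byte comparisons.
2. Turn the comparison result into a bit mask with `_mm_movemask_epi8`.
3. Count token starts with bit operations.

Treating only space, tab and newline as delimiters, and the name `count_tokens`, are assumptions.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>

static unsigned popcount16(unsigned x) {
    unsigned c = 0;
    for (; x != 0; x &= x - 1) /* clear the lowest set bit */
        ++c;
    return c;
}
#endif

static int is_delim(char c) { return c == ' ' || c == '\t' || c == '\n'; }

/* Counts tokens, i.e. maximal runs of non-delimiter characters, in s[0..n). */
static size_t count_tokens(const char *s, size_t n) {
    size_t count = 0, i = 0;
    unsigned prev_delim = 1; /* position -1 behaves like a delimiter */
#if defined(__SSE2__)
    const __m128i sp = _mm_set1_epi8(' '), tab = _mm_set1_epi8('\t'),
                  nl = _mm_set1_epi8('\n');
    for (; i + 16 <= n; i += 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)(s + i));
        __m128i d = _mm_or_si128(_mm_cmpeq_epi8(v, sp),
                                 _mm_or_si128(_mm_cmpeq_epi8(v, tab),
                                              _mm_cmpeq_epi8(v, nl)));
        unsigned delim = (unsigned)_mm_movemask_epi8(d); /* bit k: s[i+k] is a delimiter */
        unsigned word = ~delim & 0xFFFFu;
        unsigned before = ((delim << 1) | prev_delim) & 0xFFFFu; /* bit k: s[i+k-1] is one */
        count += popcount16(word & before);                      /* token starts */
        prev_delim = (delim >> 15) & 1u; /* carry state into the next block */
    }
#endif
    for (; i < n; ++i) { /* scalar tail (or the whole input without SSE2) */
        unsigned d = (unsigned)is_delim(s[i]);
        if (!d && prev_delim)
            ++count;
        prev_delim = d;
    }
    return count;
}
```

The only state carried between 16-byte blocks is whether the last character was a delimiter. This is what lets tokens span block boundaries without special handling.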