Similar documents
1.
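A method is proposed for effectively improving DRC efficiency by exploiting the computational power and parallelism of SIMD technology. Experiments show that the SIMD-optimized DRC algorithm is about twice as efficient as the compiler-optimized DRC, and an order of magnitude more efficient than the original C-code DRC algorithm.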

2.
Oblivious transfer (OT) is one of the most fundamental primitives in cryptography and is widely used in protocols for secure two-party and multi-party computation. As secure computation becomes more practical, the need for practical large-scale OT protocols is becoming more evident. OT extensions are protocols that enable a relatively small number of "base-OTs" to be used to compute a very large number of OTs at low cost. In the semi-honest setting, Ishai et al. (Advances in cryptology—CRYPTO'03, vol 2729 of LNCS, Springer, 2003) presented an OT extension protocol for which the cost of each OT (beyond the base-OTs) is just a few hash function operations. In the malicious setting, Nielsen et al. (Advances in cryptology—CRYPTO'12, vol 7417 of LNCS, Springer, 2012) presented an efficient OT extension protocol for malicious adversaries that is secure in the random oracle model. In this work, we improve OT extensions with respect to communication complexity, computation complexity, and scalability in the semi-honest, covert, and malicious models. Furthermore, we show how to modify our maliciously secure OT extension protocol to achieve security with respect to a version of correlation robustness instead of the random oracle. We also provide specific optimizations of OT extensions that are tailored to the use of OT in various secure computation protocols, such as Yao's garbled circuits and the protocol of Goldreich–Micali–Wigderson, which reduce the communication complexity even further. We experimentally verify the efficiency gains of our protocols and optimizations.
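To make the IKNP idea from Ishai et al. concrete, here is a minimal sketch of the semi-honest extension only. It rests on several assumptions: the base OTs are replaced by an ideal, insecure stand-in (`ideal_ot`), SHA-256 plays the role of the correlation-robust hash (`H`), and `K = 128`, `columns_to_rows`, and `iknp_extend` are hypothetical names. It includes none of the optimizations proposed in this work. The key step is that K column-wise base OTs, with roles reversed, give the sender rows q_j = t_j XOR (r_j · s).

```python
import hashlib
import secrets

K = 128  # security parameter = number of base OTs (assumption for this sketch)


def ideal_ot(m0: int, m1: int, choice: int) -> int:
    """Ideal base-OT functionality: the chooser gets m_choice.
    Placeholder only; a real protocol runs K public-key base OTs here."""
    return m1 if choice else m0


def H(j: int, row: int, nbytes: int) -> int:
    """Hash (index, K-bit row) to an nbytes mask; SHA-256 stands in for the
    correlation-robust hash / random oracle (assumption)."""
    data = j.to_bytes(8, "big") + row.to_bytes(K // 8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:nbytes], "big")


def columns_to_rows(cols: list[int], m: int) -> list[int]:
    """Transpose len(cols) columns of m bits into m rows of len(cols) bits."""
    rows = []
    for j in range(m):
        row = 0
        for i, col in enumerate(cols):
            row |= ((col >> j) & 1) << i
        rows.append(row)
    return rows


def iknp_extend(pairs: list[tuple[int, int]], choices: list[int], nbytes: int = 16) -> list[int]:
    """Run m semi-honest OTs from K base OTs; returns what the receiver learns."""
    m = len(pairs)
    r = sum(bit << j for j, bit in enumerate(choices))  # choice bits packed as one m-bit column

    t_cols = [secrets.randbits(m) for _ in range(K)]  # receiver: random m x K matrix T, by column
    s = secrets.randbits(K)                           # sender: random K-bit string s

    # Roles reversed: the sender chooses with bit s_i between t^i and t^i XOR r,
    # so it learns q^i = t^i XOR (s_i * r) without learning r.
    q_cols = [ideal_ot(t_cols[i], t_cols[i] ^ r, (s >> i) & 1) for i in range(K)]

    # Row view: q_j = t_j XOR (r_j * s).
    t_rows = columns_to_rows(t_cols, m)
    q_rows = columns_to_rows(q_cols, m)

    # Sender masks x0 with H(q_j) and x1 with H(q_j XOR s).
    masked = [
        (x0 ^ H(j, q_rows[j], nbytes), x1 ^ H(j, q_rows[j] ^ s, nbytes))
        for j, (x0, x1) in enumerate(pairs)
    ]
    # Receiver knows only t_j, which equals the mask of the entry selected by r_j.
    return [masked[j][choices[j]] ^ H(j, t_rows[j], nbytes) for j in range(m)]
```

For example, `iknp_extend([(10, 20), (30, 40)], [1, 0])` returns `[20, 30]`. After the K base OTs, each extended OT costs only a couple of hash evaluations, which is the property the abstract attributes to this protocol.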

3.
The video compression algorithms based on the 3D wavelet transform obtain excellent compression rates at the expense of huge memory requirements, which drastically affect the execution time of such applications. The objective of this work is to enable real-time video compression based on the 3D fast wavelet transform. We show the hardware and software interaction for this multimedia application on a general-purpose processor. First, we mitigate the memory problem by exploiting the memory hierarchy of the processor using several techniques. For instance, we implement and evaluate the blocking technique. In particular, we present two blocking approaches, cube and rectangular, which differ in the way the original working set is divided. We also propose reusing previous computations in order to decrease the number of memory accesses and floating-point operations. Afterwards, we present several optimizations that cannot be applied by the compiler due to the characteristics of the algorithm. On the one hand, the Streaming SIMD Extensions (SSE) are used for some of the dimensions of the sequence (y and time) to reduce the number of floating-point instructions, exploiting data-level parallelism. Then, we apply loop unrolling and data prefetching to specific parts of the code. On the other hand, the algorithm is vectorized by columns, allowing the use of SIMD instructions for the y dimension. Results show speedups of 5x in execution time over a version compiled with the maximum optimizations of the Intel C/C++ compiler, while maintaining the compression ratio and the video quality (PSNR) of the original encoder based on the 3D wavelet transform. Our experiments also show that allowing the compiler to perform some of these optimizations (i.e., automatic code vectorization) causes a performance slowdown, demonstrating the effectiveness of our optimizations.
Special Issue on Media and Communication Applications on General Purpose Processors: Hardware and Software Issues / Journal of VLSI Signal Processing Systems / Dr. Eric Debes, (Lead) Guest Editor. Contact Author: Gregorio Bernabé.
Gregorio Bernabé was born in Antibes (Alpes-Maritimes, France) on 21 November 1974. He received the M.S. in Computer Science from the University of Murcia (Spain) in 1997. In 1998, he joined the Computer Engineering Department of the University of Murcia, where he is an Assistant Professor as well as a Ph.D. candidate. His current research interests include video compression using the wavelet transform and the development of optimizations to improve the performance of video compression algorithms based on the 3D wavelet transform.
Jose M. Garcia was born in Valencia, Spain, on 9 January 1962. He received the MS and PhD degrees in electrical engineering from the Technical University of Valencia (Valencia, Spain) in 1987 and 1991, respectively. In 1987 he joined the Computer Science Department at the University of Castilla-La Mancha at the Campus of Albacete (Spain). From 1987 to 1993, he was an Assistant Professor of Computer Architecture. In 1994 he became an Associate Professor at the University of Murcia (Spain). From 1995 to 1997 he served as Vice-Dean of the School of Computer Science. At present, he is the Director of the Computer Engineering Department and also the Head of the Research Group on Parallel Computing and Architecture. He has developed several courses on Computer Structure, Peripheral Devices, Computer Architecture and Multicomputer Design. His current research interests include Multiprocessor Systems, Interconnection Networks, File Systems, Grid Computing and its Application in Multimedia Systems. He has published over 45 refereed papers in different journals and conferences in these fields. Dr. Garcia is a member of several international associations, such as the IEEE Computer Society, ACM, and USENIX, and also of some European associations (Euromicro and ATI).
Pepe Gonzalez received the M.S. and Ph.D. degrees from the Universitat Politecnica de Catalunya (UPC). In January 2000, he joined the Computer Engineering Department of the University of Murcia, Spain, and became an Associate Professor in June 2001. In March 2002, he joined the Intel Barcelona Research Center, where he is a Senior Researcher. Currently, Pepe is working on new paradigms for the IA-32 family, in particular thermal- and power-aware clustered microarchitectures. pepe.gonzalez@intel.com
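The blocking idea above can be illustrated, in a simplified way, as a loop restructuring. This is a minimal sketch, not the authors' encoder. It assumes a one-level average/difference Haar step along the time axis of a T x Y x X video as a stand-in filter, and the function names and tile sizes `by`, `bx` are hypothetical. Plain Python will not show the cache benefit. The sketch only shows that rectangular tiling of the (y, x) plane leaves the arithmetic unchanged while confining each stretch of work to one tile across all frames, where in practice further passes over the same tile would reuse data already in cache.

```python
def _zeros(T, Y, X):
    """Allocate a T x Y x X nested list of 0.0."""
    return [[[0.0] * X for _ in range(Y)] for _ in range(T)]


def haar_time_naive(video):
    """One level of an average/difference Haar step along the time axis of a
    T x Y x X video (T even), visiting the frames plane by plane."""
    T, Y, X = len(video), len(video[0]), len(video[0][0])
    low, high = _zeros(T // 2, Y, X), _zeros(T // 2, Y, X)
    for t in range(T // 2):
        for y in range(Y):
            for x in range(X):
                a, b = video[2 * t][y][x], video[2 * t + 1][y][x]
                low[t][y][x] = (a + b) / 2
                high[t][y][x] = (a - b) / 2
    return low, high


def haar_time_blocked(video, by=8, bx=8):
    """Same arithmetic, but the (y, x) plane is cut into by x bx rectangular
    tiles, and each tile is carried through every time pair before the next
    tile is touched."""
    T, Y, X = len(video), len(video[0]), len(video[0][0])
    low, high = _zeros(T // 2, Y, X), _zeros(T // 2, Y, X)
    for y0 in range(0, Y, by):
        for x0 in range(0, X, bx):
            for t in range(T // 2):
                for y in range(y0, min(y0 + by, Y)):
                    for x in range(x0, min(x0 + bx, X)):
                        a, b = video[2 * t][y][x], video[2 * t + 1][y][x]
                        low[t][y][x] = (a + b) / 2
                        high[t][y][x] = (a - b) / 2
    return low, high
```

Because both versions perform identical floating-point operations, their outputs match exactly. Only the traversal order, and therefore the memory access pattern, differs.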

4.
This paper proposes an exponentiation method that uses Frobenius mappings. The main target is exponentiation in an extension field. The idea can also be applied to scalar multiplication of a rational point on an elliptic curve defined over an extension field. The proposed method is closely related to so-called interleaving exponentiation. Unlike interleaving exponentiation methods, however, it can carry out several exponentiations of the same base at once, a situation that arises in some pairing-based applications. The efficiency of using Frobenius mappings for exponentiation in an extension field was well demonstrated by Avanzi and Mihailescu. Their method substantially decreases the number of multiplications by using many Frobenius mappings instead. Compared with their method, the proposed method needs about 20% more multiplications but only a small number of Frobenius mappings. The proposed method is therefore efficient when Frobenius mappings cannot be computed quickly.
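The general principle behind Frobenius-based exponentiation can be shown in the smallest possible setting. This sketch is neither the proposed method nor that of Avanzi and Mihailescu. It assumes F_{P^2} = F_P[i]/(i^2 + 1) with a prime P ≡ 3 (mod 4), where the Frobenius map x ↦ x^P is just conjugation. An exponent e < P^2 is written in base P as e = e0 + e1·P, so x^e = x^e0 · frob(x)^e1. The two digit powers are then evaluated together with interleaved ("Shamir's trick") square-and-multiply. The prime `P = 10007` and all function names are illustrative assumptions.

```python
P = 10007  # a prime with P % 4 == 3 (illustrative choice)


def fp2_mul(x, y):
    """Multiply a + b*i and c + d*i in F_{P^2} = F_P[i]/(i^2 + 1)."""
    a, b = x
    c, d = y
    return ((a * c - b * d) % P, (a * d + b * c) % P)


def fp2_frob(x):
    """x^P: since P % 4 == 3, i^P = -i, so Frobenius is conjugation (no multiplications)."""
    a, b = x
    return (a, (-b) % P)


def fp2_pow(x, e):
    """Plain right-to-left square-and-multiply, used as a reference."""
    result, base = (1, 0), x
    while e:
        if e & 1:
            result = fp2_mul(result, base)
        base = fp2_mul(base, base)
        e >>= 1
    return result


def fp2_pow_frob(x, e):
    """x^e for 0 <= e < P^2 via e = e0 + e1*P and x^e = x^e0 * frob(x)^e1.
    Both digit powers share one left-to-right squaring chain (interleaving),
    so about log2(P) squarings are needed instead of about 2*log2(P)."""
    e0, e1 = e % P, e // P
    xf = fp2_frob(x)
    table = {(1, 0): x, (0, 1): xf, (1, 1): fp2_mul(x, xf)}  # precomputed products
    result = (1, 0)
    for k in range(max(e0.bit_length(), e1.bit_length()) - 1, -1, -1):
        result = fp2_mul(result, result)
        bits = ((e0 >> k) & 1, (e1 >> k) & 1)
        if bits in table:
            result = fp2_mul(result, table[bits])
    return result
```

In this toy field, the Frobenius map is free, which is the favourable case. The abstract targets the opposite case, where Frobenius mappings are expensive and should be used sparingly.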

5.
This contribution focuses on a class of Galois fields used to achieve fast finite field arithmetic, which we call Optimal Extension Fields (OEFs), first introduced in [3]. We extend this work by presenting an adaptation of Itoh and Tsujii's algorithm for finite field inversion to OEFs. In particular, we use the facts that the action of the Frobenius map in GF(p^m) can be computed with only m-1 subfield multiplications and that inverses in GF(p) can be computed cheaply using known techniques. As a result, we show that one extension field inversion can be computed with a logarithmic number of extension field multiplications. In addition, we provide new extension field multiplication formulas that give a performance increase. Further, we provide an OEF construction algorithm, together with tables of Type I and Type II OEFs and statistics on the number of pseudo-Mersenne primes and OEFs. We apply these methods to construct elliptic curve cryptosystems on both DEC Alpha workstations and Pentium-class PCs and report implementation results. These results show that OEFs, when used with our new inversion and multiplication algorithms, provide a substantial performance increase over other reported methods. Received 7 July 1999; revised 29 March 2000; published online 15 September 2000.
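The inversion strategy described above can be sketched for a toy OEF. This is an illustrative reconstruction under stated assumptions, not the authors' code. It assumes P = 2^13 - 1 (a Mersenne, hence pseudo-Mersenne, prime), degree M = 3 (which divides P - 1), and an irreducible binomial x^3 - W with W chosen as the smallest non-cube mod P. With r = (P^M - 1)/(P - 1), the element A^(r-1) = A^(P + P^2 + ... + P^(M-1)) is built from Frobenius images, each costing M-1 nontrivial subfield multiplications. A^r = A^(r-1)·A lies in GF(P), so a single subfield inversion finishes the job. The chain here uses M-1 sequential Frobenius steps rather than the addition chain that yields the logarithmic multiplication count. All names are hypothetical.

```python
P = 2**13 - 1  # 8191, a Mersenne (pseudo-Mersenne) prime (assumption)
M = 3          # extension degree; M divides P - 1 = 8190

# x^M - W is irreducible when M = 3 divides P - 1 and W is not a cube mod P.
W = next(w for w in range(2, P) if pow(w, (P - 1) // M, P) != 1)

# Frobenius constants: x^(i*P) = x^i * W^(i*(P-1)/M), precomputed once (FROB[0] == 1).
FROB = [pow(W, i * (P - 1) // M, P) for i in range(M)]


def ext_mul(a, b):
    """Multiply in GF(P^M) = GF(P)[x]/(x^M - W); coefficient lists, low degree first."""
    c = [0] * (2 * M - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    # Reduce using x^M = W (degrees up to 2M-2 need only one fold).
    return [(c[i] + (W * c[i + M] if i + M < len(c) else 0)) % P for i in range(M)]


def frobenius(a):
    """a^P: coefficient i is scaled by FROB[i] (M-1 nontrivial subfield multiplications)."""
    return [(ai * f) % P for ai, f in zip(a, FROB)]


def itoh_tsujii_inverse(a):
    """A^-1 = (A^r)^-1 * A^(r-1) with r = (P^M - 1)/(P - 1); A must be nonzero."""
    acc = [1] + [0] * (M - 1)
    f = a
    for _ in range(M - 1):
        f = frobenius(f)        # A^(P^k)
        acc = ext_mul(acc, f)   # running product, ends as A^(P + ... + P^(M-1))
    norm = ext_mul(acc, a)      # A^r, an element of the subfield GF(P)
    if any(norm[1:]):
        raise ArithmeticError("norm not in GF(P); check the field parameters")
    n_inv = pow(norm[0], P - 2, P)  # subfield inversion via Fermat's little theorem
    return [(c * n_inv) % P for c in acc]
```

For instance, `ext_mul(a, itoh_tsujii_inverse(a))` returns `[1, 0, 0]` for any nonzero `a`. The only inversion performed is the single one in GF(P).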

6.
7.
This paper presents a wide range of algorithms and architectures for computing the 1D and 2D discrete wavelet transform (DWT) and the 1D and 2D continuous wavelet transform (CWT). The algorithms and architectures presented are independent of the size and nature of the wavelet function. New on-line algorithms are proposed for the DWT and the CWT that require significantly less storage. The proposed systolic array and parallel filter architectures implement these on-line algorithms and are optimal with respect to both area and time (under the word-serial model). Moreover, these architectures are very regular and support single-chip implementations in VLSI. The proposed SIMD architectures implement the existing pyramid and à trous algorithms and are optimal with respect to time.
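The pyramid algorithm mentioned above can be illustrated with its simplest instance. This is a minimal 1D sketch, not one of the paper's architectures. It assumes the Haar wavelet with an average/difference normalization (rather than the orthonormal 1/√2 scaling) and a signal length divisible by 2^levels. At each level, the current approximation splits into half-length averages and differences, and only the averages are transformed again. The function names are hypothetical.

```python
def haar_dwt_pyramid(signal, levels):
    """Pyramid (Mallat-style) Haar DWT. Returns [coarsest approximation,
    details from coarsest to finest]."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.insert(0, [(a - b) / 2 for a, b in pairs])  # high-pass, downsampled
        approx = [(a + b) / 2 for a, b in pairs]            # low-pass, downsampled
    return [approx] + details


def haar_idwt_pyramid(coeffs):
    """Invert haar_dwt_pyramid: a = s + d and b = s - d at each level."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        out = []
        for s, d in zip(approx, detail):
            out += [s + d, s - d]
        approx = out
    return approx
```

For example, `haar_dwt_pyramid([1, 2, 3, 4], 2)` gives `[[2.5], [-1.0], [-0.5, -0.5]]`. Each level halves the data to be processed, which is why the total work stays linear in the signal length.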

8.
This paper is a brief introduction to a new class of computers, the reconfigurable massively parallel computer. Its most distinguishing feature is the use of the reconfigurability of the interconnection network for two purposes: to establish a network topology well mapped to the algorithm's communication graph, so that higher efficiency can be achieved, and to remove faulty processors from the network, so that system operation continues uninterrupted with the same or slightly degraded efficiency. Several existing reconfigurable single instruction multiple data (SIMD) parallel architectures and their reconfiguration mechanisms are described, the effectiveness of algorithm mapping through reconfiguration is demonstrated, and fault-tolerant schemes via reconfiguration are discussed.

9.
Parallel techniques of the LS SIMD computer   Total citations: 2, self-citations: 0, citations by others: 2
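The article discusses the parallel techniques adopted in the LS SIMD computer: data parallelism, a three-stage instruction pipeline, and the parallel execution of three groups of instructions.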

10.
In dynamic spectrum access networks, efficiently utilizing the dynamically available bandwidth is essential to network performance. In this paper, we propose an Error Adaptive MAC protocol that adaptively changes its transmission mode according to the channel status. Using cognitive radio technology, additional channels are assumed to be randomly available for data transmission. When the channel error rate is relatively high, these additional channels are used for error recovery; when the wireless medium is stable and reliable, the extra channels are instead used to increase throughput. We formulate an analytical model to study the dynamics of our adaptive MAC protocol and show through simulation that the proposed method can significantly enhance the throughput of dynamic spectrum access networks.

11.
Sixth-generation (6G) networks will span multiple bands, such as low-frequency, mid-frequency, millimeter-wave, and terahertz bands, to meet various business requirements and networking scenarios. The dynamic complementarity of these bands is crucial for enhancing spectrum efficiency, reducing network energy consumption, and ensuring a consistent user experience. This paper surveys current research on, and the challenges of, deploying multi-band integrated networks within existing infrastructure. An evolutionary path for integrated networking is then proposed that takes into account the maturity of emerging technologies and practical network deployment. The proposed design principles for 6G multi-band integrated networking aim to achieve on-demand networking objectives, while the architecture supports full-spectrum access and collaboration between high and low frequencies. In addition, the potential key air-interface and intelligent technologies for integrated networking are discussed comprehensively. This work is intended to serve as a crucial basis for the subsequent standardization of 6G multi-band integrated networking technology.

12.
Space diversity combining is a well-known method of smoothing amplitude fluctuations of the received signal in Rayleigh fading environments such as mobile radio. Perhaps less well known is that space diversity combining can also be an excellent method of combating cochannel interference. In this paper, it is shown that high spectrum efficiencies in mobile radio systems can be achieved with a modest number of space diversity branches. With a large number of diversity branches, it is shown that frequency reuse is possible, resulting in spectrum efficiencies, as defined herein, greater than 100 percent.

13.
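This article presents a new address generator for an embedded SIMD coprocessor. The address generator mainly performs address calculation and field extraction from coprocessor instructions. To improve coprocessor performance, a new transfer path was designed into the address generator: it moves data into the registers without passing through the address generator's ALU, which removes one cycle of latency from the ldN instruction. With the SMIC 0.18 μm standard-cell library, the delay of the address generator meets the requirements of a coprocessor with a 10 ns clock cycle.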

14.
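The use of DSPs in mobile phones, music players, and other consumer products directly determines system functionality and price. At an appropriate price point, a DSP must provide enough capability to meet current requirements, along with ample headroom for expansion, so that designers can add new features or enhance existing ones without major hardware changes.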

15.
A smart-sensor VLSI circuit suitable for focal-plane low-level image processing applications is presented. The architecture of the device is based on a fine-grain, software-programmable SIMD processor array. Processing elements, integrated within each pixel of the imager, are implemented using a switched-current analog microprocessor concept. This allows real-time image processing speeds to be achieved with high efficiency in terms of silicon area and power dissipation. The prototype 21 × 21 vision chip is fabricated in a 0.6 μm CMOS technology and achieves a cell size of 98.6 μm × 98.6 μm. It executes over 1.1 giga instructions per second (GIPS) while dissipating under 40 mW of power. The architecture, circuit design, and experimental results are presented in this paper.
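Two derived figures help put the reported numbers in context. They are computed here from the abstract's values and are not reported by the authors. The sketch assumes the 1.1 GIPS is shared evenly across all 21 × 21 = 441 processing elements. Because the throughput is a lower bound ("over") and the power an upper bound ("under"), the energy figure is an upper bound per instruction.

```latex
\frac{1.1\times 10^{9}\ \text{instr/s}}{21\times 21}
  = \frac{1.1\times 10^{9}}{441} \approx 2.5\times 10^{6}\ \text{instr/s per PE},
\qquad
\frac{40\ \text{mW}}{1.1\times 10^{9}\ \text{instr/s}} \approx 36\ \text{pJ/instr}
```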

16.
Scientific modeling with massively parallel SIMD computers   Total citations: 1, self-citations: 0, citations by others: 1
A number of scientific models are discussed that possess a high degree of inherent parallelism. For simulation purposes, this parallelism is exploited by employing a massively parallel SIMD (single instruction multiple data) computer. The authors describe one such computer, the distributed array processor (DAP), and discuss the optimal mapping of a typical problem onto the computer architecture to best exploit the model parallelism. By focusing on specific models currently under study, they exemplify the types of problems that benefit most from a parallel implementation. The extent of this benefit is considered relative to implementation on a machine of conventional architecture.

17.
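Data-parallel applications have a regular structure that can be described as a sequence of arithmetic functions operating on data streams, and SIMD architectures designed for such applications can exploit this regularity to improve performance. However, applications that contain data-dependent control structures execute very inefficiently on SIMD architectures. Converting data-dependent control structures into data transfers allows applications with data-dependent control flow to execute efficiently on SIMD architectures, so this transformation enables more applications to run efficiently on them.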
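The abstract does not detail its transformation into data transfers. As a related and widely used way of removing data-dependent branches on SIMD hardware, the sketch below uses if-conversion (predication). This is a swapped-in technique, not necessarily the authors' method. Every lane computes both arms, a per-lane mask records the condition, and an arithmetic select merges the results, so all lanes follow a single instruction stream. The example functions and the `threshold` condition are hypothetical, and Python lists stand in for vector registers.

```python
def branchy(xs, threshold):
    """Scalar form: each element takes a data-dependent branch."""
    out = []
    for x in xs:
        if x > threshold:
            out.append(x - threshold)
        else:
            out.append(2 * x)
    return out


def predicated(xs, threshold):
    """SIMD-friendly form: both arms are computed on every lane, and a 0/1 mask
    selects the result arithmetically instead of by branching."""
    mask = [int(x > threshold) for x in xs]   # per-lane condition
    then_vals = [x - threshold for x in xs]   # arm 1, all lanes
    else_vals = [2 * x for x in xs]           # arm 2, all lanes
    return [m * t + (1 - m) * e for m, t, e in zip(mask, then_vals, else_vals)]
```

The cost of this approach is that both arms are always evaluated. It pays off on SIMD machines because divergent lanes would otherwise serialize.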

18.
Computation time for various primitive operations, such as broadcasting and global sum, can increase significantly when there are node failures in a hypercube. In this paper, we develop nearly optimal algorithms for computing important basic problems on a faulty SIMD hypercube. Our parallel machine model is an n-dimensional SIMD hypercube Q_n with up to n-1 node faults. In an SIMD hypercube, during a communication step, nodes can exchange information with their neighbors only across a specific dimension. We use the concept of a free dimension to develop our algorithms, where a free dimension is defined as a dimension i such that at least one end node of every i-dimension link is nonfaulty. In an n-cube with f < n faults, it is known that there exist at least n-f+1 free dimensions. Using free dimensions, we show that broadcasting and global sum can be performed in n+5 steps, improving upon the previously known algorithms for these primitives. The broadcasting algorithms work independently of the locations of the source node and the faulty nodes. This revised version was published online in June 2006 with corrections to the Cover Date.

19.
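The arithmetic SIMD module is a key module of our in-house high-performance DSP. Based on a 0.13 μm process, we propose an improved algorithm for implementing the SIMD instructions and design and implement the circuit and layout of the arithmetic SIMD module. Based on the characteristics of the instructions, we propose a two-level selection structure for result generation, and we design the SIMD adder and the compare and zero-detect submodules using limited dynamic circuits. Post-layout simulation and timing analysis were automated with the Nanosim tool. The post-layout delay is kept within 750 ps, meeting the timing requirements of the high-performance DSP chip.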
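The abstract gives no circuit details. As a software analogy for what a partitioned SIMD adder must guarantee, namely that carries do not cross lane boundaries, here is the well-known SWAR (SIMD-within-a-register) trick for adding four unsigned 8-bit lanes packed in a 32-bit word, modulo 256 per lane. It is not the authors' algorithm or circuit, and the constant and function names are hypothetical.

```python
H8 = 0x80808080  # most significant bit of each 8-bit lane
L7 = 0x7F7F7F7F  # low 7 bits of each lane


def packed_add_u8x4(a: int, b: int) -> int:
    """Lane-wise (a + b) mod 256 for four 8-bit lanes packed in 32-bit words."""
    low = (a & L7) + (b & L7)  # per lane at most 0x7F + 0x7F = 0xFE, so no carry leaves the lane
    # Each lane's MSB is carry_in XOR a7 XOR b7; the carry out of bit 7 is discarded.
    return (low ^ ((a ^ b) & H8)) & 0xFFFFFFFF
```

For example, `packed_add_u8x4(0x01FF0080, 0x01010080)` gives `0x02000000`: the lanes FF + 01 and 80 + 80 wrap to 00 without disturbing their neighbours. A hardware SIMD adder achieves the same effect by cutting the carry chain at the lane boundaries.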

20.
A four-processor chip for use in processor arrays for image computations is described. The large degree of data parallelism available in image computations allows dense array implementations in which all processors operate under the control of a single instruction stream. An instruction decoder shared by the four processors on the chip minimizes the pin count allocated for global control of the processors. The chip incorporates an interface to an external SRAM (static RAM) for memory expansion without glue chips. The full-custom 2-μm CMOS chip contains 56669 transistors and runs instructions at 10 MHz. Five hundred and twelve 16-b processors and 4 Mbyte of distributed external memory fit on two industry-standard cards to yield a peak throughput of 5 billion instructions per second. As image I/O can overlap perfectly with pixel computation, an array containing 128 of these chips can provide more than 600 16-b operations per pixel on 512×512 images at 30 Hz.
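The throughput figures in this abstract are mutually consistent, as a quick check shows. The check assumes each processor completes one instruction per 10 MHz clock cycle, and uses 128 chips × 4 processors = 512 processors. Both the per-pixel demand and the peak rate are derived here rather than quoted.

```latex
512 \times 512 \times 30\ \text{s}^{-1} = 7{,}864{,}320\ \text{pixels/s},
\qquad
600 \times 7{,}864{,}320 \approx 4.72\times 10^{9}\ \text{ops/s}
\;\le\; 512 \times 10\ \text{MHz} = 5.12\times 10^{9}\ \text{ops/s}
```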


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP registration No. 09084417