
1.

DATA BYPASSING ARCHITECTURE AND CIRCUIT DESIGN FOR 32-BIT DIGITAL SIGNAL PROCESSOR

Chen Xiaoyi, Yao Qingdong, Liu Peng. Journal of Electronics (China), 2005, 22(6): 640-649

This paper presents a design method for the bypassing unit (BPU) of MD32, a 32-bit digital signal processor (DSP). MD32 is implemented in 0.18 μm technology at 1.8 V with a 200 MHz working clock. Its design takes full account of both the reduced instruction set computer (RISC) architecture and DSP computation capability, and extends the DSP with various addressing modes in a customized DSP pipeline architecture. The paper also discusses the architecture and circuit design of the bypassing logic tailored to MD32. At the architecture level, the BPU runs in parallel with instruction decode to reduce delay. The circuit optimization of priority-based serial selection is analyzed in detail, and the results show that it cuts the delay roughly in half. Examples show that the BPU improves the DSP's performance. The forwarding logic in MD32 implements 8 data feedback channels and meets the working-clock constraint.
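The priority rule behind a forwarding (bypass) unit can be sketched in a few lines of Python. This is a minimal functional model, assuming a hypothetical list of in-flight results ordered youngest first; the class and function names are illustrative and are not MD32's actual design. Splitting the work into `match_vector` (which depends only on register numbers) and `priority_select` mirrors the idea of overlapping bypass detection with instruction decode.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class InFlight:
    """A result still in the pipeline (hypothetical model, not MD32's stage list)."""
    dest: Optional[int]  # destination register number, None if the op writes nothing
    value: int           # result that can be forwarded
    valid: bool = True   # False if this stage has not produced its result yet


def match_vector(src: int, stages: Sequence[InFlight]) -> List[bool]:
    # Depends only on register numbers, so hardware can evaluate it alongside decode.
    return [st.valid and st.dest == src for st in stages]


def priority_select(matches: Sequence[bool], stages: Sequence[InFlight], fallback: int) -> int:
    # `stages` is ordered youngest first: the first hit wins (serial priority select).
    for hit, st in zip(matches, stages):
        if hit:
            return st.value
    return fallback


def read_operand(src: int, regfile: Sequence[int], stages: Sequence[InFlight]) -> int:
    """Forwarded value if a pending write to `src` exists, else the register file value."""
    return priority_select(match_vector(src, stages), stages, regfile[src])
```

For example, with two pending writes to r3 the younger one (first in the list) is forwarded, and a register with no pending write is read from the register file.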

2.

A Parallel-Based Lifting Algorithm and VLSI architecture for DWT

Xiong Chengyi, Tian Jinwen, Liu Jian, Gao Zhirong. Journal of Electronics (China), 2006, 23(2): 244-248

A novel parallel-based lifting algorithm (PBLA) for the discrete wavelet transform (DWT) is proposed in this paper. By exploiting the parallelism of the arithmetic operations in all lifting steps, it reduces both the critical-path latency of the computation and the complexity of the hardware implementation. The derivation of the algorithm and the resulting very large scale integration (VLSI) architecture are presented in detail, taking the 9/7 DWT as an example without loss of generality. Compared with the conventional lifting-algorithm-based implementation (CLABI), the proposed architecture cuts the critical-path latency by more than half, from 4Tm + 8Ta to Tm + 4Ta, which is competitive with the convolution-based implementation (CBI) while requiring significantly less hardware. Experimental results show that the proposed architecture performs well in terms of both higher working frequency and smaller area.
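For reference, the conventional lifting flow that PBLA restructures can be modeled as below: a one-level 9/7 DWT using the widely published Daubechies-Sweldens lifting coefficients and symmetric boundary extension (both our choices, not details from the abstract). Each of the four lifting steps needs the previous step's output and contains one multiplication and two additions, which is consistent with the 4Tm + 8Ta critical path quoted above if Tm and Ta are read as multiplier and adder delays. The parallel restructuring itself is not reproduced here.

```python
# CDF 9/7 lifting coefficients (Daubechies-Sweldens factorization).
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
K = 1.149604398


def _predict(s, d, c):
    # d[i] += c * (s[i] + s[i+1]); symmetric extension uses s[n-1] at the right edge
    n = len(s)
    for i in range(n):
        right = s[i + 1] if i + 1 < n else s[n - 1]
        d[i] += c * (s[i] + right)


def _update(s, d, c):
    # s[i] += c * (d[i-1] + d[i]); symmetric extension uses d[0] at the left edge
    for i in range(len(s)):
        left = d[i - 1] if i > 0 else d[0]
        s[i] += c * (left + d[i])


def dwt97_forward(x):
    """One level of the 9/7 DWT on an even-length 1-D signal: returns (low, high)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("need an even-length signal")
    s = [float(v) for v in x[0::2]]
    d = [float(v) for v in x[1::2]]
    _predict(s, d, ALPHA)  # each step consumes the previous step's output,
    _update(s, d, BETA)    # so the steps form a serial chain of
    _predict(s, d, GAMMA)  # multiply-add stages in a direct implementation
    _update(s, d, DELTA)
    return [v / K for v in s], [v * K for v in d]


def dwt97_inverse(low, high):
    """Undo the lifting steps in reverse order with negated coefficients."""
    s = [v * K for v in low]
    d = [v / K for v in high]
    _update(s, d, -DELTA)
    _predict(s, d, -GAMMA)
    _update(s, d, -BETA)
    _predict(s, d, -ALPHA)
    out = []
    for a, b in zip(s, d):
        out += [a, b]
    return out
```

Because every lifting step is inverted exactly by its negated twin, `dwt97_inverse(*dwt97_forward(x))` reconstructs `x` up to floating-point rounding.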

3.

Elliptic Curve Point Multiplication by Generalized Mersenne Numbers

Tao Wu, Li-Tian Liu. Journal of Electronic Science and Technology, 2012, 10(3): 199-208

Montgomery modular multiplication in the residue number system (RNS) can be applied to elliptic curve cryptography. In this work, unified modular multipliers over generalized Mersenne numbers are proposed for RNS Montgomery modular multiplication, enabling efficient elliptic curve point multiplication (ECPM). The elliptic curve arithmetic for ECPM uses mixed coordinates and is adapted for hardware implementation. Conversion between RNS and the binary number system is also discussed. Compared with results in the literature, our hardware architecture for ECPM achieves high performance: a 256-bit ECPM on a Xilinx XC2VP100 field-programmable gate array (FPGA) takes 1.44 ms, using 22147 slices, 45 dedicated multipliers, and 8.25 Kbit of random access memory (RAM).
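The per-channel arithmetic that makes generalized Mersenne moduli attractive can be sketched as follows. This is a simplified illustration, assuming hypothetical 16-bit channels of the form 2^16 - c; it shows plain RNS multiplication with fold-based reduction (using 2^16 ≡ c) and a CRT conversion back to binary, not the paper's RNS Montgomery multiplier or its ECPM datapath.

```python
from math import prod

N = 16
# Hypothetical channel set m_i = 2^N - c_i, chosen so the moduli are pairwise coprime:
# 65535 = 3*5*17*257, 65533 = 13*71*71, 65521 is prime.
CS = (1, 3, 15)
MODULI = tuple((1 << N) - c for c in CS)
MASK = (1 << N) - 1


def reduce_gm(x: int, c: int) -> int:
    """Reduce x >= 0 modulo 2^N - c by folding the high part back in (2^N ≡ c)."""
    m = (1 << N) - c
    while x >> N:
        x = (x >> N) * c + (x & MASK)  # strictly smaller each pass since c < 2^N
    return x - m if x >= m else x     # now x < 2^N, so one subtraction suffices


def to_rns(x: int) -> tuple:
    return tuple(x % m for m in MODULI)


def rns_mul(a: tuple, b: tuple) -> tuple:
    # Channels are independent: no carries cross between them.
    return tuple(reduce_gm(ai * bi, c) for ai, bi, c in zip(a, b, CS))


def from_rns(r: tuple) -> int:
    """Chinese remainder theorem reconstruction (a software stand-in for a converter)."""
    big_m = prod(MODULI)
    x = 0
    for ri, mi in zip(r, MODULI):
        mi_hat = big_m // mi
        x += ri * mi_hat * pow(mi_hat, -1, mi)
    return x % big_m
```

The product is recovered correctly only while it stays below the dynamic range `prod(MODULI)` (about 2^48 here).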

4.

A scalable hybrid modular multiplication algorithm

Meng Qiang, Chen Tao, Dai Zibin, Chen Quji. Journal of Electronics (China), 2008, 25(3): 378-383

Based on an analysis of several well-known large-integer modular multiplication algorithms, this paper proposes a new scalable hybrid modular multiplication (SHyb) algorithm with scalable operands, and presents an RSA algorithm model with scalable key size. Theoretical analysis shows that, using an n-bit modular addition hardware circuit, the SHyb algorithm needs m²n/2 + 2m iterations to complete an mn-bit modular multiplication, about half the number required by the scalable Montgomery algorithm. The SHyb algorithm therefore broadens the applicability of the RSA cryptosystem and increases its speed.
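As a point of comparison, here is a minimal sketch of word-serial (scalable) Montgomery multiplication, the baseline the SHyb iteration count is compared against. It scans one w-bit word of `a` per iteration; the word size `w` and word count `m` are parameters introduced for illustration, and the SHyb algorithm itself is not shown.

```python
def montgomery_word_serial(a: int, b: int, p: int, w: int, m: int) -> int:
    """Return a*b*R^-1 mod p with R = 2^(w*m), scanning `a` one w-bit word per iteration.

    Assumes p odd, 0 <= a, b < p and p < R. The inner products use Python big ints;
    a scalable hardware datapath would also split b and p into w-bit words.
    """
    mask = (1 << w) - 1
    p_prime = (-pow(p, -1, 1 << w)) & mask  # -p^-1 mod 2^w
    t = 0
    for i in range(m):
        a_i = (a >> (w * i)) & mask
        t += a_i * b
        q = ((t & mask) * p_prime) & mask   # makes the low w bits of t + q*p zero
        t = (t + q * p) >> w                # exact division by 2^w; t stays below 2p
    return t - p if t >= p else t
```

Operands are usually kept in Montgomery form (x·R mod p), where `montgomery_word_serial(aR, bR, ...)` returns (ab)·R mod p, so repeated products never leave that domain.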

5.

A high-speed avalanche photodiode

Li Bin, Yang Xiaohong, Yin Weihong, Lü Qianqian, Cui Rong, Han Qin. Journal of Semiconductors, 2014, 35(7): 074009-5

High-speed avalanche photodiodes are widely used in optical communication systems, and the separate absorption, charge and multiplication structure is now widely adopted. This article reports a structure with higher speed than the separate absorption, charge and multiplication structure. In addition to the traditional absorption, charge and multiplication layers, it introduces an additional charge layer and a transit layer, and can therefore be called a separate absorption, charge, multiplication, charge and transit structure. The new charge and transit layers add freedom to the device design: the carrier transit time and the device capacitance can be reduced independently, so the 3 dB bandwidth can be improved by more than 50% compared with a separate absorption, charge and multiplication structure of the same size.
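A common first-order approximation (a textbook model, not taken from the paper, and ignoring the avalanche gain-bandwidth limit) shows why decoupling transit time from capacitance matters: the 3 dB bandwidth combines a transit-time limit and an RC limit roughly as

```latex
f_{3\,\mathrm{dB}} \approx \left( f_{\mathrm{tr}}^{-2} + f_{RC}^{-2} \right)^{-1/2},
\qquad
f_{RC} = \frac{1}{2\pi R C},
\qquad
C \approx \frac{\varepsilon A}{W},
\qquad
f_{\mathrm{tr}} \propto \frac{v}{W_{\mathrm{tr}}}
```

Here W is the depleted thickness that sets the capacitance, A the device area, R the total load and series resistance, v a carrier drift velocity and W_tr the drift distance that sets the transit time. When W and W_tr are tied together, a thinner device raises f_tr but lowers f_RC; layers that let the two be chosen separately relax that trade-off, which is presumably how the added charge and transit layers help.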

6.

A Novel Receiver Architecture for DBF Antenna Array

ZHENG Sheng-hua, XU Da-zhuan, JIN Xue-ming. Journal of Electronic Science and Technology, 2007, 5(1): 33-37

Advances in high-speed analog-to-digital converters (ADCs) and digital signal processors (DSPs) have made smart antennas with digital beamforming (DBF) a reality. In a conventional M-element array antenna system, each element has its own receiving channel and ADCs. This paper proposes a novel smart antenna receiver with digital beamforming, whose essential idea is to build the DBF receiver on bandpass sampling of multiple distinct intermediate frequency (IF) signals. Using heterodyne radio frequency (RF) circuitry and a multiple-bandpass-sampling digital receiver, the proposed system reduces the receiver hardware from M IF channels and 2M ADCs to one IF channel and one ADC. In this scheme, the ADC sampling rate is much higher than M times the signal bandwidth, and the local oscillator produces a different local frequency for each RF channel. The receiver architecture is presented in detail, together with simulations of bandpass sampling of multiple signals and digital down-conversion to baseband. Principle analysis and simulation results indicate the effectiveness of the proposed receiver.
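The frequency planning that multiple-IF bandpass sampling depends on can be checked with a short script. This is a sketch under stated assumptions: real sampling, the same bandwidth for every channel, and hypothetical IF values chosen only for illustration. It checks that every IF band folds into its own non-overlapping slot in [0, fs/2], which also implies fs ≥ 2MB for M channels of bandwidth B.

```python
def alias_freq(f_hz: int, fs_hz: int) -> int:
    """Frequency in [0, fs/2] where a real tone at f_hz lands after sampling at fs_hz."""
    k = (2 * f_hz + fs_hz) // (2 * fs_hz)  # nearest integer multiple of fs
    return abs(f_hz - k * fs_hz)


def plan_is_valid(ifs_hz, bw_hz: int, fs_hz: int) -> bool:
    """True if every IF band of width bw_hz aliases into its own slot inside [0, fs/2]."""
    bands = []
    for f in ifs_hz:
        fa = alias_freq(f, fs_hz)
        lo, hi = fa - bw_hz / 2, fa + bw_hz / 2
        if lo < 0 or hi > fs_hz / 2:
            return False  # band straddles 0 or fs/2 and would fold onto itself
        bands.append((lo, hi))
    bands.sort()
    # touching edges are allowed here (no guard band)
    return all(prev[1] <= cur[0] for prev, cur in zip(bands, bands[1:]))
```

With fs = 100 MHz, hypothetical IFs of 110, 125 and 140 MHz fold to 10, 25 and 40 MHz, so three 5 MHz channels share one ADC; moving the second IF to 190 MHz would collide with the first.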

7.

Research on grouping strategy of SIP-based streaming media P2P live broadcast network

Hui Liu, Yongfeng Huang, Xing Li. Journal of Electronics (China), 2008, 25(3): 364-371

The rapid development of the Internet has led to an explosion of information sharing, and how to supervise this sharing is a major research topic on today's Internet. To address the difficulty of managing and controlling current peer-to-peer (P2P) networks, this paper presents a three-layer, Session Initiation Protocol (SIP)-based P2P network architecture, with SIP middleware in the middle layer. SIP signaling connections are used to control P2P transmission at the media level, and SIP registration and authentication allow the management layer to manage the whole P2P network. On this architecture, the paper investigates grouping strategies for a P2P live broadcast application. Building on SIP registration, it examines several grouping strategies, sets up models that manage users by grouping them, and proposes and implements a weight-based K-means IP address grouping algorithm. Experiments show that the proposed grouping strategy solves the problem of group sharing of network resources and achieves efficient sharing and reasonable distribution of those resources.
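A weight-based K-means grouping of peers by IP address might look like the sketch below. The abstract does not give the algorithm's details, so this is an assumption-laden illustration: addresses are treated as 32-bit integers (numeric closeness as a rough proxy for network locality), the per-peer weights are hypothetical (for example, upload capacity), and the initialization is a simple deterministic spread.

```python
import ipaddress
from typing import List, Sequence, Tuple


def ip_to_int(ip: str) -> int:
    return int(ipaddress.IPv4Address(ip))


def weighted_kmeans_1d(points: Sequence[float], weights: Sequence[float], k: int,
                       iters: int = 100) -> Tuple[List[float], List[int]]:
    """Weighted K-means on scalar points; returns (centroids, label per point)."""
    xs = sorted(points)
    # Deterministic init: spread the k initial centroids across the sorted points.
    cents = [float(xs[(i * (len(xs) - 1)) // max(k - 1, 1)]) for i in range(k)]

    def nearest(x: float) -> int:
        return min(range(k), key=lambda j: abs(x - cents[j]))

    for _ in range(iters):
        sums = [0.0] * k
        totals = [0.0] * k
        for x, w in zip(points, weights):
            j = nearest(x)
            sums[j] += w * x   # heavier peers pull the centroid toward themselves
            totals[j] += w
        new = [sums[j] / totals[j] if totals[j] else cents[j] for j in range(k)]
        if new == cents:
            break
        cents = new
    return cents, [nearest(x) for x in points]


def group_peers(ips: Sequence[str], weights: Sequence[float], k: int) -> List[int]:
    """Group peers by numeric IPv4 address; `weights` is a hypothetical per-peer weight."""
    _, labels = weighted_kmeans_1d([ip_to_int(ip) for ip in ips], weights, k)
    return labels
```

For instance, peers in 10.0.0.0/24 and 192.168.1.0/24 end up in two separate groups with `k=2`.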

8.

A Novel Architecture of Special Arithmetic Function Unit for Area-Efficient Programmable Vertex Shader

CHANG Yisong, WEI Jizeng, ZHAO Guoyu, GUO Wei, SUN Jizhou. Chinese Journal of Electronics, 2013, (3): 483-488

This paper presents a novel architecture for a high-precision, floating-point special arithmetic function unit (SFU) for elementary transcendental functions, providing both area efficiency and high performance for a programmable vertex shader. Architecturally, the quadratic approximation of the special functions is evaluated by sharing the SIMD vector unit of the shader, which minimizes processing latency and reduces the area of the SFU. An optimized minimax approach is also proposed to obtain finite-length, normalized quadratic coefficients with high precision. Experimental results show that the proposed SFU significantly reduces area, and that a vertex shader with a transport-triggered architecture (TTA) adopting it achieves an average 15.0% improvement in performance/area ratio across various shading benchmarks.
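The table-driven quadratic evaluation an SFU of this kind performs can be sketched for 2^x. This is an illustration only: the segment count is hypothetical, and the coefficients come from interpolation at Chebyshev nodes rather than the paper's optimized minimax method. Horner's form leaves two multiply-adds per evaluation, the kind of operation that could be mapped onto a shared SIMD unit.

```python
import math

SEGMENTS = 64            # hypothetical table size (64 entries of 3 coefficients)
H = 1.0 / SEGMENTS


def _fit_segment(x0: float):
    """Quadratic for 2^x on [x0, x0 + H] in the local variable t = x - x0.

    Interpolation at 3 Chebyshev nodes is a simple stand-in for a minimax fit.
    """
    ts = [H / 2 + (H / 2) * math.cos((2 * j + 1) * math.pi / 6) for j in range(3)]
    fs = [2.0 ** (x0 + t) for t in ts]
    d1 = (fs[1] - fs[0]) / (ts[1] - ts[0])
    d12 = (fs[2] - fs[1]) / (ts[2] - ts[1])
    d2 = (d12 - d1) / (ts[2] - ts[0])
    # Expand Newton form f0 + d1(t-t0) + d2(t-t0)(t-t1) into c0 + c1*t + c2*t^2.
    c0 = fs[0] - d1 * ts[0] + d2 * ts[0] * ts[1]
    c1 = d1 - d2 * (ts[0] + ts[1])
    return c0, c1, d2


TABLE = [_fit_segment(i * H) for i in range(SEGMENTS)]


def exp2_approx(x: float) -> float:
    n = math.floor(x)
    frac = x - n                                   # in [0, 1)
    seg = min(int(frac * SEGMENTS), SEGMENTS - 1)  # high fraction bits index the table
    t = frac - seg * H
    c0, c1, c2 = TABLE[seg]
    mant = (c2 * t + c1) * t + c0                  # Horner form: two multiply-adds
    return math.ldexp(mant, n)                     # integer part goes to the exponent
```

With 64 segments the relative error stays far below single-precision rounding for this function; a hardware table would also fix the coefficient word lengths, which is the part the paper's minimax method optimizes.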

9.

A Dormant Multi-Controller Model for Software Defined Networking

FU Yonghong, BI Jun, WU Jianping, CHEN Ze, WANG Ke, LUO Min. China Communications, 2014, (3): 45-55

To improve the scalability and reliability of software-defined networking (SDN), many studies use multiple controllers to form a logically centralized control plane that provides load balancing and failover. In this paper, we develop a flexible dormant multi-controller model based on the centralized multi-controller architecture. The model allows some controllers to enter a dormant state under light traffic to save system cost. Queueing analysis yields various performance measures of the system. We also analyze real traffic from the China Education Network, use the results as parameters for computer simulation, and verify the effects of the parameters on system characteristics. Finally, a total expected cost function is established, and a genetic algorithm is used to find the parameter values that minimize system cost, supporting deployment decisions.
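The cost trade-off behind putting controllers to sleep can be illustrated with a deliberately simplified model. The sketch assumes an M/M/c queue for the active controllers and a hypothetical linear cost function, and it replaces the paper's genetic algorithm with exhaustive search over the number of active controllers, which is enough when that number is small; the actual dormant multi-controller model is not reproduced.

```python
import math


def erlang_c(c: int, a: float) -> float:
    """Probability that a request waits in an M/M/c queue with offered load a = lam/mu < c."""
    if a >= c:
        raise ValueError("unstable: need a < c")
    b = 1.0                        # Erlang B with 0 servers
    for k in range(1, c + 1):
        b = a * b / (k + a * b)    # Erlang B recursion
    return c * b / (c - a * (1 - b))


def mean_response_time(c: int, lam: float, mu: float) -> float:
    """Mean time in system (queueing wait + service) for M/M/c."""
    return erlang_c(c, lam / mu) / (c * mu - lam) + 1.0 / mu


def best_active_controllers(lam: float, mu: float, c_max: int,
                            cost_active: float, cost_delay: float) -> int:
    """Awake-controller count minimizing a hypothetical cost per unit time:
    cost_active * c + cost_delay * lam * T(c). Raises ValueError if no c <= c_max is stable."""
    c_min = math.floor(lam / mu) + 1  # smallest stable c
    return min(range(c_min, c_max + 1),
               key=lambda c: cost_active * c
               + cost_delay * lam * mean_response_time(c, lam, mu))
```

Under light load the optimum leaves most controllers dormant; as the arrival rate rises, the delay term dominates and more controllers are woken.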

10.

FPGA Implementation of FFT Algorithm for OFDM Based IEEE 802.16d (Fixed WiMAX) Communications

K. Harikrishna, T. Rama Rao, Vladimir A. Labay. Journal of Electronic Science and Technology, 2010, 8(3): 193-199

The IEEE 802.16d communication standard uses orthogonal frequency division multiplexing (OFDM). In OFDM systems, fast Fourier transform (FFT) and inverse FFT pairs are used to modulate and demodulate the data constellation on the subcarriers. This paper presents a high-level implementation of a high-performance FFT for an OFDM modulator and demodulator. The design is coded in Verilog and targeted to Xilinx Spartan-3 field-programmable gate arrays. A radix-2² algorithm is proposed and used for the OFDM communication system, and the FFT is implemented and applied to the fixed WiMAX (IEEE 802.16d) communication standard. The results are tabulated and the hardware parameters are compared: the proposed architecture uses the fewest multipliers and the smallest memory, and the second-fewest adders.
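A software model helps show what radix-2² buys. The sketch below is a recursive decimation-in-time FFT for power-of-two lengths (256, the FFT size of fixed WiMAX OFDM, is one example); each radix-4 butterfly is written as two radix-2 stages whose inner rotation is by -j, which is just a swap and a sign change. It is a functional model only and says nothing about the Verilog pipeline, memory sizing, or the authors' exact dataflow.

```python
import cmath
from typing import List, Sequence


def mul_minus_j(z: complex) -> complex:
    # -j * (x + jy) = y - jx: a swap and a sign change, no real multiplier needed.
    return complex(z.imag, -z.real)


def fft_r22(x: Sequence[complex]) -> List[complex]:
    """DIT FFT for power-of-two lengths, splitting by 4 whenever possible.

    Each radix-4 butterfly is built from two radix-2 stages whose only inner rotation
    is by -j, so nontrivial twiddle multiplications occur once per pair of stages.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]
    f = [fft_r22(x[r::4]) for r in range(4)]
    q = n // 4
    out = [0j] * n
    for k in range(q):
        w = cmath.exp(-2j * cmath.pi * k / n)  # the only nontrivial twiddles
        a, b, c, d = f[0][k], w * f[1][k], w * w * f[2][k], w ** 3 * f[3][k]
        t0, t1 = a + c, a - c                  # first radix-2 stage
        t2, t3 = b + d, b - d
        out[k] = t0 + t2                       # second radix-2 stage
        out[k + 2 * q] = t0 - t2
        out[k + q] = t1 + mul_minus_j(t3)
        out[k + 3 * q] = t1 - mul_minus_j(t3)
    return out
```

The OFDM demodulator direction applies this to each received 256-sample symbol; the modulator uses the inverse transform, which can be obtained from the same routine by conjugating input and output and dividing by n.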

11.

Error Detecting Dual Basis Bit Parallel Systolic Multiplication Architecture over GF(2^m)


Ashutosh Kumar Singh, Asish Bera, Hafizur Rahaman, Jimson Mathew, Dhiraj K. Pradhan. Journal of Electronic Science and Technology, 2009, 7(4): 336-342

An error-tolerant, hardware-efficient very large scale integration (VLSI) architecture for bit-parallel systolic multiplication over the dual basis, which can be pipelined, is presented. Because the architecture is regular and modular and has unidirectional data flow, it is well suited to VLSI implementation. Both its longest delay path and its area are smaller than those of previously reported bit-parallel systolic multiplication architectures. The architecture is implemented in Austria Micro Systems' 0.35 μm CMOS (complementary metal oxide semiconductor) technology, and it can operate over both the dual basis and the polynomial basis.
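The arithmetic that dual-basis multipliers exploit can be checked in software. In the sketch below, `a` is given in the polynomial basis and `b` in the dual basis defined by the field trace (coordinates b_i = Tr(α^i b), i.e. the dual basis with scaling element 1). Each output coordinate is then an independent inner product of `a` with a window of an LFSR-extended copy of `b`, so all m outputs can be formed in parallel. The field GF(2^8) and its polynomial are illustrative choices, and the paper's error-detection logic is not modeled.

```python
M = 8
POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, an irreducible polynomial (illustrative choice)
F = [(POLY >> j) & 1 for j in range(M)]  # coefficients f_0 .. f_{M-1}


def gf_mul(a: int, b: int) -> int:
    """Reference polynomial-basis multiplication in GF(2^M)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r


def trace(z: int) -> int:
    """Tr(z) = z + z^2 + z^4 + ... + z^(2^(M-1)); always 0 or 1."""
    t = 0
    for _ in range(M):
        t ^= z
        z = gf_mul(z, z)
    return t


def to_dual(z: int) -> list:
    """Dual-basis coordinates z_i = Tr(alpha^i * z), with alpha = x (the polynomial 0b10)."""
    coords = []
    for _ in range(M):
        coords.append(trace(z))
        z = gf_mul(z, 2)
    return coords


def dual_basis_mul(a: int, b_dual: list) -> list:
    """a in polynomial basis, b in dual basis -> dual-basis coordinates of a*b."""
    ext = list(b_dual)
    for k in range(M - 1):  # b_{k+M} = sum_j f_j b_{k+j} (one LFSR step)
        ext.append(sum(F[j] & ext[k + j] for j in range(M)) & 1)
    # c_i = XOR over j of (a_j AND b_{i+j}); each output bit is an independent inner product.
    return [sum(((a >> j) & 1) & ext[i + j] for j in range(M)) & 1 for i in range(M)]
```

The recurrence follows from α^M = Σ f_j α^j, and c_i = Tr(α^i ab) = Σ_j a_j Tr(α^{i+j} b).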

12.

A digit-serial multiplier for finite field GF(2^m)

Chang Hoon Kim, Chun Pyo Hong, Soonhak Kwon. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2005, 13(4): 476-483

In this paper, an efficient digit-serial systolic array is proposed for multiplication in the finite field GF(2^m) using the standard basis representation. Starting from the least-significant-bit-first multiplication algorithm, we obtain a new dependence graph and design an efficient digit-serial systolic multiplier. If input data arrive continuously, the proposed array produces a multiplication result every ⌈m/L⌉ clock cycles, where L is the selected digit size. Analysis shows that the computational delay of the proposed architecture is significantly lower than that of the previously proposed digit-serial systolic multiplier. Furthermore, because the new architecture is regular and modular and has unidirectional data flow, it is well suited to VLSI implementation.
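A functional model of LSB-first digit-serial multiplication in GF(2^m) is sketched below, assuming the field polynomial x^163 + x^7 + x^6 + x^3 + 1 (the NIST binary field of degree 163, chosen only for illustration). Each outer iteration stands for one clock cycle that consumes L bits of `b`, so the cycle count comes out as ⌈m/L⌉; the systolic timing and dependence-graph mapping of the paper are not modeled.

```python
M = 163
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1  # x^163 + x^7 + x^6 + x^3 + 1


def mul_x_mod(a: int) -> int:
    """a * x mod F."""
    a <<= 1
    return a ^ F if a >> M else a


def digit_serial_lsb_first(a: int, b: int, L: int):
    """LSB-first digit-serial multiply in GF(2^M); returns (product, cycles).

    Each outer iteration models one clock cycle consuming an L-bit digit of b;
    the inner loop is what hardware unrolls into L cascaded cells.
    """
    c, cycles = 0, 0
    for start in range(0, M, L):
        for i in range(start, min(start + L, M)):
            if (b >> i) & 1:
                c ^= a          # accumulate a * x^i
            a = mul_x_mod(a)    # advance to a * x^(i+1)
        cycles += 1
    return c, cycles


def reference_mul(a: int, b: int) -> int:
    """Carry-less schoolbook product, then reduction by F (for checking)."""
    p = 0
    for i in range(M):
        if (b >> i) & 1:
            p ^= a << i
    for deg in range(2 * M - 2, M - 1, -1):
        if (p >> deg) & 1:
            p ^= F << (deg - M)
    return p
```

Raising L trades more cells per cycle for fewer cycles: L = 1 is fully bit-serial (163 cycles), while L = 32 needs 6.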

13.

An area-efficient bit-serial integer and GF(2ⁿ) multiplier

Manfred Schimmler, Hans-Werner Lang. Microelectronic Engineering, 2007, 84(2): 253-259

This paper presents the design of a new multiplier architecture for ordinary integer multiplication of positive and negative numbers as well as for multiplication in finite fields of order 2ⁿ. It was developed to increase the performance of cryptographic and signal processing algorithms on implementations of the Instruction Systolic Array (ISA) parallel computer model [M. Kunde, H.W. Lang, M. Schimmler, H. Schmeck, H. Schröder, Parallel Computing 7 (1988) 25-39; H.W. Lang, Integration, the VLSI Journal 4 (1986) 65-74]. The multiplier operates least significant bit (LSB) first for integer multiplication and most significant bit (MSB) first for finite field multiplication. It is a modular bit-serial design that, on the one hand, can be implemented efficiently in hardware and, on the other hand, can handle operands of arbitrary length.
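The two operating modes described above can be illustrated with a bit-serial model. This is a sketch with assumptions of our own: unsigned integers only (no two's-complement handling), an 8-bit field with the AES polynomial 0x11B as an example modulus, and no modeling of the ISA cells or of how the real design shares its hardware between modes.

```python
def bitserial_multiply(a: int, b: int, n: int, gf: bool, poly: int = 0x11B) -> int:
    """Multiply two n-bit operands one bit of b at a time.

    gf=False: unsigned integer product, LSB first (shift-and-add).
    gf=True:  product in GF(2^n) modulo `poly`, MSB first (Horner): shift the
              accumulator, reduce if degree n appears, conditionally XOR in a.
    """
    acc = 0
    if not gf:
        for i in range(n):              # LSB first
            if (b >> i) & 1:
                acc += a << i           # integer addition, carries propagate
        return acc
    for i in reversed(range(n)):        # MSB first
        acc <<= 1
        if acc >> n:
            acc ^= poly                 # reduce as soon as degree n appears
        if (b >> i) & 1:
            acc ^= a                    # field addition is XOR, no carries
    return acc
```

With the AES polynomial, `bitserial_multiply(0x57, 0x83, 8, gf=True)` returns 0xC1, matching the worked example in FIPS-197. Scanning MSB first in the field mode lets the reduction happen one bit at a time instead of after a double-length product.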

14.

Two systolic architectures for modular multiplication

Wei-Chang Tsai, C. B. Shung, Sheng-Jyh Wang. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2000, 8(1): 103-107

The authors present two systolic architectures that speed up modular multiplication in RSA cryptosystems. In the double-layer architecture, the main operation of Montgomery's algorithm is partitioned into two parallel operations by precomputing the quotient bit. In the non-interlaced architecture, the one-clock-cycle gap between iterations is eliminated by pairing off the double-layer architecture. We compare our architectures with previously proposed Montgomery-based systolic architectures on the basis of both modular multiplication and modular exponentiation. The comparisons indicate that our architectures offer the highest speed, lower hardware complexity, and lower power consumption.
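The bit-level Montgomery iteration that such systolic arrays pipeline, together with a modular exponentiation built on top of it, can be modeled as follows. It is a plain radix-2 sketch (one quotient bit per iteration, computed in place rather than precomputed), so it illustrates the recurrence, not the double-layer or non-interlaced organizations.

```python
def mont_mul_r2(a: int, b: int, n: int, k: int) -> int:
    """Radix-2 Montgomery product a*b*2^-k mod n (n odd, 0 <= a, b < n < 2^k)."""
    s = 0
    for i in range(k):
        if (a >> i) & 1:
            s += b
        if s & 1:          # quotient bit q_i = s mod 2
            s += n         # makes s even without changing it mod n
        s >>= 1            # exact halving; s stays below 2n
    return s - n if s >= n else s


def mont_exp(base: int, e: int, n: int) -> int:
    """Left-to-right square-and-multiply in the Montgomery domain (R = 2^k)."""
    k = n.bit_length()
    r2 = pow(2, 2 * k, n)                 # R^2 mod n, a one-time precomputation
    xb = mont_mul_r2(base % n, r2, n, k)  # base * R mod n
    acc = mont_mul_r2(1, r2, n, k)        # 1 * R mod n
    for bit in bin(e)[2:]:
        acc = mont_mul_r2(acc, acc, n, k)
        if bit == "1":
            acc = mont_mul_r2(acc, xb, n, k)
    return mont_mul_r2(acc, 1, n, k)      # leave the Montgomery domain
```

Every multiplication inside the loop is a Montgomery product, so no division by n is needed during exponentiation; only the conversions in and out use the precomputed R² mod n.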

15.

An efficient multiplier over GF(2^k) and its VLSI implementation

Zhou Haohua, Shen Bo, Zhang Qianling. Chinese Journal of Semiconductors, 2001, 22(8): 1063-1068

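Based on an analysis of the basic principles of fully serial and fully parallel GF(2^k) multiplication, a multiplier applicable to any field GF(2^k), UHGM (Unified Hybrid Galois Field Multiplier), is proposed. It provides an efficient implementation of GF(2^k) multiplication for prime k, a case of particular current importance. The multiplier has a regular, highly modular structure that is especially suitable for VLSI implementation, and it offers coarse-grained scalability in area and speed, making it easy to trade implementation area against speed over a wide range. Finally, ASIC synthesis results for a multiplier over GF(2^163) are given.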
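One common way to obtain an area/speed knob of this kind is digit-serial multiplication with the most significant digit first, where the digit size d sets how much of the product is formed in parallel per cycle. The sketch below assumes GF(2^163) with the NIST polynomial x^163 + x^7 + x^6 + x^3 + 1 for illustration; it is not claimed to be the UHGM structure.

```python
K = 163
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1  # illustrative field polynomial


def clmul(a: int, b: int) -> int:
    """Carry-less (polynomial) product over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def gf_reduce(p: int) -> int:
    """Reduce a polynomial modulo F by cancelling the leading term repeatedly."""
    while p.bit_length() > K:
        p ^= F << (p.bit_length() - 1 - K)
    return p


def hybrid_multiply(a: int, b: int, d: int):
    """Process b in d-bit digits, most significant digit first; returns (product, cycles).

    Per cycle: a d x K partial product (parallel part) plus a shift by x^d and a
    reduction (serial part). Larger d means more area and fewer cycles.
    """
    ndig = -(-K // d)  # ceil(K / d)
    c = 0
    for i in reversed(range(ndig)):
        digit = (b >> (i * d)) & ((1 << d) - 1)
        c = gf_reduce((c << d) ^ clmul(a, digit))  # Horner step over digits
    return c, ndig
```

At d = 1 this degenerates to a fully serial multiplier (163 cycles) and at d = 163 to a fully parallel one (one cycle); intermediate digit sizes give the points in between.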

16.

VLSI architectures for vector quantization

M. Yan, J. V. McCanny, Y. Hu. The Journal of VLSI Signal Processing, 1995, 10(1): 5-23

Real-time implementation of vector quantization (VQ), an efficient signal compression technique, is of great importance to many digital signal coding applications. In this paper, we describe a new family of bit-level systolic VLSI architectures that offer an attractive solution to this problem. These architectures are based on a bit-serial, word-parallel approach and achieve high performance and efficiency for VQ applications over a wide range of bandwidths. Compared with their bit-parallel counterparts, these bit-serial circuits are better alternatives for VQ implementation in terms of performance and cost.
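A software model of full-search VQ encoding, with the closest-codeword search done bit-serially and word-parallel (all candidates examined together, one bit at a time, most significant bit first), is sketched below. The codebook, distance measure (squared Euclidean) and data width are illustrative assumptions; the distances themselves are computed a word at a time here for brevity.

```python
from typing import Sequence


def sq_dist(x: Sequence[int], y: Sequence[int]) -> int:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def bitserial_argmin(values: Sequence[int], width: int) -> int:
    """Index of the smallest non-negative value, examining one bit per step, MSB first.

    All candidates are inspected in parallel at each bit; a candidate drops out when
    it shows a 1 while some surviving candidate shows a 0. Ties go to the lowest index.
    """
    alive = list(range(len(values)))
    for bit in reversed(range(width)):
        zeros = [i for i in alive if not (values[i] >> bit) & 1]
        if zeros:
            alive = zeros
    return alive[0]


def vq_encode(vec: Sequence[int], codebook: Sequence[Sequence[int]], width: int = 32) -> int:
    """Full-search VQ: index of the nearest codeword under squared Euclidean distance."""
    dists = [sq_dist(vec, cw) for cw in codebook]
    if max(dists) >= (1 << width):
        raise ValueError("distance wider than the bit-serial datapath")
    return bitserial_argmin(dists, width)
```

The search takes `width` steps regardless of codebook size, which is why the word-parallel, bit-serial organization scales well with the number of codewords.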

17.

Research and design of a high-performance scalable public-key cryptographic coprocessor


Li Ming, Wu Dan, Dai Kui, Zou Xuecheng. Acta Electronica Sinica, 2011, 39(3): 665-670

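This paper proposes an efficient point-multiplication scheduling strategy and an improved dual-field high-radix Montgomery modular multiplication algorithm. On this basis, a new high-performance, scalable public-key cryptographic coprocessor architecture is designed and implemented in a 0.18 μm 1P6M standard CMOS process to accelerate public-key algorithms such as RSA and ECC. By extending the on-chip high-speed memory and using the radix as the processing word length, the coprocessor achieves good scalability and high flexibility: it supports modular exponentiation of arbitrary large numbers up to 2048 bits and scalar multiplication on arbitrary elliptic curves over both fields up to 576 bits. Chip test results show good acceleration: a 1024-bit modular exponentiation takes only 197 μs, a 192-bit scalar multiplication over GF(p) only 225 μs, and a 163-bit scalar multiplication over GF(2^m) only 200.7 μs.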
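The dual-field idea can be illustrated with a radix-2 Montgomery loop in which the same control flow serves GF(p) and GF(2^m), switching only integer addition for XOR. This is a simplified sketch: the coprocessor uses a high-radix variant and a point-multiplication scheduler, neither of which is reproduced, and the GF(2^m) reference multiplier is included only to check results.

```python
def dual_field_montgomery(a: int, b: int, modulus: int, k: int, binary_field: bool) -> int:
    """Radix-2 Montgomery product: a*b*2^-k mod p, or a*b*x^-k mod f(x) in GF(2^k).

    With binary_field=True every addition becomes XOR (carries suppressed).
    Assumes an odd p with a, b < p < 2^k, or f of degree k with f(0) = 1 and
    deg a, deg b < k.
    """
    add = (lambda u, v: u ^ v) if binary_field else (lambda u, v: u + v)
    s = 0
    for i in range(k):
        if (a >> i) & 1:
            s = add(s, b)
        if s & 1:                  # quotient bit: make s divisible by 2 (or by x)
            s = add(s, modulus)
        s >>= 1
    if not binary_field and s >= modulus:
        s -= modulus               # GF(p) needs one final subtraction; GF(2^k) does not
    return s


def gf2_mulmod(u: int, v: int, f: int) -> int:
    """Reference GF(2^k) product u*v mod f, used only to check the sketch."""
    k = f.bit_length() - 1
    r = 0
    while v:
        if v & 1:
            r ^= u
        v >>= 1
        u <<= 1
        if (u >> k) & 1:
            u ^= f
    return r
```

In the binary field the result never exceeds degree k-1, so the conditional final subtraction exists only on the GF(p) path, one of the small differences a dual-field datapath has to switch on.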

18.

Design and integration of a systolic correlator

Catherine Dezan, Eric Gautrin, Patrice Quinton. Annales des Télécommunications, 1991, 46(1-2): 69-77

Many signal processing algorithms can be implemented on parallel architectures whose regularity simplifies VLSI integration. In this paper, we present a systolic correlator and describe its VLSI implementation using full-custom design, standard cells, and configurable logic arrays. We show how the architecture of this chip is formally derived using the Alpha language.
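A cycle-level model of a linear correlator array is sketched below. Each cell holds one weight, the incoming sample is fed to every cell, and partial sums move one register per cycle toward the output. Because the input is broadcast, this is a semi-systolic (transposed-form) arrangement chosen for brevity; it is not the array the paper derives with Alpha.

```python
from typing import List, Sequence


def direct_correlation(x: Sequence[int], w: Sequence[int]) -> List[int]:
    """c[n] = sum_j w[j] * x[n + j] over all fully overlapping positions."""
    k = len(w)
    return [sum(w[j] * x[n + j] for j in range(k)) for n in range(len(x) - k + 1)]


def array_correlation(x: Sequence[int], w: Sequence[int]) -> List[int]:
    """Cycle-by-cycle model of a K-cell correlator array.

    Cell i holds h[i] = w[K-1-i] (correlation = convolution with reversed weights);
    each cycle every cell multiplies the new sample and adds the partial sum
    arriving from its neighbour.
    """
    k = len(w)
    h = list(reversed(w))
    regs = [0] * k                 # regs[i] feeds cell i; regs[k-1] stays 0
    outputs = []
    for sample in x:
        outputs.append(h[0] * sample + regs[0])
        regs = [h[i + 1] * sample + regs[i + 1] for i in range(k - 1)] + [0]
    return outputs[k - 1:]         # the first k-1 outputs are start-up transients
```

After a latency of K-1 cycles the array delivers one correlation value per cycle, independent of K, which is the throughput property that makes such arrays attractive for VLSI.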

19.

High performance VLSI architecture for Wave Digital Filtering

Rajinder Jit Singh, J. V. McCanny. The Journal of VLSI Signal Processing, 1992, 4(4): 269-278

The application of fine-grain pipelining techniques to the design of high-performance wave digital filters (WDFs) is described. It is shown that the sampling rate of bit-parallel circuits can be increased significantly by using most significant bit (msb) first arithmetic. A novel VLSI architecture for implementing two-port adaptor circuits that embodies these ideas is described. The circuit is highly regular, uses msb-first arithmetic, and is implemented with simple carry-save adders.
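Two ingredients mentioned above, carry-save addition and the two-port adaptor, can be sketched in integer arithmetic. The adaptor equations are one common form (b1 = a2 + γ(a2 − a1), b2 = a1 + γ(a2 − a1)), the fixed-point coefficient format is a hypothetical choice, and msb-first arithmetic and the pipelining are not modeled.

```python
def csa(a: int, b: int, c: int):
    """3:2 carry-save adder: returns (sum, carry) with a + b + c == sum + carry.

    Each bit position is handled independently, so there is no carry ripple.
    """
    s = a ^ b ^ c
    cy = ((a & b) | (a & c) | (b & c)) << 1
    return s, cy


def csa_multiply(x: int, g: int) -> int:
    """x * g for a non-negative integer coefficient g: partial products are accumulated
    in carry-save form and resolved by one final carry-propagate addition."""
    s, c = 0, 0
    i = 0
    while g >> i:
        if (g >> i) & 1:
            s, c = csa(s, c, x << i)
        i += 1
    return s + c


def two_port_adaptor(a1: int, a2: int, g: int, frac_bits: int):
    """Fixed-point two-port adaptor with gamma = g / 2**frac_bits:
    b1 = a2 + gamma*(a2 - a1), b2 = a1 + gamma*(a2 - a1).
    The right shift rounds toward minus infinity (arithmetic shift)."""
    p = csa_multiply(a2 - a1, g) >> frac_bits  # one multiplication per adaptor
    return a2 + p, a1 + p
```

For example, with γ = 64/256 = 0.25, inputs a1 = 100 and a2 = 300 give (350, 150). Only the final `s + c` needs a carry-propagate adder, which is what keeps carry-save datapaths short enough for fine-grain pipelining.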

20.

An efficient tree architecture for modulo 2ⁿ+1 multiplication

Zhongde Wang, G. A. Jullien, W. C. Miller. The Journal of VLSI Signal Processing, 1996, 14(3): 241-248

Modulo 2ⁿ+1 multiplication plays an important role in the Fermat number transform and in residue number systems, and the diminished-1 representation of numbers has been found most suitable for representing the elements of these rings. Existing algorithms for modulo (2ⁿ+1) multiplication use either recursive modulo (2ⁿ+1) addition or regular binary multiplication integrated with the modulo reduction operation. Although the latter approach is most often adopted for large n, it requires conversions between the diminished-1 and binary representations. In this paper we propose a parallel, fine-grained architecture for modulo (2ⁿ+1) multiplication, based on a Wallace tree, that requires no conversions; the Wallace tree considerably improves the speed of the multiplier. The new architecture has an extremely modular structure with the associated VLSI implementation advantages. Its critical path delay and hardware requirements are similar to those of a corresponding n×n-bit binary multiplier.
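For contrast with the proposed conversion-free design, the sketch below shows the binary approach the abstract mentions: an ordinary product followed by reduction using 2ⁿ ≡ −1 (mod 2ⁿ+1), so the high half is subtracted from the low half. It also includes one basic diminished-1 identity (doubling as a one-bit rotation with the wrapped-around bit inverted). Operand ranges and function names are our own; the Wallace-tree architecture itself is not modeled.

```python
def mulmod_2n1(a: int, b: int, n: int) -> int:
    """(a*b) mod (2^n + 1) for 0 <= a, b <= 2^n.

    Write the product as hi*2^n + lo; since 2^n ≡ -1, it is congruent to lo - hi.
    """
    p = a * b
    lo = p & ((1 << n) - 1)
    hi = p >> n
    r = lo - hi
    return r + (1 << n) + 1 if r < 0 else r


def dim1_double(d: int, n: int) -> int:
    """Diminished-1 doubling: if d encodes x = d + 1 (x != 0), return the code of 2x mod (2^n + 1).

    Rotate left by one bit and invert the bit that wraps around into the LSB.
    """
    msb = (d >> (n - 1)) & 1
    return ((d << 1) & ((1 << n) - 1)) | (msb ^ 1)
```

Both functions avoid any general division: the first needs one subtraction and a conditional correction, and the second is pure wiring plus an inverter, which is why diminished-1 partial products are cheap to generate in hardware.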