
1.

Fast multiplication in VLSI using wave pipelining techniques

Fabian Klass Michael J. Flynn Ad J. Van De Goor 《The Journal of VLSI Signal Processing》1994,7(3):233-248

Wave pipelining is a design methodology that can increase the clock frequency of digital systems. Also known as maximum-rate pipelining, it has long been considered a technique for approaching the physical speed limit of a digital circuit. Unlike conventional pipelining, wave pipelining does not require internal clocked elements to increase throughput. The synchronization of internal computations is achieved by balancing inherent RC delays of combinational logic elements, thus allowing circuits to be pipelined at a very fine-grain level. In this article, we describe the design of a 16×16 wave-pipelined multiplier using a 1.0 μm CMOS process. The multiplier is designed using a conventional static CMOS technology. Simulation results show a speedup of about 7× over a nonpipelined implementation.

2.

Pipelined RLS adaptive filtering using scaled tangent rotations (STAR) Total citations: 1, self-citations: 0, citations by others: 1

Raghunath K.J. Parhi K.K. 《Signal Processing, IEEE Transactions on》1996,44(10):2591-2604

The QR decomposition-based recursive least-squares (RLS) adaptive filtering algorithm (referred to as QRD-RLS) is very popular because it has good numerical properties and can be mapped onto a systolic array. However, in this architecture, pipelining of the operations within the systolic array cells is difficult. Pipelining would be necessary to operate at high speeds or to reduce the power dissipation in a VLSI implementation. Pipelining QRD-RLS using look-ahead techniques leads to a large hardware overhead. The square-root free forms of QRD-RLS are also difficult to pipeline. In this paper, a new scaled tangent rotation (STAR) is used instead of the Givens rotations used in QRD-RLS. The STAR-based RLS algorithm (referred to as STAR-RLS) is designed such that fine-grain pipelining can be accomplished with little hardware overhead. The scaled tangent rotations are not exactly orthogonal transformations but tend to become orthogonal asymptotically. The STAR-RLS algorithm is square-root free and has less complexity and lower intercell communication than the QRD-RLS algorithm. The properties of the STAR-RLS algorithm, such as stability, numerical property, and dynamic range, are examined with and without pipelining and compared with those of QRD-RLS. Simulation results are presented to compare the performance of the STAR-RLS and QRD-RLS algorithms.

3.

Improved algorithms for computing with faulty SIMD hypercubes

Raghavendra C.S. Sridhar M.A. 《Telecommunication Systems》1998,10(1-2):149-156

Computation time for various primitive operations, such as broadcasting and global sum, can significantly increase when there are node failures in a hypercube. In this paper we develop nearly optimal algorithms for computing important basic problems on a faulty SIMD hypercube. In an SIMD hypercube, during a communication step, nodes can exchange information with their neighbors only across a specific dimension. Our parallel machine model is an n-dimensional SIMD hypercube Q_n with up to n-1 node faults. We use the concept of free dimension to develop our algorithms, where a free dimension is defined to be a dimension i such that at least one end node of any i-dimension link is nonfaulty. In an n-cube, with f < n faults, it is known that there exist n-f+1 free dimensions. Using free dimensions, we show that broadcasting and global sum can be performed in n+5 steps, thereby improving upon the previously known algorithms for these primitives. The broadcasting algorithms work independently of the location of the source node and the faulty nodes. This revised version was published online in June 2006 with corrections to the Cover Date.
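
The free-dimension definition in the abstract above can be checked mechanically. The following is a minimal Python sketch, not the authors' algorithm: it assumes nodes are labeled with n-bit integers and dimensions are numbered from 0, and the function name `free_dimensions` is hypothetical. A dimension fails to be free exactly when two faulty nodes are neighbors across it.

```python
from itertools import combinations

def free_dimensions(n, faulty):
    """Return the 0-based dimensions of an n-cube that are free.

    A dimension i is free when every i-dimension link has at least one
    nonfaulty end node, i.e. no two faulty nodes differ only in bit i.
    Node labels are assumed to be integers in range(2**n).
    """
    blocked = set()
    for u, v in combinations(set(faulty), 2):
        diff = u ^ v
        # u and v are neighbors iff their labels differ in exactly one bit.
        if diff & (diff - 1) == 0:
            blocked.add(diff.bit_length() - 1)
    return [i for i in range(n) if i not in blocked]

# Two faulty nodes that are neighbors across dimension 0 of a 3-cube.
print(free_dimensions(3, [0b000, 0b001]))
```

In this example the faulty pair blocks dimension 0 only, which leaves two free dimensions and agrees with the n-f+1 count quoted in the abstract for n = 3, f = 2.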

4.

The n-Hop Multilateration Primitive for Node Localization Problems Total citations: 1, self-citations: 0, citations by others: 1

Savvides Andreas Park Heemin Srivastava Mani B. 《Mobile Networks and Applications》2003,8(4):443-451

The recent advances in MEMS, embedded systems and wireless communication technologies are making the realization and deployment of networked wireless microsensors a tangible task. In this paper we study node localization, a component technology that would enhance the effectiveness and capabilities of this new class of networks. The n-hop multilateration primitive presented here enables ad-hoc deployed sensor nodes to accurately estimate their locations by using known beacon locations that are several hops away and distance measurements to neighboring nodes. To prevent error accumulation in the network, node locations are computed by setting up and solving a global non-linear optimization problem. The solution is presented in two computation models: a centralized model and a fully distributed approximation of the centralized model. Our simulation results show that using the fully distributed model, resource constrained sensor nodes can collectively solve a large non-linear optimization problem that none of the nodes can solve individually. This approach results in significant savings in computation and communication, which allows fine-grained localization to run on a low cost sensor node we have developed.

5.

A high-speed multiplexer-based fine-grain pipelined architecture for digital fuzzy logic controllers

Bahram Rashidi Sayed Masoud Sayedi 《International Journal of Electronics》2013,100(12):1997-2015

The design and implementation of a high-speed multiplexer-based fine-grain pipelined architecture for a general digital fuzzy logic controller is presented. All the operators have been designed at gate level. For the multiplication, a multiplexer-based modified Wallace tree multiplier has been designed, and for the division and addition, a multiplexer-based non-restoring parallel divider and a multiplexer-based Manchester adder have been used, respectively. To further increase the processing speed, a fine-grain pipelining technique has been employed. By using this technique, the critical path of the circuit is broken into finer pieces. Based on the proposed architecture, and by using Quartus II 9.1, a sample two-input, one-output digital fuzzy logic controller with eight rules has been successfully synthesised and implemented on a Stratix II field programmable gate array. Simulations were carried out using DSP Builder in the MATLAB/Simulink tool at a maximum clock rate of 301.84 MHz.

6.

High-speed complex-number multiplications based on redundant binary representation of partial products

Kyung-Wook Shin Heung-Woo Jeon 《International Journal of Electronics》2013,100(6):683-702

The complex-number multiplier is one of the key arithmetic components for the baseband signal processing of modern digital communication systems such as channel equalization, timing recovery, modulation and demodulation. This paper presents two algorithms suitable for a high-speed complex-number multiplier, which are based on redundant binary (RB) representation of partial products. The basic idea behind our approach is to convert a pair of binary partial products into an RB form so that the post-addition/subtraction, which is inevitable in the conventional methods based on binary multiplication, is eliminated. With the proposed algorithms, the complex-number multiplication is reduced to two RB multiplications, one for the real part and the other for the imaginary part. The RB multiplication is defined by an addition of RB partial products, and is performed in parallel without carry propagation from the least-significant digit to the most-significant digit. This work results not only in simplified arithmetic operations, but also in a highly parallel and simple architecture when compared with conventional methods using binary multiplications. To demonstrate the algorithms, two test chips have been implemented using a 0.8 µm CMOS technology.
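
For context, the post-addition/subtraction that the abstract above aims to eliminate appears in the conventional scheme, which can be sketched in a few lines of Python. This illustrates only the conventional baseline with four real multiplications, not the redundant binary algorithms of the paper; the function name `complex_multiply` is hypothetical.

```python
def complex_multiply(a, b, c, d):
    """Compute (a + jb)(c + jd) the conventional way.

    Four real multiplications produce ac, bd, ad and bc; one
    post-subtraction and one post-addition then combine them.
    """
    real = a * c - b * d  # post-subtraction
    imag = a * d + b * c  # post-addition
    return real, imag

print(complex_multiply(1, 2, 3, 4))  # (1 + 2j)(3 + 4j) → (-5, 10)
```

According to the abstract, the proposed approach replaces this final combining step with one RB multiplication for the real part and one for the imaginary part.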

7.

Single-step creation of localized Delaunay triangulations

Filipe Araujo Luís Rodrigues 《Wireless Networks》2009,15(7):845-858

A localized Delaunay triangulation has the following interesting properties for sensor and wireless ad hoc networks: it can be built with localized information, the communication cost imposed by control information is limited, and it supports geographical routing algorithms that offer guaranteed convergence. This paper presents two localized algorithms, fast localized Delaunay triangulation 1 (FLDT1) and fast localized Delaunay triangulation 2 (FLDT2), that build a graph called planar localized Delaunay triangulation, PLDel, known to be a good spanner of the Unit Disk Graph, UDG. Our algorithms improve on previous algorithms with similar theoretical bounds in the following aspects: unlike previous work, FLDT1 and FLDT2 build PLDel in a single communication step, maintaining a communication cost of O(n log n), which is within a constant of the optimal. Additionally, we show that FLDT1 is more robust than previous triangulation algorithms, because it does not require the strict UDG connectivity model to work. The small signaling cost of our algorithms allows us to improve routing performance, by efficiently using the PLDel graph instead of sparser graphs, like the Gabriel or the Relative Neighborhood graphs.

8.

A case for digit serial VLSI signal processors

Mary Jane Irwin Robert Michael Owens 《The Journal of VLSI Signal Processing》1990,1(4):321-334

Digit serial architectures, which have digit serial data transmission combined with digit serial computation, are uniquely suited for the design of VLSI signal processors. The speed disadvantages of digit serial input are overcome if the input is overlapped with the computation, which we refer to as digit pipelining. Digit pipelining allows us to break up long strings of combinatorial logic and, thus, to increase the clock rate of the system while still preserving much of the circuit structure. In general, for a modest increase in hardware (which in VLSI translates to a modest increase in area) digit serial architectures offer the potential of higher throughput than equivalent word parallel architectures. Several designs for various digit serial adders are presented. Then two filter examples are discussed that use the digit serial adders to achieve digit pipelining.
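
To make the idea of digit serial computation concrete, here is a minimal behavioral sketch in Python. It is not one of the adder designs from the paper: it assumes unsigned operands, least-significant-digit-first transmission, and one digit processed per clock cycle, and the name `digit_serial_add` and its parameters are hypothetical.

```python
def digit_serial_add(a, b, digit_bits=4, num_digits=4):
    """Add two unsigned integers one digit per simulated clock cycle.

    Each cycle consumes one digit of each operand (least significant
    first), emits one sum digit, and keeps the carry for the next cycle.
    Returns the assembled sum and the final carry-out.
    """
    mask = (1 << digit_bits) - 1
    carry = 0
    result = 0
    for k in range(num_digits):
        shift = k * digit_bits
        digit_a = (a >> shift) & mask
        digit_b = (b >> shift) & mask
        total = digit_a + digit_b + carry
        result |= (total & mask) << shift  # sum digit for this cycle
        carry = total >> digit_bits        # carry held for the next cycle
    return result, carry

print(digit_serial_add(0x1234, 0x0FCD))  # → (8705, 0), i.e. 0x2201
```

Only a digit-wide adder and a carry register are active in each cycle, which is the kind of short combinational path that lets the clock rate rise.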

9.

A Routing Algorithm for Wireless Ad Hoc Networks with Unidirectional Links Total citations: 6, self-citations: 0, citations by others: 6

Prakash Ravi 《Wireless Networks》2001,7(6):617-625

Most of the routing algorithms for ad hoc networks assume that all wireless links are bidirectional. In reality, some links may be unidirectional. In this paper we show that the presence of such links can jeopardize the performance of the existing distance vector routing algorithms. We also present modifications to distance vector based routing algorithms to make them work in ad hoc networks with unidirectional links. For a network of n nodes, neighbors exchange n×n matrices to propagate routing information. This results in loop-free routes.

10.

Evaluation architecturale VLSI de systèmes standard de codage d'images

Gilles Privat Marc Renaudin 《Annales des Télécommunications》1991,46(1-2):121-141

Image coding systems currently undergoing standardisation within ISO and CCITT are the final outcome of a process of incremental improvements to classical hybrid (transform-predictive) algorithms. The task of VLSI architecture synthesis for these complete systems is made somewhat awkward due to the unstructured, irregular and non-modular nature of these algorithms. An ad hoc methodology for pruning the architectural search space, directed by the goal of minimizing the overall internal memory, leads to a strongly control-flow solution, using a pipeline scheme more efficient than with the original signal-flow graph. A generic image coding processor using a parallel programmable architecture is another solution. It may be inferred that second generation image coding techniques should be designed with massive fine-grain parallelism in view, if they are to take advantage of the full potential of dedicated VLSI implementations.

11.

Macro pipelining based scheduling on high performance heterogeneous multiprocessor systems

Banerjee S. Hamada T. Chau P.M. Fellman R.D. 《Signal Processing, IEEE Transactions on》1995,43(6):1468-1484

Presents a technique for pipelining heterogeneous multiprocessor systems, macro pipelining based scheduling. The problem can be identified as a combination of optimal task/processor assignment to pipeline stages as well as a scheduling problem. The authors propose a new technique based on iterative applications of partitioning and scheduling schemes whereby the number of pipeline stages is identified and the scheduling problem is solved. The pipeline cycle is optimized in two steps. The first step finds a global coarse solution using the ratio cut partitioning technique. This is subsequently improved by the iterative architecture driven partitioning and the repartitioning and time axis relabeling techniques of the second step. The authors have considered a linear interprocessor communication cost model in scheduling. The proposed technique is applied to several examples. They find that for these examples, the proposed macro pipelining based scheduling can improve the throughput rate to several times that of the conventional homogeneous multiprocessor scheduling algorithms.

12.

Asynchronous design for programmable digital signal processors

Meng T.H.-Y. Brodersen R.W. Messerschmitt D.G. 《Signal Processing, IEEE Transactions on》1991,39(4):939-952

13.

Optimizing throughput and resource utilization using pipelining: Transformation based approach

Miodrag Potkonjak Jan Rabaey 《The Journal of VLSI Signal Processing》1994,8(2):117-130

A simple formulation of pipelining, "pipelining with N stages is equivalent to retiming where the number of delays on all inputs or all outputs, but not both, is increased by N", is used as the basis for a convenient and efficient treatment of pipelining in the design of application specific computers. Pipelining according to the objective function (throughput or resource utilization) and the latency is introduced. For two polynomial complexity pipelining classes, optimal algorithms are presented. For two other classes both proofs of NP-completeness and efficient probabilistic algorithms are presented. Both theoretical and experimental properties of pipelining are discussed and a relationship with other transformations is explored. Due to similar formulations for both software pipelining and the pipelining presented here, all results can be easily modified for use in compilers for general purpose computers. We have also developed a polynomial complexity algorithm for determining the iteration bound. This work was done while the first author was at the University of California, Berkeley.

14.

A Repeated Mapping Scheme of Task Modules with Minimum Communication Cost in Hypercube Multicomputers

Joo-Man Kim Cheol-Hoon Lee 《ETRI Journal》1998,20(4):327-345

This paper deals with the problem of one-to-one mapping of 2ⁿ task modules of a parallel program to an n-dimensional hypercube multicomputer so as to minimize the total communication cost during the execution of the task. The problem of finding an optimal mapping has been proven to be NP-complete. First we show that the mapping problem in a hypercube multicomputer can be transformed into the problem of finding a set of maximum cutsets on a given task graph using a graph modification technique. Then we propose a repeated mapping scheme, using an existing graph bipartitioning algorithm, for the effective mapping of task modules onto the processors of a hypercube multicomputer. The repeated mapping scheme is shown to be highly effective on a number of test task graphs; it increasingly outperforms the greedy and recursive mapping algorithms as the number of processors increases. Our repeated mapping scheme is shown to be very effective for regular graphs, such as hypercube-isomorphic or ‘almost’ isomorphic graphs and meshes; it finds optimal mappings on almost all the regular task graphs considered.

15.

Blind adaptive beamforming for cyclostationary signals Total citations: 7, self-citations: 0, citations by others: 7

Qiang Wu Kon Max Wong 《Signal Processing, IEEE Transactions on》1996,44(11):2757-2767

In order to increase the capacity and to suppress co-channel interference in digital communication systems such as mobile cellular and mobile satellite communication systems, the employment of array beamforming techniques has been proposed. However, conventional beamforming methods are not suitable for such cases since these methods were mainly developed for signal detection and direction-of-arrival (DOA) estimation in radar and sonar. In this paper, utilizing the cyclostationary properties of communication signals, we propose three blind cyclic adaptive beamforming (CAB) algorithms and their fast implementation schemes. Several numerical examples are included. These results demonstrate that the CAB algorithms are good candidates for spatial reuse of frequency spectrum in digital mobile communication systems of the next generation.

16.

Design methodology for subdigit pipelined digit-serial IIR filters

《Signal processing》1998,68(1):73-86

A novel architecture for high performance two's complement digit-serial IIR filters is presented. The application of the digit-serial computation to the design of IIR filters introduces delay elements in the feedback loop of the IIR filter. This offers the possibility of pipelining the feedback loop inherent in the IIR filters. To fully explore the advantages offered by the use of digit-serial computation, the digit-serial structure is based on the feed forward of the carry digit, which allows subdigit pipelining to increase the throughput rate of the IIR filters. A systematic design methodology is presented to derive a wide range of digit-serial IIR filter architectures which can be pipelined to the subdigit level. This will give designers greater flexibility in finding the best trade-off between hardware cost and throughput rate. It is shown that the application of digit-serial computations for the realisation of IIR filters, combined with the possibility of subdigit pipelining, results in an increase in the computation speed with a considerable reduction in silicon area consumption when compared to an equivalent bit-parallel IIR filter realisation.

17.

Design of a transport triggered vector processor for turbo decoding

Shahriar Shahabuddin Janne Janhunen Markku Juntti Amanullah Ghazi Olli Silvén 《Analog Integrated Circuits and Signal Processing》2014,78(3):611-622

In order to meet the requirement of high data rates for next generation wireless systems, efficient implementations of receiver algorithms are essential. On the other hand, faster time-to-market motivates the investigation of programmable implementations. This paper presents a novel design of a programmable turbo decoder as an application-specific instruction-set processor (ASIP) using transport triggered architecture (TTA). The processor architecture is designed in such a manner that it can be programmed with high level language to support different suboptimal maximum a posteriori (MAP) algorithms in a single TTA processor. The design enables the designer to change the algorithms according to the frame error rate performance requirement. A quadratic polynomial permutation interleaver is used for contention-free memory access and to make the processor 3GPP LTE compliant. Several optimization techniques to enable real time processing on programmable platforms are introduced. The essential parts of the turbo decoding algorithm are designed with vector function units. Unlike most other turbo decoder ASIPs, high level language is used to program the processor to meet the time-to-market requirements. With a single iteration, 68.35 Mbps decoding speed is achieved for the max-log-MAP algorithm at a clock frequency of 210 MHz on 90 nm technology.

18.

Rate-optimal DSP synthesis by pipeline and minimum unfolding

Lih-Gwo Jeng Liang-Gee Chen 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1994,2(1):81-88

This paper presents a rate-optimal scheduling for real-time DSP algorithms. By using pipelining and unfolding techniques, the parallel characteristics of recursive DSP algorithms can be exploited. A novel unfolding technique is developed to unravel all concurrency in the recursive data flow graph. A perfect rate unfolded data flow graph is also introduced, which can yield a fully static rate-optimal functional pipeline schedule. Experimental results have shown that the proposed method can always yield rate-optimal designs with a smaller unfolding factor compared to previous studies.

19.

ROM based methods for computing the squaring operation in modular rings

Poornachandra B. Rao Alexander Skavantzos 《The Journal of VLSI Signal Processing》1994,7(3):199-211

Signal processing algorithms generally rely heavily on the convolution operation, which in turn is multiplication intensive. However, more recently convolution algorithms based on the squaring operation as opposed to the multiplication operation have been developed. In this article we present two ROM based methods for performing the squaring operation modulo 2ⁿ, modulo 2ⁿ−1, or modulo 2ⁿ+1. The performance, cost, and implementation issues of the two methods are analyzed in detail and compared against each other as well as with a traditional ROM based implementation. It is shown that both methods obtain ROM bit savings of 99.99%, for 32-bit word lengths, when compared with traditional techniques. However, one of the methods outperforms the other in all other respects, such as overhead costs (savings of up to 99.48%), performance (up to about 20 times faster), and regularity and simplicity of hardware design.
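
The abstract above does not say which identity the squaring-based convolution algorithms rely on. One well-known way to trade a multiplication for squarings is the quarter-square identity, ab = ((a+b)² − (a−b)²)/4, sketched below in Python over ordinary integers as an assumption for illustration. It does not reproduce the ROM-based modular methods of the paper, and the name `multiply_via_squares` is hypothetical.

```python
def multiply_via_squares(a, b):
    """Multiply two integers using only squaring, subtraction and a shift.

    Uses the quarter-square identity (a+b)^2 - (a-b)^2 = 4ab, so the
    division by 4 is always exact for integer operands.
    """
    return ((a + b) ** 2 - (a - b) ** 2) // 4

print(multiply_via_squares(7, 9))  # → 63
```

With an identity of this kind, a lookup table only has to hold squares of single operands rather than products of operand pairs, which is why efficient squaring hardware is of interest.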

20.

Data hiding technologies for digital radiography

Piva A. Barni M. Bartolini F. De Rosa A. 《Vision, Image and Signal Processing, IEE Proceedings -》2005,152(5):604-610

Research on data hiding is demonstrating every day that several applications can benefit from this technology; among these is medical data management. In particular, embedding patient information into a medical image through data hiding could improve the level of security and confidentiality that is essential for the diffusion of medical information systems. The design of a data hiding system for such an application has to take into account specific requirements, the most important of which are: a payload high enough to reliably identify a patient; quality preservation of the watermarked image; and robustness to content modification. According to this analysis, a comparison between different data hiding approaches is presented, to evaluate the most suitable algorithms for embedding patient information into digital radiographs. In particular, two algorithms based on statistical decision theory have been compared with schemes following the new approach of modelling data hiding as communication with side-information at the transmitter. These methods have been tested and compared in the framework of digital radiograph management in order to identify their benefits and drawbacks.