
1.

Buffer and Register Allocation for Memory Space Optimization

Youcef Bouchebaba Bruno Girodias Fabien Coelho Gabriela Nicolescu El mostapha Aboulhamid 《The Journal of VLSI Signal Processing》2007,49(1):123-138

In today’s embedded systems, the memory hierarchy is rapidly becoming a major factor in power, performance and area. This is especially true for embedded multimedia applications, which typically store intermediate results in temporary multi-dimensional arrays. In this paper, we propose a new technique that optimizes the use of the cache and the registers by combining buffer and register allocation to reduce the size of the temporary arrays. First, we use the concept of live data to replace each array with a smaller buffer. Then we replace references to these buffers with registers. The buffer allocation step keeps only useful data in memory, and the register allocation step takes advantage of data reuse in inner loops. The codes considered in this paper are multimedia applications structured as a sequence of loop nests. The experiments were run in a Unix environment and on the StepNP simulator (an MPSoC platform from STMicroelectronics). They show that our technique significantly reduces the number of data cache and TLB misses.

Contact: Gabriela Nicolescu
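
The buffer allocation idea in the abstract above can be illustrated with a minimal sketch. The two loop nests, the array names and the three-element window are hypothetical and are not taken from the paper; the point is only that when few values of a temporary array are live at once, a small circular buffer can stand in for the whole array.

```python
def full_array_version(a):
    """Two loop nests that communicate through a full-size temporary array."""
    n = len(a)
    tmp = [0] * n                      # temporary array: n elements
    for i in range(n):                 # loop nest 1 produces tmp
        tmp[i] = 2 * a[i] + 1
    out = [0] * n
    for i in range(2, n):              # loop nest 2 consumes tmp[i-2..i]
        out[i] = tmp[i] + tmp[i - 1] + tmp[i - 2]
    return out


def buffered_version(a):
    """Same computation with the loops fused: only 3 values of tmp are live
    at any time, so a 3-element circular buffer replaces the array."""
    n = len(a)
    buf = [0, 0, 0]                    # buffer: 3 elements instead of n
    out = [0] * n
    for i in range(n):
        buf[i % 3] = 2 * a[i] + 1      # overwrite the slot whose value is dead
        if i >= 2:
            # In compiled code these three reads could be kept in registers.
            t0, t1, t2 = buf[i % 3], buf[(i - 1) % 3], buf[(i - 2) % 3]
            out[i] = t0 + t1 + t2
    return out


if __name__ == "__main__":
    data = list(range(10))
    print(full_array_version(data) == buffered_version(data))
```

Both functions return the same result, but the second one touches three memory words of temporary storage regardless of the input length.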

2.

Analysis and Test of Resistive-Open Defects in SRAM Pre-Charge Circuits

Luigi Dilillo Patrick Girard Serge Pravossoudovitch Arnaud Virazel Magali Bastian 《Journal of Electronic Testing》2007,23(5):435-444

In this paper, we present an exhaustive study of the influence of resistive-open defects in the pre-charge circuits of SRAM memories. In SRAM memories, the pre-charge circuits pre-charge and equalize all bit-line pairs of the memory array to a given voltage level, generally Vdd. This action is essential to ensure correct read operations. We have analyzed the impact of resistive-opens placed in different locations of these circuits. Each defect studied in this paper disturbs the pre-charge circuit in a different way and for different resistive ranges, but the effect on normal memory operation is always a perturbation of the read operations. This faulty behavior can be modeled by Un-Restored Write Faults (URWFs) and Un-Restored Read Faults (URRFs), because an incorrect pre-charge/equalization of the bit lines after a write or read operation disturbs the following read operation. In the last part of the paper, we demonstrate that testing for URWFs is more effective in terms of resistive defect detection than testing for URRFs, and we list the test conditions necessary to detect them.

Contact: Magali Bastian (URL: http://www.infineon.com)

3.

Symbol Timing Synchronization Using Interpolation-Based Matched-Filters

Zujun Liu Kechu Yi 《Wireless Personal Communications》2009,50(4):457-467

This paper presents a symbol timing synchronization method that interpolates the discrete impulse response of the matched filter: the group delay of the matched filter is adjusted through different interpolation steps to achieve symbol timing synchronization. The proposed method is much less complex than the conventional polyphase filterbank method, which can be viewed as a special case of interpolated matched filters. The interpolated matched filters are used in a loop, which is simulated at 2 samples/symbol. The simulation results show that the proposed method provides precise symbol timing synchronization with lower complexity.

Contact: Zujun Liu
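
As a rough illustration of how interpolating a filter's impulse response shifts its group delay, the sketch below uses plain linear interpolation on a toy set of taps. The abstract does not say which interpolator the authors use, so linear interpolation, the tap values and the function names are all assumptions made for this example.

```python
def interpolate_taps(h, mu):
    """Linearly interpolate the impulse response h at fractional offset mu
    (0 <= mu <= 1). Returns len(h) + 1 taps delayed by mu samples."""
    padded = [0.0] + list(h) + [0.0]
    # New tap n blends original tap n (weight 1 - mu) with tap n - 1 (weight mu).
    return [(1.0 - mu) * padded[n + 1] + mu * padded[n]
            for n in range(len(h) + 1)]


def dc_group_delay(h):
    """Group delay at zero frequency: first moment of the taps over their sum."""
    return sum(n * c for n, c in enumerate(h)) / sum(h)


if __name__ == "__main__":
    taps = [1.0, 2.0, 3.0, 2.0, 1.0]   # toy symmetric "matched filter"
    for mu in (0.0, 0.25, 0.5):
        print(mu, dc_group_delay(interpolate_taps(taps, mu)))
```

For this interpolator the zero-frequency group delay grows by exactly mu samples, so a timing loop could steer the sampling instant by choosing mu rather than by switching between precomputed polyphase branches.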

4.

Multiprocessor, Multithreading and Memory Optimization for On-Chip Multimedia Applications

B. Girodias Y. Bouchebaba G. Nicolescu E. M. Aboulhamid P. Paulin B. Lavigueur 《Journal of Signal Processing Systems》2009,57(2):263-283

Multiprocessor System-on-Chip is one of the main drivers of the semiconductor industry revolution, enabling the integration of complex functionality on a single chip. Techniques for processor design and application optimization can be combined for a more efficient design of these systems. Thus, memory optimization techniques that improve data locality can be combined with multithreading technology, improving overall processor efficiency. The main challenge in combining these techniques is adapting memory optimization techniques to the high parallelism offered by multithreading environments. This paper presents an in-depth analysis of the impact of multiprocessor and multithreading environments on memory optimization techniques. The different types of parallelization (fine and coarse grain) and their influence on memory optimization techniques are discussed. Some improvements to existing memory optimization techniques are presented, as well as the adaptations necessary to use them in this type of environment.

Contact: B. Girodias

5.

MGFs for Rayleigh Random Variables

Christopher S. Withers Saralees Nadarajah 《Wireless Personal Communications》2008,46(4):463-468

Expressions are given for the moment generating functions of the Rayleigh and generalized Rayleigh distributions.

Contact: Saralees Nadarajah
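
For orientation, the moment generating function of the ordinary Rayleigh distribution with scale parameter σ can be obtained by one integration by parts and completing the square. The derivation below is the standard textbook result; the abstract does not state the form used in the paper, and the generalized Rayleigh case is not covered here.

```latex
\begin{align*}
M_X(t) &= \int_0^\infty e^{tx}\,\frac{x}{\sigma^2}\,e^{-x^2/(2\sigma^2)}\,dx \\
       &= 1 + t\int_0^\infty e^{tx - x^2/(2\sigma^2)}\,dx \\
       &= 1 + \sigma t\,e^{\sigma^2 t^2/2}\sqrt{\frac{\pi}{2}}
          \left[\operatorname{erf}\!\left(\frac{\sigma t}{\sqrt{2}}\right) + 1\right].
\end{align*}
```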

6.

Data Reuse Exploration for Low Power Motion Estimation Architecture Design in H.264 Encoder

Yu-Han Chen Tung-Chien Chen Chuan-Yung Tsai Sung-Fang Tsai Liang-Gee Chen 《Journal of Signal Processing Systems》2008,50(1):1-17

Data access usually accounts for more than 50% of the power cost in a modern signal processing system, so reducing memory access power is critical to a low-power design. Data reuse (DR) is a technique that recycles the data read from memory and can be used to reduce memory access power. In this paper, a systematic method of DR exploration for low-power architecture design is presented. First, the signal processing algorithms are formulated as nested loop structures, and data locality is explored by means of loop analysis. Then, corresponding DR techniques are applied to reduce memory access power. The proposed design methodology is applied to the motion estimation (ME) algorithms of the H.264 video coding standard. After analyzing the ME algorithms, suitable parallel architectures and processing flows for integer ME (IME) and fractional ME (FME) are proposed to achieve efficient DR. The amount of memory access is reduced to 0.91% and 4.37% in the proposed IME and FME designs, respectively, which saves a large amount of memory access power. Finally, the design methodology is also beneficial for other signal processing systems with low-power requirements.

Contact: Liang-Gee Chen
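
To give a feel for why data reuse matters in motion estimation, the sketch below counts reference-pixel reads for a full search over one block, with and without a local search-window buffer. This is a generic back-of-the-envelope model, not the architecture proposed in the paper; the block size, search range and function names are hypothetical.

```python
def reads_without_reuse(block, search_range):
    """Every candidate position re-reads its whole reference block from memory."""
    candidates = (2 * search_range + 1) ** 2
    return candidates * block * block


def reads_with_window_buffer(block, search_range):
    """The search window is read once into a local buffer and then reused
    by all candidate positions."""
    side = block + 2 * search_range
    return side * side


if __name__ == "__main__":
    block, search_range = 16, 16       # hypothetical 16x16 block, +/-16 search
    naive = reads_without_reuse(block, search_range)
    reused = reads_with_window_buffer(block, search_range)
    print(naive, reused, reused / naive)
```

Overlapping candidate blocks share most of their pixels, so even this simple buffering scheme cuts external reads by roughly two orders of magnitude in the example configuration.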

7.

Oscillation Test Scheme of SC Biquad Filters Based on Internal Reconfiguration (total citations: 1; self-citations: 0; citations by others: 1)

Uroš Kač Franc Novak 《Journal of Electronic Testing》2007,23(6):485-495

In this paper, we explore general conditions for the oscillation-based test of switched-capacitor biquad filter stages. Expressions describing the characteristics of a filter stage put into oscillation are derived, and conditions for achieving oscillation by internal transformation of the filter stage are explored. A reconfiguration scheme based on the transformation of the biquad filter stage into a quadratic oscillator is studied. Theoretically, the circuit can be put into oscillation by de-activating a single capacitor. Simulations, however, show that in practice a carefully designed low feed-back loop is required to achieve an acceptable oscillation test mode.

Contact: Franc Novak

8.

An Enhanced Memory Address Mapping Scheme for Improved Memory Access Performance of 2-D DWT Processing Systems

Sze-Wei Lee Soon-Chieh Lim 《The Journal of VLSI Signal Processing》2007,47(3):201-221

When the memory that stores the image and transform coefficients in a 2-D DWT processing system is implemented with a more cost-effective external memory module such as DDR DRAM, the effective memory bandwidth is shown to be significantly lower than the peak bandwidth of the memory system if the conventional direct logical-to-physical memory address mapping is adopted. The low effective memory bandwidth is caused by the frequent occurrence of memory overhead cycles, which in turn is closely related to the logical memory access patterns of 2-D DWT processes. The problem becomes even more severe for the 2-D DWT processing of video. The logical memory access patterns of multi-level 2-D DWT are analyzed, and an enhanced logical-to-physical memory mapping scheme that minimizes the occurrence of memory overhead cycles is proposed. The proposed scheme is simulated, and its performance in terms of effective memory access bandwidth is evaluated and compared with the conventional direct mapping scheme.

Contact: Soon-Chieh Lim
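
The effect of the address mapping on DRAM overhead can be illustrated with a small model that counts how often consecutive accesses fall in different DRAM rows, since each row change costs overhead cycles. The tiled mapping below is a generic textbook alternative to direct raster mapping, not the enhanced scheme proposed in the paper, and the image size, row size and tile size are hypothetical.

```python
WIDTH, HEIGHT = 64, 64      # image size in words (hypothetical)
ROW_WORDS = 64              # words per DRAM row (hypothetical)
TILE = 8                    # an 8 x 8 tile fills exactly one DRAM row


def linear_row(x, y):
    """Direct mapping: raster-order address, split into DRAM rows."""
    return (y * WIDTH + x) // ROW_WORDS


def tiled_row(x, y):
    """Tiled mapping: each TILE x TILE block of pixels shares one DRAM row."""
    return (y // TILE) * (WIDTH // TILE) + (x // TILE)


def row_switches(mapping, accesses):
    """Count how often consecutive accesses land in different DRAM rows."""
    rows = [mapping(x, y) for x, y in accesses]
    return sum(1 for a, b in zip(rows, rows[1:]) if a != b)


if __name__ == "__main__":
    row_scan = [(x, y) for y in range(HEIGHT) for x in range(WIDTH)]
    col_scan = [(x, y) for x in range(WIDTH) for y in range(HEIGHT)]
    for name, mapping in (("linear", linear_row), ("tiled", tiled_row)):
        print(name, row_switches(mapping, row_scan),
              row_switches(mapping, col_scan))
```

In this model the direct mapping is ideal for the row pass but changes DRAM row on every access of the column pass, whereas the tiled mapping gives the same moderate count for both passes. A 2-D DWT needs both row-wise and column-wise passes, which is why the mapping matters.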

9.

Hardware Implementation of Skeletonization Algorithm for Parallel Asynchronous Image Processing

Alexey Lopich Piotr Dudek 《Journal of Signal Processing Systems》2009,56(1):91-103

This paper presents an FPGA realisation of an application-specific cellular processor array designed for asynchronous skeletonization of binary images. The skeletonization algorithm is based on iterative thinning utilizing a ‘grassfire’ transformation approach. The purpose of this work was to test the performance of a fully parallel asynchronous processor array and to evaluate the inhomogeneity of wave propagation velocity. A proof-of-concept design has been implemented and evaluated; the results are presented and discussed.

Contact: Piotr Dudek
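
The 'grassfire' idea mentioned in the abstract can be sketched sequentially in software: a wavefront starts at the background and burns inward, and each foreground pixel records when the fire reaches it. This is only a breadth-first illustration of the wavefront concept, under the assumption of 4-connectivity. It is not the asynchronous hardware algorithm of the paper, and it stops short of extracting the skeleton itself.

```python
from collections import deque


def grassfire(image):
    """For each pixel, return the number of 4-connected steps to the nearest
    background (0) pixel. The image must contain at least one 0."""
    h, w = len(image), len(image[0])
    dist = [[0 if image[y][x] == 0 else None for x in range(w)]
            for y in range(h)]
    # The fire starts simultaneously at every background pixel.
    queue = deque((x, y) for y in range(h) for x in range(w)
                  if image[y][x] == 0)
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1   # wavefront arrival time
                queue.append((nx, ny))
    return dist


if __name__ == "__main__":
    shape = [
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1, 1, 0],
        [0, 1, 1, 1, 1, 1, 0],
        [0, 1, 1, 1, 1, 1, 0],
        [0, 0, 0, 0, 0, 0, 0],
    ]
    for row in grassfire(shape):
        print(row)
```

The pixels with locally maximal arrival time, where opposing wavefronts meet, lie along the medial axis of the shape, which is what a thinning-based skeletonization retains.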

10.

A Tool for Unbiased Comparison between Logarithmic and Floating-point Arithmetic

Jérémie Detrey Florent de Dinechin 《The Journal of VLSI Signal Processing》2007,49(1):161-175

For applications requiring a large dynamic range, real numbers may be represented either in floating-point or in the logarithm number system (LNS). Which system is best for a given application is difficult to know in advance, because the cost and performance of LNS operators depend on the target accuracy in a highly nonlinear way. Therefore, a comparison of the pros and cons of both number systems in terms of cost, performance and overall accuracy is only relevant on a per-application basis. To make such a comparison possible, two concurrent libraries of parameterized arithmetic operators, targeting recent field-programmable gate arrays, are presented. They are unbiased in the sense that they strive to reflect the state of the art for both number systems. These libraries are freely available at .

Contact: Jérémie Detrey (corresponding author), Florent de Dinechin
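
The trade-off between the two number systems can be seen in a few lines of software: in LNS a value is stored as its logarithm, so multiplication becomes an addition, while addition requires evaluating a nonlinear function. The sketch below handles positive values only and uses double-precision floats for the logarithms. It is an illustration of the arithmetic identities, not of the hardware operators in the libraries, and all function names are hypothetical.

```python
import math


def to_lns(x):
    """Encode a positive real number as its base-2 logarithm."""
    return math.log2(x)


def from_lns(lx):
    """Decode an LNS value back to an ordinary real number."""
    return 2.0 ** lx


def lns_mul(lx, ly):
    """Multiplication in LNS is a plain addition of the logarithms."""
    return lx + ly


def lns_add(lx, ly):
    """Addition in LNS needs the nonlinear function log2(1 + 2**d), d <= 0."""
    hi, lo = max(lx, ly), min(lx, ly)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))


if __name__ == "__main__":
    a, b = to_lns(3.0), to_lns(5.0)
    print(from_lns(lns_mul(a, b)))   # close to 15
    print(from_lns(lns_add(a, b)))   # close to 8
```

In a hardware implementation the function log2(1 + 2**d) has to be tabulated or approximated, and the resources this takes grow with the required accuracy, which is consistent with the abstract's remark that LNS cost depends on accuracy in a nonlinear way.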

11.

Layereless Communications: From Dream to Reality?

Juha Saarnio Rui Aguiar I. Vijaya Kumar 《Wireless Personal Communications》2008,44(1):51-55

The paper summarizes the main results of one of the key panel sessions of the Workshop, which investigated whether “layerless communications” can be turned from a vision into reality.

Contact: Juha Saarnio

12.

Design and FPGA-Implementation of a High Performance Timing Recovery Loop for Broadband Communications

V. Torres A. Pérez-Pascual T. Sansaloni J. Valls 《Journal of Signal Processing Systems》2009,56(1):17-23

Timing recovery in communication systems with linear modulations is usually performed with a non-data-aided feedback loop based on a fractional interpolator timing corrector and Gardner’s timing error detector. The contribution of this paper is twofold. First, some design rules are given to predict the behaviour of the loop when pipelining is used. Second, it is shown that pipelining can be used to reduce power consumption in a timing feedback loop. A timing recovery loop has been implemented in an FPGA device, and power consumption measurements indicate that including 16 extra registers in the loop decreases power consumption by 63%, while the synchronizer can process up to 66.5 MSPS.

Contact: J. Valls (corresponding author)
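
The Gardner timing error detector named in the abstract works at 2 samples/symbol: it multiplies the sample taken midway between two symbol strobes by the difference of those strobes. The sketch below applies it to a toy real-valued waveform. The sign convention varies between references, and the waveform, helper names and averaging length are assumptions for illustration. The interpolator, loop filter and pipelining studied in the paper are not modelled.

```python
import math


def gardner_error(prev_strobe, midpoint, curr_strobe):
    """Gardner timing error from two symbol strobes and the sample between them."""
    return midpoint * (curr_strobe - prev_strobe)


def mean_error(tau, symbols=8):
    """Average detector output for a timing offset tau (in symbol periods)
    on the toy waveform cos(pi * t), sampled at 2 samples/symbol."""
    def signal(t):
        return math.cos(math.pi * t)   # alternating +1/-1 symbols, smooth pulse

    total = 0.0
    for k in range(1, symbols + 1):
        total += gardner_error(signal(k - 1 + tau),
                               signal(k - 0.5 + tau),
                               signal(k + tau))
    return total / symbols


if __name__ == "__main__":
    for tau in (-0.1, 0.0, 0.1):
        print(tau, mean_error(tau))
```

For this waveform the averaged output equals sin(2*pi*tau): it is zero when the strobes sit on the symbol centres and changes sign with the direction of the timing offset, which is the property a feedback loop needs in order to correct the sampling instant.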

13.

Improved Fractional Fourier Transform Based Receiver for Spatial Multiplexed MIMO Antenna Systems

Rajesh Khanna Rajiv Saxena 《Wireless Personal Communications》2009,50(4):563-574

In this work, the performance of a Fractional Fourier transform (FrFT) based Minimum Mean Squared Error (MMSE) receiver for MIMO systems with space-time processing over Rayleigh faded channels is presented. The proposed receiver, called the Optimum FrFT based MIMO receiver (OFMR), outperforms the simple MMSE receiver in Rayleigh faded channels.

Contact: Rajesh Khanna

14.

Architecture Considerations for Multi-Format Programmable Video Processors

Jonah Probell 《Journal of Signal Processing Systems》2008,50(1):33-39

Many different video processor architectures exist, and a processor's architecture determines its strengths for a particular application. Hardwired logic yields the best performance/cost, but a programmable processor is important for applications that support multiple coding standards, proprietary functions, or future changes to application requirements. Programmable video processor architectures achieve best performance through the use of parallelism at the data (SIMD), instruction (VLIW), and multiprocessor level, and through optimally sized ALU, multiplier, and load/store datapaths. Because low-cost memory architectures are not optimized for the random access patterns of video processing, the performance of video processors is often limited by memory bandwidth rather than processing resources. Careful data organization alleviates memory bandwidth limitations. When choosing a video processor, it is important to consider many factors, particularly performance, cost, power consumption, programmability, and peripheral support.

Contact: Jonah Probell

15.

Parallel Memory Architecture for Elliptic Curve Cryptography over $\mathbb{GF}(p)$ Aimed at Efficient FPGA Implementation

Ralf Laue Sorin A. Huss 《Journal of Signal Processing Systems》2008,51(1):39-55

Parallelization of operations is of utmost importance for efficient implementation of Public Key Cryptography algorithms. Starting with a classification of parallelization methods at different abstraction levels of public key algorithms, we propose a novel memory architecture for elliptic curve implementations with multiple modular multiplier units. This architecture is well-suited for different point addition and doubling algorithms over $\mathbb{GF}(p)$ to be implemented on FPGAs. It allows the execution time to scale with the number of modular multipliers and exhibits nearly no overhead compared to the mere runtime of the multipliers. The advantages of this distributed memory architecture are demonstrated by means of two different point addition and doubling algorithms.

Contact: Sorin A. Huss

16.

Dynamic Sensor Self-Organization for Distributive Moving Target Tracking (total citations: 1; self-citations: 0; citations by others: 1)

Yu Hen Hu Xiaohong Sheng 《Journal of Signal Processing Systems》2008,51(2):161-171

To support distributive tracking of moving targets in a wireless sensor network, sensors that receive signals from the same target must collaborate. We present an efficient dynamic sensor self-organizing algorithm that clusters sensors into groups without requiring centralized control. Extensive simulations are conducted to verify the performance improvement as well as the communication reduction of the proposed methods.

Contact: Xiaohong Sheng

17.

A New Routing Metric for Satisfying Both Energy and Delay Constraints in Wireless Sensor Networks (total citations: 1; self-citations: 0; citations by others: 1)

Najet Boughanmi YeQiong Song 《Journal of Signal Processing Systems》2008,51(2):137-143

Besides meeting energy constraints, wireless sensor networks should also provide bounded communication delay when they are used to support real-time applications. In this paper, a new routing metric is proposed that takes into account both energy and delay constraints and can be used in AODV. Mathematical analysis and simulations show the efficiency of this new routing metric.

Contact: YeQiong Song

18.

Maximize Parallelism Minimize Overhead for Nested Loops via Loop Striping

Chun Xue Zili Shao Edwin H.-M. Sha 《The Journal of VLSI Signal Processing》2007,47(2):153-167

The majority of scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques are generally applied to increase parallelism for these nested loops. Most existing loop transformation techniques either cannot achieve maximum parallelism, or achieve it only with complicated loop bound and loop index calculations. This paper proposes a new technique, loop striping, that can maximize parallelism while maintaining the original row-wise execution sequence with minimum overhead. Loop striping groups iterations into stripes, where all iterations in a stripe are independent and can be executed in parallel. Theorems and efficient algorithms are proposed for loop striping transformations. The experimental results show that loop striping always achieves a better iteration period than software pipelining and loop unfolding, improving the average iteration period by 50% and 54%, respectively.

Contact: Edwin H.-M. Sha

19.

Local Group Communication-aware MAC Protocol in Wireless Sensor Networks

Rong Zheng Jatindera Singh Walia Lui Sha 《International Journal of Wireless Information Networks》2006,13(4):275-287

In this paper, we investigate the problem of providing efficient communication primitives across domains of wireless sensor network (WSN) applications. We argue both qualitatively and quantitatively that group communication among sensors in geographic proximity is one of the basic building blocks of many WSN applications. Furthermore, group communication awareness needs to be embedded and implemented at the MAC layer due to the broadcast nature of the wireless medium. We devise a MAC protocol, called LGC-MAC, to enable efficient single-hop one-to-many and many-to-one communication. We present case studies of two example applications using LGC-MAC, acoustic target tracking and propagation of information with feedback, and demonstrate that LGC-MAC can improve the response time, alleviate channel contention and provide better fault tolerance to packet collisions and wireless errors.

Contact: Rong Zheng

20.

Consumer-Oriented Incoming Call Connection Service for a Ubiquitous Consumer Wireless World

Ivan Ganchev Máirtín S. O’Droma Ning Wang 《Wireless Personal Communications》2009,50(1):115-131

This paper proposes an architecture and protocol infrastructure for a novel consumer-oriented incoming call connection (ICC) service. In the ubiquitous consumer wireless world (UCWW) this service is one of the core services requiring new infrastructural solutions for its realisation. The solution proposed here, besides realising the service, will offer mobile users greater flexibility and management control over incoming calls, enable users to receive incoming calls via multiple access networks/providers through a single identity, enable user-driven, seamless, network-transparent hot access network change (HAC), largely end roaming charges and create a new wireless networking business opportunity among other benefits. The main components and interfaces of the ICC service architecture and infrastructure are described, and protocol candidates are suggested. A generic consumer-oriented ICC service scenario is elaborated theoretically, implemented and experimentally verified for voice over IP (VoIP) connections in a testbed environment which includes network-transparent HAC. Two distinct ICC operational modes are identified and compared in respect of relative signaling latency and processing resources for a number of key functions such as session setup and release, and HAC.

Contact: Ning Wang