Found 20 similar documents; search took 31 ms
1.
Youcef Bouchebaba Bruno Girodias Fabien Coelho Gabriela Nicolescu El mostapha Aboulhamid 《The Journal of VLSI Signal Processing》2007,49(1):123-138
In today’s embedded systems, memory hierarchy is rapidly becoming a major factor in terms of power, performance and area.
This is especially true for embedded multimedia applications using temporary multi-dimensional arrays that are typically used
to store intermediate results during multimedia processing. In this paper, we propose a new technique that optimizes the use
of the cache and the registers. It combines buffer and register allocation to reduce the size of the temporary
arrays. First, we use the concept of live data to replace each array by a smaller buffer. Then we replace references
to these buffers by registers. The buffer allocation step keeps only useful data in memory and the register allocation step
allows taking advantage of data reuse in internal loops. Codes considered in this paper are multimedia applications structured
as a sequence of loop nests. The experiments were run in a Unix environment and on the StepNP simulator (an MPSoC platform from STMicroelectronics).
They show that our technique yields a significant reduction in the number of data cache and TLB misses.
Contact: Gabriela Nicolescu
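The two steps described in this abstract (shrinking a temporary array to its live data, then replacing the buffer with registers) can be illustrated on a toy two-stage loop. This is a minimal sketch under assumed names and an assumed workload (`blur_full_temp`, `blur_buffered`, `blur_registers`, a doubling stage followed by a 3-tap sum); it is not the authors' algorithm or benchmark code.

```python
def blur_full_temp(a):
    """Baseline: the whole temporary array tmp is materialized in memory."""
    tmp = [2 * x for x in a]                       # first loop nest (producer)
    return [tmp[i - 1] + tmp[i] + tmp[i + 1]       # second loop nest (consumer)
            for i in range(1, len(a) - 1)]


def blur_buffered(a):
    """Buffer allocation: only the three live values of tmp are kept."""
    buf = [0, 0, 0]                                # circular buffer of live data
    out = []
    for i, x in enumerate(a):
        buf[i % 3] = 2 * x                         # overwrite the dead slot
        if i >= 2:                                 # three values are now live
            out.append(buf[0] + buf[1] + buf[2])
    return out


def blur_registers(a):
    """Register allocation: buffer slots become scalars (stand-ins for registers)."""
    if len(a) < 3:
        return []
    r0, r1 = 2 * a[0], 2 * a[1]
    out = []
    for x in a[2:]:
        r2 = 2 * x
        out.append(r0 + r1 + r2)
        r0, r1 = r1, r2                            # rotate the reused values
    return out
```

All three variants compute the same result; the difference is the amount of intermediate storage, which drops from `len(a)` elements to three.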
2.
Luigi Dilillo Patrick Girard Serge Pravossoudovitch Arnaud Virazel Magali Bastian 《Journal of Electronic Testing》2007,23(5):435-444
In this paper, we present an exhaustive study on the influence of resistive-open defects in pre-charge circuits of SRAM memories.
In SRAM memories, the pre-charge circuits pre-charge and equalize all bit-line pairs of the memory array to a given voltage
level, generally Vdd. This action is essential to ensure correct read operations.
We have analyzed the impact of resistive-opens placed in different locations of these circuits. Each defect studied in this
paper disturbs the pre-charge circuit in a different way and for different resistive ranges, but the produced effect on the
normal memory action is always the perturbation of the read operations. This faulty behavior can be modeled by Un-Restored
Write Faults (URWFs) and Un-Restored Read Faults (URRFs), because there is an incorrect pre-charge/equalization of the bit
lines after a write or read operation that disturbs the following read operation. In the last part of the paper, we demonstrate
that the test of URWFs is more effective in terms of resistive defect detection than that of URRFs and we list the necessary
test conditions to detect them.
Contact: Magali Bastian (http://www.infineon.com)
3.
This paper presents a symbol timing synchronization method that interpolates the discrete impulse response of the matched filter:
the group delay of the matched filter is adjusted by using different interpolation steps to achieve symbol timing synchronization.
The proposed method is much less complex than the conventional polyphase filterbank method, which can be viewed as a
special case of interpolated matched filters. The interpolated matched filters are used in a loop, and the loop is simulated
at 2 samples/symbol. The simulation results show that the proposed method can provide precise symbol timing synchronization
with less complexity.
Contact: Zujun Liu
4.
B. Girodias Y. Bouchebaba G. Nicolescu E. M. Aboulhamid P. Paulin B. Lavigueur 《Journal of Signal Processing Systems》2009,57(2):263-283
Multiprocessor System-on-Chip is one of the main drivers of the semiconductor industry revolution by enabling the integration
of complex functionality on a single chip. The techniques for processor design and application optimizations can be combined
together for more efficient design of these systems. Thus, the memory optimization techniques improving the data locality
can be combined with multithreading technology, improving the overall processor efficiency. The combination of these techniques
is mainly challenged by the adaptation of memory optimization techniques to the high parallelism offered by the multithreading
environments. This paper presents an in-depth analysis of the impact of multiprocessor and multithreading environments on
memory optimization techniques. A discussion is provided on the different types of parallelization (fine and coarse grain)
and their influence on memory optimization techniques. Some improvements to existing memory optimization techniques are presented,
as well as some adaptations necessary to use them in this type of environment.
Contact: B. Girodias
5.
Expressions are given for the moment generating functions of the Rayleigh and generalized Rayleigh distributions.
Contact: Saralees Nadarajah
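As a point of reference, the standard Rayleigh case has a well-known closed form. Assuming the usual density with scale parameter \sigma (notation chosen here, not taken from the paper), the moment generating function can be written as below. The paper's own expressions, in particular those for the generalized Rayleigh distribution, may be stated in a different form.

```latex
\begin{align}
f(x;\sigma) &= \frac{x}{\sigma^{2}}\, e^{-x^{2}/(2\sigma^{2})}, \qquad x \ge 0, \\
M_X(t) = \mathbb{E}\!\left[e^{tX}\right]
  &= 1 + \sigma t\, e^{\sigma^{2} t^{2}/2} \sqrt{\frac{\pi}{2}}
     \left[ \operatorname{erf}\!\left(\frac{\sigma t}{\sqrt{2}}\right) + 1 \right].
\end{align}
```

As a quick check, M_X(0) = 1 and M_X'(0) = \sigma\sqrt{\pi/2}, which is the Rayleigh mean.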
6.
Yu-Han Chen Tung-Chien Chen Chuan-Yung Tsai Sung-Fang Tsai Liang-Gee Chen 《Journal of Signal Processing Systems》2008,50(1):1-17
Data access usually accounts for more than 50% of the power cost in a modern signal processing system. Reducing memory
access power is therefore critical to a low-power design. Data reuse (DR) is a technique that recycles the data read
from memory and can be used to reduce memory access power. In this paper, a systematic method of DR exploration for low-power
architecture design is presented. First, the signal processing algorithms should be formulated as nested loop structures,
and data locality is explored by means of loop analysis. Then, corresponding DR techniques are applied to reduce memory access
power. The proposed design methodology is applied to the motion estimation (ME) algorithms of the H.264 video coding standard.
After analyzing the ME algorithms, suitable parallel architectures and processing flows for the integer ME (IME) and fractional
ME (FME) are proposed to achieve efficient DR. The amount of memory access is reduced to 0.91% and 4.37% in the
proposed IME and FME designs, respectively, which saves considerable memory access power. Finally, the design methodology is also beneficial
for other signal processing systems with low-power requirements.
Contact: Liang-Gee Chen
7.
In this paper, we explore general conditions for the oscillation based test of switched-capacitor biquad filter stages. Expressions
describing the characteristics of a filter stage put into oscillation are derived and conditions for achieving oscillation
by internal transformation of the filter stage are explored. A reconfiguration scheme based on the transformation of the biquad
filter stage to a quadratic oscillator is studied. Theoretically, the circuit can be put into oscillation by de-activating
a single capacitor. Simulations, however, show that in practice a carefully designed low feed-back loop is required to achieve
an acceptable oscillation test mode.
Contact: Franc Novak
8.
Implementing the memory that stores image and transform coefficients in 2-D DWT processing systems with a more
cost-effective external memory module, such as DDR DRAM, is shown to yield an effective memory bandwidth that is significantly
lower than the memory system's peak bandwidth when the conventional direct logical-to-physical memory address mapping is adopted.
The low effective memory bandwidth is caused by the frequent occurrence of memory overhead cycles, which in turn is closely
related to the logical memory access patterns of 2-D DWT processes. The problem becomes even more severe for the 2-D DWT processing
of video. An analysis of the logical memory access patterns of multi-level 2-D DWT is carried out and an enhanced logical-to-physical
memory mapping scheme which minimizes the occurrence of memory overhead cycles is proposed. The proposed scheme is simulated
and its performance in terms of effective memory access bandwidth is evaluated and compared with the conventional direct mapping
scheme.
Contact: Soon-Chieh Lim
9.
This paper presents an FPGA realisation of an application-specific cellular processor array designed for asynchronous skeletonization
of binary images. The skeletonization algorithm is based on iterative thinning utilizing a ‘grassfire’ transformation approach.
The purpose of this work was to test the performance of a fully parallel asynchronous processor array and to evaluate the
inhomogeneity of wave propagation velocity. A proof-of-concept design has been implemented and evaluated; the results are
presented and discussed.
Contact: Piotr Dudek
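The 'grassfire' idea behind the thinning step can be sketched in software as a wavefront that starts at the background and burns inward one pixel per step. The sketch below is a sequential, breadth-first version with 4-connectivity; the function name `grassfire`, the 0/1 image encoding and the connectivity choice are assumptions made for illustration, and it does not model the asynchronous processor array or the full skeletonization described in the paper.

```python
from collections import deque


def grassfire(image):
    """Return the step at which the fire reaches each pixel (0 for background).

    image is a list of rows containing 0 (background) or 1 (foreground).
    Foreground pixels that no fire can reach (no background at all) stay None.
    """
    rows, cols = len(image), len(image[0])
    dist = [[0 if image[r][c] == 0 else None for c in range(cols)]
            for r in range(rows)]
    # The fire is lit simultaneously at every background pixel.
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if image[r][c] == 0)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1      # wavefront advances one step
                queue.append((nr, nc))
    return dist
```

Pixels where wavefronts from different sides meet (local maxima of the returned map) are the usual candidates for skeleton points.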
10.
For applications requiring a large dynamic range, real numbers may be represented either in floating-point or in the logarithmic
number system (LNS). Which system is best for a given application is difficult to know in advance, because the cost and performance
of LNS operators depend on the target accuracy in a highly nonlinear way. Therefore, a comparison of the pros and cons of
both number systems in terms of cost, performance and overall accuracy is only relevant on a per-application basis. To make
such a comparison possible, two concurrent libraries of parameterized arithmetic operators, targeting recent field-programmable
gate arrays, are presented. They are unbiased in the sense that they strive to reflect the state-of-the-art for both number
systems. These libraries are freely available at .
Contact: Jérémie Detrey (corresponding author), Florent de Dinechin
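The trade-off mentioned in this abstract can be made concrete with a small software model of LNS arithmetic. In the sketch below a nonzero real is stored as a sign and the base-2 logarithm of its magnitude, so multiplication reduces to an addition, while addition needs the nonlinear function log2(1 + 2^d). All names (`to_lns`, `from_lns`, `lns_mul`, `lns_add_pos`) are hypothetical, floating-point `math.log2` stands in for the fixed-point hardware operators, and none of this reflects the API of the libraries the paper presents.

```python
import math


def to_lns(x):
    """Encode a nonzero real as (sign, log2 of magnitude)."""
    if x == 0:
        raise ValueError("zero needs a special encoding in LNS")
    return (1 if x > 0 else -1, math.log2(abs(x)))


def from_lns(v):
    """Decode an (sign, exponent) pair back to a real number."""
    sign, exponent = v
    return sign * 2.0 ** exponent


def lns_mul(a, b):
    """Multiplication is cheap: multiply signs, add logarithms."""
    return (a[0] * b[0], a[1] + b[1])


def lns_add_pos(a, b):
    """Add two positive values: log2(x + y) = hi + log2(1 + 2**(lo - hi))."""
    if a[0] != 1 or b[0] != 1:
        raise ValueError("this sketch only handles positive operands")
    hi, lo = max(a[1], b[1]), min(a[1], b[1])
    # The correction term is the nonlinear function that hardware must tabulate
    # or approximate; its cost is what makes LNS addition accuracy-sensitive.
    return (1, hi + math.log2(1.0 + 2.0 ** (lo - hi)))
```

For example, `from_lns(lns_mul(to_lns(3.0), to_lns(-4.0)))` recovers approximately -12, up to floating-point rounding.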
11.
The paper summarizes the main results of one of the key panel sessions of the Workshop, which focused on investigating
whether “layerless communications” can be translated from a vision into reality.
Contact: Juha Saarnio
12.
V. Torres A. Pérez-Pascual T. Sansaloni J. Valls 《Journal of Signal Processing Systems》2009,56(1):17-23
Timing recovery in communication systems with linear modulations is usually performed with a non-data-aided feedback loop
based on a fractional interpolator timing corrector and Gardner’s timing error detector. The contribution of this paper
is twofold. First, some design rules are given to predict the behaviour of the loop when pipelining is used. Second, it is shown
that pipelining can be used to reduce power consumption in a timing feedback loop. A timing recovery loop has been implemented
in an FPGA device, and power consumption measurements indicate that including 16 extra registers in the loop decreases the power consumption
by 63% and that the synchronizer can process up to 66.5 MSPS.
Contact: J. Valls (corresponding author)
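For readers unfamiliar with the Gardner detector named in this abstract, the sketch below shows its usual two-samples-per-symbol form: the mid-symbol sample multiplied by the difference of the two adjacent symbol-instant samples. The sign convention (positive output when sampling late), the function names and the test signal y(t) = cos(pi t) with symbol period T = 1 are assumptions made here; the sketch covers only the detector, not the interpolator, loop filter or pipelining studied in the paper.

```python
import math


def gardner_error(prev_sym, mid, curr_sym):
    """Gardner timing error from three samples spaced half a symbol apart."""
    return mid * (curr_sym - prev_sym)


def mean_error(tau, n_symbols=8):
    """Average detector output when sampling y(t) = cos(pi * t) with offset tau.

    The waveform models alternating +1/-1 symbols with period T = 1 whose
    peaks fall on integer t, so tau = 0 is the ideal sampling phase.
    """
    def y(t):
        return math.cos(math.pi * t)

    total = 0.0
    for k in range(1, n_symbols + 1):
        total += gardner_error(y(k - 1 + tau), y(k - 0.5 + tau), y(k + tau))
    return total / n_symbols
```

For this test signal the averaged output equals sin(2 pi tau), so it vanishes at the correct phase and its sign indicates the direction of the timing error, which is what the feedback loop exploits.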
13.
In this work the performance of a Fractional Fourier transform (FrFT) based Minimum Mean Squared Error receiver for MIMO systems
with space time processing over Rayleigh faded channels is presented. The proposed receiver called Optimum FrFT based MIMO
receiver (OFMR) shows improved performance outperforming the simple MMSE receiver in Rayleigh faded channel.
Contact: Rajesh Khanna
14.
Jonah Probell 《Journal of Signal Processing Systems》2008,50(1):33-39
Many different video processor architectures exist. A processor's architecture determines its strengths for a particular application.
Hardwired logic yields the best performance/cost, but a programmable processor is important for applications that support
multiple coding standards, proprietary functions, or future changes to application requirements. Programmable video processor
architectures achieve best performance through the use of parallelism at the data (SIMD), instruction (VLIW), and multiprocessor
level, and optimally sized ALU, multiplier, and load/store datapaths. Because low-cost memory architectures are not optimized
for the random access patterns of video processing, the performance of video processors is often limited by memory bandwidth
rather than processing resources. Careful data organization alleviates memory bandwidth limitations. When choosing a video
processor it is important to consider many factors, particularly performance, cost, power consumption, programmability, and
peripheral support.
Contact: Jonah Probell
15.
Parallelization of operations is of utmost importance for efficient implementation of Public Key Cryptography algorithms.
Starting with a classification of parallelization methods at different abstraction levels of public key algorithms, we propose
a novel memory architecture for elliptic curve implementations with multiple modular multiplier units. This architecture is
well-suited for different point addition and doubling algorithms over to be implemented on FPGAs. It allows the execution time to scale with the number of modular multipliers and exhibits nearly
no overhead compared to the mere runtime of the multipliers. The advantages of this distributed memory architecture are demonstrated
by means of two different point addition and doubling algorithms.
Contact: Sorin A. Huss
16.
To support distributed tracking of moving targets in a wireless sensor network, sensors that receive signals from the same
target must collaborate. We present an efficient dynamic sensor self-organizing
algorithm that clusters sensors into groups without requiring centralized control. Extensive simulations are conducted to
verify the performance improvement as well as the communication reduction for the proposed methods.
Contact: Xiaohong Sheng
17.
A New Routing Metric for Satisfying Both Energy and Delay Constraints in Wireless Sensor Networks (total citations: 1; self-citations: 0; citations by others: 1)
Besides meeting energy constraints, wireless sensor networks should also provide bounded communication delay when they are
used to support real-time applications. In this paper, a new routing metric is proposed. It takes into account both energy
and delay constraints. It can be used in AODV. Through mathematical analysis and simulations, we show the efficiency of this
new routing metric.
Contact: YeQiong Song
18.
The majority of scientific and Digital Signal Processing (DSP) applications are recursive or iterative. Transformation techniques
are generally applied to increase parallelism for these nested loops. Most of the existing loop transformation techniques
either cannot achieve maximum parallelism, or can achieve maximum parallelism but with complicated loop bounds and loop index
calculations. This paper proposes a new technique, loop striping, that can maximize parallelism while maintaining the original row-wise execution sequence with minimum overhead. Loop striping
groups iterations into stripes, where all iterations in a stripe are independent and can be executed in parallel. Theorems
and efficient algorithms are proposed for loop striping transformations. The experimental results show that loop striping
always achieves a better iteration period than software pipelining and loop unfolding, improving the average iteration period by
50% and 54%, respectively.
Contact: Edwin H.-M. Sha
19.
Rong Zheng Jatindera Singh Walia Lui Sha 《International Journal of Wireless Information Networks》2006,13(4):275-287
In this paper, we investigate the problem of providing efficient communication primitives across domains of wireless sensor network (WSN) applications. We argue both qualitatively and quantitatively that group communication among sensors in geographic proximity is one of the basic building blocks of many WSN applications. Furthermore, group communication awareness needs to be embedded and implemented at the MAC layer due to the broadcast nature of the wireless medium. We devise a MAC protocol, called LGC-MAC, to enable efficient single-hop one-to-many and many-to-one communication. We present case studies of two example applications, acoustic target tracking and propagation of information with feedback, using LGC-MAC, and demonstrate that LGC-MAC can improve the response time, alleviate channel contention, and provide better fault tolerance to packet collisions and wireless errors.
Contact: Rong Zheng
20.
This paper proposes an architecture and protocol infrastructure for a novel consumer-oriented incoming call connection (ICC)
service. In the ubiquitous consumer wireless world (UCWW) this service is one of the core services requiring new infrastructural
solutions for its realisation. The solution proposed here, besides realising the service, will offer mobile users greater
flexibility and management control over incoming calls, enable users to receive incoming calls via multiple access networks/providers
through a single identity, enable user-driven, seamless, network-transparent hot access network change (HAC), largely end
roaming charges and create a new wireless networking business opportunity among other benefits. The main components and interfaces
of the ICC service architecture and infrastructure are described, and protocol candidates are suggested. A generic consumer-oriented
ICC service scenario is elaborated theoretically, implemented and experimentally verified for voice over IP (VoIP) connections
in a testbed environment which includes network-transparent HAC. Two distinct ICC operational modes are identified and compared
in respect of relative signaling latency and processing resources for a number of key functions such as session setup and
release, and HAC.
Contact: Ning Wang