Similar literature
1.
To mitigate narrow-band interference in spread spectrum communications systems, novel communications receivers incorporating transform domain filtering techniques are designed. In this paper, lapped transforms are used to map the received data signal to the transform domain, where adaptive excision is performed. Transform domain detection algorithms, which yield bit decisions based on the remaining signal energy, are analyzed and, together with excision, are employed on a block-by-block basis to suppress single-tone and narrow-band Gaussian interference. System performance is analytically quantified in terms of the overall system bit-error rate (BER). Results are presented for a variety of channel conditions and compared to those obtained using excision algorithms based on orthonormal block transforms (Medley 1995). These results demonstrate the improved performance and increased robustness with respect to jammer frequency and bandwidth of lapped transform domain excision techniques relative to similar algorithms based on nonweighted block transforms.

2.
The implementation of digital filtering algorithms using pipelined vector processors is investigated. Modeling of vector processors and vectorization methods are explained, and then the performances of several implementation methods are evaluated based on the model. Vector processor implementation of FIR filtering algorithms using the outer product method and the indirect convolution method is evaluated. Recursive and adaptive filtering algorithms, which lead to dependency problems in direct vector processor implementations, are implemented very efficiently using a newly developed vectorization method. The proposed method computes multiple output samples at a time, making the vector length independent of the filter order. Illustrative examples comparing theoretical results with Cray X-MP simulation results are included.

3.
沈晶聂  叶猛 《电视技术》2012,36(9):103-107
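A user management and control system was developed on a network processor platform to monitor the content users access online and their behavior. The network processor is a programmable, high-efficiency chip for processing network data, and the network controller is the component of the user management and control system that filters data. In the experiments, data-processing efficiency was improved on the hardware side by using an optimized pipeline, an efficient way for the chip to process data, and performance was optimized on the software side by using different algorithms, including a flow filtering algorithm, a latent semantic indexing algorithm, and IP fragment handling. The experimental results show that the network-processor-based network controller filters and forwards data according to the filtering and forwarding rules with high accuracy and high speed, and monitors users' online content and behavior very effectively.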

4.
This paper presents a floating-point design and implementation of a System on Chip (SoC) based Differential Evolution (DE) algorithm using a Xilinx Virtex-5 Field Programmable Gate Array (FPGA). The hardware implementation is carried out to enhance the execution speed of embedded applications. An Intellectual Property (IP) core for the DE algorithm is developed and interfaced with the 32-bit PowerPC 440 processor using the processor local bus (PLB) of the Xilinx Virtex-5 FPGA. In the proposed architecture the algorithmic parameters of DE are scalable. The software and hardware implementations of the DE algorithm are carried out on the PowerPC embedded processor and in the hardware IP, respectively. The optimization of numerical benchmark functions and system identification in control systems are implemented to verify the proposed hardware SoC platform. The performance of the IP is measured in terms of the acceleration gain of the DE algorithm. The optimization problems are solved using floating-point arithmetic in both the embedded processor and the hardware. The experimental results show that the hardware DE IP accelerates execution by approximately 200 times compared to an equivalent software implementation of the DE algorithm on the PowerPC 440 processor. Further, as a case study, an Infinite Impulse Response (IIR) based system identification task is implemented on the SoC using the developed hardware accelerator.
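As a point of reference for the software side of such a comparison, the following is a minimal sketch of one common DE variant (DE/rand/1/bin) in plain Python. The abstract does not say which variant or parameter values the authors used, so the function name, the defaults (`pop_size`, `f`, `cr`, `generations`, `seed`) and the sphere benchmark are all illustrative assumptions.

```python
import random


def differential_evolution(cost, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=200, seed=0):
    """DE/rand/1/bin sketch; every default here is an illustrative assumption."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [cost(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, none of them the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # forces at least one mutated gene
            trial = []
            for j in range(dim):
                if j == j_rand or rng.random() < cr:
                    # Mutation: base vector plus scaled difference vector.
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][j]  # binomial crossover keeps the parent gene
                trial.append(v)
            trial_fit = cost(trial)
            # Greedy selection: the trial replaces the target if it is no worse.
            if trial_fit <= fitness[i]:
                pop[i], fitness[i] = trial, trial_fit
    best = min(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def sphere(x):
    """Simple numerical benchmark with its minimum at the origin."""
    return sum(v * v for v in x)


best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 2)
```

The inner loop over `i` is independent per individual apart from the shared population, which is the kind of regular structure a hardware IP can exploit; how the paper's architecture actually does so is not described in the abstract.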

5.
Graphics processing is an increasingly important application domain, driven by the demand for real-time rendering, video streaming, virtual reality, and so on. Illumination is a critical module in graphics rendering and can be compute-bound, memory-bound, or power-bound depending on the application case. It is crucial to decide how to schedule illumination algorithms with different features according to the practical requirements in reconfigurable graphics hardware. This paper analyzes the performance characteristics of four mainstream lighting algorithms (the Lambert, Phong, Blinn-Phong, and Cook-Torrance illumination algorithms) using hardware performance counters on the x86 processor platform KabyLake (KBL). Data movement, computation, power consumption, and memory access are evaluated over a range of application scenarios. Further, by analyzing the system-level behavior of these illumination algorithms, the pros and cons of each algorithm were obtained. The relationship between performance/energy and the evaluated metrics was analyzed through Pearson correlation coefficient (PCC) analysis. Based on these performance characterization data, this paper presents some reconfiguration suggestions for reconfigurable graphics processors.

6.
Two interpolation algorithms are presented for the computation of the inverse of a two-variable polynomial matrix. The first interpolation algorithm is based on the Lagrange interpolation method, which matches pre-assigned data of the determinant and the adjoint of a two-variable polynomial matrix on a set of points on several circles centered at the origin. The second interpolation algorithm uses discrete Fourier transform (DFT) techniques, or better, fast Fourier transforms, which are very efficient algorithms available in both software and hardware and which benefit greatly from a parallel environment (through symmetric multiprocessing or other techniques). The complexity of both algorithms is discussed and illustrative examples are given. The DFT algorithm is implemented in the Mathematica programming language and tested in comparison to the respective built-in function of Mathematica.
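The evaluation-interpolation idea behind the DFT approach can be sketched in a simplified setting. The example below is an assumption-laden illustration, not the paper's algorithm: it handles a single variable rather than two, a 2x2 matrix only, the determinant only (not the adjoint), and it assumes integer coefficients so that the result can be rounded. The names `poly_eval` and `det2_by_dft` are hypothetical.

```python
import cmath


def poly_eval(coeffs, s):
    """Evaluate a polynomial with low-to-high coefficients at s (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * s + c
    return acc


def det2_by_dft(a, degree_bound):
    """Determinant of a 2x2 one-variable polynomial matrix by DFT interpolation.

    a[i][j] is a low-to-high coefficient list; degree_bound is an upper bound
    on the degree of the determinant. Integer coefficients are assumed.
    """
    n = degree_bound + 1
    # Step 1: evaluate the matrix at the n-th roots of unity.
    roots = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]
    samples = []
    for w in roots:
        m = [[poly_eval(a[i][j], w) for j in range(2)] for i in range(2)]
        # Step 2: numeric determinant at each sample point.
        samples.append(m[0][0] * m[1][1] - m[0][1] * m[1][0])
    # Step 3: an inverse DFT of the samples recovers the coefficients.
    coeffs = []
    for j in range(n):
        c = sum(samples[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                for k in range(n)) / n
        coeffs.append(round(c.real))
    return coeffs


# A(s) = [[s + 1, 2], [3, s + 4]], so det A(s) = s^2 + 5s - 2.
A = [[[1, 1], [2]], [[3], [4, 1]]]
det_coeffs = det2_by_dft(A, 2)  # low-to-high: [-2, 5, 1]
```

Each sample point is processed independently, which is why this family of methods parallelizes well; in the two-variable case the same steps would use a two-dimensional grid of sample points and a 2-D inverse DFT.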

7.
A configurable architecture for performing image transform algorithms is presented that provides a better tradeoff between low complexity and algorithm flexibility than either software-programmable processors or dedicated ASICs. The configurable processor unit requires only 110 K transistors and can execute several image transform algorithms. By emulating the signal flow of the algorithms in hardware, rather than software, complexity is reduced by an order of magnitude compared with current software-programmable video signal processors, while providing more flexibility than single-function ASICs. The processor has been fabricated in 1.2-μm CMOS and has been successfully used to execute the discrete cosine transform/inverse discrete cosine transform (DCT/IDCT), subband coding, vector quantization, and two-dimensional filtering algorithms at pixel rates up to 25 MPixels/s.

8.
The development of more processing-intensive applications on the Internet (video broadcasting) on the one hand, and the popularity of recent user-level devices (digital cameras, wireless videophones, ...) on the other, introduce challenges at several levels. Today, such devices offer processing capabilities and bandwidth settings that are inadequate for managing scalable QoS requirements in a typical media delivery framework. In this paper, we present a study of the impact of such a scalable data representation optimized for QoS (Matching Pursuit 3D algorithms) on processor architectures, with the aim of achieving the best performance and power efficiency. A review of state-of-the-art techniques for processor architecture enhancement suggests promising opportunities from the latest developments in reconfigurable computing research. We present here the first design steps of an efficient reconfigurable coprocessor designed specifically to cope with future video delivery and multimedia processing requirements. Architecture perspectives are proposed with respect to low development cost constraints, backward compatibility, and easy coprocessor usage, using an original strategy based on a hardware/software codesign methodology.

Sebastien Bilavarn received the M.S. degree from Rennes University (France) in 1998 and the PhD degree in Electrical Engineering from South Brittany University in 2002. Since June 2002, he has worked as a post-doc fellow at the Signal Processing Institute, Swiss Federal Institute of Technology (EPFL). Sebastien's research interests include design methodologies for embedded systems, reconfigurable computing and Digital Signal Processing. Currently, his work focuses on using Adaptive Computing Systems to optimise computer architectures, in collaboration with the Architecture Research Lab of the System Technology Labs, Intel Corporation.

Eric Debes received an M.S. in Electrical and Computer Engineering from Supélec, France in 1996, an M.S. in Electrical Engineering from the Technical University Darmstadt, Germany in 1997 and a PhD in Signal Processing from the Swiss Federal Institute of Technology. Since 2001 he has been a Researcher in the Architecture Research Lab of the System Technology Labs, Intel Corporation, Santa Clara, California. Eric's research interests include image and video coding and processing algorithms as well as computer architecture and parallelism. At Intel he has been working together with different processor teams and microarchitecture research groups on the definition of new media and communication features (including new SIMD and streaming instructions, multicore processors and low-power architectures) in the CPU and the chipset to provide better media application performance and end user quality of service with a given system and processor power envelope and/or energy budget. More recently Eric has been working on system-on-chip modelling, processor and system power estimation and architecture design space exploration for consumer electronics applications. He is a member of the IEEE, of the ACM and of the SPIE.

Pierre Vandergheynst received the M.S. degree in physics and the Ph.D. degree in mathematical physics from the Université catholique de Louvain, Belgium, in 1995 and 1998 respectively. From 1998 to 2001, he was a Postdoctoral Researcher with the Signal Processing Laboratory, Swiss Federal Institute of Technology (EPFL), in Lausanne, Switzerland. He is now an Assistant Professor of Visual Information Processing at EPFL, where his research focuses on computer vision, data processing and mathematical tools for visual information processing. Prof. Vandergheynst is Co-Editor-in-Chief of Signal Processing and a member of the IEEE.

Jean-Philippe Diguet received the M.S. degree and the PhD degree from Rennes University (France) in 1993 and 1996 respectively. His thesis focused on the estimation of hardware complexity and algorithmic transforms for architectural synthesis. He then joined IMEC in Leuven (Belgium), where he worked as a post-doc fellow on the minimization of the power consumption of memories at the system level. From 1997 to 2002, he was an associate professor at the South Brittany University and a member of the LESTER laboratory. In 2003/04, he initiated and created an innovative company in the domain of short range wireless communications. In 2004, he obtained a CNRS researcher position. His current work focuses on design space exploration of embedded systems and real-time scheduling in the context of hardware/software architecture configurations. Within the LESTER laboratory, he heads the “Design Trotter” team focusing on EDA methods and tools.

9.
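This paper implements a general-purpose real-time platform for space-time adaptive processing (STAP) using multiple high-performance digital signal processors combined with FPGAs and general-purpose processors. Drawing on the BSP model proposed by Valiant (1990) and using multiple pipelines, a computation model for STAP is proposed. The model bridges the gap between STAP algorithms and practical parallel systems, provides a unified framework for development, and makes it easier to evaluate algorithm performance. In the development based on this model, a scalable cluster-style multiprocessor structure is chosen as the system hardware architecture, a static data-block allocation scheme is used to decompose and map the algorithms, and a series of communication and program optimization methods are applied. The results show that the system meets the real-time requirements, scales well, and facilitates the development of a family of similar systems.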

10.
Low-Area/Power Parallel FIR Digital Filter Implementations   Total citations: 4; self-citations: 0; citations by others: 4
This paper presents a novel approach for implementing area-efficient parallel (block) finite impulse response (FIR) filters that require less hardware than traditional block FIR filter implementations. Parallel processing is a powerful technique because it can be used to increase the throughput of a FIR filter or reduce the power consumption of a FIR filter. However, a traditional block filter implementation causes a linear increase in the hardware cost (area) by a factor of L, the block size. In many design situations, this large hardware penalty cannot be tolerated. Therefore, it is important to design parallel FIR filter structures that require less area than traditional block FIR filtering structures. In this paper, we propose a method to design parallel FIR filter structures that require a less-than-linear increase in the hardware cost. A novel adjacent coefficient sharing based sub-structure sharing technique is introduced and used to reduce the hardware cost of parallel FIR filters. A novel coefficient quantization technique, referred to as a scalable maximum absolute difference (MAD) quantization process, is introduced and used to produce quantized filters with good spectrum characteristics. By using a combination of fast FIR filtering algorithms, a novel coefficient quantization process and area reduction techniques, we show that parallel FIR filters can be implemented with up to a 45% reduction in hardware compared to traditional parallel FIR filters.
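The "fast FIR filtering algorithms" mentioned above can be illustrated with the textbook 2-parallel case, sketched below under stated assumptions. With even/odd polyphase components X0, X1 and H0, H1, the even outputs are X0·H0 plus a one-block delay of X1·H1, and the odd outputs are (X0+X1)·(H0+H1) − X0·H0 − X1·H1, so three half-length subfilters replace the four of a traditional block filter. The function names are hypothetical, both sequences are assumed to have even length, and the paper's coefficient sharing and MAD quantization are not modelled.

```python
def convolve(x, h):
    """Direct full convolution, used both as a subfilter and as the reference."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y


def fast_fir_2_parallel(x, h):
    """2-parallel fast FIR sketch; x and h are assumed to have even length."""
    # Polyphase split into even- and odd-indexed samples.
    x0, x1 = x[0::2], x[1::2]
    h0, h1 = h[0::2], h[1::2]
    # Three half-length subfilters instead of four.
    a = convolve(x0, h0)
    b = convolve(x1, h1)
    c = convolve([p + q for p, q in zip(x0, x1)],
                 [p + q for p, q in zip(h0, h1)])
    m = len(a)
    y = [0] * (2 * m + 1)
    for k in range(m):
        y[2 * k] += a[k]                    # even outputs: X0*H0 ...
        y[2 * k + 2] += b[k]                # ... plus X1*H1 delayed by one block
        y[2 * k + 1] = c[k] - a[k] - b[k]   # odd outputs recovered from the sum filter
    return y


x = [1, 2, 3, 4, 5, 6]
h = [1, -1, 2, 3]
y_fast = fast_fir_2_parallel(x, h)
y_ref = convolve(x, h)
```

The saving comes at the cost of extra pre- and post-additions, which is the usual trade in fast FIR structures: multipliers are removed and adders are added.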

11.
Many-core processors are good candidates for speeding up video coding because the parallelism of these applications can be exploited more efficiently by a many-core architecture. Lock methods are important in many-core architectures for ensuring correct program execution and on-chip communication between threads. The efficiency of the lock method is critical to the overall performance of an on-chip many-core processor. In this paper, we propose two types of hardware locks for on-chip many-core architectures: a centralized lock and a distributed lock. First, we design the architectures of the centralized lock and the distributed lock to implement the two hardware lock methods. Then, we evaluate the performance of the two hardware locks and a software lock using quantitative micro-benchmarks on the many-core processor simulator Godson-T. The experimental results show that the locks with dedicated hardware support have higher performance than the software lock, and that the distributed hardware lock is more scalable than the centralized hardware lock.

12.
Novel algorithmic features of multimedia applications and advances in VLSI technologies are driving forces behind the new multimedia signal processors. We propose an architecture platform which could provide high performance and flexibility, and would require less external I/O and memory access. It comprises array processors used as the hardware accelerator and RISC cores used as the basis of the programmable processor. It is a hierarchical and scalable architecture style which facilitates the hardware-software codesign of multimedia signal processing circuits and systems. While some control-intensive functions can be implemented using programmable CPUs, other computation-intensive functions can rely on hardware accelerators. To compile multimedia algorithms, we also present an operation placement and scheduling scheme suitable for the proposed architectural platform. Our scheme addresses data reusability and exploits local communication in order to avoid the memory/communication bandwidth bottleneck, which leads to faster program execution. Our method shows promising performance: a linear speed-up of 16 times can be achieved for the block-matching motion estimation algorithm and the true motion tracking algorithm, which underlie many multimedia applications (e.g., MPEG-2 and MPEG-4).

13.
The paper describes two approaches suitable for a field-programmable gate-array (FPGA) implementation of fast Walsh-Hadamard transforms. These transforms are important in many signal-processing applications including speech compression, filtering and coding. Two novel architectures for the fast Hadamard transforms using both a systolic architecture and distributed arithmetic techniques are presented. The first approach uses the Baugh-Wooley multiplication algorithm for a systolic architecture implementation. The second approach is based on both a distributed arithmetic ROM and accumulator structure, and a sparse matrix-factorisation technique. Implementations of the algorithms on a Xilinx FPGA board are described. The distributed arithmetic approach exhibits better performance than the systolic architecture approach.
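For readers unfamiliar with the transform itself, a software reference for the fast Walsh-Hadamard transform is sketched below. It is the standard in-place butterfly in natural (Hadamard) order, unnormalized, and is not a model of either FPGA architecture from the paper; the function name `fwht` is an assumption.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order.

    The input length must be a power of two. Only additions and subtractions
    are used, log2(n) stages of n/2 butterflies each.
    """
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                u, v = data[i], data[i + h]
                data[i], data[i + h] = u + v, u - v  # one butterfly
        h *= 2
    return data


x = [3, -1, 4, 1, -5, 9, 2, 6]
spectrum = fwht(x)
# Applying the transform twice returns the input scaled by n.
roundtrip = fwht(spectrum)
```

Because each stage is a fixed pattern of adds and subtracts, the structure maps naturally to hardware; the sparse matrix factorisation mentioned in the abstract corresponds to writing the transform as a product of such stages.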

14.
Fast computation of the discrete Walsh and Hadamard transforms   Total citations: 1; self-citations: 0; citations by others: 1
The discrete Walsh and Hadamard transforms are often used in image processing tasks such as image coding, pattern recognition, and sequency filtering. A new discrete Walsh transform (DWT) algorithm is derived in which a modified form of the DWT relation is decomposed into smaller-sized transforms using vectorized quantities. A new sequency-ordered discrete Hadamard transform (DHAT) algorithm is also presented. The proposed approach results in more regular algorithms requiring no independent data swapping and fewer array-index updating and bit-reversal operations. An analysis of the computational complexity and the execution time performance is provided. The results are compared with those of the existing algorithms.

15.
Design and Implementation of a BIOS   Total citations: 5; self-citations: 1; citations by others: 4
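The article describes in detail the basic structural framework of a BIOS, proposes a BIOS power-on self-test flow suited to checking the hardware of industrial control computers, and analyzes several key design issues: correctness, compatibility and portability, and the compression algorithm. Finally, the complete BIOS was rigorously verified on the 龙腾S1 (Longteng S1) system platform (PC104-compatible), developed in-house by the Aviation Microelectronics Center of Northwestern Polytechnical University.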

16.
By using block processing, partitioning, and fast Fourier transforms (FFTs), large filters perform efficiently in the frequency domain. For a small processing delay, the complexity can still be too large for implementation on a digital signal processor (DSP). A solution is to partition the filter into unequal-length subfilters. Application in adaptive filtering yields the nonuniform partitioned block frequency domain adaptive filter (NU-PBFDAF).
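The partitioning step rests on a simple identity: a long filter equals the sum of its subfilters, each delayed by its starting offset. The sketch below checks that identity in the time domain for unequal partition sizes. It is only an illustration under assumptions: the names and the sizes `[2, 2, 4, 8]` are hypothetical, and the FFT-based block processing and coefficient adaptation of an actual NU-PBFDAF are left out.

```python
def convolve(x, h):
    """Direct full convolution."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y


def partitioned_convolve(x, h, sizes):
    """Filter x with h split into consecutive subfilters of the given sizes.

    Each subfilter output is delayed by the subfilter's offset within h and
    the delayed outputs are summed.
    """
    if sum(sizes) != len(h):
        raise ValueError("partition sizes must add up to the filter length")
    y = [0] * (len(x) + len(h) - 1)
    offset = 0
    for size in sizes:
        part = convolve(x, h[offset:offset + size])
        for n, v in enumerate(part):
            y[offset + n] += v  # delay by the partition's starting tap
        offset += size
    return y


x = list(range(1, 11))
h = [1, -2, 3, -4, 5, -6, 7, -8, 9, -10, 11, -12, 13, -14, 15, -16]
y_part = partitioned_convolve(x, h, [2, 2, 4, 8])
y_ref = convolve(x, h)
```

Short leading partitions can be processed with small blocks and therefore low delay, while the long trailing partitions can use large, efficient blocks because their contribution is not needed until later; that is the motivation for unequal lengths.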

17.
Energy-Scalable Protocols for Battery-Operated MicroSensor Networks   Total citations: 10; self-citations: 0; citations by others: 10
In wireless sensor networks, the goal is to gather information from a large number of sensor nodes and communicate the information to the end-user, all under the constraint of limited energy resources. Network protocols minimize energy by using localized communication and control and by exploiting computation/communication tradeoffs. In addition, data fusion algorithms such as beamforming aggregate data from multiple sources to reduce data redundancy and enhance signal-to-noise ratios, thus further reducing the required communications. We have developed a sensor network system that uses a localized clustering protocol and beamforming data fusion to enable energy-efficient collaboration. We compare the performance of two beamforming algorithms, the Maximum Power and the Least Mean Squares (LMS) beamforming algorithms, using the StrongARM SA-1100 processor. Results show that the LMS algorithm requires less than one-fifth the energy required by the Maximum Power beamforming algorithm with only a 3 dB loss in performance, thus showing that the LMS algorithm is better suited for energy-constrained systems. We explore the energy-scalability of the LMS algorithm, and we propose an energy-quality scalable architecture that incorporates techniques such as variable filter length, variable voltage supply and variable adaptation time.
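The core LMS update is small enough to sketch. The example below is a generic single-channel LMS system-identification loop, not the beamformer from the paper; the function name, the step size `mu`, the two-tap target system and the noise-free setup are all illustrative assumptions.

```python
import random


def lms_identify(x, d, num_taps, mu):
    """Generic LMS adaptive filter sketch.

    x is the input sequence, d the desired sequence, num_taps the filter
    length and mu the step size. Returns the final weights and the error
    history.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # delay line, most recent sample first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))         # filter output
        e = dn - y                                         # estimation error
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]   # LMS weight update
        errors.append(e)
    return w, errors


rng = random.Random(1)
true_taps = [0.5, -0.3]  # hypothetical unknown system
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [true_taps[0] * x[n] + true_taps[1] * (x[n - 1] if n > 0 else 0.0)
     for n in range(len(x))]
w, errors = lms_identify(x, d, num_taps=2, mu=0.1)
```

In this sketch `num_taps` and the number of samples processed are the two knobs that correspond to the "variable filter length" and "variable adaptation time" named in the abstract: both scale the multiply-accumulate count, and with it the energy, roughly linearly.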

18.
This paper describes the implementation of a digital audio effect system-on-a-chip (SoC), which integrates an embedded digital signal processor (DSP) core, audio codec intellectual property, a number of peripheral blocks, and various audio effect algorithms. The audio effect SoC is developed using a software and hardware co-design method. In the design of the SoC, the embedded DSP and some dedicated hardware blocks are developed as a hardware design, while the audio effect algorithms are realized using a software-centric method. Most of the audio effect algorithms are implemented in C code with primitive functions that run on the embedded DSP, while the equalization effect, which requires a large amount of computation, is implemented using a dedicated hardware block with high flexibility. For the optimized implementation of audio effects, we exploit the primitive functions of the embedded DSP compiler, which is a very efficient way to reduce the code size and computation. The audio effect SoC was fabricated using a 0.18 μm CMOS process and evaluated successfully on a real-time test board.

19.
The transform coding of images is analyzed from a common standpoint in order to generate a framework for the design of optimal transforms. It is argued that all transform coders are alike in the way they manipulate the data structure formed by transform coefficients. A general energy compaction measure is proposed to generate optimized transforms with desirable characteristics particularly suited to the simple transform coding operation of scalar quantization and entropy coding. It is shown that the optimal linear decoder (inverse transform) must be an optimal linear estimator, independent of the structure of the transform generating the coefficients. A formulation that sequentially optimizes the transforms is presented, and design equations and algorithms for its computation are provided. The properties of the resulting transform systems are investigated. In particular, it is shown that the resulting bases are nonorthogonal and complete, producing energy compaction optimized, decorrelated transform coefficients. Quantization issues related to nonorthogonal expansion coefficients are addressed with a simple, efficient algorithm. Two implementations are discussed, and image coding examples are given. It is shown that the proposed design framework results in systems with superior energy compaction properties and excellent coding results.

20.
The High Efficiency Video Coding (HEVC) transform algorithm for residual coding uses 2-dimensional (2D) 4×4 transforms with higher precision than H.264's 4×4 transforms, resulting in increased hardware complexity. In this paper, we present a shared architecture that can compute the 4×4 forward discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) of HEVC using a new mapping scheme in the video processor array structure. The architecture is implemented with only adders and shifters to achieve an area-efficient design. The proposed architecture is synthesized using ISE14.7 and implemented using the BEE4 platform with the Virtex-6 FF1759 LX550T field programmable gate array (FPGA). The results show that the video processor array structure achieves a maximum operating frequency of 165.2 MHz. The architecture and its implementation are presented in this paper to demonstrate its programmability and high performance.
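How a transform stage can be built from adders and shifts alone can be sketched for a single 4-point column. The example below assumes the commonly cited HEVC 4-point core matrix with coefficients 64, 83 and 36, and uses the decompositions 83 = 64 + 16 + 2 + 1 and 36 = 32 + 4. It is a sketch, not the paper's mapping scheme: the helper names are hypothetical, and the standard's intermediate rounding and right-shift stages are omitted.

```python
# 4-point HEVC core transform matrix (assumed here; scaling stages omitted).
HEVC_DCT4 = [
    [64, 64, 64, 64],
    [83, 36, -36, -83],
    [64, -64, -64, 64],
    [36, -83, 83, -36],
]


def mul83(v):
    """Multiply by 83 using shifts and adds: 83 = 64 + 16 + 2 + 1."""
    return (v << 6) + (v << 4) + (v << 1) + v


def mul36(v):
    """Multiply by 36 using shifts and adds: 36 = 32 + 4."""
    return (v << 5) + (v << 2)


def dct4_shift_add(x):
    """One 4-point forward transform using an even/odd butterfly."""
    e0, e1 = x[0] + x[3], x[1] + x[2]   # even part
    o0, o1 = x[0] - x[3], x[1] - x[2]   # odd part
    return [
        (e0 + e1) << 6,                 # row 0: 64 * (x0 + x1 + x2 + x3)
        mul83(o0) + mul36(o1),          # row 1
        (e0 - e1) << 6,                 # row 2
        mul36(o0) - mul83(o1),          # row 3
    ]


def dct4_matrix(x):
    """Reference result by plain matrix-vector multiplication."""
    return [sum(c * v for c, v in zip(row, x)) for row in HEVC_DCT4]


residual = [12, -7, 3, 25]
fast = dct4_shift_add(residual)
reference = dct4_matrix(residual)
```

A 2D 4×4 transform would apply this routine to every column and then to every row of the intermediate result; the inverse transform uses the transposed matrix and admits the same kind of butterfly, which is what makes a shared forward/inverse datapath plausible.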
