Found 20 similar documents; search took 31 ms
1.
Tay-Jyi Lin Hung-Yueh Lin Chie-Min Chao Chih-Wei Liu Chein-Wei Jen 《The Journal of VLSI Signal Processing》2006,42(2):127-138
A multimedia system-on-a-chip (SoC) usually contains one or more programmable digital signal processors (DSP) to accelerate
data-intensive computations. But most of these DSP cores are designed originally for standalone applications, and they must
have some overlapped (and redundant) components with the host microprocessor. This paper presents a compact DSP for multi-core
systems, which is fully programmable and has been optimized to execute a set of signal processing kernels very efficiently.
The DSP core was designed concurrently with its automatic software generator based on high-level synthesis. Moreover, it performs
lightweight arithmetic, namely static floating-point (SFP), which approximates the quality of floating-point (FP) operations with
hardware similar to that of integer arithmetic. In our simulations, the compact DSP and its auto-generated software
achieve 3× the performance (estimated in cycles) of the DSP cores in dual-core baseband processors with similar computing resources.
In addition, the 16-bit SFP has a signal to round-off noise ratio above 40 dB relative to IEEE single-precision FP, and it even outperforms
hand-optimized programs based on 32-bit integer arithmetic. The 24-bit SFP has above 64 dB quality, and its maximum
precision is identical to that of single-precision FP. Finally, the DSP core has been implemented and fabricated in the
UMC 0.18 μm 1P6M CMOS technology. It can operate at 314.5 MHz while consuming 52 mW average power. The core size is only 1.5 mm × 1.5 mm,
including the 16 KB on-chip memory and the AMBA AHB interface.
This work was supported by the National Science Council, Taiwan, under Grant NSC93-2220-E-009-017. The authors also
thank the National Chip Implementation Center (CIC) for chip fabrication.
Tay-Jyi Lin received the BS degree in electrical and control engineering from National Chiao Tung University, Taiwan, in 1998. He is
working toward the PhD degree in the Department of Electronics Engineering and the Institute of Electronics, National Chiao
Tung University. His current research includes heterogeneous computing platforms for embedded multimedia systems, complexity-aware
architecture design, and high-performance/low-power digital signal processors.
Hung-Yueh Lin received the BS and the MS degrees in electronics engineering from National Chiao Tung University, Taiwan, in 2002 and 2004,
respectively. He is now with MediaTek, Inc., Hsinchu, Taiwan. His research interests include lightweight computer arithmetic
and DSP architecture.
Chie-Min Chao received the BS degree in electronics engineering from National Chiao Tung University, Taiwan, in 2003, where he is currently
pursuing his MS degree. His research includes system software development, VLSI system design, and DSP architecture.
Chih-Wei Liu received the BS and the PhD degrees in electrical engineering from National Tsing Hua University, Taiwan, in 1991 and 1999,
respectively. From 1999 to 2000, he was an integrated circuit design engineer at the Electronics Research and Service Organization
(ERSO) of Industrial Technology Research Institute (ITRI), Taiwan. Then, near the end of 2000, he started to work for the
SoC Technology Center (STC) of ITRI as a project leader and eventually left ITRI at the end of Oct., 2003. He is currently
with the Department of Electronics Engineering and the Institute of Electronics, National Chiao Tung University, Taiwan, as
an assistant professor. His current research interests include SoC and VLSI system design, processor architecture, digital
signal processing, digital communications, and coding theory.
Chein-Wei Jen received the BS degree from National Chiao Tung University, Taiwan, in 1970, the MS degree from Stanford University in 1977,
and the PhD degree from National Chiao Tung University in 1983. From 1981 to 2004, he was with the Department of Electronics
Engineering and the Institute of Electronics at National Chiao Tung University. Dr Jen was given the Outstanding Electrical
Engineering Professor Award by the Chinese Institute of Electrical Engineering in 2002. He is currently the General Director
of the SoC Technology Center at Industrial Technology Research Institute, the Adviser of National SoC Program, and the Managing
Director of the Board of the Taiwan IC Design Society. His research interests include SoC design, VLSI architectures, multimedia
processing, and design automation. He holds seven patents and has published over 50 journal and 100 conference papers in these
areas.
2.
We developed a pipelined scheduling technique for functional hardware and software modules in platform‐based system‐on‐a‐chip (SoC) designs. It is based on a modified list scheduling algorithm. We used the pipelined scheduling technique for a performance analysis of an MPEG4 video encoder application. Then, we applied it to architecture exploration to achieve better performance. In our experiments, the modified SoC platform with 6 pipelines for the 32‐bit dual‐layer architecture shows a 118% improvement in performance compared to the given basic SoC platform with 4 pipelines for the 16‐bit single‐layer architecture.
3.
June‐Young Chang Won‐Jong Kim Young‐Hwan Bae Jin Ho Han Han‐Jin Cho Hee‐Bum Jung 《ETRI Journal》2005,27(5):497-503
In this paper, we present a performance analysis for an MPEG‐4 video codec based on the on‐chip network communication architecture. The existing on‐chip buses of system‐on‐a‐chip (SoC) designs have limited data traffic bandwidth because a large number of silicon IPs share the bus. An on‐chip network is introduced to solve this problem of on‐chip buses by applying the concept of a computer network to the communication architecture of the SoC. We compared the performance of the MPEG‐4 video codec based on the on‐chip network with that based on the Advanced Micro‐controller Bus Architecture (AMBA) on‐chip bus. Experimental results show that the performance of the MPEG‐4 video codec based on the on‐chip network improves by over 50% compared to the design based on a multi‐layer AMBA bus.
4.
Digital Signal Processing (DSP) is widely used in high-performance media processing and communication systems. In the majority of these applications, critical DSP functions are realized as embedded cores to meet the low-power budget and high computational complexity. Usually these cores are ASICs that cannot be easily retargeted for other similar applications that share certain commonalities. This stretches the design cycle, which affects time-to-market constraints. In this paper, we present a reconfigurable high-performance low-power filter coprocessor architecture for DSP applications. The coprocessor architecture, apart from having the performance and power advantage of its ASIC counterpart, can be reconfigured to support a wide variety of filtering computations. Since filtering computations abound in DSP applications, the implementation of this coprocessor architecture can serve as an important embedded hardware IP.
5.
6.
7.
Özgün Paker Jens Sparsø Niels Haandbæk Mogens Isager Lars Skovby Nielsen 《The Journal of VLSI Signal Processing》2004,37(1):95-110
This paper describes a low-power programmable DSP architecture that targets audio signal processing. The architecture can be characterized as a heterogeneous multiprocessor consisting of small instruction set processors called mini-cores as well as standard DSP and CPU cores that communicate using message passing. The mini-cores are tailored for different classes of filtering algorithms (FIR, IIR, N-LMS etc.), and in a typical system the communication among processors occurs at the sampling rate only. The mini-cores are intended as soft-macros to be used in the implementation of system-on-chip solutions using a synthesis-based design flow targeting a standard-cell implementation. They are parameterized in word-size, memory-size, etc. and can be instantiated according to the needs of the application. To give an impression of the size of a mini-core we mention that one of the FIR mini-cores in a prototype design has 16 instructions, a 32-word × 16-bit program memory, a 64-word × 16-bit data memory and a 25-word × 16-bit coefficient memory. Results obtained from the design of a prototype chip containing mini-cores for a hearing aid application demonstrate a power consumption that is only 1.5–1.6 times larger than a hardwired ASIC and more than 6–21 times lower than current state of the art low-power DSP processors. This is due to: (1) the small size of the processors and (2) a smaller instruction count for a given task.
8.
9.
10.
Optical half-band filters (total citations: 4; self-citations: 0; citations by others: 4)
This paper proposes two novel kinds of 2×2 circuit configurations for finite-impulse response (FIR) half-band filters. These configurations can be transformed into each other by a symmetric transformation and their power transmittance is identical. The configurations have only about half the elements of conventional FIR lattice-form filters. We derive a design algorithm for achieving desired power transmittance spectra. We also describe 2×2 circuit configurations for infinite-impulse response (IIR) half-band filters. These configurations are designed to realize arbitrary-order IIR half-band filter characteristics by extending the conventional half-band circuit configuration used in millimeter-wave devices. We discuss their filter characteristics and confirm that they have a power half-band property. We demonstrate design examples including FIR maximally flat half-band filters, an FIR Chebyshev half-band filter, and an IIR elliptic half-band filter.
11.
12.
Wonjong Kim Seungchul Kim Younghwan Bae Sungik Jun Youngsoo Park Hanjin Cho 《ETRI Journal》2003,25(6):510-516
In this paper, we describe the development of a platform‐based SoC of a 32‐bit smart card. The smart card uses a 32‐bit microprocessor for high performance and two cryptographic processors for high security. It supports both contact and contactless interfaces, which comply with ISO/IEC 7816 and 14443 Type B. It has a Java Card OS to support multiple applications. We modeled smart card readers with a foreign language interface for efficient verification of the smart card SoC. The SoC was implemented using 0.25 µm technology. To reduce the power consumption of the smart card SoC, we applied power optimization techniques, including clock gating. Experimental results show that the power consumption of the RSA and ECC cryptographic processors can be reduced by 32% and 62%, respectively, without increasing the area.
13.
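Optimized Design and DSP Implementation of IIR Digital Filters (total citations: 3; self-citations: 0; citations by others: 3)
The paper first explains how the direct form II IIR (infinite impulse response) digital filter can overcome the input-data overflow problem that arises when an IIR digital filter is implemented on a fixed-point DSP. MATLAB is then used to generate the filter's input data and coefficients, apply the corresponding data compression (scaling), and produce simulation waveforms. Finally, a complete program implementing the direct form II IIR digital filter in DSP language is given together with simulation results, which are analyzed and compared.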
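The direct form II structure named in the abstract above can be sketched as follows. This is a minimal floating-point illustration, not the paper's DSP program: the function name `iir_direct_form_2` and the convention that `a[0]` equals 1 are assumptions made here, and no fixed-point scaling or overflow handling is modeled.

```python
def iir_direct_form_2(b, a, x):
    """Filter the sequence x with a direct form II IIR structure.

    b: feed-forward coefficients b[0..M]
    a: feedback coefficients a[0..N], with a[0] assumed to be 1
    The structure keeps a single shared delay line w for both the
    recursive and the non-recursive part.
    """
    order = max(len(a), len(b)) - 1
    # Zero-pad both coefficient lists to the same length.
    b = list(b) + [0.0] * (order + 1 - len(b))
    a = list(a) + [0.0] * (order + 1 - len(a))
    w = [0.0] * order  # w[k-1] holds w[n-k]
    y = []
    for sample in x:
        # Recursive part: w[n] = x[n] - sum_k a[k] * w[n-k]
        w0 = sample - sum(a[k] * w[k - 1] for k in range(1, order + 1))
        # Non-recursive part: y[n] = sum_k b[k] * w[n-k]
        y.append(b[0] * w0 + sum(b[k] * w[k - 1] for k in range(1, order + 1)))
        # Shift the delay line by one sample.
        w.insert(0, w0)
        w.pop()
    return y
```

For instance, `iir_direct_form_2([1.0], [1.0, -0.5], [1.0, 0.0, 0.0, 0.0])` computes the impulse response of y[n] = x[n] + 0.5·y[n-1]. On a fixed-point DSP the internal state `w` is the quantity that has to be scaled to stay within the word length.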
14.
Jia-Ming Chen Chun-Nan Liu Jen-Kuei Yang Shau-Yin Tseng Wei-Kuan Shih An-Yeu Wu 《Journal of Signal Processing Systems》2011,62(3):383-402
Two representative multimedia applications on the parallel architecture core (PAC) SoC, AAC and H.264/AVC decoders, are introduced
in this second of two introductory papers. The applications have been programmed on the PACDSP core and the PAC SoC
to demonstrate the high-performance, low-power DSP computations and the effectiveness of the dynamic voltage and frequency
scaling (DVFS) capability on the heterogeneous multicore SoC. First, techniques to exploit data- and instruction-level parallelisms
existing in the application kernels are described for performance optimizations on the clustered VLIW architecture of PACDSP
with the distributed register organization. Next, two variants of the asymmetric programming model are introduced
through decoder examples. Then, the energy efficiency of the programmable multimedia SoC is demonstrated using an innovative
power-aware H.264/AVC decoder. Finally, a DVFS-aware framework for soft real-time video playback is provided by extending
the power-aware decoding scheme. The work provides practical references of realizing multimedia applications on PAC SoC suitable
for rich-function and resource-constrained portable devices.
15.
《Microelectronics Journal》2002,33(5-6):501-508
This paper proposes the FPGA implementation of the digit-serial Canonical Signed-Digit (CSD) coefficient FIR filters which can be used as format conversion filters in place of the ones employed for the MPEG2 TM 5 (test model 5). Canonical representation of a signed digit (CSD) is a method used to reduce cost by representing a signed number using the fewest non-zero digits, thereby reducing the number of multiply operations. As Field Programmable Gate Arrays (FPGAs) have grown in capacity, improved in performance, and decreased in cost, they are becoming a viable solution for performing computationally intensive tasks, with the ability to tackle applications formerly reserved for custom chips and programmable digital signal processing (DSP) devices. A digit-serial CSD FIR filter design is realized and practical design guidelines are provided using FPGAs. An analysis of the performance comparison of bit-serial, serial distributed arithmetic, and digit-serial CSD FIR filters on a Xilinx XC4000XL-series FPGA is described. The results show that the proposed digit-serial CSD FIR filter is a compact and efficient implementation of real-time DSP applications on FPGAs.
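To make the CSD idea in the abstract above concrete, here is a small sketch of converting an integer coefficient to canonical signed-digit form and using it for a shift-and-add multiply. The function names `to_csd` and `csd_multiply` are hypothetical, and the code illustrates the number representation only, not the paper's digit-serial FPGA architecture.

```python
def to_csd(value):
    """Return the CSD digits of an integer, least significant first.

    Each digit is -1, 0, or +1, and no two adjacent digits are both
    non-zero, which is the defining property of the canonical form.
    """
    digits = []
    n = value
    while n != 0:
        if n % 2 == 0:
            digit = 0
        else:
            # n mod 4 == 1 gives +1 and n mod 4 == 3 gives -1, so the
            # remaining value is divisible by 4 and the next digit is 0.
            digit = 2 - (n % 4)
        digits.append(digit)
        n = (n - digit) // 2
    return digits


def csd_multiply(x, digits):
    """Multiply integer x by a CSD coefficient using shifts and adds only."""
    acc = 0
    for position, digit in enumerate(digits):
        if digit == 1:
            acc += x << position
        elif digit == -1:
            acc -= x << position
    return acc
```

As an example, 23 is 10111 in binary (four non-zero bits), while `to_csd(23)` has three non-zero digits (32 - 8 - 1), so a multiplierless coefficient multiply needs one fewer add or subtract.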
16.
Reconfigurable Computing for Digital Signal Processing: A Survey (total citations: 6; self-citations: 0; citations by others: 6)
Steady advances in VLSI technology and design tools have extensively expanded the application domain of digital signal processing over the past decade. While application-specific integrated circuits (ASICs) and programmable digital signal processors (PDSPs) remain the implementation mechanisms of choice for many DSP applications, increasingly new system implementations based on reconfigurable computing are being considered. These flexible platforms, which offer the functional efficiency of hardware and the programmability of software, are quickly maturing as the logic capacity of programmable devices follows Moore's Law and advanced automated design techniques become available. As initial reconfigurable technologies have emerged, new academic and commercial efforts have been initiated to support power optimization, cost reduction, and enhanced run-time performance. This paper presents a survey of academic research and commercial development in reconfigurable computing for DSP systems over the past fifteen years. This work is placed in the context of other available DSP implementation media including ASICs and PDSPs to fully document the range of design choices available to system engineers. It is shown that while contemporary reconfigurable computing can be applied to a variety of DSP applications including video, audio, speech, and control, much work remains to realize its full potential. While individual implementations of PDSP, ASIC, and reconfigurable resources each offer distinct advantages, it is likely that integrated combinations of these technologies will provide more complete solutions.
17.
In this article we provide a framework for controlling the bit rate of multiple prerecorded MPEG video sequences by choosing the quantization factors assigned to individual sources in a way that the total mean square error at the output of the encoder is minimized. We propose and test a knapsack model for the selection of the quantization factors. Our computations based on a set of relatively diverse video sequences reveal that the proposed model achieves a high utilization of the available bandwidth and acceptable distortion levels without any data loss.
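One way to read the selection problem in the abstract above is as a multiple-choice knapsack: each source picks exactly one quantization factor, each choice has a bit rate and a mean square error, and the total rate must fit the available bandwidth. The sketch below solves that formulation by dynamic programming over integer rates. The formulation, the function name `select_quantizers`, and the `(rate, mse)` data layout are assumptions made for illustration; the abstract does not give the authors' exact model.

```python
def select_quantizers(options, budget):
    """Pick one (rate, mse) option per source to minimize total MSE.

    options: list with one entry per source; each entry is a list of
             (rate, mse) tuples, one per candidate quantization factor,
             with integer rates (hypothetical layout).
    budget:  maximum total rate.
    Returns (total_mse, chosen_indices), or None if no combination fits.
    """
    # best maps a total rate to the lowest-MSE selection reaching it.
    best = {0: (0.0, [])}
    for source in options:
        nxt = {}
        for used, (mse, picks) in best.items():
            for idx, (rate, err) in enumerate(source):
                total = used + rate
                if total > budget:
                    continue  # this choice would exceed the bandwidth
                cand = (mse + err, picks + [idx])
                if total not in nxt or cand[0] < nxt[total][0]:
                    nxt[total] = cand
        best = nxt
    if not best:
        return None
    return min(best.values(), key=lambda entry: entry[0])
```

With two sources offering `[(4, 1.0), (2, 3.0)]` and `[(5, 2.0), (3, 5.0)]` and a budget of 7, the sketch gives the first source its coarser option so that the second can keep its finer one, because that pairing has the lowest total error among the combinations that fit.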
18.
19.
As DSP (Digital Signal Processing) applications become more complex, there is also a growing need for new architectures supporting efficient high-level language compilers. We try to synthesize a new DSP processor architecture by adding several DSP processor specific features to a RISC core that has a compiler friendly structure, such as many general-purpose registers and orthogonal instructions. The synthesized digital signal processor supports single-cycle MAC (Multiply-and-ACcumulate), direct memory access, automatic address generation, and hardware looping capabilities in addition to ordinary RISC instructions. The compiler for the new architecture is quickly implemented by developing a code-converter that modifies the assembly codes that are generated by the RISC compiler. The performance effects of adding each of these as well as all the combined features are evaluated using seven DSP-kernel benchmarks, a QCELP vocoder, and an MPEG video decoder. The effects of CPU clock frequency change due to the addition of these features are also considered. Finally, we also compare the performances with several existing DSP processors, such as TMS320C3x, TMS320C54x, and TMS320C5x.
20.
Les Mintzer 《The Journal of VLSI Signal Processing》1993,6(2):119-127
Distributed arithmetic techniques are the key to efficient implementation of DSP algorithms in FPGAs. The distributed arithmetic process is briefly described. A representative DSP design application in the form of an 8-tap FIR filter is offered for the Xilinx XC3042 field programmable logic array (FPGA). The design is presented in sufficient detail, from filter specifications via filter design software through detailed logic of salient data and control functions, to obtain a realistic placing and routing of configurable logic blocks (CLBs) and in/out blocks (IOBs) for simulation verification and performance evaluation vis-a-vis commercially available dedicated 8-tap FIR filter chips.
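The distributed arithmetic process mentioned in the abstract above replaces multipliers with a lookup table addressed by one bit from each input sample. The following software sketch models that bit-serial computation for integer coefficients and two's-complement samples. The names `build_lut` and `da_dot` are hypothetical, and the code models the arithmetic only, not the XC3042 logic design.

```python
def build_lut(coeffs):
    """Precompute every partial sum of the coefficients.

    Entry `addr` holds the sum of coeffs[k] over all k whose bit is set
    in addr, so the table has 2**len(coeffs) entries.
    """
    taps = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << taps)
    ]


def da_dot(lut, samples, bits):
    """Inner product of the coefficients with the samples, without multiplies.

    samples: integers representable in `bits`-bit two's complement.
    Each pass takes bit b of every sample to form a table address,
    then adds the looked-up partial sum weighted by 2**b.
    """
    acc = 0
    for b in range(bits):
        addr = 0
        for k, s in enumerate(samples):
            addr |= ((s >> b) & 1) << k
        partial = lut[addr] << b
        if b == bits - 1:
            acc -= partial  # the sign bit carries negative weight
        else:
            acc += partial
    return acc
```

An 8-tap filter would use a 256-entry table and call `da_dot` once per output sample on the most recent eight inputs; the table size doubles with every added tap, which is why hardware designs often split the taps across several smaller tables.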