Found 20 similar documents; search time 15 ms.
1.
《Simulation Practice and Theory》1997,5(2):137-151
The speedup factor in real time simulation of dynamic systems using multiprocessor resources depends on the architecture of the multiprocessor system, the type of interconnection between parallel processors, the numerical methods and techniques used for discretization, and the task assignment and scheduling policy. Minimizing the number of processors needed for real time simulation requires minimizing the processor time spent on interprocessor communication and an efficient scheduling policy. Therefore, this article presents a methodology for the real time simulation of dynamic systems, including a new pre-emptive static assignment and scheduling policy. The advantages of applying a digital signal processor with a parallel architecture, for example the TMS320C40, to real time simulation are described. Some important features of real time architectures necessary for efficient multiprocessor real time simulation, such as multiple I/O channels, concurrent I/O and CPU processing, direct high speed interprocessor communications, fast context switching, multiple busses, multiple memories, and powerful arithmetic units, are inherent to this processor. These features minimize interprocessor communication time and maximize sustained CPU performance.
2.
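The continuing development of precision guidance technology places ever higher performance demands on missile-borne computers, and only parallel processing with multiple processors can meet them. The TMS320C6416 digital signal processor is one of the most powerful digital signal processors currently available. This paper briefly reviews precision guidance technology and proposes a missile-borne computer design that uses multiple TMS320C6416 processors working in parallel. Test results show that a missile-borne computer built to this design meets the requirements of real-time image matching.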
3.
《Microprocessors and Microsystems》1986,10(5):251-257
A solution to the development of hardware for realtime signal processing applications using an existing Unix-based microprocessor development system is described. The system used is based upon a single special purpose digital signal processor controlled by a general purpose microprocessor. The modularity of the approach allows the possibility of control of several signal processing modules performing concurrent tasks. A particularly attractive feature of this development system is that it may be extended at low cost to accommodate new processors as they become available from manufacturers.
4.
Sung W. Mitra S.K. Jeren B. 《Parallel and Distributed Systems, IEEE Transactions on》1992,3(1):110-120
An efficient real-time implementation of digital filtering algorithms using a multiprocessor system in a ring network is investigated. This method is based on a parallel block processing approach, where continuously supplied input data are divided into blocks, and the blocks are processed concurrently by being assigned to each processor in the system. This approach requires only a simple interconnection network and significantly reduces the number of communications among the processors, making the system easily expandable and highly efficient. In addition, various digital signal processing algorithms can be implemented on the same multiprocessor system. The data dependency of the blocks to be processed concurrently brings on dependency problems between the processors. A systematic scheduling method has been developed by using a precedence graph for the analysis of the dependency relation. Methods for solving the dependency problems between the processors are also investigated. Implementation procedures and results for FIR, recursive, and adaptive filtering algorithms are illustrated.
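The parallel block processing idea in this abstract can be sketched roughly as follows for the FIR case. The input is split into blocks, each block is filtered on its own (as a separate processor could do), and the inter-block dependency is resolved by passing the `len(h) - 1` input samples that precede each block. The function names, the block size, and the use of plain Python lists are assumptions made for illustration; this is not the authors' implementation, and the ring network and the recursive and adaptive cases are not modeled.

```python
def fir_direct(x, h):
    """Reference FIR filter: y[n] = sum_k h[k] * x[n - k], zero initial state."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def fir_block(block, history, h):
    """Filter one block, given the len(h) - 1 input samples that precede it."""
    padded = history + block
    offset = len(history)
    return [
        sum(h[k] * padded[offset + n - k] for k in range(len(h)))
        for n in range(len(block))
    ]


def fir_blockwise(x, h, block_size):
    """Split x into blocks; each block could be handed to a different processor."""
    taps = len(h) - 1
    y = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        # History is taken from the input stream, not from another block's
        # output, so FIR blocks do not wait on each other's results.
        history = x[max(0, start - taps):start]
        history = [0.0] * (taps - len(history)) + history
        y.extend(fir_block(block, history, h))
    return y
```

For a recursive (IIR) filter the history would include previous outputs, which is where the dependency between processors described in the abstract becomes a real scheduling problem.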
5.
6.
7.
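This paper details the principle, design, and operating flow of an adaptive filter circuit, built around the TMS32020 signal processor, that suppresses background interference in a fuze, and analyzes the system's practical application and its results. The TMS32020 is used in a minimum configuration and executes the program from its internal program RAM; the algorithm is the linear LMS algorithm.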
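For readers unfamiliar with the LMS algorithm named in this abstract, a minimal floating-point sketch of the standard LMS weight update is given below. It is a generic textbook form, not the paper's TMS32020 code: the function name `lms_filter`, the step size `mu`, and the tap count are all illustrative assumptions, and a fixed-point DSP implementation would differ in arithmetic and scaling.

```python
def lms_filter(x, d, num_taps, mu):
    """Adapt FIR weights w so that the filter output tracks the desired signal d.

    x: input samples, d: desired samples (same length as x),
    num_taps: number of filter weights, mu: step size (assumed small enough
    for stability). Returns the final weights and the error sequence.
    """
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Most recent num_taps inputs, newest first, zero-padded at the start.
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))          # filter output
        e = d[n] - y                                      # instantaneous error
        w = [wk + mu * e * uk for wk, uk in zip(w, u)]    # LMS weight update
        errors.append(e)
    return w, errors
```

In an interference-suppression setting the error sequence is usually the signal of interest, since the adapted filter output models the correlated background component.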
8.
We study digital clock synchronization for multiprocessor systems, where processors are triggered by a common clock pulse and communicate with others via shared memory. A self-stabilizing digital clock synchronization protocol for systems with a general communication graph is presented. The protocol can commence in an arbitrary non-consistent system state and converges to a legitimate state in which the clocks are synchronized and incremented by one in every subsequent pulse. To enhance the fault-tolerance of our protocol, we allow that during and following convergence processors may stop operating. Crash failures may partition the communication graph into several connected components. Our protocol synchronizes the clocks of the processors in every such connected component. For the case in which faulty processors can exhibit Byzantine behavior, we prove that there is no digital clock synchronization protocol that tolerates even one single faulty processor.
9.
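Two Schemes for Dual-Supply Power Circuits for TMS320C2XX-Series DSPs
TMS320-series DSPs offer high performance and low cost and are used in many systems. To reduce system power consumption effectively, it is necessary to power the DSP chip from dual supplies. The TMS320C206 is a 16-bit fixed-point DSP in the TMS320C2000 series that uses dual supplies. For different system input voltages, this paper presents two dual-supply power circuits for the TMS320C206; the schemes also apply to other DSPs powered from 5 V/3.3 V supplies.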
10.
A low cost, high-speed, general-purpose digital signal processing system was constructed using the TMS32010 digital signal processor. The system was designed with simplicity, compactness, flexibility and expandability in mind. A parallel processing architecture was adopted to achieve realtime performance. Four processors were used in the prototype system, but this can be expanded easily. Interprocessor data transfer and communications with the host computer are facilitated via a single common bus and a bank of shared memory. A one-dimensional digital FIR filter and a realtime FFT program were used to evaluate the performance of the system. In addition, a realtime spectrogram was implemented as an application example.
11.
The Radon transform and its inverse (a filtered backprojection) are receiving increasing attention for applications in image reconstruction. As data collection capabilities and image reconstruction algorithms have become more sophisticated, the computational intensity of these problems has drastically increased. Parallel processing techniques are being used to implement high-speed hardware designs that will speed up this computationally burdensome task. Parallel arrays of digital signal processing (DSP) chips may be used to compute the Radon transform and back-projection for high-speed image reconstruction. In this paper we describe computation of the Radon transform and back-projection using a parallel pipelined processor architecture of DSP chips and evaluate the accuracy of the computations and the quality of the reconstructed images. To justify the computational approach selected, alternative procedures for computation of the Radon transform and back-projection are described, and their performance using the 32-bit fixed-point arithmetic of the selected DSP chips is compared. We present, evaluate, and compare the simulated performances of implementations of these procedures on two fixed-point DSP chips: the TI TMS32020 and the AT&T DSP16.
12.
G.E. Plassman J. Sobieszczanski-Sobieski 《Structural and Multidisciplinary Optimization》2001,22(2):102-115
Numerical experiments were conducted to find out the extent to which a Genetic Algorithm (GA) may benefit from a multiprocessor implementation, considering, on one hand, that analyses of individual designs in a population are independent of each other so that they may be executed concurrently on separate processors, and, on the other hand, that there are some operations in a GA that cannot be so distributed. The algorithm experimented with was based on a Gaussian distribution rather than bit exchange in the GA reproductive mechanism, and the test case was a hub frame structure of up to 1080 design variables. The experimentation engaging up to 128 processors confirmed expectations of radical elapsed time reductions compared to a conventional single processor implementation. It also demonstrated that the time spent in the nondistributable parts of the algorithm and the attendant cross-processor communication may have a very detrimental effect on the efficient utilization of the multiprocessor machine and on the number of processors that can be used effectively in a concurrent manner. Three techniques were devised and tested to mitigate that effect, resulting in efficiency increasing to exceed 99 percent. Of particular interest to the user, corresponding elapsed time compression factors approaching 128 are realized on 128 processors.
Received October 18, 2000
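To put the reported figures in perspective, the sketch below uses Amdahl's law, which the abstract does not name and which is brought in here only as a simple model, to relate the nondistributable fraction of the work to speedup and efficiency. The function names are hypothetical, and communication cost is treated as part of the serial fraction, which is a simplifying assumption.

```python
def amdahl_speedup(serial_fraction, processors):
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def efficiency(speedup, processors):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup / processors


def serial_fraction_for_efficiency(target_efficiency, processors):
    """Serial fraction s that yields the target efficiency under Amdahl's law.

    Efficiency E = 1 / (1 + s * (P - 1)), so s = (1 / E - 1) / (P - 1).
    """
    return (1.0 / target_efficiency - 1.0) / (processors - 1)
```

Under this model, 99 percent efficiency on 128 processors requires a serial fraction below roughly one part in ten thousand, which suggests why mitigating the nondistributable parts of the algorithm mattered so much in the experiments.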
13.
14.
Small-scale shared-memory multiprocessors are commonly used in a workgroup environment where multiple applications, both parallel and sequential, are executed concurrently while sharing the processors and other system resources. To utilize the processors efficiently, an effective allocation strategy is required. In this paper, we use performance data obtained from an SGI multiprocessor to evaluate several processor allocation strategies when running two parallel programs simultaneously. We examine gang scheduling (coscheduling), static space-sharing (space partitioning), and a dynamic allocation scheme called loop-level process control (LLPC) with three different dynamic allocation heuristics. We use regression analysis to quantify the measured data and thereby explore the relationship between the degree of parallelism of the application, specific system parameters (such as the size of the system), the processor allocation strategy, and the resulting performance. This study shows that dynamically partitioning the system using LLPC or similar heuristics provides better performance for applications with a high degree of parallelism than either gang scheduling or static space-sharing.
15.
This paper describes the results of research on implementing applications for concurrent execution on the CYBERPLUS multiparallel computer system. Three layers of parallelism are built into this system: instruction, computation and functional levels. The paper contains a short summary of the hardware and of the different layers of the software for multiprocessor systems. They are based on the previously developed CPFTN (Cyberplus Fortran Compiler) compiler, which produces optimized object code for a single CYBERPLUS processor, exploiting through a series of horizontal and vertical optimizations the internal parallelism present in this processor. Unlike other compilers, the input to CPFTN consists of an entire kernel containing all subroutines and functions loaded into a single processor. An interprocessor communications subsystem provides the low-level component of the multiprocessor system and contains the data transfer and synchronization capabilities. The high-level component consists of extensions to FORTRAN (CPMFTN, Cyberplus Multiprocessor Fortran) for characterizing the data shared between processors and implements concepts specially adapted to the private memory architecture of the CYBERPLUS as well as to the specific needs of the user community targeted by this product. The resulting scheme is a large grain size, demand driven, data flow type. It is assumed that partitioning between processors can best be done on the basis of the high-level knowledge of the problem possessed by the user. The corresponding kernels may be created independently of each other, using the new language features introduced to characterize data shared between processors. Scheduling, synchronization, and data transfer operations are automatically distributed by the compiler across the subroutines and functions constituting the grain size of the partition.
The first version of CPMFTN under development is described, followed by several simple application paradigms and a discussion of the merits of possible future extensions.
16.
17.
An efficient parallel architecture is proposed for high-performance multimedia data processing using multiple multimedia video processors (MVP; TMS320C80), which are fully programmable general digital signal processors (DSP). This paper describes several requirements for a multimedia data processing system and the system architecture of an image computing system called the KAIST Image Computing System (KICS). The performance of the KICS is evaluated in terms of its I/O bandwidth and the execution time for some image processing functions. An application of the KICS to the real-time Moving Picture Expert Group 2 (MPEG-2) encoder is introduced. The programmability and the high-speed data-access capability of the KICS are its most important features as a high-performance system for real-time multimedia data processing.
18.
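This paper introduces the application characteristics, main performance figures, internal system architecture, and functional block diagram of TI's new-generation DSP chip, the TMS320C80 multimedia video processor (MVP), and briefly describes an example application system built with the C80. Finally, the application prospects of the C80 are discussed.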
19.
《Journal of Parallel and Distributed Computing》2005,65(4):464-478
Distributed-memory parallel computers and networks of workstations (NOWs) both rely on efficient communication over increasingly high-speed networks. Software communication protocols are often the performance bottleneck. Several current and proposed parallel systems address this problem by dedicating one general-purpose processor in a symmetric multiprocessor (SMP) node specifically for protocol processing. This protocol processing convention reduces communication latency and increases effective bandwidth, but also reduces the peak performance since the dedicated processor no longer performs computation. In this paper, we study a parallel machine with SMP nodes and compare two protocol processing policies: the Fixed policy, which uses a dedicated protocol processor; and the Floating policy, where all processors perform both computation and protocol processing. The results from synthetic microbenchmarks and five macrobenchmarks show that: (i) a dedicated protocol processor benefits light-weight protocols much more than heavy-weight protocols, (ii) a dedicated protocol processor is generally advantageous when there are four or more processors per node, (iii) multiprocessor node performance is not as sensitive to interrupt overhead as uniprocessor node performance because a message arrival is likely to find an idle processor on a multiprocessor node, thereby eliminating interrupts, (iv) the system with the lowest cost-performance will include a dedicated protocol processor when interrupt overheads are much higher than protocol weight, as in light-weight protocols.
20.
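To address the shortcomings of earlier analog signal processing circuits for fiber-optic gas sensors, an embedded signal processing system for fiber-optic gas sensors was designed and implemented based on the TMS320F2812 DSP chip. The paper focuses on the hardware implementation and the software development. The hardware consists mainly of interface circuitry, the high-precision operational amplifier OPA139P, the 32-bit high-performance fixed-point DSP (digital signal processor) TMS320F2812, and the D/A converter chip DAC7625. The software's main functions are generating the AC signal and performing filtering of the two input signals, phase-sensitive detection, division, and linearization.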