Similar Articles
20 similar articles found.
1.
The vision processor (VP) and vision controller (VC), two integrated products dedicated to video compression, are discussed. The chips implement the P×64, JPEG, and MPEG image compression standards. The VP forms the heart of the image compression system. It performs the discrete cosine transform (DCT), quantization, and motion estimation, as well as the inverse DCT and inverse quantization. The highly parallel, microcode-based processor performs all of the JPEG, MPEG, and P×64 algorithms. The VC smart microcontroller controls the compression process and provides the interface to the host system. It captures pixels from a video source, performs video preprocessing, supervises pixel compression by the VP, performs Huffman encoding, and passes the compressed data to the host over a buffered interface. It also takes compressed data from the host, performs Huffman decoding, supervises decompression via the VP, performs postprocessing, and generates digital pixel output for a video destination such as a monitor.
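As a point of reference for the transform and quantization stages mentioned above, the following is a minimal Python/NumPy sketch of an 8x8 DCT followed by uniform quantization, the core operation of JPEG/MPEG-style pipelines. It is generic textbook code, not the VP's microcode; the quantization table is a hypothetical flat table.

# Illustrative sketch of the DCT + quantization stage that JPEG/MPEG-style
# pipelines (such as the VP described above) apply to 8x8 pixel blocks.
# Generic reference code, not the VP's actual microcode.
import numpy as np

def dct_2d(block):
    """Type-II 2-D DCT of an 8x8 block (separable, direct formula)."""
    n = 8
    k = np.arange(n)
    # 1-D DCT basis: basis[u, x] = cos((2x + 1) * u * pi / (2n))
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    alpha = np.full(n, np.sqrt(2.0 / n))
    alpha[0] = np.sqrt(1.0 / n)
    c = alpha[:, None] * basis
    return c @ block @ c.T

def quantize(coeffs, q_table):
    """Uniform quantization of DCT coefficients by a quantization table."""
    return np.round(coeffs / q_table).astype(int)

# Example: one level-shifted 8x8 block with a flat quantization table of step 16.
block = np.arange(64, dtype=float).reshape(8, 8) - 128.0
q_table = np.full((8, 8), 16.0)          # hypothetical table
print(quantize(dct_2d(block), q_table))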

2.
Robotics and Computer (1994), 11(2): 91-98
A new model is presented to describe data-flow algorithms implemented in a multiprocessing system. Called the resource/data-flow graph (RDFG), the model explicitly represents cyclo-static processor schedules as circuits of processor arcs that reflect the order in which processors execute graph nodes. The model also makes it possible to guarantee that hard real-time deadlines are met. When unfolded, the model statically identifies the processor schedule. The model is therefore useful for determining the throughput and latency of systems with heterogeneous processors. The applicability of the model is demonstrated using a space surveillance algorithm.
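To make the idea of unfolding a cyclo-static schedule concrete, the sketch below rotates the assignment of the nodes of a simple chain over a set of processors and unfolds the schedule over many data sets to measure the steady-state iteration period. It is only an illustration under simplifying assumptions (homogeneous processors, made-up node times), not the RDFG formalism itself.

# Minimal illustration (not the RDFG formalism): unfold a rotating,
# cyclo-static assignment of chain nodes to processors and measure
# the steady-state iteration period and per-data-set latency.
node_times = [3.0, 2.0, 4.0]        # execution time of each graph node (hypothetical)
P = 3                               # number of (homogeneous) processors
ITERATIONS = 20                     # data sets to unfold

proc_free = [0.0] * P               # when each processor next becomes free
start, finish = {}, {}              # (data set, node) -> start / completion time
for i in range(ITERATIONS):
    for j, t in enumerate(node_times):
        p = (i + j) % P                              # cyclo-static processor assignment
        ready = finish.get((i, j - 1), 0.0)          # data dependency within the data set
        start[(i, j)] = max(ready, proc_free[p])     # also wait for the processor
        finish[(i, j)] = start[(i, j)] + t
        proc_free[p] = finish[(i, j)]

last = len(node_times) - 1
period = finish[(ITERATIONS - 1, last)] - finish[(ITERATIONS - 2, last)]
latency = finish[(ITERATIONS - 1, last)] - start[(ITERATIONS - 1, 0)]
print("steady-state iteration period:", period, "-> throughput:", 1.0 / period)
print("latency of one data set:", latency)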

3.
This paper analyzes architectural defects of existing hardware platforms that allow software vulnerabilities to be exploited. The authors propose to solve this problem by building a processor with a secure-by-design architecture, and requirements for such a processor are formulated in the paper. The authors also describe how virtualization technology is applied to model an existing processor and how this model is used to demonstrate the proposed approach.

4.
A pattern recognition system is described which employs normalized cross-correlation as a measure of similarity. A potential implementation is presented which is based on existing or feasible charge-coupled device discrete analog structures. Estimates of processing times are given.
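For reference, the normalized cross-correlation measure named above can be computed as follows; this is a plain NumPy sketch of the standard definition, not the charge-coupled-device implementation the paper proposes.

# Normalized cross-correlation (NCC) of two equally sized image patches.
import numpy as np

def normalized_cross_correlation(template, window):
    """NCC of two patches of the same shape; the result lies in [-1, 1]."""
    t = template - template.mean()
    w = window - window.mean()
    denom = np.sqrt((t * t).sum() * (w * w).sum())
    return float((t * w).sum() / denom) if denom > 0 else 0.0

# A perfect match (up to brightness and contrast changes) scores 1.0:
patch = np.random.rand(16, 16)
print(normalized_cross_correlation(patch, 2.0 * patch + 0.5))   # -> 1.0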

5.
A multipurpose neural processor for machine vision systems
A multitask neural network is proposed as a plausible visual information processor for performing a variety of real-time operations associated with the early stages of vision. The computational role performed by the processor, named the positive-negative (PN) neural processor, emulates the spatiotemporal information-processing capabilities of certain neural activity fields found along the human visual pathway. The state-space model of this visual information processor corresponds to a bilayered two-dimensional array of densely interconnected nonlinear processing elements (PEs). An individual PE represents the neural activity exhibited by a spatially localized subpopulation of excitatory or inhibitory nerve cells. Each PE may receive inputs from an external signal space as well as from itself and the neighboring PEs within the network. The information embedded in the external input data, which originates from a video camera or another processor, is extracted by the feedforward subnet. The feedback subnet of the PN neural processor generates a variety of transient and steady-state activities, whose computational roles are applicable to gray-level, edge, texture, or color information processing. Computer simulations involving gray-level image processing are used to illustrate the versatility of the PN neural processor architecture for machine vision system design.
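The sketch below illustrates the general flavor of such a bilayered excitatory/inhibitory field: a 2-D array of PEs, each driven by an external stimulus, its neighbors, and the opposing layer. The coupling constants, neighborhood, and nonlinearity are hypothetical choices for illustration, not the paper's actual state-space equations.

# Generic bilayered excitatory (E) / inhibitory (I) field of processing elements,
# in the spirit of the PN processor described above; all constants are illustrative.
import numpy as np

def step(E, I, stimulus, dt=0.1):
    """One discrete-time update of the excitatory and inhibitory layers."""
    f = lambda x: np.tanh(np.maximum(x, 0.0))                  # nonlinear PE activation
    lateral = lambda A: (np.roll(A, 1, 0) + np.roll(A, -1, 0) +
                         np.roll(A, 1, 1) + np.roll(A, -1, 1)) / 4.0   # neighbor input
    dE = -E + f(stimulus + 1.2 * lateral(E) - 1.5 * I)
    dI = -I + f(0.8 * lateral(E) - 0.5 * I)
    return E + dt * dE, I + dt * dI

E = np.zeros((64, 64))
I = np.zeros((64, 64))
stimulus = np.zeros((64, 64))
stimulus[24:40, 24:40] = 1.0        # a bright square as external input
for _ in range(200):
    E, I = step(E, I, stimulus)
print("steady-state excitatory activity range:", E.min(), E.max())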

6.
Growing demand for high-speed processing of streamed data (e.g. video streams, digital signal streams, communication streams) in advanced manufacturing environments requires adequate, cost-efficient stream-processing platforms. Platforms based on embedded microprocessors often cannot satisfy the performance requirements because of limitations associated with the sequential nature of their execution. During the last decade, development and prototyping of such embedded platforms has moved towards Field Programmable Gate Array (FPGA) devices. However, programming an application onto an FPGA-based platform remains an issue because of the relatively complicated hardware design process. The paper presents an approach that simplifies the application programming process by utilizing: (i) a uniform FPGA platform with a dynamically reconfigurable architecture, and (ii) a programming technique based on temporal partitioning of the application into segments that can be described in terms of macro-operators (function-specific virtual components). The paper describes the concept of the approach and presents an analytical investigation and experimental verification of the cost-effectiveness of the proposed platform compared with platforms based on sequential microprocessors. It is also shown that the approach can be beneficially utilized in collaborative design and manufacturing.

7.
A new type of high-performance array processor system is presented in this paper. Unlike conventional host-peripheral array processor systems, this system is designed with a functionally distributed approach. The design philosophy is described first. Then the hardware organizations of two concrete systems, namely the 150-AP and the GF-10/12, including the communication between processors, are shown. Some attractive aspects of system performance for user programs are also given.

8.
We present a novel architecture for developing Virtual Environments (VEs) for multicore CPU systems. An object-centric method provides a uniform representation of VEs. The representation enables VEs to be processed in parallel using a multistage, dual-frame pipeline. Dynamic work distribution and load balancing are accomplished using a thread-migration strategy with minimal overhead. This paper describes our approach, and performance experiments show that it is efficient and scalable. Near-linear speed-ups have been observed in experiments involving up to 1,000 deformable objects on a six-core i7 CPU. The approach's practicality is demonstrated with the development of a medical simulation trainer for a craniotomy procedure.

9.
IEEE Micro (2001), 21(2): 48-54
With higher sensor resolutions available, the speed and dynamic-range requirements for image processors in digital imaging systems are more demanding. A 12-bit, 50 Mpixels/s digital image acquisition system balances power and performance. The analog processor's total power dissipation is only 150 mW at full speed, an enviable quality for the portable market. Although the underlying technique requires special analog circuitry to handle fast gain changes, it achieves a far wider dynamic range.

10.
New generations of automobiles will include driver assistance systems requiring powerful, low-cost processors to handle video/camera applications and to enable fast, convenient application development. Shrinking feature sizes on processors already in development will bring substantial increases in system speed and functionality.

11.
This paper presents a CORDIC (Coordinate Rotation Digital Computer)-based split-radix fast Fourier transform (FFT) core for OFDM systems such as Ultra Wide Band (UWB), Asymmetric Digital Subscriber Line (ADSL), Digital Audio Broadcasting (DAB), Digital Video Broadcasting – Terrestrial (DVB-T), Very High Bitrate DSL (VHDSL), and Worldwide Interoperability for Microwave Access (WiMAX). The high-speed 128/256/512/1024/2048/4096/8192-point FFT processor has been implemented in a 0.18 μm (1P6M) process at 1.8 V, with all control signals generated internally. This programmable FFT processor outperforms conventional designs in terms of both power consumption and core area.
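The appeal of a CORDIC-based FFT is that the complex twiddle-factor multiplications reduce to shift-and-add rotations. Below is a textbook CORDIC rotation in Python for reference; it is generic illustrative code, not the paper's hardware design.

# Textbook CORDIC rotation: rotate (x, y) by `angle` using only
# shifts, adds, and a table of arctangents.
import math

def cordic_rotate(x, y, angle, iterations=24):
    """Rotate the vector (x, y) by `angle` radians (|angle| < ~1.74 rad)."""
    # Pre-computed gain of the iteration sequence; divided out at the end.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** (-i), y + d * x * 2.0 ** (-i)
        z -= d * math.atan(2.0 ** (-i))
    return x / gain, y / gain

# Rotating (1, 0) by 30 degrees should give (cos 30, sin 30):
print(cordic_rotate(1.0, 0.0, math.radians(30)))   # ~ (0.866, 0.5)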

12.
13.
In the Big Data era, the gap between storage performance and applications' I/O requirements is increasing. I/O congestion caused by concurrent storage accesses from multiple applications is inevitable and severely harms performance. Conventional approaches either focus on optimizing an application's access pattern individually or handle I/O requests at a low-level storage layer without any knowledge from the upper-level applications. In this paper, we present a novel I/O-aware bandwidth allocation framework to coordinate ongoing I/O requests on petascale computing systems. The motivation behind this design is that the resource management system has a holistic view of both the system state and jobs' activities, and can dynamically control jobs' status or allocate resources on the fly during their execution. We treat a job's I/O requests as periodic sub-jobs within its lifecycle and transform the I/O congestion issue into a classical scheduling problem. Based on this model, we propose a bandwidth management mechanism as an extension to the existing scheduling system. We design several bandwidth allocation policies with different optimization objectives, targeting either user-oriented metrics or system performance. We conduct extensive trace-based simulations using real job traces and I/O traces from a production IBM Blue Gene/Q system at Argonne National Laboratory. Experimental results demonstrate that our new design can improve job performance by more than 30% while also increasing system performance.
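To give the flavor of bandwidth allocation under congestion, here is a toy Python sketch in which jobs entering an I/O phase request bandwidth and the scheduler caps the sum of grants at the system's aggregate I/O bandwidth by proportional throttling. The policy, the capacity figure, and the job names are hypothetical, not the paper's actual allocation policies.

# Toy I/O bandwidth allocation: proportionally throttle requests that
# oversubscribe a hypothetical aggregate system bandwidth.
SYSTEM_BW = 100.0   # aggregate I/O bandwidth (GB/s), hypothetical

def allocate(demands):
    """demands: {job_id: requested GB/s}; returns {job_id: granted GB/s}."""
    total = sum(demands.values())
    if total <= SYSTEM_BW:
        return dict(demands)              # no congestion: grant everything
    scale = SYSTEM_BW / total             # proportional throttling
    return {job: bw * scale for job, bw in demands.items()}

# Three jobs entering their I/O phases at the same time oversubscribe the system:
print(allocate({"jobA": 60.0, "jobB": 50.0, "jobC": 40.0}))
# -> each job is throttled to 2/3 of its request (total capped at 100 GB/s)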

14.
Real-time applications of any microprocessor necessitate interfacing to a large variety of peripheral devices. Various interfacing techniques are discussed, with examples in which Intel's 8085 is taken as the typical microprocessor. The I/O transfers considered fall into two categories: memory-mapped transfers and I/O-mapped transfers. Both synchronous and asynchronous types are dealt with; bit masking and interrupt techniques are used for asynchronous memory-mapped I/O transfer. Also included are multiplexed channel transfers and interrupt transfers. The former are treated as a special class of I/O transfer. The latter are useful in applications where it cannot be predicted when data will arrive for transfer to the microprocessor. Unlike other types of transfer, interrupt transfers are initiated by the I/O devices and not by the microprocessor. They are subdivided into software- and hardware-polled transfers. Examples are given of daisy-chain and search-ring transfers.
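The polled, memory-mapped transfer with bit masking mentioned above can be illustrated as follows. This is a Python stand-in for what would be 8085 assembly on real hardware; the addresses, ready-bit position, and simulated device are all hypothetical.

# Polled memory-mapped transfer: repeatedly read a status register, mask out
# the ready bit, then read the data register.  Everything here is simulated.
STATUS_ADDR, DATA_ADDR, READY_MASK = 0x8000, 0x8001, 0x01

class SimulatedBus:
    """A trivially simulated memory bus with a device that becomes ready after a delay."""
    def __init__(self):
        self._polls_until_ready = 3
    def read(self, addr):
        if addr == STATUS_ADDR:
            self._polls_until_ready -= 1
            return READY_MASK if self._polls_until_ready <= 0 else 0x00
        if addr == DATA_ADDR:
            return 0x42                    # the byte the device delivers
        raise ValueError("unmapped address")

def polled_read(bus):
    """Busy-wait on the status register, then fetch one byte of data."""
    while not (bus.read(STATUS_ADDR) & READY_MASK):   # bit-mask the ready flag
        pass
    return bus.read(DATA_ADDR)

print(hex(polled_read(SimulatedBus())))   # -> 0x42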

15.
Bekerman, M., and Mendelson, A. IEEE Micro (1995), 15(5): 72-83
Using a CPI metric, we analyze the performance of Pentium-based systems and examine their use of the processor's architectural features under different software environments. We break the CPI down into its basic constituents and examine the effects of various operating systems and applications on it. This analysis indicates where an application spends its time during execution, giving designers a better understanding of design tradeoffs and potential causes of performance bottlenecks.
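The CPI-decomposition idea is simply that the measured cycles per instruction equal a base (ideal pipeline) CPI plus per-instruction stall contributions from events such as cache misses and branch mispredictions. The short sketch below shows the arithmetic with hypothetical component values; these are not measurements from the paper.

# Illustrative CPI breakdown with made-up component values.
components = {
    "base CPI (ideal pipeline)": 1.0,
    "instruction-cache misses":  0.15,
    "data-cache misses":         0.30,
    "branch mispredictions":     0.20,
    "pairing/resource stalls":   0.25,
}
total_cpi = sum(components.values())
for name, value in components.items():
    print(f"{name:30s} {value:5.2f}  ({100 * value / total_cpi:4.1f}% of CPI)")
print(f"{'total CPI':30s} {total_cpi:5.2f}")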

16.
Problem partitioning to solve ordinary differential equations on a parallel processor system using classical numerical integration methods involves defining and ordering computation tasks and scheduling the tasks for execution. In defining tasks there is a tradeoff between decomposing a computation into a large number of primitive tasks, to expose all potential parallelism, and decomposing it into a smaller number of tasks, to simplify scheduling. Scheduling is an intractable problem; heuristic scheduling algorithms reduce the effort required to schedule tasks but cannot guarantee that the parallel solution will execute in minimum time. An example illustrates the difficulties encountered in scheduling tasks for parallel computation and the use of a dependency graph as a tool in problem partitioning. The need for an efficient mechanism for asynchronous data exchanges among processors is demonstrated.
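A common heuristic of the kind alluded to above is list scheduling over a task dependency graph: repeatedly pick a ready task by some priority rule and place it on the earliest-free processor. The Python sketch below uses a made-up five-task graph and a longest-task-first priority; it is a generic illustration, not the scheduling method of the paper.

# Heuristic list scheduling of a dependency graph onto P processors.
task_time = {"a": 2, "b": 3, "c": 1, "d": 4, "e": 2}
deps = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"], "e": ["c"]}
P = 2

finish = {}                     # task -> completion time
proc_free = [0.0] * P
scheduled = []
while len(finish) < len(task_time):
    # ready tasks: all predecessors already scheduled
    ready = [t for t in task_time
             if t not in finish and all(d in finish for d in deps[t])]
    # heuristic priority: longest task first (many other rules are possible)
    task = max(ready, key=lambda t: task_time[t])
    p = min(range(P), key=lambda i: proc_free[i])        # earliest-free processor
    start = max(proc_free[p], max((finish[d] for d in deps[task]), default=0.0))
    finish[task] = start + task_time[task]
    proc_free[p] = finish[task]
    scheduled.append((task, p, start, finish[task]))

for task, p, start, end in scheduled:
    print(f"task {task} on processor {p}: {start} -> {end}")
print("makespan:", max(finish.values()))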

17.
The high speed needed to solve digital signal processing problems in real time has often given rise to multiple-processor hardware designs. Devices such as the TMS32020 digital signal processor possess features designed to support concurrent processing, but progress in this area is currently hampered by the lack of suitable multiprocessor development tools. It is suggested that an incremental approach to multiprocessor development, using several methods of simulating the signal processor, may be used. Two simulation environments specifically for the development and testing of multiple digital signal processor designs are described. The first is a single-processor simulation system in which the algorithms that will be performed by other concurrent processors may be executed in a high-level language, without any need to simulate the instructions of the other processors. The second is a multiple-TMS32020 digital signal processor system in which the processors are simulated as several communicating tasks on a host computer using the IBM AIX (UNIX-derived) multitasking operating system.

18.
Modern microprocessor design relies heavily on detailed full-chip performance simulations to evaluate complex trade-offs. Typically, different design alternatives are tried out for a specific sub-system or component while keeping the rest of the system unchanged. We observe that full-chip simulation for such studies is overkill. This paper introduces mesoscale simulation, which employs high-level modeling for the unchanged parts of a design and uses detailed cycle-accurate simulations for the components being modified. This combination of high-level and low-level modeling enables accuracy on par with detailed full-chip modeling while achieving much higher simulation speeds. Consequently, mesoscale models can be used to quickly explore vast areas of the design space with high fidelity. We describe a proof-of-concept mesoscale implementation of the memory subsystem of the Cell/B.E. processor and discuss results from running various workloads.

19.
20.
Multiple processor systems are an integral part of today's high-performance computing environment. Such systems are often configured as a two-dimensional grid of processors called a mesh. Tasks compete for rectangular submeshes of this mesh. The choice of submesh allocation strategy can significantly affect the level of processor utilization and a task's waiting time. In addition, the execution speed of various allocation algorithms varies widely, which can further affect system performance. This paper describes and categorizes several submesh allocation strategies, including a previously unreported method that is superior to other methods in terms of execution speed. The paper includes results of simulation studies used to compare the performance characteristics of the most efficient allocation strategies in each category.
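One of the simplest strategy categories surveyed in such work is first-fit allocation: scan the mesh for the first free rectangle large enough for the request. The Python sketch below illustrates it on a small busy-map; it is a generic baseline, not the paper's new algorithm.

# First-fit rectangular submesh allocation on a 2-D processor mesh.
def first_fit(busy, w, h):
    """Return (row, col) of a free w x h submesh in `busy`, or None."""
    rows, cols = len(busy), len(busy[0])
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            if all(not busy[r + i][c + j] for i in range(h) for j in range(w)):
                return r, c
    return None

def allocate(busy, w, h):
    """Find a free submesh with first_fit and mark it busy."""
    base = first_fit(busy, w, h)
    if base is not None:
        r, c = base
        for i in range(h):
            for j in range(w):
                busy[r + i][c + j] = True
    return base

mesh = [[False] * 8 for _ in range(8)]      # an 8x8 mesh, initially idle
print(allocate(mesh, 4, 4))   # -> (0, 0)
print(allocate(mesh, 4, 4))   # -> (0, 4)
print(allocate(mesh, 8, 2))   # -> (4, 0)
print(allocate(mesh, 8, 4))   # -> None (would not fit; the task must wait)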
