Similar literature
 20 similar documents found
1.
Graphics hardware is far faster, smaller, cheaper, and more capable than 20 years ago, and it will obviously continue on that path. Memory and processor advances have let us move texture mapping and surface occlusion from software to hardware. We'll no doubt move more sophisticated modeling, lighting and imaging operations into future hardware. Chip I/O rates will continue to advance more slowly than transistor count and, as a result, graphics processors and memory will become ever more highly integrated. Putting memory and processor on the same chip will encourage massive parallelism, because on-chip bandwidth is staggering compared to that between chips. Integrating CPU and graphics is more a business issue than a technical one; game consoles represent one area where tight integration is mandatory. A more interesting question for looking 20 years into the future might be: what will be new? What fundamentally new capabilities can we predict as a result of hardware advances? What fundamentally new capabilities would we like to have but can't predict how, or if, we can achieve them? Advances in image generation hardware haven't fundamentally changed what an individual can actually do in an application; in contrast, some graphics hardware advances have created fundamental changes. We already have a start on some promising graphics hardware technologies that may enable fundamental changes in what we do in graphics over the next 20 years.

2.
Through the use of an extended field programmable gate array (FPGA) technology, a large digital circuit can be realized on a relatively small amount of real hardware. Several configuration RAM modules are provided inside the FPGA chip, and the configuration of the gate array can be rapidly changed by replacing the active module. Data for configuration are transferred from an off-chip backup RAM to an unused configuration RAM module. A novel computation mechanism called the WASMII, which executes a target dataflow graph directly, can be proposed on the basis of this virtual hardware. A WASMII chip consists of the FPGA for virtual hardware and an additional mechanism to replace configuration RAM modules in a data-driven manner. Configuration data are preloaded in the order assigned in advance by a static scheduling preprocessor. By connecting a number of WASMII chips, a highly parallel system can be easily constructed.

3.
4.
This paper describes VIPER, the video image-processing system Erlangen. It consists of a general purpose microcomputer, commercially available image-processing hardware modules connected directly to the computer, video input/output modules such as a TV camera, video recorders and monitors, and a software package. The modular structure and the capabilities of this system are explained. The software is user-friendly and menu-driven, and it performs image acquisition, transfers, greyscale processing, arithmetic, logical operations, filtering, display, colour assignment, graphics, and several management functions. More than 100 image-processing functions are implemented. They are available either by typing a key or by a simple call to the function-subroutine library in application programs. Examples are supplied in the area of biomedical research, e.g. in in-vivo microscopy.

5.
Genetic Programming (GP) (Koza, Genetic programming, MIT Press, Cambridge, 1992) is well-known as a computationally intensive technique. Subsequently, faster parallel versions have been implemented that harness the highly parallel hardware provided by graphics cards, enabling significant gains in the performance of GP. However, extracting the maximum performance from a graphics card for the purposes of GP is difficult. A key reason for this is that, in addition to the processor resources, the fast on-chip memory of graphics cards needs to be fully exploited. Techniques will be presented that improve the performance of a graphics card implementation of tree-based GP by better exploiting this faster memory. It will be demonstrated that both L1 cache and shared memory need to be considered for extracting the maximum performance. Better GP program representation and use of the register file are also explored to further boost performance. Using an NVidia Kepler 670GTX GPU, a maximum performance of 36 billion Genetic Programming Operations per Second is demonstrated.

6.
We consider the image registration problem of finding a reasonable displacement field such that a transformed template image becomes similar to a so-called reference image. The minimization of the similarity measure (here based, as an example, on the gray-value difference) yields a nonlinear ill-posed inverse problem. The necessary regularization is done by replacing the ill-conditioned Hessian by a multidimensional total variation norm. This allows steep gradients and discontinuities in the displacement field, in contrast to the common approach of elastic regularization, which leads to globally smooth displacement fields. We propose and investigate a multigrid algorithm as inner iteration for registration. As we use Neumann boundary conditions, which lead to singular systems, a special treatment before and during the FAS multigrid algorithm is required, e.g. the introduction and solution of an augmented system. We describe the necessary modifications for the multigrid algorithm and present convergence results as well as first registration experiments demonstrating the capabilities of the proposed approach. The work of the first author was supported by the Deutsche Forschungsgemeinschaft; grant HE 3404.

7.
We present algorithms for the randomized simulation of a shared memory machine (PRAM) on a Distributed Memory Machine (DMM). In a PRAM, memory conflicts occur only through concurrent access to the same cell, whereas the memory of a DMM is divided into modules, one for each processor, and concurrent accesses to the same module create a conflict. The delay of a simulation is the time needed to simulate a parallel memory access of the PRAM. Any general simulation of an m processor PRAM on an n processor DMM will necessarily have delay at least m/n. A randomized simulation is called time-processor optimal if the delay is O(m/n) with high probability. Using a novel simulation scheme based on hashing we obtain a time-processor optimal simulation with delay O(log log(n) log*(n)). The best previous simulations use a simpler scheme based on hashing and have much larger delay: Θ(log(n)/log log(n)) for the simulation of an n processor PRAM on an n processor DMM, and Θ(log(n)) in the case where the simulation is time-processor optimal. Our simulations use several (two or three) hash functions to distribute the shared memory among the memory modules of the PRAM. The stochastic processes modeling the behavior of our algorithms and their analyses based on powerful classes of universal hash functions may be of independent interest.
Research partially supported by NSF/DARPA Grant CCR-9005448. Work was done while at the University of California at Berkeley and the International Computer Science Institute, Berkeley, CA.
Research partially supported by National Science Foundation Operating Grant CCR-9016468, National Science Foundation Operating Grant CCR-9304722, United States-Israel Binational Science Foundation Grant No. 89-00312, United States-Israel Binational Science Foundation Grant No. 92-00226, and ESPRIT BR Grant EC-US 030.
Part of work was done during a visit at the International Computer Science Institute at Berkeley; supported in part by DFG-Forschergruppe Effiziente Nutzung massiv paralleler Systeme, Teilprojekt 4, and by the Esprit Basic Research Action Nr. 7141 (ALCOM II).
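
As a loose illustration of why several hash functions can help when spreading memory cells over modules, the sketch below compares the most heavily loaded module under one hash function and under two. It is not the simulation scheme of the abstract above: the greedy "least-loaded candidate" rule, the hash family h(x) = ((a*x + b) mod P) mod n, and all names are assumptions made for this example.

```python
import random

P = 2_147_483_647  # Mersenne prime 2^31 - 1, larger than any cell address used here


def make_hash(n_modules, rng):
    """Draw h(x) = ((a*x + b) mod P) mod n_modules with random a and b."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda cell: ((a * cell + b) % P) % n_modules


def module_loads(cells, n_modules, hashes):
    """Send each cell to the least-loaded module among its hash candidates."""
    loads = [0] * n_modules
    for cell in cells:
        candidates = [h(cell) for h in hashes]
        target = min(candidates, key=lambda m: loads[m])
        loads[target] += 1
    return loads


rng = random.Random(0)
n = 64                                    # modules, and one request per processor
cells = rng.sample(range(1_000_000), n)   # distinct requested cell addresses
h1, h2 = make_hash(n, rng), make_hash(n, rng)

one_choice = module_loads(cells, n, [h1])
two_choice = module_loads(cells, n, [h1, h2])
print("max module load, one hash function :", max(one_choice))
print("max module load, two hash functions:", max(two_choice))
```

The maximum module load is used here only as a rough stand-in for the delay of one simulated access step; the actual analysis in the paper is considerably more involved.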

8.
In this paper we present a computationally economical method of recovering the projective motion of head mounted cameras or EyeTap devices, for use in wearable computer-mediated reality. The tracking system combines featureless vision and inertial methods in a closed loop system to achieve accurate robust head tracking using inexpensive sensors. The combination of inertial and vision techniques provides the high accuracy visual registration needed for fitting computer graphics onto real images and the robustness to large interframe camera motion due to fast head rotations. Operating on a 1.2 GHz Pentium III wearable computer with graphics accelerated hardware, the system is able to register live video images with less than 2 pixels of error (0.3 degrees) at 12 frames per second. Fast image registration is achieved by offloading computer vision computation onto the graphics hardware, which is readily available on many wearable computer systems. As an application of this tracking approach, we present a system which allows wearable computer users to share views of their current environments that have been stabilised to another viewer's head position.
Chris Aimone

9.
The raster graphics display system described is a general-purpose minicomputer intended especially for CAD applications. The system is based on a hierarchical asynchronous multiple microprocessor system. In practice this minicomputer can be extended to 15–20 workstations. Different graphical and non-graphical devices can be connected to the workstations. The most interesting workstation is a raster graphics display device which was developed specially for the computer system described. This raster graphics display device contains a processor for the application program, two dedicated processors and two separate identical frame buffers, each of them containing one whole set of image data. By applying algorithms for anti-aliasing, virtual pixel dislocation (intensity dislocation) and multi-pixel overlapping with hidden line (surface) elimination, the image readability and quality can be increased considerably. In particular, the paper deals with an anti-aliasing algorithm with a real-time hardware realization.

10.
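This paper presents the design and implementation of the hardware and software of an embedded multimedia communication terminal based on the Linux operating system. The hardware design is built around the PXA250 embedded processor, uses a CPLD programmable device for logic control, and adds suitable memory devices and other peripheral circuits to form an embedded system. The software design, through comprehensive analysis and optimized allocation of the embedded system's limited processing capacity and limited communication bandwidth, achieves concurrent multimedia communication including voice, images, an electronic whiteboard, short messages, and file transfer. The terminal met all test targets and works stably with good performance over poor wireless links.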

11.
12.
Increased network speeds coupled with new services delivered via the Internet have increased the demand for intelligence and flexibility in network systems. This paper argues that both can be provided by new hardware platforms comprised of heterogeneous multi-core systems with specialized communication support. We present and evaluate an experimental network service platform that uses an emergent class of devices—network processors—as its communication support, coupled via a dedicated interconnect to a host processor acting as a computational core. A software infrastructure spanning both enables the dynamic creation of application-specific services on the network processor, mediated by middleware and controlled by kernel-level communication support. Experimental evaluations use a Pentium IV-based computational core coupled with an IXP 2400 network processor. The sample application services run on both include an image manipulation application and application-level multicasting.
Karsten Schwan

13.
The increased programmability of graphics hardware allows efficient graphical processing unit (GPU) implementations of a wide range of general computations on commodity PCs. An important factor in such implementations is how to fully exploit the SIMD computing capacities offered by modern graphics processors. Linear expressions in the form of Ax + b, where A is a matrix, and x and b are vectors, constitute one of the most basic operations in many scientific computations. In this paper, we propose a SIMD code optimization technique that enables efficient shader codes to be generated for evaluating linear expressions. It is shown that performance can be improved considerably by efficiently packing arithmetic operations into four-wide SIMD instructions through reordering of the operations in linear expressions. We demonstrate that the presented technique can be used effectively for programming both vertex and pixel shaders for a variety of mathematical applications, including integrating differential equations and solving a sparse linear system of equations using iterative methods.
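
To make the idea of packing arithmetic into four-wide operations concrete, here is a minimal sketch, assuming a linear expression of the form y = Ax + b with a 4-row matrix A. It emulates a four-lane multiply-add in plain Python and walks the matrix column by column so that each emulated instruction does useful work in all four lanes. The function names and the column-wise ordering are assumptions for illustration, not the code generation technique of the paper.

```python
def madd4(acc, a, s):
    """Emulate one four-wide multiply-add: acc + a * s, lane by lane."""
    return [acc[i] + a[i] * s for i in range(4)]


def linear_expr_4wide(A, x, b):
    """Evaluate y = A x + b for a 4-row matrix A using column-wise 4-wide ops."""
    assert len(A) == 4 and len(b) == 4
    y = list(b)                                # start from the constant vector b
    for j, xj in enumerate(x):
        column = [A[i][j] for i in range(4)]   # four coefficients share one op
        y = madd4(y, column, xj)
    return y


A = [[1, 2, 0],
     [0, 1, 0],
     [3, 0, 1],
     [0, 0, 2]]
x = [1, 2, 3]
b = [1, 1, 1, 1]
y = linear_expr_4wide(A, x, b)
print(y)  # [6, 3, 7, 7]
```

Evaluating row by row would instead need a dot product per output component; the column-wise order uses one emulated four-wide operation per column of A.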

14.
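A DSP-based real-time image processing system   Total citations: 9, self-citations: 8, citations by others: 9
A general-purpose MPEG-4 real-time image processing system is designed with the TMS320C6416 DSP as its core processor. The paper describes the system's hardware and software design in detail. Video capture, the motion estimation algorithm, and software optimization are the key to the system's efficient operation, so they are discussed in depth and corresponding solutions are proposed. Experiments show that the system can meet the needs of current video/image processing and transmission applications such as remote monitoring, video telephony, video conferencing, and road traffic management.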

15.
ATR's Evolutionary Systems Department aims to build (i.e. grow/evolve) an artificial brain by the year 2001. This artificial brain should initially contain thousands of interconnected artificial neural network modules, and be capable of controlling approximately 1000 “behaviors” in a “robot kitten”. The name given to this research project is “CAM-Brain”, because the neural networks (based on cellular automata) will be grown inside special hardware called Cellular Automata Machines (CAMs). Using a family of CAMs, each with its own processor to measure the performance quality or fitness of the evolved neural circuits, will allow the neural modules and their interconnections to be grown/evolved at electronic speeds. The state of the art in CAM design is about 10^9 or 10^10 cells. Since a neural module of about 15 connected neurons can fit inside a cube of 100 cells on a side (1 million cells), a CAM which is specially adapted for CAM-Brain could contain thousands of interconnected modules, i.e. an artificial brain.

16.
Sohie, G.R.L.; Kloker, K.L. IEEE Micro, 1988, 8(6): 49-67
An overview is given of Motorola's DSP96002, a digital signal processor that implements IEEE-standard floating-point arithmetic. It is designed for graphics, image processing, spectral analysis and scientific computing applications. Performance peaks at 40.5 Mflops (million floating-point operations per second) and 13.5 MIPS (million instructions per second), and reaches 18 Mflops on assembly-language benchmarks. The DSP is software-compatible with the fixed-point 56000/1 family architecture and instruction set. The 96002 achieves compatibility with other processors and databases, higher mathematical accuracy, and better error handling than implementations that do not conform to the IEEE standard. The 96002's on-chip memories, dual-bus architecture, and transparent DMA are suitable for multiprocessor systems in which many 96002s connect with minimum external components. These features result in a smaller-footprint, lower-cost system than other microprocessors or data-path chips. On-chip support for the fast access modes of external memories achieves near-SRAM (static random-access memory) performance with high-density DRAM/VRAM (dynamic RAM/video RAM) devices. An on-chip circuit emulation controller provides full access and control of the machine state for system debugging. A variety of software and hardware development tools support the 96002.

17.
Time capsule     
Eduardo Kac. AI & Society, 2000, 14(2): 243-249
Time Capsule is a work-experience that lies somewhere between a local event-installation, a site-specific work in which the site itself is both my body and a remote database, a simulcast on TV and the Web, and interactive webscanning of my body. The live component of the piece was realised on November 11, 1997, in the context of the exhibition Arte Suporte Computador, at the cultural centre Casa das Rosas, in Sao Paulo, Brazil. Time Capsule was carried live on the evening newscast of the TV station Canal 21 and on tape by two other TV stations (TV Manchete and TV Cultura). The webcast was transmitted by Casa das Rosas. The object that gives the piece its title is a microchip that contains a programmed identification number and that is integrated with a coil and a capacitor, all hermetically sealed in biocompatible glass. The temporal scale of the work is stretched between the ephemeral and the permanent; i.e., between the few minutes necessary for the completion of the basic procedure, the microchip implantation, and the permanent character of the implant. As with other underground time capsules, it is under the skin that this digital time capsule projects itself into the future.

18.
A novel hardware architecture for extracting region boundaries in two raster scan passes through a binary image is presented. The first pass gathers statistics regarding the size of each object contour. This information is used by the hardware to allocate dynamically off-chip memory for storage of boundary codes. In the second raster pass the same architecture constructs lists of grid-joint codes to represent the perimeter pixels of each object. These codes, referred to variously as crack codes or raster-chain codes in the literature, are later decoded by the hardware to reproduce the ordered sequence of coordinates surrounding each object. This list of coordinates is useful for a variety of shape recognition and manipulation algorithms that utilize boundary information. We present results of software simulations of the VLSI architecture, along with measurements on the coding efficiency of the basic algorithm, and estimates of the overall complexity of a proposed VLSI chip.
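
The decoding step mentioned above, turning a list of boundary codes back into an ordered sequence of coordinates, can be sketched in a few lines. This is a minimal software illustration under an assumed encoding (four direction codes along pixel edges), not the hardware decoder or the exact code format of the paper; `STEPS` and `decode_crack_codes` are hypothetical names.

```python
# Assumed encoding (not from the paper): 0 = +x, 1 = +y, 2 = -x, 3 = -y.
STEPS = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}


def decode_crack_codes(start, codes):
    """Return the ordered vertex coordinates traced by a list of crack codes."""
    x, y = start
    points = [(x, y)]
    for code in codes:
        dx, dy = STEPS[code]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


# Boundary of a single pixel whose lower-left corner is at (2, 3).
outline = decode_crack_codes((2, 3), [0, 1, 2, 3])
print(outline)  # [(2, 3), (3, 3), (3, 4), (2, 4), (2, 3)]
```

A closed contour returns to its starting vertex, which gives a cheap consistency check on a decoded boundary.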

19.
The SCAN formal languages can be used for tight integration of image and video compression, encryption and data hiding. This work presents such a tightly integrated embeddable system, which can be used as a “black box” in streaming media. In our previous work we had studied and implemented separate modules for SCAN compression and SCAN encryption, using a large Virtex II FPGA for each. There was no implementation of data hiding, no integration of the three aspects of SCAN, and no complete design for decompression/decryption/unhiding. This paper presents a new architecture and a complete design for SCAN compression/encryption/hiding, as well as the corresponding decompression/decryption/data unhiding operations. A recent technology based on an embedded processor with reconfigurable fabric extensions has been used for this design, which was carried out to post place and route cycle-accurate simulations with real video sequences. The new design has substantially lower performance than the previous reconfigurable implementations of single modules; however, it proves that a low-cost embeddable system can be made for all three operations. This paper presents in detail the different aspects of the architecture, their integration, and their mapping to the fixed and reconfigurable resources of the Stretch S5000 reconfigurable processor. To our knowledge, this is the first tightly integrated compression/encryption/information hiding system to be reported in the literature.
Nikolaos Bourbakis

20.
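An X-ray film computer-aided diagnosis system built around the high-performance image processor DM642 is designed. On the hardware side, the system exploits the DM642's rich peripheral interfaces and offers fast, stable data analysis and processing; it requires no changes to the medical diagnostic equipment and plugs directly into a computer's PCI slot, thereby meeting the requirements of running complex algorithms on medical images and outputting images in real time. On the software side, a modular design makes the system easy to maintain. Image enhancement experiments show that the system's processing results are markedly effective.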
