Similar literature
 Found 20 similar documents; search took 15 ms
1.
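Quadruple-precision triangular matrix solve (QTRSM) was implemented on ARMv8 64-bit multi-core processors on top of OpenBLAS. QTRSM was implemented for two data formats: the first implementation relies on the GCC compiler's support for the long double data type, and the second uses the double-double data format with its corresponding quadruple-precision addition, subtraction, multiplication, and division. Taking the long double QTRSM as the baseline, its accuracy and run time at different matrix sizes are compared with those of the double-double QTRSM. The experimental results show that the two produce numerical results of approximately the same accuracy, but the double-double QTRSM runs 1.6 times as fast as the long double QTRSM. As the number of threads increases, the speedup of both QTRSM implementations approaches 2.0, which indicates good scalability.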
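As an illustration of the double-double format mentioned in this abstract, the following is a minimal sketch of double-double addition built on Knuth's error-free two-sum. It is not the paper's OpenBLAS kernel; the function names `two_sum` and `dd_add` and the `(hi, lo)` tuple representation are assumptions made for this example.

```python
def two_sum(a, b):
    """Error-free transformation: returns (s, e) with a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def dd_add(x, y):
    """Add two double-double values, each given as a (hi, lo) pair of floats.

    Hypothetical helper for illustration; a simple ("sloppy") variant that
    folds both low parts into the rounding error of the high parts.
    """
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    # Renormalise so the low part holds only what the high part cannot.
    hi = s + e
    lo = e - (hi - s)
    return hi, lo


# A term far below double precision survives in the low part.
tiny = 2.0 ** -60
total = dd_add((1.0, 0.0), (tiny, 0.0))
```

Here plain double arithmetic rounds `1.0 + 2.0**-60` back to `1.0`, while the double-double pair keeps the small term in its low component.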

2.
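This paper presents the design of a 32-bit floating-point multiplier for high-performance DSPs. A tree of 4-2 compressors with modified Booth encoding increases speed and reduces power consumption. The multiplier has a regular structure that suits VLSI implementation, and it completes one 24-bit integer multiplication or one 32-bit floating-point multiplication in a single cycle. The whole design is described at the structural level in Verilog HDL and synthesized with a 0.25 µm cell library. One multiplication takes 24.30 ns.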
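The modified Booth encoding named in this abstract can be sketched in software as radix-4 recoding, which halves the number of partial products. This is an illustrative model only: it does not reproduce the paper's 4-2 compressor tree or its Verilog design, and the function names and the 8-bit default width are assumptions for the example.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    Returns bits // 2 digits in {-2, -1, 0, 1, 2}, least significant first.
    Each digit is y[2i] + y[2i-1] - 2 * y[2i+1], with y[-1] = 0.
    """
    if bits % 2 != 0:
        raise ValueError("bits must be even for radix-4 recoding")
    m = multiplier & ((1 << bits) - 1)
    digits = []
    prev = 0  # implicit bit to the right of the least significant bit
    for i in range(0, bits, 2):
        b0 = (m >> i) & 1
        b1 = (m >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(a, b, bits=8):
    """Signed multiply: sum the Booth partial products d * a * 4**k."""
    return sum(d * a * 4 ** k
               for k, d in enumerate(booth_radix4_digits(b, bits)))
```

In hardware the partial products `d * a` need only shifts and negation, and a compressor tree then sums them; the sketch simply adds them with ordinary integer arithmetic.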

3.
Shylashree N., Venkatesh B., Saurab T. M., Srinivasan Tarun, Nath Vijay 《Microsystem Technologies》2019,25(6):2349-2359

All modern computational devices contain an ALU. As software grows more complex and shifts steadily towards parallelism, high-speed processors with hardware support for time-consuming operations such as multiplication become beneficial. Smaller, compact devices such as IoT devices need to run software such as security software and to offload computation cost from the cloud. In this paper, a high-speed 8-bit ALU using 18 nm FinFET technology is proposed. The arithmetic and logic unit consists of fast compute units, such as a Kogge-Stone fast adder and a Dadda multiplier, along with basic logic gates. Each compute unit is optimized for speed while consuming area responsibly. The Dadda multiplier uses an 8 × 8 architecture, as opposed to the conventional 4 × 4 approach, making this a true 8-bit ALU. Simulation and analysis are done using Cadence Virtuoso in the Analog Design Environment. The transistor count of the proposed design is 5298, the power consumption is 219 µW, and the maximum delay is 166.8 ps. The design is also expected to take at most one clock cycle for any computation.
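To show why a Kogge-Stone adder is fast, the sketch below models its parallel-prefix carry computation on 8-bit operands: the carries are resolved in log2(8) = 3 combining steps instead of rippling through 8 stages. It is a behavioural model written for this listing, not the paper's transistor-level design; the function name and signature are assumptions.

```python
def kogge_stone_add(a, b, width=8):
    """Behavioural model of a Kogge-Stone parallel-prefix adder.

    Returns (sum modulo 2**width, carry_out). Bit i of `g` ends up as the
    carry generated by bits 0..i; `p` tracks group propagate signals.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b        # per-bit generate
    p = a ^ b        # per-bit propagate
    half_sum = p     # sum bits before carries are applied
    dist = 1
    while dist < width:
        # Combine each group with the group `dist` positions below it.
        g = (g | (p & (g << dist))) & mask
        p = (p & (p << dist)) & mask
        dist *= 2
    carries = (g << 1) & mask          # carry into bit i comes from bits 0..i-1
    carry_out = (g >> (width - 1)) & 1
    return half_sum ^ carries, carry_out
```

Each pass of the loop corresponds to one level of prefix cells in the hardware structure, all of which operate in parallel.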



5.
This paper outlines the early results of research into developing algorithms to permit the counting and tracking of moving vehicles in real world scenes using the CLIP4 parallel image processor.

6.
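Gao Ling, Zhu Xiang, Li Ou 《微计算机信息》(Microcomputer Information) 2006,83(8):224-226
Asynchronous processors solve the clock-skew problem of conventional synchronous processors and offer advantages such as low power consumption and high parallelism. This paper analyzes the key techniques and implementation methods for designing asynchronous processors, compares the current ways of implementing them, and identifies the research directions and priorities for asynchronous processors. It also looks ahead to applications of asynchronous processor technology in media processing. Although asynchronous processors are not yet widely used in practice, they have high research value.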

7.
A new model of an extracochlear prosthesis has been developed using a digital signal processor. As a speech coding method, a new idea has been proposed: simultaneously transmitting the pitch signal and the second formant frequency through an electrode. A digital signal processor was used to extract both pitch and second formant frequencies in real time. This new method of speech coding has proven effective for discrimination of the five Japanese vowels.

8.
This paper describes the architecture and performance of a real-time vision system (RVS) and its use of an integrated memory array processor (IMAP) prototype. This prototype integrates eight 8-bit processors and a 144-kbit SRAM on a single chip. The RVS was developed with 64 IMAP prototypes connected in series in a 512-processor system configuration. A host workstation can access the memory on the IMAP prototypes directly through a random access port. Images are input and output at high speed through serial access ports. The RVS performance is shown in real-time road-image processing and in a neural network simulation, as well as in low-level image processing algorithms, such as filtering, histograms, discrete cosine transform (DCT), and rotation. The RVS image processing is shown to be much faster than the video rate.

9.
Ultra high speed and moderate resolution ADCs with low latency are demanded in many applications. A 4-GS/s 8-bit ADC is implemented in 0.35 μm SiGe BiCMOS technology. It is based on the two-channel time-interleaved architecture, and each sub-ADC employs the two-stage cascaded folding and interpolating topology, which guarantees the low-latency property. Calibration circuits are introduced to compensate for the mismatch between the two sub-ADCs. The whole chip area is about 4.0 × 4.0 mm². The ADC exhibits DNL of 0.26/0.34 LSB and INL of 0.96/0.92 LSB. The ENOB is 7.1 bits and the SFDR is about 56 dB at 10.1 MHz input. The SNDR is above 42 dB over the first and second Nyquist zones. The SFDR is above 45 dB over the first and second Nyquist zones. The ERBW is about 1.4 GHz.
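For orientation, ENOB and SNDR figures such as those quoted above are commonly related by the standard full-scale sine-wave relation SNDR ≈ 6.02 · ENOB + 1.76 dB. The sketch below applies that relation; whether the authors derived their ENOB this way is an assumption, and the function names are introduced only for this example.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (full-scale sine-wave formula)."""
    return (sndr_db - 1.76) / 6.02


def sndr_from_enob(enob_bits):
    """SNDR in dB that corresponds to a given effective number of bits."""
    return 6.02 * enob_bits + 1.76


# Values taken from the abstract: ENOB of 7.1 bits, SNDR floor of 42 dB.
sndr_at_reported_enob = sndr_from_enob(7.1)
enob_at_sndr_floor = enob_from_sndr(42.0)
```

Under this relation, 7.1 bits corresponds to an SNDR of about 44.5 dB, and the 42 dB floor corresponds to roughly 6.7 effective bits.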

10.
This paper presents an effective algorithm, interactive 1-bit feedback segmentation using transductive inference (FSTI), that interactively reasons out image segmentation. In each round of interaction, FSTI queries the user about one superpixel to acquire 1-bit user feedback that defines the label of that superpixel. The labeled superpixels collected so far are used to refine the segmentation and generate the next query. The key insight is treating the interactive segmentation as a transductive inference problem, and then suppressing the unnecessary queries via an intrinsic-graph-structure derived from transductive inference. The experiments conducted on five publicly available datasets show that selecting query superpixels with regard to the intrinsic-graph-structure helps to improve the segmentation accuracy. In addition, an efficient boundary refinement is presented to improve segmentation quality by revising the misaligned boundaries of superpixels. It is evident that the proposed FSTI algorithm provides a superior solution to the interactive image segmentation problem.

11.
Using a simple example, we demonstrate how to design and analyze asynchronous systems from labeled Petri net specifications, later refining, transforming, and translating them for implementations.

12.
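In the course of designing a VLIW-based application-specific processor for block ciphers, instruction-set architecture modelling techniques for VLIW processors were studied. An instruction-accurate instruction-set simulator was designed; with an attached module that collects pipeline hazard and stall statistics, it provides cycle-accurate statistics on program execution and pipeline stalls. Combining the instruction-set simulator, an assembler, and a debugger, an auxiliary program-optimization environment for VLIW processors was designed. The simulator and debugger are used to evaluate a program's instruction-level parallelism and resource usage, helping program developers optimize VLIW processor programs and thereby achieving the final goal of exploiting the VLIW processor's instruction-level parallelism through hardware/software co-development.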

13.
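Data transmission and interconnection is one of the key technologies in designing real-time imaging processing systems for synthetic aperture radar (SAR). This paper divides current data transmission and interconnection technologies into four categories (network-based, bus-based, crossbar-switch-based, and dedicated technologies), analyzes their performance, and discusses the future direction of interconnection technology. On this basis, and taking the needs of SAR signal processing into account, a general-purpose serial packet-switched data transmission technique based on a data frame structure is proposed. Its physical layer and link layer are designed, and the technique is implemented and tested on a field-programmable gate array (FPGA). Data transmission performance figures are analyzed for different SAR system topologies, and the results show that the technique can provide high-speed data transmission and inter-module interconnection in SAR systems.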

14.
15.
In this article, the implementation of a microcomputer-based control system using a low-cost 16-bit single-board microcomputer combined with a general-purpose data acquisition board is described. It is intended that this combination forms a suitable basis of a control system for the type of complex sensor systems currently associated with modern industrial robots. The technique of interfacing a 16-bit microcomputer to a peripheral environment which has essentially an 8-bit architecture is presented. The result is a high-performance, low-cost analog data processing and control system. A novel timing device with both software- and hardware-configurable features has been developed so that an accurate interval timer for use with sample data systems is also available. To demonstrate the potential use of the system, an illustrative example of real-time adaptive process control is described.

16.
Graphics Processing Units (GPUs), originally developed for computer games, now provide computational power for scientific applications. In this paper, we develop a general purpose Lattice Boltzmann code that runs entirely on a single GPU. The results show that: (1) single precision floating point arithmetic is sufficient for LBM computation in comparison to double precision; (2) the implementation of LBM on GPUs allows us to achieve up to about one billion lattice updates per second using single precision floating point; (3) GPUs provide an inexpensive alternative to large clusters for fluid dynamics prediction.

17.
《Computer Networks》2003,41(5):667-684
The increasing complexity of innovative real-time hardware/software systems forced industry to consider system-level design methods. Before actually implementing a system with hardware and software components, system-level design methods enable analysing the performance of different design alternatives that realise the required functionality. In order to develop performance models early in the design process, the parallel object-oriented specification language (POOSL) can be used. POOSL is an expressive modelling language for analysing complex real-time distributed hardware/software systems. Being equipped with a formal semantics, POOSL ensures unambiguous execution of models and proper application of performance analysis techniques. This paper discusses the use of POOSL for analysing the performance of a network processor. A network processor consists of components that perform their behaviour in a synchronously concurrent way, whereas POOSL is based on an asynchronous modelling paradigm. In this paper, we illustrate that constructing abstract models of synchronous systems for the purpose of performance analysis may benefit from an asynchronous modelling approach.

18.
The SAE 81C99 processor exhibits 4 different operation modes, 8 programmable fuzzy algorithms, and up to 256 inputs, 64 outputs, and 16,384 rules. The 1.0-μm CMOS chip serves as a stand-alone device or as an on-chip module for 8- or 16-bit microcontrollers. At 20-MHz crystal frequency and a maximum inference speed of 10 million rules/s, it supports very complex systems and millisecond (and faster) processes such as automotive electronics and pattern recognition.
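The "millisecond" claim can be checked with simple arithmetic on the figures quoted in this abstract. The sketch assumes that the peak rate of 10 million rules/s is sustained across a whole rule base, which the abstract does not state; the constant and function names are introduced only for this example.

```python
MAX_RULES = 16_384                  # largest rule base quoted in the abstract
PEAK_RULES_PER_SECOND = 10_000_000  # maximum inference speed quoted in the abstract


def inference_time_ms(rules, rate=PEAK_RULES_PER_SECOND):
    """Time in milliseconds to evaluate `rules` rules at a sustained `rate`."""
    return rules / rate * 1e3


# Evaluating the full 16,384-rule base at the peak rate.
full_rule_base_ms = inference_time_ms(MAX_RULES)
```

Under that assumption, the full rule base evaluates in about 1.64 ms, and smaller rule bases finish proportionally faster.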

19.
Masataka Sassa 《Software》1979,9(6):439-456
A general-purpose pattern matching macro processor is described. Macro patterns can be defined using regular expressions. Macro calls are treated by balancing pattern matching at the token unit level, allowing options, alternatives and repetition. Thus, text in a language with a nested structure can be dealt with. In a macro body, Algol-style macro-time operations are allowed, which improves writing and reading. Our macro processor can also be used as a tool for language conversion since it incorporates a feature to declare language-dependent constructs such as comments, string notations and parentheses pairs. Although our macro processor is not biased towards any particular language, it has successfully converted an Algol 68-style text into a Fortran text. Problems of language conversion using macros are briefly discussed based upon the experience obtained through this macro processor.

20.
A class of critical computer requirements for real-time scan T.V. computer graphics is examined in relation to commercially available CPU architectures. Finding general purpose processors not suited, a new processor is proposed which is designed around the concept of ‘instruction set partitioning.’ In this design, special hardware-implemented algorithms may be included in the machine instruction set, and these processors allowed to operate asynchronously from each other. The design is projected to generate a complete new frame of a color T.V. picture every 0.1-0.8 s depending on image complexity. Due to its inherent generality, the CPU may be similarly expanded to encompass a wide variety of other specialized, or real-time tasks with minimal additional hardware. The 32-bit parallel processor has a design cycle time of 100 ns and is in the price class of a minicomputer.
