Similar Literature
 Found 20 similar documents; search took 31 ms
1.
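The hybrid convolutional neural network-support vector machine (CNN-SVM) algorithm combines the feature-extraction capability of CNNs with the classification performance of SVMs. It offers certain advantages in computational complexity and in handling small-sample problems, and has already seen some application in fields such as fault diagnosis and medical image processing. Because of its relatively low computational complexity, it has also attracted attention in edge computing. To meet the performance and power requirements of edge-computing scenarios, this paper proposes an FPGA-oriented CN...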

2.
The high computational burden posed by modern control algorithms often precludes their industrial application on present-day microcomputers. We evaluate the computational load of different logical and arithmetic operations and the capabilities of several computing systems (software and hardware). Real-time limitations are alleviated through the adoption of general techniques associated with the data representation. Such techniques not only achieve more efficient management of the computational resources but also provide deeper insight into developments toward future computer control architectures.

3.
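Power flow calculation is the foundation of power system computation, and its core is LU decomposition; accelerating power flow calculation therefore hinges on accelerating LU decomposition. CPU-based parallel algorithms are now mature, leaving limited room for further performance gains. As a coprocessor, the graphics processing unit (GPU) offers strong advantages in scientific computing and has been widely applied to power flow calculation. This paper first analyzes the GPU structure and its parallel execution architecture, then introduces the principle of LU decomposition and selects a suitable matrix ordering algorithm and sparse matrix storage model. Using the Compute Unified Device Architecture (CUDA) programming model, it implements GPU-based parallel acceleration of both single and batched LU decomposition. Finally, five different cases are tested on a simulation platform, and the speedups of the parallel algorithms are compared and analyzed. The simulation results show that the GPU-based batched sparse LU decomposition algorithm achieves an average speedup of 25 to 50 times.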
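
To make the factorization at the core of the abstract above concrete, here is a minimal sketch of LU decomposition followed by forward and back substitution. It is only an illustration under stated assumptions: the matrix is dense, the code is sequential, and no pivoting or reordering is done, whereas the paper describes a sparse, batched CUDA implementation with a matrix ordering step. The names `lu_decompose` and `lu_solve` are hypothetical.

```python
def lu_decompose(a):
    """Doolittle LU factorization without pivoting.

    Returns (lower, upper) with lower unit lower triangular and
    lower @ upper == a. Assumes every pivot is nonzero.
    """
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(v) for v in row] for row in a]  # working copy of a
    for k in range(n):
        pivot = upper[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        for i in range(k + 1, n):
            factor = upper[i][k] / pivot
            lower[i][k] = factor
            # Eliminate column k from row i.
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


def lu_solve(lower, upper, b):
    """Solve (lower @ upper) x = b by forward then back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(lower[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(upper[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - tail) / upper[i][i]
    return x
```

In a power flow solver the factors would typically be reused across several right-hand sides, which is one reason the factorization step dominates the cost.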

4.
The central nervous system receives a vast amount of sensory input, and it should be able to discriminate and recognize different kinds of multisensory information. Winner-take-all (WTA) consists of a simple recurrent neural network carrying out discrimination of input signals through competition. This paper presents a real-time scalable digital hardware implementation of the spiking WTA network. The need for concurrent computing, real-time performance, proper accuracy, and a reconfigurable device has led to the field-programmable gate array (FPGA) as the target hardware platform. A set of techniques is employed to lessen memory and resource usage. The proposed architecture consists of multiprocessing elements, which share hardware resources between a specific number of neurons. We introduce a novel connectivity array for neurons (dedicated to the WTA network) to cut down memory usage. Also, a multiplier-less method in the neuron model and a novel tree adder in the synapse processing unit are designed to improve computational efficiency. The proposed network simulates 4,500 neurons in real time on a Xilinx Artix-7 FPGA, while a scalable architecture facilitates the implementation of up to 20,000 neurons on this device. The pipeline structure can guarantee real-time performance for large-scale networks. Based on simulation and physical synthesis results, the presented network mimics biological WTA dynamics and uses hardware resources efficiently.

5.
This paper extends the A* methodology to shortest path problems in dynamic networks, in which arc travel times are time dependent. We present efficient adaptations of the A* algorithm for computing fastest (minimum travel time) paths from one origin node to one destination node, for one as well as multiple departure times at the origin node, in a class of dynamic networks the link travel times of which satisfy the first-in-first-out property. We summarize useful properties of dynamic networks and develop improved lower bounds on minimum travel times. These lower bounds are exploited in designing efficient adaptations of the A* algorithm to solve instances of the one-to-one dynamic fastest path problem. The developed algorithms are implemented and their computational performance is analyzed experimentally. The performance of the computer implementations of the adaptations of the A* algorithm is compared to a dynamic adaptation of Dijkstra's algorithm, stopped when the destination node is selected. Comparative computational results obtained demonstrate that the algorithms of this paper are efficient. Using a network containing 3,000 nodes, 10,000 links, and 100 time intervals, the dynamic adaptations of the A* algorithm led to a savings ratio of 11 in terms of the number of nodes selected, and to a savings ratio of five in terms of computation time. The effect of the network size on the performance of these adaptations is also studied. It is shown that the computational savings, in terms of both the number of nodes selected and the computation time, increase with the size of the network topology.
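
The one-to-one fastest path problem described above can be sketched as follows. This is a hedged illustration, not the paper's algorithm: the lower bound used here is a simple one (a backward Dijkstra pass over each arc's minimum travel time), while the paper develops improved bounds that the abstract does not spell out. Assumptions made for the sketch: times are integers, each arc carries a travel-time function of the entry time, travel times stay constant after `horizon - 1`, and the functions satisfy the first-in-first-out property. The names `min_time_lower_bounds` and `td_astar` are hypothetical.

```python
import heapq

INF = float("inf")


def min_time_lower_bounds(arcs, dest, horizon):
    """Lower bound on travel time from every node to dest.

    arcs maps a node to a list of (successor, travel_time_fn) pairs.
    Each arc is weighted by its minimum travel time over the horizon,
    then a backward Dijkstra pass is run from dest.
    """
    reverse = {}
    for u, outs in arcs.items():
        for v, fn in outs:
            best = min(fn(t) for t in range(horizon))
            reverse.setdefault(v, []).append((u, best))
    bound = {dest: 0}
    heap = [(0, dest)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > bound.get(v, INF):
            continue  # stale heap entry
        for u, w in reverse.get(v, []):
            if d + w < bound.get(u, INF):
                bound[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return bound


def td_astar(arcs, origin, dest, depart, horizon):
    """Minimum travel time from origin to dest when leaving at depart."""
    bound = min_time_lower_bounds(arcs, dest, horizon)
    if origin not in bound:
        return None  # dest is unreachable from origin
    arrival = {origin: depart}
    heap = [(depart + bound[origin], origin)]
    closed = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u in closed:
            continue
        closed.add(u)
        if u == dest:
            return arrival[u] - depart
        t = arrival[u]
        for v, fn in arcs.get(u, []):
            if v not in bound:
                continue  # v cannot reach dest
            # Travel times are assumed constant past the horizon.
            t_v = t + fn(min(t, horizon - 1))
            if t_v < arrival.get(v, INF):
                arrival[v] = t_v
                heapq.heappush(heap, (t_v + bound[v], v))
    return None
```

Because each bound never exceeds the true remaining travel time and FIFO makes the earliest arrival at a node optimal for onward travel, a node can be closed the first time it is selected, as in the static case.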

6.
Stone, J.; Ercal, F. Potentials, IEEE, 2001, 20(2): 31-33
Workstation clusters have become an increasingly popular alternative to traditional parallel supercomputers for many workloads requiring high performance computing. The use of parallel computing for scientific simulations has increased tremendously in the last ten years, and parallel implementations of scientific simulation codes are now in widespread use. There are two dominant parallel hardware/software architectures in use today: distributed memory and shared memory. Systems implementing shared memory provide cooperating processes with a shared memory address space that can be accessed by all processors. In shared memory systems, parallel processing occurs through the use of shared data structures, or through emulation of message passing semantics in software. Distributed memory systems are composed of a number of interconnected computational nodes, which do not share memory, but can communicate with each other through a high-performance network of some kind. Parallelism is achieved on distributed memory systems with multiple copies of the parallel program running on different nodes, sending messages to each other to coordinate computations. The messages used in a distributed memory parallel program typically contain application data, synchronization information, and other data that controls the execution of the parallel program.

7.
Potentials, IEEE, 2001, 20(3): 29-32
Active noise control (ANC) grabbed the research community's attention during the last half of the 20th century; however, technological and computational limitations prevented any ANC algorithms from widespread use. Such is no longer the case. In this paper, the author describes how advances in computer and digital signal processing technology, along with the relatively low cost of such hardware, have facilitated their widespread use in the noise control of confined spaces.

8.
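Application of an improved deadbeat control algorithm to active power filters   Total citations: 1 (self-citations: 1, citations by others: 1)
Deadbeat control of active power filters suffers from a complex algorithm whose computation delay degrades compensation. To overcome this drawback, the predictive-control objective function of the active power filter is optimized, and an improved deadbeat control algorithm is proposed and its feasibility verified by simulation. When the algorithm controls a three-phase active power filter, each control cycle requires only 12 additions and 5 multiplications, a marked reduction in computational complexity. Simulation results show that the control accuracy of the improved algorithm is very close to that of the conventional algorithm. Compared with the timed-comparison control method, the improved algorithm eliminates some hardware stages and increases the degree of digitization. Combining the main conclusions with harmonic detection and prediction makes it possible to simplify deadbeat control.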

9.
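Building on the nonlinear primal-dual interior point method with embedded discrete penalty for reactive power optimization, the linear correction equations are solved in parallel on a high-performance graphics processing unit (GPU). The compute-intensive parts run on the GPU and the remainder on the CPU, and single-precision and double-precision modes are compared. The algorithm takes full advantage of the GPU's strong parallel processing capability and very high memory bandwidth, yielding significant speedups. Computations on the IEEE 118-bus system and on real 538-, 1133-, and 2212-bus systems show that reactive power optimization with single-precision floating-point arithmetic is the fastest and gives the best speedup, reaching nearly 30 times on the 2212-bus system.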

10.
A hardware platform using a broadband antenna, an oscilloscope, and a spectrum analyzer is designed to receive radio frequency (RF) signals from the electromagnetic radiation leakage of computers in an office environment. The receiving process and the processing techniques are also given. Then, software radio-based computing models and software algorithms are proposed to demodulate and decode the RF signals. An experimental result shows that the text information can be recovered from the electromagnetic (EM) leakage wave of a computer by this interception system. This architecture not only reduces the cost of the system's hardware but also makes interception more flexible. The innovations of this paper are recovering the video information in the EM leakage wave of computers in an ordinary office environment using public equipment, and proposing a receiving and processing procedure that uses only software radio-based computing models and software algorithms.

11.
How can I select the best simulator? Most people would think it is obvious. Some will take the most accurate. Others will take the quickest, or the cheapest. All these selection criteria need to be looked at as a whole in order to make an informed choice. A good overall criterion would be something like the simulator's quality factor: Q = Accuracy/Effort. Therefore, choose the program which gives you the best accuracy for a given simulation duration, or sum of money, or RAM, or all three together (Effort) and you won't do anything wrong. Or, if it's accuracy that is of utmost importance, choose the program that achieves the desired accuracy with the least time and memory effort. By the way, do not forget about the labor costs while integrating the software into your design flow. A program with a good user interface and a high degree of automation will save valuable engineering time and therefore money. Beware of brute-force hardware arguments like "on a cluster, program X is also very quick". An intelligent algorithm is quick on any type of hardware and is even quicker on a faster computer, cluster, or graphics acceleration card. It's the combination of intelligent algorithms and the best available hardware that will give the user the optimal computing speed. As we have seen, there is not one single solver approach best suited for all types of applications. It is very convenient if several solver types can be selected from the one modeling interface. It would be even more convenient if the software chose the best suited solver by itself.

12.
Khel, I.A.K.; Ali, M.A. Potentials, IEEE, 1999, 18(2): 33-35
High speed networks and improved microprocessor performance are making workstations an appealing prospect for parallel computing. With just commodity hardware and software, networked workstations can offer parallel processing at a relatively low cost. Parallel computing can be implemented in two ways. The networked workstations can be set up as a processor bank with dedicated processors providing computing cycles. Or, it can consist of a dynamically varying set of machines that perform long running computations during idle periods. In the latter case, the hardware cost is essentially zero since many organizations already have extensive workstation networks. For some applications, networked workstations can approach or exceed supercomputer performance. However, these loosely coupled multiprocessors will by no means replace the more tightly coupled designs. Supercomputers' lower latencies (time elapsed between issuing a memory request and receiving the corresponding data from memory) and higher bandwidths are more efficient for applications with stringent synchronization and communication requirements. But advances in networking technology and processor performance are expanding the applications that can be executed efficiently on networked workstations.

13.
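Visualization in scientific computing and its application prospects in power systems   Total citations: 17 (self-citations: 2, citations by others: 17)
Han Zhenxiang (韩祯祥), Lü Jie (吕捷). Power System Technology (《电网技术》), 1996, 20(7): 22-27
Visualization in scientific computing is a computer application technology developed in recent years. It integrates graphics, image processing, data management, computer networking, and other related fields, with the aim of handling massive data and presenting information in an integrated way, thereby improving the efficiency with which information is used. The power system is a large, complex system, and applying visualization in scientific computing will aid power system research and development. This paper introduces the techniques and application status of visualization in scientific computing and, considering the characteristics of information and data in power systems, proposes approaches and ideas for applying visualization technology in power system research.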

14.
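To address on-line measurement of the dielectric loss angle of capacitive electrical equipment, dedicated and general-purpose hardware was designed, and a virtual instrument was developed with the CVI software development tool from National Instruments. The instrument performs on-line measurement of the dielectric loss angle of capacitive equipment well. It goes well beyond the limitations of traditional instruments in data processing, display, transmission, and storage, and features a high degree of intelligence, a good performance-to-price ratio, ease of use, and a short development cycle. The paper describes the instrument's operating principle and its software and hardware design, and finally presents test results.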

15.
16.
This paper addresses the problem of executing computationally intensive algorithms such as those in multimedia and graphics applications. A novel methodology to design embedded compute-intensive processing elements (ECIPEs) is proposed. In order to identify common data flow patterns among core data flow graphs (DFGs), a low-complexity and parallelism-aware common subgraph extraction algorithm is proposed. In addition, a reconfiguration-aware static scheduling technique to manage task and resource dependencies is proposed. To validate the success of this approach, estimates of reconfiguration times obtained by performing several experiments (on an assorted set of algorithms taken from media standards such as MPEG-4 and frequently used graphics algorithms) are provided, and the potential for reduction in the number of reconfiguration cycles is shown.

17.
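Design of an FPGA/Nios-II-based hardware accelerator for matrix operations   Total citations: 3 (self-citations: 0, citations by others: 3)
Matrix operations in complex algorithms are large in volume, computationally complex, and time-consuming, which limits the on-line computing performance of those algorithms. From the hardware implementation perspective, this work studies the design of an FPGA/Nios-II-based hardware accelerator for matrix operations that performs matrix computation in parallel. First, based on an analysis of the matrix algorithms, a hardware structure for parallel matrix computation is designed and the functional modules are simulated in ModelSim. The functional modules are then integrated into a custom component that communicates with the Nios II main processor over the Avalon bus and serves as the hardware accelerator. Finally, an SoPC system is built in the FPGA, and real-time matrix computation is tested on an Altera DE3 development board. The test results verify the correctness, feasibility, and high computing performance of the FPGA/Nios-II-based matrix hardware accelerator.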

18.
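Sun-tracking technology can markedly improve the efficiency of photovoltaic systems. Traditional sun tracking based on photoelectric detection has low accuracy, while sun detection and tracking based on image processing is accurate but algorithmically complex and costly. The general method for detecting and tracking the solar spot through image processing is analyzed, including binarization of the solar spot image, centroid extraction, and offset computation, and a simplified centroid extraction method is proposed. Compared with the traditional centroid extraction algorithm, its error is very small and not enough to materially affect tracking accuracy. Finally, the solar spot detection and tracking algorithm is implemented on a chip with an ARM Cortex-M3 core.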
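
The three steps named in the abstract above (binarization, centroid extraction, offset computation) can be sketched as below. This shows the conventional centroid computation only; the paper's simplified method is not described in the abstract and is not reproduced here. The function name `spot_offset`, the grayscale list-of-rows image format, and the fixed threshold are assumptions made for illustration.

```python
def spot_offset(image, threshold):
    """Offset of the bright-spot centroid from the image center.

    image is a list of rows of grayscale values. Pixels at or above
    threshold count as part of the solar spot (binarization). Returns
    (row_offset, col_offset) in pixels, or None if no pixel qualifies.
    """
    rows = len(image)
    cols = len(image[0])
    count = 0
    sum_r = 0
    sum_c = 0
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            if pixel >= threshold:  # binarized pixel is "on"
                count += 1
                sum_r += r
                sum_c += c
    if count == 0:
        return None
    # Centroid of the binarized spot.
    centroid_r = sum_r / count
    centroid_c = sum_c / count
    # Offset relative to the geometric center of the sensor.
    center_r = (rows - 1) / 2
    center_c = (cols - 1) / 2
    return centroid_r - center_r, centroid_c - center_c
```

A tracker would typically drive its two axes to bring both offsets toward zero; how the offsets map to motor commands depends on the mechanics and is outside this sketch.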

19.
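In the design of real-time signal processing systems, high-speed FFT butterfly operations must be implemented with as few hardware resources as possible. This paper presents an FPGA implementation of an efficient complex pipelined butterfly unit. The method combines signal processing algorithms with EDA optimization techniques and trades off cost against speed: it greatly reduces the number of storage elements and increases speed without incurring additional hardware cost. Its performance advantage is pronounced for large-point FFTs.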
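
For readers unfamiliar with the butterfly operation that the abstract above implements in hardware, here is a minimal software sketch of a radix-2 decimation-in-time butterfly and a recursive FFT built from it. It says nothing about the paper's pipelined FPGA design; it only shows the arithmetic that one butterfly unit performs. The function names are hypothetical, and the input length is assumed to be a power of two.

```python
import cmath


def butterfly(a, b, w):
    """Radix-2 butterfly: one complex multiply, one add, one subtract."""
    t = w * b
    return a + t, a - t


def fft(x):
    """Recursive radix-2 decimation-in-time FFT (len(x) a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out
```

An N-point transform uses (N/2)·log2(N) butterflies, which is why the cost and speed of a single butterfly unit matter most for large-point FFTs.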

20.
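A cognitive framework for a class of intelligent control and its implementation   Total citations: 3 (self-citations: 1, citations by others: 2)
From the viewpoint of cognitive science, a cognitive framework for a class of intelligent control is proposed. It consists of an intelligent computing structure and a human, enabling human-machine integration and complementary intelligence. The intelligent computing structure is a tightly coupled structure with two processing stages. The front stage extends and enhances the input information through a nonlinear transformation. The back stage is a dynamic local network that performs central processing; it optimally decomposes the information space, reduces computational complexity, and improves efficiency and real-time performance. The paper analyzes the functions of the framework and how the relevant components are implemented, and gives a design example.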
