首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
不同框架深度学习模型部署是人工智能落地的核心,然而模型计算量和参数量过大、编程模型未统一导致了各种新型的专用卷积神经网络(CNN)加速器层出不穷,增加了模型的部署难度。对模型压缩和编译工具链这两个方面进行了改进:在模型压缩方面,提出新的通道剪枝标准,结合了通道的相关性和影响性以及输出通道对应的激活值,在保证精度的同时可以极大地削减卷积神经网络的计算量和参数量;在编译工具链方面,设计了一套自动的端到端优化堆栈,提出了针对基于现场可编程门阵列(FPGA)的深度学习编译器设计方法,并在中间表示中添加了所提出的排序标准的剪枝算法。实验结果表明,所设计的编译器于舰船目标检测的任务中,在通用设备上,保证精度损失不超过1%的情况下取得了1.3倍的加速效果;在专用的CNN加速器上取得了1.6倍的加速效果,在部署中能够有效地针对卷积网络进行加速。  相似文献   

2.
不同框架深度学习模型部署是人工智能落地的核心,然而模型计算量和参数量过大、编程模型未统一导致了各种新型的专用卷积神经网络(CNN)加速器层出不穷,增加了模型的部署难度。对模型压缩和编译工具链这两个方面进行了改进:在模型压缩方面,提出新的通道剪枝标准,结合了通道的相关性和影响性以及输出通道对应的激活值,在保证精度的同时可以极大地削减卷积神经网络的计算量和参数量;在编译工具链方面,设计了一套自动的端到端优化堆栈,提出了针对基于现场可编程门阵列(FPGA)的深度学习编译器设计方法,并在中间表示中添加了所提出的排序标准的剪枝算法。实验结果表明,所设计的编译器于舰船目标检测的任务中,在通用设备上,保证精度损失不超过1%的情况下取得了1.3倍的加速效果;在专用的CNN加速器上取得了1.6倍的加速效果,在部署中能够有效地针对卷积网络进行加速。  相似文献   

3.
为提升轻量级卷积神经网络在硬件平台的资源利用效率和推理速度,基于软硬件协同优化的思想,提出一种面向FPGA平台的轻量级卷积神经网络加速器,并针对网络结构的特性设计专门的硬件架构。与多级并行策略结合,设计一种统一的卷积层计算单元。为降低模型存储成本、提高加速器的吞吐量,提出一种基于可微阈值的选择性移位量化方案,使计算单元能够以硬件友好的形式执行计算。实验结果表明,在Arria 10 FPGA平台上部署的MobileNetV2加速器能够达到311 fps的推理速度,相比CPU版本实现了约9.3倍的加速比、GPU版本约3倍的加速比。在吞吐量方面,加速器能够实现98.62 GOPS。  相似文献   

4.
随着以卷积神经网络为代表的深度学习得到广泛应用,神经网络模型中的计算量也急速增长,推动了深度学习加速器的发展。如何针对加速器硬件的体系结构特性进行加速和优化神经网络模型的性能成为研究热点。针对自主设计的多核向量加速器FT-M7004上的VGG网络模型推理和训练算法,分别提出了卷积、池化和全连接等核心算子的向量化映射方法,采用SIMD向量化、DMA双缓冲传输和权值共享等优化策略,充分发挥了向量加速器的体系结构优势,取得了较高的计算效率。实验结果表明,在FT-M7004平台上,卷积层推理和训练的平均计算效率分别达到了86.62%和69.63%;全连接层推理和训练的平均计算效率分别达到了93.17%和81.98%;VGG网络模型在FT-M7004上的推理计算效率超过GPU平台20%以上。  相似文献   

5.
基于软硬件协同设计的思想,利用HLS工具,在PYNQ-Z2平台上设计并实现了一个卷积神经网络加速器,对卷积运算采用矩阵切割的优化方法,均衡了资源消耗和计算资源,使得加速器的性能达到了最优。利用MNIST数据集对加速器IP核进行性能测试,实验结果表明:对单张图片的测试,该加速器相对于ARM平台实现了5.785的加速效果,对于1 000张图片的测试则可达到9.72的加速效果,随着测试图片数量的不断增加,加速器的性能也将越来越优。  相似文献   

6.
针对卷积神经网络在嵌入式系统需要耗费大量计算资源、计算复杂度高等问题,提出一种基于ZYNQ系列FPGA的加速方法。通过HLS工具对卷积神经网络加速器进行设计,提出相邻层位宽合并和权重参数重排序的策略实现数据传输的优化,利用卷积分解、并行展开充分发挥FPGA并行计算的优势。为验证卷积神经网络加速器的加速效果,将YOLO目标检测模型进行部署。实验结果表明,在PYNQ-Z2上达到了39.39GOP/s的计算性能,是intel i5-2400 CPU的3.4倍,是ARM-Cortex A9 CPU的147.5倍。在相同FPGA平台上与之前的工作相较也有更高的性能。  相似文献   

7.
传统的卷积神经网络加速器及推理框架在资源约束的FPGA上部署模型时,往往面临设备种类繁多且资源极端受限、数据带宽利用不充分、算子操作类型复杂难以适配且调度不合理等诸多挑战.提出一种面向嵌入式FPGA的卷积神经网络稀疏化加速框架(sparse acceleration framework of convolutional neural network, SAF-CNN),通过软硬件协同设计的方法,从硬件加速器与软件推理框架2个角度进行联合优化.首先, SAF-CNN构建并行计算阵列,并且设计并行编解码方案,实现单周期多数据的传输,有效减少通信代价.其次,设计细粒度结构化块划分剪枝算法,于输入通道维度进行块内裁剪来获得稀疏且规则的权重矩阵,借此显著降低计算规模和DSP乘法器等资源占用.然后,提出一种兼容深度可分离卷积的输入通道维度动态拓展及运行时调度策略,实现输入通道参数灵活适配与逐通道卷积和逐点卷积的资源复用.最后,提出一种计算图重构及硬件算子融合优化方法,提升硬件执行效率.实验采用2种资源受限的低端FPGA异构平台Intel CycloneV与Xilinx ZU3EG,结果表明SAF-...  相似文献   

8.
卷积神经网络(Convolutional Neural Network,CNN)是目前主流视觉算法不可或缺的关键部分.为提高CNN模型推理速度,学界提出了众多异构加速方法以满足不同场景下的多元加速需求.但如何在资源与能耗受限的在轨卫星上稳定高效地加速CNN仍是极具挑战的课题.为此,本文通过软硬件协同设计,着力优化微指令编码、指令级并行和运算级并行3个加速器设计的关键部分,在星上常见的Xilinx VX690T FPGA芯片上设计实现了一种微指令序列调度数据流的CNN加速器.在软件层面,本文提出一种可扩展的微指令编码格式及相应的编译方法.通过卷积循环分块和算子融合策略实现图级别优化,生成加速器可执行的微指令序列.在硬件层面,本文设计实现了一个由微控制器与逻辑运算器组成的RTL级CNN加速器.微控制器通过粗粒度流水线实现各类指令的并行执行.逻辑运算器通过DSP48E1计算资源级联所构建的计算阵列实现卷积算子的细粒度并行运算.实验结果表明,加速器设计功耗10.68W,在加速YOLOV3Tiny算法时,峰值吞吐率(Runtime Max Throughput,RMT)达到378.63 GOP/...  相似文献   

9.
张坤宁  赵烁  何虎  邓宁  杨旭 《计算机工程》2021,47(4):153-157
为提高卷积神经网络(CNN)的计算效率和能效,以8 bit定点数据作为输入,设计一个支持激活、批标准化以及池化等CNN网络中常见计算类型的卷积加速器,优化循环计算顺序并将其与数据复用技术相结合,以提高卷积计算的效率。基于软硬件协同设计思想,构建包含RISC-V处理器和卷积加速器的SoC系统,RISC-V处理器基于开源的指令集标准,可以根据具体的设计需求扩展指令功能。将该SoC系统部署在Xilinx ZCU102开发板上,RISC-V处理器和卷积加速器分别工作在100 MHz和300 MHz频率下,测试结果表明,该加速器的算力达到153.6 GOP/s,运行VGG16网络进行图片推理计算时加速效果较好。  相似文献   

10.
神经网络参数量和运算量的扩大,使得在资源有限的硬件平台上流水线部署神经网络变得更加困难。基于此,提出了一种解决深度学习模型在小型边缘计算平台上部署困难的方法。该方法基于应用于自定义数据集的深度可分离网络模型,在软件端使用迁移学习、敏感度分析和剪枝量化的步骤进行模型压缩,在硬件端分析并设计了适用于有限资源FPGA的流水线硬件加速器。实验结果表明,经过软件端的网络压缩优化,这种量化部署模型具有94.60%的高准确率,16.64 M的较低的单次推理定点数运算量和0.079 M的参数量。此外,经过硬件资源优化后,在国产FPGA开发板上进行流水线部署,推理帧率达到了366 FPS,计算能效为8.57 GOPS/W。这一研究提供了一种在小型边缘计算平台上高性能部署深度学习模型的解决方案。  相似文献   

11.
Abstract This paper describes an approach to the design of interactive multimedia materials being developed in a European Community project. The developmental process is seen as a dialogue between technologists and teachers. This dialogue is often problematic because of the differences in training, experience and culture between them. Conditions needed for fruitful dialogue are described and the generic model for learning design used in the project is explained.  相似文献   

12.
European Community policy and the market   总被引:1,自引:0,他引:1  
Abstract This paper starts with some reflections on the policy considerations and priorities which are shaping European Commission (EC) research programmes. Then it attempts to position the current projects which seek to capitalise on information and communications technologies for learning in relation to these priorities and the apparent realities of the marketplace. It concludes that while there are grounds to be optimistic about the contribution EC programmes can make to the efficiency and standard of education and training, they are still too technology driven.  相似文献   

13.
融合集成方法已经广泛应用在模式识别领域,然而一些基分类器实时性能稳定性较差,导致多分类器融合性能差,针对上述问题本文提出了一种新的基于多分类器的子融合集成分类器系统。该方法考虑在度量层融合层次之上通过对各类基多分类器进行动态选择,票数最多的类别作为融合系统中对特征向量识别的类别,构成一种新的自适应子融合集成分类器方法。实验表明,该方法比传统的分类器以及分类融合方法识别准确率明显更高,具有更好的鲁棒性。  相似文献   

14.
Development of software intensive systems (systems) in practice involves a series of self-contained phases for the lifecycle of a system. Semantic and temporal gaps, which occur among phases and among developer disciplines within and across phases, hinder the ongoing development of a system because of the interdependencies among phases and among disciplines. Such gaps are magnified among systems that are developed at different times by different development teams, which may limit reuse of artifacts of systems development and interoperability among the systems. This article discusses such gaps and a systems development process for avoiding them.  相似文献   

15.
This paper presents control charts models and the necessary simulation software for the location of economic values of the control parameters. The simulation program is written in FORTRAN, requires only 10K of main storage, and can run on most mini and micro computers. Two models are presented - one describes the process when it is operating at full capacity and the other when the process is operating under capacity. The models allow the product quality to deteriorate to a further level before an existing out-of-control state is detected, and they can also be used in situations where no prior knowledge exists of the out-of-control causes and the resulting proportion defectives.  相似文献   

16.
Going through a few examples of robot artists who are recognized worldwide, we try to analyze the deepest meaning of what is called “robot art” and the related art field definition. We also try to highlight its well-marked borders, such as kinetic sculptures, kinetic art, cyber art, and cyberpunk. A brief excursion into the importance of the context, the message, and its semiotics is also provided, case by case, together with a few hints on the history of this discipline in the light of an artistic perspective. Therefore, the aim of this article is to try to summarize the main characteristics that might classify robot art as a unique and innovative discipline, and to track down some of the principles by which a robotic artifact can or cannot be considered an art piece in terms of social, cultural, and strictly artistic interest. This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008  相似文献   

17.
Although there are many arguments that logic is an appropriate tool for artificial intelligence, there has been a perceived problem with the monotonicity of classical logic. This paper elaborates on the idea that reasoning should be viewed as theory formation where logic tells us the consequences of our assumptions. The two activities of predicting what is expected to be true and explaining observations are considered in a simple theory formation framework. Properties of each activity are discussed, along with a number of proposals as to what should be predicted or accepted as reasonable explanations. An architecture is proposed to combine explanation and prediction into one coherent framework. Algorithms used to implement the system as well as examples from a running implementation are given.  相似文献   

18.
This paper provides the author's personal views and perspectives on software process improvement. Starting with his first work on technology assessment in IBM over 20 years ago, Watts Humphrey describes the process improvement work he has been directly involved in. This includes the development of the early process assessment methods, the original design of the CMM, and the introduction of the Personal Software Process (PSP)SM and Team Software Process (TSP){SM}. In addition to describing the original motivation for this work, the author also reviews many of the problems he and his associates encountered and why they solved them the way they did. He also comments on the outstanding issues and likely directions for future work. Finally, this work has built on the experiences and contributions of many people. Mr. Humphrey only describes work that he was personally involved in and he names many of the key contributors. However, so many people have been involved in this work that a full list of the important participants would be impractical.  相似文献   

19.
基于复小波噪声方差显著修正的SAR图像去噪   总被引:4,自引:1,他引:3  
提出了一种基于复小波域统计建模与噪声方差估计显著性修正相结合的合成孔径雷达(Synthetic Aperture Radar,SAR)图像斑点噪声滤波方法。该方法首先通过对数变换将乘性噪声模型转化为加性噪声模型,然后对变换后的图像进行双树复小波变换(Dualtree Complex Wavelet Transform,DCWT),并对复数小波系数的统计分布进行建模。在此先验分布的基础上,通过运用贝叶斯估计方法从含噪系数中恢复原始系数,达到滤除噪声的目的。实验结果表明该方法在去除噪声的同时保留了图像的细节信息,取得了很好的降噪效果。  相似文献   

20.
蒙古语言是中国蒙古族使用的通用语言,由于蒙古文区别于其他文字的书写方式和其自身变形机制等特点,在很多通用的文字处理引擎中都不被支持。在嵌入式产品开发与应用领域中Linux加QTE已经成为流行方式。该文给出了一种在QTE环境上实现基于标准Unicode的蒙古文点阵显示和变形算法, 并自定义了支持蒙古文的QTE组件,扩展了QTE功能,为在Linux加QTE方式的嵌入式体系结构中处理蒙古文提供了一种解决方法。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号