20 similar documents found.
1.
2.
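Complex-background object tracking on embedded platforms plays an important role in intelligent video surveillance devices, UAV tracking, and similar fields. Convolutional neural networks offer high accuracy and strong robustness for tracking, but algorithms based on convolutional features are computationally expensive, and the area and power constraints of embedded platforms make it difficult to meet the real-time requirements of embedded applications. To address the high computational complexity and large parameter storage of tracking algorithms based on convolutional features, this work is the first to propose an FPGA-based hardware acceleration architecture for CNN-based object tracking against complex backgrounds. The method applies fixed-point quantization to the Siamese-FC tracking algorithm using KL divergence (relative entropy) and designs a channel-parallel acceleration architecture for the convolutional layers. Experimental results show that the average precision loss of the quantized tracker relative to the original algorithm does not exceed 4.57%, and that after FPGA deployment forward inference takes only 16.15% of the CPU time and consumes only 13.7% of the CPU's power.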
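As a rough illustration of KL-based fixed-point quantization, the sketch below picks a clipping threshold by minimizing the KL divergence between the histogram of the original values and the histogram of their quantized-then-dequantized counterparts. It is a simplified stand-in, not the calibration procedure of the paper: the function names, the candidate thresholds, the bin count, and the symmetric signed 8-bit format are all assumptions made for illustration.

```python
import math


def histogram(values, lo, hi, bins):
    """Count values into equal-width bins over [lo, hi]; out-of-range values are clamped."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1
    return counts


def kl_divergence(p_counts, q_counts, eps=1e-6):
    """KL(P || Q) between two histograms, with additive smoothing to avoid log(0)."""
    p_total = sum(p_counts) + eps * len(p_counts)
    q_total = sum(q_counts) + eps * len(q_counts)
    kl = 0.0
    for p, q in zip(p_counts, q_counts):
        pp = (p + eps) / p_total
        qq = (q + eps) / q_total
        kl += pp * math.log(pp / qq)
    return kl


def quantize(values, threshold, bits=8):
    """Symmetric fixed-point quantization: clip to [-threshold, threshold], then round."""
    qmax = 2 ** (bits - 1) - 1
    scale = threshold / qmax
    return [max(-qmax, min(qmax, round(v / scale))) for v in values], scale


def calibrate_threshold(values, candidates, bits=8, bins=64):
    """Return the candidate threshold whose quantized distribution is closest in KL."""
    hi = max(abs(v) for v in values) or 1.0  # avoid a zero-width histogram
    reference = histogram(values, -hi, hi, bins)
    best_threshold, best_kl = None, float("inf")
    for threshold in candidates:
        q, scale = quantize(values, threshold, bits)
        dequantized = [x * scale for x in q]
        kl = kl_divergence(reference, histogram(dequantized, -hi, hi, bins))
        if kl < best_kl:
            best_threshold, best_kl = threshold, kl
    return best_threshold
```

In a real deployment the values would be layer activations collected on calibration data, and the candidate thresholds would typically be swept much more finely than a short hand-written list.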
3.
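To address the poor real-time performance of most current embedded face detection systems, this work accelerates face detection through an optimized face detection algorithm and hardware/software co-processing. On the ZYNQ SoC architecture, a YCbCr skin-color-space algorithm is accelerated in the FPGA fabric to extract skin-color regions, while an optimized Adaboost algorithm and the Phash algorithm run on the dual-core ARM to perform face detection and tracking and output the detected faces. Experiments show that the proposed optimized algorithm achieves better real-time performance than the traditional Adaboost face detection algorithm, and that well-designed hardware/software co-processing also speeds up face detection while reducing hardware resource consumption and therefore cost.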
4.
Ulf Jensen Patrick Kugler Matthias Ring Bjoern M. Eskofier 《Pattern Analysis & Applications》2016,19(3):839-855
Smart embedded systems often run sophisticated pattern recognition algorithms and are found in many areas such as automotive, sports, and medicine. The developer of such a system often faces the accuracy–cost conflict: the resulting system should be as accurate as possible while still running on resource-constrained hardware. This article introduces a method that supports resolving this design conflict with accuracy–cost reports. These reports compare classification systems regarding their classification rate (accuracy) and the mathematical operations and parameters of the working phase (cost). Our method is used to deduce the specific cost of various popular pattern recognition algorithms and to derive the overall cost of a classification system. We also show how our analysis can be used to estimate the computational cost for specific hardware architectures. A software toolbox to create accuracy–cost reports was implemented to facilitate automatic comparison of classification systems with the presented methodology. The software is available for download and as supplementary material. We performed different experiments on synthetic and real-world data to underline the value of this analysis. Accurate and computationally cheap classification systems were easily identified. We were even able to find a better implementation candidate in an existing embedded classification problem. This work is the first step towards a comprehensive support tool for the design of embedded classification systems.
5.
Hardware–software (HW/SW) partitioning divides an application into software and hardware parts. It is one of the crucial steps in embedded system design. For a given task, hardware implementations with different areas may provide different execution speeds because of the potential for parallel execution in hardware. Thus, one task may have multiple hardware implementation choices, depending on the available hardware area. Existing HW/SW partitioning approaches typically consider only a single hardware implementation per task, overlooking these multiple choices. This paper presents a computing model for HW/SW partitioning problems with multiple-choice hardware implementations. An efficient heuristic algorithm is proposed to rapidly generate an approximate solution, which is further refined by a tabu search algorithm also customized in this paper. Moreover, a dynamic programming algorithm is proposed for the exact solution of relatively small problems. Extensive simulation results show that the approximate solutions are very close to the exact ones and that tabu search refines them to within 1.5% error in all cases considered in this paper.
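To make the multiple-choice formulation concrete, the following sketch solves a small instance exactly with a knapsack-style dynamic program. It is not the algorithm from the paper: it assumes that tasks run sequentially (so total time is the sum of task times), that areas are non-negative integers, and that software execution uses no hardware area. The function name and the data layout are hypothetical.

```python
def partition(tasks, area_budget):
    """Minimum total execution time under a hardware area budget.

    tasks: list of (sw_time, hw_options), where hw_options is a list of
           (hw_area, hw_time) pairs; each task picks software or one option.
    area_budget: non-negative integer area available for hardware.
    """
    # best[a] = minimum total time of the tasks processed so far
    #           when at most `a` units of area are used.
    best = [0.0] * (area_budget + 1)
    for sw_time, hw_options in tasks:
        nxt = []
        for a in range(area_budget + 1):
            # Choice 1: run the task in software (no area consumed).
            t = best[a] + sw_time
            # Choice 2: any hardware implementation that fits in `a`.
            for hw_area, hw_time in hw_options:
                if hw_area <= a:
                    t = min(t, best[a - hw_area] + hw_time)
            nxt.append(t)
        best = nxt
    return best[area_budget]


# Two tasks; the first has two hardware variants of different area and speed.
example_tasks = [(10, [(2, 6), (4, 3)]), (8, [(3, 2)])]
print(partition(example_tasks, 5))
```

The table has one row per task and one column per area value, so the running time grows with the number of tasks, the area budget, and the number of hardware options per task, which fits the abstract's remark that exact solution is aimed at relatively small problems.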
6.
Sankalita Saha Neal K. Bambha Shuvra S. Bhattacharyya 《Computer Vision and Image Understanding》2010,114(11):1203-1214
Particle filtering methods are gradually attaining significant importance in a variety of embedded computer vision applications. For example, in smart camera systems, object tracking is a very important application, and particle filter based tracking algorithms have shown promising results with robust tracking performance. However, most particle filters are computationally very complex, which intensifies the challenges of real-time embedded implementation. Many of these applications share common characteristics, and the same system design can be reused by identifying key system parameters and varying them appropriately. In this paper, we present a System-on-Chip (SoC) architecture involving both hardware and software components for a class of particle filters. The framework uses parameterization to enable fast and efficient reuse of the architecture with minimal re-design effort for a wide range of particle filtering applications as well as implementation platforms.
7.
8.
The development of classification techniques is a foundation for the evolution of machine learning, which has become a major part of mainstream Artificial Intelligence research. However, the computational cost of these techniques limits their use on resource-constrained embedded platforms. Because the classification task is often combined with other computationally expensive functions, efficient performance of the main modules is a fundamental requirement for achieving hard real-time speed for the whole system. Graph-based machine learning techniques offer a powerful framework for building classifiers. Optimum-Path Forest (OPF) is a graph-based classifier with the interesting ability to provide nonlinear class separation surfaces. This work proposes the SoC/FPGA-based design and implementation of an architecture for embedded applications, presenting a hardware version of the OPF classification algorithm. Compared with a software implementation on an embedded processor, the architecture accelerates OPF classification by a factor of 2.18 to 9, which makes real-time performance in embedded applications a realistic expectation.
9.
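Because many existing tracking algorithms struggle to meet the speed and accuracy needs of embedded tracking development, this paper proposes a fast tracking algorithm based on Harris corners and the pyramidal optical flow method, and describes the DSP-FPGA hardware design in detail. First, Harris corner detection extracts the corner features of the target; next, pyramidal optical flow matches the corners in subsequent video frames; finally, a corner-based centroid tracking algorithm matches the target's centroid to determine its position. Centroid tracking compensates well for deformation caused by rotation or distortion. In the hardware implementation, the FPGA simplifies circuit design and uses a hardware description language to implement the hardware algorithms, logic control, and external interfaces, while the DSP runs the target tracking algorithm. Experimental results confirm the effectiveness of the hardware implementation: compared with the centroid tracking, phase correlation tracking, and pyramid correlation tracking algorithms of the AVT21 development board, the proposed algorithm shows some advantage in average overlap and average center error, and it sustains 25 fps on a 720p video stream.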
10.
11.
Yen-Hsiang Chen Kai-Ti Hu Shanq-Jang Ruan 《Engineering Applications of Artificial Intelligence》2012,25(7):1331-1337
Skin color is significant information for many emerging applications in surveillance systems. However, common skin color models usually require a color space transformation, which is not suitable for direct hardware implementation. This paper develops a statistical skin color model using the default RGB color space, which is especially suitable for hardware implementation in image processing applications. Moreover, an efficient face detection system built on our skin color model is also proposed for hardware implementation. Compared with other skin color models, the proposed model produces the highest detection rate. Furthermore, the extended face detection system based on our skin color model also significantly decreases the computational cost of the hardware implementation. Experimental results demonstrate that the proposed detection system can be easily implemented on a field-programmable gate array (FPGA), occupying only 3202 logic cells while maintaining a high detection rate.
12.
F. Javier Toledo-Moreo J. Javier Martínez-Alvarez Javier Garrigós-Guerrero J. Manuel Ferrández-Vicente 《Journal of Systems Architecture》2012,58(8):277-285
Two-dimensional convolution is a low-level processing algorithm of interest in many areas, but its high computational cost constrains the size of the kernels, especially in real-time embedded systems. This paper presents a hardware architecture for the FPGA-based implementation of 2-D convolution with medium–large kernels. It is a multiplierless solution based on Distributed Arithmetic implemented using general purpose resources in FPGAs. Our proposal is modular and coefficient independent, so it remains fully flexible and customizable for any application. The architecture design includes a control unit to efficiently manage the operations at the borders of the input array. Results in terms of occupied resources and timing are reported for different configurations. We compare these results with other state-of-the-art approaches to validate our approach.
13.
《Journal of Systems Architecture》2013,59(3):155-164
In this paper, we report a hardware/software (HW/SW) co-designed K-means clustering algorithm with high flexibility and high performance for machine learning, pattern recognition and multimedia applications. The contributions of this work fall into two aspects. The first is the hardware architecture for nearest neighbor searching, which addresses the main computational cost of a K-means clustering algorithm. The second is the high flexibility for different applications, which comes from not only the software but also the hardware. Limited flexibility with respect to the number of training data samples, the dimensionality of each sample vector, the number of clusters, and the target application is one of the major shortcomings of dedicated hardware implementations of the K-means algorithm. In particular, the HW/SW K-means algorithm is extendable to embedded systems and mobile devices. We benchmark our multi-purpose K-means system on handwritten digit recognition, face recognition and image segmentation to demonstrate its excellent performance, high flexibility, fast clustering speed, short recognition time, good recognition rate and versatile functionality.
14.
15.
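A reliable bootloader-based remote update mechanism for embedded software. Total citations: 6; self-citations: 0; citations by others: 6
Remote automatic update of embedded software can significantly reduce the maintenance cost of embedded systems, and the reliability of the update process directly affects the quality of remote updates. For bootloader-based embedded systems, this paper proposes a highly reliable remote automatic update mechanism for embedded software, and presents a hardware and software implementation of the mechanism on an example embedded platform that uses an ARM microprocessor, an embedded Linux operating system, and a wireless network interface. Finally, the performance of the update mechanism is tested on a real system. The results show that the mechanism is robust to interference and effectively improves the reliability of remote embedded software updates.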
16.
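Because the Gaussian mixture model is computationally heavy and structurally complex, it is difficult to use for real-time moving-object detection on embedded systems. To solve this problem, this paper proposes a real-time motion detection scheme based on an improved Gaussian mixture model. The model is simplified and restructured, and is optimized at both the C-language level and the CPU level to make it better suited to embedded platforms. The paper also analyzes the hardware and software design of the DM6446 platform in detail and describes how the algorithm is implemented on it. Experimental results show that the system effectively overcomes interference from changes in the external environment, detects motion in real time, and supports multi-target tracking.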
17.
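An improved palette lookup algorithm for embedded-system GUIs. Total citations: 1; self-citations: 0; citations by others: 1
By analyzing the basic working principle of hardware palettes and the implementation of the palette lookup algorithm in the graphics engine of an embedded-system GUI, this paper proposes an improved palette lookup algorithm, based on a software cache technique, for embedded-system GUIs that use a hardware palette. The algorithm greatly improves the efficiency of the embedded-system GUI graphics engine.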
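The abstract does not describe how the software cache is organized, so the sketch below only illustrates the general idea: a palette lookup maps an RGB color to the index of the nearest palette entry, and a software cache remembers earlier results so that repeated colors skip the full palette scan. The class name, the squared-Euclidean distance metric, and the unbounded dictionary cache are assumptions, not details from the paper.

```python
class CachedPalette:
    """Nearest-color palette lookup with a simple software cache (illustrative)."""

    def __init__(self, palette):
        self.palette = list(palette)  # list of (r, g, b) tuples
        self.cache = {}               # maps (r, g, b) -> palette index
        self.misses = 0               # number of full palette scans performed

    def nearest_index(self, rgb):
        cached = self.cache.get(rgb)
        if cached is not None:
            return cached             # cache hit: no palette scan needed
        self.misses += 1
        r, g, b = rgb
        # Full scan: pick the entry with the smallest squared RGB distance.
        index = min(
            range(len(self.palette)),
            key=lambda i: (self.palette[i][0] - r) ** 2
            + (self.palette[i][1] - g) ** 2
            + (self.palette[i][2] - b) ** 2,
        )
        self.cache[rgb] = index
        return index
```

On an embedded target the cache would more plausibly be a small fixed-size table indexed by a hash of the color, since an unbounded dictionary is not realistic when memory is tight.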
18.
19.
This paper presents a novel algorithm for field programmable gate array (FPGA) realization of vector quantizer (VQ) encoders using partial distance search (PDS). In most applications, the PDS is adopted as a software approach for attaining moderate codeword search acceleration. In this paper, a novel PDS algorithm well suited for hardware realization is proposed. The algorithm employs subspace search, bitplane reduction, and multiple-coefficient accumulation techniques for the effective reduction of the area complexity and computation latency. Concurrent encoding of different input vectors for further computation acceleration is also allowed by the employment of multiple-module PDS. The proposed implementation has been embedded in a softcore CPU for physical performance measurement. Experimental results show that the implementation provides a cost-effective solution to the FPGA realization of VQ encoding systems where both high throughput and high fidelity are desired.
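For readers unfamiliar with partial distance search, the sketch below shows the basic software form that the abstract refers to: while accumulating the squared distance to a codeword, the search abandons that codeword as soon as the partial sum reaches the best distance found so far. It does not include the paper's hardware-oriented extensions (subspace search, bitplane reduction, multiple-coefficient accumulation); the function name and the data layout are assumptions.

```python
def pds_encode(vector, codebook):
    """Return (index, squared_distance) of the nearest codeword using basic PDS."""
    best_index = -1
    best_dist = float("inf")
    for index, codeword in enumerate(codebook):
        partial = 0.0
        for x, c in zip(vector, codeword):
            diff = x - c
            partial += diff * diff
            if partial >= best_dist:
                break  # early exit: this codeword cannot beat the current best
        else:
            # Loop finished without early exit, so this codeword is the new best.
            best_index = index
            best_dist = partial
    return best_index, best_dist


codebook = [[0, 0], [10, 10], [3, 4]]
print(pds_encode([3, 3], codebook))
```

The result is identical to a full search; the saving comes only from the skipped accumulations, which is why the speed-up depends on how early a good codeword is encountered.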
20.
A new real-time stereo system is presented based on a hardware implementation of an efficient Dynamic Programming algorithm. A simple state machine calculates the cost matrix along the diagonal of the 2-D disparity space for each epipolar pair of image scan-lines. Minimum transition costs are stored in embedded RAM and are used to backtrack disparities at clock rate. All calculations are within a pre-determined slice of the cost plane, representing the useful disparity range. The system is designed as a VHDL library component and is implemented as a SoC in a medium-capacity Field Programmable Gate Array chip. It can process stereo pairs in full VGA resolution at a rate of 25 Mpixels/s and produces 8-bit dense disparity maps within a range of disparities up to 65 pixels. The design is evaluated against ground truth and in terms of resource usage. It is also compared to a software implementation of the Dynamic Programming algorithm and to other FPGA-based stereo systems.