1.

庞宇吴天次王元发贾美平周前能《计算机应用研究》2024,41(6)

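Endoscope dehazing algorithms are widely used in medicine, giving clinicians clear, real-time images. Although dehazing technology has advanced considerably, the algorithms are computationally complex and are hard to implement in hardware under demanding conditions such as endoscopy. To achieve real-time endoscope dehazing in hardware, the dark channel prior algorithm is modified to reduce hardware resource consumption and time complexity. The improved algorithm selects a hardware-friendly estimate of the atmospheric light intensity and a transmission compensation value, and processes hazy images with a pipelined structure. The hardware circuit is implemented on a Xilinx ZYNQ7020 and processes 640×480 video in real time at up to 260 fps, consuming only 1.28K LUTs and 619 registers. Experimental results show that, compared with traditional algorithms, the improved algorithm is fast, low-power, and highly portable, and that it meets the real-time video processing requirement of endoscopes.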
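The abstract does not spell out the modified estimates, so the following is only a minimal sketch of the standard dark channel prior pipeline that such work starts from, not the authors' hardware variant. The function names, the parameters `omega`, `t_min`, and `radius`, and the use of a single scalar atmospheric light are assumptions made for illustration; pixels are assumed to be RGB tuples with values in [0, 1].

```python
def dark_channel(img, radius=1):
    """Minimum over RGB per pixel, then minimum over a square window."""
    h, w = len(img), len(img[0])
    min_rgb = [[min(px) for px in row] for row in img]
    dark = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dark[y][x] = min(
                min_rgb[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            )
    return dark


def dehaze(img, omega=0.95, t_min=0.1, radius=1):
    """Recover J = (I - A) / max(t, t_min) + A for every pixel."""
    dark = dark_channel(img, radius)
    # Assumption: one scalar atmospheric light A, taken as the brightest
    # dark-channel value (a simplification of the usual top-percentile rule).
    a = max(max(max(row) for row in dark), 1e-6)
    out = []
    for y, row in enumerate(img):
        out_row = []
        for x, px in enumerate(row):
            # Transmission estimate t = 1 - omega * dark(I) / A, floored at t_min.
            t = max(1.0 - omega * dark[y][x] / a, t_min)
            out_row.append(tuple(min(1.0, max(0.0, (c - a) / t + a)) for c in px))
        out.append(out_row)
    return out
```

A hardware version would typically replace the division by `t` and the window minimum with fixed-point, line-buffered operations; the floating-point form here is only meant to make the data flow visible.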

2.

Resource management and task partitioning and scheduling on a run-time reconfigurable embedded system

Radha Guha Nader Bagherzadeh 《Computers & Electrical Engineering》2009,35(2):258-285

There are many design challenges in the hardware-software co-design approach for performance improvement of data-intensive streaming applications with a general-purpose microprocessor and a hardware accelerator. These design challenges are mainly to prevent hardware area fragmentation to increase resource utilization, to reduce hardware reconfiguration cost, and to partition and schedule the tasks between the microprocessor and the hardware accelerator efficiently for performance improvement and power savings of the applications. In this paper a modular and block-based hardware configuration architecture named memory-aware run-time reconfigurable embedded system (MARTRES) is proposed for efficient resource management and performance improvement of streaming applications. Subsequently we design a task placement algorithm named hierarchical best fit ascending (HBFA) algorithm to prove that the MARTRES configuration architecture is very efficient in increased resource utilization and flexible in task mapping and power savings. The time complexity of the HBFA algorithm is reduced to O(n) compared to the traditional Best Fit (BF) algorithm’s time complexity of O(n²), while the quality of the placement solution by HBFA is better than that of the BF algorithm. Finally we design an efficient task partitioning and scheduling algorithm named balanced partitioned and placement-aware partitioning and scheduling algorithm (BPASA). In BPASA we exploit the temporal parallelism in streaming applications to reduce the reconfiguration cost of the hardware, while keeping in mind the required throughput of the output data. We balance the exploitation of spatial parallelism and temporal parallelism in streaming applications by considering the reconfiguration cost vs. the data transfer cost. The scheduler refers to the HBFA placement algorithm to check whether contiguous area on the FPGA is available before scheduling the task for HW or for SW.

3.

Algorithmic aspects for multiple-choice hardware/software partitioning

Jigang Wu Qiqiang Sun Thambipillai Srikanthan 《Computers & Operations Research》2012

Hardware–software (HW/SW) partitioning divides an application into software and hardware. It is one of the crucial steps in embedded system design. For a given task, hardware with different areas may provide different execution speeds due to the potential of parallel execution in hardware implementation. Thus, one task may have multiple choices of hardware implementation according to the available hardware areas. Existing HW/SW partitioning approaches typically consider only a single implementation manner in hardware, overlooking the multiple choices of hardware implementation. This paper presents a computing model to cater for HW/SW partitioning problems with multiple-choice implementation in hardware. An efficient heuristic algorithm is proposed to rapidly generate an approximate solution, which is further refined by a tabu search algorithm also customized in this paper. Moreover, a dynamic programming algorithm is proposed for the exact solution of relatively small problems. Extensive simulation results show that the approximate solutions are very close to the exact ones, and that they can be refined by tabu search to solutions with an error of no more than 1.5% for all cases considered in this paper.
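To make the multiple-choice formulation concrete, here is a small dynamic programming sketch in the style of a multiple-choice knapsack. It is an illustration under stated assumptions, not the paper's model: each task is assumed to be a list of `(area, time)` options (software is the option with area 0), areas are assumed to be non-negative integers, communication costs are ignored, and the name `multichoice_partition` is hypothetical.

```python
def multichoice_partition(tasks, area_limit):
    """Return the minimum total execution time within the area budget.

    tasks: list of tasks; each task is a list of (area, time) options and
    must pick exactly one option.
    """
    inf = float("inf")
    # best[a] = minimum total time of the tasks seen so far using at most a area.
    best = [0] * (area_limit + 1)
    for options in tasks:
        new = [inf] * (area_limit + 1)
        for a in range(area_limit + 1):
            for area, time in options:
                if area <= a and best[a - area] + time < new[a]:
                    new[a] = best[a - area] + time
        best = new
    return best[area_limit]


# Two tasks: the first has a small and a large hardware variant.
tasks = [[(0, 10), (2, 6), (4, 3)], [(0, 8), (3, 2)]]
print(multichoice_partition(tasks, 5))
```

With k options per task the loop runs in O(n·A·k) time, which is why an exact method of this kind is practical only for relatively small instances.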

4.

New Model and Algorithm for Hardware/Software Partitioning (total citations: 1; self-citations: 0; citations by others: 1)

Ji-Gang Wu Thambipillai Srikanthan and Guang-Wei Zou 《计算机科学技术学报》2008,23(4):644-651

This paper focuses on the algorithmic aspects of hardware/software (HW/SW) partitioning, which searches for a reasonable composition of hardware and software components that not only satisfies the constraint on hardware area but also optimizes the execution time. The computational model is extended so that all possible types of communication can be taken into account in HW/SW partitioning. Also, a new dynamic programming algorithm is proposed on the basis of the computational model, in which source data of basic scheduling blocks, rather than the speedup used in previous work, are directly utilized to calculate the optimal solution. The proposed algorithm runs in O(n·A) for n code fragments and the available hardware area A. Simulation results show that the proposed algorithm solves HW/SW partitioning without an increase in running time, compared with the algorithm cited in the literature.

5.

A CMOS feedforward neural-network chip with on-chip parallel learning for oscillation cancellation

Liu J. Brooke M.A. Hirotsu K. 《Neural Networks, IEEE Transactions on》2002,13(5):1178-1186

The paper presents a mixed signal CMOS feedforward neural-network chip with on-chip error-reduction hardware for real-time adaptation. The chip has compact on-chip weights capable of high-speed parallel learning; the implemented learning algorithm is a genetic random search algorithm: the random weight change (RWC) algorithm. The algorithm does not require a known desired neural network output for error calculation and is suitable for direct feedback control. With hardware experiments, we demonstrate that the RWC chip, as a direct feedback controller, successfully suppresses unstable oscillations modeling combustion engine instability in real time.
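The random weight change rule can be illustrated in a few lines of software. The sketch below follows the rule as it is commonly described (keep the same weight change while the error falls, otherwise draw a new random ±δ change for every weight); it is not the chip's circuit, and the function name `rwc_train`, the step size `delta`, and the toy quadratic error are assumptions for illustration.

```python
import random


def rwc_train(error_fn, weights, delta=0.1, steps=2000, seed=0):
    """Minimise error_fn with the random weight change (RWC) rule."""
    rng = random.Random(seed)
    w = list(weights)
    change = [rng.choice((-delta, delta)) for _ in w]
    prev_err = error_fn(w)
    for _ in range(steps):
        # Apply the current change to all weights in parallel.
        w = [wi + ci for wi, ci in zip(w, change)]
        err = error_fn(w)
        if err >= prev_err:
            # No improvement: draw a fresh random change for every weight.
            change = [rng.choice((-delta, delta)) for _ in w]
        # Otherwise the same change is reused on the next step.
        prev_err = err
    return w, prev_err


# Toy error: squared distance of the weights from the origin.
final_w, final_err = rwc_train(lambda w: sum(x * x for x in w), [5.0, -3.0])
```

Only a scalar error signal is needed, with no gradient and no target output, which is the property the abstract highlights for direct feedback control.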

6.

Exploring the hardware design optimization space of CRYSTALS-KYBER

穆嘉楠赵艺璇严寒宋金峰叶靖李华伟李晓维《信息安全学报》2021,6(6):51-63

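Public-key cryptography is vital to the security of digital information systems worldwide. However, advances in quantum computing and the emergence of algorithms such as Shor's algorithm pose a potentially severe threat to its security. Cryptographic algorithms that can resist quantum attacks have therefore attracted the attention of the cryptographic community, and the U.S. National Institute of Standards and Technology (NIST) launched a global competition to standardize post-quantum cryptography (PQC) algorithms. Among the candidates, lattice-based algorithms strike a good balance among security, public and private key sizes, and computation speed, making them the most promising post-quantum cryptosystems. CRYSTALS-KYBER, a lattice-based key encapsulation mechanism (KEM), passed three rounds of selection in that competition. Hardware implementation efficiency is an important evaluation criterion for post-quantum algorithms. This paper therefore uses high-level synthesis (HLS) to explore the implementation and optimization space of hardware designs for the three main modules of CRYSTALS-KYBER (key generation, encapsulation, and decapsulation) under different parameter sets. As a fast and convenient circuit design method, HLS allows the hardware implementations of different algorithms to be explored efficiently. Using this tool, the paper analyzes the CRYSTALS-KYBER software code, tries different combinations of strategies to optimize the HLS results, and finally obtains an optimized circuit structure. A cooperating Tcl/Perl script is also written to automate the search for the best optimization strategy and circuit structure. Experimental results show that moderate loop optimization and timing constraints can greatly improve the performance of the KYBER circuit produced by HLS. Compared with existing software implementations, this work shows a clear performance advantage. Compared with prior HLS implementations, the optimization of Kyber-512 improves encapsulation performance by 75% and decapsulation performance by 55.1%. Compared with baseline data, key generation performance improves by 44.2%. Similar gains are obtained for the other two CRYSTALS-KYBER parameter sets (Kyber-768 and Kyber-1024).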

7.

Low-complex dynamic programming algorithm for hardware/software partitioning (total citations: 1; self-citations: 0; citations by others: 1)

Jigang Wu Thambipillai Srikanthan 《Information Processing Letters》2006,98(2):41-46

A low-complex algorithm is proposed for the hardware/software partitioning. The proposed algorithm employs dynamic programming principles while accounting for communication delays. It is shown that the time complexity of the latest algorithm has been reduced from O(n²⋅A) to O(n⋅A), without increase in space complexity, for n code fragments and hardware area A.
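The O(n⋅A) bound has the same shape as the classic 0-1 knapsack table, which the sketch below uses as a stand-in. It is a simplified illustration, not the published algorithm: communication delays are ignored, each fragment is assumed to be a `(sw_time, hw_time, hw_area)` triple with integer area, and `partition_dp` is a hypothetical name.

```python
def partition_dp(tasks, area_limit):
    """Return (min_total_time, indices_moved_to_hardware).

    tasks: list of (sw_time, hw_time, hw_area); area_limit: integer budget A.
    """
    n = len(tasks)
    best = [0] * (area_limit + 1)  # best[a]: min time so far within area a
    in_hw = [[False] * (area_limit + 1) for _ in range(n)]
    for i, (sw, hw, area) in enumerate(tasks):
        new = [0] * (area_limit + 1)
        for a in range(area_limit + 1):
            new[a] = best[a] + sw  # keep fragment i in software
            if area <= a and best[a - area] + hw < new[a]:
                new[a] = best[a - area] + hw  # move fragment i to hardware
                in_hw[i][a] = True
        best = new
    # Walk the decision table backwards to recover the hardware set.
    chosen, a = [], area_limit
    for i in range(n - 1, -1, -1):
        if in_hw[i][a]:
            chosen.append(i)
            a -= tasks[i][2]
    return best[area_limit], sorted(chosen)


print(partition_dp([(10, 2, 3), (8, 3, 4), (6, 5, 2)], 5))
```

The two nested loops give n·(A+1) table updates. The decision table kept here for backtracking also costs O(n⋅A) memory; dropping it leaves only the rolling `best` row if just the optimal time is needed.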

8.

An accurate and cost-effective stereo matching algorithm and processor for real-time embedded multimedia systems

Kyeong-ryeol Bae Byungin Moon 《Multimedia Tools and Applications》2017,76(17):17907-17922


9.

Algorithmic aspects of area-efficient hardware/software partitioning (total citations: 1; self-citations: 0; citations by others: 1)

Wu Jigang Thambipillai Srikanthan 《The Journal of supercomputing》2006,38(3):223-235

Area efficiency is one of the major considerations in the constraint-aware hardware/software partitioning process. This paper focuses on the algorithmic aspects of hardware/software partitioning with the objective of minimizing area utilization under the constraints of execution time and power consumption. An efficient heuristic algorithm running in O(n log n) is proposed by extending the method devised for solving the 0-1 knapsack problem. Also, an exact algorithm based on dynamic programming is proposed to produce the optimal solution for small-sized problems. Simulation results show that the proposed heuristic algorithm yields very good approximate solutions while dramatically reducing the execution time.
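One way a knapsack-style greedy rule reaches O(n log n) is to sort fragments by benefit per unit of area and move them to hardware until the deadline is met. The sketch below shows that idea only; it is an assumption about the general approach rather than the paper's heuristic, it ignores the power constraint, it assumes positive areas, and `greedy_min_area` is a hypothetical name.

```python
def greedy_min_area(tasks, time_limit):
    """Pick hardware fragments greedily until total time fits time_limit.

    tasks: list of (sw_time, hw_time, hw_area) with hw_area > 0.
    Returns (hardware_indices, area_used, total_time), or None if infeasible.
    """
    total = sum(sw for sw, _, _ in tasks)
    # Sort by time saved per unit area, largest first: the O(n log n) step.
    order = sorted(
        (i for i, (sw, hw, _) in enumerate(tasks) if sw > hw),
        key=lambda i: (tasks[i][0] - tasks[i][1]) / tasks[i][2],
        reverse=True,
    )
    chosen, area_used = [], 0
    for i in order:
        if total <= time_limit:
            break
        sw, hw, area = tasks[i]
        total -= sw - hw
        area_used += area
        chosen.append(i)
    if total > time_limit:
        return None
    return sorted(chosen), area_used, total


print(greedy_min_area([(10, 2, 3), (8, 3, 4), (6, 5, 2)], 17))
```

The result is approximate: a greedy pass can overshoot the minimum area, which is why the abstract pairs the heuristic with an exact dynamic programming method for small problems.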

10.

Application of an AccelDSP-based LBP algorithm to face recognition

徐钊吴光敏覃世欢《微机发展》2014,(1):51-53

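To meet the real-time requirement of face recognition, real-time recognition is implemented in FPGA hardware. The problems of implementing the traditional LBP algorithm in hardware are analyzed in detail, and an optimized LBP algorithm suited to hardware data-stream processing is proposed. The optimized algorithm is designed in hardware with the AccelDSP synthesis tool and verified experimentally on the IPU FPGA of an Anaconda card from Dasal. Experimental results show that the optimized LBP algorithm not only improves the face recognition rate but also extracts feature values 19 times faster in hardware than in software, processing 100 face images per second and thus meeting the real-time requirement.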
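For readers unfamiliar with LBP, the basic 3×3 operator that such designs stream through hardware can be sketched as follows. This is the textbook operator, not the optimized variant from the paper; the bit order (clockwise from the top-left neighbour) and the function names are assumptions chosen for illustration.

```python
# Neighbour offsets, clockwise from the top-left corner of the 3x3 window.
NEIGHBOURS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def lbp_code(window):
    """8-bit LBP code of a 3x3 window of grey values."""
    center = window[1][1]
    code = 0
    for bit, (r, c) in enumerate(NEIGHBOURS):
        # Each neighbour contributes one bit: 1 if it is >= the center.
        if window[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_image(img):
    """LBP codes for every interior pixel of a 2-D grey image."""
    h, w = len(img), len(img[0])
    return [
        [lbp_code([row[x - 1:x + 2] for row in img[y - 1:y + 2]])
         for x in range(1, w - 1)]
        for y in range(1, h - 1)
    ]


print(lbp_code([[6, 5, 2], [7, 6, 1], [9, 8, 7]]))
```

Because each output depends only on eight comparisons within a sliding window, the operator maps naturally onto line buffers and parallel comparators, which is what makes a data-stream formulation attractive.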

11.

Optimization of an image median filtering algorithm and its hardware implementation

袁浩浩张联盟《工业控制计算机》2010,23(4):56-57

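Several ways of implementing image processing algorithms on FPGAs are discussed. Based on an analysis of the median filtering algorithm, an optimized algorithm is proposed that both suits a pipelined hardware implementation and is markedly more efficient. The design uses an FPGA as the hardware platform and implements the optimized median filter in Verilog. A comparison with software median filtering shows the efficiency advantage of the hardware implementation and the feasibility of the algorithm.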
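The abstract does not say which optimization is used. A common pipeline-friendly choice for a 3×3 window, shown here purely as an assumed example, sorts each column with three compare-and-swap steps and then takes the median of the largest minimum, the middle median, and the smallest maximum. The names `sort3` and `median3x3` are hypothetical.

```python
def sort3(a, b, c):
    """Sort three values with three compare-and-swap steps."""
    if a > b:
        a, b = b, a
    if b > c:
        b, c = c, b
    if a > b:
        a, b = b, a
    return a, b, c


def median3x3(w):
    """Median of a 3x3 window without sorting all nine values."""
    # Stage 1: sort each column into (min, mid, max).
    cols = [sort3(w[0][j], w[1][j], w[2][j]) for j in range(3)]
    # Stage 2: largest of the minima, median of the mids, smallest of the maxima.
    lo = max(c[0] for c in cols)
    mid = sort3(cols[0][1], cols[1][1], cols[2][1])[1]
    hi = min(c[2] for c in cols)
    # Stage 3: the median of those three is the median of all nine.
    return sort3(lo, mid, hi)[1]


print(median3x3([[9, 1, 5], [3, 7, 2], [8, 4, 6]]))
```

Each stage uses only small fixed comparator networks, so the three stages can be registered as pipeline steps, and the sorted columns can be reused as the window slides by one pixel.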

12.

A parallel genetic algorithm for adaptive hardware and its application to ECG signal classification

Yutana Jewajinda Prabhas Chongstitvatana 《Neural computing & applications》2013,22(7-8):1609-1626

This paper presents a parallel genetic algorithm (GA) called the cellular compact genetic algorithm (c-cGA) and its implementation for adaptive hardware. An adaptive hardware based on the c-cGA is proposed to automate real-time classification of ECG signals. The c-cGA not only provides a strong search capability while maintaining genetic diversity using multiple GAs but also has a cellular-like structure and is a straight-forward algorithm suitable for hardware implementation. The c-cGA hardware and an adaptive digital filter structure also perform an adaptive feature selection in real time. The c-cGA is applied to a block-based neural network (BbNN) for online learning in the hardware. Using an adaptive hardware approach based on the c-cGA, an adaptive hardware system for classifying ECG signals is feasible. The proposed adaptive hardware can be implemented in a field programmable gate array (FPGA) for an adaptive embedded system applied to personalised ECG signal classifications for long-term patient monitoring.

13.

MorphoSys reconfigurable hardware for cryptography: the Twofish case (total citations: 1; self-citations: 0; citations by others: 1)

Sohaib Majzoub Hassan Diab 《The Journal of supercomputing》2012,59(1):22-41

This paper presents the mapping and performance analysis of the Twofish algorithm on MorphoSys. MorphoSys is a reconfigurable architecture that can provide high performance compared to custom hardware and yet preserves a level of flexibility compared to general-purpose processors. With today’s high demand for secure data transfer mediums including wired and wireless networks, there is a growing demand for real-time implementation of cryptographic algorithms. The Twofish algorithm, one of the five AES finalists, is chosen because it is a computationally intensive algorithm. It requires lookup tables and logical and arithmetic computations that demand high flexibility and performance, so it is a well-suited algorithm to map in order to evaluate such hardware.

14.

Design of a target tracking algorithm combining Harris corners with pyramidal optical flow

徐里萍耿斌李小龙赵丽《计算机测量与控制》2018,26(5):162-165

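Many existing tracking algorithms struggle to meet the speed and accuracy needs of embedded tracking development. A fast tracking algorithm based on Harris corners and pyramidal optical flow is proposed, and its DSP-FPGA hardware design is described in detail. First, Harris corner detection extracts corner features of the target. Next, pyramidal optical flow matches the corners in subsequent video frames. Finally, a corner-based centroid tracking algorithm matches the target's centroid to determine its position; centroid tracking compensates well for deformation caused by rotation or distortion. In the hardware implementation, the FPGA, which eases circuit design, implements the hardware algorithms, logic control, and external interfaces in a hardware description language, while the DSP runs the target tracking algorithm. Experimental results verify the effectiveness of the hardware implementation. Compared with the centroid tracking, phase correlation tracking, and pyramid correlation tracking algorithms of the AVT21 development board, the proposed algorithm has some advantage in average overlap and average center error, and it sustains 25 fps on 720p video streams.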

15.

Application of the CORDIC algorithm to sine and cosine functions and its FPGA implementation

常柯阳曾岳南陈平覃曾攀《计算机工程与应用》2013,49(7):140-143

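Sine and cosine functions are widely used in engineering. The common lookup-table method is simple to implement but consumes considerable memory, so the conflict between computational accuracy and storage capacity is pronounced. The traditional CORDIC (coordinate rotation digital computer) method uses little storage but consumes substantial hardware resources and has a long output latency. An improved CORDIC algorithm that combines a lookup table with CORDIC is therefore proposed, and its design simulation and FPGA-based hardware testing are completed. The results show that the algorithm achieves high computational accuracy and low output latency using a small amount of hardware resources and some storage.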
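As background for the trade-off described above, a minimal sketch of conventional rotation-mode CORDIC is given below. It shows only the traditional method, not the paper's combined lookup-table design; it uses floating point where hardware would use fixed-point shifts and adds, and the function name and the default of 16 iterations are assumptions.

```python
import math


def cordic_sin_cos(theta, iterations=16):
    """Approximate (sin(theta), cos(theta)) for theta in [-pi/2, pi/2]."""
    # Small table of elementary rotation angles atan(2^-i).
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Pre-compute the constant gain so the result needs no final scaling.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate toward zero residual angle
        # In hardware the multiplications by 2^-i are plain right shifts.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x


s, c = cordic_sin_cos(math.pi / 6)
print(round(s, 3), round(c, 3))
```

Each iteration adds roughly one bit of precision, so the latency grows with the required accuracy. Using a lookup table for a coarse rotation first, as the abstract proposes, is one way to cut the number of iterations.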

16.

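A hybrid task scheduling algorithm for reconfigurable systems using a pre-configuration strategy (total citations: 2; self-citations: 2; citations by others: 2)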

梁樑周学功王颖彭澄廉《计算机辅助设计与图形学学报》2007,19(5):635-641

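Based on an abstraction of reconfigurable hardware resources, applications are described by a directed acyclic graph of mixed hardware and software tasks, and a list-based hybrid task scheduling algorithm is proposed. The algorithm determines a dynamic pre-configuration priority for hardware tasks from the order in which tasks become ready for computation and the state of the reconfigurable resources, then pre-configures hardware tasks in that priority order, hiding their configuration time and thereby accelerating hardware task execution. Experimental results show that, for mixed hardware/software task scheduling in reconfigurable systems, the algorithm effectively reduces the impact of configuration time on application execution time.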

17.

List decoding of Hermitian codes using Gröbner bases

Kwankyu Lee Michael E. O'Sullivan 《Journal of Symbolic Computation》2009,44(12):1662-1675

List decoding of Hermitian codes is reformulated to yield an efficient and simple algorithm for the interpolation step. The algorithm is developed using the theory of Gröbner bases of modules. The computational complexity of the algorithm seems comparable to previously known algorithms achieving the same task, and the algorithm is better suited for hardware implementation.

18.

A True O(1) Parallel Deadlock Detection Algorithm for Single-Unit Resource Systems and Its Hardware Implementation

Xiao Xiang Lee Jaehwan John 《Parallel and Distributed Systems, IEEE Transactions on》2010,21(1):4-19

Due to rapid technology advances, Multiprocessor System-on-Chips (MPSoCs) are likely to become commodity computing platforms for embedded applications. In the future, it is possible that an MPSoC is equipped with a large number of processing elements as well as on-chip resources. The management of these faces many challenges, among which deadlock is one of the most crucial issues. This paper presents a novel hardware-oriented deadlock detection algorithm suitable for current and future MPSoCs. Unlike previously published methods whose runtime complexities are often affected by the number of processing elements and resources in the system, the proposed algorithm leverages specialized hardware to guarantee O(1) overall runtime complexity. Such complexity is achieved by: 1) classifying resource allocation events; 2) for each type of event, using hardware to perform a set of specific detection and/or preparation operations that take only constant runtime; and 3) updating necessary information for multiple resources in parallel in hardware. We implement the algorithm in Verilog HDL and demonstrate through simulation that each algorithm invocation takes at most four clock cycles.

19.

High-performance short sequence alignment with GPU acceleration

Mian Lu Yuwei Tan Ge Bai Qiong Luo 《Distributed and Parallel Databases》2012,30(5-6):385-399

Sequence alignment is a fundamental task for computational genomics research. We develop G-Aligner, which adopts the GPU as a hardware accelerator to speed up the sequence alignment process. A leading CPU-based alignment tool is based on the Bi-BWT index; however, a direct implementation of this algorithm on the GPU cannot fully utilize the hardware power due to its irregular algorithmic structure. To better utilize the GPU hardware resource, we propose a filtering-verification algorithm employing both the Bi-BWT search and direct matching. We further improve this algorithm on the GPU through various optimizations, e.g., the split of a large kernel and a warp-based implementation to avoid user-level synchronization. As a result, G-Aligner outperforms another state-of-the-art GPU-accelerated alignment tool, SOAP3, by 1.8–3.5 times for in-memory sequence alignment.

20.

Design and implementation of a DSP-based SIFT algorithm for UAV remote sensing images

孙鹏肖经赵海盟刘帆晏磊赵红颖《计算机应用》2020,40(4):1237-1242

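To meet the real-time requirement of running the scale-invariant feature transform (SIFT) algorithm on site on large networked unmanned aerial vehicle (UAV) remote sensing images, an implementation scheme is proposed in which the hardware multiplier of a digital signal processor (DSP) core handles the multiplication of single-precision floating-point pixel data. First, the image data structures and image functions of SIFT are restructured according to the input and output characteristics of the DSP core's hardware multiplier, so that the multiplier can perform SIFT's single-precision floating-point pixel multiplications. Second, software pipelining is used to reschedule the iterative computations and increase the algorithm's parallelism. Finally, dynamic data generated during computation is moved to third-generation double data rate synchronous dynamic random-access memory (DDR3) to enlarge the storage available to the algorithm. Experimental results show that the SIFT algorithm on the DSP platform processes 1000×750 UAV remote sensing images quickly and with high accuracy, and that the proposed scheme meets the real-time requirement of on-site processing of networked UAV remote sensing images.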