1.
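Building on a general architecture for data fusion simulation systems, this paper introduces the system's main functions and simulation workflow. It studies the key technologies for implementing the simulation system, including data fusion module design, the effectiveness evaluation index system, and the mathematical models for effectiveness evaluation. From the two perspectives of data fusion algorithm design and system software development, it proposes a basic approach and method for developing and implementing the general architecture. The system can be used not only to study and evaluate the performance of different data fusion models and algorithms, but also to assess the overall performance of a data fusion system, and it offers a useful reference for implementing real systems.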
2.
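High-performance computing (HPC) has entered the post-exascale era. As the core component of supercomputing systems, high-performance processors deliver their computing power to HPC through their core computing architecture. Research progress on core computing architectures indicates the direction in which high-performance processor architecture is developing. Targeting advanced high-performance processors for exascale computing, this paper analyzes and discusses research progress on core computing architectures in terms of the organization of computing resources, data-level and instruction-level parallelism, domain-specific acceleration structures, and supported data types and computing power, and it looks ahead to development trends. Ultra-wide vector SIMD and SIMT, domain-specific acceleration structures for matrix operations, and support for multiple low-precision operations to accelerate the convergence of HPC and AI will be the main directions of future research and development on core computing architectures for high-performance processors.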
3.
4.
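With the development of GPU technology, GPUs now offer greater processing power than CPUs. This paper proposes moving the fusion computation for multi-layer microscopic images from the CPU to the GPU to speed up fusion, with the ultimate goal of synchronizing image fusion with image acquisition. Comparative experiments show that the GPU has a clear speed advantage in image fusion. Tests with image fusion embedded in the image acquisition program show that GPU-based fusion can fully keep pace with camera acquisition, so that acquisition and fusion proceed in real time. This result changes the long-standing workflow in which researchers first capture images and then fuse them: a single camera scan across the different focal heights is enough to obtain the fused image of the multiple focal layers.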
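The abstract above does not say which fusion rule is used. As a minimal sketch of one common choice, the Python below keeps, for each pixel, the value from the layer with the highest local sharpness. The absolute-Laplacian focus measure and the names `focus_measure` and `fuse_focus_stack` are assumptions for illustration, not the authors' method.

```python
def focus_measure(img, y, x):
    """Absolute 4-neighbour Laplacian at (y, x); border neighbours are clamped."""
    h, w = len(img), len(img[0])
    up = img[max(y - 1, 0)][x]
    down = img[min(y + 1, h - 1)][x]
    left = img[y][max(x - 1, 0)]
    right = img[y][min(x + 1, w - 1)]
    return abs(4 * img[y][x] - up - down - left - right)


def fuse_focus_stack(layers):
    """For each pixel, keep the value from the layer with the highest focus measure."""
    h, w = len(layers[0]), len(layers[0][0])
    fused = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Hypothetical selection rule: the sharpest layer wins (first layer on ties).
            best = max(layers, key=lambda layer: focus_measure(layer, y, x))
            fused[y][x] = best[y][x]
    return fused


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
sharp = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
fused = fuse_focus_stack([flat, sharp])
```

Each output pixel depends only on a small neighbourhood in every layer, which is why this kind of computation maps naturally onto one GPU thread per pixel.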
6.
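HiSilicon, a leading Chinese chip maker, has released a number of high-performance digital media processing chips in recent years. The Hi3516 and Hi3518 chips, representative of the surveillance field, have been widely adopted by many manufacturers. The media-driver parts of the SDK are not provided as source code; they are supplied only as kernel modules (ko) and libraries (lib), so system porting requires particular care. This paper analyzes the architecture of the Hi3518 SDK and proposes corresponding system porting methods.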
7.
8.
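To address the need of applications such as cloud computing and big data for managing and aggregating heterogeneous resources, this paper proposes a converged-architecture cloud server architecture and its key supporting technologies. Using hardware resource pooling, the converged-architecture cloud server decouples the compute, storage, network, power supply, cooling, and management modules and recombines them. It offers high density, low power consumption, and easy expansion, management, and maintenance; it combines the advantages of scale-out and scale-up; and it can optimize system deployment, operation and maintenance, and energy costs, significantly reducing total cost of ownership (TCO). Real-world deployments in the finance, telecommunications, and Internet industries show that the converged-architecture cloud server reduces power consumption by more than 15% and TCO by nearly 15%, providing an IT infrastructure design with a better performance-per-watt ratio for applications such as cloud computing and big data.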
9.
10.
11.
12.
Shiqing ZHANG Zheng QIN Yaohua YANG Li SHEN Zhiying WANG 《Frontiers of Computer Science》2020,14(3):143101-13
Despite the increasing investment in integrated GPU and next-generation interconnect research, discrete GPUs connected by PCIe still dominate the market, and the management of data communication between CPU and GPU continues to evolve. Initially, the programmer explicitly controlled the data transfer between CPU and GPU. To simplify programming and enable system-wide atomic memory operations, GPU vendors have developed a programming model that provides a single virtual address space for accessing all CPU and GPU memories in the system. The page migration engine in this model automatically migrates pages between CPU and GPU on demand. To meet the needs of high-performance workloads, the page size tends to be larger. Because the interconnect has lower bandwidth and higher latency than GDDR, migrating a larger page takes longer, which may reduce the overlap of computation and transmission, waste time migrating unrequested data, block subsequent requests, and cause serious performance decline. In this paper, we propose partial page migration, which migrates only the requested part of a page to reduce the migration unit, shorten the migration latency, and avoid the performance degradation of full page migration as pages become larger. We show that partial page migration can largely hide the performance overheads of full page migration. Compared with programmer-controlled data transmission, when the page size is 2MB and the PCIe bandwidth is 16GB/sec, full page migration is 72.72× slower, while our partial page migration achieves a 1.29× speedup. When the PCIe bandwidth is changed to 96GB/sec, full page migration is 18.85× slower, while our partial page migration provides a 1.37× speedup. Additionally, we examine the performance impact that PCIe bandwidth and migration unit size have on execution time, enabling designers to make informed decisions.
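To give a feel for why the migration unit matters, the sketch below estimates raw transfer time as size divided by bandwidth. The 2MB page and 16GB/sec figures come from the abstract. The 4 KiB requested size, the zero default latency, and the helper name `transfer_time_s` are assumptions for illustration. This is back-of-the-envelope arithmetic only, not the paper's simulation model.

```python
def transfer_time_s(num_bytes, bandwidth_bytes_per_s, latency_s=0.0):
    """Time to move num_bytes over a link: fixed latency plus size / bandwidth."""
    return latency_s + num_bytes / bandwidth_bytes_per_s


PAGE_BYTES = 2 * 1024 * 1024   # 2MB page, as in the abstract
REQUESTED_BYTES = 4 * 1024     # hypothetical: the part of the page actually requested
PCIE_BW = 16e9                 # 16GB/sec, as in the abstract

full = transfer_time_s(PAGE_BYTES, PCIE_BW)
partial = transfer_time_s(REQUESTED_BYTES, PCIE_BW)
print(f"full page: {full * 1e6:.1f} us, requested part only: {partial * 1e6:.3f} us")
```

Under these assumptions the full-page transfer moves 512 times more data than was requested, which is the kind of overhead partial page migration is meant to avoid.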
13.
Cebrián-Márquez Gabriel Galiano Vicente Migallón Héctor Martínez José Luis Cuenca Pedro López-Granado Otoniel 《The Journal of supercomputing》2019,75(3):1215-1226
The Journal of Supercomputing - The high efficiency video coding (HEVC) standard has opened the door to high-quality multimedia contents and new formats such as ultra-high definition as a result of...
14.
15.
16.
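This paper describes three implementations of matrix multiplication on the CPU and four implementations on the GPU based on the CUDA architecture, and analyzes why the high-performance methods are fast. Their common feature is that they organize data sensibly and make good use of it, which effectively reduces memory access overhead and greatly increases speed. The best CPU implementation is more than 200 times faster than the ordinary algorithm, and the best GPU implementation is about 6 times faster again than the best CPU implementation.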
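The abstract does not list the individual implementations. As a hedged illustration of what "organizing data" can mean, the sketch below contrasts the textbook triple loop with a blocked (tiled) version that works on small sub-blocks so the same data is reused while it is still close at hand. The function names and the tile size are assumptions. In pure Python the speed difference will not show, so the code only illustrates the access pattern.

```python
def matmul_naive(a, b):
    """Textbook triple loop: c[i][j] = sum over k of a[i][k] * b[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, tile=2):
    """Same result, computed tile by tile so each sub-block is reused before moving on."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, m, tile):
            for j0 in range(0, p, tile):
                for i in range(i0, min(i0 + tile, n)):
                    row_c = c[i]
                    for k in range(k0, min(k0 + tile, m)):
                        a_ik = a[i][k]
                        row_b = b[k]  # walk b row-wise for contiguous access
                        for j in range(j0, min(j0 + tile, p)):
                            row_c[j] += a_ik * row_b[j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_blocked(a, b))
```

On a GPU the same idea usually appears as staging tiles of both matrices in fast on-chip memory before multiplying them.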
17.
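This paper designs and implements a parallel algorithm for homomorphic filtering of multiple images in which the CPU and GPUs compute cooperatively: the CPU dispatches image filtering tasks and collects the filtering results, the data of multiple images are partitioned and assigned to multiple GPUs and their thread blocks, and the GPUs call a kernel function library to perform the Fourier transform and inverse Fourier transform on the images. Experimental results show that the algorithm is efficient: compared with the parallel algorithm running on a single GPU, the multi-GPU cooperative algorithm significantly shortens the time needed for homomorphic filtering of multiple images.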
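For readers unfamiliar with homomorphic filtering, the usual pipeline is: take the logarithm, transform to the frequency domain, scale low and high frequencies differently, transform back, and exponentiate. The sketch below shows those steps on a 1-D signal with a naive DFT. It is a simplification under stated assumptions: the paper works on 2-D images with GPU FFT libraries, and the step-shaped filter, the gain values, and the function names here are all hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (the inverse divides by n)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for t, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(total / n if inverse else total)
    return out


def homomorphic_filter(signal, low_gain=0.5, high_gain=1.5, cutoff=1):
    """log -> DFT -> frequency-dependent gain -> inverse DFT -> exp.

    The signal must be strictly positive so that the logarithm is defined.
    """
    n = len(signal)
    log_spectrum = dft([math.log(v) for v in signal])
    filtered = []
    for k, coeff in enumerate(log_spectrum):
        freq = min(k, n - k)  # distance from DC; symmetric, so the output stays real
        gain = low_gain if freq <= cutoff else high_gain
        filtered.append(coeff * gain)
    restored = dft(filtered, inverse=True)
    return [math.exp(z.real) for z in restored]


signal = [1.0, 2.0, 4.0, 2.0]
print(homomorphic_filter(signal))
```

Because each image is filtered independently, a batch of images can be split across several GPUs, which is the kind of work distribution the abstract describes.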
18.
Martin Kruliš Tomáš Skopal Jakub Lokoč Christian Beecks 《Distributed and Parallel Databases》2012,30(3-4):179-207
The Signature Quadratic Form Distance on feature signatures represents a flexible distance-based similarity model for effective content-based multimedia retrieval. Although metric indexing approaches are able to speed up query processing by two orders of magnitude, their applicability to large-scale multimedia databases containing billions of images is still a challenging issue. In this paper, we propose a parallel approach that balances the utilization of CPU and many-core GPUs for efficient similarity search with the Signature Quadratic Form Distance. In particular, we show how to process multiple distance computations and other parts of the search procedure in parallel, achieving maximal performance of the combined CPU/GPU system. The experimental evaluation demonstrates that our approach, implemented on a common workstation with 2 GPU cards, outperforms a traditional parallel implementation on a high-end 48-core NUMA server in terms of efficiency by almost an order of magnitude. If we also consider that the price of the high-end server is ten times higher than that of the GPU workstation, then, based on price/performance ratio, the GPU-based similarity search beats the CPU-based solution by almost two orders of magnitude. Although proposed for the SQFD, our approach of fast GPU-based similarity search is applicable to any distance function that is efficiently parallelizable in the SIMT execution model.
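As a reminder of what is being parallelized, the sketch below computes the Signature Quadratic Form Distance for two small feature signatures. It follows the standard definition: concatenate the weights of the query with the negated weights of the object, then evaluate the quadratic form with a similarity matrix over all centroids. The Gaussian similarity function, the `alpha` parameter, and the function names are assumptions for illustration. The abstract does not specify them.

```python
import math


def gaussian_similarity(c1, c2, alpha=1.0):
    """Similarity of two centroids: exp(-alpha * squared Euclidean distance)."""
    d2 = sum((x - y) ** 2 for x, y in zip(c1, c2))
    return math.exp(-alpha * d2)


def sqfd(sig_q, sig_o, alpha=1.0):
    """SQFD between two signatures, each a list of (centroid, weight) pairs."""
    centroids = [c for c, _ in sig_q] + [c for c, _ in sig_o]
    weights = [w for _, w in sig_q] + [-w for _, w in sig_o]
    total = 0.0
    for ci, wi in zip(centroids, weights):
        for cj, wj in zip(centroids, weights):
            total += wi * wj * gaussian_similarity(ci, cj, alpha)
    # Clamp tiny negative rounding errors before taking the square root.
    return math.sqrt(max(total, 0.0))


s1 = [((0.0, 0.0), 0.5), ((1.0, 1.0), 0.5)]
s2 = [((2.0, 0.0), 1.0)]
print(sqfd(s1, s2))
```

Every term of the double sum is independent of the others, which is what makes the distance a good fit for the SIMT execution model mentioned in the abstract.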
19.
20.
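Real-time beamforming has long been a key challenge in signal processing for sonar, radar, and related fields. This paper uses cooperative processing between a CUDA (Compute Unified Device Architecture) based GPU (Graphics Processing Unit) and a CPU to achieve real-time wideband beamforming. The method is one to two orders of magnitude faster than MATLAB and CPU platforms, and compared with a multi-DSP platform of equivalent processing speed it offers a shorter development cycle, lower cost, less work, and higher reliability.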
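The abstract does not state which beamforming algorithm is used. As a minimal sketch of the basic idea, the code below implements time-domain delay-and-sum with integer sample delays: each channel is shifted so that a wavefront from the steering direction lines up, then the channels are averaged. The function name and the toy three-channel data are assumptions for illustration.

```python
def delay_and_sum(channels, delays):
    """Average the channels after shifting channel c by delays[c] samples.

    Output sample t is the mean of channels[c][t + delays[c]]; samples that
    fall outside the record are treated as zero.
    """
    n = len(channels[0])
    out = []
    for t in range(n):
        acc = 0.0
        for samples, d in zip(channels, delays):
            idx = t + d
            if 0 <= idx < n:
                acc += samples[idx]
        out.append(acc / len(channels))
    return out


# Toy data: one pulse reaching each sensor one sample later than the previous one.
channels = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 0],
]
aligned = delay_and_sum(channels, [0, 1, 2])    # steered at the source
unaligned = delay_and_sum(channels, [0, 0, 0])  # steered elsewhere
```

Each output sample and each steering direction can be computed independently, so the work spreads naturally across many GPU threads.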