Similar Articles
20 similar articles found.
1.
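As multicore processor chips attract increasing attention in embedded applications, the central goal of research on mass-market multicore parallel computing is to raise application-development productivity while still gaining parallel performance. This paper surveys three key issues facing the embedded domain. First, it compares current high-performance embedded computing with supercomputing, and it classifies and summarizes embedded application domains. Second, it examines on-chip multicore processor architectures suited to embedded use. Finally, it reviews the state of research on multicore parallel programming approaches and summarizes open research problems for embedded multicore parallelism.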

2.
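Based on multicore cluster systems, this paper studies parallel programming models in depth and designs a hybrid OpenMP/MPI programming model for a multi-level parallel architecture. Taking an SMP cluster as the target platform, it separates inter-node and intra-node parallelism and uses the multi-level programming model for experiments and analysis. It also studies the performance of the multi-level model in depth and proposes a new hybrid design based on "large synchronization". An optimized parallel N-Body algorithm using this large-synchronization approach is designed and compared in performance with a conventional parallel algorithm on a Dawning TC 5000A cluster. Combining theoretical study with extensive experimental analysis, the paper draws a number of conclusions on performance optimization of hybrid parallel programming models on multicore clusters.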
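
The node/thread split described above can be illustrated with a serial sketch. It is not the authors' "large synchronization" algorithm, whose details the abstract does not give. It assumes a 2-D direct-summation N-body kernel. The two nested loops stand in for MPI ranks (inter-node) and OpenMP threads (intra-node). The only synchronization point is the end of a step, where all accelerations must be gathered. The function names and the softening parameter `eps` are hypothetical.

```python
import math

def accelerations(pos, mass, idx, G=1.0, eps=1e-3):
    """Direct-summation 2-D accelerations for the bodies listed in idx."""
    out = []
    for i in idx:
        ax = ay = 0.0
        xi, yi = pos[i]
        for j, (xj, yj) in enumerate(pos):
            if j == i:
                continue
            dx, dy = xj - xi, yj - yi
            r2 = dx * dx + dy * dy + eps * eps  # softening avoids division by zero
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            ax += G * mass[j] * dx * inv_r3
            ay += G * mass[j] * dy * inv_r3
        out.append((ax, ay))
    return out

def split(items, parts):
    """Contiguous block decomposition of items into `parts` chunks."""
    n = len(items)
    return [items[k * n // parts:(k + 1) * n // parts] for k in range(parts)]

def hybrid_step(pos, mass, nodes, threads_per_node):
    """Two-level decomposition: bodies -> nodes (MPI ranks) -> threads (OpenMP)."""
    result = [None] * len(pos)
    for node_bodies in split(list(range(len(pos))), nodes):        # inter-node level
        for thread_bodies in split(node_bodies, threads_per_node):  # intra-node level
            for i, a in zip(thread_bodies, accelerations(pos, mass, thread_bodies)):
                result[i] = a
    # One synchronization per time step: every rank needs all accelerations
    # (in a real MPI program this would be a single collective such as Allgather).
    return result
```

Because each body's force sum runs in the same order regardless of the split, the hybrid result matches the serial one exactly.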

3.
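This paper builds a hybrid parallel programming model for multicore clusters. The model combines two paradigms: task-oriented TBB programming on shared memory and message-passing MPI programming. Drawing on the strengths of both, it achieves two levels of parallelism: processes mapped to compute nodes, and threads within each process mapped to processor cores. Compared with programs written in a single programming model, algorithms using this hybrid model not only reduce execution time and achieve better speedup and efficiency, but also markedly improve cluster performance.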
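
The two-level mapping can be sketched inside one Python process. An outer loop stands in for MPI ranks, each owning a block of the data. A thread pool inside each "rank" stands in for TBB tasks running on the cores. The function names are hypothetical. Because of CPython's global interpreter lock, the sketch shows the decomposition, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def process_work(rank, nprocs, data, threads):
    """Level 1: this 'process' owns a contiguous block of data.
    Level 2: the block is split among a pool of threads (TBB-like tasks)."""
    n = len(data)
    block = data[rank * n // nprocs:(rank + 1) * n // nprocs]
    m = len(block)
    chunks = [block[t * m // threads:(t + 1) * m // threads] for t in range(threads)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        partials = list(pool.map(lambda c: sum(x * x for x in c), chunks))
    return sum(partials)

def hybrid_sum_of_squares(data, nprocs=4, threads=2):
    # In a real MPI program each rank would run process_work concurrently and
    # the partial results would be combined with a reduction (e.g. MPI_Reduce);
    # here the ranks simply run one after another.
    return sum(process_work(r, nprocs, data, threads) for r in range(nprocs))
```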

4.
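A Survey of Research on the MapReduce Parallel Programming Model   Cited by: 40 (self: 0, others: 40)
Li Jianjiang, Cui Jian, Wang Dan, Yan Lin, Huang Yishuang. Acta Electronica Sinica (《电子学报》), 2011, 39(11): 2635-2642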
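Through well-defined interfaces and a runtime support library, the MapReduce parallel programming model can automatically execute large-scale computing tasks in parallel, hiding low-level implementation details and lowering the difficulty of parallel programming. This paper surveys research related to MapReduce in China and abroad. It describes and analyzes the features and shortcomings of representative results, and it analyzes in depth the state of research on MapReduce's key technologies: model improvements, implementations of the model for different platforms, task scheduling, load balancing, and fault tolerance. Finally, the paper discusses future development trends of MapReduce.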
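
The division of labor described above can be illustrated with the classic word-count example. The programmer supplies map and reduce functions, and the runtime handles splitting, grouping and parallel execution. This single-process Python sketch only mimics that dataflow. It is not Hadoop or any real MapReduce runtime, and the function names are ours.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    """Map: emit (word, 1) for every word in one input split."""
    return [(w.lower(), 1) for w in doc.split()]

def shuffle(pairs):
    """Shuffle: group intermediate values by key (done by the runtime in MapReduce)."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values that share one key."""
    return key, sum(values)

def mapreduce(docs):
    # Map tasks are independent, so a real runtime can run them on different nodes.
    intermediate = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())
```

For example, `mapreduce(["a b a", "b c"])` yields `{"a": 2, "b": 2, "c": 1}`.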

5.
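Parallelized programs have greatly improved application execution efficiency, and program performance must be taken into account when designing multicore programs. This paper focuses on how parallelization overhead, load balancing, and thread-synchronization overhead affect the performance of multicore, multithreaded programs in the OpenMP programming model.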
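
Load balancing and scheduling overhead can be illustrated by simulating how loop iterations are assigned to threads. The sketch computes the finishing time (makespan) of a loop with uneven iteration costs in two ways. The first is a contiguous static split, roughly like `schedule(static)`. The second is a greedy one-iteration-at-a-time assignment, roughly like `schedule(dynamic, 1)`. The `overhead` parameter is a hypothetical stand-in for per-assignment scheduling cost, and real OpenMP runtimes may divide iterations differently.

```python
import heapq

def static_makespan(costs, threads):
    """Approximates schedule(static): contiguous, equal-count blocks per thread."""
    n = len(costs)
    return max(sum(costs[t * n // threads:(t + 1) * n // threads]) for t in range(threads))

def dynamic_makespan(costs, threads, overhead=0.0):
    """Approximates schedule(dynamic, 1): the earliest idle thread takes the next
    iteration; `overhead` is a hypothetical cost charged per assignment."""
    finish = [0.0] * threads  # a list of equal values is already a valid min-heap
    for c in costs:
        t = heapq.heappop(finish)
        heapq.heappush(finish, t + c + overhead)
    return max(finish)
```

For costs 1 through 8 on two threads, the static split finishes at 26 time units and the dynamic one at 20, against an ideal of 18. Raising `overhead` far enough erases the dynamic advantage, which is the trade-off between load balance and scheduling cost.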

6.
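Maneuvering-target tracking is widely used in both civilian and military fields. After introducing the methods and principles of maneuvering-target tracking, this paper studies and simulates three maneuvering-target tracking algorithms: the adjustable white-noise model, the Singer model, and the current statistical model. Finally, it compares the filtering performance of the models and draws corresponding conclusions.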
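
The three models differ mainly in how they describe target acceleration. As one illustration, this sketch builds the discrete-time state-transition matrix of the Singer model in its standard textbook form, for a state of position, velocity and acceleration. The model treats acceleration as a first-order Markov process with maneuver-frequency parameter `alpha`. This is not code from the paper. The process-noise covariance and the Kalman update step are omitted.

```python
import math

def singer_F(alpha, T):
    """Singer-model transition matrix for state [position, velocity, acceleration].
    alpha: reciprocal of the maneuver time constant (1/s); T: sampling interval (s)."""
    e = math.exp(-alpha * T)
    return [
        [1.0, T, (alpha * T - 1.0 + e) / alpha ** 2],
        [0.0, 1.0, (1.0 - e) / alpha],
        [0.0, 0.0, e],
    ]

def predict(F, state):
    """One-step state prediction x_{k+1} = F x_k (process noise omitted)."""
    return [sum(F[i][j] * state[j] for j in range(3)) for i in range(3)]
```

As `alpha` approaches 0, the third column tends to (T^2/2, T, 1), the constant-acceleration model. As `alpha` grows large, the acceleration term decays away, giving a constant-velocity model.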

7.
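Multicore Programming. From 1970 to 2005, gains in processor performance were driven by rising clock frequencies, from 1 MHz in the early days to several GHz today. Transistor geometries kept shrinking, allowing the number of transistors in a processor to grow from the initial 2,300 to more than one billion, while supply voltages fell and chip-level power consumption rose along with performance. Today, however, power limits make it hard to keep raising performance through clock frequency, whereas multicore architectures make it possible to lower voltage, frequency, and power.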
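
The power argument above can be made concrete with the usual first-order CMOS model, in which dynamic power scales roughly as C*V^2*f. The operating points below are hypothetical numbers chosen for illustration, not data from the article. The sketch also assumes perfectly parallel work, with performance proportional to clock frequency.

```python
def relative_dynamic_power(cores, v_rel, f_rel):
    """Dynamic CMOS power ~ C * V^2 * f per core (capacitance and activity assumed
    identical per core), relative to one core at nominal voltage and frequency."""
    return cores * v_rel ** 2 * f_rel

def relative_throughput(cores, f_rel):
    """Throughput relative to one nominal core, assuming perfectly parallel work
    and performance proportional to clock frequency."""
    return cores * f_rel

# Hypothetical operating points: (cores, relative voltage, relative frequency).
configs = {
    "1 core, nominal": (1, 1.0, 1.0),
    "1 core, overclocked": (1, 1.2, 1.5),
    "2 cores, scaled down": (2, 0.8, 0.6),
}
for name, (n, v, f) in configs.items():
    print(f"{name:22s} power={relative_dynamic_power(n, v, f):.3f} "
          f"throughput={relative_throughput(n, f):.2f}")
```

With these assumed numbers, the two scaled-down cores deliver 1.2x the throughput for about 0.77x the dynamic power. The overclocked single core needs about 2.16x the power for 1.5x the throughput, because voltage enters squared.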

8.
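Heterogeneous many-core systems have become an important trend in high-performance computing. This paper first compares current heterogeneous systems in terms of architecture, programming, and supported applications, revealing their development trend and their advantages over traditional multicore parallel systems. It then analyzes the problems and challenges these systems face in programming models and performance optimization, reviews the state of research in China and abroad, and, in light of open problems and difficulties, discusses directions for further research. It also benchmarks two typical heterogeneous many-core systems, CPU+GPU and CPU+MIC, on different application types, confirming that the two systems suit different kinds of applications and giving users guidance when choosing a heterogeneous system. On this basis, it proposes combining the two kinds of many-core processors (GPU and MIC) within one compute node to form a new hybrid heterogeneous system. Such a system can exploit the different strengths of the two processors to cooperatively process complex applications with diverse characteristics, and the paper analyzes the key problems that must be studied and solved for it. Finally, it summarizes the challenges facing heterogeneous many-core systems and outlines directions for further research.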
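
A node with several kinds of processors might share a data-parallel workload through a static split proportional to each device's measured throughput, so that all devices finish at about the same time. This sketch is an illustrative assumption. The device names and throughput figures are hypothetical, and the abstract does not describe the paper's actual co-processing scheme.

```python
def split_by_throughput(total_items, throughputs):
    """Statically divide total_items among devices in proportion to their measured
    throughput (items/s), so that all devices finish at roughly the same time."""
    total_rate = sum(throughputs.values())
    shares = {dev: int(total_items * r / total_rate) for dev, r in throughputs.items()}
    # Give the rounding remainder to the fastest device.
    fastest = max(throughputs, key=throughputs.get)
    shares[fastest] += total_items - sum(shares.values())
    return shares
```

For instance, `split_by_throughput(1000, {"cpu": 1.0, "gpu": 3.0, "mic": 1.0})` assigns 200, 600 and 200 items. A dynamic or work-stealing scheme would be more robust when throughput varies with input.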

9.
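New information-processing platforms for advanced equipment use an integrated architecture built around high-performance multicore processors to meet requirements for high performance and intelligence. Supporting compute-intensive tasks requires a parallel computing framework that handles multicore synchronization, load balancing, task scheduling, data distribution, and similar problems. However, existing parallel computing frameworks are mostly open-source and Linux-based, and they do not support domestically produced (Chinese) multicore processors. Moreover, programming against a parallel computing framework differs from the traditional structured, control-algorithm-centered style, so users accustomed to writing serial programs in C/C++ face many difficulties. To address these problems, this work adapts a parallel computing framework to the operating system of a domestic multicore processor. Drawing on low-code design ideas, it also studies ways to simplify framework-based programming and raise the efficiency of parallel application development. Through visual component configuration and automatic code generation, it substantially lowers the difficulty of parallel programming and lets domestic multicore processors deliver their full parallel computing capability.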

10.
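Given the large volume and wide variety of current network traffic, this paper surveys recent research on network traffic prediction models. It analyzes a range of such models and compares them in terms of computational complexity, application scenarios, and scope of applicability for different traffic characteristics. The comparison shows that the suitability of a prediction model depends closely on the traffic characteristics being analyzed and on the application scenario. In practice, the prediction objective and the specific characteristics of the network traffic should be considered carefully when selecting a model.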
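
As a minimal example of the kind of model such surveys compare, this sketch implements simple exponential smoothing, one of the cheapest one-step-ahead predictors. It also includes a mean-absolute-error helper for comparing parameter settings on a given trace. The choice of model is ours, not the survey's. Traffic with self-similarity or strong periodicity may call for richer models.

```python
def ses_forecast(series, alpha):
    """Simple exponential smoothing: the one-step-ahead forecast after consuming
    `series`. alpha in (0, 1] controls how strongly recent samples are weighted."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def mae_one_step(series, alpha):
    """Mean absolute error of successive one-step-ahead forecasts on a trace."""
    errors = [abs(series[t] - ses_forecast(series[:t], alpha)) for t in range(1, len(series))]
    return sum(errors) / len(errors)
```

Evaluating `mae_one_step` for several `alpha` values on a recorded trace is a crude version of the selection advice above: pick the setting, or the model, that fits the traffic at hand.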

11.
Recently, embedded multicore platforms have become popular for signal processing, but software development for such platforms is still very slow. First, parallel programming is more challenging than sequential programming for average programmers. Worse, software is not portable among platforms, since each multicore signal-processing platform offers its own programming interface or language. We believe this problem can be relieved by adding support for standard message-passing programming to embedded multicore platforms. In particular, we would like to leverage MPI, the most successful message-passing system, which in practice enables the development of portable applications that run on many parallel machines. Supporting MPI on embedded multicore platforms raises technical challenges: library size, architectural issues, and performance issues. This paper identifies and addresses these issues. To enable reuse of existing MPI programs and make message-passing programming portable and efficient, we designed a lightweight MPI-like message-passing library with a three-layer modular design. The top two layers are mostly platform-independent, and the bottom layer enables platform-specific optimizations. This approach has allowed us to support message passing effectively on several popular embedded multicore signal-processing platforms, including the IBM CELL and the ITRI PAC Duo. Our results show that message-passing programming is a viable solution for multicore signal-processing applications and may be considered by platform vendors.
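
The three-layer split can be sketched in a few dozen lines. This is a hypothetical Python model: the class and method names are ours, not the paper's library. The bottom layer is the only platform-specific part, and here it is backed by in-memory queues. The middle layer does MPI-style (source, tag) matching with a buffer for messages that arrive early. The top layer exposes `send` and `recv`. It is a teaching sketch, not an MPI implementation.

```python
import queue
import threading

class QueueTransport:
    """Bottom layer (platform-specific): one in-memory inbox per core. On real
    hardware this layer would wrap DMA, mailbox, or shared-buffer primitives."""
    def __init__(self, ncores):
        self.inboxes = [queue.Queue() for _ in range(ncores)]
    def put(self, dest, packet):
        self.inboxes[dest].put(packet)
    def get(self, me):
        return self.inboxes[me].get()

class Matcher:
    """Middle layer (platform-independent): matches incoming packets to
    (source, tag) requests, buffering unexpected messages."""
    def __init__(self, transport, me):
        self.t, self.me, self.unexpected = transport, me, []
    def recv(self, source, tag):
        for i, (s, tg, _) in enumerate(self.unexpected):
            if s == source and tg == tag:
                return self.unexpected.pop(i)[2]
        while True:
            s, tg, data = self.t.get(self.me)
            if s == source and tg == tag:
                return data
            self.unexpected.append((s, tg, data))

class Comm:
    """Top layer (platform-independent): a tiny MPI-like send/recv API."""
    def __init__(self, transport, rank):
        self.rank, self.t, self.m = rank, transport, Matcher(transport, rank)
    def send(self, data, dest, tag=0):
        self.t.put(dest, (self.rank, tag, data))
    def recv(self, source, tag=0):
        return self.m.recv(source, tag)

def ping_pong():
    """Usage: core 1 doubles a value received from core 0 and sends it back."""
    transport = QueueTransport(2)
    c0, c1 = Comm(transport, 0), Comm(transport, 1)

    def core1():
        x = c1.recv(source=0, tag=7)
        c1.send(x * 2, dest=0, tag=8)

    th = threading.Thread(target=core1)
    th.start()
    c0.send(21, dest=1, tag=7)
    reply = c0.recv(source=1, tag=8)
    th.join()
    return reply
```

Porting the sketch to another platform would mean replacing only `QueueTransport`, which mirrors the paper's stated goal of confining platform-specific optimization to the bottom layer.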

12.
In recent years, embedded systems have entered the multicore era. As the number of cores in embedded systems keeps growing, it becomes more important to provide programming support that respects embedded-system constraints while helping to utilize multicore hardware. Although C still dominates embedded programming, C++ is gaining importance in parallel programming, so supporting C++ on embedded multicore systems is promising. However, embedded systems usually have tight resource budgets, and C++ is commonly considered to produce large code that embedded systems cannot afford. Therefore, in this paper we investigate the code-size requirements of a C++ library and propose a layered design that provides code-size-aware library support. In addition, to utilize embedded multicore systems, we employ C++ language features to facilitate embedded multicore programming. With C++, we incorporate high-level abstractions and design patterns into the programming support to enhance the low-level programming APIs used to exploit DSPs, SIMD instructions, and DMA on embedded multicore systems. Finally, we evaluate our C++ support with a Blur program and a JPEG program. Our results on a dual-DSP platform show speedups of 3.32 and 3.09 for the Blur and JPEG programs, respectively.

13.
Including multiple cores on a single chip has become the dominant mechanism for scaling processor performance. Exponential growth in the number of cores on a single processor is expected to lead soon to mainstream computers with hundreds of cores. Scalable implementations of parallel algorithms will be necessary to achieve improved single-application performance on such processors. In addition, memory access will continue to be an important limiting factor on achieving performance, and heterogeneous systems may make use of cores with varying capabilities and performance characteristics. An appropriate programming model can address scalability and expose data locality while making it possible to migrate application code between processors with different parallel architectures and variable numbers and kinds of cores. We survey and evaluate a range of multicore processor architectures and programming models with a focus on GPUs and the Cell BE processor. These processors have a large number of cores and are available to consumers today, but the scalable programming models developed for them are also applicable to current and future multicore CPUs.

14.
Recent advances in general-purpose graphics processing units (GPGPUs) have resulted in massively parallel hardware that is widely available for achieving high performance in desktop, notebook, and even mobile computer systems. While multicore technology has become the norm in modern computers, programming such systems requires an understanding of the underlying hardware architecture and hence poses a great challenge for average programmers, who may be professionals in specific domains but not experts in parallel programming. This paper presents a GUI tool called GPUBlocks that facilitates parallel programming on multicore computer systems. GPUBlocks is built on the OpenBlocks framework, an extensible tool for graphical programming, to provide a GUI-based programming environment for the CUDA and OpenCL parallel computing platforms. Programmers simply drag and drop blocks, fill in the fields of the blocks, and connect them according to the array or matrix computations specified by the algorithm. GPUBlocks then translates the block-based code into CUDA or OpenCL programs. Furthermore, a couple of optimization constructs are offered for rapid program optimization. Experimental results show that the generated CUDA and OpenCL programs achieve reasonable speedups on GPUs. Consequently, GPUBlocks can be used as a tool for fast prototyping of GPU applications or as a platform for teaching parallel programming.

15.
Owing to its simplicity and user-friendliness, OpenMP has become the standard model for programming shared-memory architectures. Checkpointing-aided parallel execution (CAPE) is an approach that uses the discontinuous incremental checkpointing technique (DICKPT) to translate and execute OpenMP programs on distributed-memory architectures automatically. Currently, CAPE implements the OpenMP execution model by using DICKPT to distribute parallel jobs and their data to slave machines, and then collects the results after these distributed jobs have executed. Although this model has been proven effective in terms of performance and compatibility with OpenMP on distributed-memory systems, it cannot fully exploit the capabilities of multicore processors. This paper presents a novel execution model for CAPE that uses two levels of parallelism. In the proposed model, we add a second level of parallelism in the form of multithreaded processes on the slave machines, with the goal of better exploiting their multicore CPUs. Initial experimental results presented near the end of this paper demonstrate that this model significantly enhances CAPE performance.
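
The checkpoint-based distribution described above can be illustrated with a toy model in which "memory" is a dictionary. Each slave runs its share of a parallel loop on a private copy of memory and returns only the entries it changed. The master then merges these deltas. This is a simplified assumption about how incremental checkpoints carry results, not CAPE's actual implementation. It shows only the first (inter-machine) level; the proposed model would further split each slave's chunk among threads. All function names are hypothetical.

```python
def run_chunk(memory, lo, hi, body):
    """Slave side: execute iterations [lo, hi) on a private copy of memory and
    return only the new or modified entries (a toy incremental checkpoint)."""
    local = dict(memory)
    for i in range(lo, hi):
        body(local, i)
    return {k: v for k, v in local.items() if k not in memory or memory[k] != v}

def cape_parallel_for(memory, n, slaves, body):
    """Master side: split the loop, gather each slave's delta, and merge them.
    Assumes iterations write disjoint locations, as in a valid OpenMP parallel for."""
    deltas = [run_chunk(memory, s * n // slaves, (s + 1) * n // slaves, body)
              for s in range(slaves)]
    merged = dict(memory)
    for d in deltas:
        merged.update(d)
    return merged

# Hypothetical loop body: c[i] = a[i] + b[i], with (array, index) keys as addresses.
def vector_add(mem, i):
    mem[("c", i)] = mem[("a", i)] + mem[("b", i)]
```

Only the deltas travel back to the master, which is the point of incremental checkpointing: the cost scales with what each slave modified, not with the whole address space.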

16.
Parallel computing is rapidly entering mainstream computing, and multicore processors can now be found in the heart of supercomputers, desktop computers, and laptops. Consequently, applications will increasingly need to be parallelized to fully exploit the multicore processor throughput gains that are becoming available. Unfortunately, writing parallel code is more complex than writing serial code. An introductory parallel computing course aims to introduce students to this technology shift and to explain that parallelism calls for a different way of thinking and new programming skills. The course covers theoretical topics and offers practical experience in writing parallel algorithms on state-of-the-art parallel computers, parallel programming environments, and tools.

17.
This paper examines how recent innovations in processor technology are pushing the limits of ATE (automated test equipment) applications. Various multicore programming techniques are discussed, including task parallelism, data parallelism, and pipelining, and an example of optimizing a complex analysis is presented. The benefits of adopting multicore technology and parallel software architectures include a reduction in overall test time, more sophisticated simulation approaches, and the ability to analyze complex systems.
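
Of the techniques listed, pipelining is the least obvious to code. This sketch connects stages with thread-safe queues so that different stages work on different items at the same time. The stage functions are hypothetical placeholders for steps such as data acquisition and analysis. In CPython, the global interpreter lock limits true parallelism for CPU-bound stages, so the sketch shows structure rather than speedup.

```python
import queue
import threading

_STOP = object()  # sentinel marking the end of the stream

def stage(fn, inbox, outbox):
    """One pipeline stage: apply fn to each item and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))

def run_pipeline(items, fns):
    """Chain the stages with queues, so stage k can process item i while
    stage k+1 processes item i-1."""
    queues = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [threading.Thread(target=stage, args=(fn, queues[k], queues[k + 1]))
               for k, fn in enumerate(fns)]
    for t in threads:
        t.start()
    for x in items:
        queues[0].put(x)
    queues[0].put(_STOP)
    results = []
    while (item := queues[-1].get()) is not _STOP:
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Each stage is a single thread reading a FIFO queue, so output order matches input order.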

18.
Shared virtual memory: progress and challenges   Cited by: 1 (self: 0, others: 1)
Shared virtual memory, a technique for supporting a shared address space in software on parallel systems, has undergone a decade of research, with significant maturing of protocols and communication layers having now been achieved. We provide a survey of the key developments in this research, placing the multitrack flow of ideas and results obtained so far in a comprehensive new framework. Four major research tracks are covered: relaxed consistency models, protocol laziness, architectural support, and application-driven research. Several related avenues are also discussed, such as fine-grained software coherence, software protocols across multiprocessor nodes, and performance scalability. We summarize comparative performance results from the literature, discuss their limitations, and identify lessons learned so far, key outstanding questions, and important directions for future research in this area.

19.
Claimed to be the next-generation programming paradigm, mobile agent technology has attracted extensive interest in recent years. So far, however, limited research effort has been devoted to the performance of mobile agent systems, and most of that research focuses on analyzing agent behavior, yielding models that are hard to apply to mobile agent systems. To bridge this gap, a new performance evaluation model derived from the operating mechanisms of mobile agent platforms is proposed. The design of companion simulation software, which can report system performance such as the platform's response time to a mobile agent, is discussed in detail. This is followed by further investigation of how to determine the model parameters. Finally, the model-based simulation results are compared with measured real-world performance of mobile agent systems. The results show that the proposed model and software are effective for evaluating the performance characteristics of mobile agent systems. The approach can also serve as a basis for performance analysis of large systems composed of multiple mobile agent platforms.

20.
Li Wenshi, Yao Zongbao. Acta Electronica Sinica (《电子学报》), 2012, 40(2): 230-234
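Building on an evaluation of existing speedup models for multicore CPUs, this paper combines Amdahl's law and Rent's rule on the basis of first-principles computation and proposes a new model describing multicore CPU speedup. The study starts from the traditional Amdahl's law. It then proceeds through models constrained by a fixed workload, a fixed time, memory, and interconnect complexity, and uses examples to discuss the NoC bandwidth properties and maximum-temperature characteristics of homogeneous multicore processors. The calculations show three things. First, predicting multicore speedup with the fixed-time and memory-bounded models tends to give optimistic upper-bound estimates. Second, when the parallel fraction is large, the proposed Rent's-rule-based model gives results slightly below but close to those estimates, and better than the conservative results of the fixed-workload model. Third, the NoC bandwidth and maximum-temperature results suggest that (homogeneous) multicore CPUs call for architectures with relatively high parallelism.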
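
The fixed-workload, fixed-time and memory-bounded models compared in the abstract are presumably the standard formulations associated with Amdahl, Gustafson, and Sun and Ni. This sketch evaluates those three. The authors' own Rent's-rule-based model is not reproduced, because the abstract does not give its formula. The workload-growth function `g` is an assumption supplied by the caller.

```python
def amdahl(f, n):
    """Fixed-workload speedup; f = parallel fraction, n = number of cores."""
    return 1.0 / ((1.0 - f) + f / n)

def gustafson(f, n):
    """Fixed-time (scaled) speedup; f = parallel fraction of the scaled run."""
    return (1.0 - f) + f * n

def sun_ni(f, n, g):
    """Memory-bounded speedup. g(n) describes how the parallel workload grows
    when memory grows n-fold: g(n) = 1 gives Amdahl, g(n) = n gives Gustafson."""
    w = g(n)
    return ((1.0 - f) + f * w) / ((1.0 - f) + f * w / n)
```

With a parallel fraction of 0.9 on 16 cores, Amdahl's law gives 6.4 and Gustafson's law 14.5. A memory-bounded estimate with g(n) = sqrt(n) falls in between, at about 11.4. This is consistent with the abstract's observation that fixed-time and memory-bounded estimates sit above the conservative fixed-workload result.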
