
1.

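戴乐育 李伟 徐金甫 李军伟《计算机工程与设计》2015,36(1):98-102

To address the data allocation problems that arise when mapping algorithms onto a multi-core cryptographic processor (high-speed implementation of a single cipher, parallel implementation of multiple ciphers, and implementation of complex information security protocols), this paper studies how cryptographic algorithms are mapped onto multi-core cryptographic processors and partitions the processor at the task level. It connects three things: the usage requirements of the information security system, the mapping modes of cryptographic algorithms on the processor, and the processor's data allocation modes. On this basis it proposes a task-level data allocation mechanism for multi-core cryptographic processors. Comparative experiments show that the task-level data allocation mechanism offers higher performance and flexibility.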

2.

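Research on the Implementation of the SHA-2 Algorithm on a Multi-core Cryptographic Processor

《计算机应用与软件》2016,(4)

The goal is to find a high-speed implementation of the SHA-2 algorithm suited to multi-core cryptographic processors and to raise its execution speed on them. The paper first studies how SHA-256 and SHA-512 are implemented on a cryptographic processor, examines the structural characteristics and data transfer modes of multi-core cryptographic processors, and analyzes the principles of high-speed SHA-2 implementation on multiple cores. It then partitions SHA-2 into tasks, proposes a scheduling and mapping algorithm for SHA-2 on a multi-core cryptographic processor, and implements the scheduling algorithm in software. Simulation and verification on an ASIC show that the optimized SHA-2 algorithm achieves considerably higher throughput when executed in parallel on multiple cores and meets the performance requirements.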

3.

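A Multi-core Parallel N-Queens Algorithm Based on the TBB Task Scheduler

郑晓薇 张建强《计算机工程与设计》2010,31(15)

To make full use of multi-core processor resources, the parallel programming model of Intel Threading Building Blocks (TBB) is studied. Based on the task scheduler, a task-oriented programming model is established that optimally matches and maps logical threads to physical threads. Using the task scheduler, a parallel algorithm for the N-Queens problem on multi-core processors is designed. The algorithm maps tasks to multiple threads automatically, reduces the overhead of message passing and data movement, and improves multi-core CPU efficiency. The speedup of the parallel algorithm approaches the number of cores and CPU utilization exceeds 90%; the experimental results show that the algorithm effectively improves the resource utilization of multi-core computers.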
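The task decomposition described in this abstract can be illustrated with a minimal sketch. It is not the authors' TBB implementation: the abstract does not say how the search is split, so the sketch assumes one task per column of the first-row queen, and it uses Python's `ThreadPoolExecutor` as a stand-in for a task scheduler. The function names `count_completions` and `n_queens_parallel` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def count_completions(n, cols, diag_l, diag_r):
    """Count ways to finish a partial placement (bitmask backtracking)."""
    full = (1 << n) - 1
    if cols == full:                      # every column holds a queen
        return 1
    total = 0
    free = ~(cols | diag_l | diag_r) & full   # safe squares in the next row
    while free:
        bit = free & -free                # lowest safe square
        free ^= bit
        total += count_completions(n, cols | bit,
                                   (diag_l | bit) << 1,
                                   (diag_r | bit) >> 1)
    return total


def n_queens_parallel(n, workers=4):
    """Assumed split: one independent task per first-row column."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(count_completions, n,
                        1 << c, (1 << c) << 1, (1 << c) >> 1)
            for c in range(n)
        ]
        # The pool maps tasks to worker threads; results are summed here.
        return sum(f.result() for f in futures)


if __name__ == "__main__":
    print(n_queens_parallel(8))  # 92
```

The subtrees share no state, so no data moves between tasks until the final sum. In CPython the global interpreter lock prevents threads from giving the near-linear speedup the abstract reports; the sketch shows only the decomposition, not the performance.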

4.

Research on Multi-core Differentiated Execution Technology Based on the BMP Architecture

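王继刚 刘韫晖《计算机工程与应用》2019,55(7):66-70

With the rapid development of high-speed networks and multi-core processor technology, the complexity of service applications keeps increasing. To guarantee the throughput and real-time behavior of complex services, a scheme for differentiated execution of operating system tasks in a multi-core environment is proposed based on the BMP architecture. The cores are divided into a data plane and a control plane: the processing capacity of data-plane cores is reserved for cyclic tasks with high performance requirements, and task processing on control-plane cores does not affect the performance of data-plane cores. The scheme is implemented by modifying the Linux kernel, and experimental results show that it effectively improves real-time response and throughput for complex services.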

5.

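Design of the STDS Algorithm Based on Dynamic Task Scheduling

刘正《智能系统学报》2015,(2):324-332

Task scheduling is the key to high performance in multi-core processor systems. Existing research on multi-core task scheduling algorithms mostly concentrates on algorithm optimization and load balancing under static scheduling; dynamic scheduling and dynamic load balancing have received less attention. Targeting dynamic scheduling and taking the characteristics of heterogeneous multi-core systems into account, a dynamic task scheduling algorithm based on core load balancing, STDS, is proposed. The algorithm sets the scheduling granularity appropriately to lower the scheduling frequency and thereby reduce scheduling overhead. According to the differences in processing performance among the cores of a heterogeneous multi-core processor, it sets upper and lower load limits for each core and keeps core loads at the same level to achieve load balance. Tasks are scheduled in real time according to three factors: waiting time, the volume of inter-task communication, and core load. The relative weight of the three factors can be set through parameters such as a real-time factor and a load factor to meet different system requirements. Simulation experiments show that STDS is more efficient in systems with a larger number of cores and achieves good load balance while maintaining task processing speed.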

6.

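An Architecture Design for an Embedded Real-Time Operating System on Heterogeneous Multi-core Processors (total citations: 2; self-citations: 1; citations by others: 2)

蒋建春 汪同庆《计算机科学》2011,38(6):298

Because the architecture of heterogeneous multi-core processors differs considerably from that of multiprocessor systems and homogeneous multi-core processors, the distributed operating system structure used for multiprocessor systems and the master-slave structure used for homogeneous multi-core systems cannot solve the real-time scheduling and efficiency problems of heterogeneous multi-core processors. The characteristics and development trends of heterogeneous multi-core processors are studied, and a multi-master real-time operating system architecture suited to them is proposed. The architecture introduces the multi-master mode of communication buses into the multi-core operating system and designs the operating system model with a symmetric structure and a component-based approach, so that every core can act as a master core and manage resources and tasks in real time, which improves system performance. It also removes the bottleneck of master-slave operating systems, in which the master core can no longer meet system performance requirements as the number of cores grows. This single architectural model can be configured flexibly to suit processor cores with different structures and functional requirements, reducing the difficulty of operating system development.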

7.

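A Parallelism-Aware Scheduling Algorithm for Dependent Tasks on Multi-core Processors

梁秋玲 张向利 张红梅 闫坤《计算机工程》2021,47(7):212-217

The communication delay incurred when dependent tasks are scheduled in parallel on a multi-core processor has a negative effect on schedule length and processor utilization. To improve how multi-core systems handle dependent tasks, a parallelism-aware scheduling algorithm is proposed based on the scheduling characteristics of such tasks on multi-core processors. The algorithm computes the longest path value from each task to the exit node and assigns the scheduling order in descending order of this value. When assigning processor cores, it considers both the degree of association and the earliest possible start time of a task, and defines a best-match evaluation function. Experimental results show that, compared with the busHEFT and DTSV algorithms, the algorithm achieves a shorter task scheduling delay, a lower communication volume, and higher processor utilization.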
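The ordering step in this abstract (rank each task by its longest path to the exit node, then schedule in descending rank) can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the abstract does not say whether communication costs enter the path length, so the sketch adds them as in the upward rank of HEFT-style list schedulers, and the core-assignment step with its evaluation function is omitted. The name `priority_order` and the example graph are hypothetical.

```python
from functools import lru_cache


def priority_order(cost, succ, comm=None):
    """Return (order, ranks) for a task DAG.

    cost: {task: computation cost}
    succ: {task: [successor tasks]}
    comm: {(task, successor): communication cost}, assumed 0 if missing
    """
    comm = comm or {}

    @lru_cache(maxsize=None)
    def rank(task):
        # Longest path from `task` to an exit node, including its own cost.
        tail = max(
            (comm.get((task, s), 0) + rank(s) for s in succ.get(task, ())),
            default=0,
        )
        return cost[task] + tail

    ranks = {t: rank(t) for t in cost}
    # Descending rank; ties broken by task name for determinism.
    order = sorted(cost, key=lambda t: (-ranks[t], t))
    return order, ranks


if __name__ == "__main__":
    cost = {"A": 2, "B": 3, "C": 1, "D": 2}
    succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
    comm = {("A", "B"): 1, ("A", "C"): 4, ("B", "D"): 1, ("C", "D"): 1}
    order, ranks = priority_order(cost, succ, comm)
    print(order)  # ['A', 'B', 'C', 'D']
    print(ranks)  # {'A': 10, 'B': 6, 'C': 4, 'D': 2}
```

With strictly positive computation costs, a task always ranks higher than its successors, so the descending order is also a valid topological order of the graph.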

8.

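A Parallel Computing Task Allocation Method for Multi-core Systems (total citations: 2; self-citations: 0; citations by others: 2)

卢宇彤 杨学军 所光《计算机研究与发展》2009,46(Z1)

With the spread of multi-core processors, current massively parallel processing systems generally adopt them, which places higher demands on resource management and scheduling. A method based on partitioning shared cache resources is proposed; a process scheduling model for multi-core processors that supports cache resource allocation is established; and an algorithm that maps parallel tasks onto multi-core processors is designed and implemented. The method better solves the task allocation problem for multi-core processors in large-scale resource management systems, reduces the mutual interference among processes that share a cache at run time, and improves application performance.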

9.

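Design of an Inter-core Communication Mechanism for Multi-core Processors

李静梅 王军锋 张岐《电脑学习》2011,1(4)

As the number of processor cores integrated on a single chip grows, inter-core communication becomes more important for applications that support multi-core processors. By analyzing the characteristics of tasks running on multiple cores, the cores are divided into two classes according to the function of the tasks they run: control cores and compute cores. Based on this classification, a new inter-core communication model is proposed that provides three different communication channels. Using these three channels, the I/O part of an application is migrated from the compute cores to the control cores to raise multi-core utilization. Experimental results show that this approach effectively improves inter-core cooperation and the efficiency of inter-core communication, and raises processor utilization.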

10.

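TCP/IP Protocol Stack Acceleration Based on Multi-core Processors

查奇文 张武 曾学文 宋毅《微计算机应用》2013,2(1)

Multi-core processors have become the trend in processor development. When Linux runs on a multi-core processor, all TCP/IP protocol processing is done in software inside the Linux kernel, so processing efficiency is low. To solve this problem, this paper proposes a multi-core accelerated TCP/IP protocol stack that divides the cores of a multi-core processor into two parts: one part runs the Linux operating system, and the other runs a real-time system that handles the TCP/IP protocol stack. Because TCP/IP processing is offloaded to the real-time system, interrupt handling in Linux is greatly reduced, and because the real-time system operates the underlying hardware directly without involving the operating system, the accelerated stack processes traffic efficiently. Experimental comparison shows that, on the same hardware, the multi-core accelerated TCP/IP stack achieves higher network throughput than the Linux TCP/IP stack and also consumes less CPU.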

11.

Methods of resource management in problem-oriented computing environment

L. B. Sokolinsky A. V. Shamakina 《Programming and Computer Software》2016,42(1):17-26

One of the important classes of computational problems is problem-oriented workflow applications executed in a distributed computing environment. A problem-oriented workflow application can be represented by a directed graph whose vertices are tasks and whose arcs are data flows. For a problem-oriented workflow application, we can get a priori estimates of the task execution time and the amount of data to be transferred between the tasks. A distributed computing environment designed for the execution of such tasks in a certain subject domain is called a problem-oriented environment. To use the resources of the distributed computing environment efficiently, special scheduling algorithms are applied. A great number of such algorithms have been proposed. Some of them (like the DSC algorithm) take into account specific features of problem-oriented workflow applications. Others (like the Min–Min algorithm) take into account the many-core structure of the nodes of the computational network. However, none of them takes into account both factors. In this paper, a mathematical model of a problem-oriented computing environment is constructed, and a new problem-oriented scheduling (POS) algorithm is proposed. The POS algorithm takes into account both the specifics of the problem-oriented jobs and the multi-core structure of the computing system nodes. Results of computational experiments comparing the POS algorithm with other known scheduling algorithms are presented.

12.

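A GPU-Based Cuboid Algorithm

周国亮 冯海军 何国明 陈红 李翠平 王珊《计算机研究与发展》2009,46(Z2)

In recent years, general-purpose computing on graphics processing units (GPUs) has attracted wide attention and made progress in many fields. In-memory OLAP reduces disk I/O, but the computing power of single-core or multi-core CPUs and cache misses become the new performance bottlenecks, so good efficiency cannot be guaranteed. GPUs, with their many cores and high bandwidth, are well suited to the computational characteristics of OLAP. The GPU is used to accelerate the computation of any cuboid and thereby improve the performance of the whole in-memory OLAP system. A GPU-based blocked parallel algorithm is proposed and optimized, and variants for conditions such as sparse data and skewed data distributions are discussed. With extensions, the algorithm can overcome memory limits by forming a three-stage pipeline of disk, main memory, and GPU memory, which suits massive data volumes; it can also serve as the basis for computing an entire cube. Experimental comparison shows that the GPU-based algorithm clearly outperforms the quad-core CPU algorithm.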

13.

Improving inter-node communications in multi-core clusters using a contention-free process mapping algorithm

Mohsen Soryani Morteza Analoui Ghobad Zarrinchian 《The Journal of supercomputing》2013,66(1):488-513

High performance clusters, which are established by connecting many computing nodes together, are known as one of the main architectures for obtaining extremely high performance. Currently, these systems are moving from multi-core architectures to many-core architectures to enhance their computational capabilities. This trend would eventually cause network interfaces to become a performance bottleneck because these interfaces are few in number and cannot handle multiple network requests at a time. The consequence would be a higher waiting time at the network interface queue and lower performance. In this paper, we tackle this problem by introducing a process mapping algorithm, which attempts to improve inter-node communications in multi-core clusters. Our mapping strategy reduces accesses to the network interface by distributing communication-intensive processes among computing nodes, which leads to a lower waiting time at the network interface queue. Performance results for synthetic and real workloads reveal that the proposed strategy improves the performance by 8% to 90% in tested cases compared to other methods.

14.

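A Heuristic Task Allocation Algorithm for Parallel Systems with Multi-core Processors (total citations: 2; self-citations: 0; citations by others: 2)

刘轶 张昕 李鹤 钱德沛《计算机研究与发展》2009,46(6)

Multi-core processors make the structure of parallel systems more complex and greatly increase the number of tasks in them. To allocate tasks efficiently in such systems, a task allocation model is established and a heuristic task allocation algorithm with two rounds of operations is proposed: the first round assigns processes to processing nodes, and the second assigns the threads within each process to processor cores. Each round goes through multiple iterations with backtracking, finally yielding an assignment of tasks to processor cores. Comparison tests against exhaustive search and a genetic algorithm show that the algorithm finds a near-optimal solution in a short time, and that as the number of threads grows its solving time is far shorter than that of the genetic algorithm.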

15.

Heterogeneous parallel computing accelerated generalized likelihood uncertainty estimation (GLUE) method for fast hydrological model uncertainty analysis purpose

Kan Guangyuan He Xiaoyan Ding Liuqian Li Jiren Hong Yang Liang Ke 《Engineering with Computers》2020,36(1):75-96

The generalized likelihood uncertainty estimation (GLUE) is a well-known and widely used sensitivity and uncertainty analysis method. It provides a new way to solve the “equifinality” problem encountered in hydrological model parameter estimation. In this research, we focused on the computational efficiency of the GLUE method. Inspired by emerging heterogeneous parallel computing technology, we parallelized GLUE at the algorithmic level and then implemented the parallel GLUE algorithm on a hybrid heterogeneous hardware system consisting of a multi-core CPU and a many-core GPU. The parallel GLUE was implemented using the OpenMP and CUDA software ecosystems for the multi-core CPU and many-core GPU systems, respectively. Applying the parallel GLUE to the parameter sensitivity analysis of the Xinanjiang hydrological model showed much better computational efficiency than traditional serial computing, and the correctness was also verified. The heterogeneous parallel computing accelerated GLUE method has very good application prospects for theoretical analysis and real-world applications.


16.

Architecture-based design and optimization of genetic algorithms on multi- and many-core systems

《Future Generation Computer Systems》2014

A Genetic Algorithm (GA) is a heuristic to find exact or approximate solutions to optimization and search problems within an acceptable time. We discuss GAs from an architectural perspective, offering a general analysis of the performance of GAs on multi-core CPUs and on many-core GPUs. Based on the widely used Parallel GA (PGA) schemes, we propose the best one for each architecture. More specifically, the Asynchronous Island scheme, Island/Master–Slave Hierarchy PGA and Island/Cellular Hierarchy PGA are the best for multi-core, multi-socket multi-core and many-core architectures, respectively. Optimization approaches and rules based on a deep understanding of multi- and many-core architectures are also analyzed and proposed. Finally, the comparison of GA performance on multi-core and many-core architectures is discussed. Three real GA problems are used as benchmarks to evaluate our analysis and findings. There are three extra contributions compared to previous work. Firstly, our findings, based on a deep analysis of the architectures, can be applied to all GA problems and even to other parallel computations, rather than to one particular GA problem. Secondly, the performance of GAs in our work concerns not only execution speed but also solution quality, which has not been considered seriously enough. Thirdly, we propose theoretical performance and optimization models of PGA on multi-core and many-core architectures, finding a more practical result for the performance comparison of the GA on these architectures, so that the speedup presented in this work is more reasonable and is a better guide to practical decisions.

17.

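Design of High-Throughput Many-core Processors

叶笑春 李文明 张洋 张浩 王达 范东睿《数据与计算发展前沿》2020,2(1):70-84

[Objective] With the rapid rise of new high-throughput applications such as cloud computing, the Internet of Things, and artificial intelligence, the main applications of high-performance computing are shifting from traditional scientific and engineering computing toward emerging data processing. This poses great challenges to traditional processors, and high-throughput many-core processors, a new processor architecture aimed at such applications, have become an important research direction. [Methods] To address these problems, this paper analyzes the characteristics of typical high-throughput applications and discusses the design of key technologies for high-throughput many-core processors across three core stages (data processing, transmission, and storage), including dynamic scheduling of real-time tasks, high-density network-on-chip design, and optimization of the on-chip memory hierarchy. [Results] Experimental results show that these mechanisms effectively guarantee quality of service for tasks, raise network data throughput, and simplify the on-chip memory hierarchy. [Conclusions] Given the pressing demand for highly concurrent, strongly real-time processing in the era of the Internet of Everything, high-throughput many-core processors are expected to become the core processing engine of future data centers.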

18.

Mapping of option pricing algorithms onto heterogeneous many-core architectures

Shuai Zhang Zhao Wang Ying Peng Bertil Schmidt Weiguo Liu 《The Journal of supercomputing》2017,73(9):3715-3737

The rapid development of technologies and applications in recent years poses high demands and challenges for high-performance computing. Because of their competitive performance/price ratio, heterogeneous many-core architectures are widely used in high-performance computing areas. GPU and Xeon Phi are two popular general-purpose many-core accelerators. In this paper, we demonstrate how heterogeneous many-core architectures, powered by multi-core CPUs, CUDA-enabled GPUs and Xeon Phis, can be used as an efficient computational platform to accelerate popular option pricing algorithms. In order to make full use of the compute power of this architecture, we have used a hybrid computing model which consists of two types of data parallelism: worker level and device level. The worker level data parallelism uses a distributed computing infrastructure for task distribution, while the device level data parallelism uses both the multi-core CPUs and many-core accelerators for fast option pricing calculation. Experiments show that our implementations achieve good performance and scalability on this architecture and also outperform other state-of-the-art GPU-based solutions for Monte Carlo European/American option pricing and BSDE European option pricing.

19.

Adaptive Allocation of Independent Tasks to Maximize Throughput (total citations: 1; self-citations: 0; citations by others: 1)

Bo Hong Prasanna V.K. 《Parallel and Distributed Systems, IEEE Transactions on》2007,18(10):1420-1435

In this paper, we consider the task allocation problem for computing a large set of equal-sized independent tasks on a heterogeneous computing system where the tasks initially reside on a single computer (the root) in the system. This problem represents the computation paradigm for a wide range of applications such as SETI@home and Monte Carlo simulations. We consider the scenario where the systems have a general graph-structured topology and the computers are capable of concurrent communications and overlapping communications with computation. We show that the maximization of system throughput reduces to a standard network flow problem. We then develop a decentralized adaptive algorithm that solves a relaxed form of the standard network flow problem and maximizes the system throughput. This algorithm is then approximated by a simple decentralized protocol to coordinate the resources adaptively. Simulations are conducted to verify the effectiveness of the proposed approach. For both uniformly distributed and power law distributed systems, a close-to-optimal throughput is achieved, and improved performance over a bandwidth-centric heuristic is observed. The adaptivity of the proposed approach is also verified through simulations.

20.

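Acceleration and Optimization of Grouped Convolution on the Sunway Many-core Architecture

王鑫 张铭《计算机应用研究》2023,40(6):1745-1749

To address the high computational complexity and the large computation and parameter counts of convolution with the standard convolution structure, a parallel grouped convolution algorithm is proposed for the domestically developed SW26010P many-core processor. The core idea is to use a distinctive data layout and perform the computation in parallel through multi-core mapping. Experimental results show that, compared with the single-core serial algorithm, the parallel grouped convolution algorithm achieves a maximum speedup of 79.5 and a peak effective performance of 186.7 MFLOPS. After data-parallel optimization of the algorithm with SIMD instructions, a maximum speedup of 10.2 is obtained over the unoptimized parallel grouped convolution algorithm.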
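The motivation stated in this abstract, that grouped convolution needs fewer parameters and less computation than standard convolution, follows from simple counting. The sketch below is a generic illustration and not taken from the paper: the helper `conv_cost` and the layer sizes are hypothetical, bias terms are ignored, and it says nothing about the SW26010P data layout or core mapping.

```python
def conv_cost(c_in, c_out, k, out_h, out_w, groups=1):
    """Weight count and multiply-accumulate count of a k x k convolution.

    With `groups` groups, each output channel sees only c_in / groups
    input channels, so both counts shrink by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    params = (c_in // groups) * c_out * k * k
    macs = params * out_h * out_w      # one MAC per weight per output pixel
    return params, macs


if __name__ == "__main__":
    # Hypothetical layer: 64 -> 128 channels, 3x3 kernel, 32x32 output.
    print(conv_cost(64, 128, 3, 32, 32))            # (73728, 75497472)
    print(conv_cost(64, 128, 3, 32, 32, groups=4))  # (18432, 18874368)
```

The groups are also independent of one another, which is one reason the operation lends itself to being distributed across cores.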