首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Transaction parallelism in database systems is an attractive way of improving transaction performance.There exists two levels of transaction parallelism,inter-transaction level and intra-transaction level.With the advent of multicore processors,new hopes of improving transaction parallelism appear on the scene.The greatest execution efficiency of concurrent transactions comes from the lowest dependencies of them.However,the dependencies of concurrent transactions stand in the way of exploiting parallelism.In this paper,we present Resource Snapshot Model(RSM) for resource modeling in both levels.We propose a non-restarting scheduling algorithm in the inter-transaction level and a processor assignment algorithm in the intra-transaction level in terms of multi-core processors.Through these algorithms,execution performance of transaction streams will be improved in a parallel system with multiple heterogeneous processors that have different number of cores.  相似文献   

2.
一种改进的OpenMP指导调度策略研究   总被引:3,自引:0,他引:3  
在科学计算中,循环结构是最重要的并行对象之一.考虑到负载平衡、调度开销等多方面因素,OpenMP标准提供静态调度、动态调度、指导调度和运行时调度等不同策略.针对指导调度策略不适合递减型循环结构的问题,提出一种改进的new_guided指导调度策略,并在OMPi编译器上加以实现.New_guided调度策略的主要思想是对前半部分的循环采用静态调度,后半部分的循环采用指导调度.针对不同循环结构,在多核处理器上对不同调度策略进行评测.结果表明,在一般情况下,OpenMP默认的静态策略的调度性能最差;对于规则的循环结构和递增的循环结构,动态调度、指导调度和new_guided策略的性能差别不大;对于递减型的循环结构,动态调度和new_guided策略的性能相当,要优于指导调度策略;对于某些极不规则的随机循环结构,动态调度明显优于其他策略,new_guided策略的性能介于动态调度和指导调度之间.  相似文献   

3.
异构多核处理器体系结构设计研究   总被引:2,自引:0,他引:2  
多核技术成为当今处理器发展的重要方向,异构多核处理器由于可将不同类型的计算任务分配到不同类型的处理器核上并行处理,从而为不同需求的应用提供更加灵活、高效的处理机制而成为当今研究的热点.本文从体系结构的角度探讨了异构多核处理器设计中的关键点,从内核结构、互连方式、存储系统、操作系统支持、测试与验证、动态电压调节等方面分析...  相似文献   

4.
In the ongoing quest for greater computational power, efficiently exploiting parallelism is of paramount importance. Architectural trends have shifted from improving single-threaded application performance, often achieved through instruction level parallelism (ILP), to improving multithreaded application performance by supporting thread level parallelism (TLP). Thus, multi-core processors incorporating two or more cores on a single die have become ubiquitous. To achieve concurrent execution on multi-core processors, applications must be explicitly restructured to exploit parallelism, either by programmers or compilers. However, multithreaded parallel programming may introduce overhead due to communications among threads. Though some resources are shared among processor cores, current multi-core processors provide no explicit communications support for multithreaded applications that takes advantage of the proximity between cores. Currently, inter-core communications depend on cache coherence, resulting in demand-based cache line transfers with their inherent latency and overhead. In this paper, we explore two approaches to improve communications support for multithreaded applications. Prepushing is a software controlled data forwarding technique that sends data to destination’s cache before it is needed, eliminating cache misses in the destination’s cache as well as reducing the coherence traffic on the bus. Software Controlled Eviction (SCE) improves thread communications by placing shared data in shared caches so that it can be found in a much closer location than remote caches or main memory. Simulation results show significant performance improvement with the addition of these architecture optimizations to multi-core processors.  相似文献   

5.
赵姗  杨秋松  李明树 《软件学报》2019,30(4):1164-1190
为了满足应用程序的多样化需求,异构多核处理器出现并逐渐进入市场,其中的处理核心(core)具有不同的微架构或者指令集架构(ISA),为应用提供多样化特性支持,比如指令级并行(ILP)、内存级并行(MLP),这些核心协同工作满足整个计算系统的优化目标,比如高性能、低功耗或者良好的能效.然而,目前主流的调度技术主要是针对传统同构处理器架构设计,没有考虑异构硬件能力的差异性.在异构多核处理器环境下,调度技术如何感知硬件的异构特性,为不同类型的应用程序提供更加合适和匹配的硬件资源,这是值得探索的问题.对近年来在该研究领域的成果进行了综述研究,特别是在性能非对称多核处理器架构下,异构调度技术面临的优化目标、分析模型、调度决策和算法评估等主要问题进行了分析和描述,并依次对相关技术进行了系统的总结,最后从软硬件融合的角度对今后的研究工作进行了展望.  相似文献   

6.
为了在多核处理器上充分利用多核资源以提升挖掘性能,提出了一种动态与静态任务分配机制相结合的基于多核的并行序列模式挖掘算法。该算法采用数据并行与任务并行相结合的策略,在各处理器核生成局部序列模式后,再与其他处理器核协同,以最终获得所有的全局序列模式。算法通过并行局部归约技术消除了局部序列的重复生成与计算,并可结合静态与动态任务分配机制解决处理器的负载不均衡问题。理论分析和实验都证实了该算法可有效利用多核计算平台及多核体系结构优势,具有较高的运行效率和加速比。  相似文献   

7.
Recent research has highlighted the potential benefits of single-ISA heterogeneous multicore processors over cost-equivalent homogeneous ones, and it is likely that future processors will integrate cores that have the same instruction set architecture (ISA) but offer different performance and power characteristics. To fully tap into the potential of these processors, the operating system must be aware of the hardware asymmetry when making scheduling decisions and map applications to cores in consideration of their performance characteristics. We propose a Heterogeneity-Aware Signature-Supported (HASS) scheduling algorithm that performs this mapping using per-thread architectural signatures, which are compact summaries of threads’ architectural properties. We implemented HASS in OpenSolaris, and demonstrated that it always outperforms a heterogeneity-agnostic scheduler (by as much as 12.5%) for workloads exhibiting sufficient diversity. Our evaluation also includes an extensive comparison with other heterogeneity-aware schedulers to provide a more clear understanding of the pros and cons behind HASS.  相似文献   

8.
主从式单边异构体系结构的异构多核处理器广泛应用于面向专门应用领域的计算加速,如异构多核嵌入式处理器、DSP、SoC等;高性能的该类处理器也可用于一些大规模科学和工程计算问题的处理。主从式单边异构处理器对编程模型和编译技术提出了很多挑战性问题,如编程模型的选择、编程语言的设计、编译器架构设计以及运行库的设计等。本文分析了这一类处理器结构特点和执行模型,认为功能卸载模型是最适用于这一体系结构的编程模型;并分析了面向功能卸载模型的编程语言设计关键问题,提出了编译系统的架构,讨论了相应的运行库设计问题。  相似文献   

9.
相对于对称多核处理器,非对称多核处理器具有更高的效能,将成为未来并行操作系统中的主流体系结构.对于非对称多核处理器上操作系统的并行任务调度问题,现有的研究假设所有核心频率恒定,缺乏理论分析,也没有考虑算法的效能和通用性.针对该问题,该文首先建立非线性规划模型,分析得出全面考虑并行任务同步特性、核心非对称性以及核心负载的调度原则.然后,基于调度原则提出一个集成调度算法,该算法通过集成线程调度和动态电压频率调整来提高效能,并通过参数调整机制实现了算法的通用性.提出的算法是第一个在非对称多核处理器上结合线程调度和动态电压频率调整的调度算法.实际平台上的实验表明:该算法可适用于多种环境,且效能比其他同类算法高24%~50%.  相似文献   

10.
In this paper, a programming model is presented which enables scalable parallel performance on multi-core shared memory architectures. The model has been developed for application to a wide range of numerical simulation problems. Such problems involve time stepping or iteration algorithms where synchronization of multiple threads of execution is required. It is shown that traditional approaches to parallelism including message passing and scatter-gather can be improved upon in terms of speed-up and memory management. Using spatial decomposition to create orthogonal computational tasks, a new task management algorithm called H-Dispatch is developed. This algorithm makes efficient use of memory resources by limiting the need for garbage collection and takes optimal advantage of multiple cores by employing a “hungry” pull strategy. The technique is demonstrated on a simple finite difference solver and results are compared to traditional MPI and scatter-gather approaches. The H-Dispatch approach achieves near linear speed-up with results for efficiency of 85% on a 24-core machine. It is noted that the H-Dispatch algorithm is quite general and can be applied to a wide class of computational tasks on heterogeneous architectures involving multi-core and GPGPU hardware.  相似文献   

11.
关联任务在多核处理器上并行调度所产生的通信时延,会对任务调度长度和处理器利用率造成负面影响,为了改善多核系统对关联任务的处理性能,针对关联任务在多核处理器上的调度特点,提出一种并行感知调度算法。计算各任务与终点间的最长路径值,按照该值的降序来分配任务调度次序,在分配处理器内核时兼顾关联度和任务最早可执行时间,设置最佳匹配评价函数。实验结果表明,与busHEFT和DTSV算法相比,该算法具有更短的任务调度时延、更少的通信量以及更高的处理器利用率。  相似文献   

12.
Many embedded or portable devices have large demands on running real-time applications. The designers start to adopt the multicore processors in these devices. The multi-core processors, however, cause much higher power consumption than ever before. To resolve this problem, many researchers have focused their studies on designing the energy-aware task scheduling algorithms for multicore processors. Conventional scheduling algorithms assumed that each core can operate under different voltage levels. However, they have not considered the effects of voltage transition overheads, which may defeat the benefit of task scheduling. In this paper, we aim to resolve this scheduling problem with voltage transition overhead consideration. We formalize this problem by an integer linear programming model and propose a heuristic algorithm for a runtime environment. The experimental results show that the proposed online heuristic algorithm can obtain the comparable results with the optimal scheduling derived by the offline integer linear programming approach.  相似文献   

13.
Low-latency and high-throughput processing are key requirements of data stream management systems (DSMSs). Hence, multi-core processors that provide high aggregate processing capacity are ideal matches for executing costly DSMS operators. The recently developed Cell processor is a good example of a heterogeneous multi-core architecture and provides a powerful platform for executing data stream operators with high-performance. On the down side, exploiting the full potential of a multi-core processor like Cell is often challenging, mainly due to the heterogeneous nature of the processing elements, the software managed local memory at the co-processor side, and the unconventional programming model in general. In this paper, we study the problem of scalable execution of windowed stream join operators on multi-core processors, and specifically on the Cell processor. By examining various aspects of join execution flow, we determine the right set of techniques to apply in order to minimize the sequential segments and maximize parallelism. Concretely, we show that basic windows coupled with low-overhead pointer-shifting techniques can be used to achieve efficient join window partitioning, column-oriented join window organization can be used to minimize scattered data transfers, delay-optimized double buffering can be used for effective pipelining, rate-aware batching can be used to balance join throughput and tuple delay, and finally single-instruction multiple-data (SIMD) optimized operator code can be used to exploit data parallelism. Our experimental results show that, following the design guidelines and implementation techniques outlined in this paper, windowed stream joins can achieve high scalability (linear in the number of co-processors) by making efficient use of the extensive hardware parallelism provided by the Cell processor (reaching data processing rates of ≈13 GB/s) and significantly surpass the performance obtained form conventional high-end processors (supporting a combined input stream rate of 2,000 tuples/s using 15 min windows and without dropping any tuples, resulting in ≈8.3 times higher output rate compared to an SSE implementation on dual 3.2 GHz Intel Xeon).  相似文献   

14.
李士刚  胡长军  王珏  李建江 《软件学报》2013,24(12):2782-2796
低功耗及廉价性使得异构多核在超级计算机计算资源中占有重要比例.然而,异构多核具有高带宽及松耦合一致性等特点,获得理想的存储及计算性能需要更多地考虑底层硬件细节.实现了一种针对典型的异构多核Cell BE 处理器的多级并行模型CellMLP,通过C 语言扩展编译指导语句,实现了对数据并行、任务并行以及流水并行编程模型的支持,提高了并行程序生产率.运行支持优化方面,数据并行采用SPE 并行数据传输、双缓冲等优化手段来提高数据传输带宽;任务并行使用一种新式混合任务队列以支持异步任务窃取,降低SPE 线程间竞争,提高了任务并行的可扩展性;流水并行首次使用阻塞信号传输机制实现SPE 线程间的低开销同步操作.实验对Stream,NASBenchmark 及BOTS 等应用进行了测试,结果表明,CellMLP 可对多种典型并行应用进行高效支持.与目前同类编程模型SARC 及CellSs 进行性能对比,其结果表明,CellMLP 实际数据传输带宽以及非规则应用的支持方面具有明显优势.  相似文献   

15.
Complex parallel applications can often be modeled as directed acyclic graphs of coarse-grained application tasks with dependences. These applications exhibit both task and data parallelism, and combining these two (also called mixed parallelism) has been shown to be an effective model for their execution. In this paper, we present an algorithm to compute the appropriate mix of task and data parallelism required to minimize the parallel completion time (makespan) of these applications. In other words, our algorithm determines the set of tasks that should be run concurrently and the number of processors to be allocated to each task. The processor allocation and scheduling decisions are made in an integrated manner and are based on several factors such as the structure of the task graph, the runtime estimates and scalability characteristics of the tasks, and the intertask data communication volumes. A locality-conscious scheduling strategy is used to improve intertask data reuse. Evaluation through simulations and actual executions of task graphs derived from real applications and synthetic graphs shows that our algorithm consistently generates schedules with a lower makespan as compared to Critical Path Reduction (CPR) and Critical Path and Allocation (CPA), two previously proposed scheduling algorithms. Our algorithm also produces schedules that have a lower makespan than pure task- and data-parallel schedules. For task graphs with known optimal schedules or lower bounds on the makespan, our algorithm generates schedules that are closer to the optima than other scheduling approaches.  相似文献   

16.
An algorithm has been developed to dynamically schedule heterogeneous tasks on heterogeneous processors in a distributed system. The scheduler operates in an environment with dynamically changing resources and adapts to variable system resources. It operates in a batch fashion and utilises a genetic algorithm to minimise the total execution time. We have compared our scheduler to six other schedulers, three batch-mode and three immediate-mode schedulers. Experiments show that the algorithm outperforms each of the others and can achieve near optimal efficiency, with up to 100,000 tasks being scheduled  相似文献   

17.
Computing has recently reached an inflection point with the introduction of multi-core processors. On-chip thread-level parallelism is doubling approximately every other year. Concurrency lends itself naturally to allowing a program to trade performance for power savings by regulating the number of active cores, however in several domains users are unwilling to sacrifice performance to save power. We present a prediction model for identifying energy-efficient operating points of concurrency in well-tuned multithreaded scientific applications, and a runtime system which uses live program analysis to optimize applications dynamically. We describe a dynamic, phase-aware performance prediction model that combines multivariate regression techniques with runtime analysis of data collected from hardware event counters to locate optimal operating points of concurrency. Using our model, we develop a prediction-driven, phase-aware runtime optimization scheme that throttles concurrency so that power consumption can be reduced and performance can be set at the knee of the scalability curve of each program phase. The use of prediction reduces the overhead of searching the optimization space while achieving near-optimal performance and power savings. A thorough evaluation of our approach shows a reduction in power consumption of 10.8% simultaneous with an improvement in performance of 17.9%, resulting in energy savings of 26.7%.  相似文献   

18.
随着嵌入式设备应用场景日趋复杂的变化,异构多核架构逐渐成为嵌入式处理器的主流架构.目前,多核处理器主要采用的单操作系统模式在实际应用中存在诸多局限性.为了充分发挥异构处理器的多核特性,针对异构处理器不同核部署相应的操作系统并实现多操作系统协同处理技术至关重要.本文对异构多核处理器(ARM+DSP)操作系统进行了研究,在异构多核平台上成功移植了嵌入式Linux和国产DSP实时操作系统ReWorks;为实现ReWorks与Linux操作系统协同处理,本文对核间通信的关键技术进行分析研究,并以TI公司的AM5718为例,设计了一系列多核异构通信组件.经测试,本文设计的异构通信组件实现了在ARM上对DSP核进行ReWorks操作系统和应用程序的动态加载、Linux与ReWorks核间消息收发、以及Linux与ReWorks的协同计算等功能.  相似文献   

19.
This paper studies energy efficient scheduling of periodic real-time tasks on multi-core processors with voltage islands, in which cores are partitioned into multiple blocks (termed voltage islands) and each block has its own power source to supply voltage. Cores in the same block always operate at the same voltage level, but can be adjusted by using Dynamic Voltage and Frequency Scaling (DVFS). We propose a Voltage Island Largest Capacity First (VILCF) algorithm for energy efficient scheduling of periodic real-time tasks on multi-core processors. It achieves better energy efficiency by fully utilizing the remaining capacity of an island before turning on more islands or increasing the voltage level of the current active islands. We provide detailed theoretical analysis of the approximation ratio of the proposed VILCF algorithm in terms of energy efficiency. In addition, our experimental results show that VILCF significantly outperforms the existing algorithms when there are multiple cores in a voltage island.  相似文献   

20.
Operating systems code is often developed according to principles like simplicity, low overhead, and low memory footprint. Schedulers are no exceptions. A scheduler is usually developed with flexibility in mind, and this restricts the ability to provide real-time guarantees. Moreover, even when schedulers can provide real-time guarantees, it is unlikely that these guarantees are properly quantified using theoretical analysis that carries on to the implementation. To be able to analyze the guarantees offered by operating systems’ schedulers, we developed a publicly available tool that analyzes timing properties extracted from the execution of a set of threads and computes the lower and upper bounds to the supply function offered by the execution platform, together with information about migrations and statistics on execution times. rt-muse evaluates the impact of many application and platform characteristics including the scheduling algorithm, the amount of available resources, the usage of shared resources, and the memory access overhead. Using rt-muse, we show the impact of Linux scheduling classes, shared data and application parallelism, on the delivered computing capacity. The tool provides useful insights on the runtime behavior of the applications and scheduler. In the reported experiments, rt-muse detected some issues arising with the real-time Linux scheduler: despite having available cores, Linux does not migrate SCHED_RR threads which are enqueued behind SCHED_FIFO threads with the same priority.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号