1.

Precise contention-aware performance prediction on virtualized multicore system

《Journal of Systems Architecture》2017

Multicore systems are widely deployed in both embedded and high-end computing infrastructures. However, traditional virtualization systems cannot effectively isolate shared microarchitectural resources among virtual machines (VMs) running on multicore systems. CPU- and memory-intensive VMs contending for these resources cause serious performance interference, which makes virtualization systems less efficient and VM performance less stable. In this paper, we propose a contention-aware performance prediction model for virtualized multicore systems to quantify the performance degradation of VMs. First, we identify the performance interference factors and design synthetic micro-benchmarks to obtain each VM's contention sensitivity and intensity features, which are correlated with VM performance degradation. Second, based on these contention features, we build a VM performance prediction model using machine learning techniques to quantify the precise level of performance degradation. The proposed model can be used to optimize VM performance on multicore systems. Our experimental results show that the performance prediction model achieves high accuracy, with a mean absolute error of 2.83%.
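The accuracy figure quoted in this abstract is a mean absolute error (MAE). As a minimal sketch of how such a figure is computed, the function below averages the absolute differences between measured and predicted degradation values. The function name `mean_absolute_error` and the sample numbers are assumptions made up for illustration; they are not data or code from the paper.

```python
def mean_absolute_error(measured, predicted):
    """Average absolute difference between measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


if __name__ == "__main__":
    # Hypothetical degradation percentages, for illustration only.
    measured = [10.0, 20.0]
    predicted = [12.0, 17.0]
    print(mean_absolute_error(measured, predicted))
```

When both inputs are expressed in percent, the result is in percentage points of degradation.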

2.

Using OS Observations to Improve Performance in Multicore Systems (total citations: 2; self-citations: 0; citations by others: 2)

《Micro, IEEE》2008,28(3):54-66

Today's operating systems don't adequately handle the complexities of multicore processors. Architectural features confound existing OS techniques for task scheduling, load balancing, and power management. This article shows that the OS can use data obtained from dynamic runtime observation of task behavior to ameliorate performance variability and more effectively exploit multicore processor resources. The authors' research prototypes demonstrate the utility of observation-based policy.

3.

A Survey of Research on Real-Time Multicore Embedded Systems

Chen Gang, Guan Nan, Lü Mingsong, Wang Yi 《Journal of Software (软件学报)》2018,29(7):2152-2176

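As computer systems become ever more tightly coupled with the physical world, real-time systems must take on increasingly complex computational tasks. The rise of multicore processors makes it possible to satisfy both real-time constraints and high-performance requirements at once, and real-time embedded systems based on multicore processors have become a research hotspot in recent years. This paper surveys existing research on real-time multicore embedded systems. It introduces the key design problems of such systems; analyzes and summarizes existing work from several perspectives, including interference on and management of shared multicore resources, multicore real-time scheduling, parallelization of real-time programs on multicore, multicore virtualization, and multicore energy management and optimization; and outlines directions for further research on real-time multicore systems.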

4.

A Virtual CPU Scheduling Model Based on Task Classification

Wu Jin, Zhu Zhiqiang, Sun Lei, Guo Songhui 《Application Research of Computers (计算机应用研究)》2020,37(7):2087-2092

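To bridge the semantic gap and improve I/O performance, virtual CPUs (vCPUs) running different types of workloads need different scheduling policies, so vCPU scheduling algorithms urgently need optimization. This paper proposes STC (virtual CPU scheduler based on task classification), a vCPU scheduling model built on the KVM virtualization platform. STC divides vCPUs and physical CPUs into two types each: short vCPUs and long vCPUs, and short CPUs and long CPUs. Each type of vCPU is assigned to the corresponding type of physical CPU for execution. In addition, drawing on machine learning theory, STC builds a classifier that extracts task behavior features to divide tasks into two classes: I/O-intensive tasks are assigned to short vCPUs, and compute-intensive tasks to long vCPUs. STC improves I/O response speed while preserving computing performance. Experimental results show that, compared with the system's default CFS, STC reduces network latency by 18% and raises network throughput by 17% to 25%, while ensuring fair resource sharing across the whole system.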

5.

Software Standards for the Multicore Era

Jim Holt, Anant Agarwal, Sven Brehmer, Max Domeika, Patrick Griffin, Frank Schirrmeister 《Micro, IEEE》2009,29(3):40-51

Systems architects commonly use multiple cores to improve system performance. Unfortunately, multicore hardware is evolving faster than software technologies. New multicore software standards are necessary in light of the new challenges and capabilities that embedded multicore systems provide. The newly released Multicore Communications API standard targets small-footprint, highly efficient intercore and interchip communications.

6.

Design and implementation of process-aware predictive scheduling scheme for virtual machine

Xia Xie Wenzhi Cao Hai Jin Xijiang Ke Shuwen Luo 《The Journal of supercomputing》2014,70(3):1577-1587

Although virtualization technology is widely applied in cloud computing environments, virtual machines often suffer from degraded I/O performance because it is difficult for the VM to obtain process information from the upper layers, so different processes from the same VM are not distinguished. To fill this gap, this paper presents a process-aware disk predictive scheduling algorithm, in which the VM manager indirectly learns and utilizes process information. Process awareness is based on the relationship between a process and its address space; as a consequence, I/O requests can be distinguished. Moreover, the process-aware predictive scheduling scheme is implemented and tested. The experimental results show that the process-aware predictive scheduling algorithm is feasible and that disk I/O speed can be improved significantly.

7.

A Flexible and Efficient Virtual CPU Scheduling Algorithm

Liu Kenan, Tong Wei, Feng Dan, Liu Jingning, Zhang Ju 《Journal of Software (软件学报)》2017,28(2):398-410

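Virtualization is now widely used in data centers, but mainstream virtual CPU scheduling policies do not guarantee I/O performance. In particular, when virtual machines running latency-sensitive workloads compete for CPU resources with virtual machines running compute-intensive workloads, their performance drops significantly. To address this problem, this paper proposes a flexible and efficient virtual CPU scheduling algorithm (FLMS). FLMS reduces virtual machine response latency through techniques such as virtual machine classification, virtual CPU binding, and multiple classes of time slices. It also redesigns the load-balancing policy for multiprocessor architectures and optimizes virtual CPU migration. FLMS applies to today's mainstream virtualization schemes. Under software virtualization, it reduces latency by 30% and improves bandwidth by 10% compared with the latest optimized scheme; on systems using hardware-assisted virtualization, FLMS achieves I/O performance close to that of a native system while preserving fairness across the whole system.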

8.

Virtual Machine Introspection and Memory Security Monitoring Based on a Lightweight Operating System

Ma Lele, Yue Xiaomeng, Wang Yuqing, Yang Qiusong 《Journal of Computer Applications (计算机应用)》2015,35(6):1555-1559

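Using virtual machine introspection in a traditional privileged virtual machine to monitor the memory security of other virtual machines in real time makes it hard to isolate the security module from the rest of the system and slows down the overall performance of the virtualization platform. To address this problem, this paper proposes a security architecture that implements virtual machine introspection on a lightweight operating system, together with a memory security monitoring scheme based on memory integrity measurement. Performing real-time memory inspection and measurement in a lightweight guest reduces the attack surface of the security module and lowers the impact on the platform's overall performance. Non-intrusive memory measurement and a custom authorization policy for the virtualization platform strengthen the isolation of the security module. A prototype of the virtual machine introspection and memory inspection system was implemented on Mini-OS, the small operating system in Xen. Evaluation shows that the scheme incurs over 92% less performance overhead than equivalent functionality implemented in a privileged virtual machine, effectively improving the efficiency of virtual machine introspection and real-time measurement.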

9.

Effects of dynamic isolation for full virtualized RTOS and GPOS guests

《Future Generation Computer Systems》2017

Industrial systems currently include not only control processing with a real-time operating system (RTOS) but also information processing with a general-purpose operating system (GPOS). Multicore-based virtualization is an attractive option for providing a consolidated environment when a GPOS and an RTOS are put in service on a single hardware platform. Research on this technology has predominantly focused on the schedulability of RTOS virtual machines (VMs) on completely dedicated physical CPUs (pCPUs) but has rarely considered parallelism or the throughput of the GPOS. However, it is also important that the multicore-based hypervisor adaptively selects a pCPU assignment policy to manage resources efficiently in modern industrial systems. In this paper, we report our study on the effects of dynamic isolation when two mixed-criticality systems are working on one platform. Based on our investigation of mutual interference between RTOS VMs and GPOS VMs, we found explicit effects of dynamic isolation triggered by special events. While maintaining low scheduling latency for RTOS VMs, a hypervisor should manage pCPU assignment with event-driven and threshold-based strategies to improve the throughput of GPOS VMs. Furthermore, we deal with implicit negative effects of dynamic isolation caused by synchronization inside a GPOS VM, and then propose a process of urgent boosting with dynamic isolation. All our methods are implemented in a real hypervisor, KVM. In an experimental evaluation with benchmarks and an automotive digital cluster application, we found that the proposed dynamic isolation guarantees soft real-time operation for RTOS tasks while improving the throughput of GPOS tasks on a virtualized multicore system.

10.

HSCS: a hybrid shared cache scheduling scheme for multiprogrammed workloads

Jingyu ZHANG Chentao WU Dingyu YANG Yuanyi CHEN Xiaodong MENG Liting XU Minyi GUO 《Frontiers of Computer Science》2018,12(6):1090-1104

The traditional dynamic random-access memory (DRAM) storage medium can be integrated on chips via modern emerging 3D-stacking technology to architect a DRAM shared cache in multicore systems. Compared with static random-access memory (SRAM), DRAM is larger but slower. In the existing research, a lot of work has been devoted to improving the workload performance using SRAM and stacked DRAM together in shared cache systems, ranging from SRAM structure improvement to optimizing cache tags and data access. However, little attention has been paid to designing a shared cache scheduling scheme for multiprogrammed workloads with different memory footprints in multicore systems. Motivated by this, we propose a hybrid shared cache scheduling scheme that allows a multicore system to utilize SRAM and 3D-stacked DRAM efficiently, thus achieving better workload performance. This scheduling scheme employs (1) a cache monitor, which is used to collect cache statistics; (2) a cache evaluator, which is used to evaluate the cache information during the process of programs being executed; and (3) a cache switcher, which is used to self-adaptively choose SRAM or DRAM shared cache modules. A cache data migration policy is naturally developed to guarantee that the scheduling scheme works correctly. Extensive experiments are conducted to evaluate the workload performance of our proposed scheme. The experimental results showed that our method can improve the multiprogrammed workload performance by up to 25% compared with state-of-the-art methods (including conventional and DRAM cache systems).

11.

Performance‐based parallel loop self‐scheduling using hybrid OpenMP and MPI programming on multicore SMP clusters

Chao‐Tung Yang Chao‐Chin Wu Jen‐Hsiang Chang 《Concurrency and Computation》2011,23(8):721-744

Parallel loop self‐scheduling on parallel and distributed systems has been a critical problem, and it is becoming more difficult to deal with in the emerging heterogeneous cluster computing environments. In the past, some self‐scheduling schemes have been proposed as applicable to heterogeneous cluster computing environments. In recent years, multicore computers have been widely included in cluster systems. However, previous research into parallel loop self‐scheduling did not consider certain aspects of multicore computers; for example, it is more appropriate for shared‐memory multiprocessors to adopt Open Multi‐Processing (OpenMP) for parallel programming. In this paper, we propose a performance‐based approach using hybrid OpenMP and MPI parallel programming, which partitions loop iterations according to the performance weighting of multicore nodes in a cluster. Because iterations assigned to one MPI process are processed in parallel by OpenMP threads run by the processor cores in the same computational node, the number of loop iterations allocated to one computational node at each scheduling step depends on the number of processor cores in that node. Experimental results show that the proposed approach performs better than previous schemes. Copyright © 2010 John Wiley & Sons, Ltd.
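The idea of partitioning loop iterations by node performance weighting can be illustrated with a simple proportional split. The sketch below is not the authors' self-scheduling scheme, which hands out chunks step by step; it only shows a one-shot weighted allocation. The function name `partition_iterations` and the choice of weights (for example, core count times a per-core performance score) are assumptions for illustration.

```python
from fractions import Fraction


def partition_iterations(total, weights):
    """Split `total` loop iterations across nodes in proportion to `weights`."""
    if total < 0:
        raise ValueError("total must be non-negative")
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive numbers")

    # Exact rational arithmetic avoids floating-point rounding surprises.
    exact_weights = [Fraction(w) for w in weights]
    weight_sum = sum(exact_weights)
    ideal = [total * w / weight_sum for w in exact_weights]

    # Start from the floor of each ideal share (values are non-negative).
    shares = [int(x) for x in ideal]

    # Give leftover iterations to the nodes with the largest fractional part.
    leftover = total - sum(shares)
    order = sorted(range(len(ideal)), key=lambda i: ideal[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


if __name__ == "__main__":
    # Hypothetical weights: two 4-core nodes and one 8-core node.
    print(partition_iterations(100, [4, 4, 8]))
```

In a hybrid setup of the kind the abstract describes, each share would go to one MPI process per node, and the threads on that node would divide the share among its cores.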

12.

Toward scalable Web systems on multicore clusters: making use of virtual machines

Xuanhua Shi Hai Jin Hongbo Jiang Xiaodong Pan Dachuan Huang Bo Yu 《The Journal of supercomputing》2012,61(1):27-45

Limited by existing design patterns, much existing software does not yet take full advantage of multicore processing power, which leads to low hardware utilization and can even create a bottleneck for the whole system. To address this problem, we propose a VM-based Web system on multicore clusters. The VM-based Web system is scheduled by Linux Virtual Server (LVS), and we implement the web server with Tomcat. We also develop VNIX, a VM management toolkit, to facilitate managing VMs on clusters and improve the use of multicore CPU power. To reduce resource contention among VMs, we propose deploying LVS schedulers in a distributed manner on different physical nodes. To evaluate our approach, we conduct extensive experiments comparing the VM-based Web system with a classical physical machine-based Web system. Our experimental results demonstrate that the proposed VM-based Web system can improve throughput by up to three times on the same multicore clusters, with an error rate at the server side as low as 20% of that of classic systems.

13.

Intel virtualization technology (total citations: 3; self-citations: 0; citations by others: 3)

Uhlig R. Neiger G. Rodgers D. Santoni A.L. Martins F.C.M. Anderson A.V. Bennett S.M. Kagi A. Leung F.H. Smith L. 《Computer》2005,38(5):48-56

A virtualized system includes a new layer of software, the virtual machine monitor (VMM). The VMM's principal role is to arbitrate accesses to the underlying physical host platform's resources so that multiple operating systems (which are guests of the VMM) can share them. The VMM presents to each guest OS a set of virtual platform interfaces that constitute a virtual machine (VM). Once confined to specialized, proprietary, high-end server and mainframe systems, virtualization is now becoming more broadly available and is supported in off-the-shelf systems based on Intel architecture (IA) hardware. This development is due in part to the steady performance improvements of IA-based systems, which mitigate traditional virtualization performance overheads. Intel virtualization technology provides hardware support for processor virtualization, enabling simplifications of virtual machine monitor software. Resulting VMMs can support a wider range of legacy and future operating systems while maintaining high performance.

14.

Memory aware load balance strategy on a parallel branch‐and‐bound application

Juliana M.N. Silva Cristina Boeres Lúcia M.A. Drummond Artur A. Pessoa 《Concurrency and Computation》2015,27(5):1122-1144

The latest trends in high performance computing systems show an increasing demand for using large-scale multicore systems efficiently so that highly compute‐intensive applications can be executed reasonably well. However, the exploitation of the degree of parallelism available at each multicore component can be limited by poor utilization of the memory hierarchy. The multicore architecture combines distinct features already observed in shared memory and distributed environments. One example is that subsets of cores can share different subsets of memory. To achieve high performance, it is imperative that an application be carefully allocated to the available cores, based on a scheduling specification that considers not only processor characteristics but also memory contention. This paper proposes a multicore cluster representation that captures relevant performance characteristics of multicore systems, such as the influence of memory hierarchy and contention on application performance. Improved performance was achieved by a branch‐and‐bound application applied to the partitioning sets problem that incorporated a memory aware load balancing strategy based on the proposed multicore cluster representation. An in‐depth analysis of this application's execution showed its applicability to modern systems. Copyright © 2014 John Wiley & Sons, Ltd.

15.

Fuzzy logic based energy and throughput aware design space exploration for MPSoCs

《Microprocessors and Microsystems》2016

Multicore architectures were introduced to mitigate the growth of power dissipation with clock frequency. Deeper pipelines, speculative threading, and similar techniques for single-core systems did not bring enough performance gain to justify their power overhead. For multicore architectures, however, scaling performance with the number of cores has always been a challenge. Amdahl's law shows that the theoretical maximum speedup of a multicore architecture falls well short of scaling linearly with the number of cores. When only a small portion of the code runs in parallel, adding more cores for an application may simply increase power dissipation without bringing any performance advantage. There is therefore a need for an adaptive multicore architecture that can be tailored to the application in use for higher energy efficiency. This paper presents a fuzzy logic based design space exploration technique that optimizes a multicore architecture according to workload requirements in order to achieve an optimum balance between the throughput and energy of the system.
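The Amdahl's law argument in this abstract can be made concrete with a short calculation. For a parallel fraction p of the work and n cores, the theoretical speedup is 1 / ((1 - p) + p / n). The sketch below evaluates that formula; the function name `amdahl_speedup` and the sample values are assumptions for illustration and do not come from the paper.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Theoretical speedup under Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload in which half of the code can run in parallel.
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(0.5, n), 3))
```

With half of the code parallel, the speedup stays below 2 no matter how many cores are added, which is the situation the abstract describes: extra cores add power dissipation without a matching performance gain.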

16.

Research on Xen-Based I/O Paravirtualized Drivers (total citations: 3; self-citations: 2; citations by others: 1)

Hu Lengfei, Li Xiaoyong 《Computer Engineering (计算机工程)》2009,35(23):258-259

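To address the problem that guest virtual machines under full virtualization cannot "perceive" the virtual machine monitor, this paper studies Xen-based I/O paravirtualized drivers. Experiments show that paravirtualized drivers remove the limitation imposed by the "black box" nature of the virtual machine monitor under full virtualization and allow close cooperation with the monitor, thereby improving I/O performance. Paravirtualized drivers were added to a fully virtualized Xen environment, and comparative tests verified that the drivers substantially improve network performance.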

17.

Energy-efficiency enhanced virtual machine scheduling policy for mixed workloads in cloud environments

Peng Xiao Zhigang Hu Dongbo Liu Xizheng Zhang Xilong Qu 《Computers & Electrical Engineering》2014

Virtualization technology is an effective approach to improving energy efficiency in cloud platforms; however, it also introduces many energy-efficiency losses, especially when I/O virtualization is involved. In this paper, we present an energy-efficiency enhanced virtual machine (VM) scheduling policy, namely Share-Reclaiming with Collective I/O (SRC-I/O), aimed at reducing the energy-efficiency losses caused by I/O virtualization. The proposed SRC-I/O scheduler allows VMs to reclaim extra CPU shares under certain conditions so as to increase CPU utilization. Meanwhile, it separates I/O-intensive VMs from CPU-intensive ones and schedules them in a collective manner, so as to reduce the context-switching cost when scheduling mixed workloads. Extensive experiments are conducted on various platforms to investigate the performance of the proposed scheduler. The results indicate that when the system runs mixed workloads, the SRC-I/O scheduler outperforms many existing VM schedulers in terms of energy efficiency and I/O responsiveness.

18.

Hardware transactional memory: A high performance parallel programming model

Chen Fu Dongxin Wen Xiaoqun Wang Xiaozong Yang 《Journal of Systems Architecture》2010,56(8):384-391

Transactional memory in multicore processors has been a major research area over the past several years. Many transactional memory systems have been proposed to solve the synchronization problem of multicore processors. Hardware transactional memory is one of the critical methods for speeding up communication in multicore environments. In this paper, we give a review of the current hardware transactional memory systems for multicore processors. We take a top-down approach to characterizing and classifying various hardware transactional design issues and present a taxonomy of hardware transactional memory systems that consists of five fundamental design issues: version management, conflict detection, contention management, virtualization, and nesting. Finally, we discuss an active research challenge: the relationship between transactional memory and input/output operations and system calls.

19.

A Heterogeneity-Aware Multicore Scheduling Method Based on Machine Learning

An Xin, Kang An, Xia Jinwei, Li Jianhua, Chen Tian, Ren Fuji 《Journal of Computer Applications (计算机应用)》2020,40(10):3081-3087

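Heterogeneous multicore processors have become the mainstream solution for modern embedded systems, and a good online mapping or scheduling method is crucial to fully exploiting their high performance and low power consumption. For the problem of dynamically mapping and scheduling applications on heterogeneous multicore systems, this paper proposes a mapping and scheduling solution that uses machine learning to evaluate program performance quickly and accurately, together with a technique that detects phase changes in program behavior to determine when to remap, so as to maximize system performance. On one hand, by carefully selecting static and dynamic features of the processing cores and of running programs, the solution captures the differences in computing capability and workload behavior introduced by heterogeneous processing, which allows a more accurate prediction model to be built. On the other hand, by introducing phase detection, it minimizes the number of online mapping computations, yielding a more efficient scheduling scheme. Finally, the proposed scheduling scheme is validated on the SPLASH-2 benchmark suite. Experimental results show that, compared with Linux's default Completely Fair Scheduler (CFS), the proposed method improves system computing performance by 52% and CPU resource utilization by 9.4%. This indicates that the method performs well on both metrics and can effectively improve dynamic application mapping and scheduling on heterogeneous multicore systems.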

20.

Energy Efficient Scheduling of Real-Time Tasks on Multicore Processors

Euiseong Seo Jinkyu Jeong Seonyeong Park Joonwon Lee 《Parallel and Distributed Systems, IEEE Transactions on》2008,19(11):1540-1552

Multicore processors deliver higher throughput at lower power consumption than unicore processors. In the near future, they will thus be widely used in mobile real-time systems. There has been much research on energy-efficient scheduling of real-time tasks using dynamic voltage scaling (DVS). These approaches must be modified for multicore processors, however, since normally all the cores in a chip must run at the same performance level. Thus, blindly adopting existing DVS algorithms that do not consider this restriction will waste energy. This article suggests a Dynamic Repartitioning algorithm based on existing partitioning approaches for multiprocessor systems. The algorithm dynamically balances the task loads of multiple cores to optimize power consumption during execution. We also suggest a Dynamic Core Scaling algorithm, which adjusts the number of active cores to reduce leakage power consumption under low load conditions. Simulation results show that Dynamic Repartitioning can produce energy savings of about 8% even with the best energy-efficient partitioning algorithm. The results also show that Dynamic Core Scaling can reduce energy consumption by about 26% under low load conditions.