1.

Using Bytecode Instruction Counting as Portable CPU Consumption Metric

Walter Binder Jarle Hulaas 《Electronic Notes in Theoretical Computer Science》2006,153(2):57

Accounting for the CPU consumption of applications is crucial for software development to detect and remove performance bottlenecks (profiling) and to evaluate the performance of algorithms (benchmarking). Moreover, extensible middleware may exploit resource consumption information in order to detect a resource overuse of client components (detection of denial-of-service attacks) or to charge clients for the resource consumption of their deployed components. The Java Virtual Machine (JVM) is a predominant target platform for application and middleware developers, but it currently lacks standard mechanisms for resource management. In this paper we present a tool, the Java Resource Accounting Framework, Second Edition (J-RAF2), which enables precise CPU management on standard Java runtime environments. J-RAF2 employs a platform-independent CPU consumption metric, the number of executed JVM bytecode instructions. We explain the advantages of this approach to CPU management and present five case studies that show the benefits in different settings.
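
The idea of using an executed-instruction count as a CPU metric can be illustrated with a small sketch. This is not J-RAF2 code or its API: the class `CpuAccount`, the function `sum_to`, and the per-block instruction counts are all hypothetical, chosen only to show how a counter updated once per basic block yields the same consumption figure on any platform.

```python
from dataclasses import dataclass


@dataclass
class CpuAccount:
    """Hypothetical account that tracks an instruction budget (not the J-RAF2 API)."""
    limit: int
    consumed: int = 0

    def charge(self, n: int) -> None:
        # Add the cost of one basic block and enforce the budget.
        self.consumed += n
        if self.consumed > self.limit:
            raise RuntimeError("instruction budget exceeded")


def sum_to(n: int, account: CpuAccount) -> int:
    """Sum 1..n, written as if a rewriting tool had inserted the charge() calls.

    The block sizes (2, 4, 1) are illustrative stand-ins for the number of
    instructions in each basic block, not real JVM bytecode counts.
    """
    account.charge(2)          # entry block
    total = 0
    i = 1
    while i <= n:
        account.charge(4)      # loop body block, charged on every iteration
        total += i
        i += 1
    account.charge(1)          # exit block
    return total


if __name__ == "__main__":
    acct = CpuAccount(limit=1000)
    print(sum_to(10, acct), acct.consumed)  # 55 43  (2 + 4*10 + 1)
```

Because the count depends only on the path taken through the code, the same input produces the same figure on every machine, which is what makes such a metric usable for reproducible profiling and for budget enforcement.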

2.

An Adaptive Optimization Method Combining Dynamic and Static Techniques Based on the Java Virtual Machine

张海军郑艳叶俊白书敬《计算机工程与科学》2019,41(6):981-986

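Dynamic languages can use information gathered at run time to guide a variety of program optimizations. Existing Java virtual machines, however, do not make effective use of the information collected during a run: they discard it when the run ends, and the next execution must monitor, collect, and optimize the required information all over again. Building on the HotSpot virtual machine, this paper proposes an adaptive optimization method that combines dynamic and static techniques. The best parameters or information found by iterative search over the optimization targets during a run are saved to a resource repository; the best parameters or options for the current program can then be learned from the repository, so that data accumulated during execution is used effectively. Resource analysis is static and offline and adds no overhead to the running application. During iterative learning, redundant instances are kept out of the repository and noisy instances are removed from it, which preserves the accuracy and efficiency of the learning process. Experiments show that the framework is of practical use in guiding adaptive optimization of the Java virtual machine on different platforms.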

3.

Flexible resource monitoring of Java programs

《Journal of Systems and Software》2014

Monitoring resource consumption is fundamental in software engineering, e.g., in validation of quality requirements, performance engineering, or adaptive software systems. However, resource monitoring does not come for free as it typically leads to overhead in the observed program. Minimizing this overhead and increasing the reliability of the monitored data is a major goal in realizing resource monitoring tools. Typically, this is achieved by limiting capabilities, e.g., supported resources, granularity of the monitoring focus, or runtime access to results. Thus, in practice several approaches must often be combined to obtain relevant information. We describe SPASS-meter, a novel resource monitoring approach for Java and Android apps, which combines these conflicting capabilities with low overhead. SPASS-meter supports a large set of resources, flexible configuration of the monitoring scope even for user-defined semantic units (components), runtime analysis and online access to monitoring results in a platform-independent way. We discuss the concepts of SPASS-meter, its architecture, realization and validation, the latter in terms of case studies and an overhead analysis based on performance experiments with SPASS-meter, OpenCore and Kieker. SPASS-meter provides a detailed view of the runtime resource consumption at reasonable overhead of less than 3% processing power and 0.5% memory consumption in our experiments.

4.

Program transformations for light-weight CPU accounting and control in the Java virtual machine

Jarle Hulaas Walter Binder 《Higher-Order and Symbolic Computation》2008,21(1-2):119-146

5.

Advanced array optimizations for high performance functional languages

Cann D.C. Evripidou P. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(3):229-239

We discuss and evaluate three optimizations for reducing memory management overhead and data copying costs in SISAL 1.2 programs that build arrays. The first, called framework preconstruction, eliminates superfluous allocate-deallocate sequences in cyclic computations. The second, called aggregate storage subsumption, reduces the management overhead for compound array components. The third, called predictive storage preallocation, eliminates superfluous data copying in filtered array constructions and simplifies their parallelization. We have added all three optimizations to the Optimizing SISAL Compiler with rewarding improvements in SISAL program performance on vector-parallel machines such as those built by Cray Computer Corporation, Convex, and Cray Research.

6.

Operating System Models in a Concurrent Pascal Environment: Complexity and Performance Considerations

《IEEE transactions on pattern analysis and machine intelligence》1985,(1):136-141

Empirical observations of computer operating systems have shown that operating systems are designed with one of two object-oriented strategies: a process-oriented or a monitor-oriented approach. This paper compares the two design approaches in a Concurrent Pascal environment. Resource manager programs that are implemented in conformity with each model are evaluated using software complexity measures and program performance measures. The average complexity of resource manager processes is 94 percent larger than the average complexity of resource manager monitors. The runtime synchronization overhead of the process model program is two to eight times higher than that of its counterpart.

7.

Enabling dynamic file I/O path selection at runtime for parallel file system

Xiuqiao Li Limin Xiao Meikang Qiu Bin Dong Li Ruan 《The Journal of supercomputing》2014,68(2):996-1021

Parallel file systems serve a growing number of applications from various fields. These applications have different I/O workload characteristics and therefore diverse requirements for accessing storage resources. However, parallel file systems often adopt a “one-size-fits-all” solution, which fails to meet specific application needs and hinders the full exploitation of potential performance. This paper presents a framework to enable dynamic file I/O path selection with fine granularity at runtime. The framework adopts a file handle-rich scheme to allow file systems to choose corresponding optimizations to serve I/O requests. Consistency control algorithms are proposed to ensure data consistency while changing optimizations at runtime. One case study on our prototype shows that choosing proper optimizations can improve the I/O performance for small files and large files by up to 40% and 64.4%, respectively. Another case study shows that the data prefetch performance for real-world application traces can be improved by up to 193% by selecting correct prefetch patterns. Simulations in a large-scale environment also show that our method is scalable and that both the memory consumption and the consistency control overhead can be negligible.

8.

Prediction-Based Power-Performance Adaptation of Multithreaded Scientific Codes

《Parallel and Distributed Systems, IEEE Transactions on》2008,19(10):1396-1410

Computing has recently reached an inflection point with the introduction of multi-core processors. On-chip thread-level parallelism is doubling approximately every other year. Concurrency lends itself naturally to allowing a program to trade performance for power savings by regulating the number of active cores; however, in several domains users are unwilling to sacrifice performance to save power. We present a prediction model for identifying energy-efficient operating points of concurrency in well-tuned multithreaded scientific applications, and a runtime system which uses live program analysis to optimize applications dynamically. We describe a dynamic, phase-aware performance prediction model that combines multivariate regression techniques with runtime analysis of data collected from hardware event counters to locate optimal operating points of concurrency. Using our model, we develop a prediction-driven, phase-aware runtime optimization scheme that throttles concurrency so that power consumption can be reduced and performance can be set at the knee of the scalability curve of each program phase. The use of prediction reduces the overhead of searching the optimization space while achieving near-optimal performance and power savings. A thorough evaluation of our approach shows a reduction in power consumption of 10.8% simultaneous with an improvement in performance of 17.9%, resulting in energy savings of 26.7%.
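
The three figures at the end of this abstract can be related with a short calculation. The sketch below is an assumption-laden check, not the authors' evaluation method: it treats energy as average power multiplied by execution time, and it reads "an improvement in performance of 17.9%" as a 17.9% reduction in execution time. The function name `energy_saving` is hypothetical.

```python
def energy_saving(power_reduction: float, time_reduction: float) -> float:
    """Fractional energy saving, assuming energy = average power * execution time.

    Both arguments are fractions of the baseline (e.g. 0.108 for 10.8%).
    """
    remaining_power = 1.0 - power_reduction
    remaining_time = 1.0 - time_reduction
    return 1.0 - remaining_power * remaining_time


if __name__ == "__main__":
    saving = energy_saving(0.108, 0.179)
    # Under the stated assumptions this is roughly 0.268, close to the
    # 26.7% energy saving reported in the abstract.
    print(round(saving, 4))
```

Under these assumptions the power and time reductions compound multiplicatively, which is why the energy saving is larger than either figure alone but smaller than their sum.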

9.

Hardware-Software Collaborative Techniques for Runtime Profiling and Phase Transition Detection

Youfeng Wu Yong-Fong Lee 《计算机科学技术学报》2005,20(5):665-675

Dynamic optimization relies on runtime profile information to improve the performance of program execution. Traditional profiling techniques incur significant overhead and are not suitable for dynamic optimization. In this paper, a new profiling technique is proposed that combines the strengths of both software and hardware to achieve near-zero overhead profiling. The compiler passes profiling requests as a few bits of information in branch instructions to the hardware, and the processor executes profiling operations asynchronously in available free slots or on dedicated hardware. The compiler instrumentation of this technique is implemented using an Itanium research compiler. The results show that accurate block profiling incurs very little overhead to the user program in terms of the program scheduling cycles. For example, the average overhead is 0.6% for the SPECint95 benchmarks. The hardware support required for the new profiling is practical. The technique is extended to collect edge profiles for continuous phase transition detection. It is believed that the hardware-software collaborative scheme will enable many profile-driven dynamic optimizations for EPIC processors such as the Itanium processors.

10.

A novel index system describing program runtime characteristics for workload consolidation

Lin WANG Depei QIAN Rui WANG Zhongzhi LUAN Hailong YANG Huaxiang ZHANG 《Frontiers of Computer Science》2019,13(3):489

Workload consolidation is a common method to improve the resource utilization in clusters or data centers. In order to achieve efficient workload consolidation, the runtime characteristics of a program should be taken into consideration in scheduling. In this paper, we propose a novel index system for efficiently describing the program runtime characteristics. With the help of this index system, programs can be classified by the following runtime characteristics: 1) dependence on multi-dimensional resources including CPU, disk I/O, memory and network I/O; and 2) impact and vulnerability to resource sharing embodied by resource usage and resource sensitivity. In order to verify the effectiveness of this novel index system in workload consolidation, a scheduling strategy, Sche-index, using the new index system for workload consolidation is proposed. Experimental results show that, compared with the traditional least-loaded scheduling strategy, Sche-index can improve both program performance and system resource utilization significantly.

11.

A user mode CPU–GPU scheduling framework for hybrid workloads

《Future Generation Computer Systems》2016

Cloud platforms composed of multi-core CPUs and many-core Graphics Processing Units (GPUs) have become powerful platforms to host incremental CPU–GPU workloads. In this paper, we study the problem of optimizing the CPU resource management while keeping the quality of service (QoS) of games. To this end, we propose vHybrid, a lightweight user mode runtime framework, in which we integrate a scheduling algorithm for GPU and two algorithms for CPU to efficiently utilize CPU resources with the control accuracy of QoS. vHybrid can maintain the desired QoS with low CPU utilization, while being able to guarantee better QoS performance with little overhead. Our evaluations show that vHybrid saves 37.29% of CPU utilization with satisfactory QoS for hybrid workloads, and reduces QoS fluctuations by three orders of magnitude, without any impact on GPU workloads.

12.

Portable virtual cycle accounting for large-scale distributed cycle sharing systems

《Parallel Computing》2007,33(4-5):314-327

CPU cycle sharing among distributed heterogeneous computers is the key function in large-scale volunteer computing and desktop grid applications. One important problem in large-scale distributed cycle sharing systems is how to account for the amount of computation work performed by a CPU cycle provider, in a uniform and portable fashion across heterogeneous hardware and operating system platforms. Such an accounting mechanism is especially desirable when CPU resources are traded, since a lack of uniform workload accounting will hinder the enforcement of market-driven CPU pricing/trading policies in distributed cycle sharing systems. The Java Virtual Machine (JVM) has proved to be a good match for distributed cycle sharing because of its abilities to run applications on a wide variety of platforms without modification (portability) and to host untrusted applications (safety). In this paper, we present the design, implementation, and evaluation of an efficient, application-transparent virtual cycle accounting scheme integrated into the JVM. Our scheme achieves portable workload accounting across heterogeneous computing platforms by accounting for JVM virtual instructions instead of real processor cycles. Unlike the existing JVM CPU accounting mechanisms that involve bytecode rewriting, our scheme is transparent to applications and does not require visible changes to application and library code interfaces, which would break applications that use the Reflection API. Moreover, our scheme is efficient via the use of processor registers for accounting. Our experimental results demonstrate both high accounting accuracy and low runtime overhead of virtual cycle accounting.

13.

A Fine-Grained Runtime Power/Performance Optimization Method for Processors with Adaptive Pipeline Depth

姚骏 Shinobu Miwa Hajime Shimada Shinji Tomita 《计算机科学技术学报》2011,26(2):292-301

Recently, a method known as pipeline stage unification (PSU) has been proposed to alleviate the increasing energy consumption problem in modern microprocessors. PSU achieves a high energy efficiency by employing a changeable pipeline depth, and its working scheme is eligible for a fine control method. In this paper, we propose a dynamic method to study fine-grained program interval behaviors based on some easy-to-get runtime processor metrics. Using this method to determine the proper PSU configurations during the program execution, we are able to achieve an average 13.5% energy-delay-product (EDP) reduction for SPEC CPU2000 integer benchmarks, compared to the baseline processor. This value is only 0.14% larger than that of the theoretically ideal control. Our hardware synthesis result indicates that the proposed method can largely decrease the hardware overhead in both area and delay costs, as compared to a previous program study method which is based on working set signatures.

14.

A Survey of Resource Scheduling in Cloud Computing

林伟伟齐德呈《计算机科学》2012,39(10):1-6

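Resource scheduling is a major research direction in cloud computing. This survey first investigates and analyzes the state of research on cloud resource scheduling. It then focuses on resource scheduling methods that aim to reduce the energy consumption of cloud data centers, resource management methods that aim to improve system resource utilization, and economics-based cloud resource management models; it presents a minimum-energy cloud resource scheduling model and a minimum-server-count cloud resource scheduling model, and analyzes and compares existing cloud resource scheduling methods in depth. Finally, it identifies important future research directions for cloud resource management: prediction-based resource scheduling, scheduling that trades off energy consumption against performance, resource management policies and mechanisms for different application workloads, integrated allocation of computing capacity (CPU, memory) and network bandwidth, and multi-objective resource scheduling, as a useful reference for cloud computing research.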

15.

Platform‐independent profiling in a virtual execution environment

Walter Binder Jarle Hulaas Philippe Moret Alex Villazón 《Software》2009,39(1):47-79

Virtual execution environments, such as the Java virtual machine, promote platform-independent software development. However, when it comes to analyzing algorithm complexity and performance bottlenecks, available tools focus on platform-specific metrics, such as the CPU time consumption on a particular system. Other drawbacks of many prevailing profiling tools are high overhead, significant measurement perturbation, as well as reduced portability of profiling tools, which are often implemented in platform-dependent native code. This article presents a novel profiling approach, which is entirely based on program transformation techniques, in order to build a profiling data structure that provides calling-context-sensitive program execution statistics. We explore the use of platform-independent profiling metrics in order to make the instrumentation entirely portable and to generate reproducible profiles. We implemented these ideas within a Java-based profiling tool called JP. A significant novelty is that this tool achieves complete bytecode coverage by statically instrumenting the core runtime libraries and dynamically instrumenting the rest of the code. JP provides a small and flexible API to write customized profiling agents in pure Java, which are periodically activated to process the collected profiling information. Performance measurements point out that, despite the presence of dynamic instrumentation, JP causes significantly less overhead than a prevailing tool for the profiling of Java code. Copyright © 2008 John Wiley & Sons, Ltd.

16.

Execution profiling blueprints

Alexandre Bergel Felipe Bañados Romain Robbes Walter Binder 《Software》2012,42(9):1165-1192

Although traditional approaches to code profiling help locate performance bottlenecks, they offer only limited support for removing these bottlenecks. The main reason is the lack of detailed visual runtime information to identify and eliminate computation redundancy. We provide three profiling blueprints that help identify and remove performance bottlenecks. The structural distribution blueprint graphically represents the CPU consumption share for each method and class of an application. The behavioral distribution blueprint depicts the distribution of CPU consumption along method invocations and hints at method candidates for caching optimizations. The behavioral evolution blueprint compares profiles of different versions of a software system and highlights performance-critical changes in the system. These three blueprints helped us to significantly optimize Mondrian, an open source visualization engine. Our implementation is freely available for the Pharo development environment and has been evaluated in a number of different scenarios. Copyright © 2011 John Wiley & Sons, Ltd.

17.

Research on Algorithms for Reconfigurable Resource Management and Hardware Task Placement

李涛杨愚鲁《计算机研究与发展》2008,45(2):375-382

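Reconfigurable systems offer the flexibility of a microprocessor together with computing speed close to that of an ASIC. The dynamic partial reconfiguration capability of reconfigurable hardware allows computation and reconfiguration to overlap, so that the system can change its running tasks dynamically. Reconfigurable resource management and hardware task placement methods are key to improving the performance of reconfigurable systems. This paper proposes an algorithm (TT-KAMER) that computes the maximal empty rectangles from the top boundaries of tasks and manages the system's free reconfigurable resources effectively; on this basis, FF and heuristic BF algorithms are used to place hardware tasks. Experiments show that the algorithm achieves effective online resource allocation and task placement and obtains high resource utilization.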

18.

Configuration Reusing in On-Line Task Scheduling for Reconfigurable Computing Systems

Maisam Mansub Bassiri Hadi Shahriar Shahhoseini 《计算机科学技术学报》2011,26(3):463-473

Reconfigurable computing systems can be reconfigured at runtime and support partial reconfigurability, which makes it possible to execute tasks in a true multitasking manner. To manage such systems at runtime, a reconfigurable operating system is needed. The main part of this operating system is the resource management unit, which performs on-line scheduling and placement of hardware tasks at runtime. Reconfiguration overhead is an important obstacle that limits the performance of on-line scheduling algorithms in reconfigurable computing systems and increases the overall execution time. Configuration reusing (task reusing) can decrease reconfiguration overhead considerably, particularly in periodic applications or applications in which the probability of task recurrence is high. In this paper, we present a technique called reusing-based scheduling (RBS) for on-line scheduling and placement, in which configuration reusing is considered a main characteristic in order to reduce reconfiguration overhead and decrease the total execution time of the tasks. Several experiments have been conducted on the proposed algorithm. The results obtained show considerable improvement in the overall execution time of the tasks.

19.

MultiCL: Enabling automatic scheduling for task-parallel workloads in OpenCL

《Parallel Computing》2016

The OpenCL specification tightly binds a command queue to a specific device. For best performance, the user has to find the ideal queue-device mapping at command queue creation time, an effort that requires a thorough understanding of the underlying device architectures and kernels in the program. In this paper, we propose to add scheduling attributes to the OpenCL context and command queue objects that can be leveraged by an intelligent runtime scheduler to automatically perform ideal queue-device mapping. Our proposed extensions enable the average OpenCL programmer to focus on the algorithm design rather than scheduling and to automatically gain performance without sacrificing programmability. As an example, we design and implement an OpenCL runtime for task-parallel workloads, called MultiCL, which efficiently schedules command queues across devices. Our case studies include the SNU benchmark suite and a real-world seismology simulation. To benefit from our runtime optimizations, users have to apply our proposed scheduler extensions to only four source lines of code, on average, in existing OpenCL applications. We evaluate both single-node and multinode experiments and also compare with SOCL, our closest related work. We show that MultiCL maps command queues to the optimal device set in most cases with negligible runtime overhead.

20.

LXC-Based Virtualization Technology for the Android System

谷德贺顾乃杰刘博文苏俊杰贺爱香《计算机系统应用》2017,26(12):58-63

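Research on virtualization is gradually shifting from high-performance servers to smart mobile devices. Most existing virtualization schemes use multiple kernels, which leads to high system load and low efficiency. To address multi-screen display and resource constraints on platforms such as in-vehicle systems, this paper proposes a lightweight, container-based virtualization scheme for Android. Using the Namespace resource isolation mechanism and the Cgroup resource allocation mechanism, the scheme lets an ARM platform start several Android virtual machines at once while using few resources, with the display of each virtual machine independent of the others. Test results show that the scheme's memory usage is 7% lower than that of a dual-system scheme, while its average CPU usage is only 1% higher than that of native Android.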