期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A multiclass network model of a demand paging computer system

A. Krzesinski P. Teunissen 《Acta Informatica》1978,9(4):331-343

Summary The paper presents a multiclass network model of a demand paging computer system. The powerful class and class changing mechanism of a multiclass network are used to model the serial co-operation of user and system functions in program execution. The workload itself is modelled as a mix of programs, each with different CPU, I/O, paging and locality characteristics. The effect of paging, I/O and program termination overheads on systems performance is evaluated, as well as the transient overhead of rapid page loading upon program activation. The model is then used to compute the optimal multiprogramming level and the optimal multiprogramming mix as a function of workload composition and system overhead. The model is finally used to confirm certain heuristic load control rules proposed by Denning et al. 相似文献

2.

Modeling and measuring multiprogramming and system overheads on a shared-memory multiprocessor: Case study

R. T. Dimpsey R. K. Iyer 《Journal of Parallel and Distributed Computing》1991,12(4)

This paper presents methodologies capable of quantifying multiprogramming (MP) overhead on a computer system. Two methods which quantify the lower bound on MP overhead, along with a method to determine MP overhead present in real workloads, are introduced. The techniques are illustrated by determining the percentage of parallel processing time consumed by MP overhead on Alliant multiprocessors. The real workload MP overhead measurements, as well as measurements of other overhead components such as kernel lock spinning, are then used in a comprehensive case study of performance degradation due to overheads. It is found that MP overhead accounts for well over half of the total system overhead. Kernel lock spinning is determined to be a major component of both MP and total system overhead. Correlation analysis is used to uncover underlying relationships between overheads and workload characteristics. It is found that for the workloads studied, MP overhead in the parallel environment is not statistically dependent on the number of parallel jobs being multiprogrammed. However, because of increased kernel contention, serial jobs, even those executing on peripheral processors, are responsible for variation in MP overhead. 相似文献

3.

A measurement-based model to predict the performance impact ofsystem modifications: a case study

Dimpsey R.T. Iyer R.K. 《Parallel and Distributed Systems, IEEE Transactions on》1995,6(1):28-40

The paper presents a performance case study of parallel jobs executing in real multi user workloads. The study is based on a measurement based model capable of predicting the completion time distribution of the jobs executing under real workloads. The model constructed is also capable of predicting the effects of system design changes on application performance. The model is a finite state, discrete time Markov model with rewards and costs associated with each state. The Markov states are defined from real measurements and represent system/workload states in which the machine has operated. The paper places special emphasis on choosing the correct number of states to represent the workload measured. Specifically, the performance of computationally bound, parallel applications executing in real workloads on an Alliant FX/80 is evaluated. The constructed model is used to evaluate scheduling policies, the performance effects of multiprogramming overhead, and the scalability of the Alliant FX/8O in real workloads. The model identifies a number of available scheduling policies which would improve the response time of parallel jobs. In addition, the model predicts that doubling the number of processors in the current configuration would only improve response time for a typical parallel application by 25%. The model recommends a different processor configuration to more fully utilize extra processors. The paper also presents empirical results which validate the model created 相似文献

4.

基于GPU加速的NFV系统的框架设计和性能优化

郭良琛张凯《计算机应用与软件》2022,(2):113-119+137

GPU可以显著提升一些网络功能的性能,但在GPU加速的网络功能虚拟化(Network Function Virtualization,NFV)系统中,由于网络功能需要以虚拟化方式独立开发和部署,其CPU-GPU处理流水线的CPU处理阶段会有较大的额外开销,使得网络功能GPU加速的效果不明显。为解决该问题,提出一个新的支持GPU加速的NFV系统框架。利用服务链中网络功能之间共享数据和流状态的特性,设计了共享式状态管理机制,以减少网络功能中重复性的协议栈处理和流状态管理开销,提升GPU加速的效果。对原型系统进行评估表明,相比于现有的系统框架,该框架能够显著地降低多种GPU加速的网络功能中CPU处理阶段的时间开销,并在常见的网络功能服务链上实现了高达2倍的吞吐量提升。相似文献

5.

Integrated Performance Models for Distributed Processing in Computer Communication Networks

《IEEE transactions on pattern analysis and machine intelligence》1985,(10):1203-1216

This paper deals with the analysis of large-scale closed queueing network (QN) models which are used for the performance analysis of computer communication networks (CCN's). The computer systems are interconnected by a wide-area network. Users accessing local/remote computers are affected by the contention (queueing delays) at the computer systems and the communication subnet. The computational cost of analyzing such models increases exponentially with the number of user classes (chains), even when the QN is tractable (product-form). In fact, the submodels of the integrated model are generally not product-form, e.g., due to blocking at computer systems (multiprogramming level constraints) and in the communication subnet (window flow control constraints). Two approximate solution methods are proposed in this paper to analyze the integrated QN model. Both methods use decomposition and iterative techniques to exploit the structure of the QN model such that computational cost is proportional to the number of chains. The accuracy of the solution methods is validated against each other and simulation. The model is used to study the effect that channel capacity assignments, window sizes for congestion control, and routing have on system performance. 相似文献

6.

一种网络协议的自适应控制方法 总被引：4，自引：0，他引：4

潘清李未马世龙张晓清孙凌云《计算机学报》2004,27(12):1612-1616

网络服务器过载问题已经成为日益关注的问题，过载会引起服务器系统性能急剧下降，客户请求长时间得不到响应．国内外已经提出很多有关网络服务器过载的解决方案，其中，比较有效的方案是引入线程机制，通过中断和查询相结合来解决这个问题．文章从改进传统的操作系统内核事件处理机制出发，在传统的事件处理机制中引入了反馈控制，提出了基于自适应的网络协议控制方法，该方法通过控制网络协议处理中的硬件中断和软件中断处理，不仅避免了系统接收活锁问题，同时，也避免了由于过量中断所造成的网络协议处理和应用软件“饥饿”的问题、测试表明在过载情况下，UDP性能提高了100％，TCP性能也得到了明显的改善．与其它方法相比，这些方法中有些无法克服接收活锁的问题，另外一些方法由于采用了查询机制，产生了延迟和额外的开销，这就使得文章提出的方法在性能上比采用查询机制的方法更好。相似文献

7.

Libra: a computational economy‐based job scheduling system for clusters

Jahanzeb Sherwani Nosheen Ali Nausheen Lotia Zahra Hayat Rajkumar Buyya 《Software》2004,34(6):573-590

Clusters of computers have emerged as mainstream parallel and distributed platforms for high‐performance, high‐throughput and high‐availability computing. To enable effective resource management on clusters, numerous cluster management systems and schedulers have been designed. However, their focus has essentially been on maximizing CPU performance, but not on improving the value of utility delivered to the user and quality of services. This paper presents a new computational economy driven scheduling system called Libra, which has been designed to support allocation of resources based on the users' quality of service requirements. It is intended to work as an add‐on to the existing queuing and resource management system. The first version has been implemented as a plugin scheduler to the Portable Batch System. The scheduler offers market‐based economy driven service for managing batch jobs on clusters by scheduling CPU time according to user‐perceived value (utility), determined by their budget and deadline rather than system performance considerations. The Libra scheduler has been simulated using the GridSim toolkit to carry out a detailed performance analysis. Results show that the deadline and budget based proportional resource allocation strategy improves the utility of the system and user satisfaction as compared with system‐centric scheduling strategies. Copyright © 2004 John Wiley & Sons, Ltd. 相似文献

8.

A Parallel Processor Operating System Comparison

《IEEE transactions on pattern analysis and machine intelligence》1977,(6):467-475

Three different operating system strategies for a parallel processor computer system are compared, and the most effective strategy for given job loads is determined. The three strategies compare uniprogramming versus multiprogramming and distributed operating systems versus dedicated processor operating systems. The level of evaluation includes I/O operations, resource allocation, and interprocess communication. The results apply to architectures where jobs may be scheduled to processors on the basis of processor availability, memory availability, and the availability of one other resource used by all jobs. 相似文献

9.

Processor utilization study

Jzsef Tomk 《Computers & Mathematics with Applications》1975,1(3-4):337-344

相似文献

10.

Autonomic Clouds on the Grid 总被引：3，自引：0，他引：3

Michael A. Murphy Linton Abraham Michael Fenn Sebastien Goasguen 《Journal of Grid Computing》2010,8(1):1-18

Computational clouds constructed on top of existing Grid infrastructure have the capability to provide different entities with customized execution environments and private scheduling overlays. By designing these clouds to be autonomically self-provisioned and adaptable to changing user demands, user-transparent resource flexibility can be achieved without substantially affecting average job sojourn time. In addition, the overlay environment and physical Grid sites represent disjoint administrative and policy domains, permitting cloud systems to be deployed non-disruptively on an existing production Grid. Private overlay clouds administered by, and dedicated to the exclusive use of, individual Virtual Organizations are termed Virtual Organization Clusters. A prototype autonomic cloud adaptation mechanism for Virtual Organization Clusters demonstrates the feasibility of overlay scheduling in dynamically changing environments. Commodity Grid resources are autonomically leased in response to changing private scheduler loads, resulting in the creation of virtual private compute nodes. These nodes join a decentralized private overlay network system called IPOP (IP Over P2P), enabling the scheduling and execution of end user jobs in the private environment. Negligible overhead results from the addition of the overlay, although the use of virtualization technologies at the compute nodes adds modest service time overhead (under 10%) to computationally-bound Grid jobs. By leasing additional Grid resources, a substantial decrease (over 90%) in average job queuing time occurs, offsetting the service time overhead. 相似文献

11.

A dynamic resource management in mobile agent by artificial neural network

《Journal of Network and Computer Applications》2010,33(6):672-681

In this paper, a resource management for dynamic load balancing in mobile agent by artificial neural network scheme (ANN-DLB) is presented to maximize the number of the served tasks in developing high performance cluster. This dynamic load balance with the growth of the service type and user number in the mobile networks of the higher performance is required in service provision and throughput. Most of the conventional policies are used in load indices with the threshold value to decide the load status of the agent hosts by CPU or memory. The main factor influencing the workload is the competitions among the computing resources such as CPU, memory, I/O and network. There are certain I/O data of the intensive applications where load balancing becomes the important issue. This relationship between the computing resources is very complex to define the rules for deciding the workload. This paper proposed a new dynamic load balancing for evaluating the agent hosts’ workload with the artificial neural network (ANN). By applying the automatic learning of the back-propagation network (BPN) model can establish the ANN model and also can measure the agent host loading with five inputs: CPU, memory, I/O, network and run-queue length. The structure of the load balancing system is composed of three design agents: the load index agent (LIA), the resource management agent (RMA) and the load transfer agent (LTA). These experimental results reveal that the proposed ANN-DLB yields better performance than the other methods. These results demonstrate that the proposed method has high throughput, short response time and turnaround time, and less agent host negotiation complexity and migrating tasks than the previous methods. 相似文献

12.

A hybrid solution method for CFD applications on GPU-accelerated hybrid HPC platforms

《Future Generation Computer Systems》2016

Heterogeneous multiprocessor systems, where commodity multicore processors are coupled with graphics processing units (GPUs), have been widely used in high performance computing (HPC). In this work, we focus on the design and optimization of Computational Fluid Dynamics (CFD) applications on such HPC platforms. In order to fully utilize the computational power of such heterogeneous platforms, we propose to design the performance-critical part of CFD applications, namely the linear equation solvers, in a hybrid way. A hybrid linear solver includes both one CPU version and one GPU version of code for solving a linear equations system. When a hybrid linear equation solver is invoked during the CFD simulation, the CPU portion and the GPU portion will be run on corresponding processing devices respectively in parallel according to the execution configuration. Furthermore, we propose to build functional performance models (FPMs) of processing devices and use FPM-based heterogeneous decomposition method to distribute workload between heterogeneous processing devices, in order to ensure balanced workload and optimized communication overhead. Efficiency of this approach is demonstrated by experiments with numerical simulation of lid-driven cavity flow on both a hybrid server and a hybrid cluster. 相似文献

13.

负载类型相关的Xen虚拟机系统性能模型

余勇车建华徐焕良蒋诚智《计算机科学》2016,43(11):210-214

针对Xen虚拟机系统执行网络I/O密集型负载时容易耗尽Domain0的CPU资源而过载和执行计算密集型负载时在客户域平均性能与数目之间存在线性规划的问题,提出了两个负载类型相关的性能模型。首先,通过分析Xen虚拟机系统处理网络I/O操作的CPU资源消耗规律,建立了CPU核共享和CPU核隔离两种情况下的客户域网络I/O操作请求次数计算模型;然后,通过分析多个相同客户域并行执行计算密集型负载的平均性能与一个相同客户域执行相同负载的性能表现之间的关系,建立了并行执行计算密集型负载的客户域平均性能分析模型。实验结果表明,两个性能模型能够有效地限制客户域提交的网络I/O操作请求次数以防止Xen虚拟机系统过载,并求解给定资源配置情况下执行计算密集型负载的Xen虚拟机系统客户域伸缩性数目。相似文献

14.

CPU + GPU scheduling with asymptotic profiling

Zhenning Wang Long Zheng Quan Chen Minyi Guo 《Parallel Computing》2014

Hybrid systems with CPU and GPU have become new standard in high performance computing. Workload can be split and distributed to CPU and GPU to utilize them for data-parallelism in hybrid systems. But it is challenging to manually split and distribute the workload between CPU and GPU since the performance of GPU is sensitive to the workload it received. Therefore, current dynamic schedulers balance workload between CPU and GPU periodically and dynamically. The periodical balance operation causes frequent synchronizations between CPU and GPU. It often degrades the overall performance because of the overhead of synchronizations. To solve the problem, we propose a Co-Scheduling Strategy Based on Asymptotic Profiling (CAP). CAP dynamically splits and distributes the workload to CPU and GPU with only a few synchronizations. It adopts the profiling technique to predict performance and partitions the workload according to the performance. It is also optimized for GPU’s performance characteristics. We examine our proof-of-concept system with six benchmarks and evaluation result shows that CAP produces up to 42.7% performance improvement on average compared with the state-of-the-art co-scheduling strategies. 相似文献

15.

Hybrid fluid modeling approach for performance analysis of P2P live video streaming systems

Zoran Kotevski Pece Mitrevski 《Peer-to-Peer Networking and Applications》2014,7(4):410-426

In this paper a hybrid modeling approach with different modeling formalisms and solution methods is employed in order to analyze the performance of peer to peer live video streaming systems. We conjointly use queuing networks and Fluid Stochastic Petri Nets, developing several performance models to analyze the behavior of rather complex systems. The models account for: network topology, peer churn, scalability, peer average group size, peer upload bandwidth heterogeneity and video buffering, while introducing several features unconsidered in previous performance models, such as: admission control for lower contributing peers, control traffic overhead and internet traffic packet loss. Our analytical and simulation results disclose the optimum number of peers in a neighborhood, the minimum required server upload bandwidth, the optimal buffer size and the influence of control traffic overhead. The analysis reveals the existence of a performance switch-point (i.e. threshold) up to which system scaling is beneficial, whereas performance steeply decreases thereafter. Several degrees of degraded service are introduced to explore performance with arbitrary percentage of lost video frames and provide support for protocols that use scalable video coding techniques. We also find that implementation of admission control does not improve performance and may discourage new peers if waiting times for joining the system increase. 相似文献

16.

A model of a time sharing virtual memory system solved using equivalence and decomposition methods

A. Brandwajn 《Acta Informatica》1974,4(1):11-47

Summary A queueing network model of a time-sharing multiprogramming virtual memory system is presented including the effect of memory sharing among processes. An approximate explicit solution is obtained using equivalence and decomposition methods.The influence of system and program behaviour parameters on system performance (mean response time and CPU utilization) is illustrated in the results obtained. The efficiency of controlling the degree of multiprogramming in order to prevent thrashing is also studied using a similar model. 相似文献

17.

On the accuracy of two analytical models for evaluating the performance of Gigabit Ethernet hosts

Khaled Salah 《Information Sciences》2006,176(24):3735-3756

In this paper we develop and assess the accuracy of two analytical models that capture the behavior of network hosts when subjected to heavy load such as that of Gigabit Ethernet. The first analytical model is based on Markov processes and queuing theory, and the second is a pure Markov process. In order to validate the models and assess their accuracy, two different numerical examples are presented. The two numerical examples use system parameters that are realistic and appropriate for modern hardware. Both analytical models give closed-form solutions that facilitate the study of a number of important system performance metrics. These metrics include throughput, latency, stability condition, CPU utilizations of interrupt handling and protocol processing, and CPU availability for user applications. The two models give mathematically equivalent closed-form solutions for all metrics except for latency. To address latency, we compare the results of both models with the results of a discrete-event simulation. The latency accuracy of the two models is assessed relative to simulation in terms of differences and percentage errors. The paper shows that the second model is more accurate. 相似文献

18.

Approximation methods for networks of queues with priorities

J.S. Kaufman 《Performance Evaluation》1984,4(3):183-198

Queuing network models are commonly used to analyze the performance of computer system. Unfortunately, the class of queuing network models which can be exactly analyzed excludes CPU priority scheduling disciplines, conspicuously present in most computer systems.A popular approximation technique which we denote the reduced occupancy approximation, is often used to analyze such priority service disciplines because of its simplicity and intuitive appeal. However, despite its widespread use, questions about its accuracy and applicability have received very little attention. Further compounding this situation, is the existence of proprietary software packages which purport to analyze such priority disciplines, but which in fact exhibit behavior remarkably similar to the roa.In this paper we show where, and more importantly why, the roa fails. This understanding leads to a significantly improved approximation technique which sacrifices neither simplicity nor applicability. Although our primary focus is on a two class preemptive priority closed network structure, the basic idea is quite general and extensions to multiclass and nonpreemptive priority structures are indicated. 相似文献

19.

A queueing network model for a distributed database testbed system

Jenq B.-C. Kohler W.H. Towsley D. 《IEEE transactions on pattern analysis and machine intelligence》1988,14(7):908-921

A queuing network model for analyzing the performance of a distributed database testbed system with a transaction workload is developed. The model includes the effects of the concurrency control protocol (two-phase locking with distributed deadlock detection), the transaction recovery protocol (write-ahead logging of before-images), and the commit protocol (centralized two-phase commit) used in the testbed system. The queuing model differs from previous analytical models in three major aspects. First, it is a model for a distributed transaction processing system. Second, it is more general and integrated than previous analytical models. Finally, it reflects a functioning distributed database testbed system and is validated against performance measurements 相似文献

20.

The people's time sharing system

Andrew S. Tanenbaum William H. Benson 《Software》1973,3(2):109-119

A set of programs running under a multiprogramming batch operating system on the CDC 6600 which provide remote users with a time sharing service is described. The basis for the system is the ability of a user program to create job control statements during execution, thereby tricking the operating system into treating it as an ordinary batch job. The text editor and the interactive debugging facilities are described. The performance of the system, known as the People's Time Sharing System (PTSS), and user reaction to it are also described. 相似文献