20 similar documents found.
1.
2.
A virtualized system generally suffers from low I/O performance, caused mainly by its inherent abstraction overhead and by frequent CPU transitions between the guest and hypervisor modes. Recent research on polling-based I/O virtualization has partly solved this problem, but excessive polling trades intensive CPU usage for higher performance. This article presents a power-efficient, high-performance block I/O framework for virtual machines that remains usable even with the limited number of CPU cores found in mobile or embedded systems. The framework monitors system status and dynamically switches the I/O processing mode between exit and polling modes, depending on the volume of current I/O requests and on CPU utilization. It also adjusts the polling interval dynamically to reduce redundant polling. This highly dynamic behavior improves I/O performance while also lowering CPU usage. In our experiments, the framework achieved 10.8% higher I/O throughput than existing exit-based mechanisms while increasing CPU usage by only 3.1%. Compared with systems based solely on polling, it reduced CPU usage to roughly 10.0% with no or negligible performance loss.
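The mode-switching logic described above can be sketched as a small policy object. This is a minimal illustration, not the authors' implementation: the class name HybridIOPolicy, the thresholds (requests per second, CPU utilization) and the doubling/halving rule for the polling interval are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class HybridIOPolicy:
    # All thresholds below are hypothetical, chosen only for illustration.
    high_io_rate: float = 5000.0     # requests/s above which polling is assumed to pay off
    max_cpu_util: float = 0.8        # never poll when the CPU is busier than this
    min_interval_us: float = 10.0
    max_interval_us: float = 1000.0
    interval_us: float = 100.0
    mode: str = "exit"

    def update(self, io_rate, cpu_util, empty_polls, total_polls):
        """Pick exit or polling mode, then adapt the polling interval."""
        if io_rate >= self.high_io_rate and cpu_util < self.max_cpu_util:
            self.mode = "polling"
        else:
            self.mode = "exit"
        if self.mode == "polling" and total_polls > 0:
            # Mostly-empty polls waste CPU: back off. Otherwise poll more often.
            if empty_polls / total_polls > 0.5:
                self.interval_us = min(self.interval_us * 2, self.max_interval_us)
            else:
                self.interval_us = max(self.interval_us / 2, self.min_interval_us)
        return self.mode, self.interval_us
```

Switching to exit mode leaves the interval untouched, so polling resumes from its last tuned value when the load rises again.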
3.
Tom van Dijk Jaco van de Pol 《International Journal on Software Tools for Technology Transfer (STTT)》2017,19(6):675-696
Decision diagrams, such as binary decision diagrams, multi-terminal binary decision diagrams and multi-valued decision diagrams, play an important role in various fields. They are especially useful for representing the characteristic functions of sets of states and transitions in symbolic model checking. Most implementations of decision diagrams do not parallelize decision diagram operations. As performance gains in the current era come mostly from parallel processing, an ongoing challenge is to develop data structures and algorithms for modern multi-core architectures. The decision diagram package Sylvan contributes to this by implementing parallelized decision diagram operations, thus allowing sequential algorithms that use decision diagrams to exploit the power of multi-core machines. This paper discusses the design and implementation of Sylvan, in particular an improvement to the lock-free unique table that uses bit arrays, the concurrent operation cache, and the implementation of parallel garbage collection. We extend Sylvan with multi-terminal binary decision diagrams (MTBDDs) for integers, real numbers and rational numbers. This extension also allows custom MTBDD leaves and operations, and we provide an example implementation of GMP rational numbers. Furthermore, we show how the framework can be integrated into existing tools to provide out-of-the-box parallel BDD algorithms, as well as support for parallelizing higher-level algorithms. As a case study, we parallelize on-the-fly symbolic reachability in the model checking toolset LTSmin. We demonstrate experimentally that the parallelization of symbolic model checking for explicit-state modeling languages, as supported by LTSmin, scales well. We also show that the improvements in the design of the unique table result in faster execution of on-the-fly symbolic reachability.
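To make the terms "unique table" and "operation cache" concrete, here is a minimal sequential BDD sketch. It only illustrates what these two structures do: hash-consing nodes so that equal functions share one node, and memoizing if-then-else (ITE) results. Sylvan's actual tables are lock-free, concurrent and garbage-collected, none of which is modeled here; all names are hypothetical.

```python
class BDD:
    """Toy reduced ordered BDD. Node ids 0 and 1 are the FALSE/TRUE terminals."""

    def __init__(self):
        self.nodes = [None, None]  # node id -> (var, low, high)
        self.unique = {}           # unique table: (var, low, high) -> node id
        self.cache = {}            # operation cache: (f, g, h) -> ITE result

    def mk(self, var, low, high):
        if low == high:            # redundant test: skip the node
            return low
        key = (var, low, high)
        node = self.unique.get(key)
        if node is None:           # first time we see this node: create it
            node = len(self.nodes)
            self.nodes.append(key)
            self.unique[key] = node
        return node

    def var(self, v):
        return self.mk(v, 0, 1)

    def top(self, f):
        return self.nodes[f][0] if f > 1 else float("inf")

    def cofactors(self, f, v):
        if f > 1 and self.nodes[f][0] == v:
            _, low, high = self.nodes[f]
            return low, high
        return f, f                # f does not depend on v

    def ite(self, f, g, h):
        if f == 1:
            return g
        if f == 0:
            return h
        if g == h:
            return g
        if g == 1 and h == 0:
            return f
        key = (f, g, h)
        if key in self.cache:
            return self.cache[key]
        v = min(self.top(f), self.top(g), self.top(h))
        f0, f1 = self.cofactors(f, v)
        g0, g1 = self.cofactors(g, v)
        h0, h1 = self.cofactors(h, v)
        result = self.mk(v, self.ite(f0, g0, h0), self.ite(f1, g1, h1))
        self.cache[key] = result
        return result

    def AND(self, f, g):
        return self.ite(f, g, 0)

    def OR(self, f, g):
        return self.ite(f, 1, g)

    def NOT(self, f):
        return self.ite(f, 0, 1)
```

Because of the unique table, equivalent formulas end up as the same node id, so checking equivalence is an integer comparison.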
4.
Existing techniques for allocating processors in parallel and distributed systems are not suitable for large distributed systems. In such systems, dedicated multiprocessors should exist as an integral component of the distributed system, and idle processors should be available to applications that need them. The Prospero Resource Manager (PRM) is a scalable resource allocation system that supports the allocation of processing resources in large networks and on multiprocessor systems. PRM employs three types of managers (the job manager, the system manager and the node manager) to manage resources in a distributed system. Multiple independent instances of each type of manager exist, reducing bottlenecks. When making scheduling decisions, each manager uses the information most closely associated with the entities for which it is responsible.
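A toy sketch of how the three manager roles might divide responsibility, assuming a job manager that asks several system managers in turn, each of which hands out idle nodes tracked by per-node managers. The class and method names are hypothetical and the first-fit policy is an assumption; PRM's actual protocols are not described in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    # Hypothetical: tracks whether a single processor is in use.
    name: str
    busy: bool = False


@dataclass
class SystemManager:
    # Hypothetical: owns a pool of nodes and allocates idle ones.
    nodes: list

    def allocate(self, count):
        idle = [n for n in self.nodes if not n.busy]
        if len(idle) < count:
            return []              # cannot satisfy the request from this pool
        chosen = idle[:count]
        for n in chosen:
            n.busy = True
        return chosen

    def release(self, nodes):
        for n in nodes:
            n.busy = False


@dataclass
class JobManager:
    # Hypothetical: acquires nodes for one job from any system manager.
    system_managers: list
    held: list = field(default_factory=list)

    def acquire(self, count):
        for sm in self.system_managers:   # first fit over system managers
            got = sm.allocate(count)
            if got:
                self.held.append((sm, got))
                return [n.name for n in got]
        return []

    def release_all(self):
        for sm, nodes in self.held:
            sm.release(nodes)
        self.held.clear()
```

Each manager decides using only the state it owns, which mirrors the abstract's point about using information closest to the managed entities.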
5.
Paradigm (parallel distributed global memory), a shared-memory multicomputer architecture being developed to show that a large-scale machine can be built from high-performance microprocessors, is discussed. The Paradigm architecture allows a parallel application program to execute any of its tasks on any processor in the machine, with all tasks sharing a single address space. The focus is on novel design techniques that support scalability. The key performance issues are identified, and some results to date from this work, together with experience from the VMP architecture design on which it is based, are summarized.
6.
This paper presents SCALE, a fully automated transactional clustering framework. The SCALE design highlights three unique features. First, we introduce the concept of Weighted Coverage Density as a categorical similarity measure for efficient clustering of transactional datasets. The concept of weighted coverage density is intuitive, and it allows the weight of each item in a cluster to change dynamically according to the occurrences of items. Second, we develop a clustering algorithm based on the weighted coverage density measure: a fast, memory-efficient and scalable algorithm for analyzing transactional data. Third, we introduce two clustering validation metrics and show that these domain-specific evaluation metrics are critical for capturing transactional semantics in clustering analysis. The SCALE framework combines weighted-coverage-density clustering over a sample dataset with self-configuring methods. These self-configuring methods automatically tune the two important parameters of our clustering algorithms: (1) the candidates for the best number K of clusters; and (2) the application of two domain-specific cluster validity measures to find the best result from the set of clustering results. We have conducted an extensive experimental evaluation using both synthetic and real datasets, and the results show that the weighted coverage density approach powered by the SCALE framework can efficiently generate high-quality clustering results in a fully automated manner.
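The abstract does not give the formulas, so the following is a sketch under an assumed formulation: coverage density of a cluster as S/(N·M) (total item occurrences S over N transactions times M distinct items), and weighted coverage density as Σ occ(i)² / (S·N), in which each item is weighted by its share occ(i)/S of the occurrences. Treat both definitions as assumptions to be checked against the paper.

```python
from collections import Counter


def _item_counts(cluster):
    # Count each item at most once per transaction.
    return Counter(item for transaction in cluster for item in set(transaction))


def coverage_density(cluster):
    """Assumed definition: S / (N * M)."""
    counts = _item_counts(cluster)
    s, n, m = sum(counts.values()), len(cluster), len(counts)
    return s / (n * m)


def weighted_coverage_density(cluster):
    """Assumed definition: sum over items of (occ/S) * (occ/N)."""
    counts = _item_counts(cluster)
    s, n = sum(counts.values()), len(cluster)
    return sum(c * c for c in counts.values()) / (s * n)
```

For the cluster [{a, b}, {a, b}, {a, c}], coverage density is 6/9 while weighted coverage density is 14/18, reflecting that the frequent item a carries more weight than the rare item c.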
7.
8.
Martin De Prycker 《Computer Communications》1986,9(6):299-302
Two possible architectural reference models are described for a network based on the fast packet switching concept, known as an asynchronous time division network. The first model is based on the outband principle, since signalling and data information are transmitted over different logical channels. The second model fully integrates signalling and data information at all levels and is therefore called the inband model. The inband and outband reference models are also compared.
9.
Most commercially available high-performance routers today equip each of their line cards (LCs) with a forwarding engine (FE) that performs table lookups locally. This work introduces and evaluates a technique for speedy packet lookups in such routers, called SPAL. Under SPAL, the BGP routing table is fragmented into subsets that constitute the forwarding tables of different FEs, so that the number of table entries in each FE drops as the router grows. This reduction in forwarding table size drastically lowers the amount of SRAM (e.g., L3 data cache) required in each LC to hold the trie constructed according to the prefix matching algorithm. SPAL caches the lookup result of a given IP address at its home LC (denoted LC_ho) in the LR-cache, so that the result can quickly satisfy lookup requests for the same address not only from LC_ho but also from other LCs. Our trace-driven simulation reveals that SPAL improves mean lookup performance by a factor of at least 2.5 (or 4.3) for a router with three (or 16) LCs, if the LR-cache contains 4K blocks. SPAL achieves this significant improvement while greatly lowering the SRAM requirement in each LC (the L3 data cache plus the LR-cache combined) and possibly shortening the worst-case lookup time (thanks to fewer memory accesses during longest-prefix matching search), compared with a current router that does not partition its routing table. It promises good scalability with respect to routing table growth and exhibits a small mean lookup time per packet. With its ability to speed up packet lookups while substantially lowering overall SRAM, SPAL is well suited to the new generation of scalable high-performance routers.
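A simplified sketch of partitioning the table by home LC and caching lookup results there. Assumptions, all hypothetical: the number of LCs is a power of two, the home LC is chosen by the top bits of the address (SPAL's actual bit selection may differ), prefixes shorter than those bits are replicated to every LC they cover, longest-prefix match is a naive per-length probe instead of a trie, and the LR-cache is a plain LRU dictionary.

```python
import ipaddress
from collections import OrderedDict


def to_int(addr):
    return int(ipaddress.IPv4Address(addr))


class LineCard:
    def __init__(self, cache_size):
        self.table = {}             # (network_int, prefix_len) -> next hop
        self.cache = OrderedDict()  # LR-cache: address -> next hop, LRU order
        self.cache_size = cache_size

    def lpm(self, addr):
        # Longest-prefix match by probing from /32 down to /0.
        for length in range(32, -1, -1):
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            hop = self.table.get((addr & mask, length))
            if hop is not None:
                return hop
        return None


class Router:
    def __init__(self, num_lcs, cache_size=4096):
        assert num_lcs > 0 and num_lcs & (num_lcs - 1) == 0, "power of two assumed"
        self.bits = num_lcs.bit_length() - 1
        self.lcs = [LineCard(cache_size) for _ in range(num_lcs)]

    def home(self, addr):
        return addr >> (32 - self.bits)   # top `bits` bits select the home LC

    def add_route(self, prefix, next_hop):
        net = ipaddress.IPv4Network(prefix)
        p, length = int(net.network_address), net.prefixlen
        if length >= self.bits:
            targets = [self.home(p)]
        else:
            # A short prefix spans several partitions: replicate it to each.
            first = self.home(p)
            targets = range(first, first + (1 << (self.bits - length)))
        for i in targets:
            self.lcs[i].table[(p, length)] = next_hop

    def lookup(self, addr):
        a = to_int(addr)
        lc = self.lcs[self.home(a)]       # same home LC whichever LC asks
        if a in lc.cache:
            lc.cache.move_to_end(a)
            return lc.cache[a]
        result = lc.lpm(a)
        lc.cache[a] = result
        if len(lc.cache) > lc.cache_size:
            lc.cache.popitem(last=False)  # evict least recently used entry
        return result
```

Because every lookup for a given address is answered at the same home LC, a cached result serves requests arriving on any LC, which is the effect the abstract describes.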
10.
11.
V.L. Patil 《Microprocessors and Microsystems》1984,8(2):86-93
Real-time applications of any microprocessor require interfacing to a wide variety of peripheral devices. Various interfacing techniques are discussed, with Intel's 8085 taken as the typical microprocessor in the examples. The I/O transfers considered fall into two categories: memory-mapped transfers and I/O-mapped transfers. Both synchronous and asynchronous types are dealt with. Bit masking and interrupt techniques are used for asynchronous memory-mapped I/O transfer. Also included are multiplexed channel transfers and interrupt transfers. The former are treated as a special class of I/O transfer. The latter are useful in applications where it cannot be predicted when data will arrive for transfer to the microprocessor. Unlike other types of transfer, interrupt transfers are initiated by the I/O devices rather than by the microprocessor. They are subdivided into software-polled and hardware-polled transfers. Examples are given of daisy-chain and search ring transfers.
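Python is used here only to illustrate the logic; on an 8085 the same steps would be done with IN or LDA, ANI and conditional jumps. The sketch shows software polling with bit masking: after an interrupt, the handler reads a status byte and tests each device's ready bit in priority order. The bit assignments and device names are hypothetical.

```python
def software_poll(status, priority):
    """Return the highest-priority device whose ready bit is set in `status`.

    `priority` is a list of (device_name, bit_number), highest priority first.
    """
    for device, bit in priority:
        if status & (1 << bit):      # mask off every bit except this device's
            return device
    return None


def service_all(status, priority):
    """Service pending devices in priority order, clearing each ready bit."""
    bits = dict(priority)
    served = []
    while True:
        device = software_poll(status, priority)
        if device is None:
            return served
        served.append(device)
        status &= ~(1 << bits[device])   # device's flag is cleared once serviced
```

With software polling, the order of the tests is the priority scheme; a hardware daisy chain achieves the same effect by passing the acknowledge along the chain until the first requesting device.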
12.
Seng-Ming Puah 《International journal of systems science》2017,48(7):1472-1484
This paper investigates networked control of nonlinear robotic manipulators under time delays and packet loss using a passivity-based technique. With the utilisation of wave variables and a passive remote controller, the networked robotic system is shown to be stable with guaranteed position regulation. For the input/output signals of the robotic system, a discretisation block is used to convert continuous-time signals to discrete-time signals, and vice versa. We then propose a packet-management scheme, called wave-variable modulation, for the proposed networked robotic system under time delays and packet losses. Numerical examples and experimental results are presented to demonstrate the performance of the proposed wave-variable-based networked robotic systems.
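The wave-variable transformation underlying this approach can be sketched as follows, using the common convention u = (b·ẋ + F)/√(2b), v = (b·ẋ − F)/√(2b) with wave impedance b > 0. This is the standard textbook transform for a scalar channel, not the paper's wave-variable modulation scheme; the property it shows is that the power ẋ·F equals (u² − v²)/2, which is the basis for the passivity of a delayed wave channel.

```python
import math


def to_waves(velocity, force, b):
    """Encode (velocity, force) into wave variables (u, v); b is the wave impedance."""
    s = math.sqrt(2.0 * b)
    return (b * velocity + force) / s, (b * velocity - force) / s


def from_waves(u, v, b):
    """Decode wave variables back into (velocity, force)."""
    s = math.sqrt(2.0 * b)
    return (u + v) / s, (u - v) * s / 2.0
```

For example, with b = 5, velocity 0.3 and force −1.2, the round trip recovers both values and (u² − v²)/2 equals 0.3 · (−1.2) up to rounding.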
13.
ParFUM: a parallel framework for unstructured meshes for scalable dynamic physics applications (total citations: 1; self-citations: 0; citations by others: 1)
Orion S. Lawlor Sayantan Chakravorty Terry L. Wilmarth Nilesh Choudhury Isaac Dooley Gengbin Zheng Laxmikant V. Kalé 《Engineering with Computers》2006,22(3-4):215-235
Unstructured meshes are used in many engineering applications with irregular domains, from elastic deformation problems to crack propagation to fluid flow. Because of their complexity and dynamic behavior, the development of scalable parallel software for these applications is challenging. The Charm++ Parallel Framework for Unstructured Meshes allows one to write parallel programs that operate on unstructured meshes with only minimal knowledge of parallel computing, while making it possible to achieve excellent scalability even for complex applications. Charm++’s message-driven model enables computation/communication overlap, while its run-time load balancing capabilities make it possible to react to the changes in computational load that occur in dynamic physics applications. The framework is highly flexible and has been enhanced with numerous capabilities for the manipulation of unstructured meshes, such as parallel mesh adaptivity and collision detection.
14.
A fast, stylized framework for creating illustrations of a given story is presented. The framework automatically searches for suitable online images based on the keywords of the story, provides users with tools to refine the search results, and helps users create a picture for every scene without having to keep each character consistent themselves. Its user-friendly interface offers rich interaction, which helps users express their ideas flexibly. Experimental results indicate that the framework allows users without any art background to produce personalized picture series for a given story. The fast process, effective interaction and delicately generated pictures can make a story more impressive.
15.
Seung Woo Son Saba Sehrish Wei-keng Liao Ron Oldfield Alok Choudhary 《The Journal of supercomputing》2017,73(5):2069-2097
In petascale systems with a million CPU cores, scalable and consistent I/O performance is becoming increasingly difficult to sustain, mainly because of I/O variability. I/O variability is caused by concurrently running processes or jobs competing for I/O, or by a RAID rebuild when a disk drive fails. We present a mechanism that stripes across a selected subset of I/O nodes with the lightest workload at runtime to achieve the highest I/O bandwidth available in the system. Specifically, we propose a probing mechanism that enables application-level dynamic file striping to mitigate I/O variability. We implement the proposed mechanism in the high-level I/O library that enables memory-to-file data layout transformation and allows transparent file partitioning using subfiling. Subfiling is a technique that partitions data into a set of smaller files and manages access to them, so that users can treat the data as a single, normal file. We demonstrate that our bandwidth probing mechanism can successfully identify temporarily slower I/O nodes without noticeable runtime overhead. Experimental results on NERSC’s systems also show that our approach effectively isolates I/O variability on shared systems and improves overall collective I/O performance with less variation.
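A minimal sketch of the selection and striping steps, assuming the probe has already produced a bandwidth estimate per I/O node (the node names and numbers below are hypothetical) and that data are striped round-robin over the chosen nodes in fixed-size stripes. It does not model subfiling or the library's actual interface.

```python
def choose_targets(bandwidths, k):
    """Pick the k I/O nodes with the highest probed bandwidth."""
    ranked = sorted(bandwidths, key=lambda node: bandwidths[node], reverse=True)
    return ranked[:k]


def stripe_map(offset, stripe_size, targets):
    """Map a logical file offset to (target node, offset within that node's data)."""
    stripe_index = offset // stripe_size
    node = targets[stripe_index % len(targets)]          # round-robin placement
    local_stripe = stripe_index // len(targets)
    return node, local_stripe * stripe_size + offset % stripe_size
```

Slow nodes identified by the probe are simply left out of `targets`, so no stripe is placed on them for this write.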
16.
This note puts forward a parametrization of all stabilizing two-degrees-of-freedom controllers for (possibly unstable) processes with dead-time. The proposed parametrization is based on a doubly coprime factorization of the plant and takes the form of a generalized Smith predictor (dead-time compensator) feedback part and a finite-dimensional feedforward part (prefilter). Some alternative dead-time compensation schemes and disturbance attenuation limitations are also discussed.
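For orientation, here is a minimal sketch of the classical Smith predictor for a stable first-order discrete-time plant with known dead time, assuming a perfect plant model. It is not the generalized parametrization for unstable processes proposed in the note. The plant parameters and PI gains are hypothetical; kp = 9·ki cancels the plant pole at 0.9, so the undelayed loop has a single pole at 0.5.

```python
from collections import deque


def simulate_smith(a=0.9, b=0.1, delay=5, kp=4.5, ki=0.5, r=1.0, steps=80):
    """Plant: y[k+1] = a*y[k] + b*u[k-delay]; returns the plant output sequence."""
    y = 0.0            # plant output
    ym0 = 0.0          # model output without dead time
    ymd = 0.0          # model output with dead time
    u_buf = deque([0.0] * delay)   # inputs still "in transit" through the dead time
    u_prev = e_prev = 0.0
    outputs = []
    for _ in range(steps):
        # Smith predictor: replace the delayed measurement by the undelayed model.
        feedback = y + ym0 - ymd
        e = r - feedback
        u = u_prev + kp * (e - e_prev) + ki * e   # velocity-form PI controller
        u_prev, e_prev = u, e
        u_buf.append(u)
        u_delayed = u_buf.popleft()               # u[k - delay]
        y = a * y + b * u_delayed
        ym0 = a * ym0 + b * u
        ymd = a * ymd + b * u_delayed
        outputs.append(y)
    return outputs
```

With a perfect model the controller effectively sees the delay-free plant, so the output is the delay-free closed-loop response shifted by the dead time: zero for the first five samples, then 0.5, 0.75, ... approaching the setpoint.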
17.
In this paper, a simulation framework that enables distributed numerical computing in multi-core shared-memory environments is presented. Using multiple threads allows a single memory image to be shared concurrently across cores but potentially introduces race conditions. Race conditions can be avoided by ensuring that each core operates on an isolated memory block. This is usually achieved by running a different operating system process on each core, for example multiple MPI processes. However, we show that in many computational physics problems, memory isolation can also be enforced within a single process by leveraging spatial sub-division of the physical domain. A new spatial sub-division algorithm is presented that ensures threads operate on different memory blocks, allowing in-place updates of state with no message passing or creation of local variables during time stepping. Additionally, the framework controls task distribution dynamically, ensuring event-based load balancing. Results from a fluid mechanics analysis using Smoothed Particle Hydrodynamics (SPH) are presented, demonstrating performance that scales linearly with the number of cores.
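A sketch of one way spatial sub-division can keep threads on disjoint memory: split a 1D domain into blocks and update them in two phases (even-indexed blocks, then odd-indexed blocks), so that no cell written in a phase is read by another thread in the same phase. This two-phase scheme is an assumption for illustration, not the paper's algorithm, and in CPython the GIL prevents real speedup; the point is only that in-place updates become race-free and deterministic.

```python
from concurrent.futures import ThreadPoolExecutor


def update_block(data, start, end):
    """In-place relaxation: each cell moves halfway toward its neighbours' mean."""
    n = len(data)
    for i in range(start, end):
        left = data[i - 1] if i > 0 else data[i]
        right = data[i + 1] if i < n - 1 else data[i]
        data[i] = 0.5 * data[i] + 0.25 * (left + right)


def make_blocks(n, block_size):
    return [(s, min(s + block_size, n)) for s in range(0, n, block_size)]


def step_parallel(data, block_size, workers=4):
    blocks = make_blocks(len(data), block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for phase in (0, 1):
            # Blocks in one phase are never adjacent, so their reads and writes don't overlap.
            batch = blocks[phase::2]
            list(pool.map(lambda blk: update_block(data, blk[0], blk[1]), batch))


def step_sequential(data, block_size):
    for phase in (0, 1):
        for start, end in make_blocks(len(data), block_size)[phase::2]:
            update_block(data, start, end)
```

Since the threaded step produces exactly the same values as the sequential one, no locks or private copies of the state are needed.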
18.
Daniël Reijsbergen Pieter-Tjerk de Boer Werner Scheinhardt Boudewijn Haverkort 《Performance Evaluation》2012,69(7-8):336-355
Probabilistic model checking has recently been used to assess, among other things, dependability measures for a variety of systems. However, the numerical methods employed, such as those supported by the model checking tools PRISM and MRMC, suffer from the state-space explosion problem. The main alternative is statistical model checking, which uses standard Monte Carlo simulation, but this performs poorly when small probabilities need to be estimated. Therefore, we propose a method based on importance sampling to speed up the simulation process in cases where the failure probabilities are small due to the high speed of the system’s repair units. This setting arises naturally in Markovian models of highly dependable systems. We show that our method compares favourably to standard simulation, to existing importance sampling techniques, and to the numerical techniques of PRISM.
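To illustrate why importance sampling helps here, consider a simple birth-death model in which each step is a failure with small probability p and a repair otherwise; we estimate the probability of reaching n failed components before full repair, starting from one failure. The change of measure below (swapping p and 1 − p) is a standard textbook choice assumed for illustration, not necessarily the method proposed in the paper.

```python
import random


def exact_failure_prob(p, n):
    """Gambler's-ruin probability of hitting level n before 0, starting at level 1."""
    r = (1.0 - p) / p
    return (1.0 - r) / (1.0 - r ** n)


def is_failure_prob(p, n, samples, seed=0):
    rng = random.Random(seed)
    q = 1.0 - p
    p_is, q_is = q, p          # change of measure: failures become the likely event
    total = 0.0
    for _ in range(samples):
        level, weight = 1, 1.0
        while 0 < level < n:
            if rng.random() < p_is:
                level += 1
                weight *= p / p_is   # likelihood ratio: original / sampling measure
            else:
                level -= 1
                weight *= q / q_is
        if level == n:
            total += weight
    return total / samples
```

With p = 0.1 and n = 6 the exact value is 8/531440, about 1.5e-5, so plain Monte Carlo with 20,000 runs would most likely observe no failure at all, whereas every importance-sampled path that reaches level n carries the same weight (1/9)^5.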
19.
Multimedia content adaptation strategies are becoming increasingly important for effective video streaming over today's heterogeneous networks. Thus, evaluation frameworks for adaptive video play an important role in the design and deployment of adaptive multimedia streaming systems. This paper describes a novel simulation framework for rate-adaptive video transmission using the Scalable Video Coding standard (H.264/SVC). Our approach uses feedback about the available bandwidth to allow the video source to select the most suitable combination of SVC layers for transmitting a video sequence. The proposed solution has been integrated into the network simulator NS-2 to support realistic network simulations. To demonstrate its usefulness, we perform a simulation study in which a video sequence is transmitted over three network scenarios. The experimental results show that the Adaptive SVC scheme implemented in our framework provides an efficient alternative that helps avoid increasing network congestion in resource-constrained networks. Improvements in video quality, in terms of PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index), are also obtained.
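The layer-selection step can be sketched as a greedy choice over cumulative layer bitrates, assuming hypothetical per-layer rates and a single bandwidth estimate from feedback. The framework's actual selection rule and its NS-2 integration are not shown.

```python
def select_svc_layers(layer_rates_kbps, available_kbps):
    """Return (number of layers to send, total rate), base layer first."""
    total, count = 0, 0
    for rate in layer_rates_kbps:
        if total + rate > available_kbps:
            break   # each enhancement layer depends on the ones below it
        total += rate
        count += 1
    return count, total
```

For hypothetical layers of 256, 256, 512 and 1024 kbps and 1200 kbps of available bandwidth, three layers (1024 kbps) are sent; if even the base layer does not fit, the function returns (0, 0) and the caller must decide whether to send the base layer anyway.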
20.
Muthukumar R.M. Janakiram D. 《Parallel and Distributed Systems, IEEE Transactions on》2006,17(2):148-159
State-of-the-art generational garbage collectors pause all program threads when performing young- and old-generation garbage collection. As the number of program threads increases, the delay due to garbage collection also increases, restricting the scalability of the collector. To improve scalability and reduce pause time, an on-the-fly generational garbage collector called Yama is proposed for multiprocessor systems. It uses on-the-fly deferred reference counting in the young generation and the DLG (Doligez-Leroy-Gonthier) on-the-fly mark-and-sweep garbage collector in the old generation. We have proposed and experimented with two novel variations of on-the-fly deferred reference counting, called Chitragupt1 and Chitragupt2, in the young generation. Yama does not pause all the application threads simultaneously. An adaptive tenuring policy based on object reference count and survival rate is also proposed. Yama has been implemented in the IBM Jikes RVM (Research Virtual Machine). These claims are supported by experimental results for standard benchmark programs. The results show that Yama has an extremely low pause time in both the young and the old generation. The reduced pause time results in better response times for user programs.
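Deferred reference counting, which Yama builds on in the young generation, can be sketched in its classic single-threaded form: only heap-to-heap pointers are counted, objects whose count drops to zero go into a zero count table (ZCT), and at collection time ZCT entries not referenced from the roots (stack, registers) are freed. This is a simplified illustration; it is not Chitragupt1 or Chitragupt2 and is neither on-the-fly nor concurrent. All names are hypothetical.

```python
class Heap:
    def __init__(self):
        self.rc = {}        # object -> number of heap (not stack) references
        self.fields = {}    # object -> list of objects it points to
        self.zct = set()    # zero count table: candidates for freeing
        self.freed = []

    def alloc(self, name):
        self.rc[name] = 0
        self.fields[name] = []
        self.zct.add(name)  # new objects start with no heap references
        return name

    def write_field(self, src, dst):
        # Heap stores are counted; stack/register stores are not (that is the deferral).
        self.fields[src].append(dst)
        self.rc[dst] += 1
        self.zct.discard(dst)

    def remove_field(self, src, dst):
        self.fields[src].remove(dst)
        self._dec(dst)

    def _dec(self, obj):
        self.rc[obj] -= 1
        if self.rc[obj] == 0:
            self.zct.add(obj)

    def collect(self, roots):
        roots = set(roots)
        pending = [o for o in self.zct if o not in roots]
        while pending:
            obj = pending.pop()
            if obj not in self.rc or self.rc[obj] != 0 or obj in roots:
                continue    # already freed, re-referenced, or still on the stack
            self.zct.discard(obj)
            for child in self.fields.pop(obj):
                self._dec(child)
                if self.rc[child] == 0 and child not in roots:
                    pending.append(child)
            del self.rc[obj]
            self.freed.append(obj)
```

Skipping count updates for stack writes is what makes mutator operations cheap; the cost moves to the root scan at collection time.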