Similar literature
 20 similar documents found (search time: 46 ms)
1.
2.
3.
A virtualized system generally suffers from low I/O performance, mainly caused by its inherent abstraction overhead and frequent CPU transitions between the guest and hypervisor modes. Recent research on polling-based I/O virtualization has partly solved the problem, but excessive polling trades intensive CPU usage for higher performance. This article presents a power-efficient and high-performance block I/O framework for a virtual machine, which can be used even with a limited number of CPU cores in mobile or embedded systems. Our framework monitors system status and dynamically switches the I/O processing mode between the exit and polling modes, depending on the current volume of I/O requests and the CPU utilization. It also dynamically controls the polling interval to reduce redundant polling. The highly dynamic nature of our framework leads to improvements in I/O performance with lower CPU usage as well. Our experiments showed that our framework outperformed the existing exit-based mechanisms with 10.8% higher I/O throughput, while keeping CPU usage similar (an increase of only 3.1%). In comparison to systems based solely on the polling mechanism, ours reduced the CPU usage roughly down to 10.0% with no or negligible performance loss.
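
The mode switching and interval control described in this abstract could be sketched roughly as follows. This is only an illustration of the idea, not the authors' implementation: the class name `ModeController`, the threshold values, and the halve/double back-off rule are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class ModeController:
    """Illustrative exit/polling mode selector (all parameters are hypothetical)."""

    io_threshold: int = 32        # assumed: pending requests needed to justify polling
    cpu_threshold: float = 0.8    # assumed: CPU utilization above which polling is avoided
    min_interval_us: int = 1      # assumed bounds for the polling interval
    max_interval_us: int = 64
    interval_us: int = 1

    def choose_mode(self, pending_requests: int, cpu_utilization: float) -> str:
        """Poll only when I/O load is high and CPU headroom remains."""
        if pending_requests >= self.io_threshold and cpu_utilization < self.cpu_threshold:
            return "polling"
        return "exit"

    def update_interval(self, found_work: bool) -> int:
        """Halve the interval after a productive poll, double it after an empty one."""
        if found_work:
            self.interval_us = max(self.min_interval_us, self.interval_us // 2)
        else:
            self.interval_us = min(self.max_interval_us, self.interval_us * 2)
        return self.interval_us
```

Backing off after empty polls is one simple way to cut redundant polling; the abstract does not say which rule the framework actually uses.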

4.
Decision diagrams, such as binary decision diagrams, multi-terminal binary decision diagrams and multi-valued decision diagrams, play an important role in various fields. They are especially useful to represent the characteristic function of sets of states and transitions in symbolic model checking. Most implementations of decision diagrams do not parallelize the decision diagram operations. As performance gains now mostly come from parallel processing, an ongoing challenge is to develop data structures and algorithms for modern multi-core architectures. The decision diagram package Sylvan provides a contribution by implementing parallelized decision diagram operations, thus allowing sequential algorithms that use decision diagrams to exploit the power of multi-core machines. This paper discusses the design and implementation of Sylvan, especially an improvement to the lock-free unique table that uses bit arrays, the concurrent operation cache and the implementation of parallel garbage collection. We extend Sylvan with multi-terminal binary decision diagrams for integers, real numbers and rational numbers. This extension also allows for custom MTBDD leaves and operations, and we provide an example implementation of GMP rational numbers. Furthermore, we show how the provided framework can be integrated in existing tools to provide out-of-the-box parallel BDD algorithms, as well as support for the parallelization of higher-level algorithms. As a case study, we parallelize on-the-fly symbolic reachability in the model checking toolset LTSmin. We experimentally demonstrate that the parallelization of symbolic model checking for explicit-state modeling languages, as supported by LTSmin, scales well. We also show that improvements in the design of the unique table result in faster execution of on-the-fly symbolic reachability.
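
For readers unfamiliar with the term, a unique table is the structure that guarantees each decision diagram node is stored only once. The sketch below is a plain sequential, dictionary-based version for illustration only. It is not Sylvan's lock-free, bit-array design, and the class and method names are hypothetical.

```python
class UniqueTable:
    """Sequential hash-consing table for BDD nodes (illustrative, not lock-free)."""

    def __init__(self):
        self._index = {}             # (var, low, high) -> node id
        self._nodes = [None, None]   # ids 0 and 1 are reserved for the terminals False/True

    def make_node(self, var, low, high):
        """Return the id of node (var, low, high), creating it only if it is new."""
        if low == high:
            # Reduction rule: a test whose two branches agree is redundant.
            return low
        key = (var, low, high)
        node = self._index.get(key)
        if node is None:
            node = len(self._nodes)
            self._nodes.append(key)
            self._index[key] = node
        return node
```

Because equal nodes always receive the same id, checking two diagrams for equality reduces to comparing two integers. A parallel implementation has to preserve this property when several threads insert at once.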

5.
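The rapid development of the cloud computing industry has made virtualization technology critically important to the major cloud service providers. To obtain higher profits, cloud service providers need to exploit device performance as fully as possible while still guaranteeing the user experience. By using information such as the priority and importance of I/O requests, researchers have implemented many methods in the Linux kernel that improve program performance. However, this information is lost on its way from the virtual machine to the host, so this paper proposes an I/O guarantee framework based on service level objectives (SLOs). The paper first analyzes why information such as I/O request priority is lost and identifies the key problems that must be solved to pass this information along. On this basis, the proposed framework extends the Linux kernel, the virtio protocol, and QEMU (the I/O virtualization program of KVM) to deliver the SLO information of virtual machine threads to the host, and implements a scheduler based on that SLO information. Finally, experiments verify the feasibility of the framework: the highest-priority thread reaches a throughput of 260 KB/s, whereas the lowest-priority thread reaches only 10 KB/s, which demonstrates that the SLO information passed down by the framework has a positive effect on scheduling in the host.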

6.
Existing techniques for allocating processors in parallel and distributed systems are not suitable for use in large distributed systems. In such systems, dedicated multiprocessors should exist as an integral component of the distributed system, and idle processors should be available to applications that need them. The Prospero Resource Manager (PRM) is a scalable resource allocation system that supports the allocation of processing resources in large networks and on multiprocessor systems. PRM employs three types of managers (the job manager, the system manager and the node manager) to manage resources in a distributed system. Multiple independent instances of each type of manager exist, reducing bottlenecks. When making scheduling decisions, each manager utilizes the information most closely associated with the entities for which it is responsible.

7.
Paradigm (parallel distributed global memory), a shared-memory multicomputer architecture that is being developed to show that one can build a large-scale machine using high-performance microprocessors, is discussed. The Paradigm architecture allows a parallel application program to execute any of its tasks on any processor in the machine, with all the tasks in a single address space. The focus is on novel design techniques that support scalability. The key performance issues are identified, and some results to date from this work, together with experience from the VMP architecture design on which it is based, are summarized.

8.
This paper presents SCALE, a fully automated transactional clustering framework. The SCALE design highlights three unique features. First, we introduce the concept of Weighted Coverage Density as a categorical similarity measure for efficient clustering of transactional datasets. The concept of weighted coverage density is intuitive, and it allows the weight of each item in a cluster to be changed dynamically according to the occurrences of items. Second, we develop a clustering algorithm based on the weighted coverage density measure that is fast, memory-efficient, and scalable for analyzing transactional data. Third, we introduce two clustering validation metrics and show that these domain-specific clustering evaluation metrics are critical to capture the transactional semantics in clustering analysis. Our SCALE framework combines the weighted coverage density measure for clustering over a sample dataset with self-configuring methods. These self-configuring methods can automatically tune the two important parameters of our clustering algorithms: (1) the candidates of the best number K of clusters; and (2) the application of two domain-specific cluster validity measures to find the best result from the set of clustering results. We have conducted extensive experimental evaluation using both synthetic and real datasets, and our results show that the weighted coverage density approach powered by the SCALE framework can efficiently generate high-quality clustering results in a fully automated manner.

9.
10.
Two possible architectural reference models are described for a network based on the fast packet switching concept. This is known as an asynchronous time division network. The first model is based on the outband principle, since signalling and data information are transmitted over different logical channels. The second model provides a complete integration at all levels of signalling and data information. It is therefore called the inband model. A comparison of both the inband and outband reference models is also given.

11.
Real-time applications of any microprocessor necessitate interfacing to a large variety of peripheral devices. Various interfacing techniques are discussed. Examples are given in which Intel's 8085 is taken as the typical microprocessor. The I/O transfers considered fall into two categories: memory-mapped transfers and I/O-mapped transfers. Both synchronous and asynchronous types are dealt with. Bit masking and interrupt techniques were used for asynchronous memory-mapped I/O transfer. Also included are multiplexed channel transfers and interrupt transfers. The former are treated as a special class of I/O transfer. The latter are useful in applications where it cannot be predicted when data will arrive for transfer to the microprocessor. Unlike other types of transfer, interrupt transfers are initiated by the I/O devices and not by the microprocessor. They are subdivided into software- and hardware-polled transfers. Examples are given of daisy-chain and search ring transfers.

12.
Most of the high-performance routers available commercially these days equip each of their line cards (LCs) with a forwarding engine (FE) to perform table lookups locally. This work introduces and evaluates a technique for speedy packet lookups, called SPAL, in such routers. The BGP routing table under SPAL is fragmented into subsets which constitute forwarding tables for different FEs, so that the number of table entries in each FE drops as the router grows. This reduction in the forwarding table size drastically lowers the amount of SRAM (e.g., L3 data cache) required in each LC to hold the trie constructed according to the prefix matching algorithm. SPAL calls for caching the lookup result of a given IP address at its home LC (denoted by LC_ho, using the LR-cache), such that the result can quickly satisfy the lookup requests for the same address from not only LC_ho but also other LCs. Our trace-driven simulation reveals that SPAL improves mean lookup performance by a factor of at least 2.5 (or 4.3) for a router with three (or 16) LCs, if the LR-cache contains 4K blocks. SPAL achieves this significant improvement while greatly lowering the SRAM requirement (i.e., the L3 data cache plus the LR-cache combined) in each LC and possibly shortening the worst-case lookup time (thanks to fewer memory accesses during longest-prefix matching search) when compared with a current router without partitioning the routing table. It promises good scalability (with respect to routing table growth) and exhibits a small mean lookup time per packet. With its ability to speed up packet lookup performance while lowering overall SRAM substantially, SPAL is ideally applicable to the new generation of scalable high-performance routers.
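
The combination of a per-address home LC and a result cache could be sketched as below. This is a loose illustration under stated assumptions rather than the SPAL design itself: the rule in `home_lc` (top octet modulo the number of LCs), the LRU replacement policy, and the `full_lookup` callback standing in for the longest-prefix match on the home LC's table fragment are all hypothetical.

```python
from collections import OrderedDict
import ipaddress


def home_lc(address: str, num_lcs: int) -> int:
    """Assumed partitioning rule: top 8 address bits modulo the number of line cards."""
    return (int(ipaddress.IPv4Address(address)) >> 24) % num_lcs


class LRCache:
    """Small LRU cache of lookup results (the replacement policy is an assumption)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, address):
        if address not in self._entries:
            return None
        self._entries.move_to_end(address)      # mark as most recently used
        return self._entries[address]

    def put(self, address, next_hop):
        self._entries[address] = next_hop
        self._entries.move_to_end(address)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # evict the least recently used entry


def lookup(address, caches, full_lookup):
    """Resolve an address at its home LC, consulting that LC's result cache first."""
    home = home_lc(address, len(caches))
    next_hop = caches[home].get(address)
    if next_hop is None:
        next_hop = full_lookup(home, address)   # stand-in for longest-prefix matching
        caches[home].put(address, next_hop)
    return home, next_hop
```

Because every LC forwards requests for a given address to the same home LC, a result cached there serves repeated lookups from any LC.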

13.
Real-time systems need time-predictable platforms to allow static analysis of the worst-case execution time (WCET). Standard multi-core processors are optimized for the average case and are hardly analyzable. Within the T-CREST project we propose novel solutions for time-predictable multi-core architectures that are optimized for the WCET instead of the average-case execution time. The resulting time-predictable resources (processors, interconnect, memory arbiter, and memory controller) and tools (compiler, WCET analysis) are designed to ease WCET analysis and to optimize WCET performance. Compared to other processors the WCET performance is outstanding. The T-CREST platform is evaluated with two industrial use cases. An application from the avionic domain demonstrates that tasks executing on different cores do not interfere with respect to their WCET. A signal processing application from the railway domain shows that the WCET can be reduced for computation-intensive tasks when distributing the tasks on several cores and using the network-on-chip for communication. With three cores the WCET is improved by a factor of 1.8 and with 15 cores by a factor of 5.7. The T-CREST project is the result of a collaborative research and development project executed by eight partners from academia and industry. The European Commission funded T-CREST.

14.
This paper investigates the problem of networked control of nonlinear robotic manipulators under time delays and packet loss by using a passivity technique. With the utilisation of wave variables and a passive remote controller, the networked robotic system is demonstrated to be stable with guaranteed position regulation. For the input/output signals of robotic systems, a discretisation block is exploited to convert continuous-time signals to discrete-time signals, and vice versa. Subsequently, we propose a packet management scheme, called wave-variable modulation, to cope with time delays and packet losses in the proposed networked robotic system. Numerical examples and experimental results are presented to demonstrate the performance of the proposed wave-variable-based networked robotic systems.
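
As background, the textbook wave-variable transformation is given below. The abstract does not state which form the paper uses, so the symbols here (velocity $\dot{x}$, force $F$, wave impedance $b > 0$) are assumptions, not the authors' notation.

```latex
\begin{align}
  u &= \frac{b\,\dot{x} + F}{\sqrt{2b}}, &
  v &= \frac{b\,\dot{x} - F}{\sqrt{2b}}, \\
  \dot{x}^{\top} F &= \tfrac{1}{2}\,u^{\top}u - \tfrac{1}{2}\,v^{\top}v .
\end{align}
```

The second line expresses the power flow as the difference between the power carried by the forward and returning waves. This is why a channel that only delays $u$ and $v$ remains passive.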

15.
16.
Unstructured meshes are used in many engineering applications with irregular domains, from elastic deformation problems to crack propagation to fluid flow. Because of their complexity and dynamic behavior, the development of scalable parallel software for these applications is challenging. The Charm++ Parallel Framework for Unstructured Meshes allows one to write parallel programs that operate on unstructured meshes with only minimal knowledge of parallel computing, while making it possible to achieve excellent scalability even for complex applications. Charm++’s message-driven model enables computation/communication overlap, while its run-time load balancing capabilities make it possible to react to the changes in computational load that occur in dynamic physics applications. The framework is highly flexible and has been enhanced with numerous capabilities for the manipulation of unstructured meshes, such as parallel mesh adaptivity and collision detection.

17.
In petascale systems with a million CPU cores, scalable and consistent I/O performance is becoming increasingly difficult to sustain, mainly because of I/O variability. The I/O variability is caused by concurrently running processes/jobs competing for I/O or by a RAID rebuild when a disk drive fails. We present a mechanism that stripes across a selected subset of I/O nodes with the lightest workload at runtime to achieve the highest I/O bandwidth available in the system. In this paper, we propose a probing mechanism to enable application-level dynamic file striping to mitigate I/O variability. We implement the proposed mechanism in a high-level I/O library that enables memory-to-file data layout transformation and allows transparent file partitioning using subfiling. Subfiling is a technique that partitions data into a set of smaller files and manages access to them, so that the data appears to users as a single, normal file. We demonstrate that our bandwidth probing mechanism can successfully identify temporarily slower I/O nodes without noticeable runtime overhead. Experimental results on NERSC’s systems also show that our approach isolates I/O variability effectively on shared systems and improves overall collective I/O performance with less variation.
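
The idea of probing I/O nodes and striping only across the least loaded ones can be sketched as follows. The function names, the node labels and the round-robin block placement are assumptions for illustration. The abstract does not describe how probes are issued or how subfiles are laid out.

```python
def select_io_nodes(probed_bandwidth_mbps, stripe_count):
    """Pick the stripe_count nodes with the highest probed bandwidth (hypothetical rule)."""
    ranked = sorted(probed_bandwidth_mbps, key=probed_bandwidth_mbps.get, reverse=True)
    return ranked[:stripe_count]


def assign_blocks(num_blocks, nodes):
    """Place data blocks round-robin across the selected nodes (assumed layout)."""
    return [nodes[i % len(nodes)] for i in range(num_blocks)]


# Example with made-up probe results: ion1 is temporarily slow and gets skipped.
probes = {"ion0": 900.0, "ion1": 150.0, "ion2": 870.0, "ion3": 400.0}
selected = select_io_nodes(probes, 2)
layout = assign_blocks(5, selected)
```

Probing again before each large write would let the selection follow load changes, at the cost of the probe traffic itself.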

18.
19.
A fast stylized framework for creating illustrations of a given story is presented. The framework automatically searches for suitable online images according to the keywords of the story, provides users with tools to refine the search results, and helps users create a picture for every scene without having to worry about the consistency of each character. Its friendly user interface offers abundant interactions, which help users express their ideas flexibly. Experimental results indicate that this framework allows users without any art background to produce personalized picture series for a specified story. The fast process, effective interaction and delicate generated pictures can make a story more impressive.

20.
This note puts forward a parametrization of all stabilizing two-degrees-of-freedom controllers for (possibly unstable) processes with dead-time. The proposed parametrization is based on a doubly coprime factorization of the plant and takes the form of a generalized Smith predictor (dead-time compensator) feedback part and a finite-dimensional feedforward part (prefilter). Some alternative dead-time compensation schemes and disturbance attenuation limitations are also discussed.
