Found 20 similar documents; search took 15 ms
1.
2.
Kun-Lung Wu Philip S. Yu Jen-Yao Chung James Z. Teng 《Distributed and Parallel Databases》2000,8(3):279-296
This paper studies workfile disk management for concurrent mergesorts in a multiprocessor database system. Specifically, we examine the impacts of workfile disk allocation and data striping on the average mergesort response time. Concurrent mergesorts in a multiprocessor system can create severe I/O interference in which a large number of sequential write requests are continuously issued to the same workfile disk and block other read requests for a long period of time. We examine through detailed simulations a logical partitioning approach to workfile disk management and evaluate the effectiveness of data striping. The results show that (1) without data striping, the best performance is achieved by using the entire set of workfile disks as a single partition if there are abundant workfile disks (or system workload is light); (2) however, if there are limited workfile disks (or system workload is heavy), the workfile disks should be partitioned into multiple groups and the optimal partition size is workload dependent; (3) data striping is beneficial only if the striping unit size is properly chosen; and (4) with a proper striping size, the best performance is generally achieved by using the entire set of disks as a single logical partition.
3.
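This paper studies prefetching optimization algorithms for parallel storage. Based on the dominant access patterns of parallel storage, it proposes modeling both accesses to data blocks within a file and accesses across files, and introduces the E_IS_PPM algorithm for the former and the Last_N_Successor algorithm for the latter. Finally, the two algorithms are combined into an integrated file prefetching algorithm that adaptively determines the prefetch depth according to the degree to which computation and storage access can overlap and the availability of prefetched file pages.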
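The abstract does not spell out how Last_N_Successor works, so the following is only a minimal sketch of the general last-N-successor idea for inter-file prediction: for each file, remember the N files that most recently followed it and treat them as prefetch candidates. The class name, method names, and the choice of N = 2 are assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict, deque


class LastNSuccessorPredictor:
    """Hypothetical predictor: keeps the last n distinct successors of each file."""

    def __init__(self, n=2):
        self.n = n
        # file name -> its most recent successors, newest last
        self.successors = defaultdict(lambda: deque(maxlen=n))
        self.previous = None  # the file accessed just before the current one

    def record(self, name):
        """Record an access to `name` and update the previous file's successors."""
        if self.previous is not None:
            recent = self.successors[self.previous]
            if name in recent:
                recent.remove(name)  # move a repeated successor to the newest slot
            recent.append(name)      # the deque drops the oldest entry when full
        self.previous = name

    def predict(self, name):
        """Return prefetch candidates for `name`, most recent successor first."""
        return list(reversed(self.successors.get(name, ())))


predictor = LastNSuccessorPredictor(n=2)
for accessed in ["a", "b", "a", "c", "a", "b"]:
    predictor.record(accessed)
print(predictor.predict("a"))
```

A larger N trades extra prefetch traffic for a better chance that the file actually requested next is among the candidates.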
4.
5.
6.
7.
8.
9.
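Different cache prefetching strategies suit different access patterns. This paper surveys the state of research on cache prefetching in storage systems. Starting from an analysis of access patterns, it constructs a three-tuple model of access patterns and tests, on a disk array, an adaptive cache prefetching strategy intended for complex environments. The results show that the adaptive strategy achieves the best disk array performance across different environments.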
10.
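A Prefetching Strategy Incorporating Memory Access Miss Queue State  Cited by: 1 (self-citations: 0, citations by others: 1)
As the gap between memory system access speed and processor speed keeps widening, memory access performance has become the bottleneck in improving computer system performance. Based on an analysis of instruction cache and data cache miss behavior, this paper proposes a prefetching strategy that incorporates the state of the memory access miss queue. The strategy preserves the order of instruction and data accesses, which helps in extracting prefetch streams, and it separates instruction stream prefetching from data stream prefetching so that the two do not evict each other. In choosing when to issue a prefetch, it considers not only whether the bus is currently idle but also the state of the miss queue, reducing the impact on the processor's normal memory requests. A stream filtering mechanism improves prefetch accuracy and lowers the memory bandwidth that prefetching requires. The results show that with this strategy the processor's average memory access latency falls by 30% and the IPC of the SPEC CPU2000 programs rises by 8.3% on average.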
11.
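This paper studies parallel data partitioning strategies for XML documents so that XML queries can be processed in parallel. To describe XML data partitioning, it introduces the concept of intermediary nodes. A set of intermediary nodes splits an XML data tree into a root tree and a set of subtrees: the root tree is replicated at every site, while the subtrees are distributed evenly across the sites according to the user query workload. A single XML data tree admits many sets of intermediary nodes, and different sets produce different partitions. The quality of a partition is measured by how evenly the user query workload is balanced across the data partitions. Choosing an optimal set of intermediary nodes is an NP-hard problem, so a set of heuristic optimization rules is designed to address it. Building on this idea, the paper proposes and implements WIN (workload-aware intermediary nodes data placement strategy), an XML data partitioning algorithm based on intermediary nodes. Extensive experiments show that WIN outperforms earlier parallel XML data partitioning strategies.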
12.
13.
This paper presents a PageRank-based prefetching technique for accesses to Web page clusters. The approach uses the link structure of a requested page to determine the most important linked pages and to identify the page(s) to be prefetched. The underlying premise of our approach is that in the case of cluster accesses, the next pages requested by users of the Web server are typically based on the current and previous pages requested. Furthermore, if the requested pages have a lot of links to some important page, that page has a higher probability of being the next one requested. An experimental evaluation of the prefetching mechanism is presented using real server logs. The results show that the PageRank-based scheme does better than random prefetching for clustered accesses, with hit rates of 90% in some cases.
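As a rough illustration of ranking linked pages by importance, the sketch below runs a textbook PageRank power iteration over a small, made-up link graph and picks the highest-ranked outgoing link of the current page as the prefetch candidate. The function names, the damping factor of 0.85, the iteration count, and the example graph are all assumptions; the abstract does not state how the authors compute or combine ranks.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration; `links` maps a page to its outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # a page splits its damped rank evenly among the pages it links to
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # a page without outgoing links spreads its rank over all pages
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


def prefetch_candidates(links, current, k=1):
    """Return the k outgoing links of `current` with the highest PageRank."""
    rank = pagerank(links)
    outgoing = links.get(current, [])
    return sorted(outgoing, key=lambda page: rank[page], reverse=True)[:k]


# Hypothetical cluster: "docs" is linked from three pages, "news" from one.
site = {"home": ["news", "docs"], "news": ["docs"], "about": ["docs"], "docs": ["home"]}
print(prefetch_candidates(site, "home"))
```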
14.
Integrating Web Prefetching and Caching Using Prediction Models  Cited by: 2 (self-citations: 0, citations by others: 2)
Web caching and prefetching have been studied separately in the past. In this paper, we present an integrated architecture for Web object caching and prefetching. Our goal is to design a prefetching system that can work with an existing Web caching system in a seamless manner. In this integrated architecture, a certain amount of caching space is reserved for prefetching. To empower the prefetching engine, a Web-object prediction model is built by mining the frequent paths from past Web log data. We show that the integrated architecture improves performance over Web caching alone, and present our analysis of the tradeoff between the reduced latency and the potential increase in network load.
15.
《Journal of Parallel and Distributed Computing》1999,57(1):64-90
Simulated annealing based standard cell placement for VLSI designs has long been acknowledged as a computation-intensive process, and as a result, several research efforts have been undertaken to parallelize this algorithm. Parallel placement is most needed for very large circuits. Since these circuits do not fit in memory, the traditional approach has been to partition and place individual modules. This causes a degradation in placement quality in terms of area and wirelength. Our algorithm is circuit-partitioned and can handle arbitrarily large circuits on cluster-of-workstations-type parallel machines, such as the Intel Paragon and IBM SP-2. Most previous work in parallel placement has minimized just area and wirelength, but with current deep submicron designs, minimizing wirelength delay is most important. As a result, the algorithm discussed in this paper also supports timing-driven placement for partitioned circuits. The algorithm, called mpiPLACE, has been tested on several large industry benchmarks on a variety of parallel architectures.
16.
C++ was originally designed as a sequential programming language. For development of multithreaded applications, libraries such as Pthreads, Windows threads, and Boost are traditionally used. The C++11 standard introduced some basic concepts and means for developing parallel and concurrent programs, but the direct use of these low-level means requires high programming skill and significant effort. The absence of high-level models of parallelism in C++ is somewhat compensated for by various parallel libraries and directive parallelization tools (such as OpenMP), as well as by language extensions supported by some compilers (Intel CilkPlus). Nevertheless, we still require more advanced means to express parallelism in programs at the level of the language standard and the standard library. In this survey, we consider the means for parallel and concurrent programming that are included in the C++17 standard, as well as some capabilities that are to be expected in future standards.
17.
《Journal of Parallel and Distributed Computing》2000,60(5):585-615
In this paper we propose and evaluate a new data-prefetching technique for cache coherent multiprocessors. Prefetches are issued by a functional unit called a prefetch engine which is controlled by the compiler. We let second-level cache misses generate cache miss traps and start the prefetch engine in a trap handler. The trap handler is fast (40–50 cycles) and does not normally delay the program beyond the memory latency of the miss. Once started, the prefetch engine executes on its own and causes no instruction overhead. The only instruction overhead in our approach is when a trap handler completes after data arrives. The advantages of this technique are (1) it exploits static compiler analysis to determine what to prefetch, which is hard to do in hardware, (2) it uses prefetching with very little instruction overhead, which is a limitation for traditional software-controlled prefetching, and (3) it is accurate in the sense that it generates very little useless traffic while maintaining a high prefetching coverage. We also study whether one could emulate the prefetch engine in software, which would not require any additional hardware beyond support for generating cache miss traps and ordinary prefetch instructions. In this paper we present the functionality of the prefetch engine and a compiler algorithm to control it. We evaluate our technique on six parallel scientific and engineering applications using an optimizing compiler with our algorithm and a simulated multiprocessor. We find that the prefetch engine removes up to 67% of the memory access stall time at an instruction overhead less than 0.42%. The emulated prefetch engine removes in general less stall time at a higher instruction overhead.
18.
Abstract. High performance applications involving large data sets require the efficient and flexible use of multiple disks. In an external memory machine with D parallel, independent disks, only one block can be accessed on each disk in one I/O step. This restriction leads to a load balancing problem that is perhaps the main inhibitor for the efficient adaptation of single-disk external memory algorithms to multiple disks. We solve this problem for arbitrary access patterns by randomly mapping blocks of a logical address space to the disks. We show that a shared buffer of O(D) blocks suffices to support efficient writing. The analysis uses the properties of negative association to handle dependencies between the random variables involved. This approach might be of independent interest for probabilistic analysis in general. If two randomly allocated copies of each block exist, N arbitrary blocks can be read within I/O steps with high probability. The redundancy can be further reduced from 2 to 1+1/r for any integer r without a big impact on reading efficiency. From the point of view of external memory models, these results rehabilitate Aggarwal and Vitter's "single-disk multi-head" model [1] that allows access to D arbitrary blocks in each I/O step. This powerful model can be emulated on the physically more realistic independent disk model [2] with small constant overhead factors. Parallel disk external memory algorithms can therefore be developed in the multi-head model first. The emulation result can then be applied directly or further refinements can be added.
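To make the load balancing problem concrete, here is a small simulation sketch, not the scheduling method analyzed in the paper. It places each block's copies on randomly chosen distinct disks and then reads every block from whichever copy sits on the currently least loaded disk, a simple greedy heuristic swapped in for illustration. Because each disk serves one block per I/O step, the number of steps equals the load of the busiest disk. The function name, parameters, and fixed seed are assumptions.

```python
import random


def io_steps(num_blocks, num_disks, copies=1, seed=0):
    """Estimate the I/O steps needed to read `num_blocks` randomly placed blocks.

    Hypothetical model: every block has `copies` replicas on distinct random
    disks, and a greedy scheduler reads it from the least loaded of them.
    """
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    load = [0] * num_disks     # number of blocks scheduled on each disk
    for _ in range(num_blocks):
        replicas = rng.sample(range(num_disks), copies)
        target = min(replicas, key=lambda disk: load[disk])
        load[target] += 1
    # one block per disk per step, so the busiest disk determines the step count
    return max(load)


for copies in (1, 2):
    print(copies, io_steps(num_blocks=1000, num_disks=10, copies=copies))
```

No placement can finish in fewer than N/D steps rounded up, so the printed values can be compared against 100 for this configuration.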
19.
《Parallel and Distributed Systems, IEEE Transactions on》2008,19(12):1614-1627
In this paper, we discuss and compare several policies to place replicas in tree networks, subject to server capacity and Quality of Service (QoS) constraints. The client requests are known beforehand, while the number and location of the servers are to be determined. The standard approach in the literature is to enforce that all requests of a client be served by the closest server in the tree. We introduce and study two new policies. In the first policy, all requests from a given client are still processed by the same server, but this server can be located anywhere in the path from the client to the root. In the second policy, the requests of a given client can be processed by multiple servers. One major contribution of this paper is to assess the impact of these new policies on the total replication cost. Another important goal is to assess the impact of server heterogeneity. In this paper, we establish several new complexity results, and provide several efficient polynomial heuristics for NP-complete instances of the problem. The absolute performance of these heuristics is assessed by comparison with the optimal solution provided by the formulation of the problem in terms of the solution of an integer linear program.
20.
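Wei Qi 《数字社区&智能家居》2011,(7)
Data distribution is the foundation on which a parallel database system is built, and the quality of the distribution method directly affects the running efficiency of the parallel database. By analyzing and comparing several one-dimensional and multidimensional data distribution methods, this paper describes data distribution strategies for parallel databases and the directions in which they are developing.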