Similar Documents
1.
One has a large workload that is “divisible”—its constituent work’s granularity can be adjusted arbitrarily—and one has access to p remote worker computers that can assist in computing the workload. How can one best utilize the workers? Complicating this question is the fact that each worker is subject to interruptions (of known likelihood) that kill all work in progress on it. One wishes to orchestrate sharing the workload with the workers in a way that maximizes the expected amount of work completed. Strategies are presented for achieving this goal, by balancing the desire to checkpoint often—thereby decreasing the amount of vulnerable work at any point—vs. the desire to avoid the context-switching required to checkpoint. Schedules must also temper the desire to replicate work, because such replication diminishes the effective remote workforce. The current study demonstrates the accessibility of strategies that provably maximize the expected amount of work when there is only one worker (the case p=1) and, at least in an asymptotic sense, when there are two workers (the case p=2); but the study strongly suggests the intractability of exact maximization for p≥2 computers, as work replication on multiple workers joins checkpointing as a vehicle for decreasing the impact of work-killing interruptions. We respond to that challenge by developing efficient heuristics that employ both checkpointing and work replication as mechanisms for decreasing the impact of work-killing interruptions. The quality of these heuristics, in expected amount of work completed, is assessed through exhaustive simulations that use both idealized models and actual trace data.
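To make the checkpoint/overhead trade-off concrete, here is a minimal sketch that brute-forces the number of equal-size chunks maximizing the expected completed work on a single worker (the case p=1). It assumes exponentially distributed interruption times and a fixed checkpoint cost, which are illustrative modelling choices, not the paper's actual risk model.

```python
import math

def expected_work(n, W=1.0, c=0.02, lam=1.0):
    """Expected amount of work completed when the workload W is cut into n
    equal chunks, each followed by a checkpoint of cost c, and the worker is
    killed at an exponentially distributed time (survival P(T > t) = e^{-lam*t}).
    A chunk counts only if its checkpoint completes before the interruption."""
    chunk = W / n
    return sum(chunk * math.exp(-lam * i * (chunk + c)) for i in range(1, n + 1))

# The trade-off the abstract describes: more checkpoints protect more work,
# but each one adds overhead c. Brute-force the best chunk count.
best_n = max(range(1, 101), key=expected_work)
print(best_n, round(expected_work(best_n), 4))
```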

2.
We show how the quantum paradigm can be used to speed up unsupervised learning algorithms. More precisely, we explain how it is possible to accelerate learning algorithms by quantizing some of their subroutines. Quantization refers to the process that partially or totally converts a classical algorithm to its quantum counterpart in order to improve performance. In particular, we give quantized versions of clustering via minimum spanning tree, divisive clustering and k-medians that are faster than their classical analogues. We also describe a distributed version of k-medians that allows the participants to save on the global communication cost of the protocol compared to the classical version. Finally, we design quantum algorithms for the construction of a neighbourhood graph, outlier detection as well as smart initialization of the cluster centres.
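For orientation, the sketch below gives a minimal classical k-medians implementation, i.e., the kind of subroutine whose distance evaluations and minimum-finding a quantized version would accelerate. It is an illustrative baseline, not code from the paper.

```python
import random

def k_medians(points, k, iters=20):
    """Classical k-medians over tuples, using Manhattan distance and
    coordinate-wise medians (which minimise total Manhattan distance).
    The paper's quantum version accelerates subroutines of exactly this
    kind; nothing quantum happens here."""
    def dist(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    centres = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, centres[i]))].append(p)
        for i, cl in enumerate(clusters):
            if cl:
                centres[i] = tuple(sorted(c[d] for c in cl)[len(cl) // 2]
                                   for d in range(len(cl[0])))
    return centres

print(k_medians([(1.0, 2.0), (1.5, 1.8), (8.0, 8.0), (8.2, 7.9)], k=2))
```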

3.
This work presents an evolutionary approach, which we call EvoNN, to modify the voting system of the k-nearest neighbours (kNN) rule. Our approach produces a real-valued vector that provides the optimal relative contribution of each of the k nearest neighbours. We compare two versions of our algorithm. The first (EvoNN1) introduces a constraint on the resulting real-valued vector whereby the greatest weight is assigned to the nearest neighbour. The second version (EvoNN2) places no particular constraint on the order of the weights. We compare both versions with classical kNN and four other weighted variants of kNN on 48 datasets from the UCI repository. Results show that EvoNN1 outperforms EvoNN2 and obtains statistically better results than the rest of the compared methods.
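The evolved weight vector enters the classifier as a simple weighted vote. The sketch below shows that voting step only (the evolutionary search that produces the weights is not reproduced); all names and the example data are illustrative.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, w):
    """Weighted kNN vote: w[r] is the evolved contribution of the
    neighbour at rank r (w[0] belongs to the closest neighbour; the
    EvoNN1 constraint would make w non-increasing)."""
    order = np.argsort(np.linalg.norm(X_train - x, axis=1))[:len(w)]
    votes = {}
    for rank, idx in enumerate(order):
        votes[y_train[idx]] = votes.get(y_train[idx], 0.0) + w[rank]
    return max(votes, key=votes.get)

X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]])
y = np.array([0, 0, 1])
print(weighted_knn_predict(X, y, np.array([0.05, 0.0]), w=[0.5, 0.3, 0.2]))  # -> 0
```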

4.
This paper proposes new parallel versions of some estimation of distribution algorithms (EDAs). The focus is on preserving the behavior of sequential EDAs that use probabilistic graphical models (Bayesian networks and Gaussian networks), implementing a master–slave workload distribution for the most computationally intensive phases: learning the probability distribution and, in one algorithm, “sampling and evaluation of individuals.” In discrete domains, we explain the parallelization of the EBNA_BIC and EBNA_PC algorithms, while in continuous domains, the selected algorithms are EGNA_BIC and EGNA_EE. Implementation has been done using two APIs: the message passing interface (MPI) and POSIX threads. The parallel programs can run efficiently on a range of target parallel computers. Experiments to evaluate the programs in terms of speedup and efficiency have been carried out on a cluster of multiprocessors. Compared with the sequential versions, they show reasonable gains in terms of speed.
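The following sketch illustrates the master–slave split for the evaluation phase. It uses Python's multiprocessing for brevity, whereas the paper's implementations use MPI and POSIX threads, and the fitness function is a stand-in.

```python
from multiprocessing import Pool
import random

def evaluate(individual):
    """Stand-in fitness function: the computationally intensive phase
    that the master farms out to the slaves."""
    return sum(x * x for x in individual)

def master_slave_selection(population, workers=4):
    """The master keeps the population and the probabilistic model;
    slaves evaluate individuals in parallel; the fittest half is
    returned for model learning. Schematic only."""
    with Pool(workers) as pool:
        fitness = pool.map(evaluate, population)
    ranked = sorted(zip(fitness, population), key=lambda fp: fp[0])
    return [ind for _, ind in ranked[:len(population) // 2]]

if __name__ == "__main__":
    pop = [[random.uniform(-5, 5) for _ in range(10)] for _ in range(40)]
    print(len(master_slave_selection(pop)))  # 20 selected individuals
```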

5.
Typical concurrency control protocols for atomic actions, such as two-phase locking, perform poorly for long read-only actions. We present four new concurrency control protocols that eliminate all interference between read-only actions and update actions, and thus offer significantly improved performance for read-only actions. The protocols work by maintaining multiple versions of the system state; read-only actions read old versions, while update actions manipulate the most recent version. We focus on the problem of managing the storage required for old versions in a distributed system. One of the protocols uses relatively little space, but has a potentially significant communication cost. The other protocols use more space, but may be cheaper in terms of communication.
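A toy multi-version store in the spirit of these protocols, under the simplest assumption that a read-only action pins the version that was current when it began; the storage-management question the paper actually studies (when old versions may be discarded, and at what communication cost) is omitted.

```python
class MultiVersionStore:
    """Updates install a new numbered version via copy-on-write; a
    read-only action pins the version current at its start and is
    therefore never blocked by, and never blocks, updates."""

    def __init__(self):
        self.versions = [{}]              # versions[i] = state after update i

    def begin_read(self):
        return len(self.versions) - 1     # snapshot id for a read-only action

    def read(self, snapshot, key):
        return self.versions[snapshot].get(key)

    def update(self, key, value):
        new = dict(self.versions[-1])     # copy the most recent version
        new[key] = value
        self.versions.append(new)

store = MultiVersionStore()
store.update("x", 1)
snap = store.begin_read()
store.update("x", 2)                      # concurrent update
print(store.read(snap, "x"))              # the read-only action still sees 1
```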

6.
A self-analysis of the NASA-TLX workload measure
Noyes JM, Bruneau DP. Ergonomics 2007, 50(4): 514-519
Computer use and, more specifically, the administration of tests and materials online continue to proliferate. A number of subjective, self-report workload measures exist, but the National Aeronautics and Space Administration-Task Load Index (NASA-TLX) is probably the best known and most widely used. The aim of this paper is to consider the workload costs associated with the computer-based and paper versions of the NASA-TLX measure. It was found that there is a significant difference between the workload scores for the two media, with the computer version of the NASA-TLX incurring more workload. This has implications for the practical use of the NASA-TLX as well as for other computer-based workload measures.
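For context, the standard NASA-TLX overall score combines six subscale ratings (0-100) with weights obtained from 15 pairwise comparisons. The sketch below shows that computation with made-up example numbers, not data from the study.

```python
# Six subscale ratings (0-100); the values here are invented examples.
ratings = {"mental": 70, "physical": 20, "temporal": 55,
           "performance": 40, "effort": 65, "frustration": 30}
# Weights = how often each dimension wins its 15 pairwise comparisons.
weights = {"mental": 4, "physical": 1, "temporal": 3,
           "performance": 2, "effort": 3, "frustration": 2}
assert sum(weights.values()) == 15
overall = sum(ratings[d] * weights[d] for d in ratings) / 15
print(f"Overall weighted workload: {overall:.1f}")  # 0-100 scale
```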

7.
In the era of Big Data, huge amounts of structured and unstructured data are produced daily by a myriad of ubiquitous sources. Big Data is difficult to work with and requires massively parallel software running on a large number of computers. MapReduce is a recent programming model that simplifies writing distributed applications that handle Big Data. For MapReduce to work, it has to divide the workload among the computers in a network. Consequently, the performance of MapReduce depends strongly on how evenly it distributes this workload. This can be a challenge, especially in the presence of data skew. In MapReduce, workload distribution depends on the algorithm that partitions the data. One way to avoid the problems inherent in data skew is to use data sampling. How evenly the partitioner distributes the data depends on how large and representative the sample is and on how well the samples are analyzed by the partitioning mechanism. This paper proposes a partitioning algorithm that improves load balancing and reduces memory consumption, via an improved sampling algorithm and partitioner. To evaluate the proposed algorithm, its performance was compared against the state-of-the-art partitioning mechanism employed by TeraSort. Experiments show that the proposed algorithm is faster, more memory-efficient, and more accurate than the current implementation.
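For orientation, here is a baseline sample-based range partitioner in the TeraSort style; the paper's improved sampler and partitioner are not reproduced, and all constants below are illustrative.

```python
import random

def build_range_partitioner(keys, num_reducers, sample_size=1000):
    """TeraSort-style range partitioning: sort a random sample of the keys
    and pick num_reducers - 1 split points, so each reducer receives a
    roughly equal share even when the key distribution is skewed."""
    sample = sorted(random.sample(keys, min(sample_size, len(keys))))
    step = len(sample) // num_reducers
    splits = [sample[(i + 1) * step] for i in range(num_reducers - 1)]

    def partition(key):
        for i, s in enumerate(splits):    # binary search in a real system
            if key < s:
                return i
        return num_reducers - 1
    return partition

keys = [random.randint(0, 10**6) for _ in range(100_000)]
partition = build_range_partitioner(keys, num_reducers=8)
print(partition(123_456))                 # reducer index for this key
```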

8.
Two versions of the HARTS operating system, which is based on Software Components Group's pSOS uniprocessor kernel, are presented. In one version, pSOS services are enhanced to provide interprocessor communication and a distributed naming service. In the second version, real-time fault-tolerant communication, including reliable broadcasting, clock synchronization, and group communication, is added to the HARTS operating system. Three tools to evaluate the performance and dependability of HARTS hardware and software are described: a synthetic-workload generator, a monitor, and a fault injector. The generator produces a synthetic workload, the monitor collects performance data, and the fault injector simulates faulty behavior for further study. Together these tools create a facility that lets the user perform a wide range of experiments. The tools are independent, so they are equally effective separately or together, depending on the requirements.

9.
Computer Networks 2007, 51(11): 3069-3089
As a mechanism to efficiently support group communications, multicasting faces a serious state-scalability problem when there are large numbers of groups in the network. Recently, a novel solution called Aggregated Multicast has been proposed, in which multiple groups can share one delivery tree. A key problem in Aggregated Multicast is group-to-tree matching (i.e., assigning groups to proper trees). In this paper, we formally define this problem and formulate two versions of it: static and dynamic. We analyze the static version and prove that it is NP-complete. To tackle this hard problem, we propose three algorithms: one optimal (using integer linear programming, ILP), one near-optimal (using a greedy method), and one pseudo-dynamic algorithm. For the dynamic version, we present a generic dynamic on-line algorithm. A simulation study has been conducted to evaluate the performance of the algorithms. Our results show that: (1) for the static problem, the greedy algorithm is a feasible solution and its performance is very close to that of the optimal ILP solution, while the pseudo-dynamic algorithm is a good heuristic for many cases where the greedy approach does not work well; (2) for the dynamic problem, the generic dynamic on-line algorithm is a very practical solution with promising performance and reasonable computation requirements.
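A sketch of greedy group-to-tree matching in this spirit: assign each group to the covering tree with the least bandwidth overhead, and fall back to a dedicated tree otherwise. The tree representation, overhead metric, and threshold are simplifying assumptions made here, not the paper's exact formulation.

```python
def greedy_group_to_tree(groups, trees, overhead_threshold=0.5):
    """Assign each group to the existing shared tree that covers all its
    members with the least overhead (fraction of tree nodes that are not
    group members); build a dedicated 'native' tree when every candidate
    exceeds the threshold."""
    assignment = {}
    for g, members in groups.items():
        best, best_cost = None, None
        for t, nodes in trees.items():
            if members <= nodes:                   # the tree covers the group
                cost = len(nodes - members) / len(nodes)
                if best is None or cost < best_cost:
                    best, best_cost = t, cost
        if best is not None and best_cost <= overhead_threshold:
            assignment[g] = best                   # share an aggregated tree
        else:
            trees[f"native-{g}"] = set(members)    # dedicated per-group tree
            assignment[g] = f"native-{g}"
    return assignment

trees = {"t1": {1, 2, 3, 4, 5}}
groups = {"g1": {1, 2, 3}, "g2": {4, 5}, "g3": {1, 6}}
print(greedy_group_to_tree(groups, trees))
# {'g1': 't1', 'g2': 'native-g2', 'g3': 'native-g3'}
```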

10.
The Internet connects millions of computers worldwide and provides a new potential working environment for remote-controlled telerobotic systems. The main limitation of using the Internet in this application is random delays between communicating nodes, which can cause disturbances in human–machine interaction and affect telepresence experiences. This is particularly important in systems integrating virtual reality technology to present interfaces. Telepresence, the sense of presence in a remote environment, is hypothesized to be positively related to teleoperation task performance. This research evaluated the effect of constant and random network (communication) delays on remote-controlled telerover performance, operator workload, and telepresence experiences. The research also assessed the effect of using a system gain-adaptation algorithm to offset the negative impact of communication delays on the various response measures. It was expected that with gain adaptation, system stability, performance, and user telepresence experiences would improve, with a corresponding decrease in workload. Results indicated that gain adaptation had a significant effect on the performance measures. The study demonstrated that gain adaptation can reduce deterioration in telepresence experiences and improve user performance in teleoperated and telerobotic control. Hum Factors Man 15: 259–274, 2005.
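A minimal sketch of delay-based gain adaptation for teleoperation: the command gain is scaled down as the measured delay grows beyond a nominal value. The scaling law and all constants are assumptions for illustration, not the study's actual algorithm.

```python
def adapted_gain(base_gain, delay_ms, nominal_ms=100.0, min_gain=0.1):
    """Scale the command gain down as the measured round-trip delay grows
    beyond a nominal value, so the telerover responds more conservatively
    when operator feedback is stale."""
    scale = min(1.0, nominal_ms / max(delay_ms, nominal_ms))
    return max(min_gain, base_gain * scale)

stick = 0.8                                   # operator joystick deflection
velocity = adapted_gain(base_gain=1.0, delay_ms=400) * stick
print(velocity)                               # 0.25 * 0.8 = 0.2
```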

11.
In this work, we present a novel and efficient information-processing scheme, multiparty-controlled joint remote state preparation (MCJRSP), to transmit quantum information from many senders to one distant receiver via the control of many agents in a network. We first put forward a scheme for MCJRSP of an arbitrary single-particle state via Greenberger–Horne–Zeilinger entangled states, and then extend it to an arbitrary two-particle state. Notably, unlike conventional joint remote state preparation, the desired states cannot be recovered unless all of the agents collaborate. Besides, both the success probability and the classical information cost are worked out, the relation between success probability and the employed entanglement is revealed, the case of many-particle states is generalized briefly, and the experimental feasibility of our schemes is analysed within an all-optical framework. We argue that our proposal may be of importance to long-distance communication in prospective quantum networks.

12.
This paper presents resource-management techniques for allocating communication and computational resources in a distributed stream-processing platform. The platform is designed to exploit the synergy of two classes of network connections: dedicated and opportunistic. Our previous studies have demonstrated the benefits of such bi-modal resource organization, which combines small pools of dedicated computers with a very large pool of opportunistic computing capacity from idle computers to serve high-throughput computing applications. This paper extends the idea of bi-modal resource organization to the management of communication resources. Since distributed stream-processing applications demand large volumes of data transmission between processing sites at a consistent rate, adequate control over the network resources is important to ensure a steady flow of processing. The system model used in this paper is a platform in which stream-processing servers at distributed sites are interconnected by a combination of dedicated and opportunistic communication links. Two pertinent resource-allocation problems are analyzed in detail and solved using decentralized algorithms. One is the mapping of the processing and communication tasks of the stream-processing workload onto the processing and communication resources of the platform. The other is the dynamic re-allocation of communication links due to variations in the capacity of the opportunistic links. The overall optimization goal of the allocations is higher task throughput and better utilization of the expensive dedicated links, without deviating much from the timely completion of tasks. The algorithms are evaluated through extensive simulation with a model based on realistic observations. The results demonstrate that the algorithms are able to exploit the synergy of bi-modal communication links towards achieving the optimization goals.

13.
14.
Atomistic simulations of thin film deposition, based on the lattice Monte Carlo method, provide insights into microstructure evolution at the atomic level. However, large-scale atomistic simulation is limited on a single computer due to memory and speed constraints. Parallel computation, although promising in both respects, has not been widely applied in these simulations because of the intimidating communication overhead. The key issue in achieving optimal performance is therefore to reduce communication overhead among processors. In this paper, we propose a new parallel algorithm for the simulation of large-scale thin film deposition incorporating two optimization strategies: (1) domain decomposition with sub-domain overlapping and (2) asynchronous communication. The algorithm was implemented both on message-passing-processor systems (MPP) and on cluster computers. We found that both architectures are suitable for parallel Monte Carlo simulation of thin film deposition, in either a distributed-memory mode or a shared-memory mode with message-passing libraries.
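The first strategy can be sketched as follows for a 1-D lattice: each processor owns a sub-domain extended by an overlap region at internal boundaries, so events near a boundary can be applied locally and reconciled later rather than synchronously communicated. Sizes and the reconciliation step itself are illustrative.

```python
def decompose_with_overlap(lattice_size, num_procs, overlap):
    """Split a 1-D lattice index range into num_procs sub-domains that
    extend `overlap` sites past each internal boundary, so deposition
    events near a boundary can be applied locally and reconciled later
    instead of requiring synchronous communication."""
    base = lattice_size // num_procs
    domains = []
    for p in range(num_procs):
        lo = max(0, p * base - overlap)
        hi = min(lattice_size, (p + 1) * base + overlap)
        domains.append((lo, hi))
    return domains

print(decompose_with_overlap(1000, 4, overlap=8))
# [(0, 258), (242, 508), (492, 758), (742, 1000)]
```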

15.
Graphics processing units (GPUs) are being increasingly embraced by the high-performance computing community as an effective way to reduce execution time by accelerating parts of their applications. Remote CUDA (rCUDA) was recently introduced as a software solution to address the high acquisition costs and energy consumption of GPUs that constrain further adoption of this technology. Specifically, rCUDA is a middleware that allows a reduced number of GPUs to be transparently shared among the nodes in a cluster. Although the initial prototype versions of rCUDA demonstrated its functionality, they also revealed concerns with respect to usability, performance, and support for new CUDA features. In response, in this paper we present a new rCUDA version that (1) improves usability by including a new component that allows an automatic transformation of any CUDA source code so that it conforms to the needs of the rCUDA framework, (2) consistently features low overhead when using remote GPUs thanks to an improved new communication architecture, and (3) supports multithreaded applications and CUDA libraries. As a result, for any CUDA-compatible program, rCUDA now allows the use of remote GPUs within a cluster with low overhead, so that a single application running in one node can use all GPUs available across the cluster, thereby extending the single-node capability of CUDA.

16.
This paper describes the ARS library package, which supports two implementation versions of an object-based system: a shared-variable version and a message-passing version. The two versions have the same object structure and synchronisation but differ in their process structure and inter-process communication models. Thus, the mechanisms related to the uniform features are common to the two versions, while the process-multiplexing mechanisms differ. As a consequence, the performance characteristics of the two versions related to the uniform features are similar, while those related to process multiplexing differ significantly. We present an overview of the computational model supported by the ARS package and the internal structure of the package, and compare overheads in the two versions of an object-based system supported by the package.

17.
Information and Computation 2007, 205(10): 1491-1525
We develop the semantic theory of a foundational language for modelling applications over global computers whose interconnection structure can be explicitly manipulated. Together with process distribution, process mobility and remote asynchronous communication through distributed data repositories, the language has primitives for explicitly modelling inter-node connections and for dynamically activating and deactivating them. For the proposed language, we define natural notions of extensional observations and study their closure under operational reductions and/or language contexts to obtain barbed congruence and may-testing equivalence. We then focus on barbed congruence and provide an alternative characterisation in terms of a labelled bisimulation. To test the practical usability of the semantic theory, we model a system of communicating mobile devices and use the introduced proof techniques to verify one of its key properties.

18.
A key challenge when scheduling computations over the Internet is temporal unpredictability: remote “workers” arrive and depart at unpredictable times and often provide unpredictable computational resources, and the time for communication over the Internet is impossible to predict accurately. In response, earlier research has developed the underpinnings of a theory of how to schedule computations having intertask dependencies in a way that renders tasks eligible for execution at the maximum possible rate. Simulation studies suggest that such scheduling: (a) utilizes resource providers’ computational resources well, by enhancing the likelihood of having work to allocate to an available client; (b) lessens the likelihood of a computation’s stalling for lack of tasks that are eligible for execution. The applicability of the current version of the theory is limited by its demands on the structure of the dag that models the computation being scheduled, namely that the dag be decomposable into connected bipartite “building-block” dags. The current paper extends the theory by developing the Sweep Algorithm, which takes a significant step toward removing this restriction. The resulting augmented suite of scheduling algorithms allows one to craft optimal schedules for a large range of dags that the earlier framework could not handle. Most of the newly optimally scheduled dags presented here are artificial but “close” in structure to dags that arise in real computations; one of the new dags is a component of a large dag that arises in a functional Magnetic Resonance Imaging application.

19.
We address generalized versions of the Huffman and Alphabetic Tree Problem in which the cost caused by each individual leaf i, instead of being linear, depends on its depth in the tree via an arbitrary function. The objective is to minimize either the total cost or the maximum cost among all leaves. We review and extend the known results in this direction and devise a number of new algorithms and hardness proofs. It turns out that the dynamic programming approach for the Alphabetic Tree Problem can be extended to arbitrary cost functions, resulting in an optimal algorithm running in time O(n⁴) and space O(n³). We identify classes of cost functions for which the well-known trick of reducing the runtime by a factor of n via a “monotonicity” property can be applied. For the generalized Huffman Tree Problem we show that even the k-ary version can be solved by a generalized version of the Coin Collector Algorithm of Larmore and Hirschberg (Proc. SODA ’90, pp. 310–318, 1990) when the cost functions are nondecreasing and convex. Furthermore, we give an O(n² log n) algorithm for the worst-case minimization variants of both the Huffman and Alphabetic Tree Problem with nondecreasing cost functions. Investigating the limits of computational tractability, we show that the Huffman Tree Problem in its full generality is inapproximable unless P = NP, no matter whether the objective function is the sum of leaf costs or their maximum. The alphabetic version becomes NP-hard when the leaf costs are interdependent.
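To make the dynamic program concrete, here is a direct memoized sketch of the O(n⁴)-time, O(n³)-space recursion for the sum objective: an internal split pushes both subtrees one level deeper, and each leaf pays its depth-dependent cost. The cost table and the small example are illustrative, and none of the monotonicity speed-ups are applied.

```python
from functools import lru_cache

def alphabetic_tree_cost(costs):
    """costs[i][d] = arbitrary cost of leaf i at depth d (defined for all
    depths up to n - 1). C(i, j, d) is the cheapest alphabetic tree over
    leaves i..j whose root sits at depth d; splitting at k pushes both
    subtrees one level deeper. O(n^3) states, O(n) choices per state."""
    n = len(costs)

    @lru_cache(maxsize=None)
    def C(i, j, d):
        if i == j:
            return costs[i][d]
        return min(C(i, k, d + 1) + C(k + 1, j, d + 1) for k in range(i, j))

    return C(0, n - 1, 0)

# Three leaves whose cost grows linearly with depth at different rates:
costs = [[(leaf + 1) * depth for depth in range(3)] for leaf in range(3)]
print(alphabetic_tree_cost(costs))  # 9: tree ((leaf0, leaf1), leaf2)
```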
