Similar Documents
20 similar documents found (search time: 46 ms)
1.
The MapReduce framework has become the de facto standard for big data processing due to its attractive features and abilities. One is that it automatically parallelizes a job into multiple tasks and transparently handles task execution on a large cluster of commodity machines. The increasing heterogeneity of distributed environments may result in a few straggling tasks, which prolong job completion. Speculative execution has been proposed to mitigate stragglers. However, the existing speculative execution mechanism does not work efficiently, as many speculative tasks are still slower than their original tasks. In this paper, we explore an approach to increase the efficiency of speculative execution and thereby further improve MapReduce performance. We propose the Partial Speculative Execution (PSE) strategy to make speculative tasks start from a checkpoint. By leveraging the checkpoints of original tasks, PSE can eliminate the costs of re-reading, re-copying, and re-computing the processed data. We implement PSE in Hadoop and evaluate its performance in terms of job completion time and the efficiency of speculative execution under several kinds of classical workloads. Experimental results show that, in heterogeneous environments with stragglers, PSE completes jobs 56% faster than with no speculation and 12% faster than with LATE, an improved speculative execution algorithm. In addition, on average PSE can improve the efficiency of speculative execution by 24% compared to LATE.
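To make the checkpoint idea above concrete, the following is a minimal sketch of our own, not the authors' implementation; names such as `Checkpoint` and `run_task` are illustrative. A straggling task periodically records how far it has progressed through its input split, and the speculative copy resumes from that offset with the accumulated partial output instead of reprocessing everything.

```python
# Illustrative sketch of partial speculative execution: the speculative copy of a
# straggling task resumes from the original task's checkpoint instead of offset 0.
from dataclasses import dataclass

@dataclass
class Checkpoint:
    input_offset: int        # number of input records already processed
    partial_output: dict     # intermediate key -> aggregated value so far

def run_task(split, start_offset=0, partial=None, checkpoint_every=1000):
    """Word-count-style task over `split` (a list of (key, count) records),
    starting at `start_offset`; returns (result, last_checkpoint)."""
    result = dict(partial or {})
    checkpoint = Checkpoint(start_offset, dict(result))
    for i in range(start_offset, len(split)):
        key, value = split[i]
        result[key] = result.get(key, 0) + value
        if (i + 1) % checkpoint_every == 0:          # persist progress periodically
            checkpoint = Checkpoint(i + 1, dict(result))
    return result, checkpoint

split = [("w%d" % (i % 50), 1) for i in range(10_000)]
_, ckpt = run_task(split[:6_000])                    # the straggler stalls after 6000 records
# The speculative task resumes from the checkpoint: no re-reading or re-computing.
spec_result, _ = run_task(split, start_offset=ckpt.input_offset,
                          partial=ckpt.partial_output)
reference, _ = run_task(split)                       # what a from-scratch run would produce
assert spec_result == reference
```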

2.
Hadoop MapReduce has evolved to an important industry standard for massive parallel data processing and has become widely adopted for a variety of use-cases. Recent works have shown that indexes can improve the performance of selective MapReduce jobs dramatically. However, one major weakness of existing approaches is high index creation costs. We present HAIL (Hadoop Aggressive Indexing Library), a novel indexing approach for HDFS and Hadoop MapReduce. HAIL creates different clustered indexes over terabytes of data with minimal, often invisible costs, and it dramatically improves runtimes of several classes of MapReduce jobs. HAIL features two different indexing pipelines, static indexing and adaptive indexing. HAIL static indexing efficiently indexes datasets while uploading them to HDFS. Thereby, HAIL leverages the default replication of Hadoop and enhances it with logical replication. This allows HAIL to create multiple clustered indexes for a dataset, e.g., one for each physical replica. Still, in terms of upload time, HAIL matches or even improves over the performance of standard HDFS. Additionally, HAIL adaptive indexing allows for automatic, incremental indexing at job runtime with minimal runtime overhead. For example, HAIL adaptive indexing can completely index a dataset as a byproduct of only four MapReduce jobs while incurring an overhead as low as 11% for the very first of those jobs only. In our experiments, we show that HAIL improves job runtimes by up to 68× over Hadoop. This article is an extended version of the VLDB 2012 paper (Dittrich et al., PVLDB 5(11):1591–1602, 2012).
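As a toy illustration of the logical-replication idea (not HAIL's actual code): each replica of the same data is kept sorted, i.e. clustered, on a different attribute, and a selective job is routed to the replica whose sort order matches its predicate.

```python
# Toy illustration of logical replication with per-replica clustered indexes:
# each "replica" of the same records is sorted on a different attribute, and a
# point query is routed to the replica clustered on the queried attribute.
import bisect
from itertools import takewhile

records = [(3, "c", 30), (1, "a", 10), (2, "b", 20), (5, "e", 50), (4, "d", 40)]

# One replica per attribute, clustered (sorted) on that attribute.
replicas = {attr: sorted(records, key=lambda r: r[attr]) for attr in range(3)}

def lookup(attr, value):
    """Point lookup on the replica clustered on `attr`, via binary search."""
    rep = replicas[attr]
    keys = [r[attr] for r in rep]
    i = bisect.bisect_left(keys, value)
    return list(takewhile(lambda r: r[attr] == value, rep[i:]))

print(lookup(0, 4))    # answered by the replica sorted on attribute 0
print(lookup(1, "b"))  # answered by the replica sorted on attribute 1
```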

3.
State-based formal methods [e.g. Event-B/RODIN (Abrial, Modeling in Event-B: System and Software Engineering, Cambridge University Press, Cambridge, 2010; Abrial et al., Int J Softw Tools Technol Transf (STTT) 12(6):447–466, 2010)] for critical system development and verification are now well established, with track records including tool support and industrial applications. The focus of proof-based verification, in particular, is on safety properties. Liveness properties, which guarantee eventual, or converging, computations of some requirements, are less well dealt with. Inductive reasoning about liveness is not explicitly supported. Liveness proofs are often complex and expensive, requiring a high skill level on the part of the verification engineer. Fairness-based temporal logic approaches have been proposed to address this, e.g. TLA (Lamport, ACM Trans Program Lang Syst 16(3):872–923, 1994) and the approach of Manna and Pnueli (Temporal Verification of Reactive Systems: Safety, Springer, New York, 1995). We contribute to this technology need by proposing a fairness-based method integrating temporal and first-order logic, proof and tools for modelling and verification of safety and liveness properties. The method is based on an integration of Event-B and TLA. Building on our previous work (Méry and Poppleton, Integrated Formal Methods, 10th International Conference, IFM 2013, Turku, Finland, pp 208–222, 2013, doi: 10.1007/978-3-642-38613-8_15), we present the method via three example population protocols (Angluin et al., Distrib Comput 18(4):235–253, 2006). These were proposed as a theoretical framework for computability reasoning about Wireless Sensor Network and Mobile Ad-Hoc Network algorithms. Our examples present typical liveness and convergence requirements. We prove convergence results for the examples by integrated modelling and proof with Event-B/RODIN and TLA. We exploit existing proof rules, and define and apply three new proof rules; soundness proofs are also provided. During the process we observe certain repeating patterns in the proofs. These are easily identified and reused because of the explicit nature of the reasoning.
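For readers unfamiliar with fairness-based liveness rules, the flavour of this machinery is captured by TLA's weak-fairness rule WF1, quoted here from memory after Lamport (1994); the three new proof rules of the paper differ in detail:

\[
\frac{\begin{array}{l}
P \land [\mathcal{N}]_f \;\Rightarrow\; (P' \lor Q') \\
P \land \langle \mathcal{N} \land A \rangle_f \;\Rightarrow\; Q' \\
P \;\Rightarrow\; \mathrm{Enabled}\,\langle A \rangle_f
\end{array}}{\;\Box[\mathcal{N}]_f \land \mathrm{WF}_f(A) \;\Rightarrow\; (P \leadsto Q)\;}
\]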

4.
By multiple realignments of density matrices, we present a new separability criterion for the multipartite quantum state, which includes the computable cross-norm or realignment criterion and the multipartite partial realignment criterion as special cases. An example is used to show that the new criterion can be more efficient than the corresponding multipartite realignment criteria given in Horodecki et al. (Open Syst Inf Dyn 13:103–111, 2006) and Shen et al. (Phys Rev A 92:042332, 2015).
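As background (standard material, not a claim about this paper's new criterion), the bipartite computable cross-norm/realignment (CCNR) criterion that is generalized here states, in one common convention for the realignment map \(R\) with \(\langle i|\langle j|R(\rho)|k\rangle|l\rangle = \langle i|\langle k|\rho|j\rangle|l\rangle\):

\[
\rho \ \text{separable} \;\Longrightarrow\; \|R(\rho)\|_1 \le 1,
\qquad \|X\|_1 = \operatorname{Tr}\sqrt{X^{\dagger}X}.
\]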

5.
6.
The intuitionistic fuzzy set is capable of handling uncertainty together with the counterpart falsity that exists in nature. A proximity measure is a convenient way to demonstrate the impractical significance of membership values in the intuitionistic fuzzy set. However, the related works of Pappis (Fuzzy Sets Syst 39(1):111–115, 1991), Hong and Hwang (Fuzzy Sets Syst 66(3):383–386, 1994), Virant (2000) and Cai (IEEE Trans Fuzzy Syst 9(5):738–750, 2001) did not model the measure in the context of the intuitionistic fuzzy set but in Zadeh's fuzzy set instead. In this paper, we examine this problem and propose new notions of δ-equalities for the intuitionistic fuzzy set and δ-equalities for intuitionistic fuzzy relations. Two fuzzy sets are said to be δ-equal if they are equal to an extent of δ. The applications of δ-equalities are important to fuzzy statistics and fuzzy reasoning. Several characteristics of δ-equalities that were not discussed in the previous works are also investigated. We apply the δ-equalities to the application of medical diagnosis to investigate a patient's diseases from symptoms. The idea is to use δ-equalities for intuitionistic fuzzy relations to find groups of intuitionistic fuzzy sets with certain degrees of equality or similarity and then to combine them. Numerical examples are given to illustrate the validity of the proposed algorithm. Further, we conduct experiments on real medical datasets to check its efficiency and applicability to real-world problems. The results obtained are also better in comparison with 10 existing diagnosis methods, namely those of De et al. (Fuzzy Sets Syst 117:209–213, 2001), Samuel and Balamurugan (Appl Math Sci 6(35):1741–1746, 2012), Szmidt and Kacprzyk (2004), Zhang et al. (Procedia Eng 29:4336–4342, 2012), Hung and Yang (Pattern Recogn Lett 25:1603–1611, 2004), Wang and Xin (Pattern Recogn Lett 26:2063–2069, 2005), Vlachos and Sergiadis (Pattern Recogn Lett 28(2):197–206, 2007), Zhang and Jiang (Inf Sci 178(6):4184–4191, 2008), Maheshwari and Srivastava (J Appl Anal Comput 6(3):772–789, 2016) and the Support Vector Machine (SVM).
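As an illustrative, schematic definition in the spirit of the paper (the authors' exact formulation may differ): with membership function μ and non-membership function ν, two intuitionistic fuzzy sets A and B on a universe X could be called δ-equal when

\[
A =_{(\delta)} B \;\Longleftrightarrow\;
\sup_{x\in X}\bigl|\mu_A(x)-\mu_B(x)\bigr| \le 1-\delta
\;\text{ and }\;
\sup_{x\in X}\bigl|\nu_A(x)-\nu_B(x)\bigr| \le 1-\delta,
\qquad \delta\in[0,1].
\]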

7.
In this paper we investigate the problem of partitioning an input string T in such a way that compressing individually its parts via a base-compressor C gets a compressed output that is shorter than applying C over the entire T at once. This problem was introduced in Buchsbaum et al. (Proc. of 11th ACM-SIAM Symposium on Discrete Algorithms, pp. 175–184, 2000; J. ACM 50(6):825–851, 2003) in the context of table compression, and then further elaborated and extended to strings and trees by Ferragina et al. (J. ACM 52:688–713, 2005; Proc. of 46th IEEE Symposium on Foundations of Computer Science, pp. 184–193, 2005) and Mäkinen and Navarro (Proc. of 14th Symposium on String Processing and Information Retrieval, pp. 229–241, 2007). Unfortunately, the literature offers poor solutions: namely, we know either a cubic-time algorithm for computing the optimal partition based on dynamic programming (Buchsbaum et al. in J. ACM 50(6):825–851, 2003; Giancarlo and Sciortino in Proc. of 14th Symposium on Combinatorial Pattern Matching, pp. 129–143, 2003), or few heuristics that do not guarantee any bounds on the efficacy of their computed partition (Buchsbaum et al. in Proc. of 11th ACM-SIAM Symposium on Discrete Algorithms, pp. 175–184, 2000; J. ACM 50(6):825–851, 2003), or algorithms that are efficient but work in some specific scenarios (such as the Burrows-Wheeler Transform, see e.g. Ferragina et al. in J. ACM 52:688–713, 2005; Mäkinen and Navarro in Proc. of 14th Symposium on String Processing and Information Retrieval, pp. 229–241, 2007) and achieve compression performance that might be worse than the optimal partitioning by an Ω(log n / log log n) factor. Therefore, computing the optimal solution efficiently is still open (Buchsbaum and Giancarlo in Encyclopedia of Algorithms, pp. 939–942, 2008). In this paper we provide the first algorithm which computes in O(n log^{1+ε} n) time and O(n) space a partition of T whose compressed output is guaranteed to be no more than (1+ε) times worse than the optimal one, where ε may be any positive constant fixed in advance. This result holds for any base-compressor C whose compression performance can be bounded in terms of the zero-th or the k-th order empirical entropy of the text T. We will also discuss extensions of our results to BWT-based compressors and to the compression booster of Ferragina et al. (J. ACM 52:688–713, 2005).
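For orientation, the cubic-time dynamic-programming baseline mentioned above can be sketched as follows (illustrative code; `compressed_size` is a stand-in for the base-compressor C, here zlib):

```python
# Baseline cubic-time dynamic program for optimal partitioning: opt[j] is the best
# total compressed size achievable for the prefix T[:j] when every piece is
# compressed individually by the base-compressor C.
import zlib

def compressed_size(piece: bytes) -> int:
    return len(zlib.compress(piece))     # stand-in for the base-compressor C

def optimal_partition(T: bytes):
    n = len(T)
    opt = [0] + [float("inf")] * n
    cut = [0] * (n + 1)                  # cut[j]: where the last piece of T[:j] starts
    for j in range(1, n + 1):
        for i in range(j):               # O(n^2) pairs, each invoking C on a substring
            cost = opt[i] + compressed_size(T[i:j])
            if cost < opt[j]:
                opt[j], cut[j] = cost, i
    pieces, j = [], n                    # reconstruct the optimal partition boundaries
    while j > 0:
        pieces.append((cut[j], j))
        j = cut[j]
    return opt[n], pieces[::-1]

cost, pieces = optimal_partition(b"abracadabra" * 10 + b"0101" * 30)
print(cost, pieces)
```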

8.
We use self-reduction methods to prove strong information lower bounds on two of the most studied functions in the communication complexity literature: Gap Hamming Distance (GHD) and Inner Product (IP). In our first result we affirm the conjecture that the information cost of GHD is linear even under the uniform distribution, which strengthens the Ω(n) bound recently shown by Kerenidis et al. (2012), and answers an open problem from Chakrabarti et al. (2012). In our second result we prove that the information cost of IP_n is arbitrarily close to the trivial upper bound n as the permitted error tends to zero, again strengthening the Ω(n) lower bound recently proved by Braverman and Weinstein (Electronic Colloquium on Computational Complexity (ECCC) 18:164, 2011). Our proofs demonstrate that self-reducibility makes the connection between information complexity and communication complexity lower bounds a two-way connection. Whereas numerous results in the past (Chakrabarti et al. 2001; Bar-Yossef et al., J. Comput. Syst. Sci. 68(4):702–732, 2004; Barak et al. 2010) used information complexity techniques to derive new communication complexity lower bounds, we explore a generic way in which communication complexity lower bounds imply information complexity lower bounds in a black-box manner.

9.
As a widely used parallel computing framework for big data processing today, the Hadoop MapReduce framework puts more emphasis on high throughput of data than on low latency of job execution. However, more and more big data applications developed with MapReduce require quick response times. As a result, improving the performance of MapReduce jobs, especially for short jobs, is of great significance in practice and has attracted more and more attention from both academia and industry. A lot of effort has been made to improve the performance of Hadoop at the job scheduling or job parameter optimization level. In this paper, we explore an approach to improve the performance of the Hadoop MapReduce framework by optimizing the job and task execution mechanism. First of all, by analyzing the job and task execution mechanism in the MapReduce framework we reveal two critical limitations to job execution performance. Then we propose two major optimizations to the MapReduce job and task execution mechanisms: first, we optimize the setup and cleanup tasks of a MapReduce job to reduce the time cost during the initialization and termination stages of the job; second, instead of adopting the loose heartbeat-based communication mechanism to transmit all messages between the JobTracker and TaskTrackers, we introduce an instant messaging communication mechanism for accelerating performance-sensitive task scheduling and execution. Finally, we implement SHadoop, an optimized and fully compatible version of Hadoop that aims at shortening the execution time cost of MapReduce jobs, especially for short jobs. Experimental results show that compared to the standard Hadoop, SHadoop can achieve stable performance improvements of around 25% on average for comprehensive benchmarks without losing scalability and speedup. Our optimization work has passed a production-level test at Intel and has been integrated into the Intel Distributed Hadoop (IDH). To the best of our knowledge, this work is the first effort that explores optimizing the execution mechanism inside the map/reduce tasks of a job. The advantage is that it can complement job scheduling optimizations to further improve job execution performance.
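The benefit of replacing heartbeat polling with instant (push) messaging can be made intuitive with a back-of-the-envelope model of our own (not SHadoop code): with a heartbeat interval H, a task that becomes ready at a random time waits about H/2 on average before the scheduler even hears about it, whereas a push message costs only the network latency.

```python
# Back-of-the-envelope comparison (our illustration, not SHadoop code): average
# delay before a ready task is noticed, under heartbeat polling vs. push messaging.
import random

HEARTBEAT_INTERVAL = 3.0   # seconds; a typical Hadoop heartbeat setting (illustrative)
PUSH_LATENCY = 0.01        # assumed one-way latency of an instant message

def heartbeat_delay(ready_time):
    # With polling, the task is only noticed at the next heartbeat tick.
    next_tick = (int(ready_time / HEARTBEAT_INTERVAL) + 1) * HEARTBEAT_INTERVAL
    return next_tick - ready_time

samples = [random.uniform(0, 1000) for _ in range(100_000)]
avg_poll = sum(heartbeat_delay(t) for t in samples) / len(samples)
print(f"average heartbeat delay ~ {avg_poll:.2f}s (expected ~{HEARTBEAT_INTERVAL / 2:.2f}s)")
print(f"push-based delay        ~ {PUSH_LATENCY:.2f}s")
```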

10.
A flow-shop batching problem with consistent batches is considered in which the processing times of all jobs on each machine are equal to p and all batch set-up times are equal to s. In such a problem, one has to partition the set of jobs into batches and to schedule the batches on each machine. The processing time of a batch B_i is the sum of processing times of the operations in B_i, and the earliest start of B_i on a machine is the finishing time of B_i on the previous machine plus the set-up time s. Cheng et al. (Naval Research Logistics 47:128–144, 2000) provided an O(n) pseudopolynomial-time algorithm for solving the special case of the problem with two machines. Mosheiov and Oron (European Journal of Operational Research 161:285–291, 2005) developed an algorithm of the same time complexity for the general case with more than two machines. Ng and Kovalyov (Journal of Scheduling 10:353–364, 2007) improved the pseudopolynomial complexity to \(O(\sqrt{n})\). In this paper, we provide a polynomial-time algorithm of time complexity O(log^3 n).
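The scheduling rules stated above translate directly into a small makespan evaluator for a given batch partition (a sketch to fix ideas, not the authors' O(log^3 n) algorithm; conventions such as whether a set-up precedes a batch on the first machine vary, so only the rule quoted above is modelled):

```python
# Makespan of a schedule with consistent batches, modelling only the rules quoted
# above: a batch of n_i jobs takes n_i * p on each machine, and it may start on
# machine j once it has finished on machine j-1 plus the set-up time s
# (and once machine j has finished the previous batch).
def makespan(batch_sizes, m, p, s):
    finish = [0.0] * m              # finishing time of the last scheduled batch per machine
    for n_i in batch_sizes:
        proc = n_i * p
        for j in range(m):
            earliest = finish[j] if j == 0 else max(finish[j], finish[j - 1] + s)
            finish[j] = earliest + proc
    return finish[-1]

print(makespan([3, 2, 4], m=2, p=1.0, s=0.5))   # -> 13.5 for this 9-job, 3-batch example
```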

11.
The commonly used one-step methods and linear multi-step methods all have a global error that is of the same order as the local truncation error (as defined in [1, 6, 8, 13, 15]). In fact, this is true of the entire class of general linear methods. In practice, this means that the order of the method is typically defined solely by order conditions which are derived by studying the local truncation error. In this work we investigate the interplay between the local truncation error and the global error, and develop a methodology which defines the construction of explicit error inhibiting block one-step methods (alternatively written as explicit general linear methods [2]). These error inhibiting schemes are constructed so that the accumulation of the local truncation error over time is controlled, which results in a global error that is one order higher than the local truncation error. In this work, we delineate how to carefully choose the coefficient matrices so that the growth of the local truncation error is inhibited. We then use this theoretical understanding to construct several methods that have higher order global error than local truncation error, and demonstrate their enhanced order of accuracy on test cases. These methods demonstrate that the error inhibiting concept is realizable. Future work will further develop new error inhibiting methods and will analyze the computational efficiency and linear stability properties of these methods.
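Schematically, in the abstract's convention: if the local truncation error is \(\tau_n = O(h^{p})\) for step size h, a conventional (general linear) method accumulates it into a global error of the same order, whereas an error inhibiting scheme controls the accumulation so that the global error gains one order:

\[
\tau_n = O(h^{p}) \quad\Longrightarrow\quad
E_N^{\mathrm{standard}} = O(h^{p}),
\qquad
E_N^{\mathrm{error\text{-}inhibiting}} = O(h^{p+1}),
\qquad E_N = \bigl|y(t_N)-y_N\bigr|.
\]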

12.
Fast Image Inpainting Based on Coherence Transport
High-quality image inpainting methods based on nonlinear higher-order partial differential equations have been developed in the last few years. These methods are iterative by nature, with a time variable serving as iteration parameter. For reasons of stability, a large number of iterations can be needed, which results in a computational complexity that is often too large for interactive image manipulation. Based on a detailed analysis of stationary first-order transport equations, the current paper develops a fast noniterative method for image inpainting. It traverses the inpainting domain by the fast marching method just once while transporting, along the way, image values in a coherence direction robustly estimated by means of the structure tensor. Depending on a measure of coherence strength, the method switches continuously between diffusion and directional transport. It satisfies a comparison principle. Experiments with the inpainting of gray tone and color images show that the novel algorithm meets the high level of quality of the methods of Bertalmio et al. (SIGGRAPH '00: Proc. 27th Conf. on Computer Graphics and Interactive Techniques, New Orleans, ACM Press/Addison-Wesley, New York, pp. 417–424, 2000), Masnou (IEEE Trans. Image Process. 11(2):68–76, 2002), and Tschumperlé (Int. J. Comput. Vis. 68(1):65–82, 2006), while being faster by at least an order of magnitude.
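A drastically simplified, purely illustrative sketch of the single-pass traversal (our code, not the authors'): pixels are filled in order of distance to the boundary of the inpainting domain, each as a plain average of already-known neighbours, i.e. only the isotropic "diffusion" limit; the actual method additionally transports values along a coherence direction estimated from the structure tensor.

```python
# Greatly simplified illustration: fill missing pixels in order of distance to the
# boundary of the inpainting domain, each as the average of already-known
# 4-neighbours (isotropic averaging only; no structure tensor, no coherence
# direction, so this is not the paper's method, just the traversal idea).
import heapq
import numpy as np
from scipy.ndimage import distance_transform_edt

def simple_inpaint(image, mask):
    """image: 2-D float array; mask: True where pixel values are missing."""
    img, known = image.copy(), ~mask
    assert known.any(), "the mask must not cover the whole image"
    dist = distance_transform_edt(mask)            # distance to the nearest known pixel
    heap = [(dist[y, x], y, x) for y, x in zip(*np.nonzero(mask))]
    heapq.heapify(heap)
    h, w = img.shape
    while heap:
        _, y, x = heapq.heappop(heap)
        if known[y, x]:
            continue
        vals = [img[yy, xx]
                for yy, xx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= yy < h and 0 <= xx < w and known[yy, xx]]
        if vals:                                   # average of already-filled neighbours
            img[y, x], known[y, x] = sum(vals) / len(vals), True
        else:                                      # should not happen; retry a bit later
            heapq.heappush(heap, (dist[y, x] + 1.0, y, x))
    return img

truth = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))          # horizontal gradient
mask = np.zeros_like(truth, dtype=bool); mask[10:20, 10:20] = True
restored = simple_inpaint(np.where(mask, 0.0, truth), mask)
print("max reconstruction error:", float(np.abs(restored - truth).max()))
```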

13.
MapReduce, first proposed by Google, is a remarkable programming model for processing very large amounts of data. An open-source implementation of MapReduce, called Hadoop, is now used for developing a wide range of applications. Although developing a correct and efficient program on MapReduce is much easier than developing one with MPI, etc., it is still nontrivial if the target application requires involved functionalities of Hadoop MapReduce. In these situations, functional models for MapReduce computation play an important role because we can utilize them for better understanding, correctness proofs, and even optimization of MapReduce programs. In this paper, we develop two functional models, a low-level one and a high-level one, which capture the semantics of Hadoop MapReduce computation. We discuss the detailed semantics mainly in terms of the following two computations: the computation of the Mapper and Reducer classes and the computation in the Shuffle phase with the secondary-sorting technique. In addition, we develop MapReduce algorithms for the scan computational pattern (prefix sums) on the newly proposed models.
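A minimal executable rendering of such a functional model (ours, in Python rather than the paper's notation) makes the roles of the Mapper, the Shuffle phase with secondary sorting, and the Reducer explicit:

```python
# A small executable functional model of MapReduce computation (illustrative,
# not the paper's formalization): mapper -> shuffle (group by key, with optional
# secondary sorting of the values) -> reducer.
from itertools import groupby

def map_reduce(records, mapper, reducer, sort_value_key=None):
    # Map phase: apply the mapper to every (k1, v1) record independently.
    mapped = [kv for rec in records for kv in mapper(*rec)]
    # Shuffle phase: sort by intermediate key (and, if requested, by a secondary
    # key on the value), then group the values that share the same key.
    if sort_value_key is not None:
        mapped.sort(key=lambda kv: (kv[0], sort_value_key(kv[1])))
    else:
        mapped.sort(key=lambda kv: kv[0])
    grouped = ((k, [v for _, v in grp]) for k, grp in groupby(mapped, key=lambda kv: kv[0]))
    # Reduce phase: apply the reducer to each key with its (sorted) list of values.
    return [(k, out) for k, vs in grouped for out in reducer(k, vs)]

# Usage: word count expressed in this model.
docs = [(1, "a rose is a rose"), (2, "is a rose")]
wc = map_reduce(docs,
                mapper=lambda _doc_id, line: [(w, 1) for w in line.split()],
                reducer=lambda _word, ones: [sum(ones)])
print(wc)   # [('a', 3), ('is', 2), ('rose', 3)]
```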

14.
In this paper, we study open shop scheduling problems with synchronization. This model has the same features as the classical open shop model, where each of the n jobs has to be processed by each of the m machines in an arbitrary order. Unlike the classical model, jobs are processed in synchronous cycles, which means that the m operations of the same cycle start at the same time. Within one cycle, machines which process operations with smaller processing times have to wait until the longest operation of the cycle is finished before the next cycle can start. Thus, the length of a cycle is equal to the maximum processing time of its operations. In this paper, we continue the line of research started by Weiß et al. (Discrete Appl Math 211:183–203, 2016). We establish new structural results for the two-machine problem with the makespan objective and use them to formulate an easier solution algorithm. Other versions of the problem, with the total completion time objective and those which involve due dates or deadlines, turn out to be NP-hard in the strong sense, even for \(m=2\) machines. We also show that relaxed models, in which cycles are allowed to contain fewer than m jobs, have the same complexity status.
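To illustrate the model: given an assignment of operations to synchronous cycles, the makespan is simply the sum over cycles of each cycle's longest operation (a direct transcription of the definition above, not the paper's solution algorithm):

```python
# Makespan of a synchronous open shop schedule: each cycle contains at most one
# operation per machine, all operations of a cycle start together, and the cycle
# lasts as long as its longest operation.
def synchronous_makespan(cycles):
    """cycles: list of cycles; each cycle is a list of the processing times of
    the operations scheduled in that cycle (one per busy machine)."""
    return sum(max(cycle) for cycle in cycles)

# Two machines, three jobs with processing times p[job] = (on M1, on M2):
p = {"J1": (4, 2), "J2": (3, 3), "J3": (1, 5)}
# One feasible cyclic assignment:
# cycle 1: J1 on M1, J3 on M2; cycle 2: J2 on M1, J1 on M2; cycle 3: J3 on M1, J2 on M2.
cycles = [[p["J1"][0], p["J3"][1]], [p["J2"][0], p["J1"][1]], [p["J3"][0], p["J2"][1]]]
print(synchronous_makespan(cycles))   # 5 + 3 + 3 = 11
```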

15.
Sokół and Lewiński (Struct Multidisc Optim 42:835–853, 2010) published a paper on Michell trusses for three forces in the plane, with an application to a class of symmetric trusses. Vazquez Espi and Cervera Bravo (2011) have written a Discussion on the above paper, which was replied to by Sokół and Lewiński (2011). In the current Discussion, some comments on the above exchange are offered, because it involves some fundamental issues of topology optimization.

16.
Existing definitions of the relativizations of \({\bf NC}^1\), L and NL do not preserve the inclusions \({\bf NC}^1 \subseteq {\bf L}\), \({\bf NL} \subseteq {\bf AC}^1\). We start by giving the first definitions that preserve them. Here for L and NL we define their relativizations using Wilson's stack oracle model, but limit the height of the stack to a constant (instead of log(n)). We show that the collapse of any two classes in \(\{{\bf AC}^0(m), {\bf TC}^0, {\bf NC}^1, {\bf L}, {\bf NL}\}\) implies the collapse of their relativizations. Next we exhibit an oracle α that makes \({\bf AC}^k(\alpha)\) a proper hierarchy. This strengthens and clarifies the separations of the relativized theories in Takeuti (1995). The idea is that a circuit whose nested depth of oracle gates is bounded by k cannot correctly compute the (k + 1)-fold composition of every oracle function. Finally, we develop theories that characterize the relativizations of subclasses of P by modifying theories previously defined by the latter two authors. A function is provably total in a theory iff it is in the corresponding relativized class, and hence the oracle separations imply separations for the relativized theories.

17.
We study connectivity preserving multivalued functions (Kovalevsky in A new concept for digital geometry, shape in picture, 1994) between digital images. This notion generalizes that of continuous multivalued functions (Escribano et al. in Discrete geometry for computer imagery, lecture notes in computer science, 2008; Escribano et al. in J Math Imaging Vis 42:76–91, 2012) studied mostly in the setting of the digital plane \({\mathbb {Z}}^2\). We show that connectivity preserving multivalued functions, like continuous multivalued functions, are appropriate models for digital morphological operations. Connectivity preservation, unlike continuity, is preserved by compositions, and generalizes easily to higher dimensions and arbitrary adjacency relations.
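A tiny illustration of our own: dilation by the 4-neighbourhood, viewed as the multivalued function sending each pixel to itself and its 4-neighbours, maps 4-connected sets to 4-connected sets, which a brute-force check confirms on a small example.

```python
# Dilation by the 4-neighbourhood as a multivalued function on Z^2:
# F(p) = {p} together with p's four axis neighbours.  We check on a small
# example that the image of a 4-connected set is again 4-connected.
from collections import deque

def F(p):
    x, y = p
    return {(x, y), (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)}

def image(S):
    return set().union(*(F(p) for p in S))

def is_4_connected(S):
    S = set(S)
    if not S:
        return True
    start = next(iter(S))
    seen, queue = {start}, deque([start])
    while queue:
        x, y = queue.popleft()
        for q in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if q in S and q not in seen:
                seen.add(q)
                queue.append(q)
    return seen == S

S = {(0, 0), (1, 0), (1, 1), (1, 2)}                 # a 4-connected "L" shape
print(is_4_connected(S), is_4_connected(image(S)))   # True True
```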

18.
19.
Query optimization in Big Data has become a promising research direction due to the popularity of massive data analytical systems such as Hadoop. Query optimization is hard when JOIN queries must be executed efficiently on top of the Hadoop query language, Hive, over limited Big Data storage. In our previous work, the HiveQL Optimization for JOIN query over Multi-session Environment (HOME) system was introduced on top of Hadoop to improve its performance by storing intermediate results and thus avoiding repeated computations. Time overheads and Big Data storage limitations are the main drawbacks of the HOME system, especially when additional physical storage is used or extra virtualized storage is rented. In this paper, an index-based system for reusing data, called indexing HiveQL Optimization for JOIN over Multi-session Big Data Environment (iHOME), is proposed to overcome the HOME overheads by storing only the indexes of the joined rows instead of storing the full intermediate results directly. Moreover, the proposed iHOME system addresses eight cases of JOIN queries, which are classified into three groups: Similar-to-iHOME, Compute-on-iHOME, and Filter-of-iHOME. According to experimental results with the TPC-H benchmark, the execution time of the eight JOIN queries using iHOME on Hive is reduced. Also, the stored data size in the iHOME system is reduced relative to the HOME system, and Big Data storage is saved. Thus, as the stored data size increases, the iHOME system guarantees space scalability and overcomes the storage limitation.
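The core idea, storing only the indexes (row identifiers) of the joined rows rather than the materialized intermediate result, can be sketched as follows (a toy illustration with made-up table names, not iHOME code):

```python
# Toy sketch of index-based reuse for a JOIN: instead of materializing the joined
# rows (HOME-style), keep only the pairs of row ids that matched (iHOME-style),
# and rebuild the rows on demand from the base tables.
orders   = {1: ("2024-01-02", "open"), 2: ("2024-01-05", "shipped"), 3: ("2024-01-09", "open")}
lineitem = {10: (1, "widget", 4), 11: (1, "gadget", 1), 12: (3, "widget", 2)}

def join_and_index(orders, lineitem):
    """Run the join once and return only the matching (order_id, lineitem_id) pairs."""
    return [(oid, lid) for lid, (oid, *_rest) in lineitem.items() if oid in orders]

join_index = join_and_index(orders, lineitem)      # small: ids only, no row payloads

def replay_join(join_index, orders, lineitem):
    """Reconstruct the full joined rows from the stored index in a later session."""
    return [(oid, *orders[oid], *lineitem[lid][1:]) for oid, lid in join_index]

print(join_index)                                  # [(1, 10), (1, 11), (3, 12)]
print(replay_join(join_index, orders, lineitem))
```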

20.
We continue previous work by Mattei et al. (Ann. Math. Artif. Intell. 68(1–3):135–160, 2013) in which they study the computational complexity of bribery schemes when voters have conditional preferences modeled as CP-nets. For most of the cases they considered, they showed that the bribery problem is solvable in polynomial time. Some cases remained open; we solve several of them and extend the previous results to the case where voters are weighted. Additionally, we consider negative (weighted) bribery in CP-nets, when the briber is not allowed to pay voters to vote for his preferred candidate.
