首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
In MapReduce model, a job is divided into a series of map tasks and reduce tasks. The execution time of the job is prolonged by some slow tasks seriously, especially in heterogeneous environments. To finish the slow tasks as soon as possible, current MapReduce schedulers launch a backup task on other nodes for each of the slow tasks. However, traditional MapReduce schedulers cannot detect slow tasks correctly since they cannot estimate the progress of tasks accurately (Hadoop home page http://hadoop.apache.org/, 2011; Zaharia et al. in 8th USENIX symposium on operating systems design and implementation, ACM, New York, pp. 29–42, 2008). To solve this problem, this paper proposes a History-based Auto-Tuning (HAT) MapReduce scheduler, which calculates the progress of tasks accurately and adapts to the continuously varying environment automatically. HAT tunes the weight of each phase of a map task and a reduce task according to the value of them in history tasks and uses the accurate weights of the phases to calculate the progress of current tasks. Based on the accurate-calculated progress of tasks, HAT estimates the remaining time of tasks accurately and further launches backup tasks for the tasks that have the longest remaining time. Experimental results show that HAT can significantly improve the performance of MapReduce applications up to 37% compared with Hadoop and up to 16% compared with LATE scheduler.  相似文献   

2.
MapReduce and its open source implementation, Hadoop, have gained widespread adoption for parallel processing of big data jobs. Since the number of such big data jobs is also rapidly rising, reducing their energy consumption is increasingly more important to reduce environmental impact as well as operational costs. Prior work by Mashayekhy et al. (IEEE Trans. Parallel Distributed Syst. 26, 2720–2733, 2016), has tackled the problem of energy-aware scheduling of a single MapReduce job but we provide a far more efficient heuristic in this paper. We first model the problem as an Integer Linear Program to find the optimal solution using ILP solvers. Then we present a task-based greedy scheduling algorithm, TGSAVE, to select a slot for each task to minimize the total energy consumption of the MapReduce job for big data applications in heterogeneous environments without significant performance loss while satisfying the service level agreement (SLA). We perform several experiments on a Hadoop cluster to measure characteristics of tasks for nine different applications to evaluate our proposed algorithm. The results show that the total energy consumption of MapReduce jobs obtained by TGSAVE is up to 35% less than that achieved by EMRSA proposed in Mashayekhy et al. (IEEE Trans. Parallel Distributed Syst. 26, 2720–2733, 2016), its closest rival, for same workloads. Besides, TGSAVE is capable of finding a solution in same order of time for up to 74% tighter deadlines than the tightest deadline that EMRSA can find a feasible one. On average, TGSAVE solution is approximately 1.4% far from the optimal solution, and it can meet deadlines as tight as 12%, on average, above the energy-oblivious minimum makespan in the benchmarks we examined.  相似文献   

3.
This paper presents an agent-based simulator for environmental land change that includes efficient and parallel auto-tuning. This simulator extends the Multi-Agent System for Environmental simulation (MASE) by introducing rationality to agents using a mentalistic approach—the Belief-Desire-Intention (BDI) model—and is thus named MASE-BDI. Because the manual tuning of simulation parameters is an error-prone, labour and computing intensive task, an auto-tuning approach with efficient multi-objective optimization algorithms is also introduced. Further, parallelization techniques are employed to speed up the auto-tuning process by deploying it in parallel systems. The MASE-BDI is compared to the MASE using the Brazilian Cerrado biome case. The MASE-BDI reduces the simulation execution times by at least 82 × and slightly improves the simulation quality. The auto-tuning algorithms, by evaluating less than 0.00115 % of a search space with 6 million parameter combinations, are able to quickly tune the simulation model, regardless of the objective used. Moreover, the experimental results show that executing the tuning in parallel leads to speedups of approximately 11 × compared to sequential execution in a hardware setting with 16-CPU cores.  相似文献   

4.
Hoare logic [1] is a logic used as a way of specifying semantics of programming languages, which has been extended to be a separation logic to reason about mutable heap structure [2]. In a model M of Hoare logic, each program α induces an M-computable function f α M on the universe of M; and the M-recursive functions are defined on M. It will be proved that the class of all the M-computable functions f α M induced by programs is equal to the class of all the M-recursive functions. Moreover, each M-recursive function is \(\sum {_1^{{N^M}}} \)-definable in M, where the universal quantifier is a number quantifier ranging over the standard part of a nonstandard model M.  相似文献   

5.

Cloud computing systems are splitting compute- and data-intensive jobs into smaller tasks to execute them in a parallel manner using clusters to improve execution time. However, such systems at increasing scale are exposed to stragglers, whereby abnormally slow running tasks executing within a job substantially affect job performance completion. Such stragglers are a direct threat towards attaining fast execution of data-intensive jobs within cloud computing. Researchers have proposed an assortment of different mechanisms, frameworks, and management techniques to detect and mitigate stragglers both proactively and reactively. In this paper, we present a comprehensive review of straggler management techniques within large-scale cloud data centres. We provide a detailed taxonomy of straggler causes, as well as proposed management and mitigation techniques based on straggler characteristics and properties. From this systematic review, we outline several outstanding challenges and potential directions of possible future work for straggler research.

  相似文献   

6.
Query optimization in Big Data becomes a promising research direction due to the popularity of massive data analytical systems such as Hadoop system. The query optimization is getting hard to efficiently execute JOIN queries on top of Hadoop query language, Hive, over limited Big Data storages. According to our previous work, HiveQL Optimization for JOIN query over Multi-session Environment (HOME) system has been introduced over Hadoop system to improve its performance by storing the intermediate results to avoid repeated computations. Time overheads and Big Data storages limitation are considered the main drawback of the HOME system, especially in the case of using additional physical storages or renting extra virtualized storages. In this paper, an index-based system for reusing data called indexing HiveQL Optimization for JOIN over Multi-session Big Data Environment (iHOME) is proposed to overcome HOME overheads by storing only the indexes of the joined rows instead of storing the full intermediate results directly. Moreover, the proposed iHOME system addresses eight cases of JOIN queries which classified into three groups; Similar-to-iHOME, Compute-on-iHOME, and Filter-of-iHOME. According to the experimental results of the iHOME system using TPC-H benchmark, it is found that the execution time of eight JOIN queries using iHOME on Hive has been reduced. Also, the stored data size in the iHOME system is reduced relative to the HOME system, as well as, the Big Data storage is saved. So, by increasing stored data size, the iHOME system guarantees the space scalability and overcomes the storage limitation.  相似文献   

7.
8.
9.
A new approach to estimating the fault-tolerance of the parallel control computing systems relies on the mathematical model-based determination of the probability of successful completion in a given schedule time of an arbitrary set of interdependent jobs (tasks) with random times of job execution and asynchronous job redundancy. The estimates were determined both for the standard execution of a set of tasks and for the case of single malfunction (fault or failure) of any computing system processor detected at execution of any job from the set. The basic distinction of this approach lies in that here the numerical values of the reliability parameters (probabilities or intensities of faults or failures) of the computing resources are neither given nor used.  相似文献   

10.
Existing definitions of the relativizations of NC 1, L and NL do not preserve the inclusions \({{\bf NC}^1 \subseteq {\bf L}, {\bf NL}\subseteq {\bf AC}^1}\). We start by giving the first definitions that preserve them. Here for L and NL we define their relativizations using Wilson’s stack oracle model, but limit the height of the stack to a constant (instead of log(n)). We show that the collapse of any two classes in \({\{{\bf AC}^0 (m), {\bf TC}^0, {\bf NC}^1, {\bf L}, {\bf NL}\}}\) implies the collapse of their relativizations. Next we exhibit an oracle α that makes AC k (α) a proper hierarchy. This strengthens and clarifies the separations of the relativized theories in Takeuti (1995). The idea is that a circuit whose nested depth of oracle gates is bounded by k cannot compute correctly the (k + 1) compositions of every oracle function. Finally, we develop theories that characterize the relativizations of subclasses of P by modifying theories previously defined by the second two authors. A function is provably total in a theory iff it is in the corresponding relativized class, and hence, the oracle separations imply separations for the relativized theories.  相似文献   

11.
An approach to stabilization of nonlinear oscillations in multidimensional spaces is proposed on the basis of the V.I. Zubov’s stability theory for invariant sets. As a special case, the derived controls make it possible to excite self-oscillating regimes in specified state subspaces R 2k ? R 2n with simultaneous oscillation damping on Cartesian products R 2n?2k .  相似文献   

12.
We consider the problem of mining web access patterns with super-pattern constraint. This constraint requires that the sequential patterns in the sequence database must contain a particular set of patterns as sub-patterns. One common application of this constraint is web usage mining which mines the user access behavior on the web. In this paper, we introduce an efficient strategy for mining web access patterns with super-pattern constraint that requires only one database scan. Firstly, we present the MWAPC (M ining W eb A ccess P atterns based on super-pattern C onstraint) algorithm, in which each frequent pattern has to be checked if it contains at least one pattern from a user-defined set of patterns. Then we develop an effective algorithm, called EMWAPC that prunes the search space at the beginning of mining process and avoids checking the constraints one by one based on three proposed propositions. We have conducted the experiments on real web log databases. The experimental results show that the proposed algorithms outperform the previous methods.  相似文献   

13.
A B 4-valued propositional logic will be proposed in this paper which there are three unary logical connectives ~1, ~2, ¬ and two binary logical connectives ∧, ∨, and a Gentzen-typed deduction system will be given so that the system is sound and complete with B 4-valued semantics, where B 4 is a Boolean algebra.  相似文献   

14.
In the problem of the stabilizing solution of the algebraic Riccati equation, the resolvent Θ(s) = (s I 2n ? H)?1 of the Hamilton 2n × 2n-matrix H of the algebraic Riccati equation allows us to reduce the problem to a linear matrix equation. In [1], the constructions necessary for this and the theorem of existence and representation of the stabilized solutions to an algebraic Riccati equation was proposed. In this paper, the methods of constructing the resolvent and the linear reduction matrix defined by it necessary for the application of the theorem, and in addition, the algorithms of constructing stabilizing solution of the algebraic Riccati equation are proposed.  相似文献   

15.
Paper presents a unique novel online learning algorithm for eight popular nonlinear (i.e., kernel), classifiers based on a classic stochastic gradient descent in primal domain. In particular, the online learning algorithm is derived for following classifiers: L1 and L2 support vector machines with both a quadratic regularizer w t w and the l 1 regularizer |w|1; regularized huberized hinge loss; regularized kernel logistic regression; regularized exponential loss with l 1 regularizer |w|1 and Least squares support vector machines. The online learning algorithm is aimed primarily for designing classifiers for large datasets. The novel learning model is accurate, fast and extremely simple (i.e., comprised of few coding lines only). Comparisons of performances of the proposed algorithm with the state of the art support vector machine algorithm on few real datasets are shown.  相似文献   

16.
Summarization is an important intermediate step for expediting knowledge discovery tasks such as anomaly detection. In the context of anomaly detection from data stream, the summary needs to represent both anomalous and normal data. But streaming data has distinct characteristics, such as one-pass constraint, for which conducting data mining operations are difficult. Existing stream summarization techniques are unable to create summary which represent both normal and anomalous instances. To address this problem, in this paper, a number of hybrid summarization techniques are designed and developed using the concept of reservoir for anomaly detection from network traffic. Experimental results on thirteen benchmark data streams show that the summaries produced from stream using pairwise distance (PSSR) and template matching (TMSSR) techniques can retain more anomalies than existing stream summarization techniques, and anomaly detection technique can identify the anomalies with high true positive and low false positive rate.  相似文献   

17.
Biterm Topic Model (BTM) is an effective topic model proposed to handle short texts. However, its standard gibbs sampling inference method (StdBTM) costs much more time than that (StdLDA) of Latent Dirichlet Allocation (LDA). To solve this problem we propose two time-efficient gibbs sampling inference methods, SparseBTM and ESparseBTM, for BTM by making a tradeoff between space and time consumption in this paper. The idea of SparseBTM is to reduce the computation in StdBTM by both recycling intermediate results and utilizing the sparsity of count matrix \(\mathbf {N}^{\mathbf {T}}_{\mathbf {W}}\). Theoretically, SparseBTM reduces the time complexity of StdBTM from O(|B| K) to O(|B| K w ) which scales linearly with the sparsity of count matrix \(\mathbf {N}^{\mathbf {T}}_{\mathbf {W}}\) (K w ) instead of the number of topics (K) (K w < K, K w is the average number of non-zero topics per word type in count matrix \(\mathbf {N}^{\mathbf {T}}_{\mathbf {W}}\)). Experimental results have shown that in good conditions SparseBTM is approximately 18 times faster than StdBTM. Compared with SparseBTM, ESparseBTM is a more time-efficient gibbs sampling inference method proposed based on SparseBTM. The idea of ESparseBTM is to reduce more computation by recycling more intermediate results through rearranging biterm sequence. In theory, ESparseBTM reduces the time complexity of SparseBTM from O(|B|K w ) to O(R|B|K w ) (0 < R < 1, R is the ratio of the number of biterm types to the number of biterms). Experimental results have shown that the percentage of the time efficiency improved by ESparseBTM on SparseBTM is between 6.4% and 39.5% according to different datasets.  相似文献   

18.
Model-based testing has mainly focused on models where concurrency is interpreted as interleaving (like the ioco theory for labeled transition systems), which may be too coarse when one wants concurrency to be preserved in the implementation. In order to test such concurrent systems, we choose to use Petri nets as specifications and define a concurrent conformance relation named co-ioco. We present a test generation algorithm based on Petri net unfolding able to build a complete test suite w.r.t our co-ioco conformance relation. In addition, we propose several coverage criteria that allow to select finite prefixes of an unfolding in order to build manageable test suites.  相似文献   

19.
Many scholarly writings today are available in electronic formats. With universities around the world choosing to make digital versions of their dissertations, theses, project reports, and related files and data sets available online, an overwhelming amount of information is becoming available on almost any particular topic. How will users decide which dissertation, or subsection of a dissertation, to read to get the required information on a particular topic? What kind of services can such digital libraries provide to make knowledge discovery easier? In this paper, we investigate these issues, using as a case study the Networked Digital Library of Theses and Dissertations (NDLTD), a rapidly growing collection that already has about 800,000 Electronic Theses and Dissertations (ETDs) from universities around the world. We propose the design for a scalable, Web Services based tool KDWebS (Knowledge Discovery System based on Web Services), to facilitate automated knowledge discovery in NDLTD. We also provide some preliminary proof of concept results to demonstrate the efficacy of the approach.  相似文献   

20.
Suffix array is a powerful data structure, used mainly for pattern detection in strings. The main disadvantage of a full suffix array is its quadratic O(n2) space capacity when the actual suffixes are needed. In our previous work [39], we introduced the innovative All Repeated Patterns Detection (ARPaD) algorithm and the Moving Longest Expected Repeated Pattern (MLERP) process. The former detects all repeated patterns in a string using a partition of the full Suffix Array and the latter is capable of analyzing large strings regardless of their size. Furthermore, the notion of Longest Expected Repeated Pattern (LERP), also introduced by the authors in a previous work, significantly reduces to linear O(n) the space capacity needed for the full suffix array. However, so far the LERP value has to be specified in ad hoc manner based on experimental or empirical values. In order to overcome this problem, the Probabilistic Existence of LERP theorem has been proven in this paper and, furthermore, a formula for an accurate upper bound estimation of the LERP value has been introduced using only the length of the string and the size of the alphabet used in constructing the string. The importance of this method is the optimum upper bounding of the LERP value without any previous preprocess or knowledge of string characteristics. Moreover, the new data structure LERP Reduced Suffix Array is defined; it is a variation of the suffix array, and has the advantage of permitting the classification and parallelism to be implemented directly on the data structure. All other alternative methodologies deal with the very common problem of fitting any kind of data structure in a computer memory or disk in order to apply different time efficient methods for pattern detection. The current advanced and elegant proposed methodology allows us to alter the above-mentioned problem such that smaller classes of the problem can be distributed on different systems and then apply current, state-of-the-art, techniques such as parallelism and cloud computing using advanced DBMSs which are capable of handling the storage and analysis of big data. The implementation of the above-described methodology can be achieved by invoking our innovative ARPaD algorithm. Extensive experiments have been conducted on small, comparable strings of Champernowne Constant and DNA as well as on extremely large strings of π with length up to 68 billion digits. Furthermore, the novelty and superiority of our methodology have been also tested on real life application such as a Distributed Denial of Service (DDoS) attack early warning system.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号