Similar literature
 20 similar documents found (search time: 0 ms)
1.
MapReduce frameworks allow programmers to write distributed, data‐parallel programs that operate on multisets. These frameworks offer considerable flexibility to support various kinds of programs and data. To understand the essence of the programming model better and to provide a rigorous foundation for optimizations, we present an abstract, functional model of MapReduce along with a number of customization options. We demonstrate that the MapReduce programming model can also represent programs that operate on lists, which differ from multisets in that the order of elements matters. Along with the functional model, we offer a cost model that allows programmers to estimate and compare the performance of MapReduce programs. Based on the cost model, we introduce two transformation rules aiming at performance optimization of MapReduce programs, which also demonstrates the usefulness of our model. In an exploratory study, we assess the impact of applying these rules to two applications. The functional model and the cost model provide insights at a proper level of abstraction into why the optimization works. Copyright © 2014 John Wiley & Sons, Ltd.
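To make the functional view in the abstract above concrete, here is a minimal Python sketch. It is not the paper's model: the names `map_reduce`, `word_mapper`, and `sum_reducer` are hypothetical, and the sketch runs on one machine with no cost model. It only shows the map, group-by-key, reduce shape and that the result does not depend on input order when the input is treated as a multiset.

```python
from collections import defaultdict


def map_reduce(mapper, reducer, records):
    """Apply `mapper` to every record, group emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        # The mapper emits zero or more (key, value) pairs per input record.
        for key, value in mapper(record):
            groups[key].append(value)
    # The reducer sees each key once, together with all values emitted for it.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Hypothetical mapper: emit (word, 1) for each whitespace-separated word."""
    return [(word, 1) for word in line.split()]


def sum_reducer(key, values):
    """Hypothetical reducer: sum the counts collected for one word."""
    return sum(values)


lines = ["a b a", "b c"]
counts = map_reduce(word_mapper, sum_reducer, lines)
# Reversing the input gives the same counts, because summation ignores order.
counts_reversed = map_reduce(word_mapper, sum_reducer, list(reversed(lines)))
```

A reducer that is sensitive to the order of `values`, such as string concatenation, would lose this order independence, which is the distinction between multisets and lists that the abstract draws.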

2.
A Statistical Language Modeling Approach to Online Deception Detection   Total citations: 1, self-citations: 0, citations by others: 1
Online deception is disrupting our daily life, organizational processes, and even national security. Existing approaches to online deception detection follow a traditional paradigm by using a set of cues as antecedents for deception detection, which may be hindered by ineffective cue identification. Motivated by the strength of statistical language models (SLMs) in capturing the dependency of words in text without explicit feature extraction, we developed SLMs to detect online deception. We also addressed the data sparsity problem in building SLMs in general and in deception detection in particular using smoothing and vocabulary pruning techniques. The developed SLMs were evaluated empirically with diverse datasets. The results showed that the proposed SLM approach to deception detection outperformed a state-of-the-art text categorization method as well as traditional feature-based methods.

3.
4.
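To solve large-scale Web service selection problems in cloud computing environments, a parallel discrete glowworm swarm optimization algorithm based on the MapReduce model is proposed. The algorithm redefines the encoding of individuals, computes the distance between individuals, and improves the position update to strengthen its search capability in high-dimensional spaces. It also applies a divide-and-conquer strategy over sub-swarms together with the ideal point method to avoid premature convergence to local optima and to improve its ability to handle large-scale problems. Experimental results show that the algorithm is feasible and effective for the service selection problem and scales well.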

5.
An important property of today’s big data processing is that the same computation is often repeated on datasets evolving over time, such as web and social network data. While repeating full computation of the entire datasets is feasible with distributed computing frameworks such as Hadoop, it is obviously inefficient and wastes resources. In this paper, we present HadUP (Hadoop with Update Processing), a modified Hadoop architecture tailored to large-scale incremental processing with conventional MapReduce algorithms. Several approaches have been proposed to achieve a similar goal using task-level memoization. However, task-level memoization detects the change of datasets at a coarse-grained level, which often makes such approaches ineffective. Instead, HadUP detects and computes the change of datasets at a fine-grained level using a deduplication-based snapshot differential algorithm (D-SD) and update propagation. As a result, it provides high performance, especially in an environment where task-level memoization has no benefit. HadUP requires only a small amount of extra programming cost because it can reuse the code for the map and reduce functions of Hadoop. Therefore, the development of HadUP applications is quite easy.

6.
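Research on UML-Based Modeling Methods for Software Product Lines   Total citations: 3, self-citations: 0, citations by others: 3
The software product line approach is a domain-specific, large-scale, coarse-grained software reuse technique. This article briefly introduces a UML-based modeling method for software product lines. Because software product lines pay special attention to the commonality and variability of product line members, their use case, interaction, state, static, and feature models all differ from those of a single software system. While describing each model, the article illustrates it with the proof printing function of a newspaper typesetting software product line.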

7.
In recent years, the MapReduce framework has become one of the most popular parallel computing platforms for processing big data. MapReduce is used by companies such as Facebook, IBM, and Google to process or analyze massive data sets. Since the approach is frequently used for industrial solutions, the algorithms based on the MapReduce framework gained significant attention within the scientific community. The subgraph isomorphism is a fundamental graph theory problem. Finding small patterns in large graphs is a core challenge in the analysis of applications with big data sets. This paper introduces two novel algorithms, which are capable of finding matching patterns in arbitrarily large graphs. The algorithms are designed for utilizing the easy parallelization technique offered by the MapReduce framework. The approaches are evaluated regarding their space and memory requirements. The paper also provides the applied data structure and presents formal analysis of the algorithms.

8.
In this paper we investigate how best to model naturally arising distributions of colour camera data. It has become standard to model single mode distributions of colour data by ignoring the intensity component and constructing a Gaussian model of the chromaticity. This approach is appealing, because the intensity of data can change arbitrarily due to shadowing and shading, whereas the chromaticity is more robust to these effects. However, it is unclear how best to construct such a model, since there are many domains in which the chromaticity can be represented. Furthermore, the applicability of this kind of model is questionable in all but the most basic lighting environments. We begin with a review of the reflection processes that give rise to distributions of colour data. Several candidate models are then presented; some are from the existing literature and some are novel. Properties of the different models are compared analytically and the models are empirically compared within a region tracking application over two separate sets of data. Results show that chromaticity based models perform well in constrained environments where the physical model upon which they are based applies. It is further found that models based on spherical representations of the chromaticity data provide better performance than those based on more common planar representations, such as the chromaticity plane or the normalised colour space. In less constrained environments, however, such as daylight, chromaticity based models do not perform well, because of the effects of additional illumination components, which violate the physical model upon which they are based.

9.
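A UML-Based XML Modeling Method   Total citations: 7, self-citations: 0, citations by others: 7
Zhang Zhi, Zhao Wenyun, Li Chuan. Computer Engineering (《计算机工程》), 2003, 29(8): 195-196, F003
Because defining an XML Schema directly from a business model is difficult, a method that applies UML to XML Schema modeling is proposed. Through domain modeling, the method defines three layers of models, namely the conceptual, logical, and implementation models, which correspond to the analysis, design, and implementation phases of the business model. The transformations between the layers are also discussed.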

10.
Automation and Remote Control - This paper is devoted to the mathematical modeling of traffic flow in a large automobile network. A statistical model of traffic flow proposed by the authors and...

11.
Predictive Statistical Models for User Modeling   Total citations: 11, self-citations: 1, citations by others: 11
The limitations of traditional knowledge representation methods for modeling complex human behaviour led to the investigation of statistical models. Predictive statistical models enable the anticipation of certain aspects of human behaviour, such as goals, actions and preferences. In this paper, we motivate the development of these models in the context of the user modeling enterprise. We then review the two main approaches to predictive statistical modeling, content-based and collaborative, and discuss the main techniques used to develop predictive statistical models. We also consider the evaluation requirements of these models in the user modeling context, and propose topics for future research.

12.
Statistical Approach for Voice Personality Transformation   Total citations: 1, self-citations: 0, citations by others: 1
A voice transformation method which changes the source speaker's utterances so as to sound similar to those of a target speaker is described. Speaker individuality transformation is achieved by altering the LPC cepstrum, average pitch period and average speaking rate. The main objective of the work involves building a nonlinear relationship between the parameters for the acoustical features of two speakers, based on a probabilistic model. The conversion rules involve the probabilistic classification and a cross correlation probability between the acoustic features of the two speakers. The parameters of the conversion rules are estimated by maximum likelihood on the training data. To obtain transformed speech signals which are perceptually closer to the target speaker's voice, prosody modification is also involved. Prosody modification is achieved by scaling the excitation spectrum and by time scale modification with appropriate modification factors. An evaluation by objective tests and informal listening tests clearly indicated the effectiveness of the proposed transformation method. We also confirmed that the proposed method leads to smoothly evolving spectral contours over time, which, from a perceptual standpoint, produced results that were superior to conventional vector quantization (VQ)-based methods.

13.
Registration and modeling of shapes are two important problems in computer vision and pattern recognition. Despite enormous progress made over the past decade, these problems are still open. In this paper, we advance the state of the art in both directions. First we consider an efficient registration method that aims to recover a one-to-one correspondence between shapes and introduce measures of uncertainties driven from the data which explain the local support of the recovered transformations. To this end, a free form deformation is used to describe the deformation model. The transformation is combined with an objective function defined in the space of implicit functions used to represent shapes. Once the registration parameters have been recovered, we introduce a novel technique for model building and statistical interpretation of the training examples based on a variable bandwidth kernel approach. The support on the kernels varies spatially and is determined according to the uncertainties of the registration process. Such a technique introduces the ability to account for potential registration errors in the model. Hand-written character recognition and knowledge-based object extraction in medical images are examples of applications that demonstrate the potentials of the proposed framework.

14.
Verification and correction of state vector estimates formed with regard for different errors in solving nonlinear estimation equations in a real computing medium are studied. Effective methods of solving this problem are elaborated on the basis of invariants and -invariants of the mathematical model of motion of an object. An illustrative example is given.

15.
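To address the low efficiency of the AprioriTid algorithm on big data, a parallelization of AprioriTid is analyzed on the basis of the MapReduce model of the Hadoop platform, and the main parallelization steps and the Map and Reduce functions are described. Compared with the serial AprioriTid algorithm, the parallel algorithm exploits the computing power of multiple nodes and shortens the time needed to mine association rules from large datasets. Performance tests show that the parallel AprioriTid algorithm achieves high execution efficiency and good scalability.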

16.
The MapReduce framework has become the de facto standard for big data processing due to its attractive features and abilities. One is that it automatically parallelizes a job into multiple tasks and transparently handles task execution on a large cluster of commodity machines. The increasing heterogeneity of distributed environments may result in a few straggling tasks, which prolong job completion. Speculative execution is proposed to mitigate stragglers. However, the existing speculative execution mechanism could not work efficiently as many speculative tasks are still slower than their original tasks. In this paper, we explore an approach to increase the efficiency of speculative execution, and further improve MapReduce performance. We propose the Partial Speculative Execution (PSE) strategy to make speculative tasks start from the checkpoint. By leveraging the checkpoint of original tasks, PSE can eliminate the costs of re-reading, re-copying, and re-computing the processed data. We implement PSE in Hadoop, and evaluate its performance in terms of job completion time and the efficiency of speculative execution under several kinds of classical workloads. Experimental results show that, in heterogeneous environments with stragglers, PSE completes jobs 56% faster than that with no speculation and 12% faster than that with LATE, an improved speculative execution algorithm. In addition, on average PSE can improve the efficiency of speculative execution by 24% compared to LATE.

17.
Economic modeling of financial markets attempts to model highly complex systems in which expectations can be among the dominant driving forces. It is necessary, then, to focus on how agents form expectations. We believe that they look for patterns, hypothesize, try, make mistakes, learn and adapt. Agents' bounded rationality leads us to a rule-based approach which we model using Fuzzy Rule Bases. For example, if a single agent believes the exchange rate is determined by a set of possible inputs and is asked to state his relationship, his answer will probably reveal a fuzzy nature like: IF the inflation rate in the EURO-Zone is low and the GDP growth rate is larger than in the US THEN the EURO will rise against the USD. Low and larger are fuzzy terms which give a gradual linguistic meaning to crisp intervals in the respective universes of discourse. In order to learn a Fuzzy Rule Base from examples we introduce Genetic Algorithms and Artificial Neural Networks as learning operators. These examples can either be empirical data or originate from an economic simulation model. The software GENEFER (GEnetic NEural Fuzzy ExploreR) has been developed for designing such a Fuzzy Rule Base. The design process is modular and comprises Input Identification, Fuzzification, Rule Base Generating and Rule Base Tuning. The two latter steps make use of genetic and neural learning algorithms for optimizing the Fuzzy Rule Base.
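The example rule quoted in the abstract above can be evaluated with a few lines of Python. This is only an illustrative sketch, not GENEFER: the function names, the linear ramp membership functions, and every breakpoint (1%, 3%, one percentage point) are assumptions chosen for the example, and the AND of the two conditions is taken as the minimum, one common choice among several.

```python
def ramp_down(x, full, zero):
    """Membership is 1 at or below `full` and falls linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def ramp_up(x, zero, full):
    """Membership is 0 at or below `zero` and rises linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def euro_rises_degree(euro_inflation, euro_gdp_growth, us_gdp_growth):
    """Degree to which 'IF inflation is low AND growth is larger THEN EURO rises' fires."""
    # Assumed breakpoints: inflation is fully "low" at 1% and not low at all at 3%.
    low_inflation = ramp_down(euro_inflation, full=1.0, zero=3.0)
    # Assumed breakpoints: growth is fully "larger" when the gap reaches 1 percentage point.
    growth_larger = ramp_up(euro_gdp_growth - us_gdp_growth, zero=0.0, full=1.0)
    # Fuzzy AND taken as the minimum of the two membership degrees.
    return min(low_inflation, growth_larger)


partial = euro_rises_degree(2.0, 2.5, 2.0)  # both conditions hold to degree 0.5
strong = euro_rises_degree(0.5, 3.0, 1.0)   # both conditions hold fully
blocked = euro_rises_degree(4.0, 3.0, 1.0)  # inflation is not low at all
```

In a learned rule base, the breakpoints and the set of rules would be tuned from data rather than fixed by hand as they are here.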

18.
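Research on a MapReduce Implementation of the FP-Growth Algorithm   Total citations: 1, self-citations: 0, citations by others: 1
With the rise of cloud computing and the need to apply data mining techniques in distributed environments, this paper introduces MapReduce, a large-scale parallel computing model currently popular in industry, into the parallelization of association rule algorithms in data mining. It proposes MR-FP, a parallelized improvement of the FP-Growth algorithm, which gives parallel association rule mining node scalability, fault tolerance, and failure recovery. A case analysis shows that the system maintains high performance even as the number of transactions grows by orders of magnitude. Theoretical analysis and case experiments indicate that data mining theories and methods can be fully exploited in cloud computing environments and offer broad and valuable room for research.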

19.
We show that MapReduce, the de facto standard for large scale data-intensive parallel programming, can be equipped with a programming theory in calculational form. By integrating the generate-and-test programming paradigm and semirings for aggregation of results, we propose a novel parallel programming framework for MapReduce. The framework consists of two important calculation theorems: the shortcut fusion theorem of semiring homomorphisms bridges the gap between specifications and efficient implementations, and the filter-embedding theorem helps to develop parallel programs in a systematic and incremental way.

20.
The explosive growth of audiovisual information in the last few years has made the development of advanced video modeling and management tools an urgent task. In this research, we investigate the use of stratification approach to model the contextual information of video contents as multi-layered strata. By judiciously choosing the right level and types of strata, we are able to automate a major part of the strata extraction process. Using the strata as the basis, we have developed advanced functionalities to support the flexible retrieval and content-based browsing of video. A prototype has been developed to support the whole process of video management, from strata extraction, to indexing, retrieval and browsing. The prototype is tested in the domain of news video and the system has been found to be effective.
