Similar documents
20 similar documents found
1.
This paper describes InfiniteDB, a computer-cluster-based parallel database management system (DBMS) developed by the authors. InfiniteDB aims to support data-intensive computing efficiently, in response to rapidly growing database sizes and the need for high-performance analysis of massive databases. It can run efficiently on computing systems composed of thousands of computers, such as cloud computing systems. It supports intra-query, inter-query, intra-operation, inter-operation and pipelined parallelism. It provides effective strategies for managing massive databases, including multiple data-declustering methods, declustering-aware algorithms for relational and other database operations, and an adaptive query optimization method. It also provides parallel data warehousing and data mining functions, a coordinator-wrapper mechanism for integrating heterogeneous information resources on the Internet, and fault-tolerant and resilient infrastructures. It has been used in many applications and has proved quite effective for data-intensive computing.
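The abstract does not give implementation details; as a loose, hypothetical illustration of the hash-based data declustering idea it mentions (node count, key choice and all names are invented here, not taken from InfiniteDB), a minimal Python sketch:

```python
# Minimal sketch of hash-based data declustering: spread the tuples of a
# relation across cluster nodes so relational operators can run in parallel.
# The node count and partitioning key are illustrative only.
from collections import defaultdict
import hashlib

NUM_NODES = 4  # hypothetical cluster size

def node_for(key: str) -> int:
    """Map a partitioning key to a node id via a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES

def decluster(tuples, key_index=0):
    """Partition tuples by hashing the chosen attribute."""
    partitions = defaultdict(list)
    for row in tuples:
        partitions[node_for(str(row[key_index]))].append(row)
    return partitions

if __name__ == "__main__":
    rows = [("alice", 10), ("bob", 20), ("carol", 30), ("dave", 40)]
    for node, part in sorted(decluster(rows).items()):
        print(f"node {node}: {part}")
```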

2.

3.
Compared with public cloud computing, private cloud computing systems for tasks that are both data- and compute-intensive place higher demands on computational efficiency and system-management efficiency, and current public cloud systems are too complex and cumbersome for this purpose; a simple, easy-to-use private cloud implementation suited to data- and compute-intensive tasks is therefore needed. Drawing on the theory and implementation methods of public cloud computing, this paper proposes an implementation scheme for a private cloud computing system aimed at such tasks. The scheme describes a user's computation task with a job file that specifies the task's computation model and its input and output files; in view of the characteristics of a private cloud, it simplifies the MapReduce parallel processing framework of Google's cloud computing system into a more intuitive data computation model; and it automatically …
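The abstract is cut off before describing its simplified model in full; as a generic, hedged illustration of the MapReduce idea it builds on (not the authors' scheme), a minimal in-process word-count sketch:

```python
# Generic MapReduce word count, run sequentially in one process purely to
# illustrate the map -> shuffle -> reduce data flow; it is not the simplified
# framework described in the abstract.
from collections import defaultdict

def map_phase(document: str):
    """Emit (word, 1) pairs for each word in the document."""
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(pairs):
    """Group intermediate pairs by key and sum the counts."""
    grouped = defaultdict(int)
    for word, count in pairs:
        grouped[word] += count
    return dict(grouped)

if __name__ == "__main__":
    docs = ["private cloud computing", "cloud computing for data intensive tasks"]
    intermediate = [pair for doc in docs for pair in map_phase(doc)]
    print(reduce_phase(intermediate))
```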

4.
Volunteer computing systems offer high computing power to scientific communities for running large, data-intensive scientific workflows. However, these environments provide only a best-effort infrastructure for executing high-performance jobs. This work aims to schedule scientific, data-intensive workflows on a hybrid of volunteer computing systems and Cloud resources in order to improve the utilization of these environments and increase the percentage of workflows that meet their deadlines. The proposed workflow scheduling system partitions a workflow into sub-workflows so as to minimize data dependencies among them. These sub-workflows are then distributed over volunteer resources according to resource proximity and a load-balancing policy, and the execution time of each sub-workflow on the selected volunteer resources is estimated in this phase. If a sub-workflow would miss its sub-deadline because of long waiting times, it is re-scheduled onto public Cloud resources. This re-scheduling improves system performance by increasing the percentage of workflows that meet the deadline. The proposed Cloud-aware data-intensive scheduling algorithm increases the percentage of workflows that meet the deadline by 75% on average compared with executing the workflows on volunteer resources alone.
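A rough sketch of the deadline-driven re-scheduling decision described above; the estimation logic, speeds and names are hypothetical, not taken from the paper:

```python
# Hypothetical sketch of a deadline-driven re-scheduling decision: a
# sub-workflow estimated to finish after its sub-deadline on volunteer
# resources is moved to (faster, paid) Cloud resources instead.
from dataclasses import dataclass

@dataclass
class SubWorkflow:
    name: str
    work_units: float    # abstract amount of computation
    sub_deadline: float  # seconds from now

def estimated_finish(sw: SubWorkflow, speed: float, waiting_time: float) -> float:
    """Very rough completion-time estimate: queue wait plus work/speed."""
    return waiting_time + sw.work_units / speed

def choose_target(sw: SubWorkflow, volunteer_speed=1.0, volunteer_wait=30.0,
                  cloud_speed=4.0, cloud_wait=5.0) -> str:
    """Keep the sub-workflow on volunteers unless it would miss its sub-deadline."""
    if estimated_finish(sw, volunteer_speed, volunteer_wait) <= sw.sub_deadline:
        return "volunteer"
    # Otherwise fall back to the Cloud, even if the deadline may still be missed.
    return "cloud"

if __name__ == "__main__":
    for sw in [SubWorkflow("preprocess", 20.0, 60.0), SubWorkflow("analyze", 200.0, 120.0)]:
        print(sw.name, "->", choose_target(sw))
```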

5.
As scientific research becomes more data intensive, there is an increasing need for scalable, reliable, and high-performance storage systems. Such data repositories must provide both data archival services and rich metadata, and cleanly integrate with large-scale computing resources. ROARS is a hybrid approach to distributed storage that provides both large, robust, scalable storage and efficient rich-metadata queries for scientific applications. In this paper, we present the design and implementation of ROARS, focusing primarily on the challenge of maintaining data integrity across long time scales. We evaluate the performance of ROARS on a storage cluster, comparing it to the Hadoop distributed file system and a centralized file server. We observe that ROARS has read and write performance that scales with the number of storage nodes, and integrity checking that scales with the size of the largest node. We demonstrate the ability of ROARS to function correctly through multiple system failures and reconfigurations. ROARS has been in production use for over three years as the primary data repository for a biometrics research lab at the University of Notre Dame.

6.
7.
Energy awareness is an important aspect of modern network and computing system design and management, especially for internet-scale networks and data-intensive large-scale distributed computing systems. The main challenge is to design and develop novel technologies, architectures and methods that reduce energy consumption in such infrastructures, which in turn reduces the total cost of running a network. Energy-aware network components, together with new control and optimization strategies, can save energy across the whole system by adapting network capacity and resources to the actual traffic load and demands while ensuring end-to-end quality of service. In this paper, we design and develop a two-level control framework for reducing power consumption in computer networks. Its implementation provides local control mechanisms at the network-device level and network-wide control strategies at the central control level. We also develop network-wide optimization algorithms that calculate the power settings of energy-consuming network components and energy-aware routing for the recommended network configuration. The utility and efficiency of the framework have been verified by simulation and by laboratory tests. The test cases were carried out on a number of synthetic as well as real network topologies, giving encouraging results. The paper concludes with well-justified recommendations for energy-aware computer network design.
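The abstract does not state the underlying optimization model; one illustrative (not the authors') formulation of energy-aware routing with on/off link power settings is:

```latex
% Illustrative formulation only, not the model used in the paper: route all
% demands while minimizing the power drawn by the links left switched on.
\begin{aligned}
\min_{x,\,y} \quad & \sum_{\ell \in L} P_\ell \, y_\ell
  && \text{total power of active links} \\
\text{s.t.} \quad & \sum_{\ell \in \delta^+(v)} x_\ell^{d} - \sum_{\ell \in \delta^-(v)} x_\ell^{d} = b_v^{d}
  && \forall v \in V,\ \forall d \in D \quad \text{(flow conservation)} \\
& \sum_{d \in D} x_\ell^{d} \le c_\ell \, y_\ell
  && \forall \ell \in L \quad \text{(capacity available only if the link is on)} \\
& x_\ell^{d} \ge 0, \qquad y_\ell \in \{0,1\}.
\end{aligned}
```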

8.
Non-Gaussian spatial data are common in many sciences, such as the environmental sciences, biology and epidemiology. Spatial generalized linear mixed models (SGLMMs) are flexible models for these types of data. Maximum likelihood estimation in SGLMMs is usually cumbersome because of the high-dimensional intractable integrals in the likelihood function, and therefore the most commonly used approach for estimating SGLMMs is Bayesian. This paper proposes a computationally efficient strategy to fit SGLMMs based on the data cloning (DC) method suggested by Lele et al. (2007). The method uses Markov chain Monte Carlo simulations from an artificially constructed distribution to calculate the maximum likelihood estimates and their standard errors. In this paper, the DC method is adapted and generalized to estimate SGLMMs, and some of its asymptotic properties are explored. The performance of the method is illustrated with simulated binary and Poisson count data and with data on car accidents in Mashhad, Iran. The focus is inference in SGLMMs for small and medium data sets.
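For orientation, the general idea behind data cloning (as introduced by Lele et al., 2007), stated generically rather than in this paper's SGLMM-specific form:

```latex
% General property of data cloning, not this paper's SGLMM-specific derivation:
% the "cloned" posterior pretends the data were observed K independent times,
\pi_K(\theta \mid y) \;\propto\; \bigl[L(\theta; y)\bigr]^{K}\,\pi(\theta),
% and, as K grows, it concentrates around the maximum likelihood estimate, so
% MCMC output on \pi_K recovers the MLE and its standard errors via
\hat{\theta}_{\mathrm{MLE}} \;\approx\; \mathrm{E}_{\pi_K}[\theta],
\qquad
\widehat{\mathrm{Var}}\bigl(\hat{\theta}_{\mathrm{MLE}}\bigr) \;\approx\; K\,\mathrm{Var}_{\pi_K}(\theta).
```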

9.
Sampling from a multimodal and high-dimensional target distribution poses a great challenge in Bayesian analysis. A new Markov chain Monte Carlo algorithm, Distributed Evolutionary Monte Carlo (DGMC), is proposed for real-valued problems; it combines attractive features of the distributed genetic algorithm and Markov chain Monte Carlo. The DGMC algorithm evolves a population of Markov chains through genetic operators to simulate the target distribution. Theoretical justification proves that the DGMC algorithm has the target distribution as its stationary distribution. The effectiveness of the algorithm is illustrated by simulating two multimodal distributions and by an application to a real data example.

10.
11.
Automated context aggregation and file annotation for PAN-based computing
This paper presents a method for automatically annotating files created on portable devices with contextual metadata. We achieve this through the combination of two system components. One is a context dissemination mechanism that allows devices in a personal area network (PAN) to maintain a shared aggregate contextual perception. The other is a storage management system that uses this context information to automatically decorate files created on personal devices with annotations. As a result, the user can flexibly browse and look up files generated on the move, based on the contextual situation at the time of their creation. Equally important, the user is relieved of the cumbersome task of manually providing annotations in an explicit fashion. This is especially valuable when generating files on the move using UI-restricted portable devices.
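A toy, hypothetical illustration of the annotation idea (the sidecar-file convention and all field names are invented here, not the mechanism of this system):

```python
# Toy illustration of context-based file annotation: when a file is created,
# write a sidecar JSON file next to it recording the device's current
# contextual situation. The sidecar convention and field names are invented.
import json
import time
from pathlib import Path

def current_context() -> dict:
    """Stand-in for the shared aggregate context a PAN would maintain."""
    return {
        "timestamp": time.time(),
        "location": "office-2F",                      # e.g. from a location beacon
        "nearby_devices": ["phone-anna", "laptop-bob"],
        "activity": "meeting",
    }

def create_annotated_file(path: str, payload: bytes) -> None:
    """Create the file and decorate it with the current context."""
    p = Path(path)
    p.write_bytes(payload)
    Path(str(p) + ".context.json").write_text(json.dumps(current_context(), indent=2))

if __name__ == "__main__":
    create_annotated_file("photo_0001.jpg", b"...binary data...")
```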

12.
This paper proposes and describes TLSA (tree-based layered sharing and aggregation), a system for sharing and aggregating computing resources based on a tree-shaped hierarchical structure. The TLSA system is composed of idle nodes in a peer-to-peer environment, organized into a B-tree-like hierarchy that automatically rebalances itself as nodes join and leave. The tree-shaped network topology is maintained by a self-organizing availability protocol, which keeps message traffic low and the processor load balanced. Through an internal resource-discovery protocol, a node can find the nearest suitable idle computing resources in the system to execute large numbers of subtasks. Simulation results show that, for large-scale subtasks, TLSA can locate idle resources in a very short time with message traffic no greater than O(log_m N); the system thus features low message traffic, decentralization, scalability and self-organization.

13.
The communication pattern of graph-computing applications is dominated by fine-grained point-to-point messages that are random in both time and space, but the network systems of existing high-performance computers handle large volumes of fine-grained communication poorly, which degrades overall performance. Although communication optimization at the application level can effectively improve the performance of graph-computing applications, it places a heavy burden on application developers. This paper therefore proposes and implements a structurally dynamic message-aggregation technique that builds a virtual topology to add intermediate points along communication paths, thereby improving the effectiveness of message aggregation …

14.
Power efficiency must be investigated at every level of a High Performance Computing (HPC) system because of the increasing computational demands of scientific and engineering applications. Focusing on the critical software-level design constraints of a parallel system composed of huge numbers of power-hungry components, we optimize HPC program design to achieve the best possible power performance on the target hardware platform. The power performance of a CUDA Processing Element (PE) is determined by hardware factors, including the power features of each component (CPU, GPU, main memory and PCI buses) and their interconnection architecture, and by software factors, including the algorithm design and the characteristics of the executable instructions it runs. In this paper, approaches to modelling and evaluating the power consumption of large-scale SIMD computation by CUDA PEs on multi-core and GPU platforms are introduced. The model provides design characteristic values at an early programming stage, giving programmers the environment information needed to choose the most power-efficient alternative. Based on the model, CPU dynamic frequency scaling (DFS) can be applied to the CUDA PE architecture, adjusting the CPU frequency to enhance the power efficiency of the entire PE without compromising its computing performance. The power model and the power-efficiency improvements of the new designs have been validated by measuring the new programs on a real GPU multiprocessing system.
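The paper's exact power model is not reproduced in the abstract; a generic component-wise energy model of the kind alluded to, with illustrative symbols rather than the authors' notation, might read:

```latex
% Generic component-wise energy model for one processing element (PE);
% the decomposition and symbols are illustrative, not the paper's model.
E_{\mathrm{PE}}
  \;=\; \sum_{c \,\in\, \{\mathrm{CPU},\,\mathrm{GPU},\,\mathrm{MEM},\,\mathrm{PCI}\}}
        \int_{0}^{T} P_c(t)\,\mathrm{d}t
  \;\approx\; \sum_{c} \bigl(P_c^{\mathrm{idle}}\,T + P_c^{\mathrm{busy}}\,t_c^{\mathrm{busy}}\bigr),
\qquad
P_{\mathrm{CPU}}^{\mathrm{dyn}} \;\propto\; C\,V^{2} f
\;\;\text{(motivating DFS: lower } f \text{ while the GPU is the bottleneck).}
```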

15.
In this study we tested a Bayesian model based on a conjugate gamma/Poisson pair associated with environmental variables derived from satellite data, namely sea surface temperature (SST) and its gradient fields from the Moderate Resolution Imaging Spectroradiometer (MODIS)/Terra, chlorophyll-a concentration from the Sea-viewing Wide Field-of-view Sensor (SeaWiFS)/SeaStar, and surface winds and Ekman pumping from SeaWinds/Quick Scatterometer (QuikSCAT), to predict weekly catch estimates of skipjack tuna in the South Brazil Bight. This was achieved by confronting the fishery data with the model estimates and regressing the results on the satellite data. The fishery data were expressed as an index of catch per unit effort (CPUE), calculated as the weight of fish caught (in tonnes) per fishing week, and were divided into two series: a historical series (1996–1998; 2001) and a validation year (2002). The model's CPUE estimates are in good agreement with the historical weekly CPUE, and the updated weekly estimates explained up to 62% of the weekly CPUE in 2002. In general, the best proxy for the Bayesian weekly estimates is the zonal SST gradient field. The results refine previous knowledge of the influence of SST on the occurrence of skipjack tuna.
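For reference, the standard gamma/Poisson conjugate update, stated generically and without the paper's satellite-covariate structure:

```latex
% Standard gamma/Poisson conjugate update (rate parameterization), stated
% generically; the paper's link to the satellite covariates is not shown.
\lambda \sim \mathrm{Gamma}(\alpha, \beta), \qquad
y_i \mid \lambda \stackrel{\text{iid}}{\sim} \mathrm{Poisson}(\lambda), \quad i = 1,\dots,n
\;\;\Longrightarrow\;\;
\lambda \mid y_{1:n} \sim \mathrm{Gamma}\!\Bigl(\alpha + \textstyle\sum_{i=1}^{n} y_i,\; \beta + n\Bigr),
\qquad
\mathrm{E}[\lambda \mid y_{1:n}] = \frac{\alpha + \sum_i y_i}{\beta + n}.
```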

16.
This paper focuses on the Bayesian posterior mean estimates (Bayes estimates) of the parameter set of Poisson hidden Markov models, in which the observation sequence is generated by a Poisson distribution whose parameter depends on the underlying discrete-time, time-homogeneous Markov chain. Although the most commonly used procedures for obtaining parameter estimates for hidden Markov models are versions of the expectation-maximization and Markov chain Monte Carlo approaches, this paper exhibits an algorithm for calculating the exact posterior mean estimates which, although still cumbersome, has polynomial rather than exponential complexity, and is a feasible alternative for small-scale models and data sets. The paper also presents simulation results comparing the posterior mean estimates obtained by this algorithm with the maximum likelihood estimates obtained by the expectation-maximization approach.
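For concreteness, a Poisson hidden Markov model in standard (not necessarily the paper's) notation:

```latex
% Standard Poisson hidden Markov model, written in generic notation.
X_t \in \{1,\dots,m\}, \qquad
\Pr(X_{t+1} = j \mid X_t = i) = a_{ij} \quad \text{(time-homogeneous chain)},
\qquad
Y_t \mid X_t = i \;\sim\; \mathrm{Poisson}(\lambda_i),
\qquad
\theta = \bigl(\{a_{ij}\},\, \{\lambda_i\},\, \text{initial law of } X_1\bigr).
```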

17.
We propose a novel framework for generating classification rules from relational data. It is a specialized version of a general framework for mining relational data defined in granular computing theory. In the framework proposed here, we define a method for deriving information granules from relational data; such granules are the basis for generating relational classification rules. In our approach we follow the granular computing idea of switching between different levels of granularity of the universe. Thanks to this, a granule-based relational data representation can easily be replaced by another and thereby adjusted to a given data mining task, e.g. classification. A generalized relational data representation, as defined in the framework, can be treated as the search space for generating rules, so the size of the search space may be significantly limited. Furthermore, our framework, unlike others, unifies not only the way the data and the rules to be derived are expressed and specified, but also, in part, the process of generating rules from the data: the rules can be obtained directly from the information granules or constructed based on them.

18.
To address the low performance of aggregation computations in distributed databases used for analytical applications, this paper takes MongoDB as a case study and proposes a method for improving database performance based on shard keys and indexes. First, the choice of shard-key field is guided by an analysis of the workload characteristics; this field must ensure that the data are laid out evenly across the shard nodes. Second, by studying the indexing efficiency of the distributed database, the method further improves computational performance by dropping the index on the queried field, which lets the hardware resources be used fully for the aggregation computation. Experimental results show that a high-cardinality, fine-grained shard key distributes the data evenly across the data nodes of the cluster, that abandoning the index in favour of full-collection scans effectively speeds up aggregation, and that the proposed optimization method effectively improves aggregation performance.
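A rough pymongo sketch of the two steps described above (hashed shard key, then dropping the index on the aggregated field), assuming a running sharded cluster; the database, collection and field names are invented:

```python
# Rough pymongo sketch of the two tuning steps mentioned in the abstract:
# (1) shard the collection on a high-cardinality hashed key so documents
#     spread evenly across shard nodes, and
# (2) drop the index on the aggregated field so the aggregation runs as a
#     full collection scan (the behaviour the abstract reports to be faster).
# Database, collection and field names are invented for illustration.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # mongos router of the cluster

# (1) Enable sharding and shard on a hashed, high-cardinality key.
client.admin.command("enableSharding", "analytics")
client.admin.command("shardCollection", "analytics.events",
                     key={"user_id": "hashed"})

coll = client["analytics"]["events"]

# (2) Drop any index covering the field used by the aggregation.
for index in coll.list_indexes():
    if "amount" in index["key"]:
        coll.drop_index(index["name"])

# Run the aggregation (group by region, sum amounts).
pipeline = [{"$group": {"_id": "$region", "total": {"$sum": "$amount"}}}]
for doc in coll.aggregate(pipeline):
    print(doc)
```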

19.
Research on a self-learning algorithm for Bayesian network structure oriented towards context-aware computing
Based on a detailed analysis of the characteristics of context in context-aware computing, this paper proposes a general self-learning method for Bayesian network structure oriented towards context-aware computing. Given sufficient instance data, the method automatically learns the relationships among contexts in context-aware computing and builds a Bayesian network structure that is used to derive high-level context from low-level context. By effectively exploiting the hierarchical nature of context in context-aware computing, the method substantially optimizes Bayesian network structure learning. Analysis shows that the method significantly reduces the time complexity of the Bayesian network learning process.

20.
Bayesian networks have received much attention in the recent literature. In this article, we propose an approach to learning Bayesian networks using the stochastic approximation Monte Carlo (SAMC) algorithm. Our approach has two nice features. First, it possesses a self-adjusting mechanism and thus essentially avoids the local-trap problem suffered by conventional MCMC simulation-based approaches to learning Bayesian networks. Second, it falls into the class of dynamic importance sampling algorithms: network features can be inferred by dynamically weighted averaging of the samples generated in the learning process, and the resulting estimates can have much lower variation than single-model-based estimates. The numerical results indicate that our approach can mix much faster over the space of Bayesian networks than conventional MCMC simulation-based approaches.
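As background on the dynamically weighted averaging mentioned above, the generic SAMC estimator (not notation specific to this article) re-weights each sample by the adaptive factor of its subregion:

```latex
% Generic SAMC re-weighting: x_t are the samples, J(x_t) the index of the
% subregion containing x_t, \theta_t^{(J(x_t))} the adaptive log-weight of
% that subregion at iteration t, and h the network feature of interest.
\widehat{\mathrm{E}}_{\pi}\bigl[h(x)\bigr]
  \;=\;
  \frac{\sum_{t=1}^{T} e^{\theta_t^{(J(x_t))}}\, h(x_t)}
       {\sum_{t=1}^{T} e^{\theta_t^{(J(x_t))}}}.
```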
