Similar Literature
20 similar documents found (search time: 31 ms)
1.
We describe the design and implementation of our StratOSphere project, a framework that unifies distributed objects and mobile code applications. We begin by examining different mobile code paradigms that distribute processing of code and data resource components across a network. After analyzing these paradigms and presenting a lattice of functionality, we develop a layered architecture for StratOSphere, incorporating higher levels of mobility and interoperability at each successive layer. In our design, we provide an object model that permits objects to migrate to different sites, select among different method implementations, and provide new methods and behavior. We describe how we build new semantics in each software layer, and present sample objects developed for the Alexandria Digital Library Project at UC Santa Barbara, which has been building an information retrieval system for geographically-referenced information and datasets. Using StratOSphere, we have designed a repository that stores the library's holdings; its map, image, and geographical data are viewed as a collection of objects with extensible operations.
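The StratOSphere API itself is not exposed in this abstract; the following is only a minimal Python sketch, with hypothetical names, of the object model idea described above: an object that carries alternative method implementations, selects among them at call time, and can be serialized for migration.

```python
import pickle

class MobileObject:
    """Hypothetical mobile object: data plus named, swappable method
    implementations, serializable so it can migrate between sites."""

    def __init__(self, data):
        self.data = data
        self.impls = {}                       # name -> {variant: source}

    def add_impl(self, name, variant, src):
        # Implementations are kept as source text so they survive pickling.
        self.impls.setdefault(name, {})[variant] = src

    def invoke(self, name, variant, *args):
        # Select among the available implementations at call time.
        env = {}
        exec(self.impls[name][variant], env)
        return env[name](self.data, *args)

    def migrate(self):
        # A real system would ship these bytes to the destination site.
        return pickle.dumps(self)

obj = MobileObject([3, 1, 2])
obj.add_impl("summarize", "fast", "def summarize(d): return len(d)")
obj.add_impl("summarize", "full", "def summarize(d): return sorted(d)")
print(obj.invoke("summarize", "full"))        # [1, 2, 3]
clone = pickle.loads(obj.migrate())           # object state after 'migration'
```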

2.
In the era of big data, we face many types of data, and data volumes are growing at an unprecedented rate, which brings great challenges to data storage, management, and analysis. Traditional single-machine storage engines clearly cannot meet the needs of this explosive data growth; high-performance, highly scalable, low-cost, and easy-to-use distributed storage system infrastructures must be built. This paper comprehensively surveys and analyzes different distributed storage systems and research on their key techniques.

3.
In this work, we focus on distance-based outliers in a metric space, where the status of an entity as to whether it is an outlier is based on the number of other entities in its neighborhood. In recent years, several solutions have tackled the problem of distance-based outliers in data streams, where outliers must be mined continuously as new elements become available. An interesting research problem is to combine the streaming environment with massively parallel systems to provide scalable stream-based algorithms. However, none of the previously proposed techniques refer to a massively parallel setting. Our proposal fills this gap and investigates the challenges in transferring state-of-the-art techniques to Apache Flink, a modern platform for intensive streaming analytics. We thoroughly present the technical challenges encountered and the alternatives that may be applied, of which a micro-clustering-based one is the most efficient. We show speed-ups of up to 2.27 times over advanced non-parallel solutions, by using just an ordinary four-core machine and a real-world dataset. When moving to a three-machine cluster, due to less contention, we manage to achieve both better scalability in terms of the window slide size and the data dimensionality, and even higher speed-ups, e.g., by a factor of more than 11X. Overall, our results demonstrate that outlier mining can be achieved in an efficient and scalable manner. The resulting techniques have been made publicly available as open-source software.
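For concreteness, here is a minimal single-machine sketch of the distance-based definition the paper parallelizes: a point is an outlier if fewer than k other points lie within radius R inside the current window. The authors' Flink operators and micro-cluster summaries are not reproduced here.

```python
from collections import deque

def stream_outliers(points, window=5, R=0.5, k=2):
    """Naive distance-based outlier detection over a sliding window: flag
    a point if fewer than k other window elements lie within distance R.
    This O(window) scan per arrival is what micro-clustering avoids."""
    win = deque(maxlen=window)
    for t, p in enumerate(points):
        win.append(p)
        neighbors = sum(1 for q in win if abs(p - q) <= R) - 1  # exclude self
        yield t, p, neighbors < k

data = [0.10, 0.20, 0.15, 5.00, 0.18, 0.22, 0.17]
for t, p, flagged in stream_outliers(data):
    if flagged:
        print(f"t={t}: {p} is a distance-based outlier")
```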

4.
Sukyoung Ryu, Software, 2016, 46(9): 1219-1238
Programming languages grow over time, which requires frequent changes to language manipulations such as compilation, interpretation, and analysis. Because the very first step of most language manipulations is parsing, whether parsing can adapt to changes easily, quickly, and correctly affects the scalability of language manipulations. Even though various parsing techniques have been well studied theoretically, practical experience with them in scalable frameworks has not been available. In this paper, we present our experiences with parsing in scalable frameworks. We first describe our trials and errors using various parsing techniques in developing parsers for the Fortress programming language. Because Fortress was a scientific language under development, its mathematical and growable syntax introduced new challenges in parsing. We summarize the lessons learned from parsing Fortress, and we share our experience of applying those lessons to parsing the JavaScript programming language. While JavaScript is one of the most widely used languages, JavaScript itself and its diverse variants keep extending its syntax, and the extremely dynamic features of JavaScript add further challenges in parsing. Using automatic generation tools and methods such as staged parsing and automatic extraction and testing of examples in language specifications, our methodology for scalable parsing has proven very effective in practice. Copyright © 2015 John Wiley & Sons, Ltd.
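The abstract mentions "automatic extraction and testing of examples in language specifications"; a toy sketch of that idea follows. The spec markup is assumed, and Python's own parser stands in for a Fortress/JavaScript parser.

```python
import ast
import re

def extract_examples(spec_text):
    """Pull fenced code examples out of a language specification.
    Assumes examples are delimited by ``` fences (an assumption;
    real specs may use other markup)."""
    return re.findall(r"```\n(.*?)```", spec_text, re.DOTALL)

def check_examples(spec_text, parse):
    """Run every extracted example through the parser; report failures."""
    failures = []
    for i, example in enumerate(extract_examples(spec_text)):
        try:
            parse(example)
        except SyntaxError as e:
            failures.append((i, str(e)))
    return failures

spec = "Assignment:\n```\nx = 1 + 2\n```\nBad example:\n```\nx = = 1\n```\n"
print(check_examples(spec, ast.parse))   # flags the second example
```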

5.
This paper describes the design of the Abstract Library for Parallel Search (ALPS), a framework for implementing scalable, parallel algorithms based on tree search. ALPS is specifically designed to support data-intensive algorithms, in which large amounts of data are required to describe each node in the search tree. Implementing such algorithms in a scalable manner is challenging both because of data storage requirements and communication overhead. ALPS incorporates a number of new ideas to address this challenge. The paper also describes the design of two other libraries forming a hierarchy built on top of ALPS. The first is the Branch, Constrain, and Price Software (BiCePS) library, a framework that supports the implementation of parallel branch and bound algorithms in which the bounds are obtained by solving some sort of relaxation, usually Lagrangian. In this layer, the notion of global data objects associated with the variables and constraints is introduced. These global objects provide a connection between the various subproblems in the search tree, but they pose further difficulties for designing scalable algorithms. The other library is the BiCePS linear integer solver (BLIS), a concretization of BiCePS, in which linear programming is used to obtain bounds in each search tree node.
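ALPS itself is a C++ framework; as a generic illustration of the tree-search pattern it parallelizes, here is a serial best-first branch-and-bound loop. All names are ours, not the ALPS API.

```python
import heapq
import itertools

def branch_and_bound(root, bound, branch, is_solution, value):
    """Serial best-first branch and bound over a priority queue of search
    tree nodes. ALPS distributes this node pool across processes; each
    node here would carry the (possibly large) data the paper discusses."""
    best, best_val = None, float("inf")
    tie = itertools.count()                    # tie-breaker for heapq
    heap = [(bound(root), next(tie), root)]
    while heap:
        b, _, node = heapq.heappop(heap)
        if b >= best_val:
            continue                           # prune: cannot beat incumbent
        if is_solution(node):
            if value(node) < best_val:
                best, best_val = node, value(node)
        else:
            for child in branch(node):
                if bound(child) < best_val:
                    heapq.heappush(heap, (bound(child), next(tie), child))
    return best, best_val
```

In BiCePS/BLIS terms, `bound` would solve a relaxation (e.g., a linear program) at each search tree node.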

6.
With the exascale barrier broken, high-performance computing has entered a new era. To meet ever-growing data-access demands, emerging technologies and storage media have been deployed in supercomputers, making their architectures increasingly complex and making it very difficult to localize performance anomalies and system hotspots. To address this, we designed and implemented Beacon+, a lightweight end-to-end I/O performance monitoring, analysis, and diagnosis system for exascale supercomputers. The system performs full-path, real-time monitoring and analysis of each application's data-access behavior without modifying application code or scripts. Through combined online and offline compression and mechanisms such as distributed caching/storage, Beacon+ provides continuous, stable I/O diagnosis services while remaining highly scalable and low-overhead. Deployed on the new-generation Sunway supercomputer, standard I/O benchmarks and production applications demonstrate Beacon+'s low overhead, high accuracy, and efficient I/O diagnosis.
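Beacon+'s internals are not given in the abstract; purely to illustrate the online-compression idea, here is a sketch that folds a stream of per-rank I/O events into bounded per-(rank, file) summaries. The event field layout is our assumption.

```python
from collections import defaultdict

def compress_io_events(events):
    """Aggregate raw I/O events (rank, path, bytes, latency) into compact
    per-(rank, path) summaries: op count, total bytes, min/max latency.
    Bounded memory in exchange for per-event detail, in the spirit of the
    online compression described above (details are our assumptions)."""
    summary = defaultdict(lambda: {"ops": 0, "bytes": 0,
                                   "lat_min": float("inf"), "lat_max": 0.0})
    for rank, path, nbytes, latency in events:
        s = summary[(rank, path)]
        s["ops"] += 1
        s["bytes"] += nbytes
        s["lat_min"] = min(s["lat_min"], latency)
        s["lat_max"] = max(s["lat_max"], latency)
    return dict(summary)

events = [(0, "/scratch/a.dat", 4096, 0.8),
          (0, "/scratch/a.dat", 4096, 1.2),
          (1, "/scratch/b.dat", 1 << 20, 5.0)]
print(compress_io_events(events))
```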

7.
The National Museum of Photography, Film and Television is Britain's leading center for lens-based media. To provide a strengthened platform for the 21st century, the museum created a new flagship gallery that explores and addresses the importance of digital media. "Wired Worlds: Exploring the Digital Frontier" uses innovative presentational and communication approaches to describe the distinctive nature, styles, and techniques of digital media. The 500-m² gallery took nearly three years and $2.2 million to complete. To address the challenges of an evolving field, we adopted an intensive, organic research and development approach. We organized a creative/curatorial project team, 10 contracted international digital media artists and developers, and a London-based design studio. We also consulted with other media professionals, academics, and partner companies.

8.
OpenStreetMap: User-Generated Street Maps
The activity of mapping our environment used to be the preserve of highly trained and well-equipped surveyors and cartographers. The increase in the availability of computing in the wider environment through laptops, hand-held computers, and mobile phones, in combination with the free access to location information from GPS satellites, provided new opportunities for a wider range of people to be part of mapping activities and to create a bottom-up map, generated by users. In this article, we describe the Open Geodata project OpenStreetMap. We provide an overview of the project, the techniques used to collect, organize, and deliver mapping information, and conclude with an analysis of the opportunities and challenges that the project faces.

9.
10.
In today's knowledge-, service-, and cloud-based economy, an overwhelming amount of business-related data is generated daily, at a fast rate, from a wide range of sources. These data increasingly show all the typical properties of big data: wide physical distribution, diversity of formats, nonstandard data models, and independently managed and heterogeneous semantics. In this context, there is a need for new scalable and process-aware services for querying, exploration, and analysis of process data in the enterprise, because (1) process data analysis services should be capable of processing and querying large amounts of data effectively and efficiently, and therefore have to scale well with the infrastructure's scale, and (2) querying services need to let users express their data analysis and querying needs using process-aware abstractions rather than lower-level abstractions. In this paper, we introduce ProcessAtlas, an extensible large-scale process data querying and analysis platform for analyzing process data in the enterprise. The ProcessAtlas platform offers an extensible architecture by adopting a service-based model, so that new analytical services can be plugged into the platform. In ProcessAtlas, we present a domain-specific model for representing process knowledge, i.e., process-level entities, abstractions, and the relationships among them, modeled as graphs. We provide services for discovering, extracting, and analyzing process data. We provide efficient mapping and execution of process-level queries into graph-level queries by using scalable process query services, to cope with process data growth and with the infrastructure's scale. We have implemented ProcessAtlas as a MapReduce-based prototype and report on experiments performed on both synthetic and real-world datasets.
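As a toy illustration of mapping a process-level query onto a graph-level query (ProcessAtlas's actual model and query language are richer; the node and edge names here are hypothetical), using networkx:

```python
import networkx as nx

# Hypothetical process graph: tasks and artifacts as nodes, edges for
# control flow and data production.
g = nx.DiGraph()
g.add_edge("receive_order", "approve_order", kind="flow")
g.add_edge("approve_order", "ship_order", kind="flow")
g.add_edge("approve_order", "invoice.pdf", kind="produces")

def tasks_producing(graph, artifact):
    """The process-level query 'which tasks produce this artifact?'
    mapped to a graph-level query: in-edges filtered by edge kind."""
    return [u for u, v, d in graph.in_edges(artifact, data=True)
            if d.get("kind") == "produces"]

print(tasks_producing(g, "invoice.pdf"))   # ['approve_order']
```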

11.
Active XML (AXML) documents combine extensional XML data with intensional data defined through Web service calls. The dynamic properties of these documents pose challenges to both storage and data materialization techniques. In this paper, we present ARAXA, a non-intrusive approach to storing and managing AXML documents. We also define a methodology for materializing AXML documents at query time. ARAXA's storage approach is based on plain relational tables and user-defined functions of an object-relational DBMS to trigger the service calls. By using a DBMS we benefit from efficient storage tools and query optimization. Approaches without DBMS support have to process XML in main memory or provide virtual memory solutions. One of the main advantages of ARAXA is that AXML documents need not be loaded into main memory at query processing time, which is crucial when dealing with large documents. Experimental results with the ARAXA prototype show that our approach is scalable and capable of dealing with large AXML documents.
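A toy of AXML-style materialization at query time follows. It is not ARAXA's relational/UDF machinery; it only shows embedded service calls being expanded lazily, with a stand-in function in place of a real Web service.

```python
import xml.etree.ElementTree as ET

def fake_weather_service():            # stand-in for a real Web service
    return "21C"

SERVICES = {"getWeather": fake_weather_service}

def materialize(elem):
    """Replace <call service='...'/> children with service results at
    query time, so the document is expanded only where it is queried."""
    for child in list(elem):
        if child.tag == "call":
            result = ET.SubElement(elem, "data")
            result.text = SERVICES[child.get("service")]()
            elem.remove(child)
        else:
            materialize(child)

doc = ET.fromstring('<city name="Nice"><call service="getWeather"/></city>')
materialize(doc)
print(ET.tostring(doc, encoding="unicode"))
# <city name="Nice"><data>21C</data></city>
```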

12.
Data distribution management (DDM) is one of the services defined by the DoD High Level Architecture and is necessary to provide efficient, scalable mechanisms for distributing state updates and interaction information in large scale distributed simulations. In this paper, we focus on data distribution management mechanisms (also known as filtering) used for real time training simulations. We propose a new method of DDM, which we refer to as the dynamic grid-based approach. Our scheme is based on a combination of a fixed grid-based method, known for its scalability, and a region-based strategy, which provides greater accuracy than the fixed grid-based method. We describe our DDM algorithm, its implementation, and report on the performance results that we have obtained using the RTI-Kit framework. Our results clearly indicate that our scheme is scalable and that it reduces the message overhead by 40%, and the number of multicast groups used by 98% when compared to the fixed grid-based allocation scheme using 10 nodes, 1000 objects, and 20,000 grid cells.
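To make the grid-based filtering concrete, here is a small sketch (our names, illustrative cell size): regions map to the fixed grid cells they overlap, each cell standing for a multicast group; the dynamic scheme described above adds region-level checks on top to regain accuracy.

```python
CELL = 10.0                       # illustrative grid cell edge length

def cells_for_region(x0, y0, x1, y1):
    """Map an axis-aligned region to the set of fixed grid cells it
    overlaps; in DDM each cell corresponds to a multicast group."""
    return {(i, j)
            for i in range(int(x0 // CELL), int(x1 // CELL) + 1)
            for j in range(int(y0 // CELL), int(y1 // CELL) + 1)}

def matches(update_region, subscription_region):
    """Fixed-grid filtering: traffic flows only if the two regions share
    a cell. The dynamic grid-based scheme then tests exact region overlap
    within shared cells, cutting messages and multicast groups further."""
    return bool(cells_for_region(*update_region) &
                cells_for_region(*subscription_region))

print(matches((5, 5, 12, 12), (11, 11, 30, 30)))   # True: share cell (1, 1)
print(matches((0, 0, 4, 4), (50, 50, 60, 60)))     # False: no shared cell
```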

13.
Microprocessor speed has been growing exponentially faster than memory system speed in the recent past. This paper explores the long term implications of this trend. We define scalable locality, which measures our ability to apply ever faster processors to increasingly large problems (just as scalable parallelism measures our ability to apply more numerous processors to larger problems). We provide an algorithm called time skewing that derives an execution order and storage mapping to produce any desired degree of locality, for certain programs that can be made to exhibit scalable locality. Our approach is unusual in that it derives the transformation from the algorithm's dataflow (a fundamental characteristic of the algorithm) instead of searching a space of transformations of the execution order and array layout used by the programmer (artifacts of the expression of the algorithm). We provide empirical results for data sets using L2 cache, main memory, and virtual memory.
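Here is a runnable sketch of the execution-order half of the idea for a 1D three-point stencil (our simplification: every time level is stored, whereas the paper also derives a compact storage mapping). Iterations are visited in tiles over the skewed coordinate j' = j + t, so each cache-sized strip of data is reused across all time steps before moving on.

```python
import numpy as np

def jacobi_time_skewed(a0, T, B=64):
    """1D 3-point Jacobi stencil for T steps, visited in time-skewed
    order: tiles over the skewed coordinate j' = j + t. Each tile sweeps
    all time steps over one strip, reusing it while it is still in cache.
    Tiles run in increasing j' order, which respects the dataflow
    (every dependence points to the same or an earlier tile)."""
    N = len(a0)
    A = np.empty((T + 1, N))
    A[0] = a0
    A[:, 0], A[:, -1] = a0[0], a0[-1]        # fixed boundary values
    for jb in range(0, N + T, B):            # tiles in skewed space
        for t in range(1, T + 1):
            lo = max(1, jb - t)              # unskew: j = j' - t
            hi = min(N - 1, jb + B - t)
            for j in range(lo, hi):
                A[t, j] = (A[t-1, j-1] + A[t-1, j] + A[t-1, j+1]) / 3.0
    return A[T]

a0 = np.random.rand(1000)
assert np.allclose(jacobi_time_skewed(a0, T=8, B=64),
                   jacobi_time_skewed(a0, T=8, B=10**9))  # matches untiled
```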

14.
The reliability and scalability of large-scale network storage systems face big challenges, which call for a reliable, scalable, and efficient data placement algorithm. Previous techniques only partially satisfy these requirements. In this work, we develop an effective hybrid approach, RSEDP, which combines reliable replication data placement (RRDP) with scalable and efficient data placement (SEDP) to achieve the requirements mentioned above. RRDP distributes replicated data over large-scale heterogeneous network storage systems such that copies of the same data are placed on different devices and are not biased toward consecutive devices, achieving a high redundancy degree and failure resilience. SEDP assigns data evenly among devices according to their weights and scales well as the system expands or shrinks. To take advantage of both RRDP and SEDP, RSEDP integrates them by categorizing data into hot and cold data based on access frequency, placing hot data with RRDP and distributing the remainder with SEDP. Theoretical analysis and an experimental study show that the combined RSEDP increases redundancy degree and failure resilience, and offers good scalability and time efficiency with small memory overhead.
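The abstract does not give the placement formulas; as a flavor of the hot/cold split, here is a sketch that uses rendezvous (highest-random-weight) hashing as a stand-in for both components. All device names, weights, and thresholds are illustrative.

```python
import hashlib
import math

DEVICES = [("d0", 1.0), ("d1", 2.0), ("d2", 1.0), ("d3", 2.0)]  # (id, weight)

def _uniform(key, dev):
    """Hash (key, device) to a pseudo-uniform number in (0, 1)."""
    h = int(hashlib.md5(f"{key}:{dev}".encode()).hexdigest(), 16)
    return (h % 10**9 + 1) / (10**9 + 2)

def place_cold(key):
    """Weight-proportional single placement (stand-in for SEDP): weighted
    rendezvous hashing stays stable when devices are added or removed."""
    return max(DEVICES, key=lambda d: -d[1] / math.log(_uniform(key, d[0])))[0]

def place_hot(key, replicas=3):
    """Replicated placement (stand-in for RRDP): the top-scoring distinct
    devices, so copies of one object never share a device."""
    ranked = sorted(DEVICES, key=lambda d: _uniform(key, d[0]), reverse=True)
    return [d[0] for d in ranked[:replicas]]

def place(key, access_count, hot_threshold=100):
    """RSEDP's hybrid idea: hot data goes through replication, cold data
    through scalable weighted placement."""
    return place_hot(key) if access_count >= hot_threshold else [place_cold(key)]

print(place("blockA", access_count=500))   # three distinct devices
print(place("blockB", access_count=3))     # one weight-chosen device
```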

15.
Advances in Cloud computing technology and the availability of affordable and easy-to-use Cloud services are enabling a multitude of scientific applications to use these resources as primary or secondary computing infrastructure. The urban and built environment research domain is one area that can benefit greatly from Cloud computing. Global population growth and the increase in the size and population of cities raise many challenges for governments, planners, and researchers alike. The Australian Urban Research Infrastructure Network (AURIN—http://www.aurin.org.au) project has been tasked with developing an advanced platform (e-Infrastructure) across Australia to tackle these challenges. The platform leverages large-scale Cloud resources to provide federated access to, at present, over 1100 datasets from major and often definitive government and industry data-rich organisations, and to support scalable data processing and visualisation. The original AURIN tools were developed using the object modelling system (OMS) and supported integrated workflows to define and enact/re-enact scientific processes. More recently the work has evolved to focus on delivering a workbench offering a rich range of tools through an extensible workflow environment. In this paper, we provide the background to AURIN, including the scientific drivers shaping the work and the realisation of the Cloud-based AURIN environment. We focus in particular on the workflow environment and show how it seamlessly utilizes the Cloud for urban research processes, focused especially on data-intensive spatial analysis. We illustrate the use of this workflow environment across a range of case studies reflecting urban research activities.

16.
Classical approaches to remote visualization and collaboration used in Computer-Aided Design and Engineering (CAD/E) applications are no longer appropriate due to the increasing amount of data generated, especially over standard networks. We introduce a lightweight computing platform for scientific simulation, collaboration in engineering, 3D visualization, and big data management. This ICT-based platform provides scientists an "easy-to-integrate" generic tool, enabling worldwide collaboration and remote processing for any kind of data. The service-oriented architecture is based on the cloud computing paradigm and relies on standard internet technologies to be efficient on a large panel of networks and clients. In this paper, we discuss the need for innovations in (i) pre- and post-processing visualization services, (ii) scalable compression and transmission methods for large 3D scientific datasets, (iii) collaborative virtual environments, and (iv) collaboration across multiple CAD/E domains. We propose our open platform for collaborative simulation and scientific big data analysis. The platform is now available as an open project with all core components licensed under LGPL v2.1. We provide two examples of its use in CAD/E for sustainability engineering: one academic application and one industrial case study. First, we consider chemical process engineering, showing the development of a domain-specific service. With the rise of global warming issues and the growing importance granted to sustainable development, chemical process engineering has turned to thinking more and more environmentally; the chemical engineer now takes into account not only the engineering and economic criteria of a process, but also its environmental and social performance. Second, an example of natural hazards management illustrates the efficiency of our approach for remote collaboration involving big data exchange and analysis between distant locations. Finally, we outline the platform's benefits and our next activities in innovation techniques and inventive design.

17.
Tomographic imaging and computer simulations are increasingly yielding massive datasets, and interactive, exploratory visualizations have rapidly become indispensable tools for studying large volumetric imaging and simulation data. Our scalable isosurface visualization framework on commodity off-the-shelf clusters is an end-to-end parallel and progressive platform, from initial data access to the final display. Interactive browsing of extracted isosurfaces is made possible by parallel isosurface extraction and rendering, in conjunction with a new specialized piece of image compositing hardware called Metabuffer. In this paper, we focus on back-end scalability by introducing a fully parallel, out-of-core isosurface extraction algorithm. It achieves scalability by using both parallel and out-of-core processing and parallel disks: it statically partitions the volume data across parallel disks with a balanced workload spectrum, and builds I/O-optimal external interval trees to minimize the number of I/O operations needed to load large data from disk. We also describe an isosurface compression scheme that is efficient for progressive extraction, transmission, and storage of isosurfaces.
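The selection problem at the heart of the back end can be phrased as interval stabbing: each cell contributes the interval [min, max] of its scalar values, and exactly the intervals containing the isovalue need to be fetched and triangulated. A small in-memory illustration follows; the paper's contribution is answering this query I/O-optimally with external interval trees, which is not shown here.

```python
import random

def build_cells(n=20, seed=1):
    """Fake cells: (cell id, min scalar value, max scalar value)."""
    rng = random.Random(seed)
    cells = []
    for cid in range(n):
        a, b = sorted(rng.uniform(0.0, 10.0) for _ in range(2))
        cells.append((cid, a, b))
    return cells

def stab(cells, isovalue):
    """Cells whose scalar range contains the isovalue are the only ones
    the isosurface crosses. An (external) interval tree answers this in
    O(log n + k) instead of this O(n) scan."""
    return [cid for cid, lo, hi in cells if lo <= isovalue <= hi]

cells = build_cells()
print(stab(cells, 5.0))    # ids of cells to load and triangulate
```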

18.
To address the common problems currently facing integrated source-grid-load-storage dispatch — the lack of integrated control strategies for "new energy + large-scale storage", insufficient active-support capability, and a proliferation of siloed systems — this paper takes China's first integrated "source-grid-load-storage" project as a case study to explore the application of an integrated dispatch and management model. The main work covers technical innovations including joint regional wind-solar power forecasting, key "grid-source" coordination mechanisms for grid-friendly renewable plants, coordinated optimal control and protection of wind-solar-storage stations, and fast primary frequency regulation and virtual-inertia support for renewables, together with the development of an integrated data supervisory-control platform for wind-solar-storage stations and an innovative dual dispatch mode of "centralized control + direct control". The results show that the integrated platform supports source-grid-load-storage operation in four respects — observability, controllability, adjustability, and support capability — satisfying the requirements of integrated intelligent control of renewable plants, grid-friendly interconnection, and stable wind-solar-storage operation, and achieving the goals of panoramic intelligence, grid friendliness, and wind-solar-storage coordination. The findings help accelerate integrated source-grid-load-storage construction, innovate the integrated dispatch management model, and provide a reference for a standardized implementation path.
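The abstract gives no control-law details; as textbook background on the fast primary frequency regulation and virtual-inertia support it mentions, here is a sketch of a droop-plus-RoCoF response. All coefficients are illustrative, not the project's actual settings.

```python
def support_power(delta_f, dfdt, droop_kw_per_hz=500.0, inertia_kw_s=200.0,
                  deadband_hz=0.033, p_max_kw=2000.0):
    """Textbook-style primary-frequency + virtual-inertia response
    (illustrative coefficients): the power correction is a droop term on
    the frequency deviation plus an inertial term on the rate of change
    of frequency (RoCoF), limited to the plant's headroom."""
    droop = -droop_kw_per_hz * delta_f if abs(delta_f) > deadband_hz else 0.0
    inertia = -inertia_kw_s * dfdt
    return max(-p_max_kw, min(p_max_kw, droop + inertia))

# Grid frequency sags 0.2 Hz and is still falling at 0.1 Hz/s:
print(support_power(delta_f=-0.2, dfdt=-0.1))   # plant injects 120.0 kW extra
```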

19.
20.
As aircraft systems grow more complex, the volume of flight data to be processed keeps increasing, and so do the demands on data-read speed. This paper designs a massive-data management system built on the Hadoop distributed platform: its Linux cluster technology makes storage capacity easy to expand, and the distributed computing framework meets the requirement of high-speed data processing. The system features easily scalable data volume, fast processing, high security, and straightforward implementation, and satisfies the requirements of flight data storage and management well.
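For a minimal flavor of processing such flight records on a Hadoop cluster via Hadoop Streaming: the record layout and field names below are our assumptions (the paper does not specify them), and mapper and reducer would normally be separate scripts.

```python
#!/usr/bin/env python3
"""Hadoop Streaming sketch: peak altitude per aircraft from hypothetical
CSV flight records 'tail_number,parameter,value'. Submit with e.g.:
  hadoop jar hadoop-streaming.jar \
      -mapper 'flight.py map' -reducer 'flight.py reduce' ...
"""
import sys

def mapper():
    for line in sys.stdin:
        tail, param, value = line.strip().split(",")
        if param == "altitude":                  # keep one parameter
            print(f"{tail}\t{value}")

def reducer():
    # Streaming delivers mapper output sorted by key; track max per key.
    current, peak = None, float("-inf")
    for line in sys.stdin:
        tail, value = line.strip().split("\t")
        if tail != current and current is not None:
            print(f"{current}\t{peak}")
            peak = float("-inf")
        current = tail
        peak = max(peak, float(value))
    if current is not None:
        print(f"{current}\t{peak}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```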
