This special issue and our editorial celebrate 10 years of progress with data-intensive or scientific workflows. There have been very substantial advances in the representation of workflows and in the engineering of workflow management systems (WMS). The creation and refinement stages are now well supported, with a significant improvement in usability. Improved abstraction supports cross-fertilisation between different workflow communities and consistent interpretation as WMS evolve. Through such re-engineering the WMS deliver much improved performance, significantly increased scale and sophisticated reliability mechanisms. Further improvement is anticipated from substantial advances in optimisation. We invited papers from those who have delivered these advances and selected 14 to represent today’s achievements and representative plans for future progress. This editorial introduces those contributions with an overview and categorisation of the papers. Furthermore, it elucidates responses from a survey of major workflow systems, which provides evidence of substantial progress and a structured index of related papers. We conclude with suggestions on areas where further research and development is needed and offer a vision of future research directions.

Visualization workflows are important services for expert users to analyze watersheds when using our HydroTerre end-to-end workflows. Analysis is an interactive and iterative process and we demonstrate that the expert user can focus on model results, not data preparation, by using a web application to rapidly create, tune, and calibrate hydrological models anywhere in the continental USA (CONUS). The HydroTerre system captures user interaction for provenance and reproducibility to share modeling strategies with modelers. Our end-to-end workflow consists of four workflows. The first is data workflows using Essential Terrestrial Variables (ETV) data sets that we demonstrated to construct watershed models anywhere in the CONUS (Leonard and Duffy, 2013). The second is data-model workflows that transform the data workflow results to model inputs. The model inputs are consumed in the third workflow, model workflows (Leonard and Duffy, 2014a) that handle distribution of data and model within High Performance Computing (HPC) environments. This article focuses on our fourth workflow, visualization workflows, which consume the first three workflows to form an end-to-end system to create and share hydrological model results efficiently for analysis and peer review. We show how visualization workflows are incorporated into the HydroTerre infrastructure design and demonstrate the efficiency and robustness for an expert modeler to produce, analyze, and share new hydrological models using CONUS national datasets.

The paper presents a platform for distributed computing, developed using the latest software technologies and computing paradigms to enable big data mining. The platform, called ClowdFlows, is implemented as a cloud-based web application with a graphical user interface which supports the construction and execution of data mining workflows, including web services used as workflow components. As a web application, the ClowdFlows platform poses no software requirements and can be used from any modern browser, including mobile devices. The constructed workflows can be declared either as private or public, which enables sharing the developed solutions, data and results on the web and in scientific publications. The server-side software of ClowdFlows can be multiplied and distributed to any number of computing nodes. From a developer’s perspective the platform is easy to extend and supports distributed development with packages. The paper focuses on big data processing in the batch and real-time processing mode. Big data analytics is provided through several algorithms, including novel ensemble techniques, implemented using the map-reduce paradigm and a special stream mining module for continuous parallel workflow execution. The batch mode and real-time processing mode are demonstrated with practical use cases. Performance analysis shows the benefit of using all available data for learning in distributed mode compared to using only subsets of data in non-distributed mode. The ability of ClowdFlows to handle big data sets and its nearly perfect linear speedup is demonstrated.

Computational workflows are a powerful paradigm to represent and manage complex applications, particularly in large-scale distributed scientific data analysis. Workflows represent application components that result in individual computations as well as their interdependences in terms of dataflow. Workflow systems use these representations to manage various aspects of workflow creation and execution for users, such as the automatic assignment of execution resources. This article describes an approach to automating a new aspect of the process: the selection of application components and data sources. We present a novel approach that enables users to specify varying degrees of detail and amount of constraints in a workflow request, including the specification of constraints on input, intermediate or output data in the workflow, abstract workflow component classes rather than specific component implementations, and generic reusable workflow templates that express a pre-defined combination of components. The algorithm elaborates the user request into a set of fully ground workflows with specific choices of data sources and codes to be used so that they can be submitted for mapping and execution. The algorithm searches through the space of possible candidate workflows by creating increasingly more specialized versions of the original template and eliminating candidates that violate constraints cumulated in the candidate workflow as components and data sources are selected. A novel feature of our approach is that it assumes a distributed architecture where data and component catalogues are separate from the workflow system. The algorithm explicitly poses queries to external catalogues, and therefore any reasoning regarding data or component properties is not assumed to occur within the workflow system. We describe our implementation of this approach in the Wings workflow system. 
This implementation uses the W3C Web Ontology Language and associated reasoners to implement the workflow system as well as the data and component catalogues. This research demonstrates the use of artificial intelligence techniques to support the kinds of automation envisioned by the scientific community for large-scale distributed scientific data analysis.
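The search described in this abstract, which specialises a template step by step and discards candidates that violate constraints, can be illustrated with a minimal sketch. It is not the Wings implementation: the catalogue contents, the `query_catalogue` helper, and the single "data format must match" constraint are all hypothetical, and a plain dictionary stands in for the external, ontology-backed catalogues.

```python
# Hypothetical component catalogue: abstract class -> concrete implementations,
# each annotated with the data format it consumes and produces.
CATALOGUE = {
    "Filter": [
        {"name": "csv_filter", "consumes": "csv", "produces": "csv"},
        {"name": "json_filter", "consumes": "json", "produces": "json"},
    ],
    "Classifier": [
        {"name": "tree_classifier", "consumes": "csv", "produces": "model"},
    ],
}


def query_catalogue(component_class):
    """Stand-in for a query posed to an external component catalogue."""
    return CATALOGUE.get(component_class, [])


def ground_workflows(template, input_format):
    """Elaborate a linear template of abstract classes into concrete pipelines.

    Candidates are specialised one step at a time; a candidate is dropped as
    soon as the chosen component cannot consume the format produced so far.
    """
    candidates = [([], input_format)]
    for component_class in template:
        next_candidates = []
        for chosen, fmt in candidates:
            for comp in query_catalogue(component_class):
                if comp["consumes"] == fmt:  # constraint check
                    next_candidates.append(
                        (chosen + [comp["name"]], comp["produces"])
                    )
        candidates = next_candidates
    return [chosen for chosen, _ in candidates]
```

Calling `ground_workflows(["Filter", "Classifier"], "csv")` keeps only the pipeline whose formats chain together, while a `"json"` input leaves no valid candidate because the only classifier in this toy catalogue expects CSV.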

Workflow technology continues to play an important role as a means for specifying and enacting computational experiments in modern science. Reusing and re-purposing workflows allow scientists to do new experiments faster, since the workflows capture useful expertise from others. As workflow libraries grow, scientists face the challenge of finding workflows appropriate for their task, understanding what each workflow does, and reusing relevant portions of a given workflow. We believe that workflows would be easier to understand and reuse if high-level views (abstractions) of their activities were available in workflow libraries. As a first step towards obtaining these abstractions, we report in this paper on the results of a manual analysis performed over a set of real-world scientific workflows from Taverna, Wings, Galaxy and Vistrails. Our analysis has resulted in a set of scientific workflow motifs that outline (i) the kinds of data-intensive activities that are observed in workflows (Data-Operation motifs), and (ii) the different manners in which activities are implemented within workflows (Workflow-Oriented motifs). These motifs are helpful to identify the functionality of the steps in a given workflow, to develop best practices for workflow design, and to develop approaches for automated generation of workflow abstractions.

Simulation has become a commonly employed first step in evaluating novel approaches towards resource allocation and task scheduling on distributed architectures. However, existing simulators fall short in their modeling of the instability common to shared computational infrastructure, such as public clouds. In this work, we present DynamicCloudSim which extends the popular simulation toolkit CloudSim with several factors of instability, including inhomogeneity and dynamic changes of performance at runtime as well as failures during task execution. As a validation of the introduced functionality, we simulate the impact of instability on scientific workflow scheduling by assessing and comparing the performance of four schedulers in the course of several experiments both in simulation and on real cloud infrastructure. Results indicate that our model seems to adequately capture the most important aspects of cloud performance instability. The source code of DynamicCloudSim and the examined schedulers is available at https://code.google.com/p/dynamiccloudsim/.

In this article we present our recent efforts in designing a comprehensive consistent scientific workflow, nicknamed Wolf2Pack, for force-field optimization in the field of computational chemistry. Atomistic force fields represent a multiscale bridge that connects high-resolution quantum mechanics knowledge to coarser molecular mechanics-based models. Force-field optimization has so far been a time-consuming and error-prone process, and is a topic where the use of a scientific workflow can provide obvious great benefits. As a case study we generate a gas-phase force field for methanol using Wolf2Pack, with special attention given toward deriving partial atomic charges.

Over the past decade process mining has emerged as a new analytical discipline able to answer a variety of questions based on event data. Event logs have a very particular structure; events have timestamps, refer to activities and resources, and need to be correlated to form process instances. Process mining results tend to be very different from classical data mining results, e.g., process discovery may yield end-to-end process models capturing different perspectives rather than decision trees or frequent patterns. A process-mining tool like ProM provides hundreds of different process mining techniques ranging from discovery and conformance checking to filtering and prediction. Typically, a combination of techniques is needed and, for every step, there are different techniques that may be very sensitive to parameter settings. Moreover, event logs may be huge and may need to be decomposed and distributed for analysis. These aspects make it very cumbersome to analyze event logs manually. Process mining should be repeatable and automated. Therefore, we propose a framework to support the analysis of process mining workflows. Existing scientific workflow systems and data mining tools are not tailored towards process mining and the artifacts used for analysis (process models and event logs). This paper structures the basic building blocks needed for process mining and describes various analysis scenarios. Based on these requirements we implemented RapidProM, a tool supporting scientific workflows for process mining. Examples illustrating the different scenarios are provided to show the feasibility of the approach.
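The event-log structure mentioned in this abstract (timestamped events that must be correlated into process instances) can be made concrete with a small sketch. It does not use ProM or RapidProM; the `(case_id, timestamp, activity)` tuple layout and the function names `build_traces` and `variant_counts` are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def build_traces(events):
    """Correlate events into process instances (traces).

    `events` is an iterable of (case_id, timestamp, activity) tuples, a
    hypothetical minimal event-log layout. Events sharing a case id form one
    trace, ordered by timestamp.
    """
    by_case = defaultdict(list)
    for case_id, timestamp, activity in events:
        by_case[case_id].append((timestamp, activity))
    return {
        case_id: tuple(activity for _, activity in sorted(case_events))
        for case_id, case_events in by_case.items()
    }


def variant_counts(traces):
    """Count how many process instances follow each activity sequence."""
    return Counter(traces.values())


log = [
    ("order-1", 2, "ship"),
    ("order-1", 1, "pay"),
    ("order-2", 1, "pay"),
    ("order-2", 2, "ship"),
    ("order-3", 1, "pay"),
]
traces = build_traces(log)
variants = variant_counts(traces)
```

Grouping by case id and sorting by timestamp is the correlation step; counting identical activity sequences is one simple input that discovery techniques could then build on.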

Commercial applications for the arts tend to enforce a division between the use of learnable direct manipulation interfaces and the use of powerful, well supported programming environments. In contrast, programmable applications integrate these two software-design paradigms (i.e. direct manipulation and programming languages) and thereby attempt to exploit the strengths of both. A sample graphics application, SchemePaint, is outlined, and some of the issues related to the creation of programmable applications for the arts are discussed.

Schools have long fallen short in helping students to develop the skills necessary to engage in scientific inquiry. Emerging technology-based programs can potentially address this shortfall, but the field lacks clear models of instructional materials capable of doing so. The VELscience project seeks to provide a model for one type of software (virtual environments for learning, or VELs) designed to engage students in student-directed inquiry. In student-directed inquiry, students are given a topic or task, then pose questions, design the investigation, collect and analyze data, draw conclusions and publish their findings. This study examined the effectiveness of this model through observations of middle school students who completed Hurricane Hal, a VEL in which students determine the ecological impact of a natural disaster on a wetlands ecosystem. The results suggest that this program successfully engaged students in student-directed inquiry.

We investigate interoperability aspects of scientific workflow systems and argue that the workflow execution environment, the model of computation (MoC), and the workflow language form three dimensions that must be considered depending on the type of interoperability sought: at the activity, sub-workflow, or workflow levels. With a focus on the problems that affect interoperability, we illustrate how these issues are tackled by current scientific workflows as well as how similar problems have been addressed in related areas. Our long-term objective is to achieve (logical) interoperability between workflow systems operating under different MoCs, using distinct language features, and sharing activities running on different execution environments.

The complexity and diversity of data and processing now being used in geoscientific data analysis are increasing at such a pace that it is difficult to maintain a single software package capable of performing all required tasks. A typical solution is to use a number of commercial software packages loosely linked via file format conversion routines or other customized communication paths. Creating and maintaining these paths is made difficult by both the number of distinct packages being used and the evolutionary changes taking place within each package. In this paper we attempt to create a generic model for a geological data processing task, identify some of the difficulties involved in performing these tasks using current technology, and outline a computational architecture that may correct many of the deficiencies of existing technology. Our goal is not to provide a detailed design but to present some key issues and possible solutions as a foundation for discussion and investigation.

Ubiquitous manufacturing (UM) features a “design anywhere, make anywhere, sell anywhere, and at any time” paradigm that grants factories an unlimited production capacity and permanent manufacturing service availability. However, the research and applications of UM have been limited thus far to in-factory operations or logistics. For this reason, this study reviews the current practices of UM, discusses the challenges faced by researchers and practitioners, and determines potential opportunities for UM in the near future. Finally, we conclude that the success of UM depends on the quality of the manufacturing services deployed, and that UM is a realizable target for Industry 4.0.

Computer architecture: challenges and opportunities for the next decade   Total citations: 2, self-citations: 0, citations by others: 2
Agerwala, T.; Chatterjee, S. IEEE Micro, 2005, 25(3): 58-69
Computer architecture forms the bridge between application needs and the capabilities of the underlying technologies. As application demands change and technologies cross various thresholds, computer architects must continue innovating to produce systems that can deliver needed performance and cost effectiveness. Our challenge as computer architects is to deliver end-to-end performance growth at historical levels in the presence of technology discontinuities. We can address this challenge by focusing on power optimization at all levels. Key levers are the development of power-optimized building blocks, deployment of chip-level multiprocessors, increasing use of accelerators and offload engines, widespread use of scale-out systems, and system-level power optimization.

Sophisticated symbol processing in connectionist systems can be supported by two primitive representational techniques called Relative-Position Encoding (RPE) and Pattern-Similarity Association (PSA), and a selection technique called Temporal-Winner-Take-All (TWTA). TWTA effects winner-take-all selection on the basis of fine signal-timing differences as opposed to activation-level differences. Both RPE and PSA are for the encoding of highly temporary associations between representations. RPE is based on the way activation patterns are positioned relative to each other within a network. Under PSA, two patterns are temporarily associated if they have (suitable) subpatterns that are (suitably) similar. The article shows how particular versions of the primitives are used to good effect in a system called Conposit/SYLL. This is a connectionist implementation of a slightly simplified version of a complex existing psychological theory, namely Johnson-Laird's account of syllogistic reasoning. The computational processes in this theory present a major implementational challenge to connectionism. The challenge lies in the mutability, multiplicity, and diversity of the working memory structures, and the elaborateness of the processing needed for them. Conposit/SYLL's techniques allow it to meet the challenge. The implementation of symbolic processing in Conposit/SYLL is an interesting application of connectionism partly because it significantly affects the design of the symbolic processing level itself. In particular, it encourages the use of associative as opposed to pointer-based data structures, and the use of random as opposed to ordered iteration over sets of data structures. In addition, the article discusses Conposit/SYLL's somewhat unusual variable-binding approach.
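The contrast this abstract draws between timing-based and activation-based selection can be shown with a toy sketch. It is not Conposit/SYLL's mechanism: the assumption that the earliest-arriving signal wins, and the function names below, are illustrative only.

```python
def activation_winner_take_all(activations):
    """Conventional selection: the unit with the highest activation wins."""
    return max(activations, key=activations.get)


def temporal_winner_take_all(arrival_times):
    """Timing-based selection: the unit whose signal arrives first wins.

    Assumes "winning" means earliest arrival; activation levels are ignored.
    """
    return min(arrival_times, key=arrival_times.get)


# Three hypothetical units with equal activation but slightly different timing.
activations = {"u1": 1.0, "u2": 1.0, "u3": 1.0}
arrival_times = {"u1": 0.0103, "u2": 0.0101, "u3": 0.0107}

timing_winner = temporal_winner_take_all(arrival_times)
```

With equal activation levels an activation-based rule has nothing to separate the units, whereas the timing-based rule still yields a unique winner, which is what makes it usable for selecting among otherwise identical candidates.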

We investigate sparse non-linear denoising of functional brain images by kernel principal component analysis (kernel PCA). The main challenge is the mapping of denoised feature space points back into input space, also referred to as “the pre-image problem”. Since the feature space mapping is typically not bijective, pre-image estimation is inherently ill-posed. In many applications, including functional magnetic resonance imaging (fMRI) data, which is the application used for illustration in the present work, it is of interest to denoise a sparse signal. To meet this objective we investigate sparse pre-image reconstruction by Lasso regularization. We find that sparse estimation provides better brain state decoding accuracy and a more reproducible pre-image. These two important metrics are combined in an evaluation framework which allows us to optimize both the degree of sparsity and the non-linearity of the kernel embedding. The latter result provides evidence of signal manifold non-linearity in the specific fMRI case study.

Recommender systems have been widely used in different application domains including energy-preservation, e-commerce, healthcare, social media, etc. Such applications require the analysis and mining of massive amounts of various types of user data, including demographics, preferences, social interactions, etc. in order to develop accurate and precise recommender systems. Such datasets often include sensitive information, yet most recommender systems are focusing on the models’ accuracy and ignore issues related to security and the users’ privacy. Despite the efforts to overcome these problems using different risk reduction techniques, none of them has been completely successful in ensuring cryptographic security and protection of the users’ private information. To bridge this gap, the blockchain technology is presented as a promising strategy to promote security and privacy preservation in recommender systems, not only because of its security and privacy salient features, but also due to its resilience, adaptability, fault tolerance and trust characteristics. This paper presents a holistic review of blockchain-based recommender systems covering challenges, open issues and solutions. Accordingly, a well-designed taxonomy is introduced to describe the security and privacy challenges, overview existing frameworks and discuss their applications and benefits when using blockchain before indicating opportunities for future research.

Internet technologies have great potential to fundamentally change banks and the banking industry. The opportunities that e-banking services and technologies offer the banking sector to fulfil existing customer needs and to attract prospective customers are the driving forces for banks to design, develop and operate their own e-banking systems. This paper examines the challenges and opportunities of e-banking for the Greek banking sector during the e-commerce era, and also presents the results of a survey of banking executives working at banks offering e-banking services. The main findings demonstrate that banks expand to e-banking services in order to remain competitive, to keep pace with technological developments and to benefit from the lower cost of e-banking transactions. The major problems they face are the low response rate from customers and the implementation of security and data protection mechanisms. The relatively low Internet usage, the non-familiarity with technologically advanced devices and problems regarding security and privacy are the main factors that have a negative influence on the adoption of e-banking services by customers in Greece.
