1.
Previous research has shown that high levels of Facebook use are associated with lower grades in college students. Divided attention in the form of trying to use Facebook during class or while studying has been suggested as a possible explanation for this finding. In the current study, 44 participants were divided into high and low Facebook users and completed a memory test for 72 words. Participants were not allowed to use Facebook, or any other electronic device, during the study, thereby eliminating divided attention between Facebook and the task at hand as a possible explanation for the results. High Facebook users (defined as spending more than one hour a day on Facebook) scored significantly lower on the free recall test than low Facebook users. Possible explanations for this finding are discussed.
2.
Mehdi Sheikhalishahi, Lucio Grandinetti, Richard M. Wallace, Jose Luis Vazquez-Poletti. Software, 2015, 45(2):161-175
The complexity of computing systems introduces a number of issues and challenges, such as poor performance and high energy consumption. In this paper, we first define and model a resource contention metric for high-performance computing workloads as a performance metric for scheduling algorithms and systems at the highest level of the resource management stack, in order to address the main issues in computing systems. Second, we propose a novel autonomic resource contention-aware scheduling approach architected on various layers of the resource management stack. We establish the relationship between distributed resource management layers in order to optimize the resource contention metric. The simulation results confirm the novelty of our approach. Copyright © 2013 John Wiley & Sons, Ltd.
3.
As energy management has become an important topic in reliable and green computing, energy-aware scheduling methods have attracted attention for their low cost and feasibility. Energy-aware scheduling of dependent tasks in grid environments is highly challenging, because it must balance the precedence constraints of applications, massive data transfers, system heterogeneity, and the conflicts between different performance metrics. The proposed energy-efficient scheduling of grid dependent tasks (ESGDT) algorithm aims to reduce the energy consumed by application execution while still optimizing execution time, and effectively addresses these issues. Task duplication and a progressive scale factor are used to reduce communication time and communication energy while respecting the application's complex data dependencies; following the trend towards chip miniaturization and multi-core technology, dynamic power management is used to reduce the static energy of task execution; the task-duplication condition, the progressive scale factor, and the fine-tuning rules all balance the two conflicting scheduling objectives of time and energy, and adaptive, dynamic mapping methods are proposed to suit heterogeneous computing environments. Simulation experiments show that, compared with the HEFT, EETDS, and HEADUS algorithms, ESGDT does not degrade scheduling time performance and further reduces the energy consumed by application execution.
4.
Emanuele Manca, Andrea Manconi, Alessandro Orro, Giuliano Armano, Luciano Milanesi. Concurrency and Computation, 2016, 28(1):21-43
Sorting is a fundamental task in computer science and becomes a critical operation for programs that make heavy use of sorting algorithms. General-purpose computing on Graphics Processing Units (GPUs) has been used successfully to parallelize some sorting algorithms. Two GPU-based implementations of quicksort have been presented in the literature: GPU-quicksort, an iterative compute unified device architecture (CUDA) implementation, and the CUDA dynamic parallel (CDP) quicksort, a recursive implementation provided by NVIDIA Corporation. We propose CUDA-quicksort, an iterative GPU-based implementation of the sorting algorithm. CUDA-quicksort was designed starting from GPU-quicksort. Unlike GPU-quicksort, it uses atomic primitives to perform inter-block communication while ensuring optimized access to GPU memory. Experiments performed on six sorting benchmark distributions show that CUDA-quicksort is up to four times faster than GPU-quicksort and up to three times faster than CDP-quicksort. An in-depth analysis of the performance of CUDA-quicksort and GPU-quicksort shows that the main improvement comes from the optimized GPU memory access rather than from the use of atomic primitives. Moreover, to assess the advantages of CUDA dynamic parallelism, we implemented a recursive version of CUDA-quicksort. Experimental results show that CUDA-quicksort is faster than the CDP-quicksort provided by NVIDIA, with the best performance achieved by the iterative implementation. Copyright © 2015 John Wiley & Sons, Ltd.
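As a concrete illustration of the atomic-offset idea behind such GPU partitioning schemes, the hedged sketch below (plain C11 with pthreads, not the authors' CUDA code; array contents, pivot, and worker count are made up for the example) shows how concurrent workers can reserve disjoint output slots with fetch-and-add so that a partition step needs no further coordination:

```c
/* Hypothetical CPU-side sketch of an atomic-offset partition step, the idea
 * behind inter-block communication in GPU quicksort variants: every worker
 * reserves disjoint output slots with fetch-and-add, so partitioned elements
 * can be written without locks. Names and values are illustrative only. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define N 16
#define WORKERS 4

static const int in[N] = {9, 3, 7, 1, 12, 5, 8, 2, 15, 4, 11, 6, 14, 0, 13, 10};
static int out[N];
static const int pivot = 8;
static atomic_int left_ctr;          /* next free slot from the left  */
static atomic_int right_ctr;         /* next free slot from the right */

static void *partition_chunk(void *arg) {
    int w = *(const int *)arg;
    int lo = w * (N / WORKERS), hi = lo + N / WORKERS;
    for (int i = lo; i < hi; i++) {
        if (in[i] < pivot)           /* reserve one slot on the left side  */
            out[atomic_fetch_add(&left_ctr, 1)] = in[i];
        else                         /* reserve one slot on the right side */
            out[atomic_fetch_sub(&right_ctr, 1) - 1] = in[i];
    }
    return NULL;
}

int main(void) {
    pthread_t threads[WORKERS];
    int ids[WORKERS];
    atomic_init(&left_ctr, 0);
    atomic_init(&right_ctr, N);
    for (int w = 0; w < WORKERS; w++) {
        ids[w] = w;
        pthread_create(&threads[w], NULL, partition_chunk, &ids[w]);
    }
    for (int w = 0; w < WORKERS; w++)
        pthread_join(threads[w], NULL);
    for (int i = 0; i < N; i++)
        printf("%d ", out[i]);
    printf("\nelements smaller than %d occupy out[0..%d)\n",
           pivot, atomic_load(&left_ctr));
    return 0;
}
```

On a GPU, the same pattern maps each chunk to a thread block and keeps the counters in global memory, which is the kind of inter-block coordination the abstract refers to.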
5.
Reimer Behrends, Kevin Hammond, Vladimir Janjic, Alexander Konovalov, Steve Linton, Hans-Wolfgang Loidl, Patrick Maier, Phil Trinder. Concurrency and Computation, 2016, 28(13):3606-3636
Symbolic computation has underpinned a number of key advances in Mathematics and Computer Science. Applications are typically large and potentially highly parallel, making them good candidates for parallel execution at a variety of scales from multi-core to high-performance computing systems. However, much existing work on parallel computing is based around numeric rather than symbolic computations. In particular, symbolic computing presents specific problems in terms of varying granularity and irregular task sizes that do not match conventional approaches to parallelisation. It also presents problems in terms of the structure of the algorithms and data. This paper describes a new implementation of the free open-source GAP computational algebra system that places parallelism at the heart of the design, dealing with the key scalability and cross-platform portability problems. We provide three system layers that deal with the three most important classes of hardware: individual shared-memory multi-core nodes, mid-scale distributed clusters of (multi-core) nodes, and full-blown high-performance computing systems comprising large-scale, tightly connected networks of multi-core nodes. This requires us to develop new cross-layer programming abstractions in the form of new domain-specific skeletons that allow us to seamlessly target different hardware levels. Our results show that, using our approach, we can achieve good scalability and speedups for two realistic exemplars, on high-performance systems comprising up to 32000 cores, as well as on ubiquitous multi-core systems and distributed clusters. The work reported here paves the way towards full-scale exploitation of symbolic computation by high-performance computing systems, and we demonstrate the potential with two major case studies. © 2016 The Authors. Concurrency and Computation: Practice and Experience Published by John Wiley & Sons Ltd.
6.
There has been increasing research interest in extending the use of Java towards demanding, high-performance applications such as scalable Web servers, distributed multimedia applications, and large-scale scientific applications. However, extending Java to a multicomputer environment and improving the low performance of current Java implementations pose great challenges to both the systems developer and the application designer. In this survey, we describe and classify 14 relevant proposals and environments that tackle Java's performance bottlenecks in order to make the language an effective option for high-performance network-based computing. We further survey significant performance issues while exposing the potential benefits and limitations of current solutions, in such a way that a framework for future research efforts can be established. Most of the proposed solutions can be classified according to some combination of three basic parameters: the model adopted for inter-process communication, language extensions, and the implementation strategy. In addition, where appropriate to each individual proposal, we examine other relevant issues, such as interoperability, portability, and garbage collection. Copyright © 2002 John Wiley & Sons, Ltd.
7.
Research on high-performance algorithms for the Dawning 4000H special-purpose bioinformatics computer
The Dawning 4000H special-purpose bioinformatics computer is based on a modern computer architecture and reconfigurable computing devices. Three representative classes of algorithms, BLAST, dynamic programming, and RNA secondary-structure prediction, were optimized using techniques such as I/O latency hiding, fine-grained parallelism, and parallel pipelining, and a C-based simulator was developed for performance evaluation. The results show that these algorithms substantially improve the machine's processing capability.
8.
Enric Tejedor, Montse Farreras, David Grove, Rosa M. Badia, Gheorghe Almasi, Jesus Labarta. Concurrency and Computation, 2012, 24(18):2421-2448
Programming for large-scale, multicore-based architectures requires adequate tools that offer ease of programming and do not hinder application performance. StarSs is a family of parallel programming models based on automatic function-level parallelism that targets productivity. StarSs deploys a data-flow model: it analyzes dependencies between tasks and manages their execution, exploiting their concurrency as much as possible. This paper introduces Cluster Superscalar (ClusterSs), a new StarSs member designed to execute on clusters of SMPs (Symmetric Multiprocessors). ClusterSs tasks are asynchronously created and assigned to the available resources with the support of the IBM APGAS runtime, which provides an efficient and portable communication layer based on one-sided communication. We present the design of ClusterSs on top of APGAS, as well as the programming model and execution runtime for Java applications. Finally, we evaluate the productivity of ClusterSs, in terms of both programmability and performance, and compare it to that of the IBM X10 language. Copyright © 2012 John Wiley & Sons, Ltd.
9.
To fully integrate distributed high-performance computing resources, this paper proposes a grid environment for scientific computing, which aims to form a virtual supercomputing resource that can be uniformly managed, operated, and maintained, and to provide users with a unified, easy-to-use, and reliable scientific computing service. The environment aggregates resources through the lightweight grid middleware SCE, supports global job scheduling and a unified data-management view, offers users both a command-line interface and a grid portal, and provides programming interfaces for secondary development by professional communities and discipline-specific platforms, meeting the needs of users at different levels. The grid environment for scientific computing has already been deployed in the Supercomputing Environment of the Chinese Academy of Sciences (ScGrid) and has gained user acceptance.
10.
Paweł Rościszewski, Paweł Czarnul, Rafał Lewandowski, Marcel Schally-Kacprzak. Concurrency and Computation, 2016, 28(9):2586-2607
The paper presents a new open-source framework called KernelHive for multilevel parallelization of computations among various clusters, cluster nodes, and finally, among both CPUs and GPUs for a particular application. An application is modeled as a directed acyclic graph, with the possibility of running nodes in parallel and automatic expansion of nodes (called node unrolling) depending on the number of computation units available. A methodology is proposed for parallelization and mapping of an application to the environment that includes selection of devices using a chosen optimizer, selection of the best grid configurations for the compute devices, and optimization of data partitioning and execution. One of possibly many scheduling algorithms can be selected, considering execution time, power consumption, and so on. An easy-to-use GUI is provided for modeling and monitoring, with a repository of ready-to-use constructs and computational kernels. The methodology, execution times, and scalability have been demonstrated for a distributed and parallel password-breaking example run in a heterogeneous environment with a cluster and servers with different numbers of nodes and both CPUs and GPUs. Additionally, the performance of the framework has been compared with an MPI + OpenCL implementation using a parallel geospatial interpolation application employing up to 40 cluster nodes and 320 cores. Copyright © 2015 John Wiley & Sons, Ltd.
11.
A novel high-performance computing platform for distributed environments
This paper analyses the characteristics of network computing platforms based on the volunteer-machine model and, to address their shortcomings, proposes NHPCP (a Novel High Performance Computing Platform in a Distributed Environment). The platform is scalable and fault-tolerant, and uses object serialization to let applications run across platforms. NHPCP offers a friendly user interface and a simple, easy-to-use set of API (application programming interface) calls; any computation-intensive application that can be decomposed into independent subtasks can conveniently run on the platform. Two typical parallel applications were implemented on the platform, and an analysis of the experimental results summarizes the characteristics of parallel applications suited to it.
12.
For many parallel applications, I/O performance is a major bottleneck. MPI-IO, defined by the MPI Forum, can help parallel applications overcome the performance and portability limitations of existing parallel I/O interfaces. Although autotuning has been used to improve the performance of computing kernels, MPI-IO autotuning has rarely been studied. To automate MPI-IO performance tuning, we designed and implemented an automatic tuner. The tuner relies on the Periscope tuning framework for transparently passing hints to the MPI-IO library and for automatically collecting performance data. Unlike computational kernels, MPI-IO functions take a relatively long time to complete, so exhaustively searching the entire parameter space is impractical. We therefore developed a performance model that directs the search and shortens the tuning time. Copyright © 2015 John Wiley & Sons, Ltd.
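For readers unfamiliar with the mechanism being tuned, the hedged sketch below shows the standard way hints reach an MPI-IO implementation: they are attached to an MPI_Info object when the file is opened, so a tuner can try one hint setting and time a collective write. The hint names are common ROMIO hints; the file name, values, and timing pattern are illustrative assumptions, not the authors' tuner.

```c
/* Minimal sketch of MPI-IO hint passing: an autotuner can set candidate hint
 * values on an MPI_Info object, open the file with it, and time a collective
 * write. Hint names are common ROMIO hints; values here are arbitrary. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "8");        /* candidate hint value    */
    MPI_Info_set(info, "cb_buffer_size", "16777216");  /* 16 MiB collective buffer */

    MPI_File fh;
    double buf[1024] = {0};
    double t0 = MPI_Wtime();
    MPI_File_open(MPI_COMM_WORLD, "tuning_test.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
    MPI_File_write_at_all(fh, (MPI_Offset)rank * sizeof buf, buf, 1024,
                          MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("write with these hints took %.3f s\n", t1 - t0);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```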
13.
14.
Grid computing technologies are now being widely deployed, with the Globus Toolkit broadly adopted as the industry-standard Grid middleware. However, its inherently steep learning curve discourages non-experts from using these technologies. Therefore, to increase the use of Grid computing, it is important to have high-level tools that simplify remote task execution. In this paper we introduce a middleware, developed on top of the Java Commodity Grid, that offers an object-oriented, user-friendly application programming interface in the Java language and eases remote task execution for computationally intensive applications. Copyright © 2006 John Wiley & Sons, Ltd.
15.
With the advancement of new processor and memory architectures, supercomputers with multicore and multinode architectures have become general tools for large-scale engineering and scientific simulations. However, the nonuniform latencies between intranode and internode communications on these machines introduce new challenges that need to be addressed in order to achieve optimal performance. In this paper, a novel hybrid solver especially designed for supercomputers with multicore and multinode architectures is proposed. The new hybrid solver is characterized by a two-level parallel computing approach based on the strategies of two-level partitioning and two-level condensation. It distinguishes intranode and internode communications to minimize communication overheads. Moreover, it further reduces the size of the interface equation system to improve its convergence rate. Three numerical experiments of structural linear static analysis were conducted on the DAWNING-5000A supercomputer to demonstrate the validity and efficiency of the proposed method. Test results show that the proposed approach outperformed the conventional Schur complement method. Copyright © 2014 John Wiley & Sons, Ltd.
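For context, substructuring methods of this kind eliminate the interior unknowns of each subdomain and solve a condensed interface system. In generic notation (not the authors'), with interior unknowns x_I and interface unknowns x_Γ, the condensation reads:

```latex
\begin{pmatrix} A_{II} & A_{I\Gamma} \\ A_{\Gamma I} & A_{\Gamma\Gamma} \end{pmatrix}
\begin{pmatrix} x_I \\ x_\Gamma \end{pmatrix}
=
\begin{pmatrix} f_I \\ f_\Gamma \end{pmatrix}
\quad\Longrightarrow\quad
S\, x_\Gamma = f_\Gamma - A_{\Gamma I} A_{II}^{-1} f_I,
\qquad
S = A_{\Gamma\Gamma} - A_{\Gamma I} A_{II}^{-1} A_{I\Gamma}.
```

The two-level condensation described in the abstract can be read as applying this elimination hierarchically, within a node and then across nodes, consistent with the reduction of the interface system mentioned above.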
16.
Bryant C. Lam, Alan D. George, Herman Lam, Vikas Aggarwal. Concurrency and Computation, 2015, 27(17):5288-5310
Diminishing returns from increased clock frequencies and instruction-level parallelism have forced computer architects to adopt architectures that exploit wider parallelism through multiple processor cores. While emerging many-core architectures have progressed at a remarkable rate, concerns arise regarding the performance and productivity of the numerous parallel-programming tools for application development. Development of parallel applications on many-core processors often requires developers to familiarize themselves with the unique characteristics of a target platform while attempting to maximize performance and maintain correctness of their applications. The family of partitioned global address space (PGAS) programming models comprises the current state of the art in balancing performance and programmability. One such PGAS approach is SHMEM, a lightweight, shared-memory programming library that has demonstrated high performance and productivity potential for parallel-computing systems with distributed-memory architectures. In this paper, we present research, design, and analysis of a new SHMEM infrastructure specifically crafted for low-level PGAS on modern and emerging many-core processors featuring dozens of cores and more. Our approach (with a new library known as TSHMEM) is investigated and evaluated atop two generations of Tilera architectures, which are among the most sophisticated and scalable many-core processors to date, and is intended to enable similar libraries atop other architectures now emerging. In developing TSHMEM, we explore design decisions and their impact on parallel performance for the Tilera TILE-Gx and TILEPro many-core architectures, and then evaluate the designs and algorithms within TSHMEM through microbenchmarking and application studies with other communication libraries. Our results with barrier primitives provided by the Tilera libraries show dissimilar performance between the TILE-Gx and TILEPro; therefore, TSHMEM's barrier design takes an alternative approach and leverages the on-chip mesh network to provide consistent low-latency performance. In addition, our experiments with TSHMEM show that naive collective algorithms consistently outperformed linear distributed collective algorithms when executed in an SMP-centric environment. In leveraging these insights for the design of TSHMEM, our approach outperforms the OpenSHMEM reference implementation, achieves similar or better performance than OpenMP and OSHMPI atop MPICH, and supports similar libraries in delivering high-performance parallel computing to emerging many-core systems. Copyright © 2015 John Wiley & Sons, Ltd.
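As background on the programming style discussed here, the hedged sketch below shows minimal standard OpenSHMEM usage: symmetric allocation, a one-sided put, and the global barrier whose latency the abstract's barrier design targets. It is generic OpenSHMEM, not TSHMEM-specific code, and the neighbour-exchange pattern is made up for the example.

```c
/* Minimal OpenSHMEM sketch of the PGAS style the abstract discusses:
 * symmetric memory, a one-sided put, and a global barrier. Generic
 * OpenSHMEM usage for illustration, not the TSHMEM-specific API. */
#include <shmem.h>
#include <stdio.h>

int main(void) {
    shmem_init();
    int me = shmem_my_pe();
    int npes = shmem_n_pes();

    /* Symmetric allocation: the same remotely accessible buffer on every PE. */
    long *slot = shmem_malloc(sizeof(long));
    *slot = -1;
    shmem_barrier_all();

    /* One-sided put: write my rank into my right neighbour's buffer. */
    long my_rank = me;
    shmem_long_put(slot, &my_rank, 1, (me + 1) % npes);

    shmem_barrier_all();   /* ensures all puts are complete and visible */
    printf("PE %d received %ld from its left neighbour\n", me, *slot);

    shmem_free(slot);
    shmem_finalize();
    return 0;
}
```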
17.
We present here a performance analysis of three current architectures that have become commonplace in the High Performance Computing world. Blue Gene/Q is the third generation of systems from IBM that use modestly performing cores, but at large scale, in order to achieve high performance. The XE6 is the latest in a long line of Cray systems that use a 3-D topology, but the first to use its Gemini interconnection network. InfiniBand provides the flexibility of using compute nodes from many vendors that can be connected in many possible topologies. The performance characteristics of each vary vastly, and the way in which nodes are allocated in each type of system can have a significant impact on achieved performance. In this work we compare these three systems using a combination of micro-benchmarks and a set of production applications. In addition, we examine the differences in performance variability observed on each system and quantify the lost performance using a combination of empirical measurements and performance models. Our results show that significant performance can be lost in normal production operation of the Cray XE6 and InfiniBand clusters in comparison to Blue Gene/Q.
18.
China's supercomputer development has achieved world-renowned results, yet the contradiction between high-performance hardware and low-level applications remains prominent. Starting from the trends in computing technology and the physical characteristics of petascale computing, this paper analyses the historic opportunity facing China and recommends making full use of the fundamental advantages of China's supercomputing platforms. Taking as its starting point the need to overcome the limitations of foreign commercial CAE software in computational scale, resolution, and accuracy, as well as technology blockades, it proposes developing highly scalable, fault-tolerant ...
19.
Visualization of high-performance computing in a networked environment
This paper describes an approach to scientific visualization in the "Advanced Computational Infrastructure" environment. The environment places demanding requirements on visualization: on the one hand, clients view the results over the Internet without any software installed in advance; on the other hand, the visualization data produced by high-performance computations are very large and must be presented as dynamic three-dimensional images. Traditional visualization techniques therefore struggle to satisfy both goals at once. This paper builds a three-dimensional visualization environment on Java 3D, realizes automatic installation of the client-side runtime over the Internet, and rapidly generates visualization images on the client by transmitting code that controls the 3D models. In addition, by combining Applet and Servlet interaction, effective data are decomposed and transmitted to drive the subsequent motion of the 3D models while the user views the current results, bringing the dynamic quality of cross-network visualization close to that of local execution.
20.
Driven by the demands of scientific computing and big-data processing, the performance and scale of high-performance computers keep growing, and system power consumption has increasingly become a key bottleneck limiting further gains. Based on an in-depth analysis of four existing classes of high-performance computers, two key technologies are explored: 1) reconfigurable micro server (RMS) technology, which balances domain-specific acceleration capability, power consumption, and physical size within a single compute node; and 2) cluster construction combining autonomy with divide-and-conquer, which addresses the construction and scalability of large-scale computing platforms built from miniaturized compute nodes. On this basis, a new efficient, multi-purpose computing platform architecture called "Ant Colony" is proposed, and a prototype containing 2,048 low-power, miniaturized RMS compute nodes was built, on which two typical applications were implemented: large-scale real-time fingerprint matching and cooperative sorting across multiple RMS nodes. Tests show that the fingerprint-matching performance of a single RMS node is 34 times that of a single Xeon core at only 5 W of power, and the whole prototype supports hundreds of concurrent real-time queries against a fingerprint database of tens of millions of records; the sorting performance-per-watt of the Ant Colony platform is more than 10 times that of a GPU platform, effectively improving sorting efficiency.