1.

Zhang Yuan, Yu Guanlong, Lu Zexin, Liu Yaping 《Computer Engineering and Applications》2009,45(35):65-69

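Traditional network file systems struggle to meet the I/O demands of high-performance computing systems. The parallel network file system PNFS can effectively address the scalability, availability, and performance problems of traditional network file systems. The paper first designs the PNFS architecture, separating the metadata server from the storage servers and eliminating the I/O bottleneck caused by a centralized server structure. It then tests the performance of a PNFS prototype and compares the results with those of NFS in the same environment. The results show that PNFS gives clients parallel access to file data, with higher I/O read/write bandwidth and lower access latency, and that client I/O bandwidth scales linearly with the number of storage servers, so PNFS can meet the I/O demands of high-performance computing well.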

2.

Concurrent warp execution: improving performance of GPU-likely SIMD architecture by increasing resource utilization

Hong Jun Choi Dong Oh Son Jong Myon Kim Cheol Hong Kim 《The Journal of supercomputing》2014,69(1):330-356

Hardware parallelism should be exploited to improve the performance of computing systems. Single instruction multiple data (SIMD) architecture has been widely used to maximize the throughput of computing systems by exploiting hardware parallelism. Unfortunately, branch divergence due to branch instructions causes underutilization of computational resources, resulting in performance degradation of SIMD architecture. The graphics processing unit (GPU) is a representative parallel architecture based on SIMD architecture. In recent computing systems, GPUs can process general-purpose applications as well as graphics applications with the help of convenient APIs. However, contrary to graphics applications, general-purpose applications include many branch instructions, resulting in serious performance degradation of GPU due to branch divergence. In this paper, we propose the concurrent warp execution (CWE) technique to reduce the performance degradation of GPU in executing general-purpose applications by increasing resource utilization. The proposed CWE enables selecting co-warps to activate more threads in the warp, leading to concurrent execution of combined warps. According to our simulation results, the proposed architecture provides a significant performance improvement (5.85% over PDOM, 91% over DWF) with little hardware overhead.

3.

QMDS: a file system metadata management service supporting a graph data model-based query language

《International Journal of Parallel, Emergent and Distributed Systems》2013,28(2):159-183

File system metadata management has become a bottleneck for many data-intensive applications that rely on high-performance file systems. Part of the bottleneck is due to the limitations of an almost 50-year-old interface standard with metadata abstractions that were designed at a time when high-end file systems managed less than 100 MB. Today's high-performance file systems store 7–9 orders of magnitude more data, resulting in a number of data items for which these metadata abstractions are inadequate, such as directory hierarchies unable to handle complex relationships among data. Users of file systems have attempted to work around these inadequacies by moving application-specific metadata management to relational databases to make metadata searchable. Splitting file system metadata management into two separate systems introduces inefficiencies and systems management problems. To address this problem, we propose QMDS: a file system metadata management service that integrates all file system metadata and uses a graph data model with attributes on nodes and edges. Our service uses a query language interface for file identification and attribute retrieval. We present our metadata management service design and architecture and study its performance using a text analysis benchmark application. Results from our QMDS prototype show the effectiveness of this approach. Compared to the use of a file system and relational database, the QMDS prototype shows superior performance for both ingest and query workloads.

4.

A fault diagnosis method based on particle swarm optimized particle filtering and CUDA acceleration

Cao Jie, Li Zhao, Wang Jinhua, Yu Ping 《Computer Applications and Software》2020,37(4):240-246,251

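In nonlinear systems, a particle filter needs a large number of particles to ensure accurate state estimation, which reduces the algorithm's real-time performance and leads to poor fault diagnosis accuracy and timeliness. To address this, a GPU-based parallel particle swarm optimized particle filter (PSOPF) algorithm is proposed. By analyzing the parallelism of PSOPF, a parallel PSOPF algorithm based on the CUDA parallel computing architecture is designed and implemented, using a large number of GPU threads to accelerate the algorithm. To address the low execution efficiency caused by rejection resampling's non-coalesced accesses to GPU global memory, the parallel rejection resampling algorithm is improved so that the threads in a warp resample particles within the same memory segment, which improves execution efficiency. The algorithm's effectiveness is verified through fault diagnosis of a wind turbine pitch system. Experimental results show that the method meets the accuracy and real-time requirements of fault diagnosis.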

5.

ChattyGraph: a highly scalable graph computing system for heterogeneous multi-coprocessors

Jiang Xiaobin, Xiong Yixiang, Zhang Heng, Wu Yanjun, Zhao Chen 《Journal of Software》2023,34(4):1977-1996

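At present, as data grows larger in scale and more diverse in structure, how to use heterogeneous multi-coprocessors connected by modern links to provide a real-time, reliable parallel runtime environment for large-scale data processing has become a research hotspot in the high-performance computing and database fields. Modern servers equipped with multiple coprocessor (GPU) devices (multi-GPU servers) have become the preferred high-performance platform for analyzing large-scale, irregular graph data. Graph computing systems and algorithms (such as breadth-first traversal and shortest path) designed in existing work for the multi-GPU server architecture already significantly outperform multi-core CPU environments. However, in such systems the transfer of graph partition data between GPU coprocessors is limited by PCI-E bus bandwidth and local latency, so adding GPU devices does not yield near-linear growth in overall system performance and can even cause severe latency jitter, which fails to meet the high-scalability requirements of large-scale parallel graph computing systems. A series of benchmark experiments reveals two types of defects in existing systems: (1) the hardware architecture of data paths between modern GPU devices keeps evolving (e.g., NVLink-V1, NVLink-V2), greatly improving link bandwidth and latency, yet existing systems are restricted to the PCI-E bus for partitioned data communication and cannot fully use modern GPU link resources (including link topology, connectivity, and routing); (2) in...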

6.

Research on a dynamic request processing model in CPU+MIC heterogeneous systems

You Guohua, Liu Yuan, Gao Dong 《Application Research of Computers》2020,37(12):3667-3670

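To meet growing server-side computing demands, more coprocessors (such as GPUs and MICs) are joining the server side and taking part in server-side computation, but traditional server-side software (such as Web server software) cannot fully exploit coprocessor performance. To make full use of the MIC and improve the quality of service of a single Web server, a new dynamic request processing model is proposed for the CPU+MIC heterogeneous hardware architecture. Based on the event-driven model and the thread pool model, it schedules part of the dynamic requests onto the MIC, processes dynamic requests in parallel, and balances load between the CPU and the MIC. Simulation experiments show that the model outperforms existing Web server software models in average response time, throughput, and 99th-percentile response time.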

7.

Systolic neighborhood search on graphics processing units

Pablo Vidal Francisco Luna Enrique Alba 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2014,18(1):125-142

In this paper, we propose a parallel processing model based on systolic computing merged with concepts of evolutionary algorithms. The proposed model works over a Graphics Processing Unit using the structure of threads as cells that form a systolic mesh. Data passes through those cells, each one performing a simple computing operation. The systolic algorithm is implemented using NVIDIA’s compute unified device architecture. To investigate the behavior and performance of the proposed model we test it over an NP-complete problem. The study of systolic algorithms on GPU and the different versions of the proposal show that our canonical model is a competitive solver with efficacy and presents a good scalability behavior across different instance sizes.

8.

A CUDA-based conflict detection algorithm for tower simulators

Tang Kun, Fei Xiangdong, Ji Yulong, Xu Wei 《Computer & Digital Engineering》2011,(10):85-88

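Conflict detection in a tower simulator is a time-consuming parallel algorithm. Because it overloads the CPU of the tower simulation system's core server, an implementation of tower simulator conflict detection based on the Compute Unified Device Architecture (CUDA) is proposed, building on commonly used conflict detection algorithms. The paper first introduces the architectural basis of GPU parallel computing and ports to the GPU a hierarchical conflict detection algorithm that uses Kalman-filter-based target tracking. It then compares results on a CPU and a GPU of the same price. Experiments show that, compared with a CPU implementation of the same algorithm, the GPU implementation improves computational efficiency by a factor of 10 to 50. The scheme greatly reduces the load on the core server and substantially improves tower simulator performance.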

9.

Adaptive metadata rebalance in exascale file system

Myung-Hoon Cha Dong-Oh Kim Hong-Yeon Kim Young-Kyun Kim 《The Journal of supercomputing》2017,73(4):1337-1359

This paper presents an effective method of metadata rebalance in exascale distributed file systems. Exponential data growth has led to the need for an adaptive and robust distributed file system whose typical architecture is composed of a large cluster of metadata servers and data servers. Though each metadata server can have an equally divided subset from the entire metadata set at first, there will eventually be a global imbalance in the placement of metadata among metadata servers, and this imbalance worsens over time. To ensure that disproportionate metadata placement will not have a negative effect on the intrinsic performance of a metadata server cluster, it is necessary to recover the balanced performance of the cluster periodically. However, this cannot be easily done because rebalancing seriously hampers the normal operation of a file system. This situation continues to get worse with both an ever-present heavy workload on the file system and frequent failures of server components at exascale. As one of the primary reasons for such a degraded performance, file system clients frequently fail to look up metadata from the metadata server cluster during the period of metadata rebalance; thus, metadata operations cannot proceed at their normal speed. We propose a metadata rebalance model that minimizes failures of metadata operations during the metadata rebalance period and validate the proposed model through a cost analysis. The analysis results demonstrate that our model supports the feasibility of online metadata rebalance without the normal operation obstruction and increases the chances of maintaining balance in a huge cluster of metadata servers.

10.

Efficient CUDA implementation techniques for the RSA algorithm (cited 1 time in total: 1 self-citation, 0 citations by others)

Sun Yinghong, Tong Yuanman, Wang Zhiying 《Computer Engineering and Applications》2011,47(2):84-87

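CUDA (Compute Unified Device Architecture), a new computing architecture that supports general-purpose computing on GPUs, is widely used for large-scale data-parallel computing. RSA is a computation-intensive public-key cryptographic algorithm. The paper presents techniques for an efficient parallel implementation of RSA on CUDA; the key is to introduce a large number of independent, concurrent Montgomery modular multiplication threads, and the paper details the thread organization, the data storage structure, and performance optimizations based on shared memory. Using this CUDA implementation, the computing performance and throughput of RSA were measured on a GPU. Experimental results show that the CUDA implementation achieves a speedup of more than 40 times over a general-purpose CPU implementation of RSA.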
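For readers unfamiliar with the Montgomery modular multiplication that the abstract above names as its core operation, the following is a minimal sequential sketch in Python. It is not the paper's CUDA implementation: the function names (`montgomery_setup`, `redc`, `mont_mul`, `mod_exp`) and the choice of R as the smallest power of two above the modulus are assumptions made for illustration, and the sketch says nothing about the thread organization or shared-memory layout the paper describes.

```python
def montgomery_setup(n):
    """Precompute Montgomery constants for an odd modulus n (hypothetical helper)."""
    if n % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd modulus")
    k = n.bit_length()               # R = 2**k is the smallest power of two above n
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r   # n * n_prime == -1 (mod R); needs Python 3.8+
    r2 = (r * r) % n                 # R^2 mod n, used to enter Montgomery form
    return k, n_prime, r2


def redc(t, n, k, n_prime):
    """Montgomery reduction: return t * R^-1 mod n for 0 <= t < R * n."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k             # exact division by R
    return u - n if u >= n else u


def mont_mul(a_bar, b_bar, n, k, n_prime):
    """Multiply two values that are already in Montgomery form."""
    return redc(a_bar * b_bar, n, k, n_prime)


def mod_exp(base, exp, n):
    """Compute base**exp mod n by square-and-multiply on Montgomery forms."""
    k, n_prime, r2 = montgomery_setup(n)
    x_bar = redc((base % n) * r2, n, k, n_prime)   # base in Montgomery form
    acc = redc(r2, n, k, n_prime)                  # 1 in Montgomery form (R mod n)
    while exp:
        if exp & 1:
            acc = mont_mul(acc, x_bar, n, k, n_prime)
        x_bar = mont_mul(x_bar, x_bar, n, k, n_prime)
        exp >>= 1
    return redc(acc, n, k, n_prime)                # leave Montgomery form
```

Each `mont_mul` call replaces a division by the modulus with shifts and masks, which is why the operation suits hardware without fast big-integer division. A GPU version would run many such multiplications concurrently, one per independent RSA operation or per limb group.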

11.

A distributed server architecture supporting dynamic resource provisioning for BPM-oriented workflow management systems

Ching-Hong Tsai Feng-Jian Wang 《Journal of Systems and Software》2010,83(8):1538-1552

Workflow management systems have been widely used in many business process management (BPM) applications. There are also a lot of companies offering commercial software solutions for BPM. However, most of them adopt a simple client/server architecture with one single centralized workflow-management server only. As the number of incoming workflow requests increases, the single workflow-management server might become the performance bottleneck, leading to unacceptable response time. Development of parallel servers might be a possible solution. However, a parallel server architecture with a fixed number of servers cannot efficiently utilize computing resources under time-varying system workloads. This paper presents a distributed workflow-management server architecture which adopts dynamic resource provisioning mechanisms to deal with the probable performance bottleneck. We implemented a prototype system of the proposed architecture based on a commercial workflow management system, Agentflow. A series of experiments were conducted on the prototype system for performance evaluation. The experimental results indicate that the proposed architecture can deliver scalable performance and effectively maintain stable request response time under a wide range of incoming workflow request workloads.

12.

Design and implementation of a 4-way Loongson 3B server based on the CC-NUMA architecture

Zhang Peng 《Computer Engineering & Science》2018,40(12):2141-2145

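To meet the needs of servers in specific domains for high-performance computing, high-bandwidth communication, and independent controllability, and based on an analysis of the architectural characteristics of the Loongson 3B3000 processor, a core module for a 4-way Loongson 3B3000 high-performance server based on the CC-NUMA parallel processing architecture is designed. Using a TOE chip improves network response efficiency and greatly reduces the processor resources consumed by the 10G Ethernet interface, effectively improving overall server performance. Tests verify that the server delivers efficient parallel computing and 10G Ethernet communication, and that domestically produced components account for more than 95% of both component types and component count.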

13.

Performance evaluation and analysis of NFS over Lustre

Zhang Yuan, Lu Zexin, Liu Yaping 《Computer Engineering》2007,33(10):274-276

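Traditional network file systems struggle to meet the I/O demands of high-performance computing systems. Lustre, a global parallel file system based on object storage, can effectively address the scalability, availability, and performance problems of traditional file systems. This paper introduces the structure and advantages of the Lustre file system, tests the performance of NFS over Lustre, compares the results with the performance of the Lustre file system, the NFS network file system, and the local-disk Ext3 file system, explains the causes of the performance differences, and proposes a feasible solution.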

14.

ROARS: a robust object archival system for data intensive scientific computing

Hoang Bui Peter Bui Patrick Flynn Douglas Thain 《Distributed and Parallel Databases》2012,30(5-6):325-350

As scientific research becomes more data intensive, there is an increasing need for scalable, reliable, and high performance storage systems. Such data repositories must provide both data archival services and rich metadata, and cleanly integrate with large scale computing resources. ROARS is a hybrid approach to distributed storage that provides both large, robust, scalable storage and efficient rich metadata queries for scientific applications. In this paper, we present the design and implementation of ROARS, focusing primarily on the challenge of maintaining data integrity across long time scales. We evaluate the performance of ROARS on a storage cluster, comparing to the Hadoop distributed file system and a centralized file server. We observe that ROARS has read and write performance that scales with the number of storage nodes, and integrity checking that scales with the size of the largest node. We demonstrate the ability of ROARS to function correctly through multiple system failures and reconfigurations. ROARS has been in production use for over three years as the primary data repository for a biometrics research lab at the University of Notre Dame.

15.

Client-side straggler-aware I/O scheduler for object-based parallel file systems

《Parallel Computing》2019

Object-based parallel file systems have emerged as promising storage solutions for high-performance computing (HPC) systems. Although object storage provides a flexible interface, scheduling highly concurrent I/O requests that access a large number of objects remains a challenging problem, especially when stragglers (storage servers that are significantly slower than others) exist in the system. An efficient I/O scheduler needs to avoid possible stragglers to achieve low latency and high throughput. In this paper, we introduce a log-assisted straggler-aware I/O scheduling to mitigate the impact of storage server stragglers. The contribution of this study is threefold. First, we introduce a client-side, log-assisted, straggler-aware I/O scheduler architecture to tackle the storage straggler issue in HPC systems. Second, we present three scheduling algorithms that can make efficient decisions for scheduling I/Os while avoiding stragglers based on such an architecture. Third, we evaluate the proposed I/O scheduler using simulations, and the simulation results have confirmed the promise of the newly introduced straggler-aware I/O scheduler.

16.

Design of a multi-level optimization framework for a cloud e-book system based on Open VG

Zhang Chunyan, Yu Li 《Computer Measurement & Control》2017,25(8):162-165, 174

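To address problems in e-book applications such as file formats, low performance efficiency, and image distortion, a multi-level optimization framework for a cloud e-book system is designed. The framework has three main aspects. First, the performance of the vector graphics library is characterized and an optimization algorithm is proposed that reduces the library's time complexity. Second, coordinate system calculations are performed in parallel on an embedded GPU; by exploiting the GPU's strength in parallel computing, the cloud e-book gains a significant performance improvement in the vector graphics library. Third, the cloud e-book offloads file conversion to a Hadoop cloud platform, saving energy and computing time on the mobile device. In addition, to optimize data locality during Hadoop scheduling, a locality-aware scheduler is applied to the proposed system. Experimental results show that the cloud e-book system improves performance by about 70% over the original Open VG library and reduces computing time by about 60% compared with a sequential server platform.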

17.

Research on application-driven performance optimization of parallel programs

Di Ruihua, Jiang Haihua, Lü Hai 《Computer Science》2013,40(1):49-53

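Starting from applications, the paper analyzes and summarizes the core computational processes of various applications and optimizes them with a parallel computing model suited to multi-core processor chip architectures, yielding a reusable, high-performance, scalable software library that both supports efficient development of new applications and ensures scalable program performance. Guided by the idea of a hierarchical parallel computing model and taking the perspective of application-driven parallel program performance optimization, it first proposes a parallel algorithm design model for multi-core processor chip architectures, and on that basis analyzes and optimizes the parallel scan algorithm to obtain g-scan, a new high-performance algorithm with good scalability. It then studies sparse linear algebra, one of the 13 core computational entities, in depth, uses g-scan to design and implement a new sparse matrix-vector operation algorithm, and applies it to finite element analysis, which is widely used in structural engineering, greatly improving its execution efficiency.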
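The link between a scan (prefix sum) and sparse matrix-vector multiplication mentioned in the abstract above can be illustrated with a small sequential sketch. This is not the paper's g-scan algorithm, whose details the abstract does not give; it assumes a CSR (compressed sparse row) layout, and the function name `spmv_csr_scan` and its parameter names are hypothetical.

```python
from itertools import accumulate


def spmv_csr_scan(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a CSR matrix using one prefix sum over all products.

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i;
    col_idx and values hold each nonzero's column index and value.
    """
    # Step 1: one multiplication per nonzero (fully data-parallel).
    products = [v * x[c] for v, c in zip(values, col_idx)]
    # Step 2: inclusive scan over the products, with a leading zero so that
    # prefix[j] is the sum of the first j products.
    prefix = [0] + list(accumulate(products))
    # Step 3: each row's result is the difference of two prefix values.
    return [prefix[row_ptr[i + 1]] - prefix[row_ptr[i]]
            for i in range(len(row_ptr) - 1)]
```

Steps 1 and 3 are independent per element, so the scan in step 2 is the only part that needs coordination between processing elements, which is why a fast parallel scan carries over directly to sparse matrix-vector products. With floating-point values, differencing a global prefix sum can lose precision on long arrays; a segmented scan that restarts at each row boundary avoids that.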

18.

GPU-based high-performance parallel computing technology

Yao Wang, Hu Xin, Liu Fei, Wang Hongxia, Liu Wenwen 《Computer Measurement & Control》2014,22(12)

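To study GPU-based high-performance parallel computing, the basic data processing algorithms of a pulse compression radar, including pulse compression and coherent integration, were implemented on an NVIDIA GTX470 GPU with 448 processing cores. Following the GPU's parallel processing architecture, the pulse compression and coherent integration algorithms were redesigned for parallel execution and mapped effectively onto the 448 processing cores of the GTX470, completing a GPU parallel implementation of the basic pulse compression radar processing algorithms. Finally, the results of the parallel computation were verified, and the quality of the processing results and the real-time performance were evaluated.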

19.

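Research on monitoring methods for cloud computing cluster server systems (cited 1 time in total: 0 self-citations, 1 citation by others)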

Dong Bo, Shen Qing, Xiao Debao 《Computer Engineering & Science》2012,34(10):68-72

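As cloud computing is applied in more and more areas of the information industry, the demand for monitoring and managing cluster server systems in cloud environments keeps growing. A cluster server system in the cloud consists mainly of server clusters built on a distributed architecture and may contain more than ten thousand servers. Managing such a large cloud cluster server system and keeping it running at high performance requires an effective cloud cluster monitoring system to monitor and regulate it. However, traditional cluster monitoring systems have shortcomings. This paper studies a high-performance monitoring and scheduling scheme for cloud computing cluster systems, discusses the architecture of the cloud monitoring system, data collection, and load-balancing scheduling, and builds a cloud system scheme that keeps the cluster operating at high performance.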

20.

Fast MD5 cracking on a GPU cluster

Yang Shengbin 《Computer and Information Technology》2013,21(2):54-56

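The performance of a single-GPU heterogeneous parallel system is still fairly limited. This article describes the implementation of the CUDA architecture on a GPU cluster, analyzes in detail the compilation process for fast MD5 cracking on a GPU cluster, tests the cracking program and analyzes its results, and discusses how to build a GPU high-performance computing cluster and use it for fast cracking of the MD5 algorithm.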