
1.

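蹇冬宇, 程永利 《计算机系统应用》 2023, 32(12): 218-223

Partition quality strongly affects inter-machine communication overhead and load balance, both of which are critical to the performance of large-scale parallel graph computing. As graph data keeps growing, however, the running time of the partitioning algorithm itself becomes an unavoidable problem, so improving the efficiency of graph partitioning is necessary. This paper proposes a heuristic graph partitioning method based on a weighted graph generated by breadth-first traversal. The method achieves low communication cost and good load balance while adding only a small preprocessing time overhead. Experimental results show that it reduces the replication factor and the communication overhead while introducing little extra time overhead.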
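
The abstract above reports a lower replication factor but does not define it. As a minimal sketch, assuming the usual vertex-cut (edge partitioning) definition in which a vertex is replicated on every partition that holds one of its edges, the two quality metrics can be computed as follows. The function names and the toy edge assignment are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict


def replication_factor(edge_to_part):
    """Average number of partitions on which each vertex appears.

    edge_to_part maps an edge (u, v) to the id of the partition holding it
    (assumed vertex-cut model; hypothetical input format).
    """
    parts_of = defaultdict(set)
    for (u, v), part in edge_to_part.items():
        parts_of[u].add(part)
        parts_of[v].add(part)
    return sum(len(parts) for parts in parts_of.values()) / len(parts_of)


def edge_balance(edge_to_part, num_parts):
    """Largest partition size divided by the ideal (average) size."""
    counts = [0] * num_parts
    for part in edge_to_part.values():
        counts[part] += 1
    return max(counts) / (len(edge_to_part) / num_parts)


# Toy 4-cycle split across two partitions (illustrative data only).
assignment = {(0, 1): 0, (1, 2): 0, (2, 3): 1, (3, 0): 1}
print(replication_factor(assignment), edge_balance(assignment, 2))
```

A replication factor of 1.0 would mean no vertex is split across machines; larger values imply more replica synchronization and therefore more communication.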

2.

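An improved graph partitioning model for parallel computing

马永刚, 谭国真, 杨际祥, 潘东 《小型微型计算机系统》 2011, 32(3)

Graph partitioning has been applied successfully in many fields. When it is applied to parallel computing, however, communication volume is measured by the edge cut, and the main drawback is that the edge cut does not represent that volume accurately. In addition, the graph partitioning model ignores how communication latency and the distribution of communication overhead affect parallel performance. This paper proposes an improved graph partitioning model that integrates several factors affecting parallel performance (communication latency, the maximum local communication overhead, and the total communication overhead) into a single cost function. The model overcomes some shortcomings of the edge-cut metric and, by adjusting weighting parameters, can handle different optimization objectives and emphasize the influence of different factors on parallel performance.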

3.

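A task scheduling method for edge computing systems based on comprehensive matching degree

郑守建, 彭晓晖, 王一帆, 任祖杰, 高丰 《计算机学报》 2022, 45(3): 485-499

The edge computing paradigm meets the need for real-time, low-power data processing and is one effective way to ease the real-time processing problem posed by today's flood of network data. However, the heterogeneity and diversity of edge device resources make task scheduling and migration very difficult. Current research on edge task scheduling focuses mainly on the design and simulation of scheduling algorithms. These algorithms and models usually ignore the heterogeneity of edge devices and the diversity of edge tasks, so they cannot make diverse edge tasks and heterogeneous resource capabilities deeply...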

4.

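RGraph: an efficient RDMA-based distributed graph data processing system

崔鹏杰, 袁野, 李岑浩, 张灿, 王国仁 《软件学报》 2022, 33(3): 1018-1042

Graphs are an important data structure for describing relationships between entities and are widely used in major scientific fields such as information science, physics, biology, and environmental ecology. As graph data keeps growing in scale, processing large graphs on distributed systems has become mainstream, and classic distributed large-graph processing systems such as Pregel, GraphX, PowerGraph, and Gemini have emerged. However, compared with current state-of-the-art single-machine graph processing systems...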

5.

An Efficient Tree-Based Multicasting Algorithm on Wormhole-Routed Star Graph Interconnection Networks Embedded with Hamiltonian Path

Nen-Chung Wang, Chih-Ping Chu 《The Journal of supercomputing》 2005, 34(1): 5-26

Multicasting is an important issue for numerous applications in parallel and distributed computing. In multicasting, the same message is delivered from a source node to an arbitrary number of destination nodes. The star graph interconnection network has been recognized as an attractive alternative to the popular hypercube network. In this paper, we propose an efficient and deadlock-free tree-based multicast routing scheme for wormhole-routed star graph networks with a Hamiltonian path. In our proposed routing scheme, the router uses an input-buffer-based asynchronous replication mechanism that requires extra hardware cost. Meanwhile, the router simultaneously sends incoming flits on more than one outgoing channel. We perform simulation experiments to measure network latency and network traffic. Experimental results show that the proposed scheme reduces multicast latency more efficiently than other schemes.

6.

Adaptive pre-task assignment scheduling strategy for heterogeneous distributed raytracing system

Kalim Qureshi, Paul Manuel 《Computers & Electrical Engineering》 2007, 33(1): 70-78

One of the main obstacles to obtaining high performance from a heterogeneous distributed computing (HDC) system is the inevitable communication overhead. This occurs when tasks executing on different computing nodes exchange data or when the assigned sub-task size is very small. In this paper, we present an adaptive pre-task assignment (APA) strategy for a heterogeneous distributed raytracing system. In this strategy, the master assigns a pre-task to each node. The size of the sub-task for each node is proportional to the node's performance. One of the main features of this strategy is that it reduces inter-process communication, the cost overhead of node idle time, and the load imbalance that normally occurs in traditional runtime task scheduling (RTS) strategies. The performance of the RTS and APA strategies is evaluated on the manager/master and workers model of an HDC system. The experimental results show that the proposed APA strategy significantly improves performance over the RTS strategy.
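
The abstract states only that each node's sub-task size is proportional to its performance. The sketch below shows one way such a proportional split could be computed, using largest-remainder rounding so that every work unit is assigned exactly once. The function name, the speed figures, and the rounding rule are assumptions made for illustration; the paper's actual APA procedure may differ.

```python
def assign_pretasks(total_units, node_speeds):
    """Split total_units work units in proportion to node_speeds.

    node_speeds holds a hypothetical relative performance score per node.
    Fractional shares are rounded with the largest-remainder rule.
    """
    total_speed = sum(node_speeds)
    exact = [total_units * speed / total_speed for speed in node_speeds]
    sizes = [int(share) for share in exact]
    leftover = total_units - sum(sizes)
    # Give the remaining units to the nodes with the largest fractional parts.
    by_fraction = sorted(
        range(len(node_speeds)),
        key=lambda i: exact[i] - sizes[i],
        reverse=True,
    )
    for i in by_fraction[:leftover]:
        sizes[i] += 1
    return sizes


# Example: 600 image scanlines over three nodes of relative speed 1, 2, 3.
print(assign_pretasks(600, [1.0, 2.0, 3.0]))
```

Assigning one large, performance-sized chunk per node up front is what removes the repeated request/reply traffic of a runtime scheduler, at the price of being less able to react if a node slows down mid-run.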

7.

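Design and implementation of a NIC-based RDMA reliable transport protocol

夏军, 庞征斌, 刘路, 张峻, 常俊胜 《计算机工程与科学》 2014, 36(2): 216-221

The growing scale and complexity of high-performance computers make reliability a key factor in system availability. The system interconnection network is an important component of a high-performance computer, and its reliability must be considered in system design. To address faults that may occur in the interconnection network, this paper proposes an RDMA reliable transport protocol implemented on the NIC, presents a general design and implementation scheme, and discusses several specific optimizations of that scheme. The proposed protocol and implementation tolerate many kinds of faults that may occur in the interconnection network while keeping the extra overhead of reliable transport as low as possible. Experimental results show that the measured performance of the proposed reliable RDMA transport is comparable to that of connectionless RDMA transport.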

8.

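Dataflow eager transmission: a performance optimization technique for distributed stream architectures

李鑫, 郭晓威, 林宇斐 《计算机工程与科学》 2015, 37(11): 2035-2044

The distributed stream architecture extends the stream computing model to distributed environments and can provide an efficient, low-cost runtime environment for big-data applications on the Internet. The large communication overhead on the Internet, however, limits computing performance. This paper proposes a performance optimization technique, eager transmission of data streams, which exploits the parallelism between computation and communication to hide communication latency. The technique is implemented in a prototype distributed stream architecture system. Experimental results show that applications using the optimization reduce their average time overhead by 19.58%, which indicates that the technique can significantly improve application performance and has good application prospects.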

9.

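A distributed tip decomposition algorithm for large-scale bipartite graphs

周旭, 翁同峰, 杨志邦, 李博仁, 张吉, 李肯立 《软件学报》 2022, 33(3): 1043-1056

Tip decomposition, a hot research topic in graph data management, has been widely applied in practical scenarios such as document clustering and spam group detection. With the explosive growth of graph data, single-machine memory can no longer meet storage demands, so tip decomposition techniques for distributed environments are urgently needed. The communication modes of existing distributed graph computing systems are not applicable to bipartite graphs. To address this, a relay-based communication mode is first proposed to deliver messages effectively when processing bipartite graphs in a distributed environment...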

10.

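Topology optimization for P2P TV based on network coding

张志明, 周晋, 陈震, 李军 《计算机科学》 2012, 39(4): 36-40, 70

Network coding allows intermediate nodes to apply a specific encoding to received packets before forwarding them, thereby achieving the maximum multicast throughput. Applied to P2P TV systems, the technique can improve performance metrics such as effective transmission rate and delay. To shorten the time nodes wait for packets and to reduce computation overhead, most practical systems simplify network coding. As a result, the packet redundancy rate becomes dependent on the topology, which increases system overhead. To address this problem, this paper quantitatively analyzes why the topology causes redundancy and proposes an instant-control topology optimization method that controls the topology in real time and optimizes its structure. Experimental results show that, compared with existing work, instant control achieves a better trade-off between packet redundancy rate and node uplink bandwidth utilization, and obtains a higher effective transmission rate.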

11.

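A cloud application decomposition method combining the grey wolf and FM algorithms

姜凯华, 孙鹏, 韩锐 《计算机与现代化》 2020, (1): 53-57

The rapid development of the Internet of Everything challenges the data processing model of cloud services. In response, the Chinese Academy of Sciences proposed the sea-service model and the sea-cloud collaborative system architecture, in which the decomposition strategy for cloud applications is an important factor in system performance. Existing methods mainly target undirected simple graphs in cloud computing scenarios and are not suitable for the directed weighted graphs found in sea-cloud collaborative environments. This paper therefore proposes a cloud application decomposition method that combines the grey wolf algorithm with the FM algorithm. Exploiting the fast convergence of the grey wolf algorithm, it feeds the grey wolf result into the FM algorithm as the initial partition, which compensates for FM's sensitivity to the initial partition. Simulation experiments show that the hybrid algorithm outperforms existing methods: the vertex weight sums of the partitioned subgraphs match the resource distribution of the sea-side nodes, the cut-weight ratio drops noticeably, and communication overhead decreases.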

12.

ParTransgrid: A scalable parallel preprocessing tool for unstructured-grid cell-centered computational fluid dynamics applications

Jian Zhang, Jie Liu, Naichun Zhou, Jing Tang, Xie He, Jianqiang Chen 《Software》 2023, 53(1): 6-26

The development of a basic scalable preprocessing tool is the key routine to accelerate the entire computational fluid dynamics (CFD) workflow toward the exascale computing era. In this work, a parallel preprocessing tool, called ParTransgrid, is developed to translate a general grid format such as the CFD General Notation System into an efficient distributed mesh data format for large-scale parallel computing. Through ParTransgrid, a flexible face-based parallel unstructured mesh data structure designed in Hierarchical Data Format can be obtained to support various cell-centered unstructured CFD solvers. The parallel preprocessing operations include parallel grid I/O, parallel mesh partition, and parallel mesh migration, which are linked together to resolve the run-time and memory consumption bottlenecks for increasingly large grid sizes. An inverted index search strategy combined with a multi-master-slave communication paradigm is proposed to improve the pairwise face matching efficiency and reduce the communication overhead when constructing the distributed sparse graph in the parallel mesh partition phase. We also present a simplified owner update rule that speeds up the migration of raw partition boundaries and the building of the shared faces/nodes communication mapping list between new sub-meshes by an order of magnitude. Experimental results reveal that ParTransgrid can be easily scaled to billion-level grid CFD applications, and that the preparation time for parallel computing with hundreds of thousands of cores is reduced to a few minutes.

13.

Direction-aware resource discovery in large-scale distributed computing environments

Wu-Chun Chung, Chin-Jung Hsu, Kuan-Chou Lai, Kuan-Ching Li, Yeh-Ching Chung 《The Journal of supercomputing》 2013, 66(1): 229-248

As a system scales up, the peer-to-peer (P2P) approach becomes attractive to distributed computing environments, such as Grids and Clouds, because of the increased amount of resources. The major issue in large-scale distributed systems is to prevent a communication bottleneck or a single point of failure. Conventional approaches may not apply directly to such environments because of restricted queries and varied resource characteristics. Alternatively, a fully decentralized resource discovery service based on an unstructured overlay, which relies only on information about resource attributes and characteristics, may be a feasible solution. One major challenge of such a service is to locate desired and suitable resources without global knowledge of the distributed shared resources. As a consequence, the more nodes the resource discovery service involves, the higher the network overhead incurred. In this paper, we propose a direction-aware strategy that can alleviate the network traffic among unstructured information systems for a distributed resource discovery service. Experimental results demonstrate that the proposed approach achieves a higher success rate at low cost and higher scalability.

14.

Approximate maximum weight branchings

Amitabha Bagchi, Ankur Bhargava 《Information Processing Letters》 2006, 99(2): 54-58

We consider a special subgraph of a weighted directed graph: one comprising only the k heaviest edges incoming to each vertex. We show that the maximum weight branching in this subgraph closely approximates the maximum weight branching in the original graph. Specifically, it is within a factor of k/(k+1). Our interest in finding branchings in this subgraph is motivated by a data compression application in which calculating edge weights is expensive but estimating which are the heaviest k incoming edges is easy. An additional benefit is that, since algorithms for finding branchings run in time linear in the number of edges, our results imply faster algorithms, although we sacrifice optimality by a small factor. We also extend our results to the case of edge-disjoint branchings of maximum weight and to maximum weight spanning forests.
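
The subgraph described above can be built in a single pass over the edges. The following sketch keeps only the k heaviest incoming edges of each vertex; it covers just this sparsification step, not the branching algorithm itself, and the edge-tuple format and function name are assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def k_heaviest_incoming(edges, k):
    """Keep, for every vertex, only its k heaviest incoming edges.

    edges is an iterable of (u, v, w) tuples meaning u -> v with weight w
    (hypothetical input format).
    """
    incoming = defaultdict(list)
    for u, v, w in edges:
        incoming[v].append((u, v, w))
    kept = []
    for inc in incoming.values():
        kept.extend(heapq.nlargest(k, inc, key=lambda edge: edge[2]))
    return kept


edges = [("a", "c", 5), ("b", "c", 9), ("d", "c", 1), ("a", "b", 2)]
print(sorted(k_heaviest_incoming(edges, 2)))
```

With k = 2, the bound quoted in the abstract says a maximum weight branching of the reduced graph has at least 2/3 of the weight of one computed on the full graph.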

15.

A low-overhead networking mechanism for virtualized high-performance computing systems

Jae-Wan Jang, Euiseong Seo, Heeseung Jo, Jin-Soo Kim 《The Journal of supercomputing》 2012, 59(1): 443-468

The use of virtualized parallel and distributed computing systems is rapidly becoming mainstream because of the significant benefits of high energy efficiency and low management cost. Processing network operations in a virtual machine, however, incurs a lot of overhead from the arbitration of network devices between virtual machines, which is inherent in the virtualized architecture. Since data transfer between server nodes occurs frequently in parallel and distributed computing systems, the high overhead of networking may induce significant performance loss in the overall system. This paper introduces the design and implementation of a novel networking mechanism with low overhead for virtualized server nodes. By sacrificing isolation between virtual machines, which is insignificant in distributed or parallel computing systems, our approach significantly reduces the processing overhead in networking operations by up to 29% of processor load, along with up to 36% of processor cache misses. Furthermore, it improves network bandwidth by up to 8%, especially when transmitting large packets. As a result, our prototype enhances the performance of real-world workloads by up to 12% in our evaluation.

16.

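Distributed gradient boosting decision trees for high-dimensional features and multi-class classification

江佳伟, 符芳诚, 邵蓥侠, 崔斌 《软件学报》 2019, 30(3): 784-798

Thanks to its high accuracy and interpretability, the gradient boosting decision tree algorithm is widely used for classification, regression, ranking, and other problems. With the explosive growth of data, distributed gradient boosting decision tree algorithms have become a research hotspot. Although a number of distributed implementations exist, they perform poorly on high-dimensional and multi-class tasks, because the data-parallel strategy they adopt requires transmitting gradient histograms, and this transmission becomes the performance bottleneck in such settings. Studying parallel strategies better suited to high-dimensional features and multi-class classification is therefore of great significance and value. The paper first compares the data-parallel and feature-parallel strategies and shows theoretically that feature parallelism is better suited to high-dimensional and multi-class scenarios. Based on this analysis, it proposes FP-GBDT, a feature-parallel distributed gradient boosting decision tree algorithm. FP-GBDT designs an efficient distributed dataset transposition algorithm that converts a dataset originally partitioned by rows into a representation partitioned by columns. When building gradient histograms, FP-GBDT uses a sparsity-aware method to speed up construction. When splitting tree nodes, FP-GBDT uses a bitmap compression method to transmit the placement information of data samples, which reduces communication overhead. Extensive experiments compare distributed gradient boosting decision tree algorithms under different parallel strategies. They first verify the effectiveness of the optimizations proposed in FP-GBDT, then compare FP-GBDT with XGBoost, confirming the effectiveness of FP-GBDT in high-dimensional, multi-class scenarios on several datasets, with a performance improvement of up to 6x.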
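
The abstract mentions a bitmap used to transmit where each sample goes when a tree node is split, without giving the encoding. As a rough illustration only, the sketch below packs one bit per sample (0 for the left child, 1 for the right child) into bytes. The bit order, the function names, and the left/right convention are assumptions and should not be read as the FP-GBDT wire format.

```python
def pack_placement(goes_right):
    """Pack one boolean per sample into bytes, least significant bit first."""
    packed = bytearray((len(goes_right) + 7) // 8)
    for i, flag in enumerate(goes_right):
        if flag:
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_placement(packed, num_samples):
    """Recover the per-sample booleans from the packed bitmap."""
    return [bool((packed[i // 8] >> (i % 8)) & 1) for i in range(num_samples)]


# Nine samples fit in two bytes instead of nine separate flags.
flags = [True, False, True, True, False, False, False, False, True]
bitmap = pack_placement(flags)
print(len(bitmap), unpack_placement(bitmap, len(flags)) == flags)
```

Under feature parallelism only the worker that owns the winning split feature knows the placement of every sample, so a compact representation such as this is what has to be broadcast to the other workers.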

17.

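A multi-workflow offloading method for edge computing environments based on deep reinforcement learning and probabilistic performance awareness

马堉银, 郑万波, 马勇, 刘航, 夏云霓, 郭坤银, 陈鹏, 刘诚武 《计算机科学》 2021, 48(1): 40-48

Mobile edge computing is an emerging distributed and ubiquitous computing paradigm that moves compute-intensive and latency-sensitive tasks to nearby edge servers. It effectively alleviates the resource shortage of mobile terminals and significantly reduces the communication overhead between users and the nodes that process their computations. However, if many users issue compute-intensive task requests at the same time, especially process-oriented workflow task requests, the edge computing environment often cannot respond effectively and tasks become congested. In addition, affected by...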

18.

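A hybrid parallel implementation of the divide-and-conquer method for the symmetric tridiagonal eigenproblem

朱京乔, 赵永华 《计算机系统应用》 2019, 28(9): 246-250

Based on the divide-and-conquer method for the symmetric tridiagonal eigenproblem, this paper proposes an improved hybrid parallel implementation using the MPI/Cilk model. It combines inter-node data parallelism with intra-node multi-task parallelism to achieve multi-task partitioning and dynamic scheduling of the divide and merge phases of the algorithm. Within a node, the Cilk task-parallel model resolves the data dependency and starvation problems of thread-level parallelism and improves parallelism. Between nodes, an improved communication flow in the merge step lets processes within a group exchange only complementary data, which reduces communication overhead. Numerical experiments demonstrate the advantages of the hybrid parallel algorithm in computational efficiency and scalability.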

19.

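Design and implementation of a parallel depth-first algorithm on GPU clusters

余莹, 李肯立, 郑光勇 《计算机科学》 2015, 42(1): 82-85

A naive execution of depth-first search on large graphs in a GPU cluster causes load imbalance among threads and uncoalesced memory accesses, which results in poor performance. To substantially improve performance on both a single GPU and multiple GPUs, the data are rearranged through a series of operations before processing. A new technique for constructing the mapping between threads and data is proposed, which uses prefix sum and binary search operations to achieve perfect load balance. To reduce communication overhead, the edge sets that must be exchanged among DFS branches are pruned. Experimental results show that the algorithm achieves parallelism as close to optimal as possible on a single GPU and minimizes communication overhead in a multi-GPU environment. On a GPU cluster, it can efficiently perform distributed DFS on graphs containing billions of nodes.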
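
The abstract names prefix sums and binary search as the tools for mapping threads to data but gives no further detail. A common way to combine them, shown here on the CPU in Python as an assumption about the general technique rather than the paper's GPU kernel, is to give each thread one edge and let it find the owning vertex by searching the prefix-summed degree array. All names below are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def build_offsets(degrees):
    """Exclusive prefix sum: offsets[i] is the number of edges before vertex i."""
    return [0] + list(accumulate(degrees))


def owner_of_edge(offsets, edge_id):
    """Map a global edge id to (vertex, index of the edge within that vertex)."""
    vertex = bisect_right(offsets, edge_id) - 1
    return vertex, edge_id - offsets[vertex]


# Four frontier vertices with very uneven degrees (illustrative data only).
degrees = [3, 0, 5, 1]
offsets = build_offsets(degrees)
for edge_id in range(offsets[-1]):
    # On a GPU, each iteration would be handled by a separate thread.
    print(edge_id, owner_of_edge(offsets, edge_id))
```

Because work is indexed by edge rather than by vertex, every thread handles exactly one edge regardless of how skewed the degree distribution is, and vertices with no edges are skipped automatically.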

20.

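A blockchain-based lightweight ciphertext access control scheme in edge computing

郑嘉诚, 何亨, 陈月佳, 肖天哲 《计算机系统应用》 2024, 33(4): 69-81

Ciphertext-policy attribute-based encryption (CP-ABE) can provide fine-grained access control while preserving data privacy. Because existing CP-ABE-based access control schemes cannot effectively solve key data security problems in edge computing environments, this paper proposes a blockchain-based lightweight access control scheme over ciphertext in edge computing (BLAC). BLAC includes a lightweight CP-ABE algorithm based on elliptic curve cryptography, which uses fast elliptic curve scalar multiplication for encryption and decryption and securely offloads most encryption and decryption operations, so that user devices with limited computing power can, with the help of edge servers, efficiently complete fine-grained access control over ciphertext data. BLAC also includes a blockchain-based distributed key management method in which multiple edge servers cooperate through the blockchain to distribute private keys to users. Security analysis and performance evaluation show that BLAC ensures data confidentiality, resists collusion attacks, supports forward security, and offers high client-side computational efficiency together with low server-side decryption and storage overhead.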