1.

A GPU-based heart simulator with mass-spring systems and cellular automaton

Ricardo Silva Campos, Marcelo Lobosco, Rodrigo Weber dos Santos 《The Journal of Supercomputing》2014,69(1):1-8

This work proposes an electro-mechanical simulator of cardiac tissue whose main feature is its low computational cost, which is necessary for real-time simulations and on-the-fly applications. To achieve this, we used cellular automata and mass-spring systems to model cardiac behavior, and we parallelized the code to run on a graphics processing unit (GPU) using the Compute Unified Device Architecture (CUDA). Running sequentially, our simulator was considerably faster than traditional simulators based on partial differential equations. In addition, we performed several load tests to evaluate the behavior of our code on GPUs and identified its potential and its bottlenecks.
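
The abstract does not give the automaton rules, so the following is only a minimal sketch of the general idea. It assumes the classic Greenberg-Hastings excitable-medium rules rather than the authors' actual model, and it leaves out the mass-spring mechanics entirely. Each cell is resting (0), excited (1), or refractory (2 up to R+1). A resting cell fires when any von Neumann neighbour is excited, and excited or refractory cells advance one state per step until they rest again. The grid size, the refractory length, and the function names `step` and `activation_times` are illustrative assumptions.

```python
from typing import List

Grid = List[List[int]]

REST, EXCITED = 0, 1


def step(grid: Grid, refractory_states: int = 3) -> Grid:
    """One synchronous Greenberg-Hastings update on a bounded (non-periodic) grid.

    States: 0 = resting, 1 = excited, 2..refractory_states + 1 = refractory.
    """
    rows, cols = len(grid), len(grid[0])
    last = refractory_states + 1  # deepest refractory state
    new = [[REST] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            s = grid[r][c]
            if s == REST:
                # A resting cell fires if any von Neumann neighbour is excited.
                neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if any(0 <= i < rows and 0 <= j < cols and grid[i][j] == EXCITED
                       for i, j in neighbours):
                    new[r][c] = EXCITED
            elif s < last:
                new[r][c] = s + 1  # excited -> refractory, or deeper into refractoriness
            # else: s == last, so the cell recovers to REST (already the default)
    return new


def activation_times(rows: int, cols: int, steps: int) -> Grid:
    """Stimulate the left edge and record when each cell first fires (-1 = never)."""
    grid = [[REST] * cols for _ in range(rows)]
    for r in range(rows):
        grid[r][0] = EXCITED
    times = [[0 if c == 0 else -1 for c in range(cols)] for _ in range(rows)]
    for t in range(1, steps + 1):
        grid = step(grid)
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] == EXCITED and times[r][c] < 0:
                    times[r][c] = t
    return times


if __name__ == "__main__":
    for row in activation_times(3, 6, 10):
        print(row)  # the wavefront advances one column per step
```

Every cell's next state depends only on the previous grid, which is why automata of this kind map naturally onto one GPU thread per cell; the sketch itself runs sequentially on the CPU.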

2.

An efficient parallel-network packet pattern-matching approach using GPUs

《Journal of Systems Architecture》2014,60(5):431-439

In the past few years, Internet usage has increased substantially. High network bandwidth and the large number of threats pose challenges to current network intrusion detection systems, which must handle large volumes of network traffic and perform complex packet processing. Pattern matching is a computationally intensive part of network intrusion detection systems. In this paper, we present an efficient graphics processing unit (GPU)-based network packet pattern-matching algorithm that leverages the computational power of GPUs to accelerate pattern-matching operations and thereby increase overall processing throughput. In our experiments, the proposed algorithm achieved a maximum traffic processing throughput of over 2 Gbit/s. The results demonstrate that the proposed GPU-based algorithm can effectively enhance the performance of network intrusion detection systems.

3.

Research on a virtual-to-physical address translation architecture for multi-GPU systems

魏金晖, 李晨, 鲁建壮 《Computer Engineering & Science》2021,43(2):228-234

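In recent years, with the growth of big data, the size of the datasets used by GPU applications has increased dramatically, which challenges the processing capability of GPUs. Because Moore's law is approaching its limit, improving the performance of a single GPU is becoming increasingly difficult, and multi-GPU systems, which increase parallelism at the GPU-processor level, have become one solution to this challenge. Support for memory virtualization from GPU manufacturers further simplifies the programming of multi-GPU systems and improves resource utilization. Memory virtualization requires...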

4.

Memory-efficient multi-pattern matching algorithm and architecture

嵩天, 李冬妮, 汪东升, 薛一波 《Journal of Software》2013,24(7):1650-1665

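Multi-pattern matching is an important function of content-inspection-based network security systems and is also widely used in many other fields. In practice there is an urgent need for high-speed, large-scale pattern matching with stable performance, especially for matching architectures that can process network packets online in real time. This paper presents a memory-efficient, high-speed, large-scale pattern matching algorithm and its associated architecture. Starting from the theory on which the algorithm is based, the work proposes a cached state machine model and, combining it with a classification of the transition rules in the state machine, proposes ACC (Aho-Corasick-CDFA), a matching algorithm that generates cross transition rules dynamically. By generating transition rules dynamically, the algorithm reduces the size of the generated state machine, making it suitable for large pattern sets. An architecture design based on the algorithm is also proposed. Experiments with real pattern sets from network security systems show that, compared with other state-machine-based pattern matching algorithms, the algorithm further reduces state machine size by 80% to 95%, reduces memory usage by 40.7%, and nearly doubles memory efficiency; a single-hardware-structure implementation of the algorithm reaches a matching speed of 11 Gbps.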
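
ACC builds on the Aho-Corasick automaton. As background, here is a minimal Python sketch of classic Aho-Corasick matching (goto trie, failure links, output sets). It does not implement the cached state machine or the dynamically generated cross transition rules that ACC adds, and the class and method names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple


class AhoCorasick:
    """Classic Aho-Corasick automaton: goto trie + failure links + output sets."""

    def __init__(self, patterns: List[str]) -> None:
        self.goto: List[Dict[str, int]] = [{}]  # state -> {char: next state}
        self.fail: List[int] = [0]               # state -> failure state
        self.out: List[List[str]] = [[]]         # state -> patterns ending here
        for p in patterns:
            self._insert(p)
        self._build_failure_links()

    def _insert(self, pattern: str) -> None:
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self) -> None:
        # Breadth-first, so every failure target (shallower) is finished first.
        pending = deque(self.goto[0].values())  # depth-1 states fail to the root
        while pending:
            state = pending.popleft()
            for ch, nxt in self.goto[state].items():
                pending.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit the matches that end at the failure target.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text: str) -> List[Tuple[int, str]]:
        """Return (start_index, pattern) for every occurrence, in one pass over text."""
        state, hits = 0, []
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for p in self.out[state]:
                hits.append((i - len(p) + 1, p))
        return hits
```

For example, `AhoCorasick(["he", "she", "his", "hers"]).search("ushers")` returns `[(1, "she"), (2, "he"), (2, "hers")]`. Storing one transition dictionary per state is exactly the memory cost that state-machine compression schemes such as the one in the abstract aim to reduce.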

5.

Revisiting actor programming in C++

《Computer Languages, Systems and Structures》2016

The actor model of computation has gained significant popularity over the last decade. Its high level of abstraction makes it appealing for concurrent applications in parallel and distributed systems. However, designing a real-world actor framework that combines full scalability, strong reliability, and high resource efficiency requires many conceptual and algorithmic additions to the original model. In this paper, we report on the design and implementation of CAF, the C++ Actor Framework. CAF aims to provide a concurrent and distributed native environment that scales up to very large, high-performance applications and equally well down to small constrained systems. We present the key specifications and design concepts (in particular a message-transparent architecture, type-safe message interfaces, and pattern-matching facilities) that make native actors a viable approach for many robust, elastic, and highly distributed developments. We demonstrate the feasibility of CAF in three scenarios: first, elastic upscaling environments; second, the inclusion of heterogeneous hardware such as GPUs; and third, distributed runtime systems. Extensive performance evaluations indicate ideal runtime at a very low memory footprint for up to 64 CPU cores, or when offloading work to a GPU. In these tests, CAF consistently outperforms the competing actor environments Erlang, Charm++, SalsaLite, Scala, ActorFoundry, and even the raw message-passing framework OpenMPI.
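
CAF itself is a C++ library, and its API is not reproduced here. The Python sketch below only illustrates the underlying actor idea, under the simplifying assumption of one thread per actor: each actor owns private state and a FIFO mailbox, handles one message at a time, and communicates only through asynchronous sends. `Actor` and `counter_behavior` are hypothetical names.

```python
import queue
import threading
from typing import Any, Callable, Dict


class Actor:
    """Minimal actor: a private FIFO mailbox drained by one dedicated thread."""

    def __init__(self, behavior: Callable[[Any], None]) -> None:
        self._mailbox: "queue.Queue[Any]" = queue.Queue()
        self._behavior = behavior
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message: Any) -> None:
        """Asynchronous send: enqueue the message and return immediately."""
        self._mailbox.put(message)

    def stop(self) -> None:
        """Let the actor finish the messages already queued, then wait for it."""
        self._mailbox.put(None)  # sentinel marking the end of the mailbox
        self._thread.join()

    def _run(self) -> None:
        while True:
            message = self._mailbox.get()
            if message is None:
                break
            self._behavior(message)  # one message at a time: state needs no locks


def counter_behavior() -> Callable[[Any], None]:
    """Behavior whose private state (a running total) lives in a closure."""
    state: Dict[str, int] = {"total": 0}

    def handle(message: Any) -> None:
        kind, payload = message
        if kind == "add":
            state["total"] += payload
        elif kind == "get":
            payload.put(state["total"])  # payload is the sender's reply queue

    return handle


if __name__ == "__main__":
    counter = Actor(counter_behavior())
    for i in range(1, 11):
        counter.send(("add", i))
    reply: "queue.Queue[int]" = queue.Queue()
    counter.send(("get", reply))
    print(reply.get(timeout=1.0))  # 55: FIFO delivery handles every "add" first
    counter.stop()
```

A production framework would typically schedule many actors over a shared thread pool instead of dedicating a thread to each; the sketch trades that efficiency for brevity.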

6.

A survey of timed automata for the development of real-time systems

《Computer Science Review》2013

Timed automata are a popular formalism for modeling real-time systems. They were introduced two decades ago to support formal verification. Since then they have also been used for other purposes, and a large number of variants have been introduced to deal with the many different kinds of requirements in real-time system development. This survey attempts to introduce a large and complex theoretical research area to readers in an accessible and compact manner. One objective of this paper is to inform readers about the theoretical properties (or capabilities) of timed automata that are (or might be) useful for real-time model-driven development. To achieve this goal, the paper surveys the semantics, decision problems, and variants of timed automata. The other objective is to inform readers about the current state of the art of timed automata in practice; to this end, the article surveys the implementability of timed automata and the available tools.
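
To make the formalism concrete, the sketch below runs a deliberately restricted timed automaton on a timed word. It assumes a single clock, deterministic edges (the first enabled edge is taken), interval guards, and no location invariants, so it is far simpler than the general model the survey covers. `Edge`, `accepts`, and the request/acknowledge example are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Sequence, Set, Tuple


@dataclass(frozen=True)
class Edge:
    source: str
    action: str
    target: str
    lower: float   # guard: lower <= clock <= upper
    upper: float
    reset: bool    # reset the clock to 0 when the edge is taken


def accepts(edges: Iterable[Edge], initial: str, accepting: Set[str],
            word: Sequence[Tuple[str, float]]) -> bool:
    """Run a deterministic one-clock timed automaton on a timed word.

    `word` lists (action, absolute timestamp) pairs with nondecreasing times.
    """
    edge_list = list(edges)
    location, clock, now = initial, 0.0, 0.0
    for action, t in word:
        if t < now:
            return False           # time cannot run backwards
        clock += t - now           # delay transition: the clock advances with time
        now = t
        edge = next((e for e in edge_list
                     if e.source == location and e.action == action
                     and e.lower <= clock <= e.upper), None)
        if edge is None:
            return False           # no enabled discrete transition
        location = edge.target
        if edge.reset:
            clock = 0.0
    return location in accepting


# Hypothetical property: every "req" must be acknowledged within 5 time units.
RESPONSE = [
    Edge("idle", "req", "waiting", 0.0, math.inf, reset=True),
    Edge("waiting", "ack", "idle", 0.0, 5.0, reset=False),
]

if __name__ == "__main__":
    print(accepts(RESPONSE, "idle", {"idle"}, [("req", 1.0), ("ack", 4.0)]))  # True
    print(accepts(RESPONSE, "idle", {"idle"}, [("req", 1.0), ("ack", 7.5)]))  # False
```

Real verification tools explore all clock valuations symbolically (for example with zones) instead of checking one concrete run, which is where the decision problems surveyed in the paper come in.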

7.

Air pollution modelling using a Graphics Processing Unit with CUDA

F. Molnár Jr., R. Mészáros 《Computer Physics Communications》2010,181(1):105-85

The graphics processing unit (GPU) is a powerful tool for parallel computing. In recent years the performance and capabilities of GPUs have increased, and NVIDIA has developed the Compute Unified Device Architecture (CUDA), a parallel computing architecture, to harness this performance for general-purpose computation. Here we show for the first time a possible application of GPUs in environmental studies, serving as a basis for decision-making strategies. A stochastic Lagrangian particle model has been developed in CUDA to estimate the transport and transformation of radionuclides from a single point source during an accidental release. Our results show that the parallel implementation achieves typical speedups of 80 to 120 times compared with a single-threaded CPU implementation on a 2.33 GHz desktop computer. Only very small differences were found between the results of the GPU and CPU simulations, and these are comparable to the effect of stochastic transport phenomena in the atmosphere. The relatively high speedup, with no additional cost to maintain this parallel architecture, could lead to wide use of GPUs in diverse environmental applications in the near future.
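
The abstract does not describe the model equations, so the following is only a generic sketch of a stochastic Lagrangian particle model, under stated assumptions: a constant mean wind, isotropic turbulent diffusion represented by a Gaussian random walk with per-step variance 2KΔt on each axis, and first-order radioactive decay as the only "transformation". All parameter values, units, and the function name `simulate` are hypothetical.

```python
import math
import random
from typing import List, Tuple


def simulate(n_particles: int, n_steps: int, dt: float,
             wind: Tuple[float, float], diffusivity: float,
             half_life: float, seed: int = 0) -> Tuple[List[Tuple[float, float]], float]:
    """Random-walk Lagrangian dispersion of particles released at the origin.

    Returns the final (x, y) of every particle and the fraction of the released
    activity that remains after radioactive decay over the simulated time.
    """
    rng = random.Random(seed)                  # seeded for reproducibility
    u, v = wind
    sigma = math.sqrt(2.0 * diffusivity * dt)  # std. dev. of one diffusive step per axis
    positions: List[Tuple[float, float]] = []
    for _ in range(n_particles):               # independent particles: could be one GPU thread each
        x = y = 0.0
        for _ in range(n_steps):
            x += u * dt + rng.gauss(0.0, sigma)  # advection by the mean wind + turbulent kick
            y += v * dt + rng.gauss(0.0, sigma)
        positions.append((x, y))
    decay_constant = math.log(2.0) / half_life
    remaining = math.exp(-decay_constant * n_steps * dt)
    return positions, remaining


if __name__ == "__main__":
    pos, frac = simulate(1000, 100, 1.0, (2.0, 0.0), 5.0, 100.0)
    mean_x = sum(p[0] for p in pos) / len(pos)
    print(f"mean downwind distance ~ {mean_x:.1f} (advection alone gives 200.0)")
    print(f"activity remaining: {frac:.3f}")  # one half-life elapsed -> 0.500
```

Because the particles never interact, the outer loop is embarrassingly parallel; the small run-to-run scatter between seeds is the same kind of stochastic variation the abstract compares the GPU/CPU differences against.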

8.

Processing of semantic nets on dataflow architectures

《Artificial Intelligence》1985,27(2):219-227

Extracting knowledge from a semantic network may be viewed as a process of finding given patterns in the network. On a von Neumann computer architecture the semantic net is a passive data structure stored in memory and manipulated by a program. This paper demonstrates that by adopting a data-driven model of computation the necessary pattern-matching process may be carried out on a highly-parallel dataflow architecture. The model is based on the idea of representing the semantic network as a dataflow graph in which each node is an active element capable of accepting, processing, and emitting data tokens traveling asynchronously along the network arcs. These tokens are used to perform a parallel search for the given patterns. Since no centralized control is required to guide and supervise the token flow, the model is capable of exploiting a computer architecture consisting of large numbers of independent processing elements.

9.

Research on a model-driven safety verification method for embedded system design

刘雪, 胡军, 黄志球, 马金晶, 程桢, 石娇洁 《Computer Engineering & Science》2015,37(8):1498-1509

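Model-based safety analysis and verification of embedded systems has recently become an important research topic in safety-critical systems engineering. This paper proposes a safety verification method for SysML/MARTE state machines based on the model-driven architecture. Specifically, it builds a state machine metamodel with SysML/MARTE extension semantics, together with a metamodel of GTS, the semantic model of the safety modeling and analysis language AltaRica. It then defines semantic-mapping model transformation rules from SysML/MARTE state machine models to timed automata models and to AltaRica models, respectively, and, based on the AMMA platform and the timed automata verification tool UPPAAL, designs and implements a framework for model transformation and formal safety verification of SysML/MARTE state machines. Finally, a case study of the safety verification of an aircraft landing control system design model is presented.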

10.

Harnessing graphics processors for the fast computation of acoustic likelihoods in speech recognition

Paul R. Dixon, Tasuku Oonishi, Sadaoki Furui 《Computer Speech and Language》2009,23(4):510-526

In large vocabulary continuous speech recognition (LVCSR), the acoustic model computations often account for the largest processing overhead. Our weighted finite state transducer (WFST) based decoding engine can use a commodity graphics processing unit (GPU) to perform the acoustic computations, moving this burden off the main processor. In this paper we describe our new GPU scheme, which achieves a very substantial improvement in recognition speed while incurring no reduction in recognition accuracy. We evaluate the GPU technique on a large vocabulary spontaneous speech recognition task using a set of acoustic models of varying complexity, and the results consistently show that using the GPU reduces recognition time, with the largest improvements occurring in systems with large numbers of Gaussians. For the systems that achieve the best accuracy, we obtained speed-ups of 2.5 to 3 times. The faster decoding times translate into reductions in space, power, and hardware costs, because only standard hardware that is already widely installed is required.
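
The abstract mentions Gaussians but not their exact form. Assuming, as is common in LVCSR, that each acoustic state is a Gaussian mixture with diagonal covariances, the per-frame acoustic score is log sum_k w_k N(x; mu_k, diag(var_k)). The CPU sketch below computes it with the log-sum-exp trick; it is not the authors' GPU scheme, and the function names are hypothetical.

```python
import math
from typing import Sequence


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """log N(x; mean, diag(var)) for a single Gaussian component."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x: Sequence[float], weights: Sequence[float],
                       means: Sequence[Sequence[float]],
                       variances: Sequence[Sequence[float]]) -> float:
    """log sum_k w_k N(x; mu_k, Sigma_k), computed with the log-sum-exp trick."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)  # factor out the largest term so exp() cannot underflow to 0 for all
    return top + math.log(sum(math.exp(t - top) for t in terms))


if __name__ == "__main__":
    # Two hypothetical 2-D components scored against one feature frame.
    score = gmm_log_likelihood([0.3, -1.2], [0.6, 0.4],
                               [[0.0, -1.0], [1.0, 0.5]],
                               [[1.0, 0.5], [2.0, 1.0]])
    print(f"{score:.4f}")
```

Every (frame, Gaussian) evaluation is independent, so the cost grows with the number of Gaussians; that matches the abstract's observation that systems with many Gaussians gain the most from offloading.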

11.

Extraction of thermodynamic quantities from GPU-accelerated molecular dynamics simulations

刘丹, 赵广辉, 夏红霞, 胡磊 《Application Research of Computers》2010,27(5):1820-1822

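In recent years, the introduction of the Compute Unified Device Architecture (CUDA), together with the rapidly improving parallel processing and data transfer capabilities of graphics processing units (GPUs), has quickly made CUDA-based general-purpose GPU computing a hot research topic. To address the low efficiency of extracting thermodynamic quantities from large-scale molecular dynamics simulations, this paper proposes a new method for extracting these quantities, designs a parallel algorithm with CUDA, and uses the GPU to accelerate the extraction. Experimental results show that, compared with the CPU-based algorithm, the GPU version is about 500 times faster.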
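
The abstract does not list which thermodynamic quantities are extracted. As one common example, assuming classical point particles in 3-D with no constraints, the kinetic energy is KE = sum_i (1/2) m_i |v_i|^2, and the instantaneous temperature follows from equipartition, T = 2 KE / (3 N k_B). Both are per-atom sums (reductions), the kind of step a CUDA kernel would parallelize. The function names are hypothetical, and SI units are assumed.

```python
from typing import Sequence, Tuple

BOLTZMANN = 1.380649e-23  # J/K, exact in the 2019 SI

Vector3 = Tuple[float, float, float]


def kinetic_energy(masses: Sequence[float], velocities: Sequence[Vector3]) -> float:
    """Total kinetic energy, the sum of 1/2 m |v|^2 over all atoms."""
    return sum(0.5 * m * (vx * vx + vy * vy + vz * vz)
               for m, (vx, vy, vz) in zip(masses, velocities))


def instantaneous_temperature(masses: Sequence[float],
                              velocities: Sequence[Vector3]) -> float:
    """Equipartition estimate T = 2 KE / (N_dof k_B), with N_dof = 3N (no constraints)."""
    n_dof = 3 * len(masses)
    return 2.0 * kinetic_energy(masses, velocities) / (n_dof * BOLTZMANN)
```

Production MD codes usually subtract the degrees of freedom removed by constraints or by fixing the centre of mass; the sketch ignores those corrections.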

12.

Adaptive optimization of a traffic flow model based on evolvable hardware

聂鑫, 李元香, 王珑, 柳林 《Computer Science》2011,38(5):186-189

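The BML model is a cellular automaton model used to study traffic flow characteristics in urban road networks. Software-based simulation and evolutionary optimization suffer from low computational efficiency and slow optimization, which severely limits the real-time applicability of traffic flow models. To address this problem, the paper proposes combining evolvable hardware with cellular automata to achieve online evolution of the traffic flow model. The BML model is also improved so that traffic signals can be adjusted adaptively according to actual traffic conditions. Experimental results show that applying evolvable hardware to the adaptive optimization of traffic flow models is a feasible approach to developing intelligent transportation systems.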
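
For reference, the sketch below implements the standard Biham-Middleton-Levine (BML) rules on a square torus: cells are empty, hold an east-moving car, or hold a north-moving car; east-movers advance on odd time steps and north-movers on even ones, each moving only if the target cell was empty at the start of the sub-step. It does not include the paper's modifications for adaptive traffic signals or anything about evolvable hardware, and the function names are hypothetical.

```python
import random
from typing import List

EMPTY, EAST, NORTH = 0, 1, 2
Grid = List[List[int]]


def random_grid(size: int, density: float, seed: int = 0) -> Grid:
    """Square torus; each cell holds an EAST or NORTH car with probability density/2 each."""
    rng = random.Random(seed)
    grid = []
    for _ in range(size):
        row = []
        for _ in range(size):
            u = rng.random()
            row.append(EAST if u < density / 2 else NORTH if u < density else EMPTY)
        grid.append(row)
    return grid


def move(grid: Grid, kind: int) -> Grid:
    """Synchronously advance every car of one kind whose target cell is empty."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != kind:
                continue
            if kind == EAST:
                tr, tc = r, (c + 1) % cols   # wrap around horizontally
            else:
                tr, tc = (r - 1) % rows, c   # row 0 is the top; wrap vertically
            if grid[tr][tc] == EMPTY:        # test the OLD grid: parallel update
                new[tr][tc] = kind
                new[r][c] = EMPTY
    return new


def run(grid: Grid, steps: int) -> Grid:
    """Alternate the two sub-steps: east-movers on odd t, north-movers on even t."""
    for t in range(1, steps + 1):
        grid = move(grid, EAST if t % 2 == 1 else NORTH)
    return grid


def count(grid: Grid, kind: int) -> int:
    """Number of cars of one kind (conserved by the dynamics)."""
    return sum(row.count(kind) for row in grid)
```

Because each cell's update reads only the previous grid, the whole sub-step is a purely local rule, which is what makes BML-style automata attractive for hardware or GPU implementations.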

13.

Concurrent warp execution: improving performance of GPU-likely SIMD architecture by increasing resource utilization

Hong Jun Choi, Dong Oh Son, Jong Myon Kim, Cheol Hong Kim 《The Journal of Supercomputing》2014,69(1):330-356

Hardware parallelism should be exploited to improve the performance of computing systems. Single instruction multiple data (SIMD) architectures have been widely used to maximize the throughput of computing systems by exploiting hardware parallelism. Unfortunately, branch divergence caused by branch instructions leads to underutilization of computational resources, degrading the performance of SIMD architectures. The graphics processing unit (GPU) is a representative parallel architecture based on SIMD. In recent computing systems, GPUs can process general-purpose applications as well as graphics applications with the help of convenient APIs. However, unlike graphics applications, general-purpose applications contain many branch instructions, which seriously degrade GPU performance through branch divergence. In this paper, we propose the concurrent warp execution (CWE) technique, which reduces this performance degradation when GPUs execute general-purpose applications by increasing resource utilization. CWE selects co-warps so that more threads in a warp are active, leading to concurrent execution of the combined warps. According to our simulation results, the proposed architecture provides a significant performance improvement (5.85% over PDOM and 91% over DWF) with little hardware overhead.
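
To make the underutilization concrete, here is a simplified cost model, an assumption rather than the paper's simulator: a warp that diverges on one if/else executes the two paths one after the other (as under post-dominator reconvergence), each issued path occupies every lane for its length, and only the lanes that took that path do useful work. Nested branches and CWE's co-warp merging are not modeled, and `branch_utilization` is a hypothetical name.

```python
from typing import Sequence

WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def branch_utilization(taken: Sequence[bool], then_len: int, else_len: int) -> float:
    """Fraction of SIMD lane-slots doing useful work for one if/else region.

    `taken[i]` says whether lane i takes the then-path. A path is issued only if
    at least one lane takes it; while issued it occupies the whole warp.
    """
    width = len(taken)
    n_then = sum(taken)
    n_else = width - n_then
    issued = (then_len if n_then else 0) + (else_len if n_else else 0)
    useful = n_then * then_len + n_else * else_len
    return useful / (width * issued) if issued else 1.0


if __name__ == "__main__":
    uniform = [True] * WARP_SIZE
    alternating = [lane % 2 == 0 for lane in range(WARP_SIZE)]
    print(branch_utilization(uniform, 10, 10))      # 1.0: no divergence
    print(branch_utilization(alternating, 10, 10))  # 0.5: both paths serialized
```

Even a single divergent lane halves utilization when the two paths are equally long, which is the idle-lane capacity that techniques such as CWE try to reclaim by combining threads from different warps.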

14.

Research on GPU-based optimization of the eigenface algorithm

李繁, 严星, 张晓宇 《Computer Science》2021,48(4):197-204

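The eigenface algorithm is one of the commonly used face recognition methods based on facial representation. When the amount of training data is large, both the training and the testing modules are very time-consuming. The paper therefore uses the CUDA parallel computing architecture to implement a GPU-accelerated eigenface algorithm. Because the benefit of GPU parallel computing depends on factors such as the hardware specifications, the complexity and parallelizability of the algorithm itself, and the way the developer parallelizes the program for the GPU, the paper first proposes GPU acceleration for the following steps: in the training phase, computing the mean, zero-mean centering, and normalizing the eigenfaces; in the testing phase, projecting onto the eigenface space and computing Euclidean distances. Second, it applies different parallelization methods to these steps and evaluates their performance. Experimental results show that, for training sets of 320 to 1920 face images, every step is clearly accelerated. Compared with an Intel i7-5960X, a GTX1060 graphics card achieves an average speedup of about 71.7 times in the training module and about 34.1 times in the testing module.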

15.

Modeling and animation of fracture of heterogeneous materials based on CUDA

Jiangfan Ning, Huaxun Xu, Bo Wu, Liang Zeng, Sikun Li, Yueshan Xiong 《The Visual Computer》2013,29(4):265-275

Existing techniques for animating object fracture assume that object materials are homogeneous, whereas most real-world materials are heterogeneous. In this paper, we propose using movable cellular automata (MCA) to simulate fracture phenomena in heterogeneous objects. The method is based on a discrete representation and inherits the advantages of both the classical cellular automaton and discrete element methods. In our approach, the object is represented as discrete spherical particles, called movable cellular automata. MCA is used to simulate the material and physical properties in order to determine when and where fracture occurs. To achieve real-time performance, we accelerate the complex computation of the automata's physical properties in the MCA simulation using CUDA on a GPU. The simulation results are sent directly to a vertex buffer object (VBO) for rendering, avoiding costly communication between the CPU and GPU. The experimental results show the effectiveness of our method.

16.

Research on GPU-based computation of social network relationships

王亚芹, 孔雪元, 王滢 《Computer CD Software and Application》2011,(15)

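As networks become increasingly social and widespread, online communities keep growing, which creates a huge computational load for computing social network relationships. These computations include computing and generating personal relationships, computing and generating global relationships, and mining relationships. Although these tasks are computationally expensive, they are well suited to parallel processing. Accordingly, this paper analyzes in detail high-performance computing on the GPU and its concrete implementation in the CUDA programming model, and discusses using GPUs based on the CUDA hardware architecture to compute relationships between community users in parallel.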

17.

Constructing two-way alternating automata and nondeterministic automata from PSL

虞蕾, 陈火旺 《Journal of Software》2010,21(1):34-46

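PSL (Property Specification Language) is a property specification language for describing concurrent systems. It consists of two parts: the linear temporal logic FL (Foundation Language) and the branching temporal logic OBE (Optional Branching Extension). Because OBE is simply CTL (Computation Tree Logic), and formulas with clock declarations can easily be rewritten as unclocked formulas, this work focuses on unclocked FL. To enable model checking, every FL formula must be converted into a verifiable form, usually an automaton (a nondeterministic automaton). The nondeterministic automaton is constructed mainly via an intermediate alternating automaton. The paper gives detailed rules for constructing two-way alternating automata from unclocked FL. The core logic of these construction rules is not limited to regular expressions on top of LTL (Linear Temporal Logic); it comprehensively covers the possible cases of the various FL operators. The paper also gives a method for converting two-way alternating automata into nondeterministic automata. Finally, a tool that translates PSL into these automata was implemented. The computational complexity of the construction rules for FL two-way alternating automata is only linear in the length of the FL formula, and the correctness of the construction rules is verified. On this basis, the paper proves that a two-way alternating automaton and the equivalent nondeterministic automaton obtained from it accept the same language. This work has significant theoretical and practical value for modeling and verifying complex concurrent systems.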

18.

Exploiting the capabilities of modern GPUs for dense matrix computations

Sergio Barrachina, Maribel Castillo, Francisco D. Igual, Rafael Mayo, Enrique S. Quintana‐Ortí, Gregorio Quintana‐Ortí 《Concurrency and Computation》2009,21(18):2457-2477

We present several algorithms to compute the solution of a linear system of equations on a graphics processor (GPU), as well as general techniques to improve their performance, such as padding and hybrid GPU‐CPU computation. We compare single and double precision performance of a modern GPU with unified architecture, and show how iterative refinement with mixed precision can be used to regain full accuracy in the solution of linear systems, exploiting the potential of the processor for single precision arithmetic. Experimental results on a GTX280 using CUBLAS 2.0, the implementation of BLAS for NVIDIA® GPUs with unified architecture, illustrate the performance of the different algorithms and techniques proposed. Copyright © 2009 John Wiley & Sons, Ltd.
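
The mixed-precision idea in the abstract can be illustrated without a GPU. In the sketch below, which is an illustration and not the authors' CUBLAS code, single precision is emulated in pure Python by rounding every intermediate result to binary32 with `struct`. The O(n^3) LU factorization (with partial pivoting) is done once in "single precision"; each refinement step then computes the residual r = b - Ax in double precision and solves for a correction with the cheap single-precision factors. The function names and the test matrix are hypothetical.

```python
import struct
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_f32(a: Matrix) -> Tuple[Matrix, List[int]]:
    """In-place-style LU with partial pivoting; every result rounded to single precision."""
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))  # row i of the factors corresponds to row perm[i] of A
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])  # multiplier stored in the L part
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm


def solve_lu_f32(lu: Matrix, perm: List[int], b: Vector) -> Vector:
    """Forward/back substitution with the single-precision factors."""
    n = len(lu)
    y = [0.0] * n
    for i in range(n):  # unit lower-triangular L
        s = f32(b[perm[i]])
        for j in range(i):
            s = f32(s - f32(lu[i][j] * y[j]))
        y[i] = s
    x = [0.0] * n
    for i in reversed(range(n)):  # upper-triangular U
        s = y[i]
        for j in range(i + 1, n):
            s = f32(s - f32(lu[i][j] * x[j]))
        x[i] = f32(s / lu[i][i])
    return x


def mixed_precision_solve(a: Matrix, b: Vector, iterations: int = 5) -> Vector:
    lu, perm = lu_f32(a)           # expensive step, done once in single precision
    x = solve_lu_f32(lu, perm, b)  # roughly single-precision accurate
    for _ in range(iterations):
        # Residual in double precision: r = b - A x.
        r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(a, b)]
        d = solve_lu_f32(lu, perm, r)            # cheap correction solve
        x = [xi + di for xi, di in zip(x, d)]    # update in double precision
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    print(mixed_precision_solve(A, [6.0, 10.0, 8.0]))  # close to [1.0, 2.0, 3.0]
```

Refinement of this kind converges only when the matrix is not too ill-conditioned relative to single-precision round-off; for badly conditioned systems a full double-precision solve remains the safe fallback.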

19.

CUDA-based acceleration of the MS-Alignment algorithm for identifying protein post-translational modifications

翟艳堂, 涂强, 郎显宇, 陆忠华, 迟学斌 《Application Research of Computers》2010,27(9):3409-3414

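Analysis of the MS-Alignment algorithm shows that it can hardly meet the identification-speed requirements of large-scale data, and that it has a notable characteristic: the same task is computed repeatedly on different data, which provides a basis for data partitioning. Based on the CUDA programming model, the graphics processing unit (GPU) is used to accelerate the database search and candidate peptide generation step, and an implementation of this step on a single GPU is designed. Test results show an average speedup of more than 30 times, which is good enough to meet the need for fast computation on large-scale data in the identification of protein post-translational modifications.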

20.

Monitoring and fault diagnosis of hybrid systems

Feng Zhao, Xenofon Koutsoukos, Horst Haussecker, Jim Reich, Patrick Cheung 《IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics》2005,35(6):1225-1240

Many networked embedded sensing and control systems can be modeled as hybrid systems with interacting continuous and discrete dynamics. These systems present significant challenges for monitoring and diagnosis. Many existing model-based approaches focus on diagnostic reasoning and assume that appropriate fault signatures have been generated. However, an important missing piece is the integration of model-based techniques with the acquisition and processing of sensor signals and the modeling of faults to support diagnostic reasoning. This paper addresses key modeling and computational problems at the interface between model-based diagnosis techniques and signature analysis to enable the efficient detection and isolation of incipient and abrupt faults in hybrid systems. A hybrid automata model that parameterizes abrupt and incipient faults is introduced. Based on this model, an approach for diagnoser design is presented. The paper also develops a novel mode estimation algorithm that uses model-based prediction to focus distributed signal-processing algorithms. Finally, the paper describes a diagnostic system architecture that integrates the modeling, prediction, and diagnosis components. The implemented architecture is applied to fault diagnosis of a complex electro-mechanical machine, the Xerox DC265 printer, and the experimental results presented validate the approach. A number of design trade-offs that were made to support implementation of the algorithms for online applications are also described.