349 search results in total (search time: 15 ms).

Access: fee-based full text 238; free 43; domestic free 68.

By subject: electrical engineering 4; general/interdisciplinary 15; chemical industry 3; metalworking 2; machinery and instruments 1; architecture and civil engineering 3; mining engineering 1; light industry 2; oil and natural gas 3; radio and electronics 20; general industrial technology 24; nuclear technology 6; automation technology 265.

By year: 2023: 4; 2022: 4; 2021: 7; 2020: 15; 2019: 12; 2018: 20; 2017: 14; 2016: 25; 2015: 16; 2014: 21; 2013: 27; 2012: 20; 2011: 14; 2010: 16; 2009: 9; 2008: 6; 2007: 9; 2006: 10; 2005: 6; 2004: 11; 2003: 11; 2002: 9; 2001: 5; 2000: 12; 1999: 9; 1998: 5; 1997: 7; 1996: 4; 1995: 3; 1994: 3; 1993: 4; 1992: 3; 1990: 1; 1988: 7.
1.
Markov clustering (MCL) is an effective method for finding modules in large-scale biological networks, and can mine modules with strong influence on network structure and function. The algorithm involves large-scale matrix computation, so its complexity can reach cubic order. To address this high complexity, a parallel Markov clustering algorithm based on the Message Passing Interface (MPI) is proposed to improve computational performance. First, the biological network is converted into an adjacency matrix. Then, according to the properties of the algorithm, the matrix size is checked and a new matrix is regenerated so that matrices whose dimensions are not a multiple of the block size can still be handled. Next, block-wise distribution allows parallel computation over matrices of arbitrary size. Finally, the parallel computation is iterated until convergence, yielding the network clustering result. Experiments on simulated networks and real biological network datasets show that, compared with the fully collective communication (FCC) parallel method, average parallel efficiency improves by more than 10 percentage points, so the optimized algorithm can be applied to different types of large-scale biological networks.
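The expand-inflate iteration that this abstract parallelizes can be sketched serially with NumPy (a minimal sketch only: the block-wise MPI distribution, the padding of non-block-multiple matrices, and the paper's parameter settings are not reproduced; the `expansion` and `inflation` defaults here are illustrative):

```python
import numpy as np

def mcl(adj, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Serial Markov clustering sketch: alternate matrix expansion
    (flow spreads) and inflation (strong flows are reinforced)
    until the column-stochastic matrix converges."""
    M = adj.astype(float) + np.eye(adj.shape[0])  # add self-loops
    M /= M.sum(axis=0)                            # column-normalize
    for _ in range(max_iter):
        prev = M.copy()
        M = np.linalg.matrix_power(M, expansion)  # expansion step
        M = M ** inflation                        # inflation step (elementwise)
        M /= M.sum(axis=0)                        # re-normalize columns
        if np.abs(M - prev).max() < tol:
            break
    # read clusters off the converged matrix: each column's attractor row
    clusters = {}
    for i in range(M.shape[0]):
        attractor = int(np.argmax(M[:, i]))
        clusters.setdefault(attractor, []).append(i)
    return list(clusters.values())
```

On the canonical test graph of two triangles joined by a single edge, the inflation step suppresses the bridging flow and the two triangles emerge as separate modules.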
2.
To solve single-phase pore-fracture seepage problems efficiently, a three-dimensional cell-centered finite-volume seepage solver on arbitrary meshes is developed and parallelized with OpenMP. The algorithm places pressure at cell centers, uses a series spring model for spatial discretization, uses an explicit difference scheme for temporal discretization, and solves cell by cell with a dynamic relaxation technique. Case studies show that the algorithm achieves accuracy comparable to the finite element method with higher efficiency. OpenMP parallelization speeds it up by up to 4.0x on a CPU i7-3770 and 4.2x on a CPU i7-4770, with parallel efficiency above 50% on both machines.
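The cell-centered explicit-relaxation idea can be illustrated in drastically simplified form (a 1D serial sketch with unit conductances and fixed boundary pressures; the paper's 3D arbitrary-mesh scheme, series-spring discretization, and OpenMP parallel loops are not reproduced here):

```python
import numpy as np

def solve_pressure_1d(n=20, p_left=1.0, p_right=0.0, tol=1e-8, max_steps=100_000):
    """Relax cell-centered pressures in a 1D column toward steady
    seepage: each interior cell moves to the average of its
    neighbours; boundary cells see a fixed external pressure."""
    p = np.zeros(n)
    p_prev = p.copy()
    for _ in range(max_steps):
        p[1:-1] = 0.5 * (p[:-2] + p[2:])     # interior cells (Jacobi-style)
        p[0] = 0.5 * (p_left + p[1])         # left boundary cell
        p[-1] = 0.5 * (p[-2] + p_right)      # right boundary cell
        if np.abs(p - p_prev).max() < tol:   # explicit iteration has converged
            break
        p_prev = p.copy()
    return p
```

At convergence the pressure profile is linear between the two boundary values, which is the expected steady state for a homogeneous column.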
3.
Single-GPU scaling is unable to keep pace with the soaring demand for high-throughput computing. As such, executing an application on multiple GPUs connected through an off-chip interconnect becomes an attractive option to explore. However, much current code is written for a single-GPU system, and porting such code for execution on multiple GPUs is a difficult task. In particular, it requires programmer effort to determine how data is partitioned across the GPU cards and then to launch thread blocks that mostly access data local to each card; otherwise, cross-card data movement is an expensive operation. In this work we explore hardware support to efficiently parallelize a single-GPU code for execution on multiple GPUs. In particular, our approach focuses on minimizing the number of remote memory accesses across the off-chip network without burdening the programmer with data partitioning and workload assignment. We propose a data-location-aware thread block scheduler that schedules each thread block on the GPU holding most of its input data. The scheduler exploits the well-known observation that GPU workloads tend to launch a kernel multiple times iteratively to process large volumes of data, and that the memory accesses of a thread block across different iterations of a kernel launch exhibit correlated behavior. Our data-location-aware scheduler exploits this predictability to track the memory-access affinity of each thread block to a specific GPU card and stores this information to make scheduling decisions for future iterations. To further reduce the number of remote accesses, we propose a hybrid mechanism that migrates or copies pages between the memories of the GPUs based on their access behavior, so that most memory accesses go to local GPU memory.
Over an architecture consisting of two GPUs, our proposed schemes improve performance by 1.55x compared to single-GPU execution across the widely used Rodinia [17], Parboil [18], and Graph [23] benchmarks.
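The affinity-tracking idea behind this scheduler can be shown in miniature (a sketch only; `access_log`, its page-touch counts, and `update_affinity` are hypothetical stand-ins for the hardware bookkeeping the abstract describes):

```python
from collections import defaultdict

def update_affinity(log, block, gpu, touches=1):
    # Record how many of a block's memory accesses each GPU served
    # during the previous kernel iteration.
    log.setdefault(block, defaultdict(int))[gpu] += touches

def schedule_blocks(access_log, num_gpus=2):
    # Place each thread block, for the next iteration, on the GPU
    # that served most of its accesses last time.
    placement = {}
    for block, counts in access_log.items():
        placement[block] = max(range(num_gpus), key=lambda g: counts.get(g, 0))
    return placement
```

Because iterations of a kernel launch access correlated addresses, last iteration's counts are a usable predictor for the next iteration's placement.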
4.
Parallelism has become mainstream, in the multicore chip, the GPU, and the internet datacenter running MapReduce. In my field, large-scale scientific computing, parallelism now reigns triumphant.
5.
To speed up data-intensive programs, two complementary techniques, namely nested loop parallelization and data locality optimization, should be considered. Effective parallelization techniques distribute the computation and necessary data across different processors, whereas data locality places data on the same processor. Therefore, locality and parallelization may demand different loop transformations. As such, an integrated approach that combines these two can generate much better results than each individual approach. This paper proposes a unified approach that integrates these two techniques to obtain an appropriate loop transformation. Applying this transformation results in coarse-grain parallelism through exploiting the largest possible groups of outer permutable loops, in addition to data locality through dependence satisfaction at inner loops. These groups can be further tiled to improve data locality through exploiting data reuse in multiple dimensions.
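Loop tiling, one of the transformations this abstract refers to, can be illustrated on a matrix product (a sketch under illustrative assumptions, not the paper's unified transformation algorithm: the outer tile loops are permutable and could be distributed across processors, while reuse inside each tile improves locality):

```python
import numpy as np

def matmul_tiled(A, B, tile=32):
    """Tiled matrix multiply: iterate over square tiles so each
    (ii, jj, kk) block of A and B is reused while it is hot,
    instead of streaming whole rows and columns."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for ii in range(0, n, tile):          # outer tile loops are permutable:
        for jj in range(0, m, tile):      # candidates for coarse-grain parallelism
            for kk in range(0, k, tile):  # reduction dimension kept innermost
                C[ii:ii + tile, jj:jj + tile] += (
                    A[ii:ii + tile, kk:kk + tile] @ B[kk:kk + tile, jj:jj + tile]
                )
    return C
```

The (ii, jj) tile loops carry no dependence on each other, so they parallelize; the kk loop carries the accumulation and stays sequential per tile.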
6.
This paper describes how the EPparallel tool splits a single annual simulation into 12 simulations of one month each and runs them in parallel. The paper also describes the methodology used to prepare input files, enable file sharing between nodes, collate results generated by the nodes, and ensure quality checks on the simulations. The EPparallel tool uses the Message Passing Interface library and runs on Linux. The tool has been tested on 16 commercial reference buildings with 16 US weather files. The results of these 256 runs, including run times, computing time overheads, speed gains, and accuracy, are presented in this paper. The speed gain ranged from 2.9x to 7.8x, and the deviation (the percentage of output values from the parallel simulation that differed by more than 1% from the annual simulation) ranged from 0% to 4%.
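The split-and-collate pattern can be sketched with Python's standard library (a sketch only: EPparallel itself uses MPI, and `simulate_month` here is a hypothetical stand-in for one EnergyPlus monthly run):

```python
from multiprocessing import Pool

def simulate_month(month):
    # Hypothetical stand-in for an EnergyPlus run over one month;
    # returns (month, some aggregate output for that month).
    return month, sum(i * month for i in range(1000))

def run_annual_in_parallel(workers=4):
    # Run the 12 monthly simulations in parallel, then collate
    # the per-month outputs into an annual total.
    with Pool(workers) as pool:
        monthly = dict(pool.map(simulate_month, range(1, 13)))
    return sum(monthly.values())
```

Because the months are independent jobs, the collated annual total matches what a single sequential annual pass over the same stand-in workload would produce.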
7.
Most automatic parallelizers are based on the detection of independent operations. Dependence analysis is mainly a syntactic process in which the actual data transformations are ignored. There is another source of parallelism, relying on semantic information: the detection of reductions and scans. Scans and reductions are quite frequent in scientific codes and are implemented efficiently on most parallel computers. We present a new scan detector based on the normalization of systems of recurrence equations. This allows the detection of scans in loop nests of arbitrary depth and on multi-dimensional arrays, and gives a uniform treatment of scalar reductions, array reductions, and arrays of reductions.
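The semantic distinction the detector recognizes can be shown directly (a sketch; `loop_reduction` and `loop_scan` play the role of the source loops a parallelizer would analyze, and the standard-library calls show the associative primitives they can be rewritten to):

```python
from functools import reduce
from itertools import accumulate
import operator

def loop_reduction(a):
    # Recurrence s_i = s_{i-1} + a_i where only the final s survives:
    # a reduction, replaceable by a parallel tree over the associative op.
    s = 0
    for x in a:
        s += x
    return s

def loop_scan(a):
    # Same recurrence, but every prefix s_i is kept: a scan
    # (prefix sum), also parallelizable with known algorithms.
    out, s = [], 0
    for x in a:
        s += x
        out.append(s)
    return out
```

Syntactic dependence analysis sees a loop-carried dependence on `s` in both loops; it is the semantic pattern (an associative recurrence) that licenses the parallel rewrite.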
8.
Simulations based on multi-scale material models enabled by adaptive sampling have demonstrated speedup factors exceeding an order of magnitude. The use of these methods in parallel computing is hampered by dynamic load imbalance, with load imbalance measurably reducing the achieved speedup. Here we discuss these issues in the context of task parallelism, showing results achieved to date and discussing possibilities for further improvement. In some cases, the task parallelism methods employed to date are able to restore much of the potential wall-clock speedup. The specific application highlighted here focuses on the connection between microstructure and material performance using a polycrystal plasticity-based multi-scale method. However, the parallel load balancing issues are germane to a broad class of multi-scale problems. Copyright © 2011 John Wiley & Sons, Ltd.
9.
Parallelized programs have greatly improved application execution efficiency, and performance must be considered when designing multicore programs. This paper focuses on how parallelization overhead, load balancing, and thread synchronization overhead affect the performance of multicore multithreaded programs under the OpenMP programming model.
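The load-balancing trade-off discussed here can be illustrated with Python's process pool rather than OpenMP (a sketch under stated assumptions: a large `chunksize` mimics OpenMP's static scheduling, `chunksize=1` mimics dynamic scheduling at the cost of more dispatch overhead, and `work` is a hypothetical uneven workload):

```python
from multiprocessing import Pool

def work(n):
    # Uneven task sizes create load imbalance under static scheduling.
    return sum(i * i for i in range(n))

def run(sizes, workers=4, chunksize=1):
    # Distribute the tasks over worker processes; chunksize controls
    # how many tasks are handed out per scheduling decision.
    with Pool(workers) as pool:
        return sum(pool.map(work, sizes, chunksize=chunksize))
```

Whatever the schedule, the result must be identical; only wall-clock time and overhead differ, which is exactly the tuning space the abstract discusses.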
10.
The paper presents a new open-source framework called KernelHive for multilevel parallelization of computations among various clusters, cluster nodes, and finally among both CPUs and GPUs for a particular application. An application is modeled as a directed acyclic graph with the possibility of running nodes in parallel and of automatic expansion of nodes (called node unrolling) depending on the number of computation units available. A methodology is proposed for parallelization and mapping of an application to the environment, including selection of devices using a chosen optimizer, selection of the best grid configurations for compute devices, and optimization of data partitioning and execution. One of possibly many scheduling algorithms can be selected, considering execution time, power consumption, and so on. An easy-to-use GUI is provided for modeling and monitoring, with a repository of ready-to-use constructs and computational kernels. The methodology, execution times, and scalability are demonstrated for a distributed and parallel password-breaking example run in a heterogeneous environment with a cluster and servers with different numbers of nodes and both CPUs and GPUs. Additionally, the performance of the framework is compared with an MPI + OpenCL implementation using a parallel geospatial interpolation application employing up to 40 cluster nodes and 320 cores. Copyright © 2015 John Wiley & Sons, Ltd.
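The directed-acyclic-graph application model can be sketched as level-by-level execution, where each level contains mutually independent nodes that may run in parallel across devices (a sketch of the general idea only, not KernelHive's optimizer or node-unrolling mechanism):

```python
from collections import defaultdict

def topological_levels(edges, nodes):
    """Group DAG nodes into levels: every node in a level has all of
    its predecessors in earlier levels, so a level's nodes are
    independent and can be dispatched to devices concurrently."""
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    level = [n for n in nodes if indeg[n] == 0]  # sources run first
    levels = []
    while level:
        levels.append(sorted(level))
        nxt = []
        for u in level:
            for v in succ[u]:       # completing u releases its successors
                indeg[v] -= 1
                if indeg[v] == 0:
                    nxt.append(v)
        level = nxt
    return levels
```

A scheduler can then map each level's nodes onto CPUs and GPUs according to its chosen objective (execution time, power, and so on).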