349 search results in total (search time: 15 ms).

Access: fee-based full text 238; free 43; domestic free 68.

By subject: electrical engineering 4; general/interdisciplinary 15; chemical industry 3; metalworking 2; machinery and instruments 1; architecture and civil engineering 3; mining engineering 1; light industry 2; oil and natural gas 3; radio and electronics 20; general industrial technology 24; nuclear technology 6; automation technology 265.

By year: 2023: 4; 2022: 4; 2021: 7; 2020: 15; 2019: 12; 2018: 20; 2017: 14; 2016: 25; 2015: 16; 2014: 21; 2013: 27; 2012: 20; 2011: 14; 2010: 16; 2009: 9; 2008: 6; 2007: 9; 2006: 10; 2005: 6; 2004: 11; 2003: 11; 2002: 9; 2001: 5; 2000: 12; 1999: 9; 1998: 5; 1997: 7; 1996: 4; 1995: 3; 1994: 3; 1993: 4; 1992: 3; 1990: 1; 1988: 7.
1.
Markov clustering (MCL) is an effective method for finding modules in large-scale biological networks, and can mine modules with strong influence on network structure and function. The algorithm involves large-scale matrix computation, so its complexity can reach cubic order. To address this high complexity, a parallel Markov clustering algorithm based on the Message Passing Interface (MPI) is proposed to improve computational performance. First, the biological network is converted into an adjacency matrix. Then, according to the properties of the algorithm, the matrix size is checked and a new matrix is regenerated so that matrices whose dimensions are not a multiple of the block size can still be handled. Next, block-wise distribution allows parallel computation over matrices of arbitrary size. Finally, the parallel computation is iterated until convergence, yielding the network clustering result. Experiments on simulated networks and real biological network datasets show that, compared with the fully collective communication (FCC) parallel method, average parallel efficiency improves by more than 10 percentage points, so the optimized algorithm can be applied to different types of large-scale biological networks.
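The expand-inflate iteration that this abstract parallelizes can be sketched serially with NumPy (a minimal sketch only: the block-wise MPI distribution, the padding of non-block-multiple matrices, and the paper's parameter settings are not reproduced; the `expansion` and `inflation` defaults here are illustrative):

```python
import numpy as np

def mcl(adj, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Serial Markov clustering sketch: alternate matrix expansion
    (flow spreads) and inflation (strong flows are reinforced)
    until the column-stochastic matrix converges."""
    M = adj.astype(float) + np.eye(adj.shape[0])  # add self-loops
    M /= M.sum(axis=0)                            # column-normalize
    for _ in range(max_iter):
        prev = M.copy()
        M = np.linalg.matrix_power(M, expansion)  # expansion step
        M = M ** inflation                        # inflation step (elementwise)
        M /= M.sum(axis=0)                        # re-normalize columns
        if np.abs(M - prev).max() < tol:
            break
    # read clusters off the converged matrix: each column's attractor row
    clusters = {}
    for i in range(M.shape[0]):
        attractor = int(np.argmax(M[:, i]))
        clusters.setdefault(attractor, []).append(i)
    return list(clusters.values())
```

On the canonical test graph of two triangles joined by a single edge, the inflation step suppresses the bridging flow and the two triangles emerge as separate modules.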
2.
To solve single-phase pore-fracture seepage problems efficiently, a three-dimensional cell-centered finite-volume seepage solver on arbitrary meshes is developed and parallelized with OpenMP. The algorithm places pressure at cell centers, uses a series spring model for spatial discretization, uses an explicit difference scheme for temporal discretization, and solves cell by cell with a dynamic relaxation technique. Case studies show that the algorithm achieves accuracy comparable to the finite element method with higher efficiency. OpenMP parallelization speeds it up by up to 4.0x on a CPU i7-3770 and 4.2x on a CPU i7-4770, with parallel efficiency above 50% on both machines.
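The cell-centered explicit-relaxation idea can be illustrated in drastically simplified form (a 1D serial sketch with unit conductances and fixed boundary pressures; the paper's 3D arbitrary-mesh scheme, series-spring discretization, and OpenMP parallel loops are not reproduced here):

```python
import numpy as np

def solve_pressure_1d(n=20, p_left=1.0, p_right=0.0, tol=1e-8, max_steps=100_000):
    """Relax cell-centered pressures in a 1D column toward steady
    seepage: each interior cell moves to the average of its
    neighbours; boundary cells see a fixed external pressure."""
    p = np.zeros(n)
    p_prev = p.copy()
    for _ in range(max_steps):
        p[1:-1] = 0.5 * (p[:-2] + p[2:])     # interior cells (Jacobi-style)
        p[0] = 0.5 * (p_left + p[1])         # left boundary cell
        p[-1] = 0.5 * (p[-2] + p_right)      # right boundary cell
        if np.abs(p - p_prev).max() < tol:   # explicit iteration has converged
            break
        p_prev = p.copy()
    return p
```

At convergence the pressure profile is linear between the two boundary values, which is the expected steady state for a homogeneous column.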
3.
Single-GPU scaling is unable to keep pace with the soaring demand for high-throughput computing. As such, executing an application on multiple GPUs connected through an off-chip interconnect becomes an attractive option to explore. However, much current code is written for a single-GPU system, and porting such code for execution on multiple GPUs is a difficult task. In particular, it requires programmer effort to determine how data is partitioned across the GPU cards and then to launch thread blocks that mostly access data local to each card; otherwise, cross-card data movement is an expensive operation. In this work we explore hardware support to efficiently parallelize a single-GPU code for execution on multiple GPUs. In particular, our approach focuses on minimizing the number of remote memory accesses across the off-chip network without burdening the programmer with data partitioning and workload assignment. We propose a data-location-aware thread block scheduler that schedules each thread block on the GPU holding most of its input data. The scheduler exploits the well-known observation that GPU workloads tend to launch a kernel multiple times iteratively to process large volumes of data, and that the memory accesses of a thread block across different iterations of a kernel launch exhibit correlated behavior. Our data-location-aware scheduler exploits this predictability to track the memory-access affinity of each thread block to a specific GPU card and stores this information to make scheduling decisions for future iterations. To further reduce the number of remote accesses, we propose a hybrid mechanism that migrates or copies pages between the memories of the GPUs based on their access behavior, so that most memory accesses go to local GPU memory.
Over an architecture consisting of two GPUs, our proposed schemes improve performance by 1.55x compared to single-GPU execution across the widely used Rodinia [17], Parboil [18], and Graph [23] benchmarks.
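The affinity-tracking idea behind this scheduler can be shown in miniature (a sketch only; `access_log`, its page-touch counts, and `update_affinity` are hypothetical stand-ins for the hardware bookkeeping the abstract describes):

```python
from collections import defaultdict

def update_affinity(log, block, gpu, touches=1):
    # Record how many of a block's memory accesses each GPU served
    # during the previous kernel iteration.
    log.setdefault(block, defaultdict(int))[gpu] += touches

def schedule_blocks(access_log, num_gpus=2):
    # Place each thread block, for the next iteration, on the GPU
    # that served most of its accesses last time.
    placement = {}
    for block, counts in access_log.items():
        placement[block] = max(range(num_gpus), key=lambda g: counts.get(g, 0))
    return placement
```

Because iterations of a kernel launch access correlated addresses, last iteration's counts are a usable predictor for the next iteration's placement.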
4.
Parallelism has become mainstream, in the multicore chip, the GPU, and the internet datacenter running MapReduce. In my field, large-scale scientific computing, parallelism now reigns triumphant.
5.
To speed up data-intensive programs, two complementary techniques, namely nested loop parallelization and data locality optimization, should be considered. Effective parallelization techniques distribute the computation and necessary data across different processors, whereas data locality places data on the same processor. Therefore, locality and parallelization may demand different loop transformations. As such, an integrated approach that combines these two can generate much better results than each individual approach. This paper proposes a unified approach that integrates these two techniques to obtain an appropriate loop transformation. Applying this transformation results in coarse-grain parallelism through exploiting the largest possible groups of outer permutable loops, in addition to data locality through dependence satisfaction at inner loops. These groups can be further tiled to improve data locality through exploiting data reuse in multiple dimensions.
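Loop tiling, one of the transformations this abstract refers to, can be illustrated on a matrix product (a sketch under illustrative assumptions, not the paper's unified transformation algorithm: the outer tile loops are permutable and could be distributed across processors, while reuse inside each tile improves locality):

```python
import numpy as np

def matmul_tiled(A, B, tile=32):
    """Tiled matrix multiply: iterate over square tiles so each
    (ii, jj, kk) block of A and B is reused while it is hot,
    instead of streaming whole rows and columns."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for ii in range(0, n, tile):          # outer tile loops are permutable:
        for jj in range(0, m, tile):      # candidates for coarse-grain parallelism
            for kk in range(0, k, tile):  # reduction dimension kept innermost
                C[ii:ii + tile, jj:jj + tile] += (
                    A[ii:ii + tile, kk:kk + tile] @ B[kk:kk + tile, jj:jj + tile]
                )
    return C
```

The (ii, jj) tile loops carry no dependence on each other, so they parallelize; the kk loop carries the accumulation and stays sequential per tile.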
6.
This paper describes how the EPparallel tool splits a single annual simulation into 12 simulations of one month each and runs them in parallel. The paper also describes the methodology used to prepare input files, enable file sharing between nodes, collate results generated by the nodes, and ensure quality checks on the simulations. The EPparallel tool uses the Message Passing Interface library and runs on Linux. The tool has been tested on 16 commercial reference buildings with 16 US weather files. The results of these 256 runs, including run times, computing time overheads, speed gains, and accuracy, are presented in this paper. The speed gain ranged from 2.9x to 7.8x, and the deviation (the percentage of output values from the parallel simulation that differed by more than 1% from the annual simulation) ranged from 0% to 4%.
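The split-and-collate pattern can be sketched with Python's standard library (a sketch only: EPparallel itself uses MPI, and `simulate_month` here is a hypothetical stand-in for one EnergyPlus monthly run):

```python
from multiprocessing import Pool

def simulate_month(month):
    # Hypothetical stand-in for an EnergyPlus run over one month;
    # returns (month, some aggregate output for that month).
    return month, sum(i * month for i in range(1000))

def run_annual_in_parallel(workers=4):
    # Run the 12 monthly simulations in parallel, then collate
    # the per-month outputs into an annual total.
    with Pool(workers) as pool:
        monthly = dict(pool.map(simulate_month, range(1, 13)))
    return sum(monthly.values())
```

Because the months are independent jobs, the collated annual total matches what a single sequential annual pass over the same stand-in workload would produce.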
7.
Most automatic parallelizers are based on the detection of independent operations. Dependence analysis is mainly a syntactic process in which the actual data transformations are ignored. There is another source of parallelism, relying on semantic information: the detection of reductions and scans. Scans and reductions are quite frequent in scientific codes and are implemented efficiently on most parallel computers. We present a new scan detector based on the normalization of systems of recurrence equations. This allows the detection of scans in loop nests of arbitrary depth and on multi-dimensional arrays, and gives a uniform treatment of scalar reductions, array reductions, and arrays of reductions.
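The semantic distinction the detector recognizes can be shown directly (a sketch; `loop_reduction` and `loop_scan` play the role of the source loops a parallelizer would analyze, and the standard-library calls show the associative primitives they can be rewritten to):

```python
from functools import reduce
from itertools import accumulate
import operator

def loop_reduction(a):
    # Recurrence s_i = s_{i-1} + a_i where only the final s survives:
    # a reduction, replaceable by a parallel tree over the associative op.
    s = 0
    for x in a:
        s += x
    return s

def loop_scan(a):
    # Same recurrence, but every prefix s_i is kept: a scan
    # (prefix sum), also parallelizable with known algorithms.
    out, s = [], 0
    for x in a:
        s += x
        out.append(s)
    return out
```

Syntactic dependence analysis sees a loop-carried dependence on `s` in both loops; it is the semantic pattern (an associative recurrence) that licenses the parallel rewrite.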
8.
Simulations based on multi-scale material models enabled by adaptive sampling have demonstrated speedup factors exceeding an order of magnitude. The use of these methods in parallel computing is hampered by dynamic load imbalance, with load imbalance measurably reducing the achieved speedup. Here we discuss these issues in the context of task parallelism, showing results achieved to date and discussing possibilities for further improvement. In some cases, the task parallelism methods employed to date are able to restore much of the potential wall-clock speedup. The specific application highlighted here focuses on the connection between microstructure and material performance using a polycrystal plasticity-based multi-scale method. However, the parallel load balancing issues are germane to a broad class of multi-scale problems. Copyright © 2011 John Wiley & Sons, Ltd.
9.
Parallelized programs have greatly improved application execution efficiency, and performance must be considered when designing multicore programs. This paper focuses on how parallelization overhead, load balancing, and thread synchronization overhead affect the performance of multicore multithreaded programs under the OpenMP programming model.
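The load-balancing trade-off discussed here can be illustrated with Python's process pool rather than OpenMP (a sketch under stated assumptions: a large `chunksize` mimics OpenMP's static scheduling, `chunksize=1` mimics dynamic scheduling at the cost of more dispatch overhead, and `work` is a hypothetical uneven workload):

```python
from multiprocessing import Pool

def work(n):
    # Uneven task sizes create load imbalance under static scheduling.
    return sum(i * i for i in range(n))

def run(sizes, workers=4, chunksize=1):
    # Distribute the tasks over worker processes; chunksize controls
    # how many tasks are handed out per scheduling decision.
    with Pool(workers) as pool:
        return sum(pool.map(work, sizes, chunksize=chunksize))
```

Whatever the schedule, the result must be identical; only wall-clock time and overhead differ, which is exactly the tuning space the abstract discusses.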
10.
The paper presents a new open-source framework called KernelHive for multilevel parallelization of computations among various clusters, cluster nodes, and finally among both CPUs and GPUs for a particular application. An application is modeled as a directed acyclic graph with the possibility of running nodes in parallel and of automatic expansion of nodes (called node unrolling) depending on the number of computation units available. A methodology is proposed for parallelization and mapping of an application to the environment, including selection of devices using a chosen optimizer, selection of the best grid configurations for compute devices, and optimization of data partitioning and execution. One of possibly many scheduling algorithms can be selected, considering execution time, power consumption, and so on. An easy-to-use GUI is provided for modeling and monitoring, with a repository of ready-to-use constructs and computational kernels. The methodology, execution times, and scalability are demonstrated for a distributed and parallel password-breaking example run in a heterogeneous environment with a cluster and servers with different numbers of nodes and both CPUs and GPUs. Additionally, the performance of the framework is compared with an MPI + OpenCL implementation using a parallel geospatial interpolation application employing up to 40 cluster nodes and 320 cores. Copyright © 2015 John Wiley & Sons, Ltd.
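The directed-acyclic-graph application model can be sketched as level-by-level execution, where each level contains mutually independent nodes that may run in parallel across devices (a sketch of the general idea only, not KernelHive's optimizer or node-unrolling mechanism):

```python
from collections import defaultdict

def topological_levels(edges, nodes):
    """Group DAG nodes into levels: every node in a level has all of
    its predecessors in earlier levels, so a level's nodes are
    independent and can be dispatched to devices concurrently."""
    indeg = {n: 0 for n in nodes}
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    level = [n for n in nodes if indeg[n] == 0]  # sources run first
    levels = []
    while level:
        levels.append(sorted(level))
        nxt = []
        for u in level:
            for v in succ[u]:       # completing u releases its successors
                indeg[v] -= 1
                if indeg[v] == 0:
                    nxt.append(v)
        level = nxt
    return levels
```

A scheduler can then map each level's nodes onto CPUs and GPUs according to its chosen objective (execution time, power, and so on).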