1.
We describe work done to improve the performance of NAG Library routines for nonlinear equations and nonlinear least squares problems on vector-processing machines. Calls to the Level 2 BLAS routines for matrix-vector operations were introduced wherever possible, so that further efforts to tune the code could be concentrated within the Level 2 BLAS, and advantage could be taken of optimized implementations of the Level 2 BLAS when they become available. Performance measurements from a CRAY-1S, an AMDAHL VP1100, and a CDC CYBER 205 are presented to illustrate the effectiveness of this strategy.
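To make the strategy in this abstract concrete, here is a minimal sketch of routing a matrix-vector computation through a single GEMV-style routine, so that tuning effort is concentrated in one place. The function names `gemv` and `linearized_residual` are hypothetical and chosen only for illustration; this is not the NAG Library code, and the pure-Python `gemv` merely stands in for an optimized Level 2 BLAS implementation.

```python
def gemv(alpha, A, x, beta, y):
    """Return alpha*A*x + beta*y (the Level 2 BLAS GEMV operation).

    A is a list of rows; x and y are lists. This pure-Python version is a
    stand-in for an optimized library routine.
    """
    return [alpha * sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + beta * y_i
            for row, y_i in zip(A, y)]


def linearized_residual(J, step, f):
    """Hypothetical caller: evaluate f + J*step through one gemv call.

    Writing the caller this way means only gemv needs to be tuned or swapped
    for a vendor implementation.
    """
    return gemv(1.0, J, step, 1.0, f)
```

For example, `linearized_residual([[1, 2], [3, 4]], [1, 1], [1, 1])` evaluates `f + J*step` for a 2 × 2 Jacobian without any explicit loops in the caller.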
2.
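BLAS (Basic Linear Algebra Subprograms) is a standard set of mathematical functions for basic linear algebra operations. The library is divided into three levels, which provide basic vector-vector (Level 1), matrix-vector (Level 2), and matrix-matrix (Level 3) operations. This paper studies optimization schemes for Level 1 BLAS functions on the Sunway (申威) 1621 processor. Taking the AXPY function as an example, it exploits the architectural features of the platform to tune performance and designs an automatic thread allocation scheme. Experimental results show that the optimized Level 1 BLAS function AXPY achieves single-core and multi-core speedups of up to 4.36 and 9.50, respectively, over the GotoBLAS reference implementation, and that every optimization technique yields some performance improvement.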
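As a rough illustration of the AXPY operation (y ← a·x + y) and of splitting it across threads, the sketch below partitions the index range into contiguous chunks, one per thread. The helper names `partition`, `axpy_chunk` and `axpy`, and the even-split rule, are assumptions made for this example; the paper's actual thread allocation scheme for the Sunway 1621 is not described in the abstract. Python threads also will not give real speedups here; the point is only the structure of the decomposition.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(n, num_threads):
    """Split range(n) into num_threads contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n, num_threads)
    bounds, start = [], 0
    for t in range(num_threads):
        stop = start + base + (1 if t < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def axpy_chunk(a, x, y, start, stop):
    """Update y[start:stop] in place: y[i] <- a*x[i] + y[i]."""
    for i in range(start, stop):
        y[i] += a * x[i]


def axpy(a, x, y, num_threads=4):
    """Compute y <- a*x + y in place, one disjoint chunk per thread (illustrative only)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    # Never use more threads than elements, and always at least one.
    num_threads = max(1, min(num_threads, n))
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(axpy_chunk, a, x, y, s, e)
                   for s, e in partition(n, num_threads)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return y
```

Because the chunks are disjoint, the workers never write to the same element, so no locking is needed.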
3.
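The Beowulf project's idea of "meeting special computing needs with COTS technology" has made cluster computing an important school of high-performance computing. Targeting the characteristics of the Intel microprocessors used in Beowulf-class clusters, this paper discusses optimization techniques for BLAS and evaluates their performance on a Beowulf-class cluster that uses a software DSM system as its parallel programming environment.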
4.
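Double-precision general matrix multiplication (DGEMM) is one of the core functions of the BLAS library; the core computation of most Level 3 BLAS functions is implemented by calling DGEMM. Exploiting the 128-bit memory access instructions of the Loongson 3A, this paper derives the optimal loop unrolling scheme through theoretical analysis. To address the Loongson 3A's random cache replacement policy, it uses address interleaving to reduce cache conflict misses. To address the Loongson 3A's limited memory bandwidth, it uses a task partitioning scheme with shared data to reduce memory traffic. On both a single core and multiple cores, the optimized DGEMM runs more than twice as fast as GotoBLAS, the highest-performing open-source BLAS library.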
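The loop unrolling idea can be sketched with a 2 × 2 unrolled micro-kernel for C ← C + A·B: each pass of the inner loop loads two elements of A and two of B and reuses them for four multiply-adds. The 2 × 2 shape and the name `gemm_unrolled` are assumptions for illustration; the abstract does not state the unrolling factors chosen for the Loongson 3A, and this pure-Python version shows only the loop structure, not the performance effect.

```python
def gemm_unrolled(A, B, C):
    """Compute C <- C + A*B in place with a 2x2 unrolled micro-kernel.

    A is m x k, B is k x n, C is m x n, all non-empty lists of rows.
    """
    m, k, n = len(A), len(B), len(B[0])
    m2, n2 = m - m % 2, n - n % 2  # largest even sizes covered by the 2x2 kernel

    for i in range(0, m2, 2):
        for j in range(0, n2, 2):
            c00 = c01 = c10 = c11 = 0.0
            for p in range(k):
                # Two loads from A and two from B feed four multiply-adds.
                a0, a1 = A[i][p], A[i + 1][p]
                b0, b1 = B[p][j], B[p][j + 1]
                c00 += a0 * b0
                c01 += a0 * b1
                c10 += a1 * b0
                c11 += a1 * b1
            C[i][j] += c00
            C[i][j + 1] += c01
            C[i + 1][j] += c10
            C[i + 1][j + 1] += c11

    # Cleanup: the last row and/or column when m or n is odd.
    for i in range(m):
        for j in range(n):
            if i < m2 and j < n2:
                continue  # already handled by the unrolled kernel
            C[i][j] += sum(A[i][p] * B[p][j] for p in range(k))
    return C
```

Keeping the four partial sums in local variables mirrors how an optimized kernel keeps a small block of C in registers across the inner loop.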
5.
A direct solver for symmetric sparse matrices from finite element problems is presented. The solver is intended to work as a local solver of domain decomposition methods for hybrid parallelization on cluster systems of multi‐core CPUs, so it is required to run on shared memory computers and to be capable of kernel detection. Symmetric pivoting with a given threshold factorizes a matrix with a decomposition introduced by a nested bisection and selects suspicious null pivots from the threshold. The Schur complement constructed from the suspicious null pivots is examined by a factorization with 1 × 1 and 2 × 2 pivoting and by a robust kernel detection algorithm based on measurement of residuals with orthogonal projections onto supposed image spaces. A static data structure from the nested bisection and a block sub‐structure for Schur complements at all bisection levels can use level 3 BLAS routines efficiently. Asynchronous task execution for each block can reduce idle time of processors drastically, and as a result, the solver has high parallel efficiency. Numerical experiments show that the developed solver is competitive with Intel Pardiso on shared memory computers. Copyright © 2014 John Wiley & Sons, Ltd.
6.
The popularity of Partitioned Global Address Space (PGAS) languages has increased in recent years thanks to their high programmability and performance through an efficient exploitation of data locality, especially on hierarchical architectures such as multicore clusters. This paper describes UPCBLAS, a parallel numerical library for dense matrix computations using the PGAS Unified Parallel C language. The routines developed in UPCBLAS are built on top of sequential basic linear algebra subprograms functions and exploit the particularities of the PGAS paradigm, taking into account data locality in order to achieve good performance. Furthermore, the routines implement other optimization techniques, several of them by automatically taking into account the hardware characteristics of the underlying systems on which they are executed. The library has been experimentally evaluated on a multicore supercomputer and compared with a message‐passing‐based parallel numerical library, demonstrating good scalability and efficiency. Copyright © 2012 John Wiley & Sons, Ltd.
7.
The Basic Linear Algebra Subprograms (BLAS) define one of the most heavily used performance‐critical APIs in scientific computing today. It has long been understood that the most important of these routines, the dense Level 3 BLAS, may be written efficiently given a highly optimized general matrix multiply routine. In this paper, however, we show that an even larger set of operations can be efficiently maintained using a much simpler matrix multiply kernel. Indeed, this is how our own project, ATLAS (which provides one of the most widely used BLAS implementations in use today), supports a large variety of performance‐critical routines. Copyright © 2004 John Wiley & Sons, Ltd.
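The idea of building other Level 3 operations on a matrix multiply routine can be sketched with SYMM, the product of a symmetric matrix and a general matrix. In the sketch below, `symm_lower` reads only the lower triangle of its first argument, expands it to a full matrix, and delegates all arithmetic to `gemm`. Both function names are hypothetical, and the expand-then-multiply approach is a simplification chosen for brevity; it is not how ATLAS implements its kernels.

```python
def gemm(A, B):
    """Return the product A*B for non-empty lists of rows (stand-in for a tuned GEMM)."""
    k, n = len(B), len(B[0])
    return [[sum(row[p] * B[p][j] for p in range(k)) for j in range(n)]
            for row in A]


def symm_lower(A_lower, B):
    """Return A*B where A is symmetric and only its lower triangle is read.

    Entries above the diagonal of A_lower are ignored, as in the BLAS SYMM
    convention for lower-triangle storage.
    """
    n = len(A_lower)
    # Reflect the lower triangle across the diagonal to get the full matrix.
    full = [[A_lower[i][j] if j <= i else A_lower[j][i] for j in range(n)]
            for i in range(n)]
    return gemm(full, B)
```

All floating-point work happens inside `gemm`, so any improvement to that one routine carries over to `symm_lower` unchanged.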
8.
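Improving the efficiency of heterogeneous HPL (High-Performance Linpack) requires fully exploiting the computing power of both the accelerators and the general-purpose CPUs. The accelerators integrate more compute cores and handle the bulk of the computation, while the general-purpose CPU handles task scheduling and also takes part in the computation. Given reasonable task partitioning and load balancing, optimizing CPU-side computing performance is especially important for improving overall efficiency. Targeting the architectural characteristics of the specific platform, BLAS (ba...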
9.
This article discusses an efficient implementation of tensors of arbitrary rank by using some of the idioms introduced by the recently published C++ ISO Standard (C++11). With the aim of providing a basic building block for high-performance computing, a single Array class template is carefully crafted, from which vectors, matrices, and even higher-order tensors can be created. An expression template facility is also built around the array class template to provide convenient mathematical syntax. As a result, by using templates, an extra high-level layer is added to the C++ language when dealing with algebraic objects and their operations, without compromising performance. The implementation is tested running on both CPU and GPU.
10.
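BLAS (basic linear algebra subprograms) is an important module of high-performance extended math libraries and is widely used in scientific and engineering computing. Level 1 BLAS provides vector-vector operations, and Level 2 BLAS provides matrix-vector operations. This paper designs and implements high-performance Level 1 and Level 2 BLAS functions for the Chinese-developed SW26010-Pro many-core processor. A reduction strategy across the slave cores is designed on top of the RMA communication mechanism, improving the reduction efficiency of several Level 1 and Level 2 BLAS functions. For functions with data dependencies, such as TRSV and TPSV, an efficient parallel algorithm is proposed: it maintains the data dependencies through point-to-point synchronization and uses an efficient task mapping mechanism suited to triangular matrices, which effectively reduces the number of point-to-point synchronizations between slave cores and improves execution efficiency. Adaptive optimization, vector compression, data reuse, and related techniques further improve the memory bandwidth utilization of the Level 1 and Level 2 BLAS functions. Experimental results show that the memory bandwidth utilization of the Level 1 BLAS functions reaches up to 95% and exceeds 90% on average, while that of the Level 2 BLAS functions reaches up to 98% and exceeds 80% on average. Compared with the widely used open-source math library GotoBLAS, the Level 1 and Level 2 BLAS functions achieve average speedups of 18.78× and 25.96×, respectively. LU decomposition, QR decomposition, and the symmetric eigenvalue problem, by calling...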
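The data dependency that makes TRSV hard to parallelize is visible in a plain sequential forward substitution for a lower-triangular system L·x = b: each x[i] needs every earlier x[j]. The sketch below is a textbook serial version under the assumption of a non-unit, lower-triangular, row-major matrix; the name `trsv_lower` is hypothetical, and the code does not reproduce the paper's parallel algorithm, task mapping, or synchronization scheme.

```python
def trsv_lower(L, b):
    """Solve L*x = b by forward substitution, L lower triangular with nonzero diagonal."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = b[i]
        for j in range(i):
            # x[i] depends on every previously computed x[j], j < i.
            # In a parallel version this is where synchronization is required.
            s -= L[i][j] * x[j]
        x[i] = s / L[i][i]
    return x
```

Row i cannot finish until rows 0 through i-1 have produced their results, which is why a parallel implementation has to order its work and synchronize between workers.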