45 results found (search time: 31 ms)
31.
The performance of the Global Array shared-memory nonuniform memory-access programming model is explored in a wide-area-network (WAN) distributed supercomputer environment. The Global Array model is extended by introducing the concept of mirrored arrays, which, thanks to caching and user-controlled consistency of the shared data structures, can reduce the application's sensitivity to network latency. Latencies and bandwidths for remote memory access are studied, and the performance of a large computational chemistry application is evaluated using both fully distributed and mirrored arrays. Excellent performance can be obtained with mirroring even if only modest (0.5 MB/s) network bandwidth is available.
32.
This paper investigates the performance of synchronization algorithms on ccNUMA multiprocessors, from the perspectives of the architecture and the operating system. In contrast with previous related studies, which emphasized the relative performance of synchronization algorithms, this paper takes a new approach by analyzing the sources of synchronization latency on ccNUMA architectures and how this latency can be reduced by leveraging hardware and software schemes in both dedicated and multiprogrammed execution environments. From the architectural perspective, the paper identifies the implications of directory-based cache coherence for the latency and scalability of synchronization instructions and examines whether and how simple hardware that accelerates these instructions can be leveraged to reduce synchronization latency. From the operating system's perspective, the paper evaluates, in a unified framework, user-level, kernel-level, and hybrid algorithms for implementing scalable synchronization in multiprogrammed execution environments. Along with addressing these issues, the paper contributes a new methodology for implementing fast synchronization algorithms on ccNUMA multiprocessors. The relevant experiments are conducted on the SGI Origin2000, a popular commercial ccNUMA multiprocessor.
33.
In distributed shared-memory (DSM) multiprocessors, a write operation requires multiple messages to invalidate the nodes that share and cache the memory block being written. The consequent write stall time impedes the performance of such systems. An effective means of achieving efficient invalidation is to employ multicast messages to reach the sharing nodes. This study evaluates two multicast-based invalidation schemes, dual-path and pruning, by performing application-driven simulation. Under the experimental settings used here, multicasts improve invalidation traffic for four of the six evaluated real applications. The remaining two applications are computationally intensive, and multicast-based invalidation is less effective for them. However, since multicasts encourage bursty communication, our results indicate that they help relieve network congestion during these periods. Dual-path performs slightly better than pruning because it is less sensitive to routing delay in the routers. Our results further demonstrate that cache size is an important design parameter for multicast-based invalidation, and that multicast-based invalidation is highly effective for DSM multiprocessors with larger caches.
34.
Off-chip replacement (capacity and conflict) misses and coherent read misses in a distributed shared memory system cause execution to stall for hundreds of cycles. These off-chip replacement and coherent read misses recur, forming sequences of two or more misses called streams. Prior streaming techniques ignored both the reordering of misses and not-recently-accessed streams while streaming data. In this paper, we present a stream prefetcher design that can deal with both problems. Our stream prefetcher design uses stream waiting rooms to store not-recently-accessed streams. Stream waiting rooms help remove more off-chip misses. Using trace-based simulation, our stream prefetcher design can remove 8% to 66% (on average 40%) of replacement misses and 17% to 63% (on average 39%) of coherent read misses. Using cycle-accurate full-system simulation, our design gives speedups from 1.00 to 1.17 for Princeton Application Repository for Shared-Memory Computers (PARSEC) workloads running on a distributed shared memory system, with the exception of the dedup and swaptions workloads.
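The idea of recurring miss sequences can be illustrated with a toy model. The sketch below is not the design described in this abstract (it has no stream waiting rooms and does not handle reordered misses); it is a minimal successor-table predictor, assuming a hypothetical trace of miss addresses, that counts how many misses would have been predicted from the previous occurrence of the same sequence. The function name and the trace values are illustrative assumptions.

```python
def count_covered_misses(miss_trace):
    """Count misses that a one-entry-per-address successor table would predict.

    Toy model only: after each miss, remember which address missed next.
    When the same address misses again, predict that remembered successor.
    """
    successor = {}    # address -> address that missed right after it last time
    covered = 0       # misses that matched the current prediction
    prev = None       # previous miss address
    predicted = None  # address we expect to miss next, if any
    for addr in miss_trace:
        if addr == predicted:
            covered += 1
        if prev is not None:
            successor[prev] = addr
        predicted = successor.get(addr)
        prev = addr
    return covered


# Hypothetical trace: the stream 0x10 -> 0x20 -> 0x30 occurs twice.
trace = [0x10, 0x20, 0x30, 0x99, 0x10, 0x20, 0x30]
covered = count_covered_misses(trace)
coverage = covered / len(trace)
```

On this trace the second occurrence of the stream lets the table predict the misses at 0x20 and 0x30, so two of the seven misses are covered. The first miss of a recurring stream is never predicted by this scheme.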
35.
The list marking problem involves marking the nodes of an ℓ-node linked list stored in the memory of a (p, n)-PRAM, when only the position of the head of the list is initially known, while the remaining list nodes are stored in arbitrary memory locations. Under the assumption that cells containing list nodes bear no distinctive tags distinguishing them from other cells, we establish an Ω(min{ℓ, n/p}) randomized lower bound for ℓ-node lists and present a deterministic algorithm whose running time is within a logarithmic additive term of this bound. Such a result implies that randomization cannot be exploited in any significant way in this setting. For the case where list cells are tagged in a way that differentiates them from other cells, the above lower bound still applies to deterministic algorithms, while we establish a tight bound for randomized algorithms. Therefore, in the latter case, randomization yields better performance for a wide range of parameter values.
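For orientation, the ℓ term in the bound corresponds to the obvious sequential strategy of following pointers from the head. The sketch below is only that single-processor baseline, not the paper's deterministic PRAM algorithm; the memory layout (a Python list in which each list cell stores the address of its successor, or `None` at the tail) and the function name are assumptions made for illustration.

```python
def mark_list(memory, head):
    """Mark every node reachable from `head` by following successor pointers.

    `memory[a]` holds the address of the node after the one stored at `a`,
    or None at the end of the list. Takes one step per list node.
    """
    marked = set()
    addr = head
    # The `not in marked` check guards against a malformed, cyclic list.
    while addr is not None and addr not in marked:
        marked.add(addr)
        addr = memory[addr]
    return marked


# Hypothetical 8-cell memory holding the 3-node list 5 -> 2 -> 7.
memory = [None] * 8
memory[5] = 2
memory[2] = 7
memory[7] = None
marked = mark_list(memory, head=5)
```

Here `marked` ends up containing the addresses 5, 2, and 7 after three pointer-following steps, independent of the total memory size.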
36.
This paper reports the performance of a single node of the Hitachi SR8000 on the SPEC OMP2001 benchmarks. Each processing node of the SR8000 is a shared-memory parallel computer composed of eight scalar processors with a pseudo-vector processing feature. We have run all of the SPEC OMP2001 benchmarks on the SR8000. From the results of this performance measurement, we found that the SR8000 scales well up to 8 processors, except for a few benchmark programs. The performance results demonstrate that the SR8000 achieves high performance, especially for memory-intensive applications.
37.
Results are reported for a series of experiments involving numerical curve tracking on a shared-memory parallel computer. Several algorithms exist for finding zeros or fixed points of nonlinear systems of equations that are globally convergent for almost all starting points, that is, with probability one. The essence of all such algorithms is the construction of an appropriate homotopy map and then the tracking of some smooth curve in the zero set of this homotopy map. HOMPACK is a mathematical software package implementing globally convergent homotopy algorithms with three different techniques for tracking a homotopy zero curve, and has separate routines for dense and sparse Jacobian matrices. The HOMPACK algorithms for sparse Jacobian matrices use a preconditioned conjugate gradient algorithm for the computation of the kernel of the homotopy Jacobian matrix, a required linear algebra step for homotopy curve tracking. A parallel version of HOMPACK is implemented on a shared-memory parallel computer with various levels and degrees of parallelism (e.g., linear algebra, function, and Jacobian matrix evaluation), and a detailed study is presented for each of these levels with respect to the speedup in execution time obtained with the parallelism, the time spent implementing the parallel code, and the extra memory allocated by the parallel algorithm.
38.
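This paper introduces a new parallel computer system, the EP-860. Because it adopts a distinctive broadcast shared-memory technique, the system as a whole combines the advantages of both loosely coupled and tightly coupled systems. The system scales well and is easy to program. When programming, users only need to place shared variables in the broadcast shared memory; communication is then carried out with ordinary memory-access instructions, and no dedicated communication commands are required. In addition, the system is structurally simple, easy to implement, and economical and practical.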
39.
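Architecture and Implementation of the S2MP Server   Total citations: 1, self-citations: 0, citations by others: 1
This paper describes the modular architecture of the S2MP server, its distributed handling of shared memory, new techniques for system scalability, its cache coherence rules, and its non-uniform memory access (NUMA) architecture. It shows that this kind of server offers enormous processing power and capacity for system expansion.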
40.
Over the past decades, computer engineers have focused on building high-performance, large-scale computer systems at low cost. One example is a distributed-memory computer system such as a cluster, in which fast processing nodes built from commodity processors are connected through a high-speed network. However, it is not easy to develop applications on such a system, because a programmer must consider all data and control dependences between processes and program them explicitly. To alleviate this problem, the distributed virtual shared-memory (DVSM) system has been proposed. It is well known that the performance of a DVSM system depends heavily on the network's performance and on the programming semantics, and its performance is currently very limited on a conventional network. Recently, many advanced hardware-based interconnection technologies have been introduced. One of them is the InfiniBand Architecture (IBA), which supports shared-memory programming semantics by means of remote direct-memory access (RDMA) and atomic operations. In this paper, we present the implementation of our InfiniBand-based DVSM system and analyze the performance of the SPEC OMP benchmarks in detail, comparing it with a DVSM based on the traditional network architecture and with a hardware shared-memory multiprocessor (SMP) system. Our experimental results show that our DVSM system, which uses the full features of the IBA, improves performance significantly over the traditional IPoIB-based system on the IBA, and furthermore that the performance of one application on the IBA-based DVSM system is better than on the hardware SMP.