首页 | 本学科首页   官方微博 | 高级检索  
文章检索
  按 检索   检索词:      
出版年份:   被引次数:   他引次数: 提示:输入*表示无穷大
  收费全文   53篇
  免费   0篇
电工技术   1篇
自动化技术   52篇
  2021年   1篇
  2016年   1篇
  2015年   1篇
  2014年   3篇
  2013年   1篇
  2012年   2篇
  2011年   6篇
  2009年   4篇
  2008年   2篇
  2007年   5篇
  2006年   2篇
  2005年   7篇
  2004年   3篇
  2003年   2篇
  2002年   2篇
  2001年   3篇
  2000年   2篇
  1999年   1篇
  1997年   1篇
  1996年   1篇
  1995年   2篇
  1993年   1篇
排序方式: 共有53条查询结果,搜索用时 281 毫秒
1.
The Journal of Supercomputing - Deadlock-free dynamic network reconfiguration process is usually studied from the routing algorithm restrictions and resource reservation perspective. The dynamic...  相似文献   
2.
A complete and efficient CUDA-sharing solution for HPC clusters   总被引:1,自引:0,他引:1  
In this paper we detail the key features, architectural design, and implementation of rCUDA, an advanced framework to enable remote and transparent GPGPU acceleration in HPC clusters. rCUDA allows decoupling GPUs from nodes, forming pools of shared accelerators, which brings enhanced flexibility to cluster configurations. This opens the door to configurations with fewer accelerators than nodes, as well as permits a single node to exploit the whole set of GPUs installed in the cluster. In our proposal, CUDA applications can seamlessly interact with any GPU in the cluster, independently of its physical location. Thus, GPUs can be either distributed among compute nodes or concentrated in dedicated GPGPU servers, depending on the cluster administrator’s policy. This proposal leads to savings not only in space but also in energy, acquisition, and maintenance costs. The performance evaluation in this paper with a series of benchmarks and a production application clearly demonstrates the viability of this proposal. Concretely, experiments with the matrix–matrix product reveal excellent performance compared with regular executions on the local GPU; on a much more complex application, the GPU-accelerated LAMMPS, we attain up to 11x speedup employing 8 remote accelerators from a single node with respect to a 12-core CPU-only execution. GPGPU service interaction in compute nodes, remote acceleration in dedicated GPGPU servers, and data transfer performance of similar GPU virtualization frameworks are also evaluated.  相似文献   
3.
It is widely accepted that transient failures will appear more frequently in chips designed in the near future due to several factors such as the increased integration scale. On the other hand, chip-multiprocessors (CMP) that integrate several processor cores in a single chip are nowadays the best alternative to more efficient use of the increasing number of transistors that can be placed in a single die. Hence, it is necessary to design new techniques to deal with these faults to be able to build sufficiently reliable chip multiprocessors (CMPs). In this work, we present a coherence protocol aimed at dealing with transient failures that affect the interconnection network of a CMP, thus assuming that the network is no longer reliable. In particular, our proposal extends a token-based cache coherence protocol so that no data can be lost and no deadlock can occur due to any dropped message. Using GEMS full system simulator, we compare our proposal against TokenCMP. We show that in absence of failures our proposal does not introduce overhead in terms of increased execution time over TokenCMP. Additionally, our protocol can tolerate message loss rates much higher than those likely to be found in the real world without increasing execution time more than 15 percent.  相似文献   
4.
Cluster computers represent a cost-effective alternative solution to supercomputers. In these systems, it is common to constrain the memory address space of a given processor to the local motherboard. Constraining the system in this way is much cheaper than using a full-fledged shared memory implementation among motherboards. However, memory usage among motherboards can be unfairly balanced.  相似文献   
5.
The last years have witnessed a dramatic growth in the number as well as in the variety of distributed virtual environment systems. These systems allow multiple users, working on different client computers that are interconnected through different networks, to interact in a shared virtual world. One of the key issues in the design of scalable and cost-effective DVE systems is the partitioning problem. This problem consists of efficiently assigning the existing clients to the servers in the system and some techniques have been already proposed for solving it. This paper experimentally analyzes the correlation of the quality function proposed in the literature for solving the partitioning problem with the performance of DVE systems. Since the results show an absence of correlation, we also propose the experimental characterization of DVE systems. The results show that the reason for that absence of correlation is the nonlinear behavior of DVE systems with regard to the number of clients in the system. DVE systems reach saturation when any of the servers reaches 100 percent of CPU utilization. The system performance greatly decreases if this limit is exceeded in any server. Also, as a direct application of these results, we present a partitioning method that is targeted to keep all the servers in the system below a certain threshold value of CPU utilization, regardless of the amount of network traffic. Evaluation results show that the proposed partitioning method can improve DVE system performance, regardless of both the movement pattern of clients and the initial distribution of clients in the virtual world.  相似文献   
6.
A theory for the design of deadlock-free adaptive routing algorithms for wormhole networks, proposed by the author (1991, 1993), supplies sufficient conditions for an adaptive routing algorithm to be deadlock-free, even when there are cyclic dependencies between channels. Also, two design methodologies were proposed. Multicast communication refers to the delivery of the same message from one source node to an arbitrary number of destination nodes. A tree-like routing scheme is not suitable for hardware-supported multicast in wormhole networks because it produces many headers for each message, drastically increasing the probability of a message being blocked. A path-based multicast routing model was proposed by Lin and Ni (1991) for multicomputers with 2D-mesh and hypercube topologies. In this model, messages are not replicated at intermediate nodes. This paper develops the theoretical background for the design of deadlock-free adaptive multicast routing algorithms. This theory is valid for wormhole networks using the path-based routing model. It is also valid when messages with a single destination and multiple destinations are mixed together. The new channel dependencies produced by messages with several destinations are studied. Also, two theorems are proposed, developing conditions to verify that an adaptive multicast routing algorithm is deadlock-free, even when there are cyclic dependencies between channels. As an example, the multicast routing algorithms of Lin and Ni are extended, so that they can take advantage of the alternative paths offered by the network  相似文献   
7.
Fault-tolerant systems aim at providing continuous operation in the presence of faults. Multicomputers rely on an interconnection network between processors to support the message-passing mechanism. Therefore, the reliability of the interconnection network is very important for the reliability of the whole system. This paper analyzes the effective redundancy available in a wormhole network by combining connectivity and deadlock freedom. Redundancy is defined at the channel level. We propose a sufficient condition for channel redundancy, also computing the set of redundant channels. The redundancy level of the network is also defined, proposing a theorem that supplies its value. This theory is developed on top of our necessary and sufficient condition for deadlock-free adaptive routing. The new theory also considers the failure of physical channels when virtual channels are used. Finally, we propose a methodology for the design of fault-tolerant routing algorithms, showing its application to n-dimensional meshes  相似文献   
8.
The InfiniBand Architecture (IBA) is an industry-standard architecture for server I/O and interprocessor communication. InfiniBand is extensively used for interconnection in high-performance clusters. It has been developed by the InfiniBandSMSM Trade Association (IBTA) to provide the levels of reliability, availability, performance, scalability, and quality of service (QoS) necessary for present and future server systems.  相似文献   
9.
Quality of service (QoS) support in local and cluster area environments has become an issue of great interest in recent years. Most current high-performance interconnection solutions for these environments have been designed to enhance conventional best-effort traffic performance, but are not well-suited to the special requirements of the new multimedia applications. The multimedia router (MMR) aims at offering hardware-based QoS support within a compact interconnection component. One of the key elements in the MMR architecture is the algorithms used in traffic scheduling. These algorithms are responsible for the order in which information is forwarded through the internal switch. Thus, they are closely related to the QoS-provisioning mechanisms. In this paper, several traffic scheduling algorithms developed for the MMR architecture are described. Their general organization is motivated by chances for parallelization and pipelining, while providing the necessary support both to multimedia flows and to best-effort traffic. Performance evaluation results show that the QoS requirements of different connections are met, in spite of the presence of best-effort traffic, while achieving high link utilizations.  相似文献   
10.
One important issue the designer of a scalable shared-memory multiprocessor must deal with is the amount of extra memory required to store the directory information. It is desirable that the directory memory overhead be kept as low as possible, and that it scales very slowly with the size of the machine. Unfortunately, current directory architectures provide scalability at the expense of performance. This work presents a scalable directory architecture that significantly reduces the size of the directory for large-scale configurations of a multiprocessor without degrading performance. First, we propose multilayer clustering as an effective approach to reduce the width of directory entries. Based on this concept, we derive three new compressed sharing codes, some of them with a space complexity of O(log/sub 2/(log/sub 2/(N))) for an N-node system. Then, we present a novel two-level directory architecture to eliminate the penalty caused by compressed directories in general. The proposed organization consists of a small full-map first-level directory (which provides precise information for the most recently referenced lines) and a compressed second-level directory (which provides in-excess information for all the lines). The proposals are evaluated based on extensive execution-driven simulations (using RSIM) of a 64-node cc-NUMA multiprocessor. Results demonstrate that a system with a two-level directory architecture achieves the same performance as a multiprocessor with a big and nonscalable full-map directory, with a very significant reduction of the memory overhead.  相似文献   
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号