Similar Documents
1.
The pyramid network is a desirable network topology, used both as a software data structure and as a hardware architecture. In this paper, we propose a general definition for a class of pyramid networks based on grid connections between the nodes in each level. Unlike the conventional pyramid network, in which the nodes in each level form a mesh, the connections between these nodes may also follow other grid-based topologies such as the torus, hypermesh, or WK-recursive network. Such pyramid networks form a wide class of interconnection networks with rich topological properties. We study a number of properties of these topologies that are important for general-purpose parallel processing applications. In particular, we prove that such pyramids are Hamiltonian-connected, i.e., for any pair of distinct nodes there exists at least one Hamiltonian path between them, and pancyclic, i.e., a cycle of every length 3, 4, …, N can be embedded in an N-node pyramid network. We also prove that the torus-pyramid and hypermesh-pyramid networks contain two link-disjoint Hamiltonian cycles.
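The two combinatorial claims above can be checked exhaustively on very small instances. The Python sketch below is a hypothetical illustration, not code from the paper. It assumes that level k of the pyramid is a 2^k × 2^k grid and that every non-base node links to four children. It covers only the mesh and torus variants; the hypermesh and WK-recursive variants are omitted. Both checks are exponential in the number of nodes, so they are practical only for toy sizes.

```python
from itertools import product


def build_pyramid(levels, grid="mesh"):
    """Adjacency sets for a pyramid whose level k is a 2^k x 2^k grid.

    grid="mesh" links 4-neighbours inside each level; grid="torus" adds
    wrap-around links. Each node at level k also links to its four
    children at level k + 1.
    """
    adj = {}

    def link(a, b):
        if a != b:  # a 1x1 torus would otherwise link a node to itself
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)

    for k in range(levels):
        n = 2 ** k
        for i, j in product(range(n), repeat=2):
            adj.setdefault((k, i, j), set())
            for di, dj in ((1, 0), (0, 1)):
                ni, nj = i + di, j + dj
                if grid == "torus":
                    link((k, i, j), (k, ni % n, nj % n))
                elif ni < n and nj < n:
                    link((k, i, j), (k, ni, nj))
            if k + 1 < levels:
                for ci, cj in product((0, 1), repeat=2):
                    link((k, i, j), (k + 1, 2 * i + ci, 2 * j + cj))
    return adj


def has_hamiltonian_path(adj, s, t):
    """Bitmask DP: reach[mask] = end nodes of simple paths from s covering mask."""
    index = {v: b for b, v in enumerate(sorted(adj))}
    full = (1 << len(index)) - 1
    reach = {1 << index[s]: {s}}
    for mask in range(1, full + 1):  # a superset mask is always a larger integer
        for v in reach.get(mask, ()):
            for w in adj[v]:
                bit = 1 << index[w]
                if not mask & bit:
                    reach.setdefault(mask | bit, set()).add(w)
    return t in reach.get(full, set())


def cycle_lengths(adj):
    """Lengths of all simple cycles, found by brute-force DFS."""
    rank = {v: r for r, v in enumerate(sorted(adj))}
    found = set()

    def dfs(start, v, visited):
        for w in adj[v]:
            if w == start and len(visited) >= 3:
                found.add(len(visited))
            elif w not in visited and rank[w] > rank[start]:
                dfs(start, w, visited | {w})

    for s in adj:
        dfs(s, s, {s})
    return found


p2 = build_pyramid(2)  # apex plus a 2x2 mesh: 5 nodes
nodes = sorted(p2)
print(all(has_hamiltonian_path(p2, a, b) for a in nodes for b in nodes if a != b))  # True
print(sorted(cycle_lengths(p2)))  # [3, 4, 5]
```

The two-level instance is a wheel (apex plus a 4-cycle). It is Hamiltonian-connected and has cycles of every length from 3 to 5, which matches the claims on this tiny case. Larger levels need a smarter search than brute force.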

2.
A novel hierarchical approach to fast parallel processing of chain-codable contours is presented. The environment, called the chain pyramid, is similar to a regular nonoverlapping image pyramid. A probabilistic allocation algorithm eliminates the artifacts of contour processing on pyramids. The chain pyramid is built modularly, so new algorithms can be incorporated for different applications. Two applications are described: smoothing of multiscale curves and gap bridging in fragmented data. The latter is also used to handle branch points in the input contours. A preprocessing module that allows the chain pyramid to be applied to raw edge data is also described. The chain pyramid enables fast, O(log(image size)), computation of contour representations in discrete scale space.
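Chain-codable contours are commonly stored as Freeman chain codes, with one 3-bit direction per step between 8-connected pixels. The abstract does not specify the encoding, so this is an assumption. The sketch below encodes and decodes such a contour; it does not show the pyramid allocation algorithm. For orientation, a regular nonoverlapping 2 × 2 pyramid over a 2^m × 2^m image has m + 1 levels, which is presumably where the O(log(image size)) bound comes from.

```python
# Freeman 8-direction codes as (row, col) steps: 0 = east, then
# counter-clockwise in 45-degree steps (rows grow downward, so north is -1).
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]
CODE_OF = {step: code for code, step in enumerate(DIRECTIONS)}


def chain_code(points):
    """Encode a sequence of 8-connected (row, col) points as Freeman codes."""
    codes = []
    for (r0, c0), (r1, c1) in zip(points, points[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in CODE_OF:
            raise ValueError(f"points {(r0, c0)} and {(r1, c1)} are not 8-adjacent")
        codes.append(CODE_OF[step])
    return codes


def decode(start, codes):
    """Rebuild the point sequence from its start point and chain code."""
    points = [start]
    for code in codes:
        dr, dc = DIRECTIONS[code]
        r, c = points[-1]
        points.append((r + dr, c + dc))
    return points


square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
codes = chain_code(square)
print(codes)  # [0, 6, 4, 2]
```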

3.
Parallel processing systems have recently been studied very actively, and many topologies have been proposed. The hypercube is one of the most popular topologies for interconnection networks. In this paper, we propose two new fault-tolerant routing algorithms for hypercubes based on approximate directed routable probabilities. These probabilities represent the ability to route toward any node at a specific distance, and they are calculated by taking into account the direction from which the message was received. Each node chooses the neighbor to which it forwards the message by comparing the approximate directed routable probabilities. We also conducted a computer experiment to verify the effectiveness of our algorithms.
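The abstract does not give the formula for the approximate directed routable probabilities, so the sketch below is only a hypothetical skeleton. Nodes of an n-cube are n-bit integers, and each hop forwards the message to the candidate neighbor with the highest score. A simple stand-in score, the number of non-faulty neighbors of the candidate, takes the place of the paper's probabilities. Unlike the proposed algorithms, this sketch ignores the incoming direction and gives up instead of backtracking.

```python
def hamming(a, b):
    return bin(a ^ b).count("1")


def route(n, src, dst, faulty, score=None):
    """Greedy fault-avoiding routing in an n-cube; returns the path or None.

    Prefers healthy, unvisited neighbours that reduce the Hamming distance
    to dst; otherwise takes a healthy unvisited detour. `score(v, dst)`
    ranks candidates (the paper's probabilities would plug in here).
    """
    if score is None:
        def score(v, d):
            # Hypothetical stand-in: number of non-faulty neighbours of v.
            return sum((v ^ (1 << i)) not in faulty for i in range(n))
    path, visited, cur = [src], {src}, src
    while cur != dst:
        healthy = [cur ^ (1 << i) for i in range(n)]
        healthy = [v for v in healthy if v not in faulty and v not in visited]
        closer = [v for v in healthy if hamming(v, dst) < hamming(cur, dst)]
        candidates = closer or healthy
        if not candidates:
            return None  # stuck: this sketch does not backtrack
        cur = max(candidates, key=lambda v: score(v, dst))
        visited.add(cur)
        path.append(cur)
    return path


print(route(3, 0, 7, faulty={1, 2}))  # [0, 4, 5, 7]
```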

4.
We present a new class of interconnection topologies called Linear Recursive Networks (LRNs) and examine their possible applications in distributed systems. Each LRN is characterized by a recursive interconnection pattern that can be specified by a few simple parameters. Basic properties such as node degree, diameter, and the performance of routing algorithms are then analyzed for all LRNs collectively, in terms of these parameters. Our results can help a network designer choose parameter values that yield a topology with the required routing performance and interconnection cost. A subclass of LRNs, called Congruent LRNs (CLRNs), is also identified and shown to possess desirable properties for more tightly coupled systems. We show that the CLRNs include existing networks such as the hypercube and the generalized Fibonacci cubes. These results suggest that linear recursive networks have potential applications in interconnecting distributed systems.
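The abstract does not spell out the LRN parameterization. As an assumed illustration of how a few simple parameters can fix a recursive interconnection pattern, consider the node counts of two special cases named above. Both follow linear recurrences. The hypercube satisfies N(n) = 2N(n-1). The basic Fibonacci cube satisfies N(n) = N(n-1) + N(n-2); its nodes are the n-bit strings with no two consecutive 1s, joined at Hamming distance 1. The generalized variants are not shown, and the helper names below are hypothetical.

```python
from itertools import product


def hypercube_nodes(n):
    """All n-bit strings."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def fibonacci_cube_nodes(n):
    """n-bit strings with no two consecutive 1s."""
    return [s for s in hypercube_nodes(n) if "11" not in s]


def edges(nodes):
    """Induced subgraph of the hypercube: join strings at Hamming distance 1."""
    node_set, result = set(nodes), set()
    for s in nodes:
        for i, bit in enumerate(s):
            t = s[:i] + ("1" if bit == "0" else "0") + s[i + 1:]
            if t in node_set:
                result.add(frozenset((s, t)))
    return result


def linear_recursive_count(coeffs, initial, n):
    """N(n) = sum_i coeffs[i] * N(n - 1 - i), seeded with N(0), N(1), ... = initial."""
    counts = list(initial)
    while len(counts) <= n:
        counts.append(sum(a * counts[-1 - i] for i, a in enumerate(coeffs)))
    return counts[n]


for n in range(1, 6):
    print(n, len(hypercube_nodes(n)), linear_recursive_count((2,), (1,), n),
          len(fibonacci_cube_nodes(n)), linear_recursive_count((1, 1), (1, 2), n))
```

Here the coefficient tuple plays the role of the "simple parameters." (2,) reproduces the hypercube's doubling, and (1, 1) reproduces the Fibonacci cube's split into strings that start with "0" and strings that start with "10".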

5.
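Using the semidirect product from group theory as a tool, several existing methods for constructing interconnection networks are unified into a single Cayley graph model, CSC(q,p,l,k), which offers better scalability. We prove that the CSC(q,p,l,k) network includes several important interconnection networks as special cases, such as the cube-connected cycles, the star-connected cycles, and the recently proposed k-degree Cayley graphs, which have attracted attention. The significance of this model is that computer system designers need only choose suitable parameters to obtain the interconnection network model they require. In addition, the model helps, to some extent, to avoid redundant research on interconnection network construction.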
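The abstract does not define CSC(q,p,l,k) itself, so the sketch below shows only the underlying idea, for one of the special cases it names. The cube-connected cycles of dimension n is the Cayley graph of the semidirect product Z_2^n ⋊ Z_n, where Z_n acts on n-bit vectors by cyclic rotation. The generator set {(0, +1), (0, -1), (e_0, 0)} is a standard choice for this construction. The function names are hypothetical.

```python
def make_group(n):
    """Semidirect product Z_2^n ⋊ Z_n with elements (x, i): x is an n-bit
    vector (stored as an int) and Z_n acts on it by cyclic rotation."""
    mask = (1 << n) - 1

    def rot(y, i):
        i %= n
        return ((y << i) | (y >> (n - i))) & mask

    def mul(a, b):
        (x, i), (y, j) = a, b
        return (x ^ rot(y, i), (i + j) % n)

    elements = [(x, i) for x in range(1 << n) for i in range(n)]
    return elements, mul


def cayley_graph(elements, mul, generators):
    """Right Cayley graph: g is adjacent to g * s for every generator s."""
    return {g: {mul(g, s) for s in generators} for g in elements}


def ccc_direct(n):
    """Textbook cube-connected cycles: move along the cycle, or flip bit i at position i."""
    return {(x, i): {(x, (i + 1) % n), (x, (i - 1) % n), (x ^ (1 << i), i)}
            for x in range(1 << n) for i in range(n)}


n = 3
elements, mul = make_group(n)
generators = [(0, 1), (0, n - 1), (1, 0)]  # (0, +1), (0, -1) and (e_0, 0)
ccc = cayley_graph(elements, mul, generators)
print(len(ccc), ccc == ccc_direct(n))  # 24 True
```

Right multiplication by (0, ±1) moves along a cycle. Right multiplication by (e_0, 0) flips the bit selected by the current cycle position, because the rotation carries e_0 to e_i.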

6.
Due to advances in fiber optics and VLSI technology, interconnection networks that allow multiple simultaneous broadcasts are becoming feasible. Distributed shared memory (DSM) implementations on such networks promise high performance even for applications with small granularity. This paper presents the architecture of one such implementation, called the Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus). It then examines the performance of augmented DSM protocols that exploit the natural duplication of data to maintain a recovery memory in each processing node and provide basic fault tolerance. Simulation results show that the additional data duplication needed to create fault-tolerant DSM causes no reduction in system performance during normal operation and eliminates most of the overhead of checkpoint creation. Under certain conditions, the underlying DSM protocol uses data blocks that are duplicated to maintain the recovery memory. This reduces network traffic and significantly increases processor utilization.

7.
This paper extends research on rhombic overlapping-connectivity interconnection networks into the area of parallel applications. As the foundation for a bus-based, shared-memory multiprocessor with non-uniform memory access, these interconnection networks create overlapping groups of processors, buses, and memories, forming a clustered computer architecture in which the clusters overlap. This overlapping-membership characteristic is shown to be useful for matching the communication topology of a parallel application to the bandwidth characteristics of the architecture. Many parallel applications can be mapped onto the architecture so that most or all communication is localized within an overlapping cluster, at the low latency of direct processor-to-cache (or memory) access over a bus. The latency of communication between parallel threads then does not degrade parallel performance or limit the granularity of applications. Parallel applications can execute with good speedup and scaling on the proposed architecture, which is designed to take maximum advantage of the overlapping-cluster characteristic. The architecture also allows dynamic workload migration without moving instructions or data. Judicious workload allocation schemes that take advantage of the overlapping-cluster memberships overcome the scalability limitations of bus-based shared-memory multiprocessors. Bus-based rhombic shared-memory multiprocessors are examined in terms of parallel speedup models to explain their advantages and to justify their use as the foundation for the proposed architecture. Bidirectional circular and segmented overlapping buses maximize interconnection bandwidth. Strategies for mapping parallel application communication topologies onto rhombic architectures are developed. Analytical models of enhanced rhombic multiprocessor performance are developed using a unique bandwidth modeling technique and are compared with simulation results.

8.
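Existing human action recognition methods require fixed-length video clips as input and do not fully exploit spatiotemporal information. To address these problems, we propose a deep neural network model that combines a spatiotemporal pyramid with an attention mechanism. The model pairs a 3D-CNN containing a spatiotemporal pyramid with an LSTM augmented with spatiotemporal attention. This enables multi-scale processing of video clips and full use of the complex spatiotemporal information of actions. RGB images and optical flow fields serve as the inputs of the spatial and temporal domains, respectively. The motion and appearance features from the pyramid pooling layers are fused, and this fused feature serves as the input of the fusion domain. Finally, a decision-fusion strategy produces the final recognition result. Experiments on the UCF101 and HMDB51 datasets achieved recognition accuracies of 94.2% and 70.5%, respectively. The results show that the improved network model achieves high recognition accuracy on video-based human action recognition.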
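Pyramid pooling is what lets a network accept clips of any length. Each level splits time, height, and width into a fixed number of bins and pools each bin, so the output length depends only on the chosen levels. The pure-Python sketch below assumes max pooling with levels (1, 2, 4) on a single feature channel. It also uses a weighted sum of per-stream class scores as one possible decision-fusion rule. These choices and the function names are illustrative assumptions, not details from the paper.

```python
def bins(length, k):
    """Split range(length) into k contiguous bins; adjacent bins may overlap
    by one element when k does not divide length."""
    return [((b * length) // k, -((-(b + 1) * length) // k)) for b in range(k)]


def st_pyramid_pool(volume, levels=(1, 2, 4)):
    """Max-pool a T x H x W feature volume (nested lists) into a vector of
    length sum(k**3 for k in levels), independent of T, H and W."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    features = []
    for k in levels:
        for t0, t1 in bins(T, k):
            for h0, h1 in bins(H, k):
                for w0, w1 in bins(W, k):
                    features.append(max(volume[t][h][w]
                                        for t in range(t0, t1)
                                        for h in range(h0, h1)
                                        for w in range(w0, w1)))
    return features


def decision_fusion(stream_scores, weights):
    """Weighted sum of per-stream class scores; returns the winning class index."""
    num_classes = len(stream_scores[0])
    fused = [sum(w * s[c] for w, s in zip(weights, stream_scores)) for c in range(num_classes)]
    return max(range(num_classes), key=fused.__getitem__)


def clip(T, H=4, W=4):
    """Synthetic T x H x W feature volume with distinct values."""
    return [[[t * 100 + h * 10 + w for w in range(W)] for h in range(H)] for t in range(T)]


short, long_ = st_pyramid_pool(clip(3)), st_pyramid_pool(clip(7))
print(len(short), len(long_))  # 73 73
```

A 3-frame clip and a 7-frame clip both yield 1 + 8 + 64 = 73 features, which is the property that removes the fixed-length requirement.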

9.
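With advances in high-speed signal transmission and VLSI technology, high-radix routers have become necessary. They help meet the new challenges that the steadily rising peak performance of high-performance computers poses for high-performance interconnection networks. A key question, and the breakthrough point in designing high-performance interconnection topologies, is how to exploit high-radix characteristics to reduce network latency and cost so that larger networks can be supported. We analyze typical topologies based on high-radix routers and, on this basis, propose a new high-radix topology, SuperStar, which has a short network diameter and good scalability. To analyze the communication overhead of SuperStar, we apply different communication loads on a self-developed high-radix interconnection network simulator built on the OMNeT++ platform. We then measure how the actual network latency and throughput of the various topologies evolve.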

10.
The hypercube is one of the most widely used topologies because it has a small diameter and can embed various interconnection networks. For very large systems, however, the number of links a hypercube needs may become prohibitively large. In this paper, we propose a hierarchical interconnection network based on hypercubes, called the hierarchical hypercube network (HHN), for massively parallel computers. The HHN has fewer links than a comparable hypercube. In particular, for networks with 2^K nodes, the node degree of the HHN with the minimum node degree is O([formula]), whereas that of the hypercube is O(K). Despite its smaller node degree, the HHN can execute many parallel algorithms with the same time complexity as the hypercube.
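To put "prohibitively large" in numbers: a plain hypercube with N = 2^K nodes has degree K at every node, so its link count grows as K·2^(K-1). Here is a short worked count (standard hypercube arithmetic, not taken from the paper):

```latex
N = 2^{K}, \qquad \deg(v) = K, \qquad |E| = \frac{N K}{2} = K\,2^{K-1}

K = 20:\quad N = 1\,048\,576, \qquad |E| = 20 \cdot 2^{19} = 10\,485\,760
```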

11.
A Survey of Interconnection Networks   Total citations: 2, self-citations: 0, citations by others: 2
Tse-yun Feng. Computer, 1981, 14(12): 12-27
Concurrent processing depends on interconnection networks for communication among processors and memory modules. Various network topologies and switching strategies are covered here.

12.
In this paper, we introduce the FLUX interconnection networks, a scheme in which the interconnections of a parallel system are established on demand before or during program execution. We present a programming paradigm that can be used to make the proposed solution feasible. We perform several experiments to show the viability of our approach and the potential performance gain from using the most suitable network configuration for a given parallel program. Across several case studies, we evaluate different algorithms developed for meshes or trees and map them onto “grid”-like or reconfigurable physical interconnection networks. Our results clearly show that, depending on the underlying network, different mappings suit different algorithms. Even for a single algorithm, the most appropriate mapping changes when the processing data size, the number of nodes used, or the hardware cost of the processing elements changes. This implies that dynamically changing interconnection topologies and mappings on demand, according to the program's needs, can be beneficial.
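The observation that different mappings suit different algorithms can be reproduced with a toy cost model. This is an assumption for illustration, not the paper's experimental setup. Tasks are placed on a 4 × 4 mesh either in row-major or in snake (boustrophedon) order. The cost of a communication pattern is the total number of Manhattan hops over all communicating task pairs.

```python
def row_major(k, cols):
    return divmod(k, cols)


def snake(k, cols):
    """Boustrophedon order: odd rows are traversed right to left."""
    r, c = divmod(k, cols)
    return (r, c if r % 2 == 0 else cols - 1 - c)


def chain(num_tasks):
    """Pipeline pattern: task k sends to task k + 1."""
    return [(k, k + 1) for k in range(num_tasks - 1)]


def stencil(rows, cols):
    """2-D nearest-neighbour pattern on a logical rows x cols task grid."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            k = r * cols + c
            if c + 1 < cols:
                pairs.append((k, k + 1))
            if r + 1 < rows:
                pairs.append((k, k + cols))
    return pairs


def comm_cost(pairs, cols, mapping):
    """Total Manhattan hops on the physical mesh over all communicating pairs."""
    total = 0
    for a, b in pairs:
        (r0, c0), (r1, c1) = mapping(a, cols), mapping(b, cols)
        total += abs(r0 - r1) + abs(c0 - c1)
    return total


for name, pairs in (("chain", chain(16)), ("stencil", stencil(4, 4))):
    print(name, comm_cost(pairs, 4, row_major), comm_cost(pairs, 4, snake))
# chain 24 15
# stencil 24 48
```

The pipeline favors the snake mapping, while the stencil favors row-major placement. Even this toy model shows that the best mapping depends on the algorithm's communication pattern.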

13.
To support parallel processing of data-intensive applications, the interconnection network of a parallel/distributed machine must provide high end-to-end communication bandwidth. It must also handle the bursty, concentrated communication patterns generated by dynamic load balancing and data collection operations. A large-scale interconnection network architecture called the virtual bus is proposed. The virtual bus can scale to terabits-per-second end-to-end communication bandwidth with low queuing delay for nonuniform traffic. A terabit virtual bus architecture can be implemented efficiently for less than 5% of the total cost of an eight-thousand-node system. In addition, the virtual bus has an open-system parallel interface that is flexible enough to support data transfer rates of up to gigabytes per second, different grades of service, and broadcast operations. This flexibility makes the virtual bus a plausible open-system communication backbone for a broad range of applications.

14.
We survey and extend nonlinear signal decompositions based on morphological pyramids. We also survey their application to multiresolution maximum intensity projection (MIP) volume rendering with progressive refinement and perfect reconstruction. The structure of the resulting multiresolution rendering algorithm is very similar to wavelet splatting. Several existing classes of pyramids are discussed, and their limitations are indicated. To enhance the approximation quality of visualizations from reduced data (higher levels of the pyramid), we explore two approaches. First, we consider a new class of morphological pyramids involving connectivity-enhancing operators. In the pyramidal analysis phase, a conditional dilation operator is applied for a given number n of iterations. The corresponding pyramids for n = 0 and n = 1 are known as the adjunction pyramid and the Sun-Maragos pyramid, respectively. We show that the approximation quality when rendering from higher pyramid levels does increase with the number of iterations n, but the improvement for n > 1 is limited. The second new approach, called streaming MIP-splatting, again starts from the adjunction pyramid. The new element is that the detail coefficients of all levels are considered simultaneously and are re-sorted by decreasing magnitude of a suitable error measure. The re-sorted coefficients are then projected successively until the resulting MIP image reaches the desired accuracy. We show that this method outperforms the previous methods based on morphological pyramids. It gives better image quality for a fixed amount of detail data and more flexible control of approximation error or computation time.

Jos B.T.M. Roerdink received his M.Sc. (1979) in theoretical physics from the University of Nijmegen, the Netherlands. He then received his Ph.D. (1983) from the University of Utrecht and held a two-year position (1983–1985) as a Postdoctoral Fellow at the University of California, San Diego, both in the area of stochastic processes. After that, he joined the Centre for Mathematics and Computer Science in Amsterdam. There he worked from 1986 to 1992 on image processing and tomographic reconstruction. He was appointed associate professor (1992) and full professor (2003) at the Institute for Mathematics and Computing Science of the University of Groningen, where he currently holds a chair in Scientific Visualization and Computer Graphics. His current research interests include morphological image processing, biomedical visualization, neuroimaging, and bioinformatics.
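The decomposition and streaming ideas in the abstract above can be illustrated in one dimension. The sketch below uses a simple max-based pyramid: pairwise maximum for analysis, nearest-neighbor upsampling for synthesis, and detail = signal minus prediction. This construction gives perfect reconstruction. It is a generic construction assumed for illustration and does not reproduce the paper's adjunction or Sun-Maragos operators or the conditional dilation step. The hypothetical `streaming` function mimics the streaming idea by keeping only the m detail coefficients of largest magnitude across all levels. Plain magnitude stands in for the paper's error measure.

```python
def analyze(signal, levels):
    """Max-based pyramid; len(signal) must be divisible by 2**levels.

    Returns the coarsest approximation and the detail lists (finest first).
    Every detail is <= 0 because the max-based prediction never undershoots.
    """
    approx, details = list(signal), []
    for _ in range(levels):
        coarse = [max(approx[2 * i], approx[2 * i + 1]) for i in range(len(approx) // 2)]
        predicted = [coarse[i // 2] for i in range(len(approx))]
        details.append([a - p for a, p in zip(approx, predicted)])
        approx = coarse
    return approx, details


def synthesize(approx, details):
    """Upsample and add details, coarsest level first: exact inverse of analyze."""
    x = list(approx)
    for d in reversed(details):
        x = [x[i // 2] + d[i] for i in range(len(d))]
    return x


def streaming(approx, details, m):
    """Reconstruct using only the m largest-magnitude details from all levels."""
    ranked = sorted(((abs(v), lvl, i) for lvl, d in enumerate(details)
                     for i, v in enumerate(d)), reverse=True)
    keep = {(lvl, i) for _, lvl, i in ranked[:m]}
    pruned = [[v if (lvl, i) in keep else 0 for i, v in enumerate(d)]
              for lvl, d in enumerate(details)]
    return synthesize(approx, pruned)


signal = [3, 1, 4, 1, 5, 9, 2, 6]
approx, details = analyze(signal, 3)
print(approx, synthesize(approx, details) == signal)  # [9] True
print(streaming(approx, details, 0))  # [9, 9, 9, 9, 9, 9, 9, 9]
```

With no detail coefficients, every sample is replaced by the global maximum, an upper bound in the spirit of MIP. With all of them, the reconstruction is exact.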

15.
Cut-through switching promises low-latency delivery and has been used in new-generation switches, especially in high-speed networks that demand low communication latency. Interconnecting cut-through switches provides an excellent network platform for high-speed local area networks (LANs). For cost and performance reasons, such a switch-based network should support irregular topologies. Switched irregular networks are truly incrementally scalable and can potentially be reconfigured to adapt to changing network traffic conditions. Because such networks can have arbitrary topologies, it is critical to develop an efficient deadlock-free routing algorithm. We propose a novel deadlock-free adaptive routing algorithm, called adaptive-trail routing, that allows irregular interconnection of cut-through switches. The algorithm is based on two unidirectional adaptive trails constructed from two opposite unidirectional Eulerian trails. Heuristics are suggested for selecting the Eulerian trails, avoiding long routing paths, and setting the degree of adaptivity. Extensive simulation experiments evaluate the performance of the proposed algorithm and two other routing algorithms under different topologies and traffic workloads.
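Suppose each bidirectional link of an irregular switch network is replaced by two unidirectional channels. The result is a digraph in which every node has equal in-degree and out-degree. If the network is connected, an Eulerian circuit over all channels therefore always exists, and reversing it gives a trail in the opposite direction. The sketch below constructs such a circuit with Hierholzer's algorithm. It shows only this trail-construction step under that assumption, not the paper's adaptive routing rules or its trail-selection heuristics.

```python
def symmetric_digraph(links):
    """Replace each bidirectional link (u, v) by channels u->v and v->u."""
    adj = {}
    for u, v in links:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def eulerian_circuit(adj, start):
    """Hierholzer's algorithm; assumes a connected digraph with in-degree == out-degree."""
    remaining = {u: list(vs) for u, vs in adj.items()}
    stack, circuit = [start], []
    while stack:
        u = stack[-1]
        if remaining[u]:
            stack.append(remaining[u].pop())  # follow an unused channel
        else:
            circuit.append(stack.pop())  # dead end: emit node into the circuit
    return circuit[::-1]


links = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3), (3, 4)]  # small irregular topology
forward = eulerian_circuit(symmetric_digraph(links), 0)
backward = forward[::-1]  # the opposite-direction trail
print(forward)
print(backward)
```

Each of the 12 channels appears exactly once in `forward`. Because every channel's reverse also exists, `backward` is a valid Eulerian circuit too.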

16.
Performance Evaluation, 1999, 35(1-2): 49-74
Multicast network traffic is information with one source node but many destination nodes. A source can set up individual connections to each destination node or broadcast the information to the entire network. Multicasting instead exploits link capacity efficiently by letting the source node transmit a small number of copies of the information to mutually exclusive groups of destination nodes. Multicasting is an important topic in networking (video and audio conferencing, video on demand, local-area network interconnection) and in computer architecture (cache coherency, multiprocessor message passing). In this paper, we derive approximate expressions for the minimum cost, in terms of link utilization, of shortest-path multicast traffic in arbitrary tree networks. Our results provide a theoretical best-case scenario for the link utilization of multicast distribution in tree topologies overlaid onto arbitrary graphs. In real networks such as the Internet MBONE, multicast distribution paths are often tree-like but contain some cycles for fault tolerance. We find that even for richly connected graphs such as the shufflenet and the hypercube, our expression predicts the link-utilization cost of multicast communication well. This theoretical result therefore has two applications: (1) a lower bound on the link capacity required for multicasting in random tree topologies, and (2) an approximation of the cost of multicasting in regular LAN and MAN topologies.
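To make the link-utilization metric concrete, the sketch below counts links exactly on a given tree whose root is the multicast source. A multicast uses each link on the union of the root-to-destination paths once, whereas separate unicasts pay for every path in full. This is an exact count on one tree, assumed for illustration. It is not the paper's approximate expression for arbitrary trees, and the function names are hypothetical.

```python
def multicast_cost(parent, destinations):
    """Distinct links on the root-to-destination paths (one copy per link).

    parent[v] is v's parent in a tree rooted at the source (root: None).
    A link is identified by its child endpoint.
    """
    used = set()
    for v in destinations:
        # Stop early: if v's link is already used, so are all links above it.
        while parent[v] is not None and v not in used:
            used.add(v)
            v = parent[v]
    return len(used)


def unicast_cost(parent, destinations):
    """Links used if the source sends a separate copy to every destination."""
    total = 0
    for v in destinations:
        while parent[v] is not None:
            total += 1
            v = parent[v]
    return total


# Complete binary tree with 7 nodes, source at the root 0.
parent = [None, 0, 0, 1, 1, 2, 2]
print(multicast_cost(parent, [3, 4, 5]), unicast_cost(parent, [3, 4, 5]))  # 5 6
```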

17.
This paper describes three hierarchical organizations of small processors for bottom-up image analysis: pyramids, interleaved pyramids, and pyramid trees. Progressively lower levels in the hierarchies process image windows of decreasing size. Bottom-up analysis is made feasible by transmitting quadrant borders up the levels, together with border-related information that captures the quadrant interactions of interest for a given computation. The operation of the pyramid is illustrated by standard algorithms for interior-based computations (e.g., area) and for border-based computations of local properties (e.g., perimeter). A connected-component counting algorithm is outlined to illustrate the role of border-related information in representing quadrant interaction. Interleaved pyramids are obtained by sharing processors among several pyramids. They increase processor utilization and throughput at the cost of additional hardware. Trees of shallow interleaved pyramids, called pyramid trees, are introduced to reduce the hardware requirements of large interleaved pyramids. They do so at the expense of increased processing time, but without sacrificing processor utilization. The three organizations are compared with respect to several performance measures.
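The role of quadrant borders can be illustrated with a quadtree-style sketch. The data layout is an assumption, not the paper's processor design. Each pyramid node holds its block's area, its perimeter, and its four border rows and columns. Merging two blocks adds their areas and perimeters and subtracts twice the number of foreground pixel pairs that face each other across the seam. Area (interior-based) and perimeter (border-based) are then both available at the apex after log2(size) levels. The connected-component algorithm is not reproduced.

```python
def leaf(p):
    # (area, perimeter, top row, bottom row, left column, right column)
    return (p, 4 * p, [p], [p], [p], [p])


def merge_lr(a, b):
    """Join block a (left) and block b (right) along a vertical seam."""
    shared = sum(x & y for x, y in zip(a[5], b[4]))  # a's right border vs b's left
    return (a[0] + b[0], a[1] + b[1] - 2 * shared,
            a[2] + b[2], a[3] + b[3], a[4], b[5])


def merge_tb(a, b):
    """Join block a (top) and block b (bottom) along a horizontal seam."""
    shared = sum(x & y for x, y in zip(a[3], b[2]))  # a's bottom border vs b's top
    return (a[0] + b[0], a[1] + b[1] - 2 * shared,
            a[2], b[3], a[4] + b[4], a[5] + b[5])


def pyramid(image):
    """Bottom-up over a 2^m x 2^m binary image; returns (area, perimeter)."""
    level = [[leaf(p) for p in row] for row in image]
    while len(level) > 1:
        half = len(level) // 2
        level = [[merge_tb(merge_lr(level[2 * i][2 * j], level[2 * i][2 * j + 1]),
                           merge_lr(level[2 * i + 1][2 * j], level[2 * i + 1][2 * j + 1]))
                  for j in range(half)]
                 for i in range(half)]
    area, perimeter = level[0][0][:2]
    return area, perimeter


image = [[1, 1, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 1, 0],
         [1, 0, 0, 1]]
print(pyramid(image))  # (7, 20)
```

Only the borders travel upward, so each merge needs information proportional to the block's side length rather than its full contents.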

18.
The astonishing development of artificial neural networks (ANNs) has brought significant advances in many application domains, such as pattern recognition, image classification, and computer vision. An ANN imitates neuron behavior and makes a decision or prediction by learning patterns and features from a given data set. To reach higher accuracy, neural networks are getting deeper, and consequently the computation and storage demands on hardware platforms are steadily increasing. In addition, the massive data communication among neurons makes the interconnection more complex and challenging. To overcome these challenges, designers are building ASIC-based DNN accelerators that usually incorporate customized processing elements, fixed interconnection, and large off-chip memory storage. As a result, DNN computation involves many memory accesses due to frequent loading and offloading of data, which significantly increases energy consumption and latency. Moreover, the rigid architecture and interconnection among processing elements limit the platform's efficiency to specific applications. In recent years, Network-on-Chip-based (NoC-based) DNN has become an emerging design paradigm. The NoC interconnection can help reduce off-chip memory accesses while offering better scalability and flexibility. To evaluate NoC-based DNNs at an early design stage, we introduce a cycle-accurate NoC-based DNN simulator called DNNoC-sim. To support the various operations in modern DNN models, such as convolution and pooling, we first propose a DNN flattening technique that converts diverse DNN operations into MAC-like operations. In addition, we propose a DNN slicing method to evaluate large-scale DNN models on a resource-constrained NoC platform. The evaluation results show a significant reduction in off-chip memory accesses compared to the state-of-the-art DNN model. We also analyze the performance and discuss the trade-offs between different design parameters.
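The abstract does not describe the flattening and slicing techniques in detail. As an assumed illustration of the general idea only, the sketch below lowers a 2-D convolution into independent multiply-accumulate (MAC) tasks. It expresses max pooling with the same task shape but a max reduction. It then runs the tasks in slices no larger than a hypothetical number of processing elements, `num_pes`. All function names are hypothetical.

```python
def flatten_conv2d(image, kernel):
    """Lower a 'valid' 2-D convolution (cross-correlation, as most DNN
    frameworks compute it) into one MAC task per output pixel: a list of
    (input, weight) pairs."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    tasks = [[(image[r + i][c + j], kernel[i][j]) for i in range(kh) for j in range(kw)]
             for r in range(oh) for c in range(ow)]
    return tasks, (oh, ow)


def flatten_maxpool(image, size):
    """Non-overlapping max pooling as MAC-like tasks: same operand-list shape,
    reduced with max instead of multiply-accumulate."""
    oh, ow = len(image) // size, len(image[0]) // size
    tasks = [[(image[r * size + i][c * size + j], None) for i in range(size) for j in range(size)]
             for r in range(oh) for c in range(ow)]
    return tasks, (oh, ow)


def mac(ops):
    return sum(x * w for x, w in ops)


def max_reduce(ops):
    return max(x for x, _ in ops)


def run_sliced(tasks, num_pes, reducer):
    """Run tasks in slices of at most num_pes, one task per processing element."""
    results, slices = [], 0
    for start in range(0, len(tasks), num_pes):
        results.extend(reducer(task) for task in tasks[start:start + num_pes])
        slices += 1
    return results, slices


image = [[r * 4 + c for c in range(4)] for r in range(4)]
conv_tasks, conv_shape = flatten_conv2d(image, [[1, 0], [0, -1]])
conv_out, conv_slices = run_sliced(conv_tasks, num_pes=4, reducer=mac)
pool_tasks, pool_shape = flatten_maxpool(image, 2)
pool_out, _ = run_sliced(pool_tasks, num_pes=4, reducer=max_reduce)
print(conv_shape, conv_slices, conv_out[0])  # (3, 3) 3 -5
print(pool_out)  # [5, 7, 13, 15]
```

Nine convolution outputs on four processing elements need three slices. This is the kind of resource-constrained scheduling that slicing addresses.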

19.
A Survey of Wireless Sensor Network Research   Total citations: 39, self-citations: 17, citations by others: 39
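Wireless sensor networks combine computing, communication, and sensor technologies, and they constitute an entirely new technology for acquiring and processing information. After briefly introducing the architecture of wireless sensor networks, we analyze several valuable application areas and discuss their prospects. Drawing on existing research, we present the current state of wireless sensor network research in three areas: data acquisition, data computation, and routing protocols. We focus in particular on the routing techniques currently used in wireless sensor networks. Finally, we identify directions for future research.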

20.
Due to advances in fiber optics and VLSI technology, interconnection networks that allow simultaneous broadcasts are becoming feasible. Distributed shared memory (DSM) implementations on such networks promise high performance even for small, fine-grained applications. This paper first summarizes the architecture of one such implementation, the Simultaneous Optical Multiprocessor Exchange Bus (SOME-Bus). It then presents simple algorithms for improving the performance of parallel programs running on a SOME-Bus multiprocessor that implements cache-coherent DSM. The algorithms are based on run-time data redistribution through a dynamic page migration protocol. To decide where shared data should be placed, they combine memory access references with information that each node reports and hardware monitors gather. This information includes average channel utilization, average channel waiting time, the number of messages in the channel queue, or the short-term average channel waiting time. Simulations of four parallel codes on a 64-processor SOME-Bus show that the algorithms yield significant performance improvements. These include reductions in execution time, the number of remote memory accesses, average channel waiting time, and average network latency, as well as an increase in average channel utilization.
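The abstract does not state the exact migration criteria. The sketch below is a hypothetical rule of the general kind described. It combines per-node reference counts with one of the monitored quantities, the average channel waiting time. A page migrates to its heaviest user only if two conditions hold. First, that node must reference the page at least `ratio` times as often as the current home. Second, that node's channel must not be congested. The thresholds and names are invented for illustration.

```python
def choose_home(accesses, home, channel_wait, ratio=2.0, max_wait=None):
    """Hypothetical page-placement rule: return the node that should own the page.

    accesses: {node: recent reference count for this page}
    channel_wait: {node: average channel waiting time reported by its monitor}
    """
    candidate = max(accesses, key=accesses.get)
    if candidate == home:
        return home
    # Require ratio-times more references than the home before paying for a move.
    if accesses[candidate] < ratio * max(accesses.get(home, 0), 1):
        return home
    # Do not move the page onto a node whose channel is already congested.
    if max_wait is not None and channel_wait.get(candidate, 0.0) > max_wait:
        return home
    return candidate


print(choose_home({0: 2, 3: 10}, home=0, channel_wait={3: 1.0}))  # 3
print(choose_home({0: 2, 3: 10}, home=0, channel_wait={3: 1.0}, max_wait=0.5))  # 0
```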


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司), Beijing ICP registration 京ICP备09084417号