Similar Literature
 Found 20 similar documents; search took 593 ms
1.
Per-core scratchpad memories (or local stores) allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architectures become more distributed. We have designed cache-integrated network interfaces, appropriate for scalable multicores, that combine the best of two worlds – the flexibility of caches and the efficiency of scratchpad memories: on-chip SRAM is configurably shared among caching, scratchpad, and virtualized network interface (NI) functions. This paper presents our architecture, which provides local and remote scratchpad access to either individual words or multiword blocks through RDMA copy. Furthermore, we introduce event responses, a technique that enables software-configurable communication and synchronization primitives. We present three event response mechanisms that expose NI functionality to software: multiword transfer initiation, completion notifications for software-selected sets of arbitrary-size transfers, and multi-party synchronization queues. We implemented these mechanisms in a four-core FPGA prototype, and measured the logic overhead over a cache-only design for basic NI functionality to be less than 20%. We also evaluate the on-chip communication performance on the prototype, as well as the performance of synchronization functions through simulation of CMPs with up to 128 cores. We demonstrate efficient synchronization, low-overhead communication, and amortized-overhead bulk transfers, which allow parallelization gains for fine-grain tasks and efficient exploitation of the hardware bandwidth.

2.
Technological advances in network and processor speeds do not lead to equally large improvements in the performance of client-server systems. For instance, hardware performance improvements do not translate into faster user applications. This is primarily because software overhead dominates communication. The Shrimp project at Princeton University seeks solutions to this problem. Shrimp (Scalable High-Performance Really Inexpensive Multiprocessor) supports protected user-level communication between processes by mapping memory pages between virtual address spaces. This virtual memory-mapped network interface has several advantages, including flexible user-level communication and very low overhead for initiating data transfers. Here, we examine two remote procedure call (RPC) protocols and one socket implementation for Shrimp that deliver almost undiminished hardware performance to user applications.

3.
The advanced metering infrastructure (AMI) in a smart grid contains hardware, software, and other electronic components connected through a communication infrastructure. AMI transfers meter-reading data between a group of smart meters and a utility centre. Herein, a wireless mesh network (WMN) with a random mesh topology is used to deploy the AMI communication network. In a WMN, paths are identified using a hybrid wireless mesh routing protocol (HWMP) with a load balancing feature called load aware-HWMP (LA-HWMP). These paths reduce the demand on links with a minimal air time metric; however, the delay in the data transmission of certain smart meters is high, given the large number of retransmissions caused by packet drops. To avert this problem and improve the end-to-end delay, a genetic algorithm is applied to the LA-HWMP to obtain the optimal path. The optimisation process results in the selection of paths with minimal delay. The genetic algorithm is developed with rank-based selection, two-point crossover, and random reset mutation with a repair function to eliminate duplicate entries. The proposed method is compared with the HWMP, the LA-HWMP, and a state-of-the-art method that uses a combination of the ant colony algorithm and simulated annealing (ACA-SA) for AMI networks of different sizes. The obtained results show that the path identified by the proposed method yields a shorter delay and higher throughput than paths identified using the other methods.
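
The four genetic operators named in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a path is encoded as a list of node IDs with at least three entries, that source and destination differ, and that fitness is a path delay to be minimised. All function names and the encoding are hypothetical.

```python
import random


def rank_select(population, delay, rng):
    """Rank-based selection: paths with smaller delay get larger rank weights."""
    ranked = sorted(population, key=delay)           # best (lowest delay) first
    weights = [len(ranked) - i for i in range(len(ranked))]  # n, n-1, ..., 1
    return rng.choices(ranked, weights=weights, k=1)[0]


def two_point_crossover(a, b, rng):
    """Replace the segment of parent `a` between two cut points with `b`'s segment."""
    n = min(len(a), len(b))                          # assumed to be at least 3
    i, j = sorted(rng.sample(range(1, n), 2))        # cut points, source stays fixed
    return a[:i] + b[i:j] + a[j:]


def random_reset_mutation(path, nodes, rng, rate=0.1):
    """Reset each interior hop to a random node with probability `rate`."""
    out = list(path)
    for k in range(1, len(out) - 1):                 # keep source and destination
        if rng.random() < rate:
            out[k] = rng.choice(nodes)
    return out


def repair(path):
    """Remove duplicate entries from the interior, keeping source and destination."""
    src, dst = path[0], path[-1]
    seen = {src, dst}
    interior = []
    for node in path[1:-1]:
        if node not in seen:
            seen.add(node)
            interior.append(node)
    return [src] + interior + [dst]
```

A generation would typically select two parents with `rank_select`, combine them with `two_point_crossover`, mutate the child, and pass it through `repair` before evaluating its delay.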

4.
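Design of a CAN-Bus-Based Multi-Sensor Real-Time Monitoring System   Total citations: 4, self-citations: 1, citations by others: 3
CAN is a serial communication network that effectively supports distributed control and real-time control. This paper proposes a design for a multi-sensor real-time monitoring system based on the CAN bus and describes in detail the structure and functions of the Atmel T89C51CC01, a microcontroller with an on-chip CAN controller. It presents the system's hardware circuit structure and software design, and gives hardware anti-interference measures drawn from practical use. Practical application shows that the system offers high reliability, strong real-time performance, and easy expansion.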

5.
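This paper proposes a communication method that combines a MODBUS-protocol communication module with upper-level configuration software, realizing data communication for a steam turbine protection system. It describes the hardware structure of the protection system's communication network, the design and implementation of the MODBUS communication module, and the development of the upper-level configuration control software. The MODBUS-based communication design for the steam turbine protection system has been completed and is working well in a power plant.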

6.
We address the design of a dynamically reconfigurable hardware-software architecture that allows networking functions to be partitioned on a SoC (System on Chip) platform. We treat this as the problem of partitioning network protocol functions into dynamically reconfigurable hardware and software modules. Such a partitioning technique can improve the co-design productivity of hardware and software modules. In practice, the proposed technique, called ITC (Inter-Task Communication) and incorporating the RT-IJC2 (Real-Time Inter-Job Communication Channel), makes it possible to partition networking functions into hardware and software modules on the SoC platform. Additionally, it supports the modularity and reuse of complex network protocol functions, enabling future network protocol specifications to be mapped onto the SoC platform at a higher level of abstraction. In particular, the RT-IJC2 allows more complex data transfers between hardware and software tasks while providing real-time data processing for given application-specific real-time requirements. We conduct a variety of experiments to illustrate the application and efficiency of the proposed technique after implementing it on a commercial SoC platform based on Altera's Excalibur, which includes the ARM922T core and up to 1 million gates of programmable logic.

7.
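郝兴茂, 戴冠中, 刘航, 叶芳宏. 《测控技术》 (Measurement & Control Technology), 2007, 26(7): 45-47, 52
This paper designs a cost-effective alarm host system. By introducing a virtual keypad and a console PC, the system provides networked control of alarm hosts connected to Ethernet. The hardware and software implementation is described in detail, and the paper proposes a communication protocol between the alarm host and the virtual keypad, together with a network fault recovery mechanism for use with a partial TCP/IP protocol stack. Practice shows that the system runs stably, withstands network interference well, and meets the design goal of networked management of alarm host systems.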

8.
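Based on an analysis of development trends in industrial control networks and of the protocols that embedded Ethernet needs for data transmission over the Internet, and in view of the shortcomings of fieldbus communication, an Ethernet-based embedded measurement and control system is designed. The paper describes the system design. On the hardware side, it focuses on the design of the data acquisition nodes and of the Ethernet communication interface circuit. On the software side, it proposes a design for the measurement and control terminal software and implements network communication based on a TCP/IP protocol stack. Practical use shows that the system runs flexibly, reliably, and stably, and can connect directly either to an enterprise's internal Intranet or to the public Internet.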

9.
As personal computer clusters grow in popularity and number, message passing between nodes has become an important issue because of the high failure rate of the network. File access in a cluster file system often consists of several sub-operations, each of which includes one or more network transmissions. Any network failure makes the file system service unavailable. In this paper, we describe a highly reliable message-passing mechanism (HR-NET), which tolerates both software and hardware network failures. HR-NET provides fine-grained, connection-level failover across redundant communication paths. With it, the file system can keep passing messages because HR-NET handles failures automatically, either by recovering from the network failure or by failing over to a backup; it therefore hides network failures from the requests and data transmissions of the cluster file system. Load balancing for messages is also provided to relieve network traffic. For transmission timeouts, HR-NET proposes priority-based message scheduling, which dynamically orders messages to tolerate request–response failures between clients and servers. HR-NET is implemented on top of the standard network protocol stack. Performance results show that HR-NET delivers almost the full underlying network bandwidth, with an average throughput loss of 6.17%, and recovers quickly. Experiments with a cluster file system show that the overall performance degradation due to HR-NET failover is below 8%, while reliability is greatly enhanced.

10.
General-purpose GPU (GPGPU) programming requires coupling highly parallel computing units with classic CPUs to obtain high performance. Heterogeneous systems lead to complex designs that combine multiple paradigms and programming languages to manage each hardware architecture. In this paper, we present tools to harness GPGPU programming through the high-level OCaml programming language. We describe the SPOC library, which handles GPGPU subprograms (kernels) and data transfers between devices. We then present how SPOC expresses GPGPU kernels: through interoperability with common low-level extensions (from the Cuda and OpenCL frameworks), but also via an embedded DSL for OCaml. Using simple benchmarks as well as a real-world HPC application, we show that SPOC can offer high performance while easing development. To allow better abstractions over tasks and data, we introduce some parallel skeletons built upon SPOC, as well as composition constructs over those skeletons.

11.
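To meet the pressing need for high-speed, stable point-to-point communication between two embedded devices without a PC, this paper proposes a solution built on the USB protocol that exchanges data between embedded devices using USB's control, bulk, and interrupt transfer types. The solution uses the LPC3250 as the hardware platform and the RTX real-time operating system as the software platform. Taking full advantage of RTX's microkernel, strong real-time behavior, multitasking, and mailbox mechanism, a lightweight, practical, and highly real-time USB host driver is designed and implemented; the driver supports control, bulk, and interrupt transfers. Tests on real devices show that the system handles the basic functions of mass storage devices well.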

12.
The creation of a routing overlay network on the Internet requires the identification of detour paths between end hosts that are shorter than the default path. These detour paths are typically the edges forming a Triangle Inequality Violation (TIV), an artifact of the Internet delay space where the sum of latencies across an intermediate hop is less than the direct latency between the pair of end hosts. These violations are caused mainly by interdomain routing policies between Autonomous Systems (ASes) and AS peering through Internet eXchange Points (IXPs). Identifying detours for a global overlay network requires large amounts of computation due to the sheer number of possible paths linking source and destination ASes. In this work, we use parallel programming paradigms to analyze the large network measurement datasets made available to the network research community by CAIDA. We study Internet routes traversing IXPs and measure potential TIVs created by these paths. Large-scale analysis of the dataset is carried out by implementing an efficient parallel solution on the CPU and then on the general-purpose graphics processing unit (GPGPU) as well. Both the multicore CPU and the GPGPU implementations can be run with ease in desktop environments with readily available software. We find that both parallel solutions yield speedups of 2-35x over the serial methodologies, thereby opening up the possibility of harnessing the power of parallel programming with readily available hardware. The large amount of data analyzed and studied helps draw various inferences for the networking research community in building future scalable Internet routing overlays with greater routing efficiency.
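
The TIV condition described in this abstract (latency via an intermediate hop is lower than the direct latency) can be checked with a brute-force serial scan such as the sketch below. It is only an illustration of the definition, not the paper's CPU or GPGPU code; the input format, a dictionary of directed latencies in milliseconds keyed by `(from, to)`, and the function name `find_tivs` are assumptions.

```python
from itertools import permutations


def find_tivs(latency):
    """Return (src, dst, relay, direct, detour) for each triangle inequality violation.

    latency: dict mapping (a, b) -> measured latency from a to b.
    Pairs without a measurement are skipped.
    """
    nodes = sorted({n for pair in latency for n in pair})
    tivs = []
    for src, dst, relay in permutations(nodes, 3):
        direct = latency.get((src, dst))
        leg1 = latency.get((src, relay))
        leg2 = latency.get((relay, dst))
        if direct is None or leg1 is None or leg2 is None:
            continue
        detour = leg1 + leg2
        if detour < direct:               # the detour beats the default path
            tivs.append((src, dst, relay, direct, detour))
    return tivs
```

The scan is cubic in the number of nodes, which gives a sense of why the abstract turns to parallel hardware for Internet-scale datasets.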

13.
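Simulation training systems offer low risk, good confidentiality, and easy automated assessment, and are widely used. This paper designs an integrated simulation system solution for communication stations. The solution adopts an integrated architecture that divides the communication simulation system into several functional modules interacting through network interfaces. To simulate the various kinds of communication equipment, it combines software and hardware, split into host-computer software and a hardware front end, and defines a bidirectional data protocol for operation input and output between the software and the hardware. This design provides a general software development method for controlling the hardware front end, which eases large-scale software development and flexible debugging.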

14.
The main aim of the SMILE project is to build an efficient low-cost cluster based on FPGA boards in order to take advantage of their reconfigurable capabilities. This paper presents the cluster architecture, describing the SMILE nodes, the high-speed communication network between the nodes, and the software environment. Simulating complex applications can be very hard; therefore, a SystemC model of the whole system has been designed to simplify this task and provide error-free downloading and execution of the applications in the cluster. The hardware–software co-design process involved in the architecture and SystemC design is presented as well. The functionality of the SMILE cluster is tested by executing a real, complex Content-Based Information Retrieval (CBIR) parallel application, and the performance of the cluster is compared (in time, power, and cost) with a traditional cluster approach.

15.
In this paper, an efficient algorithm is proposed which optimizes periodic message scheduling in a real-time multiprocessor system. The system is based on a many-core single-chip computer architecture and uses a multistage baseline network for inter-core communication. Due to its basic architecture, internal blocking can occur during data transfers, i.e. the baseline network is not real-time capable by itself. Therefore, we propose a scheduling algorithm that may be performed before the execution of an application in order to compute a non-blocking schedule of periodic message transfers. Additionally, we optimize the clock rate of the network subject to the constraint that all data transfers can be performed in a non-blocking way. Our solution algorithm is based on a generalized graph coloring model and a randomized greedy approach. The algorithm was tested on some realistic communication scenarios as they appear in modern electronic car units. Computational results show the effectiveness of the proposed algorithm.
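
To give a feel for the "graph coloring plus randomized greedy" idea, here is a simplified sketch using ordinary graph coloring rather than the generalized model of the paper. It assumes that messages are vertices, that an edge joins two messages that would block each other inside the network, and that a color is a time slot; the names `greedy_coloring` and `randomized_greedy` and the restart strategy are illustrative assumptions.

```python
import random


def greedy_coloring(conflicts, order):
    """Give each message, in the given order, the smallest slot free of conflicts."""
    slot = {}
    for m in order:
        used = {slot[n] for n in conflicts[m] if n in slot}
        s = 0
        while s in used:
            s += 1
        slot[m] = s
    return slot


def randomized_greedy(conflicts, restarts=100, seed=0):
    """Try several random message orders and keep the schedule with the fewest slots.

    conflicts: dict mapping each message to the set of messages it conflicts with
    (assumed non-empty and symmetric).
    """
    rng = random.Random(seed)
    messages = list(conflicts)
    best = None
    for _ in range(restarts):
        rng.shuffle(messages)
        slot = greedy_coloring(conflicts, messages)
        if best is None or max(slot.values()) < max(best.values()):
            best = slot
    return best
```

Messages sharing a slot never conflict, so each slot can be transferred without blocking; fewer slots per period leaves more room to lower the network clock rate.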

16.
There is a growing demand for network support for group applications, in which messages from one or more senders are delivered to a large number of receivers. Here, we propose a network architecture for supporting a fundamental type of group communication, conferencing. A conference refers to a group of members in a network who communicate with each other within the group. We consider adopting a class of multistage networks, such as a baseline, an omega, or an indirect binary cube network, composed of switch modules with fan-in and fan-out capability, for a conference network which supports multiple disjoint conferences. The key issue in designing a conference network is to determine the multiplicity of routing conflicts, which is the maximum number of conflicting parties competing for a single interstage link when multiple disjoint conferences are simultaneously present in the network. Our results show that, for a network of size n × n, the multiplicities of routing conflicts are small constants (between 2 and 4) for an omega network or an indirect binary cube network, while the multiplicity can be as large as √n/q + 1 for a baseline network, where q is the minimum allowable conference size. Thus, our design for conference networks is based on an omega network or an indirect binary cube network. We also develop fast self-routing algorithms for setting up routing paths in the newly designed conference networks. Such an n × n conference network has O(log n) routing time and communication delay and O(n log n) hardware cost. The conference networks are superior to existing designs in terms of routing complexity, communication delay, and hardware cost. The proposed conference network is rearrangeably nonblocking in general, and is strictly nonblocking under some conference service policies. It can be used in applications that require efficient or real-time group communication.
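
As background for the O(log n) routing time, the sketch below shows classical destination-tag self-routing for a single source-destination pair in an n × n omega network. It is the textbook unicast scheme, not the conference routing algorithm of this paper; the function name `omega_route` and the convention of recording the port position after each stage are assumptions.

```python
def omega_route(src, dst, n):
    """Return the port positions visited from src to dst in an n x n omega network.

    Each of the log2(n) stages applies a perfect shuffle (rotate the position
    left by one bit) and then a 2x2 switch that sets the lowest bit to the
    next destination bit, most significant bit first.
    """
    k = n.bit_length() - 1
    if n < 2 or (1 << k) != n:
        raise ValueError("n must be a power of two, at least 2")
    pos = src
    path = [pos]
    for stage in range(k):
        bit = (dst >> (k - 1 - stage)) & 1     # destination bit used at this stage
        pos = ((pos << 1) & (n - 1)) | bit     # shuffle, then switch output select
        path.append(pos)
    return path
```

The loop runs log2(n) times regardless of the pair, and after the last stage the position always equals the destination, since every source bit has been shifted out and replaced.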

17.
FlexiTP is a novel TDMA protocol that offers a synchronized and loose slot structure. Nodes in the network can build, modify, or extend their scheduled number of slots during execution, based on their local information. Nodes wake up for their scheduled slots; otherwise, they switch into power-saving sleep mode. This flexible schedule allows FlexiTP to be strongly fault tolerant and highly energy efficient. FlexiTP is scalable for a large number of nodes because its depth-first-search schedule minimizes buffering, and it allows communication slots to be reused by nodes outside each other's interference range. Hence, the overall scheme of FlexiTP provides end-to-end guarantees on data delivery (throughput, fair access, and robust self-healing) while also respecting the severe energy and memory constraints of wireless sensor networks. Simulations in ns-2 show that FlexiTP ensures energy efficiency and is robust to network dynamics (faults such as dropped packets and nodes joining or leaving the network) under various network configurations (network topology and network density), providing an efficient solution for data-gathering applications. Furthermore, under high contention, FlexiTP outperforms Z-MAC in terms of energy efficiency and network performance.

18.
《Parallel Computing》1988,9(1):1-24
The Connection Machine is a massively parallel architecture with 65 536 single-bit processors and 32 Mbytes of memory, organized as a high-dimensional hypercube. A sophisticated router system provides efficient communication between remote processors. A rich software environment, including a parallel extension of COMMON LISP, provides access to the processors and network. Virtual processor capability extends the degree of fine-grained parallelism beyond 1 000 000. We describe the hardware and the parallel programming environment. We then present implementations of SOR, Multigrid, and Conjugate Gradient algorithms for solving Partial Differential Equations on the Connection Machine. Measurements of computational efficiency are provided, as well as an analysis of opportunities for achieving better performance. Despite the lack of floating-point hardware, computation rates above 100 Mflops have been achieved in PDE solution. Virtual processors prove to be a real advantage, easing the effort of software development while improving system performance significantly.
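
For readers unfamiliar with SOR (successive over-relaxation), one of the three solvers named above, a plain sequential sketch for a small linear system Ax = b follows. It only illustrates the update rule; it is not the data-parallel Connection Machine implementation, and the function name, the default relaxation factor `omega`, and the stopping rule are assumptions.

```python
def sor(A, b, omega=1.25, tol=1e-10, max_iter=10_000):
    """Solve A x = b by successive over-relaxation, starting from x = 0.

    A is a list of rows with non-zero diagonal. Convergence is guaranteed for
    symmetric positive definite A when 0 < omega < 2.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        delta = 0.0
        for i in range(n):
            # Gauss-Seidel value using the newest available entries of x
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            gs = (b[i] - sigma) / A[i][i]
            # over-relax: blend the old value with the Gauss-Seidel value
            new = (1.0 - omega) * x[i] + omega * gs
            delta = max(delta, abs(new - x[i]))
            x[i] = new
        if delta < tol:                    # largest change in this sweep is tiny
            break
    return x
```

With `omega = 1` the update reduces to Gauss-Seidel; values between 1 and 2 often converge faster on discretized PDE systems.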

19.
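An RFID network is a pervasive network system composed of many kinds of RFID software and hardware resources, and one important problem to study and solve is how to describe and discover these diverse resources. A customizable (attribute, value) matching queue-list method provides a general and powerful way to describe the many software and hardware resources in an RFID network. A resource description management tree allows efficient and consistent management of resource descriptions within an organization, enables fast matching and discovery of described resources, and supports asynchronous data transfer between application systems and resources. Sharing resource description management trees enables restricted, cross-organization resource discovery and sharing. Performance analysis, an actual implementation, and its application show that the mechanism can be used in fairly large-scale RFID networks and application systems.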

20.
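A satellite network consists of multiple satellites and ground stations, with the satellites communicating over high-speed laser links. One core component of a satellite network is the router. High-performance routers typically separate the data path from the control path: the control path handles packets related to high-level routing protocols, while the data path handles packets to be forwarded. The data path is the router's critical path and directly affects its overall performance. After surveying the evolution of router technology, the paper analyzes the typical structure of high-performance routers and the related key technologies. Taking into account current satellite network requirements and the constraints of the software and hardware environment, it optimizes and adapts existing techniques to determine an implementation scheme for the data path of a satellite network router. The scheme meets current satellite network application needs and, with simple extension, can also meet the design needs of later, higher-performance satellite network routers.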
