Similar Documents
20 similar documents found (search time: 31 ms)
1.
The industry trend for processors is toward integrating an increasing number of cores into a single chip. Researchers have to deal with frequent data migration across the network-on-chip and increasing on-chip traffic. Moving from a flat to a hierarchical organization is probably a natural design methodology for scalable systems (Martin et al. in Commun ACM, 55(7):78–89, 2012. doi: 10.1145/2209249.2209269). Unfortunately, a hierarchical directory protocol inevitably adds on-chip traffic overhead, protocol complexity, and access latency. In this paper, we target hierarchical cache coherence protocols to overcome the potentially high cost of maintaining cache coherence in current multicore processors. We propose a novel vertical caching protocol combined with grouped coherence, in which the coherence domain expands on demand. More specifically, its design philosophy is to provide 'best-effort' single-copy delivery, which allows shared data to reside only in the first common shared level. Compared to the previous hierarchical protocol, our proposal achieves a performance improvement of 9.9% in a 16-core system and 13.4% in a 64-core system, as well as an on-chip traffic reduction of about 10.8% and 15.9%, respectively.

2.
Computer Communications, 2002, 25(11–12):1009–1017
The token bucket characterization provides a deterministic yet concise representation of a traffic source. In this paper, we study the impact of the long-range dependence (LRD) property of traffic generated by today's multimedia applications on the optimal dimensioning of token bucket parameters. To this end, we empirically illustrate the difference between the token bucket characteristics of traffic exhibiting different degrees of time dependence but with identical macroscopic properties (i.e. inter-arrival time and packet size distributions). In addition, we use a statistical model to analytically determine optimal token bucket parameters under various optimization criteria. The statistical model is based on fractional Brownian motion and takes LRD into account. We apply this model to several aggregated MPEG video sources. We then assess the validity of these analytic results by comparing them to empirical results. We conclude that the analytic approach presented here is effective in optimally sizing token buckets for LRD traffic, and promises to be applicable under different traffic conditions and for various optimization criteria.
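The (r, b) token bucket characterization the paper dimensions can be illustrated with a minimal conformance-check sketch; the class and parameter names below are hypothetical, not from the paper:

```python
from dataclasses import dataclass

@dataclass
class TokenBucket:
    """Token bucket (r, b): tokens accrue at `rate` up to `depth`."""
    rate: float            # token fill rate (bytes per second)
    depth: float           # bucket depth b (bytes)
    tokens: float = None   # current token level; filled in below
    last: float = 0.0      # timestamp of the previous arrival

    def __post_init__(self):
        if self.tokens is None:
            self.tokens = self.depth  # bucket starts full

    def conforms(self, t: float, size: float) -> bool:
        """True if a packet of `size` bytes arriving at time t conforms."""
        # refill tokens for the elapsed time, capped at the bucket depth
        self.tokens = min(self.depth, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False
```

Dimensioning then amounts to picking the smallest (rate, depth) pair for which a given traffic trace stays conforming; burstier (e.g. LRD) traces need a deeper bucket at the same rate.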

3.
One of the major problems with GPU on-chip shared memory is bank conflicts. Our analysis shows that the throughput of the GPU processor core is often constrained neither by the shared memory bandwidth nor by the shared memory latency (as long as it stays constant), but rather by the varied latencies caused by memory bank conflicts. These cause conflicts at the writeback stage of the in-order pipeline and pipeline stalls, degrading system throughput. Based on this observation, we investigate and propose a novel Elastic Pipeline design that minimizes the negative impact of on-chip memory bank conflicts on system throughput by decoupling bank conflicts from pipeline stalls. Simulation results show that our proposed Elastic Pipeline, together with the co-designed bank-conflict-aware warp scheduling, reduces pipeline stalls by up to 64.0% (42.3% on average) and improves overall performance by up to 20.7% (13.3% on average) for representative benchmarks, at trivial hardware overhead.

4.
A Survey of Network-on-Chip Interconnection Topologies
With continuous advances in devices, process technology, and applications, chip multiprocessors have become mainstream, and their scale keeps growing, with ever more processor cores integrated on a single chip. The network-on-chip (NoC) that interconnects the on-chip processor cores and other components is gradually becoming one of the bottlenecks limiting chip multiprocessor performance. The NoC topology defines the physical layout and interconnection of the nodes inside the network; it determines and influences the cost, latency, throughput, area, fault tolerance, and power consumption of the NoC, and it also affects the network routing strategy and the chip's placement and routing, making it one of the key issues in NoC research. This paper compares different NoC topologies, analyzes the performance of each structure, and offers suggestions for future research on NoC topologies.

5.
The benefit of Class-of-Service (CoS) is an important topic in the "Network Neutrality" debate. As part of the debate, it has been suggested that over-provisioning is a viable strategy for meeting the performance targets of future applications, and that there is no need to worry about provisioning differentiated services in an IP backbone for a small fraction of users needing better-than-best-effort service. In this paper, we quantify the extra capacity required for an over-provisioned classless (i.e., best-effort) network compared to a CoS network providing the same delay or loss performance for premium traffic. We first develop a link model that quantifies the required extra capacity (REC). To illustrate the key parameters involved in analytically quantifying REC, we start with simple traffic distributions. Then, for more bursty traffic distributions (e.g., long-range dependent), we find the REC using ns-2 simulations of CoS and classless links. We then use these link models to quantify the REC for network topologies (obtained from Rocketfuel) under various scenarios, including situations with "closed-loop" traffic generated by many TCP sources that adapt to the available capacity. We also study the REC under link and node failures. We show that REC can still be significant even when the proportion of premium traffic requiring performance assurances is small, a situation often considered benign for the over-provisioning alternative. We also show that the impact of CoS on best-effort (BE) traffic is relatively small while still providing the desired performance for premium traffic.

6.
In this paper, a processor allocation mechanism for NoC-based chip multiprocessors is presented. Processor allocation is a well-known problem in parallel computer systems and aims to allocate the processing nodes of a multiprocessor to the different tasks of an input application at run time. The proposed mechanism targets optimizing on-chip communication power/latency and relies on two procedures: processor allocation and task migration. Allocation is done by a fast heuristic algorithm that allocates free processors to the tasks of an incoming application when a new application begins execution. The task-migration algorithm is activated when an application completes execution and frees up the allocated resources. Task migration uses the recently deallocated processors and tries to rearrange the current tasks in order to find a better mapping for them. The proposed method can also capture the dynamic traffic pattern of the network and perform task migration based on the current communication demands of the tasks. Consequently, task migration adapts the task mapping to the current network status. We adopt a non-contiguous processor allocation strategy in which the tasks of the input application are allowed to be mapped onto disjoint regions (groups of processors) of the network. We then use virtual point-to-point circuits, a state-of-the-art fast on-chip connection designed for networks-on-chip, to virtually connect the disjoint regions and bring the communication latency/power closer to the values offered by contiguous allocation schemes. The experimental results show considerable improvement over existing allocation mechanisms.

7.
A distributed counter allows each processor in an asynchronous message-passing network to access the counter value and increment it. We study the problem of implementing a distributed counter so that no processor is a communication bottleneck. We prove a lower bound of Ω(log n / log log n) on the number of messages that some processor must exchange in a sequence of n counting operations spread over n processors. We propose a counter that achieves this bound when each processor increments the counter exactly once. Hence, the lower bound is tight. Because most algorithms and data structures count in some way, the lower bound holds for many distributed computations. We feel that the proposed concept of a communication bottleneck is a relevant measure of efficiency for distributed algorithms and data structures, because it indicates the achievable degree of distribution.

8.
Monte Carlo Simulation Study of Self-Similar Queueing Systems
Self-similarity is a universal property of network traffic and has a significant impact on network performance. This paper uses the Monte Carlo method to study the performance of self-similar queueing systems. The study shows that long-range dependence and short-range dependence have very different effects on queueing performance, especially when the buffer is large. It is also found that the time scale over which long-range dependence takes effect depends on both the traffic and the parameters of the queueing system itself, which offers practical guidance for network design.
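This kind of Monte Carlo queueing study can be reproduced on a small scale with the standard Lindley waiting-time recursion; the sketch below uses short-range dependent (M/M/1) input as a baseline, and a long-range dependent trace could be substituted for the arrival sequence. All names and parameters are illustrative, not from the paper:

```python
import random

def lindley_waits(interarrivals, services):
    """Waiting time of each customer via the Lindley recursion:
    W(1) = 0, W(next) = max(0, W + S - A)."""
    w, waits = 0.0, []
    for a, s in zip(interarrivals, services):
        waits.append(w)
        w = max(0.0, w + s - a)
    return waits

# Short-range dependent baseline: M/M/1 at offered load rho = 0.8.
random.seed(1)
n = 100_000
rho = 0.8
arrivals = [random.expovariate(1.0) for _ in range(n)]        # mean 1
services = [random.expovariate(1.0 / rho) for _ in range(n)]  # mean rho
mean_wait = sum(lindley_waits(arrivals, services)) / n
# M/M/1 theory predicts E[W] = rho / (mu - lambda) = 3.2 here.
```

Feeding the same recursion with an LRD arrival process of identical mean is exactly the kind of comparison the paper performs, and is where the large-buffer differences appear.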

9.
The network-on-chip paradigm is an emerging paradigm that effectively addresses and presumably can overcome the many on-chip interconnection and communication challenges that already exist in today's chips or will likely occur in future chips. Effective on-chip implementation of network-based interconnect paradigms requires developing and deploying a whole new set of infrastructure IPs and supporting tools and methodologies. This special issue illustrates how, to date, engineers have successfully deployed NoCs to meet certain very aggressive specifications. At the same time, the articles reveal many issues and challenges that require solutions if the NoC paradigm is indeed to become a panacea or quasi-panacea for tomorrow's SoCs.

10.
Analysis of Key Parameters of Self-Similar Traffic
A large body of research shows that network traffic commonly exhibits self-similarity and long-range dependence, both of which have an important impact on network performance. Most current studies concentrate only on estimating the Hurst parameter and its performance impact, which is incomplete. This paper investigates the key parameters of self-similar traffic that affect network performance. Simulations of the influence of the Hurst parameter and the variance coefficient show that both have a significant impact on network performance. We analyze why variance affects network performance, study the relationship between G and variance and how to compute it, present an IDC-based parameter estimation algorithm for the composite fractal renewal process, and analyze the impact of the fractal onset time on network performance.
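One standard way to estimate the Hurst parameter discussed above is the aggregated-variance method; the following is a minimal sketch (function name and block sizes are illustrative, not taken from the paper):

```python
import math
import random

def hurst_aggvar(x, block_sizes=(1, 2, 4, 8, 16, 32)):
    """Aggregated-variance Hurst estimator: for a self-similar series the
    variance of the m-aggregated mean decays like m^(2H-2), so the slope
    beta of log Var vs log m gives H = 1 + beta / 2."""
    logs = []
    for m in block_sizes:
        # non-overlapping block means at aggregation level m
        blocks = [sum(x[i:i + m]) / m for i in range(0, len(x) - m + 1, m)]
        mu = sum(blocks) / len(blocks)
        var = sum((b - mu) ** 2 for b in blocks) / (len(blocks) - 1)
        logs.append((math.log(m), math.log(var)))
    # least-squares slope of the log-log points
    k = len(logs)
    sx = sum(u for u, _ in logs); sy = sum(v for _, v in logs)
    sxx = sum(u * u for u, _ in logs); sxy = sum(u * v for u, v in logs)
    beta = (k * sxy - sx * sy) / (k * sxx - sx * sx)
    return 1.0 + beta / 2.0
```

For uncorrelated input the slope is close to -1, giving H near 0.5; long-range dependent input yields H above 0.5.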

11.
Cell Multiprocessor Communication Network: Built for Speed
Kistler, M.; Perrone, M.; Petrini, F. IEEE Micro, 2006, 26(3):10–23
Multicore designs promise various power-performance and area-performance benefits. But inadequate design of the on-chip communication network can deprive applications of these benefits. To illuminate this important point in multicore processor design, the authors analyze the Cell processor's communication network, using a series of benchmarks involving DMA traffic patterns and synchronization protocols.

12.
In this paper, we study buffer queueing behaviour in high-speed networks. Some limited analytical derivations of queue models have been proposed in the literature, but their solutions are often a great mathematical challenge. We propose to use the Polya distribution to overcome such limitations. The specific behaviour of an IP interface with bursty traffic and long-range dependence is investigated by a version of the "classical" M/D/n queueing model called Polya/D/n. This is a queueing system with a Polya input stream (a negative-binomially distributed number of arrivals in a fixed time interval), a constant service time, multiple servers, and an infinite waiting room. The model is treated as a renewal process because of its quasi-random input stream and constant service time. We develop balance equations for the state of the system and obtain results for packet loss and delay. The finding that the Polya distribution is adequate for modelling bursty input streams at IP network interfaces motivated the proposal to evaluate the Polya/D/n system. It is shown that the variance of the input stream significantly changes the characteristics of the waiting system. The suggested model is new and allows defining different bursty traffic and evaluating losses and delays relatively easily.
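A rough Monte Carlo counterpart of the Polya/D/n model can be sketched with negative binomial per-slot arrivals and one deterministic service per server per slot. This is an illustrative discrete-time approximation, not the paper's balance-equation analysis; all names and parameter values are assumptions:

```python
import math
import random

def negbin(r, p):
    """Negative binomial draw (failures before the r-th success) via the
    gamma-Poisson (Polya) mixture: NB(r, p) = Poisson(Gamma(r, (1-p)/p))."""
    lam = random.gammavariate(r, (1 - p) / p)
    k, term, acc, u = 0, math.exp(-lam), math.exp(-lam), random.random()
    while u > acc:        # invert the Poisson CDF term by term
        k += 1
        term *= lam / k
        acc += term
    return k

def polya_d_n_loss(r, p, servers, buffer, slots):
    """Discrete-time sketch of Polya/D/n with a finite buffer: NB(r, p)
    arrivals per slot, each server removes one packet per slot, arrivals
    beyond `buffer` queued packets are dropped. Returns loss fraction."""
    q = lost = offered = 0
    for _ in range(slots):
        a = negbin(r, p)
        offered += a
        accepted = min(a, buffer - q)
        lost += a - accepted
        q = max(0, q + accepted - servers)
    return lost / max(offered, 1)

random.seed(7)
small = polya_d_n_loss(1, 1 / 3, 3, 5, 20000)   # mean 2 arrivals/slot
large = polya_d_n_loss(1, 1 / 3, 3, 50, 20000)
```

Even at a load well below one, the bursty (high-variance) input stream produces noticeable loss with a small buffer, which is the qualitative effect of input-stream variance the paper quantifies.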

13.
Molecular dynamics (MD) simulation has broad applications, and an increasing amount of computing power is needed to satisfy the large scale of real-world simulations. The advent of the many-core paradigm brings unprecedented computing power, but harvesting it remains a great challenge due to MD's irregular memory-access pattern. To address this challenge, this paper presents a joint application/architecture study to enhance the scalability of MD on Godson-T-like many-core architectures. First, a preprocessing approach leveraging an adaptive divide-and-conquer framework is designed to exploit locality through a memory hierarchy with software-controlled memory. Then three incremental optimization strategies are proposed to enhance on-chip parallelism for the Godson-T many-core processor: a novel data layout to improve data locality, an on-chip locality-aware parallel algorithm to enhance data reuse, and a pipelining algorithm to hide the latency to shared memory. Experiments on a Godson-T simulator exhibit a strong-scaling parallel efficiency of 0.99 on 64 cores, which is confirmed by a field-programmable gate array emulator. The performance per watt of MD on Godson-T is also much higher than on a 16-core Intel Core i7 symmetric multiprocessor (SMP), and 26 times higher than on an 8-core 64-thread Sun T2 processor. Detailed analysis shows that optimizations utilizing architectural features to maximize data locality and enhance data reuse benefit scalability most. Furthermore, a hierarchical parallelization scheme is designed to map the MD algorithm to a Godson-T many-core cluster, and a simple performance model is derived which suggests that the optimization scheme is likely to scale well toward exascale. Certain architectural features are found essential for these optimizations, which could guide future hardware development.

14.
Design of a Quality-of-Service NoC Router
The complex applications of systems-on-chip have made on-chip interconnect a system performance bottleneck, giving rise to communication architectures centered on the network-on-chip. The router is the key component of a NoC: it carries out the transmission of data over the NoC topology. This paper presents the design of a NoC router that supports quality of service. It uses connection-oriented, fine-grained data switching to provide strict end-to-end delay guarantees for guaranteed-service traffic, and connectionless data switching to support best-effort traffic, together with a routing algorithm that balances the on-chip communication load, effectively improving average communication performance.

15.
Fractional Brownian motion (fBm) has emerged as a useful model for self-similar and long-range dependent aggregate Internet traffic. Asymptotic and approximate performance measures, respectively, are known for single queueing systems with fBm through traffic. In this paper, end-to-end performance bounds are derived for a through flow in a network of tandem queues under open-loop fBm cross traffic. To this end, a rigorous sample-path envelope for fBm is proven that complements previous approximate results. The sample-path envelope and the concept of leftover service curves are employed to model the service remaining after scheduling fBm cross traffic at a queueing system. Using composition results for tandem systems from the stochastic network calculus, end-to-end statistical performance bounds for individual flows in networks under fBm cross traffic are derived. The discovery is that these bounds grow in O(n (log n)^(1/(2-2H))) for n systems in series, where H is the Hurst parameter of the cross traffic. Explicit results on the impact of the variability and burstiness of through and cross traffic on network performance are shown. Our analysis has direct implications for fundamental questions in network planning and service management.
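The long-range dependence that makes fBm cross traffic hard to bound shows up directly in the autocovariance of its increment process, fractional Gaussian noise. A small sketch of the standard formula (the function name is illustrative):

```python
def fgn_autocov(k, H):
    """Autocovariance at integer lag k of unit-variance fractional Gaussian
    noise, the stationary increment process of fBm with Hurst parameter H:
    gamma(k) = 0.5 * (|k+1|^(2H) - 2|k|^(2H) + |k-1|^(2H))."""
    return 0.5 * (abs(k + 1) ** (2 * H) - 2 * abs(k) ** (2 * H)
                  + abs(k - 1) ** (2 * H))
```

For H = 1/2 the increments are uncorrelated (gamma(k) = 0 for k >= 1, the ordinary Brownian case), while for H > 1/2 the autocovariance decays only like k^(2H-2) and is non-summable, which is precisely the long-range dependence driving the O(n (log n)^(1/(2-2H))) growth of the bounds above.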

16.
Fault-tolerant scheduling is an imperative step for large-scale computational Grid systems, as geographically distributed nodes often co-operate to execute a task. By and large, the primary-backup approach is a common methodology for fault tolerance, wherein each task has a primary and a backup on two different processors. In this paper, we address the problem of how to schedule DAGs in Grids with communication delays so that service failures can be avoided in the presence of processor faults. The challenge is that, as tasks in a DAG depend on each other, a task must be scheduled so that it will succeed when any of its predecessors fails due to a processor failure. We first propose a communication model and determine when communications between a backup and the backups of its successors are necessary. Then we determine when a backup can start, and its eligible processors, so as to guarantee that every DAG can complete upon any processor failure. We develop two algorithms to schedule backups, which minimize response time and replication cost, respectively. We also develop a suboptimal algorithm which targets minimizing replication cost while not affecting response time. We conduct extensive simulation experiments to quantify the performance of the proposed algorithms.

17.
In a planar geometric network, vertices are located in the plane, and edges are straight line segments connecting pairs of vertices such that no two of them intersect. In this paper we study distributed computing in asynchronous, failure-free planar geometric networks, where each vertex is associated with a processor and each edge with a bidirectional message communication link. Processors are aware of their locations in the plane. We consider fundamental computational geometry problems from the distributed computing point of view, such as finding the convex hull of a geometric network and identifying the external face. We also study the classic distributed computing problem of leader election, to understand the impact that geometric information has on the message complexity of solving it. We obtain an O(n log² n) message complexity algorithm to find the convex hull, and an O(n log n) message complexity algorithm to identify the external face of a geometric network of n processors. We present a matching lower bound for the external face problem. We prove that the message complexity of leader election in a geometric ring is Ω(n log n); hence geometric information does not help in reducing the message complexity of this problem.
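The local building block of the convex hull problem, computing a hull from known point locations, can be illustrated with the standard sequential monotone chain method. This sequential sketch is for intuition only; the paper's contribution is the O(n log² n)-message distributed version:

```python
def convex_hull(points):
    """Andrew's monotone chain: returns hull vertices in counter-clockwise
    order, lowest-leftmost point first."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                      # build lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # drop the duplicated endpoints of each chain
    return lower[:-1] + upper[:-1]
```

In the distributed setting the same geometric predicate (the cross-product orientation test) is evaluated locally, and the message cost comes from gathering and merging partial hulls.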

18.
We study deterministic gossiping in synchronous systems with dynamic crash failures. Each processor is initialized with an input value called a rumor. In the standard gossip problem, the goal of every processor is to learn all the rumors. When processors may crash, this goal needs to be revised, since it is possible, at a point in an execution, that certain rumors are known only to processors that have already crashed. We define gossiping to be completed, for a system with crashes, when every processor knows either the rumor of processor v or that v has already crashed, for any processor v. We design gossiping algorithms that are efficient with respect to both time and communication. Let t < n be the number of failures, where n is the number of processors. If , then one of our algorithms completes gossiping in O(log² t) time and with O(n polylog n) messages. We develop an algorithm that performs gossiping with O(n^1.77) messages and in O(log² n) time, in any execution in which at least one processor remains non-faulty. We show a trade-off between time and communication in gossiping algorithms: if the number of messages is at most O(n polylog n), then the time has to be at least . By way of application, we show that if n − t = Ω(n), then consensus can be solved in O(t) time and with O(n log² t) messages.

19.
Optimizing the 1-D FFT Algorithm on a Many-Core Processor with Hardware/Software Co-Design
With the ever-increasing demand for high-performance computing, on-chip many-core processors have become the direction of future processor architecture. The fast Fourier transform (FFT), an important application in high-performance computing, places high demands on both computing power and communication bandwidth. Implementing an efficient, scalable FFT algorithm on a many-core platform is therefore a challenge faced jointly by algorithm and architecture designers. This paper optimizes and evaluates the 1-D FFT algorithm on the Godson-T many-core processor. While saving almost one third of the L2 cache storage, optimization strategies such as hiding the matrix transposition and overlapping computation with communication give the optimized 1-D FFT more than a 3x performance improvement. An experimental analysis of on-chip network congestion further shows that, for memory-bandwidth-bound applications such as FFT, increasing the L2 cache access bandwidth relieves the pressure that bursty reads and writes place on the on-chip network and the L2 cache, further improving program performance and scalability.
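For reference, the sequential kernel underlying any 1-D FFT optimization is the radix-2 Cooley-Tukey recursion; a minimal sketch follows. The paper's contribution layers data layout, transposition hiding, and computation/communication overlap on top of such a kernel, none of which is shown here:

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])   # DFT of even-indexed samples
    odd = fft(x[1::2])    # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # twiddle factor e^(-2*pi*i*k/n) combines the two half-size DFTs
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

The even/odd split is what becomes a matrix transposition in the blocked, cache-friendly formulations that the paper optimizes.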

20.
Distributed-memory parallel computers and networks of workstations (NOWs) both rely on efficient communication over increasingly high-speed networks. Software communication protocols are often the performance bottleneck. Several current and proposed parallel systems address this problem by dedicating one general-purpose processor in a symmetric multiprocessor (SMP) node specifically to protocol processing. This convention reduces communication latency and increases effective bandwidth, but also reduces peak performance, since the dedicated processor no longer performs computation. In this paper, we study a parallel machine with SMP nodes and compare two protocol processing policies: the Fixed policy, which uses a dedicated protocol processor, and the Floating policy, where all processors perform both computation and protocol processing. The results from synthetic microbenchmarks and five macrobenchmarks show that: (i) a dedicated protocol processor benefits light-weight protocols much more than heavy-weight protocols; (ii) a dedicated protocol processor is generally advantageous when there are four or more processors per node; (iii) multiprocessor node performance is not as sensitive to interrupt overhead as uniprocessor node performance, because a message arrival is likely to find an idle processor on a multiprocessor node, thereby eliminating interrupts; (iv) the system with the lowest cost-performance will include a dedicated protocol processor when interrupt overheads are much higher than protocol weight, as in light-weight protocols.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号