首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 109 毫秒
1.
Complex network protocols and various network services require significant processing capability for modern network applications. One of the important features in modern networks is differentiated service. Along with differentiated service, rapidly changing network environments result in congestion problems. In this paper, we analyze the characteristics of representative congestion control applications-scheduling and queue management algorithms, and we propose application-specific acceleration techniques that use instruction-level parallelism (ILP) and packet-level parallelism (PLP) in these applications. From the PLP perspective, we propose a hardware acceleration model based on detailed analysis of congestion control applications. In order to get large throughputs, a large number of processing elements (PEs) and a parallel comparator are designed. Such hardware accelerators provide large parallelism proportional to the number of processing elements added. A 32-PE enhancement yields 24/spl times/ speedup for weighted fair queueing (WFQ) and 27/spl times/ speedup for random early detection (RED). For ILP, new instruction set extensions for fast conditional operations are applied for congestion control applications. Based on our experiments, proposed architectural extensions show 10%-12% improvement in performance for instruction set enhancements. As the performance of general-purpose processors rapidly increases, defining architectural extensions (e.g., multi-media extensions (MMX) as in multimedia applications) for general-purpose processors could be an alternative solution for a wide range of network applications.  相似文献   

2.
Optimization flow control. I. Basic algorithm and convergence   总被引:4,自引:0,他引:4  
We propose an optimization approach to flow control where the objective is to maximize the aggregate source utility over their transmission rates. We view network links and sources as processors of a distributed computation system to solve the dual problem using a gradient projection algorithm. In this system, sources select transmission rates that maximize their own benefits, utility minus bandwidth cost, and network links adjust bandwidth prices to coordinate the sources' decisions. We allow feedback delays to be different, substantial, and time varying, and links and sources to update at different times and with different frequencies. We provide asynchronous distributed algorithms and prove their convergence in a static environment. We present measurements obtained from a preliminary prototype to illustrate the convergence of the algorithm in a slowly time-varying environment. We discuss its fairness property.  相似文献   

3.
徐扬  唐毅  文振煙  刘斌 《电子学报》2007,35(10):1809-1816
调度算法是决定交换结构性能和实现复杂度的重要因素,极大匹配算法在这两方面存在不足.本文提出一类广义极大匹配(EMM)算法,使用不同权值参数能够派生出不同子类的算法.对广义极大匹配算法的研究从两方面展开,首先在2倍数据加速比下证明任何EMM(2)算法都能取得100%的吞吐量,并通过仿真表明能够取得与理想输出排队相近的延时性能;其次在没有加速比的条件下通过仿真表明具有2个以上权值参数的广义极大匹配算法能够大大提高极大匹配算法的吞吐量性能.  相似文献   

4.
Partial computation elimination techniques are often used for fast template matching. At a particular search location, computations are prematurely terminated as soon as it is found that this location cannot compete with an already known best match location. Due to the nonmonotonic growth pattern of the correlation-based similarity measures, partial computation elimination techniques have been traditionally considered inapplicable to speed up these measures. In this paper, we show that partial elimination techniques may be applied to a correlation coefficient by using a monotonic formulation, and we propose basic-mode and extended-mode partial correlation elimination algorithms for fast template matching. The basic-mode algorithm is more efficient on small template sizes, whereas the extended mode is faster on medium and larger templates. We also propose a strategy to decide which algorithm to use for a given data set. To achieve a high speedup, elimination algorithms require an initial guess of the peak correlation value. We propose two initialization schemes including a coarse-to-fine scheme for larger templates and a two-stage technique for small- and medium-sized templates. Our proposed algorithms are exact, i.e., having exhaustive equivalent accuracy, and are compared with the existing fast techniques using real image data sets on a wide variety of template sizes. While the actual speedups are data dependent, in most cases, our proposed algorithms have been found to be significantly faster than the other algorithms.  相似文献   

5.
Dynamic voltage scaling has been widely acknowledged as a powerful technique for trading off power consumption and delay for processors. Recently, variable-frequency (and variable-voltage) parallel and serial links have also been proposed, which can save link power consumption by exploiting variations in the bandwidth requirement. This provides a new dimension for power optimization in a distributed embedded system connected by a voltage-scalable interconnection network. At the same time, it imposes new challenges for variable-voltage scheduling as well as flow control. First, the variable-voltage scheduling algorithm should be able to trade off the power consumption and delay jointly for both processors and links. Second, for the variable-frequency network, the scheduling algorithm should not only consider the real-time constraints, but should also be consistent with the underlying flow control techniques. In this paper, we address joint dynamic voltage scaling for variable-voltage processors and communication links in such systems. We propose a scheduling algorithm for real-time applications that captures both data flow and control flow information. It performs efficient routing of communication events through multihops, as well as efficient slack allocation among heterogeneous processors and communication links to maximize energy savings, while meeting all real-time constraints. Our experimental study shows that on an average, joint voltage scaling on processors and links can achieve 32% less power compared with voltage scaling on processors alone  相似文献   

6.
In this paper, we consider multiprocessor implementation of real-time recursive digital signal processing algorithms. The objective is to devise a periodic schedule, and a fully static task assignment scheme to meet the desired throughput rate while minimizing the number of processors. Toward this goal, we propose a notion called cutoff time. We prove that the minimum-processor schedule can be found within a finite time interval bounded by the cutoff time. As such the complexity of the scheduling algorithm and the allocation algorithm can be significantly reduced. Next, taking advantage of the cutoff time, we derive efficient heuristic algorithms which promise better performance and less computation complexity compared to other existing algorithms. Extensive benchmark examples are tested which yield most encouraging results  相似文献   

7.
Eun  S. Kim  J.S. Maeng  S.R. Yoon  H. 《Electronics letters》1993,29(7):609-611
It has been frequently reported that the Hopfield neural network operating in discrete-time and parallel update mode will not converge to a stable state, which inhibits the parallel execution of the model. The authors propose a systolic array algorithm for the parallel simulation of the Hopfield neural network which guarantees the convergence of the network and achieves linear speedup as the number of processors is increased.<>  相似文献   

8.
This paper considers wireless networks where communication links are unstable and link interference is a challenge to design high performance scheduling algorithms. Wireless links are time varying and are modeled by Markov stochastic processes. The problem of designing an optimal link scheduling algorithm to maximize the expected reliability of the network is formulated into a Markov Decision Process first. The optimal solution can be obtained by the finite backward induction algorithm. However, the time complexity is very high. Thus, we develop an approximate link scheduling algorithm with an approximate ratio of \(2(N - 1)(r_{M}\Delta - r_{m} \delta ),\) where N is the number of decision epochs, r M is the maximum link reliability, r m is the minimum link reliability, Δ is the number of links in the largest maximal independent set and δ is the number of links in the smallest maximal independent set. Simulations are conducted in different scenarios under different network topologies.  相似文献   

9.
Many commercially available embedded processors are capable of extending their base instruction set for a specific domain of applications. While steady progress has been made in the tools and methodologies of automatic instruction set extension for configurable processors, the limited data bandwidth available in the core processor (e.g., the number of simultaneous accesses to the register file) becomes a potential performance bottleneck. In this paper, we first present a quantitative analysis of the data bandwidth limitation in configurable processors, and then propose a novel low-cost architectural extension and associated compilation techniques to address the problem. Specifically, we embed a single control bit in the instruction op-codes to selectively copy the execution results to a set of hash-mapped shadow registers in the write-back stage. This can efficiently reduce the communication overhead due to data transfers between the core processor and the custom logic. We also present a novel simultaneous global shadow register binding with a hash function generation algorithm to take full advantage of the extension. The application of our approach leads to a nearly optimal performance speedup  相似文献   

10.
This paper deals with the problem of one-to-one mapping of 2n task modules of a parallel program to an n-dimensional hypercube multicomputer so as to minimize the total communication cost during the execution of the task. The problem of finding an optimal mapping has been proven to be NP-complete. First we show that the mapping problem in a hypercube multicomputer can be transformed into the problem of finding a set of maximum cutsets on a given task graph using a graph modification technique. Then we propose a repeated mapping scheme, using an existing graph bipartitioning algorithm, for the effective mapping of task modules onto the processors of a hypercube multicomputer. The repeated mapping scheme is shown to be highly effective on a number of test task graphs; it increasingly outperforms the greedy and recursive mapping algorithms as the number of processors increases. Our repeated mapping scheme is shown to be very effective for regular graphs, such as hypercube-isomorphic or ‘almost’ isomorphic graphs and meshes; it finds optimal mappings on almost all the regular task graphs considered.  相似文献   

11.
The singular value decomposition (SVD) of complex matrices is computed in a highly parallel fashion on a square array of processors using Kogbetliantz's analog of Jacobi's eigenvalue decomposition method. To gain further speed, new algorithms for the basic SVD operations are proposed and their implementation as specialized processors is presented. The algorithms are 3-D and 4-D extensions of the CORDIC algorithm for plane rotations. When these extensions are used in concert with an additive decomposition of 2×2 complex matrices, which enhances parallelism, and with low resolution rotations early on in the SVD process, which reduce operation count, a fivefold speedup can be achieved over the fastest alternative approach  相似文献   

12.
Generally, the channel-assignment problem (CAP) for mobile cellular systems is solved by graph-coloring algorithms. These algorithms, though sometimes can yield an optimal solution, do not supply any information on whether an optimal solution has been found or bow far away it is from the optimum. In view of these undesirable features, two relevant results are presented. First, a lower bound for the minimum number of channels required to satisfy a given call-traffic demand is derived. This lower bound is tighter than the existing ones under certain conditions and can be used as a supplement for other approximate algorithms. Second, we propose an efficient heuristic algorithm to solve this problem. Although the CAP is nondeterministic polynomial (NP) complete in general, our algorithm provides an optimal solution for a special class of network topologies. For the general case, promising results are obtained, and numerical examples show that our algorithm has a better performance than many existing algorithms  相似文献   

13.
We consider a network control problem for wireless networks with flow level dynamics under the general k-hop interference model. In particular, we investigate the control problem in low load and high load regimes. In the low load regime, we show that the network can be stabilized by a regulated maximal scheduling policy considering flow level dynamics if the offered load satisfies a constraining bound condition. Because maximal scheduling is a general scheduling rule whose implementation is not specified, we propose a constant-time and distributed scheduling algorithm for a general k-hop interference model which can approximate the maximal scheduling policy within an arbitrarily small error. Under the stability condition, we show how to calculate transmission rates for different user classes such that the long-term (time average) network utility is maximized. This long-term network utility captures the real network performance due to the fact that under flow level dynamics, the number of users randomly change so instantaneous network utility maximization does not result in useful network performance. Our results imply that congestion control is unnecessary when the offered load is low and optimal user rates can be determined to maximize users’ long-term satisfaction. In the high load regime where the network can be unstable under the regulated maximal scheduling policy, we propose a cross-layer congestion control and scheduling algorithm which can stabilize the network under arbitrary network load. Through extensive numerical analysis for some typical networks, we show that the proposed scheduling algorithm has much lower overhead than other existing queue-length-based constant-time scheduling schemes in the literature, and it achieves performance much better than the guaranteed bound.  相似文献   

14.
Given a wireless network where some pairs of communication links interfere with each other, we study sufficient conditions for determining whether a given set of minimum bandwidth quality-of-service requirements can be satisfied. We are especially interested in algorithms which have low communication overhead and low processing complexity. The interference in the network is modeled using a conflict graph whose vertices correspond to the communication links in the network. Two links are adjacent in this graph if and only if they interfere with each other due to being in the same vicinity and hence cannot be simultaneously active. The problem of scheduling the transmission of the various links is then essentially a fractional, weighted vertex coloring problem, for which upper bounds on the fractional chromatic number are sought using only localized information. We recall some distributed algorithms for this problem, and then assess their worst-case performance. Our results on this fundamental problem imply that for some well known classes of networks and interference models, the performance of these distributed algorithms is within a bounded factor away from that of an optimal, centralized algorithm. The performance bounds are simple expressions in terms of graph invariants. It is seen that the induced star number of a network plays an important role in the design and performance of such networks.  相似文献   

15.
FPGA-based configurable computing machines are evolving rapidly in large signal processing applications due to flexibility and high performance. In this paper, given a reconfigurable processing unit (RPU) with a logic capacity of ARPU and a computational task represented by a data flow graph G = (V, E, W), we propose a network flow-based multiway task partitioning algorithm to minimize communication costs for temporal partitioning. The proposed algorithm obtains an optimal solution with minimum interconnection under area constraints. The optimal solution is a cut set. In our approach, two techniques are applied. In the initial partition, any feasible min-cut is produced by the proposed network flow-based algorithm, so a set of feasible min-cuts is obtained. From the feasible solutions, the scheduling technique selects an optimal global solution.  相似文献   

16.
Hardware implementation of data compression algorithms is receiving increasing attention due to exponentially expanding network traffic and digital data storage usage. In this paper, we propose several serial one-dimensional and parallel two-dimensional systolic-arrays for Lempel-Ziv data compression. A VLSI chip implementing our optimal linear array is fabricated and tested. The proposed array architecture is scalable. Also, multiple chips (linear arrays) can be connected in parallel to implement the parallel array structure and provide a proportional speedup  相似文献   

17.
Parallel image processing with the block data parallel architecture   总被引:2,自引:0,他引:2  
Many digital signal and image processing algorithms can be speeded up by executing them in parallel on multiple processors. The speed of parallel execution is limited by the need for communication and synchronization between processors. In this paper, we present a paradigm for parallel processing that we call the block data flow paradigm (BDFP). The goal of this paradigm is to reduce interprocessor communication and relax the synchronization requirements for such applications. We present the block data parallel architecture which implements this paradigm, and we present methods for mapping algorithms onto this architecture. We illustrate this methodology for several applications including two-dimensional (2-D) digital filters, the 2-D discrete cosine transform, QR decomposition of a matrix and Cholesky factorization of a matrix. We analyze the resulting system performance for these applications with regard to speedup and efficiency as the number of processors increases. Our results demonstrate that the block data parallel architecture is a flexible, high-performance solution for numerous digital signal and image processing algorithms  相似文献   

18.
The shared-medium multihop nature of wireless ad hoc networks poses fundamental challenges to the design of effective resource allocation algorithms that are optimal with respect to resource utilization and fair across different network flows. None of the existing resource allocation algorithms in wireless ad hoc networks have realistically considered end-to-end flows spanning multiple hops. Moreover, strategies proposed in wireline networks are not applicable in the context of wireless ad hoc networks, due to their unique characteristics of location-dependent contention. In this paper, we propose a new price-based resource allocation framework in wireless ad hoc networks to achieve optimal resource utilization and fairness among competing end-to-end flows. We build our pricing framework on the notion of maximal cliques in wireless ad hoc networks, as compared to individual links in traditional wide-area wireline networks. Based on such a price-based theoretical framework, we present a two-tier iterative algorithm. Distributed across wireless nodes, the algorithm converges to a global network optimum with respect to resource allocations. We further improve the algorithm toward asynchronous network settings and prove its convergence. Extensive simulations under a variety of network environments have been conducted to validate our theoretical claims.  相似文献   

19.
This survey paper reviews numerous high-level transformation techniques which can be applied at the algorithm or the architecture level to improve the performance of digital signal and image processing architectures and circuits implemented using VLSI technology. Successful design of VLSI signal and image processors requires careful selection of algorithms, architectures, implementation styles, and synthesis techniques. High-level transformations can play an important role in reducing silicon area or power at the same speed or in increasing the speed for same area. These transformations can also increase the suitability of an algorithm for a particular architectural style. The transformation techniques reviewed in this paper include pipelining, parallel processing, retiming, unfolding, folding, look-ahead, relaxed look-ahead, associativity, distributivity, and reduction in strength.  相似文献   

20.
A fast recursive algorithm for the vertical Bell Laboratories layered space-time (V-BLAST) with the optimal ordered successive interference cancellation (SIC) detection has been proposed by Benesty-Huang-Chen and two improved algorithms have been also recently introduced by Szczeci ′nski-Massicotte and Zhu-Lei-Chin independently. In this letter, our first contribution is to incorporate the existing improvements into the original fast recursive algorithm to provide the fastest known algorithm for the optimal ordered SIC detection of V-BLAST. Subsequently, we propose two new algorithms that result from one further improvement for the fast recursive algorithm. Compared with the fastest known algorithm built from the existing improvements, one new proposed algorithm has a speedup of 1.3 times in both the number of multiplications and the number of additions, and the other new proposed algorithm requires less intermediate variables and saves memories.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号