期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Reducing energy consumption of parallel sparse matrix applications through integrated link/CPU voltage scaling

Seung Woo Son Konrad Malkowski Guilin Chen Mahmut Kandemir Padma Raghavan 《The Journal of supercomputing》2007,41(3):179-213

Reducing power consumption is quickly becoming a first-class optimization metric for many high-performance parallel computing platforms. One of the techniques employed by many prior proposals along this direction is voltage scaling and past research used it on different components such as networks, CPUs, and memories. In contrast to most of the existent efforts on voltage scaling that target a single component (CPU, network or memory components), this paper proposes and experimentally evaluates a voltage/frequency scaling algorithm that considers CPU and communication links in a mesh network at the same time. More specifically, it scales voltages/frequencies of CPUs in the nodes and the communication links among them in a coordinated fashion (instead of one after another) such that energy savings are maximized without impacting execution time. Our experiments with several tree-based sparse matrix computations reveal that the proposed integrated voltage scaling approach is very effective in practice and brings 13% and 17% energy savings over the pure CPU and pure communication link voltage scaling schemes, respectively. The results also show that our savings are consistent with the different network sizes and different sets of voltage/frequency levels.

Padma RaghavanEmail:

相似文献

2.

Spatial Query Estimation without the Local Uniformity Assumption

Yufei Tao Christos Faloutsos Dimitris Papadias 《GeoInformatica》2006,10(3):261-293

Existing estimation approaches for spatial databases often rely on the assumption that data distribution in a small region is uniform, which seldom holds in practice. Moreover, their applicability is limited to specific estimation tasks under certain distance metric. This paper develops the Power-method, a comprehensive technique applicable to a wide range of query optimization problems under both L_∞ and L₂ metrics. The Power-method eliminates the local uniformity assumption and is, therefore, accurate even for datasets where existing approaches fail. Furthermore, it performs estimation by evaluating only one simple formula with minimal computational overhead. Extensive experiments confirm that the Power-method outperforms previous techniques in terms of accuracy and applicability to various optimization scenarios.

Yufei TaoEmail:

相似文献

3.

Panconnectivity and edge-pancyclicity of 3-ary <Emphasis Type="Italic">N</Emphasis>-cubes

Sun-Yuan Hsieh Tsong-Jie Lin Hui-Ling Huang 《The Journal of supercomputing》2007,42(2):225-233

We study two topological properties of the 3-ary n-cube Q _n³. Given two arbitrary distinct nodes x and y in Q _n³, we prove that there exists an x–y path of every length ranging from d(x,y) to 3ⁿ−1, where d(x,y) is the length of a shortest path between x and y. Based on this result, we prove that Q _n³ is edge-pancyclic by showing that every edge in Q _n³ lies on a cycle of every length ranging from 3 to 3ⁿ.

Hui-Ling HuangEmail:

相似文献

4.

Reconfigurable hardware solution to parallel prefix computation

Jin Hwan Park H. K. Dai 《The Journal of supercomputing》2008,43(1):43-58

This paper presents the design and implementation of an efficient reconfigurable parallel prefix computation hardware on field-programmable gate arrays (FPGAs). The design is based on a pipelined dataflow algorithm, and control logic is added to reconfigure the system for arbitrary parallelism degree. The system receives multiple input streams of elements in parallel and produces output streams in parallel. It has an advantage of controlling the degree of parallelism explicitly at run time. The time complexity of the design is O(d+(N−d)/d), where d and N are parallelism degree and stream size, respectively. When the stream size is sufficiently larger than the initial trigger time of the pipeline (d), the time complexity becomes O(N/d). Unlike the prefix computation circuits found in the literature, the design is scalable for different problem sizes including unknown sized data. The design is modular based on a finite state machine, and implemented and tested for target FPGA devices Xilinx Spartan2S XC2S300EFT256-6Q and XC2S600EFG676-6.

H. K. DaiEmail:

相似文献

5.

Profile-based dynamic pipeline scaling

Kuan-Wei Cheng Tzong-Yen Lin Rong-Guey Chang 《The Journal of supercomputing》2009,48(2):210-226

Low power has played an increasingly important role for embedded systems. To save power, lowering voltage and frequency is very straightforward and effective; therefore, dynamic voltage scaling (DVS) has become a prevalent low-power technique. However, DVS makes no effect on power saving when the voltage reaches a lower bound. Fortunately, a technique called dynamic pipeline scaling (DPS) can overcome this limitation by switching pipeline modes at low-voltage level. Approaches proposed in previous work on DPS were based on hardware support. From viewpoint of compiler, little has been addressed on this issue. This paper presents a DPS optimization technique at compiler time to reduce power dissipation. The useful information of an application is exploited to devise an analytical model to assess the cost of enabling DPS mechanism. As a consequence, we can determine the switching timing between pipeline modes at compiler time without causing significant run-time overhead. The experimental result shows that our approach is effective in reducing energy consumption.

Rong-Guey Chang (Corresponding author)Email:

相似文献

6.

A metric space for computer programs and the principle of computational least action

Robert W. Numrich 《The Journal of supercomputing》2008,43(3):281-298

We define a normed metric space for computer programs and derive from it the Principle of Computational Least Action. In our model, programs follow trajectories determined by Newton’s equation of motion in an abstract computational phase space and generate computational action as they evolve. A program’s action norm is the L ₁-norm of its action function, and its distance from other programs is the distance derived from the action norm. The Principle of Computational Least Action states the goal of performance optimization as finding the program with the smallest action norm. We illustrate this principle by analyzing a simple program.

Robert W. NumrichEmail:

相似文献

7.

Recognising Algebraic Surfaces from Two Outlines 总被引：1，自引：1，他引：0

Simon Collings Ryszard Kozera Lyle Noakes 《Journal of Mathematical Imaging and Vision》2008,30(2):181-193

Photographic outlines of 3 dimensional solids are robust and rich in information useful for surface reconstruction. This paper studies algebraic surfaces viewed from 2 cameras with known intrinsic and extrinsic parameters. It has been known for some time that for a degree d=2 (quadric) algebraic surface there is a 1-parameter family of surfaces that reproduce the outlines. When the algebraic surface has degree d>2, we prove a new result: that with known camera geometry it is possible to completely reconstruct an algebraic surface from 2 outlines i.e. the coefficients of its defining polynomial can be determined in a known coordinate frame. The proof exploits the existence of frontier points, which are calculable from the outlines. Examples and experiments are presented to demonstrate the theory and possible applications.

Simon CollingsEmail:

相似文献

8.

Computing the Discrete Fourier Transform on a Hexagonal Lattice

Andrew Vince Xiqiang Zheng 《Journal of Mathematical Imaging and Vision》2007,28(2):125-133

The computation of the Discrete Fourier Transform for a general lattice in ℝ^d can be reduced to the computation of the standard 1-dimensional Discrete Fourier Transform. We provide a mathematically rigorous but simple treatment of this procedure and apply it to the DFT on the hexagonal lattice.

Xiqiang ZhengEmail:

相似文献

9.

Subspace sums for extracting non-random data from massive noise 总被引：1，自引：1，他引：0

Anne M. Denton 《Knowledge and Information Systems》2009,20(1):35-62

An algorithm is introduced that distinguishes relevant data points from randomly distributed noise. The algorithm is related to subspace clustering based on axis-parallel projections, but considers membership in any projected cluster of a given side length, as opposed to a particular cluster. An aggregate measure is introduced that is based on the total number of points that are close to the given point in all possible 2^d projections of a d-dimensional hypercube. No explicit summation over subspaces is required for evaluating this measure. Attribute values are normalized based on rank order to avoid making assumptions on the distribution of random data. Effectiveness of the algorithm is demonstrated through comparison with conventional outlier detection on a real microarray data set as well as on time series subsequence data.

Anne M. DentonEmail:

相似文献

10.

Complexity of Computing the Local Dimension of a Semialgebraic Set

Nicolai Vorobjov 《Journal of Symbolic Computation》1999,27(6):2346

The paper describes several algorithms related to a problem of computing the local dimension of a semialgebraic set. Let a semialgebraic set V be defined by a system of k inequalities of the formf ≥ 0 with f R [ X₁, ,X_n ], deg(f) < d , andx V . An algorithm is constructed for computing the dimension of the Zariski tangent space to V at x in time (kd)^O(n). Let x belong to a stratum of codimension l_xin V with respect to a smooth stratification ofV . Another algorithm computes the local dimension dim_x(V) with the complexity (k(l_x + 1)d)^O(l_x²n). Ifl = max_x Vl_x, and for every connected component the local dimension is the same at each point, then the algorithm computes the dimension of every connected component with complexity (k(l + 1)d)^O(l²n). If V is a real algebraic variety defined by a system of equations, then the complexity of the algorithm is less thankd^O(l²n) , and the algorithm also finds the dimension of the tangent space to V at x in time kd^O(n). Whenl is fixed, like in the case of a smooth V , the complexity bounds for computing the local dimension are (kd)^O(n)andkd^O(n) respectively. A third algorithm finds the singular locus ofV in time (kd)^O(n²). 相似文献

11.

Uniqueness of the Perfect Fusion Grid on ℤ d

Jean Cousty Gilles Bertrand 《Journal of Mathematical Imaging and Vision》2009,34(3):291-306

Region merging methods consist of improving an initial segmentation by merging some pairs of neighboring regions. In a graph, merging two regions, separated by a set of vertices, is not straightforward. The perfect fusion graphs defined in J. Cousty et al. (J. Math. Imaging Vis. 30:(1):87–104, 2008) verify all the basic properties required by region merging algorithms as used in image segmentation. Unfortunately, the graphs which are the most frequently used in image analysis (namely, those induced by the direct and the indirect adjacency relations) are not perfect fusion graphs. The perfect fusion grid, introduced in the above mentioned reference, is an adjacency relation on ℤ^d which can be used in image analysis, which indeed induces perfect fusion graphs and which is “between” the graphs induced by the direct and the indirect adjacencies. One of the main results of this paper is that the perfect fusion grid is the only such graph whatever the dimension d.

Gilles BertrandEmail:

相似文献

12.

Lower Bounds and Separations for Constant Depth Multilinear Circuits

Ran Raz Amir Yehudayoff 《Computational Complexity》2009,18(2):171-207

相似文献

13.

VsLattice: A vector-based conceptual index structure for web service retrieval

Dunlu Peng Xiaoling Wang Aoying Zhou 《Information Systems Frontiers》2007,9(4):423-437

With the rising adoption of web services, effective management of web services becomes a critical issue in making the paradigm of service-oriented computing more practical. In this paper, a novel structure, called Vector-based service Lattice ( VsLattice), is devised to index web services in a semantic way. Each web service is modeled as a group of Service Operation Vectors (SOVs) in the vector space, and each SOV represents an operation provided by the service. The web services, SOVs and the relationship between web services and SOVs form the Conceptual Indexing Context (CIC) of a given service collection. In the CIC, web services that provide similar operations (functions) are conceptually indexed by the same Operation Vector Concepts (OVCs). The underlying relationships among the OVCs are captured with the VsLattice, which is constructed by adopting the traditional concept lattice in a CIC. By taking advantage of the information obtained from the VsLattice, a new representation of SOV is devised. Based on this representation, a novel service retrieval model and the implemental system are developed to retrieve web services efficiently. The performance and retrieving quality of the proposed approach has been evaluated through a series of experiments.

Aoying Zhou (Corresponding author)Email:

相似文献

14.

An electrokinetic mixer driven by oscillatory cross flow

Frederick R. PhelanJr Prasad Kutty Jai A. Pathak 《Microfluidics and nanofluidics》2008,5(1):101-118

An electrokinetic mixer driven by oscillatory cross flow has been studied numerically as a means for generating chaotic mixing in microfluidic devices for both confined and throughput mixing configurations. The flow is analyzed using numerical simulation of the unsteady Navier–Stokes equations combined with the tracking of single and multi-species passive tracer particles. First, the case of confined flow mixing is studied in which flow in the perpendicular channels of the oscillatory mixing element is driven sinusoidally, and 90° out of phase. The flow is shown to be chaotic by means of positive effective (finite time) Lyapunov exponents, and the stretching and folding of material lines leading to Lagrangian tracer particle dispersion. The transition to chaotic flow in this case depends strongly on the Strouhal number (St), and weakly on the ratio of the cross flow channel length to width (L/W). For L/W = 2, the flow becomes appreciably chaotic as evidenced by visual particle dispersion at approximately St = 0.32, and the transitional value of St increases slightly with increasing aspect ratio. A peak degree of mixing on the order of 85% is obtained for the range of parameter values explored here. In the second phase of the analysis, the effect of combining a fixed throughput flow with the oscillatory cross channel motion for use in a continuous mixing operation is examined in a star cell geometry. Chaotic mixing is again observed, and the characteristics of the downstream dispersion patterns depend mainly on the Strouhal number and the (dimensionless) throughput rate. In the star cell, the flow becomes appreciably chaotic as evidenced by visual particle dispersion at approximately St = 1, slightly higher than for the case of cross cell. The star cell mixing behavior is marked by the convergence of the degree of mixing to a plateau level as the Strouhal number is increased at fixed flow rate. Degree of mixing values from 70 to 80% are obtained indicating that the continuous flow is bounded by the maximum degree of mixing obtained from the confined flow configuration.

Jai A. PathakEmail:

相似文献

15.

Minimization of a Detail-Preserving Regularization Functional for Impulse Noise Removal

Jian-Feng Cai Raymond H. Chan Carmine Di Fiore 《Journal of Mathematical Imaging and Vision》2007,29(1):79-91

Recently, a powerful two-phase method for restoring images corrupted with high level impulse noise has been developed. The main drawback of the method is the computational efficiency of the second phase which requires the minimization of a non-smooth objective functional. However, it was pointed out in (Chan et al. in Proc. ICIP 2005, pp. 125–128) that the non-smooth data-fitting term in the functional can be deleted since the restoration in the second phase is applied to noisy pixels only. In this paper, we study the analytic properties of the resulting new functional ℱ. We show that ℱ, which is defined in terms of edge-preserving potential functions φ _α, inherits many nice properties from φ _α, including the first and second order Lipschitz continuity, strong convexity, and positive definiteness of its Hessian. Moreover, we use these results to establish the convergence of optimization methods applied to ℱ. In particular, we prove the global convergence of some conjugate gradient-type methods and of a recently proposed low complexity quasi-Newton algorithm. Numerical experiments are given to illustrate the convergence and efficiency of the two methods.

Carmine Di FioreEmail:

相似文献

16.

On the Discrete Maximum Principle for the Beltrami Color Flow

Lorina Dascal Adi Ditkowski Nir A. Sochen 《Journal of Mathematical Imaging and Vision》2007,29(1):63-77

We analyze the discrete maximum principle for the Beltrami color flow. The Beltrami flow can display linear as well as nonlinear behavior according to the values of a parameter β, which represents the ratio between spatial and color distances. In general, the standard schemes fail to satisfy the discrete maximum principle. In this work we show that a nonnegative second order difference scheme can be built for this flow only for small β, i.e. linear-like diffusion. Since this limitation is too severe, we construct a novel finite difference scheme, which is not nonnegative and satisfies the discrete maximum principle for all values of β. Numerical results support the analysis.

Nir A. Sochen (Corresponding author)Email:

相似文献

17.

On the partial terminal Steiner tree problem 总被引：1，自引：0，他引：1

Sun-Yuan Hsieh Huang-Ming Gao 《The Journal of supercomputing》2007,41(1):41-52

We investigate a practical variant of the well-known graph Steiner tree problem. For a complete graph G = ( V,E ) with length function l:E → R ⁺ and two vertex subsets R ⊂ V and R ′ ⊆ R, a partial terminal Steiner tree is a Steiner tree which contains all vertices in R such that all vertices in R ∖ R ′ belong to the leaves of this Steiner tree. The partial terminal Steiner tree problem is to find a partial terminal Steiner tree T whose total lengths ∑ _{(u,v)
∈T} l ( u,v ) is minimum. In this paper, we show that the problem is both NP-complete and MAX SNP-hard when the lengths of edges are restricted to either 1 or 2. We also provide an approximation algorithm for the problem.

Sun-Yuan HsiehEmail:

相似文献

18.

Numerical investigation of micro shock waves generation

J. D. Parisse J. Giordano P. Perrier Y. Burtschell I. A. Graur 《Microfluidics and nanofluidics》2009,6(5):699-709

The shock wave propagation in the micro channel of the different sizes is studied numerically in order to estimate the possibility of the experimental apparatus development. The full compressible Navier–Stokes equations are used for the numerical simulation. The shock wave velocity attenuation is found for the channel height smaller than H = 200 μm. The influence of the channel size and of the diaphragm pressure ratio on the shock wave velocity is considered. The considerable influence of the viscous effects on the shock propagation is shown.

J. D. ParisseEmail:

相似文献

19.

Interacting with mobile services: an evaluation of camera-phones and visual tags

Eleanor Toye Richard Sharp Anil Madhavapeddy David Scott Eben Upton Alan Blackwell 《Personal and Ubiquitous Computing》2007,11(2):97-106

We present a study of using camera-phones and visual-tags to access mobile services. Firstly, a user-experience study is described in which participants were both observed learning to interact with a prototype mobile service and interviewed about their experiences. Secondly, a pointing-device task is presented in which quantitative data was gathered regarding the speed and accuracy with which participants aimed and clicked on visual-tags using camera-phones. We found that participants’ attitudes to visual-tag-based applications were broadly positive, although they had several important reservations about camera-phone technology more generally. Data from our pointing-device task demonstrated that novice users were able to aim and click on visual-tags quickly (well under 3 s per pointing-device trial on average) and accurately (almost all meeting our defined speed/accuracy tradeoff of 6% error-rate). Based on our findings, design lessons for camera-phone and visual-tag applications are presented.

Eleanor Toye (Corresponding author)Email:

Richard SharpEmail:

Anil MadhavapeddyEmail:

David ScottEmail:

Eben UptonEmail:

Alan BlackwellEmail:

相似文献

20.

The partitioned dynamic-priority scheduling of sporadic task systems 总被引：4，自引：3，他引：1

Sanjoy K. Baruah Nathan Wayne Fisher 《Real-Time Systems》2007,36(3):199-226

A polynomial-time algorithm is presented for partitioning a collection of sporadic tasks among the processors of an identical multiprocessor platform. Since the partitioning problem is NP-hard in the strong sense, this algorithm is unlikely to be optimal. A quantitative characterization of its worst-case performance is provided in terms of resource augmentation.

Nathan Wayne Fisher (Corresponding author)Email:

相似文献