期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Efficient hardware implementation of 8 × 8 integer cosine transforms for multiple video codecs

Khan A. Wahid Muhammad Martuza Mousumi Das Carl McCrosky 《Journal of Real-Time Image Processing》2013,8(4):403-410

The current trend of digital convergence leads to the need of the video decoder that should support multiple video standards such as, H.264/AVC, JPEG, MPEG-2/4, VC-1, and AVS on a single platform. In this paper, we present a cost-sharing architecture of multiple transforms to support all five popular video codecs. The architecture is based on a new multi-dimensional delta mapping. Here the inverse transform matrix of the Discrete Cosine Transform (DCT) of AVS, that has the lowest computational unit, is taken as the base to compute the inverse DCT matrices of the other four codecs. The proposed architecture uses only adders and shifters on a shared basis to reduce the hardware cost significantly. The shared architecture is implemented on FPGA and later synthesized in CMOS 0.18 μm technology. The results show that the proposed design satisfies the requirement of all five codecs with a maximum decoding capability of 60 fps of a full HD video. The scheme is also suitable for low-cost implementation in modern multi-codec systems. 相似文献

2.

Approximation techniques for response-time analysis of static-priority tasks 总被引：1，自引：1，他引：0

Thi Huyen Chau Nguyen Pascal Richard Enrico Bini 《Real-Time Systems》2009,43(2):147-176

We consider sporadic tasks with static priorities and constrained deadlines to be executed upon a uniprocessor platform. Pseudo-polynomial time algorithms are known for computing worst-case response times for this task model. Some applications require to evaluate efficiently upper bounds of response times. For this purpose, we propose parametric algorithms that allow to make a tradeoff between quality of results and computational effort according to an input accuracy parameter. In this paper, we present a parametric polynomial-time algorithm for computing upper bounds of worst-case response times, that is based on an improved fptas (Fully Polynomial Time Approximation Scheme). Then, we show that our bound does not achieve constant error bound in comparison with the exact worst-case response time. However, using the resource augmentation technique, we obtain a performance guarantee that allows to define a compromise between our response-time bound and processor capacity requirements. The algorithm average behavior is then analyzed through numerical experimentations. 相似文献

3.

Preemption-light multiprocessor scheduling of sporadic tasks with high utilisation bound

Konstantinos Bletsas Björn Andersson 《Real-Time Systems》2011,47(4):319-355

Known algorithms capable of scheduling implicit-deadline sporadic tasks over identical processors at up to 100% utilisation invariably involve numerous preemptions and migrations. To the challenge of devising a scheduling scheme with as few preemptions and migrations as possible, for a given guaranteed utilisation bound, we respond with the algorithm NPS-F. It is configurable with a parameter, trading off guaranteed schedulable utilisation (up to 100%) vs preemptions. For any possible configuration, NPS-F introduces fewer preemptions than any other known algorithm matching its utilisation bound. 相似文献

4.

Efficient implementation of class-based decomposition schemes for Naïve Bayes

Sang-Hyeun Park Johannes Fürnkranz 《Machine Learning》2014,96(3):295-309

Previous studies have shown that the classification accuracy of a Naïve Bayes classifier in the domain of text-classification can often be improved using binary decompositions such as error-correcting output codes (ECOC). The key contribution of this short note is the realization that ECOC and, in fact, all class-based decomposition schemes, can be efficiently implemented in a Naïve Bayes classifier, so that—because of the additive nature of the classifier—all binary classifiers can be trained in a single pass through the data. In contrast to the straight-forward implementation, which has a complexity of O(n?t?g), the proposed approach improves the complexity to O((n+t)?g). Large-scale learning of ensemble approaches with Naïve Bayes can benefit from this approach, as the experimental results shown in this paper demonstrate. 相似文献

5.

Efficient implementation of a generalized polak-ribière algorithm for nonlinear optimization

《国际计算机数学杂志》2012,89(1-2):53-61

We have recently developed a new conjugate gradient type method, the Generalized Polak-Ribière (GPR) method, for unconstrained minimization. It is based on search directions that are parallel to the Newton direction of the restriction of the objective function f on the two dimensional subspace span{?g p}, with p a suitable direction in span{? g,s^?}, where g and s ^? are the current gradient and previous search direction respectively. The new approach proved to be considerably more efficient than the original Polak-Ribière method.

In this paper, various implementations of the GPR method are compared with a currently available standard NAG software routine and also with the Nocedal, Buckley-LeNir and Shanno's limited memory algorithms. The results demonstrate the general effectiveness of the new algorithm. We also give a very brief discussion of extensions of the GPR method that generate search directions parallel to Newton directions in subspaces of dimension greater than two. 相似文献

6.

An Efficient Algorithm for Modelling Duration in Hidden Markov Models, with a Dramatic Application

Søren Hauberg Jakob Sloth 《Journal of Mathematical Imaging and Vision》2008,31(2-3):165-170

For many years, the hidden Markov model (HMM) has been one of the most popular tools for analysing sequential data. One frequently used special case is the left-right model, in which the order of the hidden states is known. If knowledge of the duration of a state is available it is not possible to represent it explicitly with an HMM. Methods for modelling duration with HMM’s do exist (Rabiner in Proc. IEEE 77(2):257–286, [1989]), but they come at the price of increased computational complexity. Here we present an efficient and robust algorithm for modelling duration in HMM’s, and this algorithm is successfully used to control autonomous computer actors in a theatrical play. 相似文献

7.

Semi on-line scheduling on three processors with known sum of the tasks

Enrico Angelelli Maria Grazia Speranza Zsolt Tuza 《Journal of Scheduling》2007,10(4-5):263-269

We consider a semi on-line version of the multiprocessor scheduling problem on three processors, where the total size of the tasks is known in advance. We prove a lower bound on the competitive ratio of any algorithm and propose a simple algorithm with competitive ratio equal to 1.5. The performance is improved to by a preprocessing strategy. The latter algorithm is only 2% away from the lower bound. Z. Tuza is supported in part by the Hungarian Scientific Research Fund, OTKA grant T-049613. 相似文献

8.

An Efficient Newton-Krylov Implementation of the Constrained Runs Scheme for Initializing on a Slow Manifold

Christophe Vandekerckhove Ioannis Kevrekidis Dirk Roose 《Journal of scientific computing》2009,39(2):167-188

The long-term dynamic behavior of many dynamical systems evolves on a low-dimensional, attracting, invariant slow manifold, which can be parameterized by only a few variables (“observables”). The explicit derivation of such a slow manifold (and thus, the reduction of the long-term system dynamics) is often extremely difficult or practically impossible. For this class of problems, the equation-free framework has been developed to enable performing coarse-grained computations, based on short full model simulations. Each full model simulation should be initialized so that the full model state is consistent with the values of the observables and close to the slow manifold. To compute such an initial full model state, a class of constrained runs functional iterations was proposed (Gear and Kevrekidis, J. Sci. Comput. 25(1), 17–28, 2005; Gear et al., SIAM J. Appl. Dyn. Syst. 4(3), 711–732, 2005). The schemes in this class only use the full model simulator and converge, under certain conditions, to an approximation of the desired state on the slow manifold. In this article, we develop an implementation of the constrained runs scheme that is based on a (preconditioned) Newton-Krylov method rather than on a simple functional iteration. The functional iteration and the Newton-Krylov method are compared in detail using a lattice Boltzmann model for one-dimensional reaction-diffusion as the full model simulator. Depending on the parameters of the lattice Boltzmann model, the functional iteration may converge slowly or even diverge. We show that both issues are largely resolved by using the Newton-Krylov method, especially when a coarse grid correction preconditioner is incorporated. 相似文献

9.

Routes for Efficient Computational Homogenization of Nonlinear Materials Using the Proper Generalized Decompositions

H. Lamari A. Ammar P. Cartraud G. Legrain F. Chinesta F. Jacquemin 《Archives of Computational Methods in Engineering》2010,17(4):373-391

Computational homogenization is nowadays one of the most active research topics in computational mechanics. Different strategies have been proposed, the main challenge being the computing cost induced by complex microstructures exhibiting nonlinear behaviors. Two quite tricky scenarios lie in (i) the necessity of applying the homogenization procedure for many microstructures (e.g. material microstructure evolving at the macroscopic level or stochastic microstructure); the second situation concerns the homogenization of nonlinear behaviors implying the necessity of solving microscopic problems for each macroscopic state (history independent nonlinear models) or for each macroscopic history (history dependant nonlinear models). In this paper we present some preliminary results concerning the application of Proper Generalized Decompositions—PGD—for addressing the efficient solution of homogenization problems. This numerical technique could allow to compute the homogenized properties for any microstructure or for any macroscopic loading history by solving a single but highly multidimensional model. The PGD allows circumventing the so called curse of dimensionality that mesh based representations suffer. Even if this work only describes the first steps in a very ambitious objective, many original ideas are launched that could be at the origin of impressive progresses. 相似文献

10.

Accelerating incompressible flow computations with a Pthreads-CUDA implementation on small-footprint multi-GPU platforms

Julien C. Thibault Inanc Senocak 《The Journal of supercomputing》2012,59(2):693-719

Graphics processor units (GPU) that are originally designed for graphics rendering have emerged as massively-parallel “co-processors” to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can accelerate compute-intensive simulation science applications substantially. In this study, we describe the implementation of an incompressible flow Navier–Stokes solver for multi-GPU workstation platforms. A shared-memory parallel code with identical numerical methods is also developed for multi-core CPUs to provide a fair comparison between CPUs and GPUs. Specifically, we adopt NVIDIA’s Compute Unified Device Architecture (CUDA) programming model to implement the discretized form of the governing equations on a single GPU. Pthreads are then used to enable communication across multiple GPUs on a workstation. We use separate CUDA kernels to implement the projection algorithm to solve the incompressible fluid flow equations. Kernels are implemented on different memory spaces on the GPU depending on their arithmetic intensity. The memory hierarchy specific implementation produces significantly faster performance. We present a systematic analysis of speedup and scaling using two generations of NVIDIA GPU architectures and provide a comparison of single and double precision computational performance on the GPU. Using a quad-GPU platform for single precision computations, we observe two orders of magnitude speedup relative to a serial CPU implementation. Our results demonstrate that multi-GPU workstations can serve as a cost-effective small-footprint parallel computing platform to accelerate computational fluid dynamics (CFD) simulations substantially. 相似文献

11.

Efficient approximate leave-one-out cross-validation for kernel logistic regression

Gavin C. Cawley Nicola L. C. Talbot 《Machine Learning》2008,71(2-3):243-264

Kernel logistic regression (KLR) is the kernel learning method best suited to binary pattern recognition problems where estimates of a-posteriori probability of class membership are required. Such problems occur frequently in practical applications, for instance because the operational prior class probabilities or equivalently the relative misclassification costs are variable or unknown at the time of training the model. The model parameters are given by the solution of a convex optimization problem, which may be found via an efficient iteratively re-weighted least squares (IRWLS) procedure. The generalization properties of a kernel logistic regression machine are however governed by a small number of hyper-parameters, the values of which must be determined during the process of model selection. In this paper, we propose a novel model selection strategy for KLR, based on a computationally efficient closed-form approximation of the leave-one-out cross-validation procedure. Results obtained on a variety of synthetic and real-world benchmark datasets are given, demonstrating that the proposed model selection procedure is competitive with a more conventional k-fold cross-validation based approach and also with Gaussian process (GP) classifiers implemented using the Laplace approximation and via the Expectation Propagation (EP) algorithm. 相似文献

12.

Topological and Variational Properties of a Model for the Reconstruction of Three-Dimensional Transparent Images with Self-Occlusions

Giovanni Bellettini Valentina Beorchia Maurizio Paolini 《Journal of Mathematical Imaging and Vision》2008,32(3):265-291

We introduce and study a two-dimensional variational model for the reconstruction of a smooth generic solid shape E, which may handle the self-occlusions and that can be considered as an improvement of the 2.1D sketch of Nitzberg and Mumford (Proceedings of the Third International Conference on Computer Vision, Osaka, 1990). We characterize from the topological viewpoint the apparent contour of E, namely, we characterize those planar graphs that are apparent contours of some shape E. This is the classical problem of recovering a three-dimensional layered shape from its apparent contour, which is of interest in theoretical computer vision. We make use of the so-called Huffman labeling (Machine Intelligence, vol. 6, Am. Elsevier, New York, 1971), see also the papers of Williams (Ph.D. Dissertation, 1994 and Int. J. Comput. Vis. 23:93–108, 1997) and the paper of Karpenko and Hughes (Preprint, 2006) for related results. Moreover, we show that if E and F are two shapes having the same apparent contour, then E and F differ by a global homeomorphism which is strictly increasing on each fiber along the direction of the eye of the observer. These two topological theorems allow to find the domain of the functional ℱ describing the model. Compactness, semicontinuity and relaxation properties of ℱ are then studied, as well as connections of our model with the problem of completion of hidden contours.

Maurizio PaoliniEmail:

相似文献

13.

An implementation of parallel file distribution in an agent hierarchy

Munehiro Fukuda Jumpei Miyauchi 《The Journal of supercomputing》2009,47(3):255-285

PC grid is a cost-effective grid-computing platform that attracts users by allocating to their massively parallel applications as many desktop computers as requested. However, a challenge is how to distribute necessary files to remote computing nodes that may be unconnected to the same network file system, equipped with insufficient disk space to keep entire files, and even powered off asynchronously. Targeting PC grid, the AgentTeamwork grid-computing middleware deploys a hierarchy of mobile agents to remote desktops so as to launch, monitor, check-point, and resume a parallel and distributed computing job. To achieve high-speed file distribution, AgentTeamwork takes advantage of its agent hierarchy. The system partitions files into stripes at the tree root if they are random-access files, duplicates them at each tree level if they are shared among all remote nodes, fragments them into smaller messages if they are too large to relay to a lower tree level, aggregates such messages in a larger fragment if they are in transit to the same subtree, and returns output files to the user along multi-paths established within the tree. To achieve fault-tolerant file delivery, each agent periodically takes a snapshot of in-transit and on-memory file messages with its user job, and thus resumes them from the latest snapshot when they crash accidentally. This paper presents an implementation and its competitive performance of AgentTeamwork’s file-distribution algorithm including file partitioning, transfer, check-pointing, and consistency maintenance.

Jumpei MiyauchiEmail:

相似文献

14.

Design and implementation of a workflow-based resource broker with information system on computational grids 总被引：1，自引：0，他引：1

Chao-Tung Yang Kuan-Chou Lai Po-Chi Shih 《The Journal of supercomputing》2009,47(1):76-109

The grid is a promising infrastructure that can allow scientists and engineers to access resources among geographically distributed environments. Grid computing is a new technology which focuses on aggregating resources (e.g., processor cycles, disk storage, and contents) from a large-scale computing platform. Making grid computing a reality requires a resource broker to manage and monitor available resources. This paper presents a workflow-based resource broker whose main functions are matching available resources with user requests and considering network information statuses during matchmaking in computational grids. The resource broker provides a graphic user interface for accessing available and the appropriate resources via user credentials. This broker uses the Ganglia and NWS tools to monitor resource status and network-related information, respectively. Then we propose a history-based execution time estimation model to predict the execution time of parallel applications, according to previous execution results. The experimental results show that our model can accurately predict the execution time of embarrassingly parallel applications. We also report on using the Globus Toolkit to construct a grid platform called the TIGER project that integrates resources distributed across five universities in Taichung city, Taiwan, where the resource broker was developed.

Po-Chi ShihEmail:

相似文献

15.

Energy efficient scheduling of parallel tasks on multiprocessor computers 总被引：1，自引：1，他引：1

Keqin Li 《The Journal of supercomputing》2012,60(2):223-247

In this paper, scheduling parallel tasks on multiprocessor computers with dynamically variable voltage and speed are addressed as combinatorial optimization problems. Two problems are defined, namely, minimizing schedule length with energy consumption constraint and minimizing energy consumption with schedule length constraint. The first problem has applications in general multiprocessor and multicore processor computing systems where energy consumption is an important concern and in mobile computers where energy conservation is a main concern. The second problem has applications in real-time multiprocessing systems and environments where timing constraint is a major requirement. Our scheduling problems are defined such that the energy-delay product is optimized by fixing one factor and minimizing the other. It is noticed that power-aware scheduling of parallel tasks has rarely been discussed before. Our investigation in this paper makes some initial attempt to energy-efficient scheduling of parallel tasks on multiprocessor computers with dynamic voltage and speed. Our scheduling problems contain three nontrivial subproblems, namely, system partitioning, task scheduling, and power supplying. Each subproblem should be solved efficiently, so that heuristic algorithms with overall good performance can be developed. The above decomposition of our optimization problems into three subproblems makes design and analysis of heuristic algorithms tractable. A unique feature of our work is to compare the performance of our algorithms with optimal solutions analytically and validate our results experimentally, not to compare the performance of heuristic algorithms among themselves only experimentally. The harmonic system partitioning and processor allocation scheme is used, which divides a multiprocessor computer into clusters of equal sizes and schedules tasks of similar sizes together to increase processor utilization. A three-level energy/time/power allocation scheme is adopted for a given schedule, such that the schedule length is minimized by consuming given amount of energy or the energy consumed is minimized without missing a given deadline. The performance of our heuristic algorithms is analyzed, and accurate performance bounds are derived. Simulation data which validate our analytical results are also presented. It is found that our analytical results provide very accurate estimation of the expected normalized schedule length and the expected normalized energy consumption and that our heuristic algorithms are able to produce solutions very close to optimum. 相似文献

16.

Exact Minkowksi Sums of Polyhedra and Exact and Efficient Decomposition of Polyhedra into Convex Pieces

Peter Hachenberger 《Algorithmica》2009,55(2):329-345

We present the first exact and robust implementation of the 3D Minkowski sum of two non-convex polyhedra. Our implementation decomposes the two polyhedra into convex pieces, performs pairwise Minkowski sums on the convex pieces, and constructs their union. We achieve exactness and the handling of all degeneracies by building upon 3D Nef polyhedra as provided by Cgal. The implementation also supports open and closed polyhedra. This allows the handling of degenerate scenarios like the tight passage problem in robot motion planning. The bottleneck of our approach is the union step. We address efficiency by optimizing this step by two means: we implement an efficient decomposition that yields a small number of convex pieces, and develop, test and optimize multiple strategies for uniting the partial sums by consecutive binary union operations. The decomposition that we implemented as part of the Minkowski sum is interesting in its own right. It is the first robust implementation of a decomposition of polyhedra into convex pieces that yields at most O(r ²) pieces, where r is the number of edges whose adjacent facets comprise an angle of more than 180 degrees with respect to the interior of the polyhedron. This work was partially supported by the Netherlands’ Organisation for Scientific Research (NWO) under project no. 639.023.301. 相似文献

17.

StreamJacobi： Efficient implementation of 2-D Jacobi on a stream processor

ZHANG Ying LI Gen YANG Xue-jun ZHANG Wei FANG Xu-dong ZHANG He-ying 《通讯和计算机》2009,6(1):6-15

Stream processors, with the stream programming model, have demonstrated significant performance advantages in the domains signal processing, multimedia and graphics applications. In this paper we examine the applicability of a stream processor to 2-D Jacobi iteration which is widely used to solve partial differential equations, an important class of scientific programs. We first map 2-D Jacobi iteration in FORTRAN version to the stream processor in a straightforward way. In a stream processor system, the management of system resources is the programmers＇ responsibility. We then present several optimizations, which avail the stream program for 2-D Jacobi iteration, called StreamJacobi, of various aspects of the stream processor architecture. Finally, we analyze the performance of StreamJacobi, with different scales, and the presented optimizations. The final stream program StreamJacobi is from 2.31 to 6.42 times faster than the corresponding FORTRAN programs on a Xeon 5100 processor, with the optimizations playing an important role in realizing the performance improvement. 相似文献

18.

Numerical Simulation of a Weakly Nonlinear Model for Water Waves with Viscosity

Maria Kakleas David P. Nicholls 《Journal of scientific computing》2010,42(2):274-290

The potential flow equations which govern the free-surface motion of an ideal fluid (the water wave problem) are notoriously difficult to solve for a number of reasons. First, they are a classical free-boundary problem where the domain shape is one of the unknowns to be found. Additionally, they are strongly nonlinear (with derivatives appearing in the nonlinearity) without a natural dissipation mechanism so that spurious high-frequency modes are not damped. In this contribution we address the latter of these difficulties using a surface formulation (which addresses the former complication) supplemented with physically-motivated viscous effects recently derived by Dias et al. (Phys. Lett. A 372:1297–1302, 2008). The novelty of our approach is to derive a weakly nonlinear model from the surface formulation of Zakharov (J. Appl. Mech. Tech. Phys. 9:190–194, 1968) and Craig and Sulem (J. Comput. Phys. 108:73–83, 1993), complemented with the viscous effects mentioned above. Our new model is simple to implement while being both faithful to the physics of the problem and extremely stable numerically. 相似文献

19.

The Flexible, Extensible and Efficient Toolbox of Level Set Methods

Ian M. Mitchell 《Journal of scientific computing》2008,35(2-3):300-329

Level set methods are a popular and powerful class of numerical algorithms for dynamic implicit surfaces and solution of Hamilton-Jacobi PDEs. While the advanced level set schemes combine both efficiency and accuracy, their implementation complexity makes it difficult for the community to reproduce new results and make quantitative comparisons between methods. This paper describes the Toolbox of Level Set Methods, a collection of Matlab routines implementing the basic level set algorithms on fixed Cartesian grids for rectangular domains in arbitrary dimension. The Toolbox’s code and interface are designed to permit flexible combinations of different schemes and PDE forms, allow easy extension through the addition of new algorithms, and achieve efficient execution despite the fact that the code is entirely written as m-files. The current contents of the Toolbox and some coding patterns important to achieving its flexibility, extensibility and efficiency are briefly explained, as is the process of adding two new algorithms. Code for both the Toolbox and the new algorithms is available from the Web. 相似文献

20.

Searching for Complex Human Activities with No Visual Examples

Nazlı İkizler David A. Forsyth 《International Journal of Computer Vision》2008,80(3):337-357

We describe a method of representing human activities that allows a collection of motions to be queried without examples, using a simple and effective query language. Our approach is based on units of activity at segments of the body, that can be composed across space and across the body to produce complex queries. The presence of search units is inferred automatically by tracking the body, lifting the tracks to 3D and comparing to models trained using motion capture data. Our models of short time scale limb behaviour are built using labelled motion capture set. We show results for a large range of queries applied to a collection of complex motion and activity. We compare with discriminative methods applied to tracker data; our method offers significantly improved performance. We show experimental evidence that our method is robust to view direction and is unaffected by some important changes of clothing. 相似文献