1.
Khan A. Wahid, Muhammad Martuza, Mousumi Das, Carl McCrosky. Journal of Real-Time Image Processing, 2013, 8(4):403-410
The current trend of digital convergence calls for video decoders that support multiple video standards, such as H.264/AVC, JPEG, MPEG-2/4, VC-1, and AVS, on a single platform. In this paper, we present a cost-sharing architecture of multiple transforms to support all five popular video codecs. The architecture is based on a new multi-dimensional delta mapping. Here the inverse transform matrix of the Discrete Cosine Transform (DCT) of AVS, which requires the least computation, is taken as the base to compute the inverse DCT matrices of the other four codecs. The proposed architecture uses only adders and shifters on a shared basis to reduce the hardware cost significantly. The shared architecture is implemented on FPGA and later synthesized in CMOS 0.18 μm technology. The results show that the proposed design satisfies the requirements of all five codecs, with a maximum decoding capability of 60 fps for full HD video. The scheme is also suitable for low-cost implementation in modern multi-codec systems.
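The paper's hardware design is not reproduced here, but the core trick it relies on, replacing constant multipliers by shifts and adds, can be sketched in software. The function below is an illustrative model only; the coefficient 181 in the usage note is a hypothetical example, not a value taken from the AVS transform matrix.

```python
def shift_add_mul(x, coeff):
    """Multiply x by a non-negative integer coefficient using only
    shifts and adds, as a multiplier-free hardware datapath would."""
    result = 0
    bit = 0
    while coeff:
        if coeff & 1:           # this bit of the coefficient is set,
            result += x << bit  # so add the correspondingly shifted input
        coeff >>= 1
        bit += 1
    return result
```

For example, `shift_add_mul(x, 181)` computes 181·x with five shifted additions, since 181 = 128 + 32 + 16 + 4 + 1.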
2.
Known algorithms capable of scheduling implicit-deadline sporadic tasks over identical processors at up to 100% utilisation
invariably involve numerous preemptions and migrations. To the challenge of devising a scheduling scheme with as few preemptions
and migrations as possible, for a given guaranteed utilisation bound, we respond with the algorithm NPS-F. It is configurable
with a parameter, trading off guaranteed schedulable utilisation (up to 100%) vs preemptions. For any possible configuration,
NPS-F introduces fewer preemptions than any other known algorithm matching its utilisation bound.
3.
Previous studies have shown that the classification accuracy of a Naïve Bayes classifier in the domain of text classification can often be improved using binary decompositions such as error-correcting output codes (ECOC). The key contribution of this short note is the realization that ECOC and, in fact, all class-based decomposition schemes can be efficiently implemented in a Naïve Bayes classifier, so that—because of the additive nature of the classifier—all binary classifiers can be trained in a single pass through the data. In contrast to the straightforward implementation, which has a complexity of O(n·t·g), the proposed approach improves the complexity to O((n+t)·g). Large-scale learning of ensemble approaches with Naïve Bayes can benefit from this approach, as the experimental results in this paper demonstrate.
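The single-pass idea can be sketched as follows: collect per-class token counts once, then derive the binary classifier for any ECOC column by summing the counts of the classes on each side of the split. This is an illustrative sketch, not the authors' implementation; smoothing and the actual ECOC decoding are omitted.

```python
from collections import Counter, defaultdict

def count_pass(docs):
    """Single pass over the data: per-class token counts and document counts."""
    token_counts = defaultdict(Counter)  # class label -> token -> count
    doc_counts = Counter()               # class label -> number of documents
    for tokens, label in docs:
        token_counts[label].update(tokens)
        doc_counts[label] += 1
    return token_counts, doc_counts

def binary_counts(token_counts, doc_counts, positive_classes):
    """Derive the sufficient statistics of a binary Naive Bayes classifier
    for an arbitrary class split by summing counts -- no pass over the data."""
    pos_tokens, neg_tokens = Counter(), Counter()
    pos_docs = neg_docs = 0
    for label in doc_counts:
        if label in positive_classes:
            pos_tokens.update(token_counts[label])
            pos_docs += doc_counts[label]
        else:
            neg_tokens.update(token_counts[label])
            neg_docs += doc_counts[label]
    return pos_tokens, neg_tokens, pos_docs, neg_docs
```

Because the counts are additive, every binary classifier of an ECOC ensemble falls out of the same one-pass statistics, which is exactly what makes the O((n+t)·g) bound possible.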
4.
International Journal of Computer Mathematics, 2012, 89(1-2):53-61
We have recently developed a new conjugate gradient type method, the Generalized Polak-Ribière (GPR) method, for unconstrained minimization. It is based on search directions that are parallel to the Newton direction of the restriction of the objective function f on the two-dimensional subspace span{g, p}, with p a suitable direction in span{g, s}, where g and s are the current gradient and the previous search direction, respectively. The new approach proved to be considerably more efficient than the original Polak-Ribière method. In this paper, various implementations of the GPR method are compared with a currently available standard NAG software routine and also with the limited memory algorithms of Nocedal, Buckley-LeNir, and Shanno. The results demonstrate the general effectiveness of the new algorithm. We also give a very brief discussion of extensions of the GPR method that generate search directions parallel to Newton directions in subspaces of dimension greater than two.
5.
For many years, the hidden Markov model (HMM) has been one of the most popular tools for analysing sequential data. One frequently used special case is the left-right model, in which the order of the hidden states is known. Even if knowledge of the duration of a state is available, a standard HMM cannot represent it explicitly. Methods for modelling duration with HMMs do exist (Rabiner in Proc. IEEE 77(2):257–286, 1989), but they come at the price of increased computational complexity. Here we present an efficient and robust algorithm for modelling duration in HMMs, and this algorithm is successfully used to control autonomous computer actors in a theatrical play.
6.
We consider a semi on-line version of the multiprocessor scheduling problem on three processors, where the total size of the
tasks is known in advance. We prove a lower bound on the competitive ratio of any algorithm and propose a simple algorithm with competitive ratio equal to 1.5. The performance is then improved by a preprocessing strategy; the resulting algorithm is only 2% away from the lower bound.
Z. Tuza is supported in part by the Hungarian Scientific Research Fund, OTKA grant T-049613.
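The notion of a competitive ratio against the known-total lower bound can be illustrated with a plain greedy baseline (this is not the paper's algorithm): with the total size W known in advance, no schedule on three processors can finish before max(W/3, largest task), and an algorithm's ratio is its makespan divided by that bound.

```python
def greedy_makespan(tasks):
    """List scheduling on three processors: each task goes to the least-loaded one."""
    loads = [0.0, 0.0, 0.0]
    for t in tasks:
        loads[loads.index(min(loads))] += t
    return max(loads)

def competitive_ratio(tasks):
    """Makespan relative to the simple lower bound max(W/3, largest task)."""
    lower = max(sum(tasks) / 3.0, max(tasks))
    return greedy_makespan(tasks) / lower
```

On the instance [1, 1, 1, 3] the greedy baseline reaches ratio 4/3; the paper's contribution is an algorithm whose worst-case ratio is provably 1.5 and, after preprocessing, within 2% of the lower bound.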
7.
Christophe Vandekerckhove, Ioannis Kevrekidis, Dirk Roose. Journal of Scientific Computing, 2009, 39(2):167-188
The long-term dynamic behavior of many dynamical systems evolves on a low-dimensional, attracting, invariant slow manifold,
which can be parameterized by only a few variables (“observables”). The explicit derivation of such a slow manifold (and thus,
the reduction of the long-term system dynamics) is often extremely difficult or practically impossible. For this class of
problems, the equation-free framework has been developed to enable performing coarse-grained computations, based on short
full model simulations. Each full model simulation should be initialized so that the full model state is consistent with the
values of the observables and close to the slow manifold. To compute such an initial full model state, a class of constrained
runs functional iterations was proposed (Gear and Kevrekidis, J. Sci. Comput. 25(1), 17–28, 2005; Gear et al., SIAM J. Appl. Dyn. Syst. 4(3), 711–732, 2005). The schemes in this class only use the full model simulator and converge, under certain conditions, to an approximation
of the desired state on the slow manifold. In this article, we develop an implementation of the constrained runs scheme that
is based on a (preconditioned) Newton-Krylov method rather than on a simple functional iteration. The functional iteration
and the Newton-Krylov method are compared in detail using a lattice Boltzmann model for one-dimensional reaction-diffusion
as the full model simulator. Depending on the parameters of the lattice Boltzmann model, the functional iteration may converge
slowly or even diverge. We show that both issues are largely resolved by using the Newton-Krylov method, especially when a
coarse grid correction preconditioner is incorporated.
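The contrast between a functional iteration and a Newton-type solve can already be seen on a scalar toy problem (a stand-in only; the paper works with a lattice Boltzmann model and a preconditioned Newton-Krylov method): where the fixed-point map has |g'| > 1, plain iteration diverges, while Newton's method on g(x) − x = 0 still converges.

```python
def g(x):
    # toy fixed-point map: x* = 1 is a fixed point, but g'(1) = 2,
    # so the functional iteration x <- g(x) diverges near it
    return x * x

def functional_iteration(x, iters):
    for _ in range(iters):
        x = g(x)
    return x

def newton_fixed_point(x, iters=25):
    """Newton's method on f(x) = g(x) - x = 0, with a finite-difference
    derivative standing in for the Jacobian-vector products a Krylov
    method would use in higher dimensions."""
    for _ in range(iters):
        f = g(x) - x
        h = 1e-7
        fprime = ((g(x + h) - (x + h)) - f) / h
        x -= f / fprime
    return x
```

Starting from x = 1.3, the functional iteration blows up within a few steps, whereas the Newton iteration converges to the fixed point x* = 1.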
8.
Graphics processor units (GPU) that are originally designed for graphics rendering have emerged as massively-parallel “co-processors”
to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can accelerate
compute-intensive simulation science applications substantially. In this study, we describe the implementation of an incompressible
flow Navier–Stokes solver for multi-GPU workstation platforms. A shared-memory parallel code with identical numerical methods
is also developed for multi-core CPUs to provide a fair comparison between CPUs and GPUs. Specifically, we adopt NVIDIA’s
Compute Unified Device Architecture (CUDA) programming model to implement the discretized form of the governing equations
on a single GPU. Pthreads are then used to enable communication across multiple GPUs on a workstation. We use separate CUDA
kernels to implement the projection algorithm to solve the incompressible fluid flow equations. Kernels are implemented on
different memory spaces on the GPU depending on their arithmetic intensity. The memory hierarchy specific implementation produces
significantly faster performance. We present a systematic analysis of speedup and scaling using two generations of NVIDIA
GPU architectures and provide a comparison of single and double precision computational performance on the GPU. Using a quad-GPU
platform for single precision computations, we observe two orders of magnitude speedup relative to a serial CPU implementation.
Our results demonstrate that multi-GPU workstations can serve as a cost-effective small-footprint parallel computing platform
to accelerate computational fluid dynamics (CFD) simulations substantially.
9.
Kernel logistic regression (KLR) is the kernel learning method best suited to binary pattern recognition problems where estimates of the a posteriori probability of class membership are required. Such problems occur frequently in practical applications, for instance because the operational prior class probabilities or equivalently the relative misclassification costs are variable or unknown at the time of training the model. The model parameters are given by the solution of a convex optimization problem, which may be found via an efficient iteratively re-weighted least squares (IRWLS) procedure. The generalization properties of a kernel logistic regression machine are however governed by a small number of hyper-parameters, the values of which must be determined during the process of model selection. In this paper, we propose a novel model selection strategy for KLR, based on a computationally efficient closed-form approximation of the leave-one-out cross-validation procedure. Results obtained on a variety of synthetic and real-world benchmark datasets are given, demonstrating that the proposed model selection procedure is competitive with a more conventional k-fold cross-validation based approach and also with Gaussian process (GP) classifiers implemented using the Laplace approximation and via the Expectation Propagation (EP) algorithm.
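The IRWLS step at the heart of such solvers is easy to state for the plain (linear) logistic model; the kernel version replaces the feature by a column of the kernel matrix and adds the regularisation term. A minimal one-feature sketch follows (illustration only, not the paper's KLR solver).

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def irwls_logistic(xs, ys, iters=15):
    """IRWLS (Newton) updates for P(y=1|x) = sigmoid(w*x + b)."""
    w = b = 0.0
    for _ in range(iters):
        gw = gb = hww = hwb = hbb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            r = p - y           # gradient residual
            s = p * (1.0 - p)   # IRWLS weight
            gw += r * x; gb += r
            hww += s * x * x; hwb += s * x; hbb += s
        hww += 1e-8; hbb += 1e-8          # tiny ridge for numerical safety
        det = hww * hbb - hwb * hwb
        w -= (hbb * gw - hwb * gb) / det  # solve the 2x2 Newton system
        b -= (hww * gb - hwb * gw) / det
    return w, b
```

Each sweep is exactly a weighted least-squares solve with weights p(1−p), which is why the procedure converges in a handful of iterations on well-posed data.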
10.
PC grid is a cost-effective grid-computing platform that attracts users by allocating to their massively parallel applications
as many desktop computers as requested. However, a challenge is how to distribute necessary files to remote computing nodes
that may be unconnected to the same network file system, equipped with insufficient disk space to keep entire files, and even
powered off asynchronously.
Targeting PC grid, the AgentTeamwork grid-computing middleware deploys a hierarchy of mobile agents to remote desktops so
as to launch, monitor, check-point, and resume a parallel and distributed computing job. To achieve high-speed file distribution,
AgentTeamwork takes advantage of its agent hierarchy. The system partitions files into stripes at the tree root if they are
random-access files, duplicates them at each tree level if they are shared among all remote nodes, fragments them into smaller
messages if they are too large to relay to a lower tree level, aggregates such messages in a larger fragment if they are in
transit to the same subtree, and returns output files to the user along multi-paths established within the tree. To achieve
fault-tolerant file delivery, each agent periodically takes a snapshot of in-transit and on-memory file messages with its
user job, and thus resumes them from the latest snapshot when they crash accidentally.
This paper presents the implementation and competitive performance of AgentTeamwork's file-distribution algorithm, including file partitioning, transfer, check-pointing, and consistency maintenance.
11.
Giovanni Bellettini, Valentina Beorchia, Maurizio Paolini. Journal of Mathematical Imaging and Vision, 2008, 32(3):265-291
We introduce and study a two-dimensional variational model for the reconstruction of a smooth generic solid shape E, which can handle self-occlusions and which can be considered as an improvement of the 2.1D sketch of Nitzberg and Mumford
(Proceedings of the Third International Conference on Computer Vision, Osaka, 1990). We characterize from the topological viewpoint the apparent contour of E, namely, we characterize those planar graphs that are apparent contours of some shape E. This is the classical problem of recovering a three-dimensional layered shape from its apparent contour, which is of interest
in theoretical computer vision. We make use of the so-called Huffman labeling (Machine Intelligence, vol. 6, Am. Elsevier,
New York, 1971), see also the papers of Williams (Ph.D. Dissertation, 1994 and Int. J. Comput. Vis. 23:93–108, 1997) and the paper of Karpenko and Hughes (Preprint, 2006) for related results. Moreover, we show that if E and F are two shapes having the same apparent contour, then E and F differ by a global homeomorphism which is strictly increasing on each fiber along the direction of the eye of the observer.
These two topological theorems allow us to determine the domain of the functional ℱ describing the model. Compactness, semicontinuity and relaxation properties of ℱ are then studied, as well as connections of our model with the problem of completion of hidden contours.
12.
Keqin Li. The Journal of Supercomputing, 2012, 60(2):223-247
In this paper, scheduling parallel tasks on multiprocessor computers with dynamically variable voltage and speed is addressed as a combinatorial optimization problem. Two problems are defined, namely, minimizing schedule length with energy consumption
constraint and minimizing energy consumption with schedule length constraint. The first problem has applications in general
multiprocessor and multicore processor computing systems where energy consumption is an important concern and in mobile computers
where energy conservation is a main concern. The second problem has applications in real-time multiprocessing systems and
environments where timing constraint is a major requirement. Our scheduling problems are defined such that the energy-delay
product is optimized by fixing one factor and minimizing the other. We note that power-aware scheduling of parallel tasks has rarely been discussed before; our investigation makes an initial attempt at energy-efficient scheduling of parallel tasks on multiprocessor computers with dynamic voltage and speed. Our scheduling problems contain three nontrivial
subproblems, namely, system partitioning, task scheduling, and power supplying. Each subproblem should be solved efficiently,
so that heuristic algorithms with overall good performance can be developed. The above decomposition of our optimization problems
into three subproblems makes design and analysis of heuristic algorithms tractable. A unique feature of our work is to compare
the performance of our algorithms with optimal solutions analytically and validate our results experimentally, not to compare
the performance of heuristic algorithms among themselves only experimentally. The harmonic system partitioning and processor
allocation scheme is used, which divides a multiprocessor computer into clusters of equal sizes and schedules tasks of similar
sizes together to increase processor utilization. A three-level energy/time/power allocation scheme is adopted for a given
schedule, such that the schedule length is minimized by consuming a given amount of energy or the energy consumed is minimized
without missing a given deadline. The performance of our heuristic algorithms is analyzed, and accurate performance bounds
are derived. Simulation data which validate our analytical results are also presented. It is found that our analytical results
provide very accurate estimation of the expected normalized schedule length and the expected normalized energy consumption
and that our heuristic algorithms are able to produce solutions very close to optimum.
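The energy/speed trade-off underlying both problems can be illustrated with the standard dynamic-voltage-scaling model (a textbook sketch, not the paper's three-level allocation scheme): running w units of work at speed s takes w/s time and consumes w·s^(α−1) energy, so for work executed sequentially the schedule length under an energy budget E is minimized by a single common speed that spends the whole budget.

```python
def speed_for_energy(works, energy, alpha=3.0):
    """Common speed that minimizes total time sum(w/s) subject to the
    energy budget sum(w * s**(alpha-1)) <= energy in the standard DVS model."""
    total = sum(works)
    s = (energy / total) ** (1.0 / (alpha - 1.0))  # by convexity, one speed is optimal
    time = total / s
    used = sum(w * s ** (alpha - 1.0) for w in works)
    return s, time, used
```

With works [1, 2, 3], budget 24 and α = 3, the optimal common speed is 2, the schedule length is 3, and the budget is spent exactly; the dual problem (minimum energy for a deadline) inverts the same relation.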
13.
Design and implementation of a workflow-based resource broker with information system on computational grids
The grid is a promising infrastructure that can allow scientists and engineers to access resources among geographically distributed
environments. Grid computing is a new technology which focuses on aggregating resources (e.g., processor cycles, disk storage,
and contents) from a large-scale computing platform. Making grid computing a reality requires a resource broker to manage
and monitor available resources. This paper presents a workflow-based resource broker whose main functions are matching available
resources with user requests and considering network information statuses during matchmaking in computational grids. The resource
broker provides a graphical user interface for accessing available and appropriate resources via user credentials. This
broker uses the Ganglia and NWS tools to monitor resource status and network-related information, respectively. Then we propose
a history-based execution time estimation model to predict the execution time of parallel applications, according to previous
execution results. The experimental results show that our model can accurately predict the execution time of embarrassingly
parallel applications. We also report on using the Globus Toolkit to construct a grid platform called the TIGER project that
integrates resources distributed across five universities in Taichung city, Taiwan, where the resource broker was developed.
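A history-based execution-time estimate of the simplest kind can be sketched as follows (an illustration of the idea only; the paper's prediction model and its inputs are more elaborate): average the time-per-unit-of-work observed in previous runs and scale by the size of the new job.

```python
def predict_time(history, size):
    """history: list of (job_size, measured_seconds) pairs from previous runs.
    Returns the predicted runtime for a new job of the given size, assuming
    runtime scales linearly with job size -- reasonable for embarrassingly
    parallel applications."""
    rates = [seconds / job_size for job_size, seconds in history]
    return size * sum(rates) / len(rates)
```

For embarrassingly parallel workloads, where runtime is close to proportional to the work assigned per node, even this crude estimator tracks reality well, which matches the experimental finding reported above.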
14.
Peter Hachenberger. Algorithmica, 2009, 55(2):329-345
We present the first exact and robust implementation of the 3D Minkowski sum of two non-convex polyhedra. Our implementation
decomposes the two polyhedra into convex pieces, performs pairwise Minkowski sums on the convex pieces, and constructs their
union. We achieve exactness and the handling of all degeneracies by building upon 3D Nef polyhedra as provided by CGAL. The implementation also supports open and closed polyhedra. This allows the handling of degenerate scenarios like the tight
passage problem in robot motion planning.
The bottleneck of our approach is the union step. We address efficiency by optimizing this step by two means: we implement
an efficient decomposition that yields a small number of convex pieces, and develop, test and optimize multiple strategies
for uniting the partial sums by consecutive binary union operations.
The decomposition that we implemented as part of the Minkowski sum is interesting in its own right. It is the first robust
implementation of a decomposition of polyhedra into convex pieces that yields at most O(r²) pieces, where r is the number of edges whose adjacent facets form an angle of more than 180 degrees with respect to the interior of the polyhedron.
This work was partially supported by the Netherlands’ Organisation for Scientific Research (NWO) under project no. 639.023.301.
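The convex building block of the pipeline is easy to state in 2D (a simplified sketch; the paper works with 3D Nef polyhedra): the Minkowski sum of two convex polygons is the convex hull of all pairwise vertex sums.

```python
def convex_hull(pts):
    """Andrew's monotone-chain convex hull; returns vertices in CCW order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def minkowski_sum_convex(P, Q):
    """Minkowski sum of two convex polygons: hull of all pairwise vertex sums.
    Brute force for clarity; the rotating-edge merge achieves O(|P|+|Q|)."""
    return convex_hull([(p[0] + q[0], p[1] + q[1]) for p in P for q in Q])
```

The non-convex case then follows the paper's recipe: decompose each input into convex pieces, sum the pieces pairwise as above, and take the union of the partial sums.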
15.
The potential flow equations which govern the free-surface motion of an ideal fluid (the water wave problem) are notoriously
difficult to solve for a number of reasons. First, they are a classical free-boundary problem where the domain shape is one
of the unknowns to be found. Additionally, they are strongly nonlinear (with derivatives appearing in the nonlinearity) without
a natural dissipation mechanism so that spurious high-frequency modes are not damped. In this contribution we address the
latter of these difficulties using a surface formulation (which addresses the former complication) supplemented with physically-motivated
viscous effects recently derived by Dias et al. (Phys. Lett. A 372:1297–1302, 2008). The novelty of our approach is to derive a weakly nonlinear model from the surface formulation of Zakharov (J. Appl. Mech.
Tech. Phys. 9:190–194, 1968) and Craig and Sulem (J. Comput. Phys. 108:73–83, 1993), complemented with the viscous effects mentioned above. Our new model is simple to implement while being both faithful to
the physics of the problem and extremely stable numerically.
16.
Stream processors, with the stream programming model, have demonstrated significant performance advantages in the domains of signal processing, multimedia, and graphics applications. In this paper we examine the applicability of a stream processor to 2-D Jacobi iteration, which is widely used to solve partial differential equations, an important class of scientific programs. We first map the FORTRAN version of 2-D Jacobi iteration to the stream processor in a straightforward way. In a stream processor system, the management of system resources is the programmers' responsibility. We then present several optimizations that allow the stream program for 2-D Jacobi iteration, called StreamJacobi, to exploit various aspects of the stream processor architecture. Finally, we analyze the performance of StreamJacobi, with different scales, and the presented optimizations. The final stream program StreamJacobi is from 2.31 to 6.42 times faster than the corresponding FORTRAN programs on a Xeon 5100 processor, with the optimizations playing an important role in realizing the performance improvement.
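For reference, the numerical kernel being mapped is the classic 2-D Jacobi sweep, sketched here in plain Python (the paper's versions are FORTRAN and stream code): each interior point is replaced by the average of its four neighbours.

```python
def jacobi_step(u):
    """One 2-D Jacobi sweep for Laplace's equation; boundary values stay fixed."""
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]  # reads come from u, writes go to new
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1])
    return new

def jacobi(u, iters):
    for _ in range(iters):
        u = jacobi_step(u)
    return u
```

The double buffer (reading `u`, writing `new`) is what makes the sweep embarrassingly data-parallel, and hence a natural fit for a stream processor.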
17.
Ian M. Mitchell. Journal of Scientific Computing, 2008, 35(2-3):300-329
Level set methods are a popular and powerful class of numerical algorithms for dynamic implicit surfaces and solution of Hamilton-Jacobi PDEs. While the advanced level set schemes combine both efficiency and accuracy, their implementation complexity makes it difficult for the community to reproduce new results and make quantitative comparisons between methods. This paper describes the Toolbox of Level Set Methods, a collection of Matlab routines implementing the basic level set algorithms on fixed Cartesian grids for rectangular domains in arbitrary dimension. The Toolbox’s code and interface are designed to permit flexible combinations of different schemes and PDE forms, allow easy extension through the addition of new algorithms, and achieve efficient execution despite the fact that the code is entirely written as m-files. The current contents of the Toolbox and some coding patterns important to achieving its flexibility, extensibility and efficiency are briefly explained, as is the process of adding two new algorithms. Code for both the Toolbox and the new algorithms is available from the Web.
18.
We describe a method of representing human activities that allows a collection of motions to be queried without examples,
using a simple and effective query language. Our approach is based on units of activity at segments of the body, that can
be composed across space and across the body to produce complex queries. The presence of search units is inferred automatically
by tracking the body, lifting the tracks to 3D and comparing to models trained using motion capture data. Our models of short
time scale limb behaviour are built using a labelled motion capture set. We show results for a large range of queries applied
to a collection of complex motion and activity. We compare with discriminative methods applied to tracker data; our method
offers significantly improved performance. We show experimental evidence that our method is robust to view direction and is
unaffected by some important changes of clothing.
19.
Yi He, Fengman Liu, Fengze Hou, Peng Wu, Jun Li, Liqiang Cao, Dongkai Shangguan. Microsystem Technologies, 2014, 20(12):2295-2300
In this paper, we present a complete 700–2,600 MHz RF SiP module for micro base stations. This RF SiP design integrates the transmitter, receiver, feedback, ADC/DAC and CLK modules. The module consists of two multi-layer organic substrates that are vertically stacked using BGA interconnections. With the integration of 33 chips and about 600 passive components, this RF SiP module retains a small form factor, measuring 5.25 cm × 5.25 cm × 0.7 cm. The RF input signal transmission path insertion loss is less than 0.34 dB and the return loss is less than −14 dB at 2.6 GHz. Full-load thermal simulation results indicate that each chip junction temperature is below 100 °C. We summarize the RF SiP module design, assembly and simulated thermal characteristics. The proposed RF SiP can generally be characterized by small size, low cost and short development cycle.
20.
Chang-Yeol Jung. Journal of Scientific Computing, 2009, 41(1):13-48
We investigate the evolution of the probability distribution function in time for some wave and Maxwell equations in random
media for which the parameters, e.g. permeability, permittivity, fluctuate randomly in space; more precisely, two different
media interface randomly in space. We numerically compute the probability distribution and density for output solutions. The
underlying numerical and statistical techniques are the so-called polynomial chaos Galerkin projection, which has been extensively
used for simulating partial differential equations with uncertainties, and Monte Carlo simulation.
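The Monte Carlo side of such studies reduces to a very simple pattern (a generic sketch, not the paper's wave/Maxwell setup): sample the random medium parameter, push each sample through the model, and estimate the output distribution empirically.

```python
import random

def empirical_cdf_at(model, sampler, threshold, n=10000):
    """Monte Carlo estimate of P(model(xi) <= threshold) for random input xi."""
    hits = sum(1 for _ in range(n) if model(sampler()) <= threshold)
    return hits / n

# toy random medium: the wave speed is 1 or 2 with equal probability,
# and the output of interest is the travel time over a unit distance
def travel_time(c):
    return 1.0 / c
```

Here P(travel time ≤ 0.75) equals the probability of drawing the fast medium, 0.5; a polynomial chaos Galerkin projection would instead expand the solution in polynomials of the random parameter and recover the same statistics without sampling.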