
1.

Efficient image processing algorithms on the scan line array processor

Helman D. JaJa J. 《IEEE transactions on pattern analysis and machine intelligence》1995,17(1):47-56

Develops efficient algorithms for low and intermediate level image processing on the scan line array processor, a SIMD machine consisting of a linear array of cells that processes images in a scan line fashion. For low level processing, the authors present algorithms for block DFT, block DCT, convolution, template matching, shrinking, and expanding which run in real-time. By real-time, the authors mean that, if the required processing is based on neighborhoods of size m×m, then the output lines are generated at a rate of O(m) operations per line and a latency of O(m) scan lines, which is the best that can be achieved on this model. The authors also develop an algorithm for median filtering which runs in almost real-time at a cost of O(m log m) time per scan line and a latency of [m/2] scan lines. For intermediate level processing, the authors present optimal algorithms for translation, histogram computation, scaling, and rotation. The authors also develop efficient algorithms for labelling the connected components and determining the convex hulls of multiple figures which run in O(n log n) and O(n log²n) time, respectively. The latter algorithms are significantly simpler and easier to implement than those already reported in the literature for linear arrays.

2.

A fast algorithm for constructing inverted files on heterogeneous platforms

Zheng Wei Joseph JaJa 《Journal of Parallel and Distributed Computing》2012

Given a collection of documents residing on a disk, we develop a new strategy for processing these documents and building the inverted files extremely quickly. Our approach is tailored for a heterogeneous platform consisting of multicore CPUs and highly multithreaded GPUs. Our algorithm is based on a number of novel techniques, including a high-throughput pipelined strategy, a hybrid trie and B-tree dictionary data structure, dynamic work allocation to CPU and GPU threads, and an optimized CUDA indexer implementation. We have performed extensive tests of our algorithm on a single node (two Intel Xeon X5560 Quad-core CPUs) with two NVIDIA Tesla C1060 GPUs attached to it, and were able to achieve a throughput of more than 262 MB/s on the ClueWeb09 dataset. Similar results were obtained for widely different datasets. The throughput of our algorithm is superior to the best known algorithms reported in the literature even when compared to those run on large clusters.
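
As a point of reference for what "building the inverted files" produces, here is a minimal in-memory sketch in Python. It is not the pipelined CPU-GPU method of the abstract: the whitespace tokenizer, the function name `build_inverted_index`, and the plain dictionary (in place of the hybrid trie and B-tree dictionary) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def build_inverted_index(docs: List[str]) -> Dict[str, List[int]]:
    """Map each term to the sorted list of document IDs that contain it.

    Hypothetical helper: tokenization is a lowercase whitespace split,
    and the dictionary is an ordinary Python dict.
    """
    postings: Dict[str, List[int]] = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # Use a set so each document appears at most once per term.
        for term in set(text.lower().split()):
            postings[term].append(doc_id)
    # Documents are visited in increasing ID order, so each list is sorted.
    return dict(postings)


if __name__ == "__main__":
    index = build_inverted_index(["GPU and CPU threads", "CPU trie dictionary"])
    print(index["cpu"])
```

A production indexer would additionally parse documents, compress the postings lists, and write them to disk, which is where a pipelined design such as the one described above matters.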

3.

Interactive high-resolution isosurface ray casting on multicore processors

Wang Q JaJa J 《IEEE transactions on visualization and computer graphics》2008,14(3):603-614

We present a new method for the interactive rendering of isosurfaces using ray casting on multi-core processors. This method consists of a combination of an object-order traversal that coarsely identifies possible candidate 3D data blocks for each small set of contiguous pixels, and an isosurface ray casting strategy tailored for the resulting limited-size lists of candidate 3D data blocks. While static screen partitioning is widely used in the literature, our scheme performs dynamic allocation of groups of ray casting tasks to ensure almost equal loads among the different threads running on multi-cores while maintaining spatial locality. We also make careful use of the memory management environment commonly present in multi-core processors. We test our system on a two-processor Clovertown platform, each consisting of a Quad-Core 1.86-GHz Intel Xeon Processor, for a number of widely different benchmarks. The detailed experimental results show that our system is efficient and scalable, and achieves high cache performance and excellent load balancing, resulting in an overall performance that is superior to any of the previous algorithms. In fact, we achieve an interactive isosurface rendering on a 1024² screen for all the datasets tested up to the maximum size of the main memory of our platform.

4.

Techniques to audit and certify the long-term integrity of digital archives

Sangchul Song Joseph JaJa 《International Journal on Digital Libraries》2009,10(2-3):123-131

A fundamental requirement for a digital archive is to set up mechanisms that will ensure the authenticity of its holdings in the long term. In this article, we develop a new methodology to address the long-term integrity of digital archives using rigorous cryptographic techniques. Our approach involves the generation of a small-size integrity token for each object, some cryptographic summary information, and a framework that enables cost-effective regular and periodic auditing of the archive’s holdings depending on the policy set by the archive. Our scheme is very general, architecture and platform independent, and can detect with high probability any alteration to an object, including malicious alterations introduced by the archive or by an external intruder. The scheme can be shown to be mathematically correct as long as a small amount of cryptographic information, in the order of 100 KB/year, can be kept intact. Using this approach, a prototype system called ACE (Auditing Control Environment) has been built and tested in an operational large scale archiving environment.
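
The idea of a per-object integrity token that is checked during an audit can be illustrated with a plain cryptographic hash. The sketch below is an assumption-laden simplification, not the ACE scheme: the abstract does not specify the token format, so SHA-256 and the function names `integrity_token` and `audit` are hypothetical choices.

```python
import hashlib


def integrity_token(data: bytes) -> str:
    """Return a SHA-256 digest of the object (assumed token format)."""
    return hashlib.sha256(data).hexdigest()


def audit(data: bytes, token: str) -> bool:
    """Recompute the digest and compare it with the stored token."""
    return integrity_token(data) == token


if __name__ == "__main__":
    stored = integrity_token(b"archived object")
    print(audit(b"archived object", stored))   # object unchanged
    print(audit(b"archived 0bject", stored))   # object altered
```

A bare digest like this only detects changes if the token itself is trustworthy. The abstract's "cryptographic summary information" that must be kept intact addresses that gap, and this sketch does not model it.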

5.

Optimal and near-optimal algorithms for generalized intersection reporting on pointer machines

Qingmin Shi Joseph JaJa 《Information Processing Letters》2005,95(3):382-388

We develop efficient algorithms for a number of generalized intersection reporting problems, including orthogonal and general segment intersection, 2D range searching, rectangular point enclosure, and rectangle intersection search. Our results for orthogonal and general segment intersection, 3-sided 2D range searching, and rectangular point enclosure problems match the lower bounds for their corresponding standard versions under the pointer machine model. Our results for the remaining problems improve upon the best known previous algorithms.

6.

Optimal algorithms on the pipelined hypercube and related networks

JaJa J. Ryu K.W. 《Parallel and Distributed Systems, IEEE Transactions on》1993,4(5):582-591

Parallel algorithms for several important combinatorial problems such as the all nearest smaller values problem, triangulating a monotone polygon, and line packing are presented. These algorithms achieve linear speedups on the pipelined hypercube, and provably optimal speedups on the shuffle-exchange and the cube-connected-cycles for any number p of processors satisfying 1⩽p⩽n/((log³n)(loglog n)²), where n is the input size. The lower bound results are established under no restriction on how the input is mapped into the local memories of the different processors.
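
For readers unfamiliar with the all nearest smaller values problem, the following Python sketch shows the standard sequential stack-based solution, which runs in O(n) time. It is a sequential baseline only and not the parallel hypercube algorithm of the abstract; the function name `nearest_smaller_to_left` is an assumption, and only the left-hand neighbours are computed (the right-hand case is symmetric).

```python
from typing import List, Optional


def nearest_smaller_to_left(values: List[int]) -> List[Optional[int]]:
    """For each position i, return the index of the nearest element to the
    left that is strictly smaller than values[i], or None if there is none."""
    result: List[Optional[int]] = []
    stack: List[int] = []  # indices whose values are strictly increasing
    for i, v in enumerate(values):
        # Discard candidates that are not smaller than the current value;
        # they can never be the answer for any later position either.
        while stack and values[stack[-1]] >= v:
            stack.pop()
        result.append(stack[-1] if stack else None)
        stack.append(i)
    return result


if __name__ == "__main__":
    print(nearest_smaller_to_left([3, 1, 4, 1, 5, 9, 2, 6]))
```

Each index is pushed and popped at most once, which gives the linear sequential bound against which parallel speedups are measured.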

7.

Isosurface extraction and spatial filtering using Persistent OcTree (POT)

Shi Q JaJa J 《IEEE transactions on visualization and computer graphics》2006,12(5):1283-1290

We propose a novel Persistent OcTree (POT) indexing structure for accelerating isosurface extraction and spatial filtering from volumetric data. This data structure efficiently handles a wide range of visualization problems such as the generation of view-dependent isosurfaces, ray tracing, and isocontour slicing for high dimensional data. POT can be viewed as a hybrid data structure between the interval tree and the Branch-On-Need Octree (BONO) in the sense that it achieves the asymptotic bound of the interval tree for identifying the active cells corresponding to an isosurface and is more efficient than BONO for handling spatial queries. We encode a compact octree for each isovalue. Each such octree contains only the corresponding active cells, in such a way that the combined structure has linear space. The inherent hierarchical structure associated with the active cells enables very fast filtering of the active cells based on spatial constraints. We demonstrate the effectiveness of our approach by performing view-dependent isosurfacing on a wide variety of volumetric data sets and 4D isocontour slicing on the time-varying Richtmyer-Meshkov instability dataset.

8.

Scalable data parallel algorithms for texture synthesis using Gibbs random fields

Bader D.A. JaJa J. Chellappa R. 《IEEE transactions on image processing》1995,4(10):1456-1460

This article introduces scalable data parallel algorithms for image processing. Focusing on Gibbs and Markov random field model representation for textures, we present parallel algorithms for texture synthesis, compression, and maximum likelihood parameter estimation, currently implemented on Thinking Machines CM-2 and CM-5. The use of fine-grained, data parallel processing techniques yields real-time algorithms for texture synthesis and compression that are substantially faster than the previously known sequential implementations. Although current implementations are on Connection Machines, the methodology presented enables machine-independent scalable algorithms for a number of problems in image processing and analysis.

9.

Optimized FFT computations on heterogeneous platforms with application to the Poisson equation

Jing Wu Joseph JaJa 《Journal of Parallel and Distributed Computing》2014

We develop optimized multi-dimensional FFT implementations on CPU–GPU heterogeneous platforms for the case when the input is too large to fit on the GPU global memory, and use the resulting techniques to develop a fast Poisson solver. The solver involves memory bound computations for which the large 3D data may have to be transferred over the PCIe bus several times during the computation. We develop a new strategy to decompose and allocate the computation between the GPU and the CPU such that the 3D data is transferred only once to the device memory, and the executions of the GPU kernels are almost completely overlapped with the PCI data transfer. We were able to achieve significantly better performance than what has been reported in previous related work, including over 145 GFLOPS for the three periodic boundary conditions (single precision version), and over 105 GFLOPS for the two periodic, one Neumann boundary conditions (single precision version). The effective bidirectional PCIe bus bandwidth achieved is 9–10 GB/s, which is close to the best possible on our platform. For all the cases tested, the single 3D data PCIe transfer time, which constitutes a lower bound on what is possible on our platform, takes almost 70% of the total execution time of the Poisson solver.

10.

An efficient and scalable parallel algorithm for out-of-core isosurface extraction and rendering

Qin Wang Joseph JaJa Amitabh Varshney 《Journal of Parallel and Distributed Computing》2007

We consider the problem of isosurface extraction and rendering for large scale time-varying data. Such data sets have been appearing at an increasing rate especially from physics-based simulations, and can range in size from hundreds of gigabytes to tens of terabytes. Isosurface extraction and rendering is one of the most widely used visualization techniques to explore and analyze such data sets. A common strategy for isosurface extraction involves the determination of the so-called active cells followed by a triangulation of these cells based on linear interpolation, and ending with a rendering of the triangular mesh. We develop a new simple indexing scheme for out-of-core processing of large scale data sets, which enables the identification of the active cells extremely quickly, using a more compact indexing structure and more effective bulk data movement than previous schemes. Moreover, our scheme leads to an efficient and scalable implementation on multiprocessor environments in which each processor has access to its own local disk. In particular, our parallel algorithm provably achieves load balancing across the processors independent of the isovalue, with almost no overhead in the total amount of work relative to the sequential algorithm. We conduct a large number of experimental tests on the University of Maryland Visualization Cluster using the Richtmyer–Meshkov instability data set, and obtain results that consistently validate the efficiency and the scalability of our algorithm.
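
The "common strategy" mentioned in the abstract, finding active cells and then interpolating linearly along their edges, can be sketched on a small 2D grid. This Python example is a brute-force illustration under stated assumptions: it uses a 2D analogue of the 3D case, scans every cell instead of using an index, and the names `active_cells` and `edge_crossing` are hypothetical.

```python
from typing import List, Tuple


def active_cells(grid: List[List[float]], isovalue: float) -> List[Tuple[int, int]]:
    """Return (row, col) of every cell whose corner values bracket the isovalue.

    A cell is identified by its top-left corner; it is active when
    min(corners) <= isovalue <= max(corners).
    """
    cells = []
    for r in range(len(grid) - 1):
        for c in range(len(grid[0]) - 1):
            corners = (grid[r][c], grid[r][c + 1],
                       grid[r + 1][c], grid[r + 1][c + 1])
            if min(corners) <= isovalue <= max(corners):
                cells.append((r, c))
    return cells


def edge_crossing(a: float, b: float, isovalue: float) -> float:
    """Fraction along an edge with end values a != b where the isovalue is hit,
    obtained by linear interpolation."""
    return (isovalue - a) / (b - a)


if __name__ == "__main__":
    grid = [[0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0]]
    print(active_cells(grid, 0.5))
    print(edge_crossing(0.0, 1.0, 0.25))
```

The exhaustive scan touches every cell for every isovalue. Indexing schemes such as the one in the abstract exist to avoid that cost when the data does not fit in memory.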