期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Reengineering for parallelism in heterogeneous parallel platforms

J. Daniel García Kevin Hammond Lutz Schubert 《The Journal of supercomputing》2018,74(11):5625-5627

相似文献

2.

Stack and queue integrity on hostile platforms

Devanbu P.T. Stubblebine S.G. 《IEEE transactions on pattern analysis and machine intelligence》2002,28(1):100-108

When computationally intensive tasks have to be carried out on trusted, but limited platforms such as smart cards, it becomes necessary to compensate for the limited resources (memory, CPU speed) by off-loading implementations of data structures onto an available (but insecure, untrusted) fast coprocessor. However, data structures such as stacks, queues, RAMs, and hash tables can be corrupted (and made to behave incorrectly) by a potentially hostile implementation platform or by an adversary knowing or choosing data structure operations. The paper examines approaches that can detect violations of data structure invariants, while placing limited demands on the resources of the secure computing platform 相似文献

3.

MPECA：面向多平台的高效密码算法

李珂悫冯景亚韦永壮赵琪《计算机应用研究》2023,40(8)

目前基于AESNI指令集设计面向6G系统高速加密算法备受业界关注,如ROCCA算法。然而,如何在不支持AESNI指令集环境下设计高速实现的密码算法是目前的研究难点之一。基于AES轮函数特点,设计了一个面向多平台的高效加密算法——MPECA。该算法轮函数采用四轮AES迭代操作,使其在不支持AESNI低端环境下仍可利用固定切片技术进行高效软件实现。特别地,MPECA比ROCCA在初始化和生成标签阶段少花销32个AES轮函数操作和128个XOR操作,使得其在支持AESNI指令的环境下实现更快捷。实验结果表明：与ROCCA相比,Intel平台上,在不支持AESNI环境下,MPECA加密速度提高了3.05倍,在支持AESNI环境下,MPECA加密速度提高了30.64%;在ARM平台上,MPECA加密速度提高了近2.37倍。此外,MPECA密码算法具有较高的安全强度,足以抵抗差分密码攻击、积分攻击及密钥恢复攻击等。相似文献

4.

An efficient parallelization technique for x264 encoder on heterogeneous platforms consisting of CPUs and GPUs

Youngsub Ko Youngmin Yi Soonhoi Ha 《Journal of Real-Time Image Processing》2014,9(1):5-18

H.264/AVC video encoders have been widely used for its high coding efficiency. Since the computational demand proportional to the frame resolution is constantly increasing, it has been of great interest to accelerate H.264/AVC by parallel processing. Recently, graphics processing units (GPUs) have emerged as a viable target for accelerating general purpose applications by exploiting fine-grain data parallelisms. Despite extensive research efforts to use GPUs to accelerate the H.264/AVC algorithm, it has not been successful to achieve any speed-up over the x264 algorithm that is known as the fastest CPU implementation, mainly due to significant communication overhead between the host CPU and the GPU and intra-frame dependency in the algorithm. In this paper, we propose a novel motion-estimation (ME) algorithm tailored for NVIDIA GPU implementation. It is accompanied by a novel pipelining technique, called sub-frame ME processing, to effectively hide the communication overhead between the host CPU and the GPU. Further, we incorporate frame-level parallelization technique to improve the overall throughput. Experimental results show that our proposed H.264 encoder has higher performance than x264 encoder. 相似文献

5.

Large-scale parallelism for constraint-based local search: the costas array case study

Yves Caniou Philippe Codognet Florian Richoux Daniel Diaz Salvador Abreu 《Constraints》2015,20(1):30-56

相似文献

6.

DcaNAS: efficient convolutional network Design for Desktop CPU platforms

Chen Dong Shen Hao Shen Yuchen 《Applied Intelligence》2021,51(7):4353-4366

Applied Intelligence - The hardware platform is a significant consideration in efficient CNN model design. Most lightweight networks are based on GPUs and mobile devices. However, they are usually... 相似文献

7.

A computationally efficient technique for data-clustering

M. Narasimha Murty G. Krishna 《Pattern recognition》1980,12(3):153-158

A computationally efficient agglomerative clustering algorithm based on multilevel theory is presented. Here, the data set is divided randomly into a number of partitions. The samples of each such partition are clustered separately using hierarchical agglomerative clustering algorithm to form sub-clusters. These are merged at higher levels to get the final classification. This algorithm leads to the same classification as that of hierarchical agglomerative clustering algorithm when the clusters are well separated. The advantages of this algorithm are short run time and small storage requirement. It is observed that the savings, in storage space and computation time, increase nonlinearly with the sample size. 相似文献

8.

High performance computing on networks of workstations through the exploitation of function parallelism

Yung-Lin Liu Hau-Yang Cheng Chung-Ta King 《Journal of Systems Architecture》1999,45(15):1307-1321

Network of workstations (NOW) has become a widely accepted form of high-performance parallel computing. As in conventional multicomputers, parallel programs running on such a platform are often written in an SPMD form to exploit data parallelism. Each workstation in a NOW is treated similarly to a processing element in a multicomputer system. However, workstations are far more powerful and flexible than the processing elements in conventional multicomputers. In this paper, we discuss how workstations in a NOW can be used to exploit more parallelism in an SPMD program, especially those induced from concurrent activities. 相似文献

9.

Automated analysis of operators on state tables: A technique for intelligent search

Thomas Kramer 《Journal of Automated Reasoning》1986,2(2):127-153

When searching lists, the current situation is represented by a state table, and changes in the situation are brought about by operators. An automated technique for selection of the right sequence of operators is described, based on a knowledge of the last operator or two, and deriving only a limited selection of subsequent operators, without prior knowledge of the current state. A table is output with, for each feasible sequence of one or two operators, a list of operators that might be tried next. Application of the technique to a robot moving blocks is described, and a LISP implementation is provided. The work described in this paper was performed under Grant No. 70NANB4H0006 from the National Bureau of Standards to the Catholic University of America. 相似文献

10.

DRESS: dimensionality reduction for efficient sequence search

Alexios Kotsifakos Alexandra Stefan Vassilis Athitsos Gautam Das Panagiotis Papapetrou 《Data mining and knowledge discovery》2015,29(5):1280-1311

相似文献

11.

DOTMIX-Pro: faster and more efficient variants of DOTMIX for dynamic-multithreading platforms

Ritchie Robert Bibak Khodakhast 《The Journal of supercomputing》2022,78(1):945-961

The Journal of Supercomputing - Many concurrency platforms offer a processor oblivious model of computation, where the scheduler dynamically distributes work across threads. While this is... 相似文献

12.

A line search improvement of efficient MPC 总被引：1，自引：0，他引：1

Basil Kouvaritakis Mark Cannon 《Automatica》2010,46(11):1920-1924

A recent efficient Model Predictive Control (MPC) strategy uses a univariate Newton-Raphson procedure to solve a dual problem, but is not amenable to warm starting or early termination. By solving a primal problem, the current note proposes a strategy which is more efficient than the Newton-Raphson method and which enables warm starting and early termination. Performance improvements are demonstrated over the Newton-Raphson method and alternative approaches based on quadratic programming or semidefinite programming. 相似文献

13.

FastMatch:an efficient algorithm for XML keyword search

CUI Jian ZHOU Jun-feng GUO Jing-feng 《计算机应用研究》2012,29(6)

现有的XML关键字查询方法包括两步:确定满足特定语义的节点;构建满足特定条件的子树.这种处理方式需要多次扫描关键字倒排表,效率低下.针对这一问题,提出快速分组方法来减少扫描倒排表次数,进而基于快速分组方法提出FastMatch算法.该算法仅需扫描一次关键字倒排表就能构建满足特定条件的子树,从而提高了查询效率.最后通过实验验证了该方法的高效性. 相似文献

14.

An efficient multi balanced cuckoo search K-means technique for segmentation and compression of compound images

Manju V. N. Lenin Fred A. 《Multimedia Tools and Applications》2019,78(11):14897-14915

The images comprise not only photographic images but also graphic and text images, they are determined in magazines, brochures and websites. The segmentation and compression of compound images (for instance, computer-generated images, scanned documents and so on) are tough to the procedure.The existing segmentation and compression techniques do not provide a complete comprehensive solution. To solve the problems in existing techniques, here we segmented the compound images via an optimization depended on K-means clustering technique along with AC (Alternate Current) coefficient method for the dynamic segmentation and then compressed individually. The AC coefficient based segmentation method results in detachment of smooth (background) and non-smooth (text, image and overlapping) areas. Further, the non-smooth part is segmented via the optimization depended on K-means clustering technique. Also, the density of segmented objects is headed applying different compression strategies such as the Huffman coder, arithmetic coder, and Jpeg coders. With the being approaches, the entire projected architecture is implemented in MATLAB and the function of the scheme is measured and equated. Our proposed system achieves better compression ratio (21.16), and also improves the performance for image quality index (0.931574), PSNR (Peak Signal to Noise Ratio) (34.91338), RMSE (Root Mean Square Error) (0.931574), SSIM (Structural Similarity) (0.546882), and SDME (Second Derivative-like Measure of Enhancement) (44.91293) than the available CS K-means algorithm.

相似文献

15.

FastMatch:一种高效的XML关键字查询算法

崔健周军锋郭景峰《计算机应用研究》2012,29(6):2184-2187

现有的XML关键字查询方法包括两步:确定满足特定语义的节点;构建满足特定条件的子树。这种处理方式需要多次扫描关键字倒排表,效率低下。针对这一问题,提出快速分组方法来减少扫描倒排表次数,进而基于快速分组方法提出FastMatch算法。该算法仅需扫描一次关键字倒排表就能构建满足特定条件的子树,从而提高了查询效率。最后通过实验验证了该方法的高效性。相似文献

16.

A refined search tree technique for Dominating Set on planar graphs

Jochen Alber Hongbing Fan Henning Fernau Rolf Niedermeier Fran Rosamond 《Journal of Computer and System Sciences》2005,71(4):385-405

We establish a refined search tree technique for the parameterized DOMINATING SET problem on planar graphs. Here, we are given an undirected graph and we ask for a set of at most k vertices such that every other vertex has at least one neighbor in this set. We describe algorithms with running times O(8^kn) and O(8^kk+n³), where n is the number of vertices in the graph, based on bounded search trees. We describe a set of polynomial time data-reduction rules for a more general “annotated” problem on black/white graphs that asks for a set of k vertices (black or white) that dominate all the black vertices. An intricate argument based on the Euler formula then establishes an efficient branching strategy for reduced inputs to this problem. In addition, we give a family examples showing that the bound of the branching theorem is optimal with respect to our reduction rules. Our final search tree algorithm is easy to implement; its analysis, however, is involved. 相似文献

17.

Virtual-Cache: A cache-line borrowing technique for efficient GPU cache architectures

《Microprocessors and Microsystems》2021

GPUs provide megabytes of registers and shared memories to maintain the contexts for thousands of threads and enable fast data sharing amongst threads of a thread block, respectively. Besides, GPUs employ L1 cache to provide the high bandwidth service for memory requests. However, the average L1 cache capacity per thread is very limited, resulting in cache thrashing which in turn impairs the performance. Meanwhile, many registers and shared memories are unassigned to any warps or thread blocks. Moreover, registers and shared memories that are assigned can be idle when warps or thread blocks are finished. Exploiting the above insights, we propose Virtual-Cache to cost-effectively increase the effective size of L1 cache by utilizing the unassigned and released registers and shared memories as cache-lines in this paper. Specifically, we leverage the unassigned registers and shared memories to serve cache requests directly. Regarding the registers assigned to a warp, they can work as cache-lines after the warp completes the execution and before they are accessed again by a new launched warp. Regarding the shared memories of a thread block, they are enabled to serve cache requests when the thread block is finished till they are referenced by shared memory instructions of the relaunched thread block. The register file, shared memory and L1 cache are physically independent but logically unified as a large virtual cache with redesigned cache-line management. We develop the control and data path for the register file, making the register file accessible for cache requests by borrowing an operand collector to serve the cache requests. We also expand the control and data path for the shared memory to serve the cache requests. Our evaluation results show that Virtual-Cache makes the performance improved by 28% over the previously proposed cache management technique for cache-sensitive applications. 相似文献

18.

A basic-cycle calculation technique for efficient dynamic dataredistribution

Yeh-Ching Chung Ching-Hsien Hsu Sheng-Wen Bai 《Parallel and Distributed Systems, IEEE Transactions on》1998,9(4):359-377

Array redistribution is usually required to enhance algorithm performance in many parallel programs on distributed memory multicomputers. Since it is performed at run-time, there is a performance trade-off between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present a basic-cycle calculation technique to efficiently perform BLOCK-CYCLIC(S) to BLOCK-CYCLIC(t) redistribution. The main idea of the basic-cycle calculation technique is, first, to develop closed forms for computing source/destination processors of some specific array elements in a basic-cycle, which is defined as icm(s,t)/gcd(s,t). These closed forms are then used to efficiently determine the communication sets of a basic-cycle. From the source/destination processor/data sets of a basic-cycle, we can efficiently perform a BLOCK-CYCLIC(s) to BLOCK-CYCLIC(t) redistribution. To evaluate the performance of the basic-cycle calculation technique, we have implemented this technique on an IBM SP2 parallel machine, along with the PITFALLS method and the multiphase method. The cost models for these three methods are also presented. The experimental results show that the basic-cycle calculation technique outperforms the PITFALLS method and the multiphase method for most test samples 相似文献

19.

A synchronized design technique for efficient data distribution

《Computers in human behavior》2014

One of the important features of database fragmentation and allocation techniques is the fact that they depend not only on the entries of a database relation, but also on their empirical frequencies of use. Distributed processing is an effective way to improve performance of database systems. However, for a Distributed Database System (DDBS) to function efficiently, fragments of the database need to be allocated carefully at various sites across the relevant communications network. Therefore, fragmentation and proper allocation of fragments across network sites is considered as a key research area in distributed database environment. However, fragments allocation to the most appropriate sites is not an easy task to perform. This paper proposes a synchronized horizontal fragmentation, replication and allocation model that adopts a new approach to horizontally fragment a database relation based on attribute retrieval and update frequency to find an optimal solution for the allocation problem. A heuristic technique to satisfy horizontal fragmentation and allocation using a cost model to minimize the total cost of distribution is developed. Experimental results are consistent with the hypothesis and confirm that the proposed model can efficiently solve dynamic fragmentation and allocation problem in a distributed relational database environment. 相似文献

20.

Hash-bucket search: A fast technique for searching an english spelling dictionary

Douglas Comer Vincent Y. Shen 《Software》1982,12(7):669-682

When a document is prepared using a computer system, it can be checked for spelling errors automatically and efficiently. This paper reviews and compares several methods for searching an English spelling dictionary. It also presents a new technique, hash-bucket search, for searching a static table in general, and a dictionary in particular. Analysis shows that with only a small amount of space beyond that required to store the keys, the hash-bucket search method has many advantages over existing methods. Experimental results with a sample dictionary using double hashing and the hash-bucket techniques are presented. 相似文献