Similar Literature
 Found 20 similar documents (search time: 125 ms)
1.
In Part 2 of the Advanced Audio Video coding Standard (AVS-P2), many efficient coding tools are adopted in motion compensation, such as new motion vector prediction, symmetric matching, and quarter-pixel precision interpolation. However, these new features enormously increase the computational complexity and the memory bandwidth requirement, which makes motion compensation a difficult component in the implementation of an AVS HDTV decoder. This paper proposes an efficient motion compensation architecture for the AVS-P2 video standard up to Level 6.2 of the Jizhun Profile. It has a macroblock-level pipelined structure consisting of an MV predictor unit, a reference fetch unit and a pixel interpolation unit. The proposed architecture exploits the parallelism in the AVS motion compensation algorithm to accelerate operations and uses a dedicated design to optimize memory access. It has been integrated in a prototype chip fabricated with TSMC 0.18-μm CMOS technology, and the experimental results show that this architecture can achieve real-time AVS-P2 decoding for HDTV 1080i (1920 × 1088, 4:2:0, 60 fields/s) video. The design can work at a frequency of 148.5 MHz and the total gate count is about 225K.

2.
Image processing is a memory-access-intensive application and is applied in many fields. Logic operations are very simple operations in image processing. During these operations, memory access takes up the majority of the total time consumed, which puts great pressure on memory access speed and bandwidth. However, in the traditional von Neumann architecture, memory access is the inherent bottleneck of the system; that is, the speed of memory’s data supply is far lower than the data request rate of the processor. The memristor is considered to be the fourth circuit element after the resistor, capacitor and inductor. It has the capacity for both processing and memory, which offers a new idea for solving the "memory wall" problem. In this paper, the memristor is used to build an architecture combining computing and memory, where the memory has the ability to handle some simple image processing operations. This architecture can effectively reduce memory reads and writes, which saves memory bandwidth and thus improves the efficiency of the system. Logic operations on images are considered in this paper to validate the architecture. The experimental results and theoretical analysis indicate that the architecture can reduce memory access effectively.

3.
A Frame Based Architecture for Information Integration in CIMS   (Cited by: 1 in total; 0 self-citations; 1 by others)
This paper formulates an architecture for information integration in computer integrated manufacturing systems (CIMS). The architecture takes the frame structure as the single link among applications and between applications and physical storage. All the advantages of form-feature-based integrated systems can be found in the frame-based architecture, as the frame structure here takes form features as its primitives. Other advantages that cannot be found in form-feature-based systems also appear in frame-based architectures: for example, default knowledge and dynamic domain knowledge can be attached to frames, and the frame structure is easy to change and extend. This is because the frame structure is a typical knowledge representation scheme in artificial intelligence and has attracted much research and interest.

4.
Graphics processing units (GPUs) have an SIMD architecture and have been widely used recently as powerful general-purpose co-processors for the CPU. In this paper, we investigate efficient GPU-based data cubing because the most frequent operation in data cube computation is aggregation, which is an expensive operation well suited for SIMD parallel processors. H-tree is a hyper-linked tree structure used in both top-k H-cubing and the stream cube. Fast H-tree construction, update and real-time query response are crucial in many OLAP applications. We design highly efficient GPU-based parallel algorithms for these H-tree based data cube operations. This has been made possible by taking effective methods, such as parallel primitives for segmented data and efficient memory access patterns, to achieve load balance on the GPU while hiding memory access latency. As a result, our GPU algorithms can often achieve more than an order of magnitude speedup when compared with their sequential counterparts on a single CPU. To the best of our knowledge, this is the first attempt to develop parallel data cubing algorithms on graphics processors.

5.
In this paper, a survey of octree representation and its applications in CAD is presented. The octree representation may be categorized as pure octree representation and polytree (or extended octree); the latter is actually a boundary representation decomposed by an octree. The linear octree, a variant of the regular octree representation, has the advantage of saving memory space. The mapping between Cartesian coordinates and node addresses in a linear octree is discussed. Then, algorithms for converting a boundary representation of a 3D object into an octree are investigated, and major approaches for transforming an octree-encoded object are presented. After that, some of the applications of octree representation in CAD are listed, in particular the applications in solid modeling, in accelerating ray tracing and in generating meshes for FEM.
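As an illustration of the coordinate-to-address mapping mentioned in this abstract, the following is a minimal sketch that assumes Morton (Z-order) bit interleaving, a common choice for linear octrees; the survey itself may describe a different addressing scheme. The function names `encode_octree_address` and `decode_octree_address` are hypothetical.

```python
def encode_octree_address(x, y, z, depth):
    """Map integer cell coordinates to a linear-octree key.

    Assumption: Morton (Z-order) interleaving. Each group of 3 bits in the
    key is the child octant (0..7) chosen at one level, root level first.
    """
    key = 0
    for level in range(depth):
        shift = depth - 1 - level  # most significant coordinate bit first
        octant = (
            (((z >> shift) & 1) << 2)
            | (((y >> shift) & 1) << 1)
            | ((x >> shift) & 1)
        )
        key = (key << 3) | octant
    return key


def decode_octree_address(key, depth):
    """Inverse mapping: recover (x, y, z) from a linear-octree key."""
    x = y = z = 0
    for level in range(depth):
        octant = (key >> (3 * (depth - 1 - level))) & 0b111
        x = (x << 1) | (octant & 1)
        y = (y << 1) | ((octant >> 1) & 1)
        z = (z << 1) | ((octant >> 2) & 1)
    return x, y, z
```

Under this assumed scheme, cells that share a key prefix share an ancestor node, so a sorted list of keys can stand in for the pointer-based tree.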

6.
CM (content management) is a strategic discipline that should support the information assets of a company. Although there are some technological instruments structuring this work, the methods that have been used need improvement to be put to better use. In this sense, IA (information architecture), as a process that helps users to manage and find information, can contribute to the organization of these assets, allowing better identification and categorization of information as well as improving website navigation on intranets and the Internet. This article introduces part of a research project that deals with the use of IA for developing and structuring the Dataprev project on content management for the Brazilian social security system.

7.
The combination of growing transistor counts and a limited power budget within a silicon die leads to the utilization wall problem (a.k.a. "Dark Silicon"); that is, only a small fraction of the chip can run at full speed during a given period of time. Designing accelerators for specific applications or algorithms is considered to be one of the most promising approaches to improving energy efficiency. However, most current design methods for accelerators are dedicated to certain applications or algorithms, which greatly constrains their applicability. In this paper, we propose a novel general-purpose many-accelerator architecture. Our contributions are two-fold. Firstly, we propose to cluster dataflow graphs (DFGs) of hotspot basic blocks (BBs) in applications. The DFG clusters are then used for accelerator design, because a DFG is the largest program unit that is not specific to a certain application. We analyze 17 benchmarks in SPEC CPU 2006, acquire over 300 hotspot DFGs by using the LLVM compiler tool, and divide them into 15 clusters based on graph similarity. Secondly, we introduce a function instruction set architecture (FISC) and illustrate how DFG accelerators can be integrated with a processor core and how they can be used by applications. Our results show that the proposed DFG clustering and FISC design can speed up SPEC benchmarks by 6.2X on average.

8.
This paper proposes a value compression memory architecture for QRS detection in ultra-low-power ECG sensor nodes. Based on the exploration of value spatial locality in the most critical preprocessing stage of the ECG algorithm, a cost-efficient compression strategy is proposed that reorganizes several adjacent sample values into a base value with several displacements. The displacements are quantized at half or quarter scale; as a result, the storage size is reduced. The memory architecture saves memory space by storing compressed data with value spatial locality in a compressed memory section and by using a small, uncompressed memory section as backup to store the uncompressed data when a value spatial locality miss occurs. Furthermore, a low-power access strategy is proposed. An embodiment of the proposed memory architecture has been evaluated using the MIT/BIH database; the proposed memory architecture and low-power access strategy achieve memory space savings of 32.5% and a 68.1% power reduction with a negligible performance reduction of 0.2%.
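The base-plus-displacement idea in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the paper's design: the base is taken to be the first sample of a group, displacements are stored as signed values of `disp_bits` bits (8 bits would correspond to "half scale" of a 16-bit sample), and a group that does not fit falls back to raw storage, standing in for the uncompressed backup section. The names `compress_group` and `decompress_group` are hypothetical.

```python
def compress_group(samples, disp_bits=8):
    """Try to store a group of adjacent samples as base + small displacements.

    Returns ("compressed", base, displacements) when every displacement fits
    in a signed `disp_bits`-bit field; otherwise returns ("raw", samples),
    which models a value-spatial-locality miss.
    """
    base = samples[0]  # assumption: first sample of the group is the base
    lo = -(1 << (disp_bits - 1))
    hi = (1 << (disp_bits - 1)) - 1
    displacements = [s - base for s in samples[1:]]
    if all(lo <= d <= hi for d in displacements):
        return ("compressed", base, displacements)
    return ("raw", list(samples))


def decompress_group(record):
    """Rebuild the original samples from either record form."""
    if record[0] == "compressed":
        _, base, displacements = record
        return [base] + [base + d for d in displacements]
    return list(record[1])
```

For example, `compress_group([1000, 1003, 998, 1010])` yields a compressed record with base 1000 and displacements `[3, -2, 10]`, while a group containing a jump larger than the displacement range is kept raw.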

9.
The decades-old synchronous memory bus interface has restricted many innovations in the memory system, which is facing various challenges (or walls) in the era of multi-core and big data. In this paper, we argue that a message-based interface should be adopted to replace the traditional bus-based interface in the memory system. A novel message interface based memory system called MIMS is proposed. The key innovation of MIMS is that processors communicate with the memory system through a universal and flexible message packet interface. Each message packet is allowed to encapsulate multiple memory requests (or commands) and additional semantic information. The memory system is made more intelligent and active by equipping it with a local buffer scheduler, which is responsible for processing packets, scheduling memory requests, preparing responses, and executing specific commands with the help of semantic information. Under the MIMS framework, many previous innovations in memory architecture as well as new optimization opportunities such as address compression and continuous request combination can be naturally incorporated. The experimental results on a 16-core cycle-detailed simulation system show that, with accurate granularity messages, MIMS can improve system performance by 53.21% and reduce energy delay product (EDP) by 55.90%. Furthermore, it can improve effective bandwidth utilization by 62.42% and reduce memory access latency by 51% on average.

10.
The quantity of computer applications is increasing dramatically as the computer industry prospers. Meanwhile, even a single application has different performance and power requirements in different scenarios. Although various processors with different architectures have emerged to fit the various applications in different scenarios, it is impossible to design a dedicated processor that meets all the requirements. Furthermore, dealing with uncertain processors significantly aggravates the burden on programmers and system integrators to achieve specific performance/power targets. In this paper, we propose elastic architecture (EA) to provide a uniform computing platform with high elasticity, i.e., the ratio of worst-case to best-case performance/power/performance-power trade-off, which can meet different requirements for different applications. It is achieved by dynamically adjusting architecture parameters (instruction set, branch predictor, data path, memory hierarchy, concurrency, status/control, and so on) on demand. The elasticity of our prototype implementation of EA, Sim-EA, ranges from 3.31 to 14.34, with an arithmetic mean of 5.41, for the SPEC CPU2000 benchmark suites, which provides great flexibility to fulfill the different performance and power requirements in different scenarios. Moreover, Sim-EA can reduce the EDP (energy-delay product) by 31.14% on arithmetic average compared with a baseline fixed architecture. In addition, subsequent experiments indicate a negative correlation between the lengths of application intervals and their elasticities.

11.
In this paper, we propose a multi-core processor based on a system-on-chip (SoC) architecture and built from configurable processors via Tensilica Xtensa LX2. The purpose of this paper is to describe the heterogeneous configurable dual-core processor, in which one core is responsible for host operating control of the system and the other serves as an extension for digital signal processing applications. Each designed core not only owns its local memory but also shares a common data memory. We also add virtual memory to the proposed processor; this additional memory allows the processor to handle more complex application programs easily, while the two cores can share a unified data memory across different kinds of tasks simultaneously. An advantage of the proposed structure is that it avoids much of the hard-wiring of memory and reduces that of the interface. For bus management, a single bus is provided as the interface. An arbitration mechanism is added to this bus system to handle the communication between cores and to assign priorities to access requests, in order to ensure that the cores operate synchronously.

12.
The many-accelerator architecture, mostly composed of general-purpose cores and accelerator-like function units (FUs), has become a great alternative to homogeneous chip multiprocessors (CMPs) for its superior power efficiency. However, the emerging many-accelerator processor shows a much more complicated memory access pattern than general purpose processors (GPPs) because the abundant on-chip FUs tend to generate highly concurrent memory streams with distinct locality and bandwidth demands. The disordered memory streams issued by diverse accelerators exhibit mutual-interference behavior and cannot be efficiently handled by the orthodox main memory interface, which provides an inflexible data fetching mode. Unlike traditional DRAM memory, our proposed Aggregation Memory System (AMS) can adapt to the characterized memory streams from different FUs, because it provides the FUs with different data fetching sizes and protects their locality in memory access by intelligently interleaving their data to memory devices through sub-rank binding. Moreover, AMS can batch the requests without sub-rank conflicts into a read burst with our optimized memory scheduling policy. Experimental results from trace-based simulation show both a conspicuous performance boost and energy savings brought by AMS.

13.
This paper presents a novel architecture of iterative receivers with two layers of iterations for turbo coded multiple-input and multiple-output orthogonal frequency-division multiplexing (MIMO-OFDM) systems, where soft messages are passed not only between the MIMO detector and the turbo decoder, but also between the two component decoders within the turbo decoder. We first derive the factor graph representation of a turbo coded system as a basic building block for developing the iterative receivers. Then, a new soft message passing schedule over the factor graph is proposed, resulting in the proposed dual-turbo receiver architecture (DTRA). In DTRA, the MIMO detector and the turbo decoder work concurrently, and the soft messages for both layers of iterations are updated instantaneously, instead of the block-based exchange of soft messages in conventional iterative receivers. In so doing, the processing latency can be greatly reduced while low computational complexity can be achieved.

14.
We present a new data structure for the representation of an integrated circuit layout. It is a modified HV/VH tree using arrays as the primary container in bisector lists and leaf nodes. By grouping and sorting objects within these arrays together with a customized binary search algorithm, our new data structure provides excellent performance in both memory usage and region query speed. Experimental results show that in comparison with the original HV/VH tree, which has been regarded as the best layout data structure to date, the new data structure uses much less memory and can become 30% faster on region query.

15.
It has been shown that remote monitoring of pulmonary activity can be achieved using ultra-wideband (UWB) systems, which shows promise in home healthcare, rescue, and security applications. In this paper, we first present a multi-ray propagation model for a UWB signal that travels through the human thorax and is reflected at the air/dry-skin/fat/muscle interfaces. A geometry-based statistical channel model is then developed for simulating the reception of UWB signals in the indoor propagation environment. This model enables replication of time-varying multipath profiles due to the displacement of a human chest. Subsequently, a UWB distributed cognitive radar system (UWB-DCRS) is developed for the robust detection of chest cavity motion and the accurate estimation of respiration rate. The analytical framework can serve as a basis for the planning and evaluation of future measurement programs. We also provide a case study on how the antenna beamwidth affects the estimation of respiration rate based on the proposed propagation models and system architecture.

16.
A family of piecewise rational quintic interpolations is presented. Each interpolation of the family, which is identified uniquely by the value of a parameter α_i, is of C^2 continuity without solving a system of consistency equations for the derivative values at the knots, and can be expressed by the basis functions. The interpolant is of O(h^r) accuracy when f(x) ∈ C^r[a, b], and the error fluctuates only slightly under a large change of the parameter α_i, which means the interpolation is stable with respect to the parameter. The interpolation can preserve the shape properties of the given data, such as monotonicity and convexity, and a proper choice of the parameter α_i is given.

17.
The management of memory coherence is an important problem in distributed shared memory (DSM) systems. In a cache-based coherent DSM system using a linked list structure, the key to maintaining coherence and improving system performance is how to manage the owner in the linked list. This paper presents the design of a new management protocol, NONH (New-Owner New-Head), and its performance evaluation. The analysis results show that this protocol can improve the scalability and performance of a coherent DSM system using a linked list. It is also suitable for managing cache coherency in a tree-like hierarchical architecture.

18.
VoD (video on demand) service is regarded as one of the most important services of the next decade. It requires high speed and huge bandwidth to guarantee the QoS (quality of service). EPON (Ethernet passive optical network) is regarded as one of the best solutions for the access network, due to its high speed and low cost. The star-ring EPON architecture is an evolution of EPON which provides better local transmission and fault tolerance capability. In addition, the CDN (content delivery network) mechanism, in which the video content is cached at a location closer to the user, is a widely used methodology to reduce the latency of VoD service. Therefore, in this paper, we propose a mechanism which combines the advantages of the star-ring based EPON architecture and the CDN mechanism to improve QoS. We design a new Sub-OLT (optical line terminal) which includes storage to hold video files and serve them locally. Thus, it can reduce the bandwidth used between the OLT and the ONU (optical network unit). Simulation results show that our proposed mechanism can improve the system performance and QoS in terms of packet delay and jitter.

19.
This paper presents a GPU-based real-time raycasting algorithm for piecewise algebraic surfaces in terms of tensor product B-splines. 3D DDA and depth peeling algorithms are employed to traverse the piecewise surface patches along each ray. The intersection between the ray and the patch is reduced to the root-finding problem of a univariate Bernstein polynomial. The polynomial is obtained via interpolation at Chebyshev sampling points. An iterative and unconditionally convergent algorithm called Bézier point insertion is proposed to find the roots of the univariate polynomials. Bézier point insertion is robust and suitable for the SIMD architecture of the GPU. Experimental results show that the proposed root-finding algorithm performs better than other root-finding algorithms, such as Bézier clipping and B-spline knot insertion. Our rendering algorithm can display thousands of piecewise algebraic patches of degrees 6–9 in real time and can achieve semi-transparent rendering interactively.
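To make the root-finding step in this abstract concrete, here is a small sketch that evaluates a univariate polynomial in Bernstein form with de Casteljau's algorithm and locates a sign-change root on [0, 1] by plain bisection. Bisection is a stand-in chosen for brevity; it is not the Bézier point insertion algorithm the paper proposes, and it finds at most one root per bracketing interval. The function names are hypothetical.

```python
def de_casteljau(coeffs, t):
    """Evaluate a polynomial given by its Bernstein coefficients at t."""
    pts = list(coeffs)
    n = len(pts)
    for r in range(1, n):
        for i in range(n - r):
            # Repeated linear interpolation between neighbouring coefficients.
            pts[i] = (1 - t) * pts[i] + t * pts[i + 1]
    return pts[0]


def bisect_root(coeffs, lo=0.0, hi=1.0, tol=1e-12):
    """Find one root in [lo, hi] by bisection, or None if no sign change.

    Stand-in technique (plain bisection), not Bezier point insertion.
    """
    f_lo = de_casteljau(coeffs, lo)
    f_hi = de_casteljau(coeffs, hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0:
        return None  # endpoints have the same sign: no bracketed root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = de_casteljau(coeffs, mid)
        if f_mid == 0.0:
            return mid
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)
```

With Bernstein coefficients `[-1, 0, 1]` the polynomial equals `2t - 1`, so the bracketed root lies at `t = 0.5`; coefficients that are all positive, such as `[1, 2, 3]`, give no sign change and the function returns `None`.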

20.
Motivated by the converse Lyapunov technique for investigating converse results of semistable switched systems in control theory, this paper utilizes a constructive induction method to identify a cost function for the performance gauge of an average, multi-cue multi-choice (MCMC), cognitive decision making model over a switching time interval. It shows that such a constructive cost function can be evaluated through an abstract energy called a Lyapunov function at initial conditions. Hence, the performance gauge problem for the average MCMC model becomes the issue of finding such a Lyapunov function, leading to a possible way of designing corresponding computational algorithms via iterative methods such as adaptive dynamic programming. In order to reach this goal, a series of technical results are presented for the construction of such a Lyapunov function, and its mathematical properties are discussed in detail. Finally, a major result guaranteeing the existence of such a Lyapunov function is rigorously proved.
