Similar Articles
 20 similar articles found.
1.
2.
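Communication and parallel execution on high-performance multi-core DSPs are key to multi-core system design. This paper analyzes the resource consumption of each module of a video object tracking algorithm and proposes a parallel-computing approach for each part of the algorithm. It also proposes an improved binarized-mask method for extracting the background image and an auxiliary parallel structure for balancing the load. It then studies the inter-process communication (IPC) synchronization mechanism for multi-core DSP communication and uses a pipelined parallel structure to implement a three-core synchronous parallel processing system. Experiments measured the communication latency. The object tracking program was partitioned across three DSP cores for parallel processing, and the system met the real-time requirement.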
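The Python sketch below is a rough illustration of the three-core pipelined structure. One thread per stage stands in for a DSP core, and bounded queues stand in for the IPC channels between cores. The stage functions passed in are arbitrary placeholders. The sketch does not model the paper's actual module split, its DSP platform, or its IPC API.

```python
import queue
import threading

_DONE = object()  # unique end-of-stream marker


def _stage(fn, inbox, outbox):
    """One pipeline stage (one stand-in 'core'): take an item, process it, pass it on."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))


def run_pipeline(frames, stages, depth=2):
    """Stream frames through the stage functions, one thread per stage.

    Queues hold at most `depth` items, so a slow stage applies back-pressure
    to the stages before it instead of letting work pile up unboundedly.
    """
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]

    def feed():
        # A separate feeder thread avoids deadlock when inputs exceed queue capacity.
        for frame in frames:
            queues[0].put(frame)
        queues[0].put(_DONE)

    threads.append(threading.Thread(target=feed))
    for t in threads:
        t.start()

    results = []
    while (item := queues[-1].get()) is not _DONE:
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Take three placeholder stages, `[lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]`. Calling `run_pipeline(range(5), ...)` with these stages streams each frame through all three in order. Each stage reads from a FIFO queue, so output order matches input order.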

3.
This study presents a two-dimensional (2D) discrete cosine transform (DCT) hardware architecture dedicated to High Efficiency Video Coding (HEVC) on field-programmable gate array (FPGA) platforms. The proposed methodology carries out the 2D-DCT computation efficiently, in a way that fits the internal components and characteristics of FPGA resources. A four-stage circuit architecture is developed to implement the methodology. The architecture supports variable DCT sizes of 4 × 4, 8 × 8, 16 × 16, and 32 × 32. It has been implemented in SystemVerilog and synthesized on various FPGA platforms. Compared with existing related work in the literature, the proposed architecture demonstrates significant advantages in hardware cost and performance. It can sustain real-time 4K@30 fps ultra-high-definition (UHD) TV encoding while reducing hardware cost by 31–64%.
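2D-DCT hardware commonly exploits the row-column (separable) decomposition. The Python sketch below illustrates it: a 1D DCT is applied down the columns and then along the rows, for the four HEVC block sizes. It is a floating-point orthonormal DCT-II used for illustration only. HEVC itself specifies integer approximations of the DCT basis, and the sketch does not model the paper's four-stage FPGA architecture.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2d(block):
    """Separable 2D DCT: Y = C X C^T, i.e. 1D DCTs down the columns, then along the rows."""
    n = len(block)
    if n not in (4, 8, 16, 32) or any(len(row) != n for row in block):
        raise ValueError("expected a square 4x4, 8x8, 16x16 or 32x32 block")
    c = dct_matrix(n)
    return matmul(matmul(c, block), transpose(c))


def idct2d(coeffs):
    """Inverse of dct2d: X = C^T Y C, valid because C is orthonormal."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)
```

Consider a flat 4 × 4 block of value 10. All of its energy lands in the DC coefficient, which equals 40 (the block sum divided by n). Because the 1D transform is reused for every row and column, hardware can share one 1D datapath across both passes and across block sizes.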

4.
In this paper, the back-propagation neural network (BPNN) technique and the just-noticeable difference (JND) model are incorporated into a block-wise discrete cosine transform (DCT)-based scheme to achieve effective blind image watermarking. To form a block structure in the DCT domain, we partition a host image into non-overlapping 8 × 8 blocks and apply the DCT to each block separately. Using certain DCT coefficients over a 3 × 3 grid of blocks, the BPNN can adequately predict designated coefficients inside the central block. Watermarking thus becomes a process of adjusting the relationship between the intended coefficients and their BPNN predictions, subject to the JND. Experimental results show that the proposed scheme withstands a variety of image processing attacks. Compared with two other schemes that also exploit inter-block correlations, the proposed one exhibits superior robustness and imperceptibility at the same payload capacity.
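The sketch below illustrates the block structure described above: non-overlapping 8 × 8 blocks, with a 3 × 3 neighborhood of blocks supplying the inputs for predicting the central block. For illustration only, each block is summarized by its DCT DC coefficient, which for an orthonormal 2D DCT-II equals the block sum divided by the block size. That choice is an assumption. The coefficients the paper actually uses, the BPNN itself, and the JND adjustment rule are not reproduced.

```python
def partition(image, size=8):
    """Split an H x W image (2D list, H and W multiples of `size`) into a grid of blocks."""
    h, w = len(image), len(image[0])
    if h % size or w % size:
        raise ValueError("image dimensions must be multiples of the block size")
    return [
        [[row[bx * size:(bx + 1) * size] for row in image[by * size:(by + 1) * size]]
         for bx in range(w // size)]
        for by in range(h // size)
    ]


def dc_coefficient(block):
    """DC term of the orthonormal 2D DCT-II of an n x n block: sum(block) / n."""
    return sum(map(sum, block)) / len(block)


def neighbor_features(coeff_grid, by, bx):
    """Coefficients of the 8 blocks around (by, bx) in a 3 x 3 grid, center excluded.

    In a scheme like the one above, values of this kind would feed a predictor
    (hypothetical here) for the corresponding coefficient of the center block.
    """
    if not (0 < by < len(coeff_grid) - 1 and 0 < bx < len(coeff_grid[0]) - 1):
        raise ValueError("border blocks lack a full 3 x 3 neighborhood")
    return [coeff_grid[by + dy][bx + dx]
            for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
```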

5.
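Multi-core systems are the main direction of today's processor development. Scheduling tasks sensibly and efficiently, so that all processing cores stay usefully busy, is an important topic in multi-core research. The key difficulty in multi-core task scheduling lies in uncovering task parallelism. To address this problem, this paper borrows the idea of instruction-level multithreading and, taking into account the coarse-grained nature of tasks in multi-core systems, proposes a novel coarse-grained multithreaded multi-core architecture. The work establishes a multithreaded fetch policy, a resource-allocation policy, and a thread-switching mechanism, and also completes the circuit design of the multithreaded scheduler for this architecture. A coarse-grained multi-core computing platform was built around this scheduler and implemented in hardware on an FPGA. Experimental results show that, compared with single-threaded operation, the design increases the task parallelism of the multi-core computing platform by about 34.29% on average.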

6.
Block matching motion estimation is the heart of a video coding system. It yields a high compression ratio, but it is time-consuming and computationally intensive. Many fast-search block matching algorithms have been developed to minimize the number of search positions and speed up computation. However, they do not consider how they can be implemented efficiently in hardware. In this paper, we propose an efficient hardware architecture for the fast line diamond parallel search (LDPS) algorithm with variable block size motion estimation (VBSME) for the H.264/AVC video coding system. The design is described in VHDL and synthesized for an Altera Stratix III FPGA and for TSMC 0.18 μm standard cells. The architecture reaches a throughput of up to 78 million pixels per second at an 83.5 MHz clock and uses only 28 kgates when mapped to standard cells. Finally, a system-on-a-programmable-chip (SoPC) implementation and validation of the proposed design as an IP core is presented using an embedded video system.
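For readers unfamiliar with block matching, the Python sketch below shows a generic small-diamond search driven by the sum of absolute differences (SAD). It is a plain software illustration of the fast-search idea. It is not the LDPS algorithm, and it does not model variable block sizes or the hardware. The block size, search range, and function names are assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref`."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))


def in_bounds(ref, x, y, n):
    return x >= 0 and y >= 0 and y + n <= len(ref) and x + n <= len(ref[0])


def diamond_search(cur, ref, bx, by, n=4, search_range=8):
    """Greedy small-diamond search.

    Evaluate the 4 neighbors of the current best vector, move to the cheapest one,
    and stop when the center is not beaten. Returns (motion_vector, sad).
    """
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    while True:
        cx, cy = best
        improved = False
        for dx, dy in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
            if max(abs(dx), abs(dy)) > search_range or not in_bounds(ref, bx + dx, by + dy, n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost, improved = (dx, dy), cost, True
        if not improved:
            return best, best_cost
```

A greedy search of this kind evaluates far fewer positions than a full search. The trade-off is that it can stop in a local minimum of the SAD surface, which is one reason practical algorithms combine several search patterns.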

7.
A dual-loop shunt regulator using current-sensing feedback is proposed in this paper. The architecture adopts voltage and current loops to improve the transient response of the shunt regulator. The maximum output current is 180 mA at a 1.8 V output. Moreover, the architecture can suppress stray effects originating from the power supply. The prototype is fabricated in the Taiwan Semiconductor Manufacturing Company (TSMC) 0.35-μm CMOS 2P4M process. The active area is only 579 × 355 μm².

8.
Enabled by emerging three-dimensional (3D) integration technologies, 3D integrated computing platforms that stack high-density DRAM die(s) with a logic die appear attractive for memory-hungry applications such as multimedia signal processing. This paper considers the design of a motion estimation accelerator within a 3D logic-DRAM integrated heterogeneous multi-core system framework. We develop a DRAM organization and image-frame storage strategy specifically geared to motion estimation. This design strategy seamlessly supports various motion estimation algorithms and variable block sizes with high energy efficiency. Using a DRAM performance modeling/estimation tool and ASIC design at 65 nm, we demonstrate the energy efficiency of such 3D integrated motion estimation accelerators through a case study on HDTV multi-frame motion estimation.

9.
For mobile intelligent robot applications, an 81.6 GOPS object recognition processor is implemented. The chip architecture and hardware features are determined from an analysis of the target application. The proposed processor supports both task-level and data-level parallelism. Ten processing elements are integrated for task-level parallelism, and single instruction, multiple data (SIMD) instructions are added to exploit data-level parallelism. A memory-centric network-on-chip (NoC) is proposed to support efficient pipelined task execution across the ten processing elements. It also provides coherence and consistency schemes tailored to 1-to-N and M-to-1 data transactions in a task-level pipeline. For further performance gain, a visual image processing memory is also implemented. The chip is fabricated in 0.18-μm CMOS technology and computes the key-point localization stage of SIFT object recognition twice as fast as a 2.3 GHz Core 2 Duo processor.

10.
Local processing, the dominant type of processing in image and video applications, requires huge computational power to be performed in real time. However, processing locality in space and/or time makes it possible to exploit data parallelism and data reuse. These properties can be exploited to achieve high-performance image and video processing on multi-core processors, but suitable models and parallel algorithms must first be developed, in particular for non-shared-memory architectures. This paper proposes an efficient and simple model for local image and video processing on non-shared-memory multi-core architectures. The model adopts a single-program, multiple-data (SPMD) approach in which data is distributed, processed, and reused optimally with respect to the data size, the number of cores, and the local memory capacity. The model was evaluated experimentally by developing local video processing algorithms, namely advanced video motion estimation and in-loop deblocking filtering, and programming them on the Cell Broadband Engine multi-core processor. Based on this experience, the paper also addresses the main challenges of vectorization and of reducing branch mispredictions and computational load imbalance. The limits and advantages of regular and adaptive algorithms are also discussed. Experimental results show that the proposed model is well suited to local video processing. Real-time performance is achieved even for the most demanding parts of advanced video coding: full-pixel motion estimation runs on high-resolution video (720×576 pixels) at 30 frames per second, with large search areas and five reference frames.
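The Python sketch below illustrates an SPMD data distribution of this kind. Each core owns a contiguous band of rows and also loads a halo of neighboring rows, because local operations such as a deblocking filter or a motion-search window need them. The halo width, the even row split, and the local-memory check are illustrative assumptions, not the paper's exact model. For reference, a Cell SPE has a 256 KB (262144-byte) local store, although in practice code and buffers occupy part of it.

```python
def partition_rows(height, cores, halo):
    """Split rows [0, height) as evenly as possible across `cores`.

    Returns one (own_start, own_end, load_start, load_end) tuple per core:
    the core writes rows [own_start, own_end) but must load the wider range
    [load_start, load_end), which includes `halo` extra rows on each side.
    """
    base, extra = divmod(height, cores)
    parts, start = [], 0
    for c in range(cores):
        end = start + base + (1 if c < extra else 0)
        parts.append((start, end, max(0, start - halo), min(height, end + halo)))
        start = end
    return parts


def max_rows_per_chunk(local_bytes, row_bytes, halo):
    """Largest number of owned rows whose halo-extended band fits in local memory.

    If a core's band exceeds this, it has to be processed in several chunks.
    """
    rows = local_bytes // row_bytes - 2 * halo
    if rows < 1:
        raise ValueError("local memory cannot hold even one row plus its halo")
    return rows
```

For example, 576 rows split across 6 cores with a 2-row halo gives each core 96 owned rows. The first core loads rows 0 to 97, and the last core loads rows 478 to 575. The overlap is the price of avoiding inter-core communication during the local operation.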

11.
Two-dimensional discrete cosine transforms are used as the core transforms in all profiles of the H.264/Advanced Video Coding (AVC) standard. This paper presents a resource-sharing implementation of high-throughput 4 × 4 and 8 × 8 forward and inverse integer transforms for high-definition H.264. It is shown that the 4 × 4 forward/inverse transform can be obtained from the 8 × 8 forward/inverse transform through selective data input and data rearrangement at intermediate stages. The fast 8 × 8 forward and inverse transforms are implemented using matrix decomposition and matrix operations such as the Kronecker product and direct sum. The proposed implementation requires no transpose memory and has a dual-clocked pipeline structure. Compared with existing designs, the gate count is reduced by 27.7%. The maximum operating frequency is approximately 1.3 GHz. The throughput is 7 Gpixels/s for the 4 × 4 forward integer transform and 18.7 Gpixels/s for the 8 × 8 forward integer transform. Owing to its high throughput and low hardware cost, the proposed design can be used for real-time H.264/AVC high-definition processing.
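The H.264/AVC 4 × 4 forward core transform can be written as Y = C_f X C_f^T, with the integer matrix C_f defined in the standard and reproduced below. The matrix product reduces to an add/shift butterfly. The Python sketch checks the butterfly against the direct matrix form. It does not cover the post-scaling that is folded into quantization, the 8 × 8 transform, or the paper's resource-shared hardware.

```python
# H.264/AVC 4x4 forward core transform matrix C_f.
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def butterfly4(x):
    """1D forward core transform of 4 samples using only adds and doublings (shifts)."""
    x0, x1, x2, x3 = x
    s03, d03 = x0 + x3, x0 - x3
    s12, d12 = x1 + x2, x1 - x2
    return [s03 + s12, 2 * d03 + d12, s03 - s12, d03 - 2 * d12]


def forward4x4(block):
    """Y = CF @ X @ CF^T via row butterflies, then column butterflies."""
    rows = [butterfly4(r) for r in block]        # each row: X @ CF^T
    cols = [butterfly4(c) for c in zip(*rows)]   # each column: CF @ (...)
    return [list(r) for r in zip(*cols)]         # transpose back to row order


def forward4x4_matrix(block):
    """Direct evaluation of CF @ X @ CF^T, used to check the butterfly version."""
    t = [[sum(block[i][k] * CF[j][k] for k in range(4)) for j in range(4)] for i in range(4)]
    return [[sum(CF[i][k] * t[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
```

The software version transposes explicitly between the two passes. A hardware design that needs no transpose memory, as the abstract describes, must instead route intermediate results directly into the second pass.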

12.
Constructing on-chip or inter-silicon (inter-die/inter-chip) networks to connect multiple processors extends system capability and scalability. A key issue is implementing a flexible router that fits various application scenarios. This paper proposes a multi-mode adaptable router that supports both circuit and wormhole switching and provides flexible working strategies for the specific traffic patterns of diverse applications. The limitations of single-mode switched routers are shown first. The paper then explores algorithms that the proposed router can use to choose the proper working strategy in a specific network. We then present the performance improvement obtained by applying the mixed circuit/wormhole switching mode to different applications, and analyze image decoding as a case study. The multi-mode router has been implemented with different configurations in a 65 nm CMOS technology. The configuration with an 8-bit flit width is demonstrated together with a multi-core processor to show feasibility. Working at 350 MHz, the whole system consumes 22 mW on average.

13.
This paper proposes a signer-independent method for recognizing finger alphabets from backhand images. Input images are divided into fist-sign and non-fist-sign groups, which are analyzed and processed in different ways. Finger alphabets in the fist group are represented by a one-dimensional signal that describes the external hand boundary. The low- and high-frequency components of this signal, extracted by the discrete wavelet transform, are the key features for recognition. Owing to the hand's physical structure, non-fist-sign images are radically digitized into a 20 × 20 block mask in terms of hand geometry, and they can be recognized by the patterns of occupied blocks. Experimental results show that the proposed method differentiates twenty-three static finger alphabets in backhand images with high likelihood. On a significant multi-user dataset of fist signs, the method improves recognition accuracy by 27.86%. In addition, the statistical distribution of the area-level run-length algorithm outperforms previous forehand approaches by 89.38% in recognition accuracy.
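One common way to turn a hand boundary into a 1D signal is a centroid-distance signature of the boundary points. One level of the Haar wavelet transform can then split that signal into low- and high-frequency parts. The Python sketch below does both. The centroid-distance signature and the Haar wavelet are assumptions made for illustration, since the abstract does not state which boundary signal or wavelet the authors use.

```python
import math


def centroid_distance(boundary):
    """1D boundary signature: distance of each (x, y) boundary point from the centroid."""
    cx = sum(x for x, _ in boundary) / len(boundary)
    cy = sum(y for _, y in boundary) / len(boundary)
    return [math.hypot(x - cx, y - cy) for x, y in boundary]


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail).

    The approximation holds the low-frequency content and the detail holds the
    high-frequency content. Odd-length input is padded by repeating the last sample.
    """
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])
    s = 1 / math.sqrt(2)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail
```

Because the Haar transform is orthonormal, the combined energy of the two bands equals the energy of the input signal. A recognizer can therefore weight coarse shape (approximation) against fine boundary detail (detail) without any loss of information.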

14.
Owing to the wide diffusion of the JPEG coding standard, the image forensics community has devoted significant attention over the years to developing double JPEG (DJPEG) compression detectors. The ability to detect whether an image has been compressed twice provides paramount information for assessing image authenticity. Given the recent success of convolutional neural networks (CNNs) in many computer vision tasks, in this paper we propose using CNNs for aligned and non-aligned double JPEG compression detection. In particular, we explore the capability of CNNs to capture DJPEG artifacts directly from images. Results show that the proposed CNN-based detectors achieve good performance even with small images (i.e., 64 × 64), outperforming state-of-the-art solutions, especially in the non-aligned case. Good results are also achieved in the commonly recognized challenging case in which the first quality factor is larger than the second.

15.
Conventional data-aware SRAM structures consume unnecessary dynamic power during the read phase because of the read-half-select issue. In this paper, a 9T-based, read-half-select-disturb-free SRAM architecture with a cross-point data-aware write strategy is proposed. Using the proposed write-half-select and read-half-select disturb-free strategy, our 9T bitcell improves the read and write static noise margins (SNM) by 2.5× and 2.4×, respectively, compared with traditional bitcells. Furthermore, the proposed strategy and 9T bitcell reduce the bitline read power dissipation of the SRAM array by 5.14× compared with traditional SRAMs. Based on the proposed architecture, a 16 Kb SRAM is fabricated in 130 nm CMOS and is fully functional from 1.2 V down to 0.33 V. The minimum energy per cycle is 11.8 pJ at 0.35 V, and the power consumption at 0.33 V is 2.5 µW at 175 kHz. Compared with other works, the proposed SRAM has 1.5× lower total power and 4.2× lower leakage power.

16.
Optical Fiber Technology, 2013, 19(4): 325–329
We present a novel coaxial dual-core large-mode-area (LMA) fiber design for refractive index sensing. In a dual-core fiber there is resonant coupling between the two cores, and this coupling is strongly affected by the refractive index (RI) of the outermost region. The transmittance of the fiber therefore varies sharply with the refractive index of the surrounding medium. This characteristic of the proposed structure has been used to design an RI sensor. We analyze the structure using the transfer matrix method. Our numerical results show that the proposed sensor is highly sensitive, with a resolution of 2.0 × 10⁻⁶ around n_ex = 1.44376. The effect of the design parameters on the sensor's sensitivity has also been investigated.

17.
Exploiting specific properties of the algorithm, a high-throughput pipelined architecture is introduced to implement the H.264/AVC deblocking filter. The architecture was synthesized in 0.18 μm technology, with a clock frequency of 400 MHz and an area of 16.8 Kgates. It can filter 217 frames per second (fps) for Full-HD video and 55 fps for Ultra-HD video. The introduced architecture outperforms similar ones in frequency (1.8× to 4×), throughput (1.5× to 3.8×), and fps. Moreover, extensions to support different sample bit depths and chroma formats are included, and experimental results for different FPGA families are reported.
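For reference, the Python sketch below implements the H.264/AVC normal-strength (bS < 4) luma deblocking operation on one line of samples across a block edge. It follows the standard's equations as we understand them. The thresholds alpha, beta, and tc0 normally come from QP-indexed tables that are not reproduced here, so any concrete values passed in are arbitrary. The sketch omits the bS = 4 strong filter, chroma filtering, and the paper's pipelined architecture.

```python
def clip3(lo, hi, v):
    return max(lo, min(hi, v))


def filter_luma_edge(p, q, alpha, beta, tc0, bit_depth=8):
    """Normal (bS < 4) luma filter for one line across an edge.

    p = [p0, p1, p2] and q = [q0, q1, q2] are the samples on each side of the
    edge, nearest first. Returns the filtered (p, q); p2 and q2 are never modified.
    """
    p0, p1, p2 = p
    q0, q1, q2 = q
    # Skip filtering when the step looks like real image structure, not a blocking artifact.
    if not (abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta):
        return list(p), list(q)
    max_val = (1 << bit_depth) - 1
    ap, aq = abs(p2 - p0), abs(q2 - q0)
    tc = tc0 + (ap < beta) + (aq < beta)          # widen the clip range on smooth sides
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    new_p0 = clip3(0, max_val, p0 + delta)
    new_q0 = clip3(0, max_val, q0 - delta)
    avg = (p0 + q0 + 1) >> 1
    new_p1 = p1 + clip3(-tc0, tc0, (p2 + avg - (p1 << 1)) >> 1) if ap < beta else p1
    new_q1 = q1 + clip3(-tc0, tc0, (q2 + avg - (q1 << 1)) >> 1) if aq < beta else q1
    return [new_p0, new_p1, p2], [new_q0, new_q1, q2]
```

With p = [60, 60, 60], q = [70, 70, 70], alpha = 20, beta = 5, and tc0 = 2, the 10-level step is smoothed to [64, 62, 60] and [66, 68, 70]. A 90-level step, by contrast, exceeds alpha and is left untouched. Each line depends only on a few samples, so many lines can be filtered in parallel. This locality is the kind of algorithm property a pipelined architecture can exploit.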

18.
In this paper, an image tamper localisation scheme is proposed in which the authentication bits of each 2 × 2 image block are generated using chaotic maps. The scheme is further improved by a self-recovery method that restores tampered regions. To improve the quality of the recovered image, two different sets of restoration bits are generated for each block, and each set is embedded into a different, randomly selected block. The proposed tamper detection scheme performs better than several recently proposed schemes. The experimental results demonstrate the accuracy and fragility of the tamper detection scheme and the efficacy of the recovery method.
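The Python sketch below is a toy version of chaotic-map block authentication. Everything in it is a hypothetical assumption chosen for illustration. The logistic map serves as the chaotic map. Each 2 × 2 block gets two parity-based authentication bits, and these are placed in the least significant bits of the block's first two pixels. The abstract does not specify the paper's actual map, bit generation, or self-recovery bits.

```python
def logistic_bits(x0, r, count, burn_in=100):
    """Key-dependent bits from the logistic map x <- r * x * (1 - x), thresholded at 0.5.

    (x0, r) act as the secret key; r close to 4 keeps the map in its chaotic regime.
    """
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = r * x * (1 - x)
    bits = []
    for _ in range(count):
        x = r * x * (1 - x)
        bits.append(1 if x >= 0.5 else 0)
    return bits


def block_auth_bits(block, key_bits):
    """Two hypothetical authentication bits for a 2x2 block, computed from the
    upper 7 bits of each pixel (the LSBs are reserved for the watermark)."""
    msbs = [p >> 1 for row in block for p in row]
    a = (sum(msbs) & 1) ^ key_bits[0]
    b = (sum(bin(m).count("1") for m in msbs) & 1) ^ key_bits[1]
    return a, b


def embed(block, key_bits):
    """Write the two authentication bits into the LSBs of the first two pixels."""
    a, b = block_auth_bits(block, key_bits)
    out = [row[:] for row in block]
    out[0][0] = (out[0][0] & ~1) | a
    out[0][1] = (out[0][1] & ~1) | b
    return out


def verify(block, key_bits):
    """True if the stored LSBs still match bits recomputed from the upper bits."""
    return (block[0][0] & 1, block[0][1] & 1) == block_auth_bits(block, key_bits)
```

The authentication bits depend only on the upper seven bits of each pixel, so embedding them in the LSBs does not invalidate them. A change to the upper bits is caught whenever it flips one of the two parities. Other modifications slip through, which is one reason practical schemes use more authentication bits or stronger features.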

19.
Microelectronics Journal, 2007, 38(4–5): 620–624
Reconstructed surfaces on Sb-irradiated GaAs(0 0 1) formed by molecular beam epitaxy have been studied by in-situ scanning tunneling microscopy (STM). The reflection high-energy electron diffraction (RHEED) patterns showed a (2×3) [or weak (4×3)] structure. The step density was about five times higher than that of the GaAs(0 0 1)-c(4×4) surface. Swinging dimer rows were found along the [1  0] direction, and they did not appear to form a specific reconstruction. We propose two (2×3) structure models for these swinging dimers. First-principles calculations show that both models are stable, with an energy difference of 0.17 eV, indicating that the two structures coexist. Moreover, we propose three (4×3) reconstruction models based on these (2×3) models. Applying the electron counting rule to these models indicates an excess of electrons. Comparing STM images taken at two alternate bias polarities showed that many spots appear only in the empty-state image. These may be segregated Ga or Sb clusters and appear to be strongly related to the excess electrons.

20.
Intuitively, integrating information from multiple visual cues, such as texture, stereo disparity, and image motion, should improve performance on perceptual tasks such as object detection. On the other hand, the additional effort required to extract and represent information from additional cues may increase computational complexity. In this work, we show that a biologically inspired integrated representation of texture and stereo disparity information for a multi-view face detection task leads not only to improved detection performance but also to reduced computational complexity. Disparity information enables us to filter out 90% of image locations as less likely to contain faces. Performance improves because this filtering rejects 32% of the false detections made by a similar monocular detector at the same recall rate. Despite the additional computation required for disparity, our binocular detector takes only 42 ms to process a pair of 640×480 images, which is 35% of the time required by the monocular detector. We also show that this integrated detector is computationally more efficient than a detector with similar performance in which texture and stereo information are processed separately.

Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417