期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

MU-GAN: Facial Attribute Editing Based on Multi-Attention Mechanism

Ke Zhang Yukun Su Xiwang Guo Liang Qi Zhenbing Zhao 《IEEE/CAA Journal of Automatica Sinica》2021,8(9):1614-1626

Facial attribute editing has mainly two objectives: 1) translating image from a source domain to a target one, and 2) only changing the facial regions related to a target attribute and preserving the attribute-excluding details. In this work, we propose a multi-attention U-Net-based generative adversarial network (MU-GAN). First, we replace a classic convolutional encoder-decoder with a symmetric U-Net-like structure in a generator, and then apply an additive attention mechanism to build attention-based U-Net connections for adaptively transferring encoder representations to complement a decoder with attribute-excluding detail and enhance attribute editing ability. Second, a self-attention (SA) mechanism is incorporated into convolutional layers for modeling long-range and multi-level dependencies across image regions. Experimental results indicate that our method is capable of balancing attribute editing ability and details preservation ability, and can decouple the correlation among attributes. It outperforms the state-of-the-art methods in terms of attribute manipulation accuracy and image quality. Our code is available at https://github.com/SuSir1996/MU-GAN. 相似文献

2.

A Hybrid Ensemble Deep Learning Approach for Early Prediction of Battery Remaining Useful Life

下载免费PDF全文

Qing Xu Min Wu Edwin Khoo Zhenghua Chen Xiaoli Li 《IEEE/CAA Journal of Automatica Sinica》2023,10(1):177-187

Accurate estimation of the remaining useful life (RUL) of lithium-ion batteries is critical for their large-scale deployment as energy storage devices in electric vehicles and stationary storage. A fundamental understanding of the factors affecting RUL is crucial for accelerating battery technology development. However, it is very challenging to predict RUL accurately because of complex degradation mechanisms occurring within the batteries, as well as dynamic operating conditions in practical applications. Moreover, due to insignificant capacity degradation in early stages, early prediction of battery life with early cycle data can be more difficult. In this paper, we propose a hybrid deep learning model for early prediction of battery RUL. The proposed method can effectively combine handcrafted features with domain knowledge and latent features learned by deep networks to boost the performance of RUL early prediction. We also design a non-linear correlation-based method to select effective domain knowledge-based features. Moreover, a novel snapshot ensemble learning strategy is proposed to further enhance model generalization ability without increasing any additional training cost. Our experimental results show that the proposed method not only outperforms other approaches in the primary test set having a similar distribution as the training set, but also generalizes well to the secondary test set having a clearly different distribution with the training set. The PyTorch implementation of our proposed approach is available athttps://github.com/batteryrul/battery_rul_early_prediction. 相似文献

3.

A Survey on Negative Transfer

Wen Zhang Lingfei Deng Lei Zhang Dongrui Wu 《IEEE/CAA Journal of Automatica Sinica》2023,10(2):305-329

Transfer learning(TL) utilizes data or knowledge from one or more source domains to facilitate learning in a target domain. It is particularly useful when the target domain has very few or no labeled data, due to annotation expense, privacy concerns,etc. Unfortunately, the effectiveness of TL is not always guaranteed. Negative transfer(NT), i.e., leveraging source domain data/knowledge undesirably reduces learning performance in the target domain, and has been a long-standing and challenging pro... 相似文献

4.

Multi-UAVs Collaborative Path Planning in the Cramped Environment

Siyuan Feng Linzhi Zeng Jining Liu Yi Yang Wenjie Song 《IEEE/CAA Journal of Automatica Sinica》2024,11(2):529-538

Due to its flexibility and complementarity, the multi-UAVs system is well adapted to complex and cramped workspaces, with great application potential in the search and rescue (SAR) and indoor goods delivery fields. However, safe and effective path planning of multiple unmanned aerial vehicles (UAVs) in the cramped environment is always challenging: conflicts with each other are frequent because of high-density flight paths, collision probability increases because of space constraints, and the search space increases significantly, including time scale, 3D scale and model scale. Thus, this paper proposes a hierarchical collaborative planning framework with a conflict avoidance module at the high level and a path generation module at the low level. The enhanced conflict-base search (ECBS) in our framework is improved to handle the conflicts in the global path planning and avoid the occurrence of local deadlock. And both the collision and kinematic models of UAVs are considered to improve path smoothness and flight safety. Moreover, we specifically designed and published the cramped environment test set containing various unique obstacles to evaluating our framework performance thoroughly. Experiments are carried out relying on Rviz, with multiple flight missions: random, opposite, and staggered, which showed that the proposed method can generate smooth cooperative paths without conflict for at least 60 UAVs in a few minutes. The benchmark and source code are released in

https://github.com/inin-xingtian/multi-UAVs-path-planner

.

相似文献

5.

Document segmentation and classification into musical scores and text

Fabrizio Pedersoli George Tzanetakis 《International Journal on Document Analysis and Recognition》2016,19(4):289-304

A new algorithm for segmenting documents into regions containing musical scores and text is proposed. Such segmentation is a required step prior to applying optical character recognition and optical music recognition on scanned pages that contain both music notation and text. Our segmentation technique is based on the bag-of-visual-words representation followed by random block voting (RBV) in order to detect the bounding boxes containing the musical score and text within a document image. The RBV procedure consists of extracting a fixed number of blocks whose position and size are sampled from a discrete uniform distribution that “over”-covers the input image. Each block is automatically classified as either coming from musical score or text and votes with a particular posterior probability of classification in its spatial domain. An initial coarse segmentation is obtained by summarizing all the votes in a single image. Subsequently, the final segmentation is obtained by subdividing the image in microblocks and classifying them using a N-nearest neighbor classifier which is trained using the coarse segmentation. We demonstrate the potential of the proposed method by experiments on two different datasets. One is on a challenging dataset of images collected and artificially combined and manipulated for this project. The other is a music dataset obtained by the scanning of two music books. The results are reported using precision/recall metrics of the overlapping area with respect to the ground truth. The proposed system achieves an overall averaged F-measure of 85 %. The complete source code package and associated data are available at https://github.com/fpeder/mscr under the FreeBSD license to support reproducibility. 相似文献

6.

Estimating the State of Health for Lithium-ion Batteries: A Particle Swarm Optimization-Assisted Deep Domain Adaptation Approach

Guijun Ma Zidong Wang Weibo Liu Jingzhong Fang Yong Zhang Han Ding Ye Yuan 《IEEE/CAA Journal of Automatica Sinica》2023,10(7):1530-1543

The state of health (SOH) is a critical factor in evaluating the performance of the lithium-ion batteries (LIBs). Due to various end-user behaviors, the LIBs exhibit different degradation modes, which makes it challenging to estimate the SOHs in a personalized way. In this article, we present a novel particle swarm optimization-assisted deep domain adaptation (PSO-DDA) method to estimate the SOH of LIBs in a personalized manner, where a new domain adaptation strategy is put forward to reduce cross-domain distribution discrepancy. The standard PSO algorithm is exploited to automatically adjust the chosen hyperparameters of developed DDA-based method. The proposed PSO-DDA method is validated by extensive experiments on two LIB datasets with different battery chemistry materials, ambient temperatures and charge-discharge configurations. Experimental results indicate that the proposed PSO-DDA method surpasses the convolutional neural network-based method and the standard DDA-based method. The PyTorch implementation of the proposed PSO-DDA method is available at

https://github.com/mxt0607/PSO-DDA

.

相似文献

7.

Fusion that matters: convolutional fusion networks for visual recognition

Yu Liu Yanming Guo Theodoros Georgiou Michael S. Lew 《Multimedia Tools and Applications》2018,77(22):29407-29434

In recent years, deep learning has been successfully applied to diverse multimedia research areas, with the aim of learning powerful and informative representations for a variety of visual recognition tasks. In this work, we propose convolutional fusion networks (CFN) to integrate multi-level deep features and fuse a richer visual representation. Despite recent advances in deep fusion networks, they still have limitations due to expensive parameters and weak fusion modules. Instead, CFN uses 1 × 1 convolutional layers and global average pooling to generate side branches with few parameters, and employs a locally-connected fusion module, which can learn adaptive weights for different side branches and form a better fused feature. Specifically, we introduce three key components of the proposed CFN, and discuss its differences from other deep models. Moreover, we propose fully convolutional fusion networks (FCFN) that are an extension of CFN for pixel-level classification applied to several tasks, such as semantic segmentation and edge detection. Our experiments demonstrate that CFN (and FCFN) can achieve promising performance by consistent improvements for both image-level and pixel-level classification tasks, compared to a plain CNN. We release our codes on https://github.com/yuLiu24/CFN. Also, we make a live demo (goliath.liacs.nl) using a CFN model trained on the ImageNet dataset. 相似文献

8.

PAPS: Progressive Attention-Based Pan-sharpening

Yanan Jia Qiming Hu Renwei Dian Jiayi Ma Xiaojie Guo 《IEEE/CAA Journal of Automatica Sinica》2024,11(2):391-404

Pan-sharpening aims to seek high-resolution multispectral (HRMS) images from paired multispectral images of low resolution (LRMS) and panchromatic (PAN) images, the key to which is how to maximally integrate spatial and spectral information from PAN and LRMS images. Following the principle of gradual advance, this paper designs a novel network that contains two main logical functions, i.e., detail enhancement and progressive fusion, to solve the problem. More specifically, the detail enhancement module attempts to produce enhanced MS results with the same spatial sizes as corresponding PAN images, which are of higher quality than directly up-sampling LRMS images. Having a better MS base (enhanced MS) and its PAN, we progressively extract information from the PAN and enhanced MS images, expecting to capture pivotal and complementary information of the two modalities for the purpose of constructing the desired HRMS. Extensive experiments together with ablation studies on widely-used datasets are provided to verify the efficacy of our design, and demonstrate its superiority over other state-of-the-art methods both quantitatively and qualitatively. Our code has been released at

https://github.com/JiaYN1/PAPS

.

相似文献

9.

Fragile video watermarking technique by motion field embedding with rate-distortion minimization

KUO Tien-ying LO Yi-chung 《通讯和计算机》2009,6(1):16-23

In this paper, we proposed a fragile video watermarking technique to authenticate the H.264 video content. Our watermark information is embedded in motion vectors as which have strong fragile nature. The proposed method finds out the best locations of motion vectors to embed the information to achieve the fragility, where are based on the statistical analysis of the motion vector by the H.264/AVC rate-distortion cost function. This scheme does not require the original video sequence for watermark detection. Experimental results show that the proposed watermarking technique can keep the perceptual quality at best effort and still has good fragility. 相似文献

10.

InstanceFusion: Real-time Instance-level 3D Reconstruction Using a Single RGBD Camera

Feixiang Lu Haotian Peng Hongyu Wu Jun Yang Xinhang Yang Ruizhi Cao Liangjun Zhang Ruigang Yang Bin Zhou 《Computer Graphics Forum》2020,39(7):433-445

We present InstanceFusion, a robust real-time system to detect, segment, and reconstruct instance-level 3D objects of indoor scenes with a hand-held RGBD camera. It combines the strengths of deep learning and traditional SLAM techniques to produce visually compelling 3D semantic models. The key success comes from our novel segmentation scheme and the efficient instance-level data fusion, which are both implemented on GPU. Specifically, for each incoming RGBD frame, we take the advantages of the RGBD features, the 3D point cloud, and the reconstructed model to perform instance-level segmentation. The corresponding RGBD data along with the instance ID are then fused to the surfel-based models. In order to sufficiently store and update these data, we design and implement a new data structure using the OpenGL Shading Language. Experimental results show that our method advances the state-of-the-art (SOTA) methods in instance segmentation and data fusion by a big margin. In addition, our instance segmentation improves the precision of 3D reconstruction, especially in the loop closure. InstanceFusion system runs 20.5Hz on a consumer-level GPU, which supports a number of augmented reality (AR) applications (e.g., 3D model registration, virtual interaction, AR map) and robot applications (e.g., navigation, manipulation, grasping). To facilitate future research and reproduce our system more easily, the source code, data, and the trained model are released on Github: https://github.com/Fancomi2017/InstanceFusion . 相似文献

11.

The Train Benchmark: cross-technology performance evaluation of continuous model queries

Gábor Szárnyas Benedek Izsó István Ráth Dániel Varró 《Software and Systems Modeling》2018,17(4):1365-1393

In model-driven development of safety-critical systems (like automotive, avionics or railways), well-formedness of models is repeatedly validated in order to detect design flaws as early as possible. In many industrial tools, validation rules are still often implemented by a large amount of imperative model traversal code which makes those rule implementations complicated and hard to maintain. Additionally, as models are rapidly increasing in size and complexity, efficient execution of validation rules is challenging for the currently available tools. Checking well-formedness constraints can be captured by declarative queries over graph models, while model update operations can be specified as model transformations. This paper presents a benchmark for systematically assessing the scalability of validating and revalidating well-formedness constraints over large graph models. The benchmark defines well-formedness validation scenarios in the railway domain: a metamodel, an instance model generator and a set of well-formedness constraints captured by queries, fault injection and repair operations (imitating the work of systems engineers by model transformations). The benchmark focuses on the performance of query evaluation, i.e. its execution time and memory consumption, with a particular emphasis on reevaluation. We demonstrate that the benchmark can be adopted to various technologies and query engines, including modeling tools; relational, graph and semantic databases. The Train Benchmark is available as an open-source project with continuous builds from https://github.com/FTSRG/trainbenchmark. 相似文献

12.

Point Cloud Classification Using Content-Based Transformer via Clustering in Feature Space

Yahui Liu Bin Tian Yisheng Lv Lingxi Li Fei-Yue Wang 《IEEE/CAA Journal of Automatica Sinica》2024,11(1):231-239

Recently, there have been some attempts of Transformer in 3D point cloud classification. In order to reduce computations, most existing methods focus on local spatial attention, but ignore their content and fail to establish relationships between distant but relevant points. To overcome the limitation of local spatial attention, we propose a point content-based Transformer architecture, called PointConT for short. It exploits the locality of points in the feature space (content-based), which clusters the sampled points with similar features into the same class and computes the self-attention within each class, thus enabling an effective trade-off between capturing long-range dependencies and computational complexity. We further introduce an inception feature aggregator for point cloud classification, which uses parallel structures to aggregate high-frequency and low-frequency information in each branch separately. Extensive experiments show that our PointConT model achieves a remarkable performance on point cloud shape classification. Especially, our method exhibits 90.3% Top-1 accuracy on the hardest setting of ScanObjectNN. Source code of this paper is available at https://github.com/yahuiliu99/PointConT. 相似文献

13.

An efficient spherical mapping algorithm and its application on spherical harmonics

WAN ShengHua YE TengFei LI MaoQing ZHANG HongChao LI Xin 《中国科学F辑(英文版)》2013,(9):24-33

The sphere is a natural and seamless parametric domain for closed genus-0 surfaces. We introduce an efficient hierarchical optimization approach for the computation of spherical parametrization for closed genus-0 surfaces by minimizing a nonlinear energy balancing angle and area distortions. The mapping results are bijective and lowly distorted. Our algorithm converges efficiently and is suitable to manipulate large-scale geometric models. We demonstrate and analyze the effectiveness of our mapping in spherical harmonics decomposition. 相似文献

14.

More Than Lightening: A Self-Supervised Low-Light Image Enhancement Method Capable for Multiple Degradations

Han Xu Jiayi Ma Yixuan Yuan Hao Zhang Xin Tian Xiaojie Guo 《IEEE/CAA Journal of Automatica Sinica》2024,11(3):622-637

Low-light images suffer from low quality due to poor lighting conditions, noise pollution, and improper settings of cameras. To enhance low-light images, most existing methods rely on normal-light images for guidance but the collection of suitable normal-light images is difficult. In contrast, a self-supervised method breaks free from the reliance on normal-light data,resulting in more convenience and better generalization. Existing self-supervised methods primarily focus on illumination adjustm... 相似文献

15.

MAUN: Memory-Augmented Deep Unfolding Network for Hyperspectral Image Reconstruction

Qian Hu Jiayi Ma Yuan Gao Junjun Jiang Yixuan Yuan 《IEEE/CAA Journal of Automatica Sinica》2024,11(5):1139-1150

Spectral compressive imaging has emerged as a powerful technique to collect the 3D spectral information as 2D measurements. The algorithm for restoring the original 3D hyperspectral images (HSIs) from compressive measurements is pivotal in the imaging process. Early approaches painstakingly designed networks to directly map compressive measurements to HSIs, resulting in the lack of interpretability without exploiting the imaging priors. While some recent works have introduced the deep unfolding framework for explainable reconstruction, the performance of these methods is still limited by the weak information transmission between iterative stages. In this paper, we propose a Memory-Augmented deep Unfolding Network, termed MAUN, for explainable and accurate HSI reconstruction. Specifically, MAUN implements a novel CNN scheme to facilitate a better extrapolation step of the fast iterative shrinkage-thresholding algorithm, introducing an extra momentum incorporation step for each iteration to alleviate the information loss. Moreover, to exploit the high correlation of intermediate images from neighboring iterations, we customize a cross-stage transformer (CSFormer) as the deep denoiser to simultaneously capture self-similarity from both in-stage and cross-stage features, which is the first attempt to model the long-distance dependencies between iteration stages. Extensive experiments demonstrate that the proposed MAUN is superior to other state-of-the-art methods both visually and metrically. Our code is publicly available at

https://github.com/HuQ1an/MAUN

.

相似文献

16.

A Multi-Detector Security Architecture with Local Feature-Level Fusion for Multimodal Biometrics

Sorin Soviany Sorin Puscoci Cristina Soviany 《通讯和计算机》2013,(9):1200-1218

The hierarchical identification model with multiple detectors is an innovative approach for biometric systems design which improves the identification accuracy while ensuring the computational complexity reduction. This complexity reduction provides additional advantages in terms of execution time and recognition accuracy. The model is different from the actual solutions for biometric data classification because it essentially uses a special kind of classifiers （detectors） and the identification decision is issued in a hierarchical way according to the users importance; this makes it suitable for various security requirements applications （users with different authorization levels）. The model includes a local feature-level fusion for each of the integrated biometrics. The paper defines and explains the multi-detector security architecture with its basic functions. The achieved experimental results are discussed to reveal the proposed method advantages and further potential enhancements for particular use cases. 相似文献

17.

基于隐条件随机场的自适应视频分割算法 总被引：3，自引：0，他引：3

褚一平张引叶修梓张三元《自动化学报》2007,33(12):1252-1258

视频目标分割是视频监视与视频目标跟踪、视频目标识别以及视频编辑的基础. 本文提出了一种基于隐条件随机场 (Hidden conditional random fields, HCRF) 的自适应视频分割算法, 利用 HCRF 模型对视频序列中的时空邻域关系建模. 使用在线学习的方式对相应的参数进行调整, 实现对时空邻域约束关系的权重调整, 提高视频目标分割细节上的效果. 大量的数据测试表明, 与高斯混合模型 (Gaussian mixture model, GMM) 和联合时空的马尔可夫随机场 (Markov random fields, MRF) 等算法相比, 该算法的分割错误率分别降低了23\%和19\%. 相似文献

18.

A comprehensive review of significant researches on content based indexing and retrieval of visual information

R. PRIYA T. N. SHANMUGAM 《Frontiers of Computer Science in China》2013,(5):782-799

相似文献

19.

Fine-Grained Multi-human Parsing

Zhao Jian Li Jianshu Liu Hengzhu Yan Shuicheng Feng Jiashi 《International Journal of Computer Vision》2020,128(8-9):2185-2203

Despite the noticeable progress in perceptual tasks like detection, instance segmentation and human parsing, computers still perform unsatisfactorily on visually understanding humans in crowded scenes, such as group behavior analysis, person re-identification, e-commerce, media editing, video surveillance, autonomous driving and virtual reality, etc. To perform well, models need to comprehensively perceive the semantic information and the differences between instances in a multi-human image, which is recently defined as the multi-human parsing task. In this paper, we first present a new large-scale database “Multi-human Parsing (MHP v2.0)” for algorithm development and evaluation to advance the research on understanding humans in crowded scenes. MHP v2.0 contains 25,403 elaborately annotated images with 58 fine-grained semantic category labels and 16 dense pose key point labels, involving 2–26 persons per image captured in real-world scenes from various viewpoints, poses, occlusion, interactions and background. We further propose a novel deep Nested Adversarial Network (NAN) model for multi-human parsing. NAN consists of three Generative Adversarial Network-like sub-nets, respectively performing semantic saliency prediction, instance-agnostic parsing and instance-aware clustering. These sub-nets form a nested structure and are carefully designed to learn jointly in an end-to-end way. NAN consistently outperforms existing state-of-the-art solutions on our MHP and several other datasets, including MHP v1.0, PASCAL-Person-Part and Buffy. NAN serves as a strong baseline to shed light on generic instance-level semantic part prediction and drive the future research on multi-human parsing. With the above innovations and contributions, we have organized the CVPR 2018 Workshop on Visual Understanding of Humans in Crowd Scene (VUHCS 2018) and the Fine-Grained Multi-human Parsing and Pose Estimation Challenge. These contributions together significantly benefit the community. Code and pre-trained models are available at https://github.com/ZhaoJ9014/Multi-Human-Parsing_MHP.

相似文献

20.

Improved pedestrian detection using motion segmentation and silhouette orientation

Suman Kumar Choudhury Pankaj Kumar Sa Ram Prasad Padhy Saurav Sharma Sambit Bakshi 《Multimedia Tools and Applications》2018,77(11):13075-13114

相似文献