Similar Documents
20 similar documents found (search time: 15 ms)
1.
Based on the theory of modal acoustic emission (AE), when a convolutional neural network (CNN) is used to identify rotor rub-impact faults, the training data has a small sample size, and an AE sound segment is a single-channel signal with little pixel-level information and strong local correlation. Because of the convolution and pooling operations of the CNN, coarse-grained and edge information are lost, and the dimension of the top-level information in the network is low, which can easily lead to overfitting. To solve these problems, we first propose using sound spectrograms and their differential features to construct multi-channel image input features suitable for a CNN, fully exploiting the intrinsic characteristics of the sound spectra. Then, the traditional CNN structure is improved: the outputs of all convolutional layers are concatenated into a single fused feature that contains the information of every layer and is fed into the network's fully connected layer for classification and identification. Experiments indicate that the improved CNN recognition algorithm significantly improves the recognition rate compared with plain CNN and deep neural network (DNN) algorithms.
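A minimal sketch of the multi-channel input idea, assuming a log-mel spectrogram as the base representation and librosa's delta features as the "differential" channels (the paper's exact spectrogram settings are not given, and the random signal is a placeholder):

```python
import numpy as np
import librosa

def spectrogram_channels(y, sr, n_mels=64):
    """Stack a log spectrogram with its first- and second-order time differences."""
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_S = librosa.power_to_db(S)              # base channel
    d1 = librosa.feature.delta(log_S, order=1)  # first-order differential feature
    d2 = librosa.feature.delta(log_S, order=2)  # second-order differential feature
    return np.stack([log_S, d1, d2], axis=0)    # (3, n_mels, frames) CNN input

y, sr = np.random.randn(22050), 22050           # placeholder 1-second AE segment
x = spectrogram_channels(y, sr)
print(x.shape)
```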

2.
Nowadays, the amount of web data is increasing rapidly, which poses a serious challenge to web monitoring. Text sentiment analysis, an important research topic in natural language processing, is a crucial task in the web monitoring area. The accuracy of traditional text sentiment analysis methods may degrade when dealing with massive data. Deep learning has been a hot research topic in artificial intelligence in recent years. Several research groups have studied the sentiment analysis of English texts using deep learning methods; in contrast, relatively few works have so far considered Chinese text sentiment analysis in this direction. In this paper, a method for analyzing Chinese text sentiment is proposed based on the convolutional neural network (CNN) in deep learning in order to improve analysis accuracy. The feature values of the CNN after training are nonuniformly distributed; to overcome this problem, a method for normalizing the feature values is proposed. Moreover, the dimensions of the text features are optimized through simulations. Finally, a method for updating the learning rate during CNN training is presented to achieve better performance. Experimental results on typical datasets indicate that the accuracy of the proposed method improves on that of traditional supervised machine learning methods such as the support vector machine.
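A minimal PyTorch sketch of the two ideas named above, under the assumption that batch normalization stands in for the paper's feature-value normalization and a step decay stands in for its learning-rate update rule (both substitutions are ours, not the paper's exact methods):

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=128, n_filters=100, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        self.norm = nn.BatchNorm1d(n_filters)   # normalizes the feature values
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        h = self.emb(tokens).transpose(1, 2)    # -> (batch, emb_dim, seq_len)
        h = torch.relu(self.norm(self.conv(h)))
        h = h.max(dim=2).values                 # max-over-time pooling
        return self.fc(h)

model = TextCNN()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sched = torch.optim.lr_scheduler.StepLR(opt, step_size=5, gamma=0.5)  # lr update
```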

3.
The two-stream convolutional neural network exhibits excellent performance in video action recognition. The idea is to train one model on frames clipped from the videos and another on optical-flow images pre-extracted from those frames, and finally to fuse the outputs of the two models. Nevertheless, the reliance on pre-extracted optical flow impedes the efficiency of action recognition, and the temporal and spatial streams are fused only at the very end, so one stream can fail even when the other succeeds. We propose a novel hidden two-stream collaborative (HTSC) learning network that hides the optical-flow extraction steps inside the network and greatly speeds up action recognition. Building on the two-stream method, the collaborative learning model captures the interaction of temporal and spatial features to greatly enhance recognition accuracy. Our proposed method achieves a good balance of efficiency and precision on large-scale video action recognition datasets.
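A minimal PyTorch sketch of collaborative (feature-level) fusion of a spatial and a temporal stream, as opposed to averaging their final scores; the tiny backbones and the fusion point are illustrative assumptions, not the paper's actual HTSC design:

```python
import torch
import torch.nn as nn

class CollaborativeFusion(nn.Module):
    def __init__(self, feat_dim=512, n_classes=101):
        super().__init__()
        self.spatial = nn.Sequential(nn.Conv2d(3, feat_dim, 7, stride=2),
                                     nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.temporal = nn.Sequential(nn.Conv2d(3, feat_dim, 7, stride=2),
                                      nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(2 * feat_dim, n_classes)  # joint head sees both

    def forward(self, rgb, motion):
        fs = self.spatial(rgb).flatten(1)               # appearance features
        ft = self.temporal(motion).flatten(1)           # motion features
        return self.head(torch.cat([fs, ft], dim=1))    # interaction learned jointly

logits = CollaborativeFusion()(torch.randn(2, 3, 112, 112),
                               torch.randn(2, 3, 112, 112))
```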

4.
To overcome occlusion problems such as disguise in face recognition, this paper proposes an extended sparse-representation face recognition algorithm combining Gabor features with Metaface learning (GMFL). Considering the robustness of local Gabor features to variations in illumination, expression, and pose, the algorithm first extracts the Gabor feature set of an image. Metaface dictionary learning is then applied to this feature set to obtain a new dictionary with stronger sparse-representation power, and a Gabor occlusion dictionary is introduced to encode the occluded parts of the image; together they form an over-complete dictionary basis. Finally, sparse coefficients are solved over this over-complete basis to reconstruct samples, and face images are classified by the minimum-residual rule between a sample and its reconstruction. Experimental results on the AR face database and the FERET database verify the feasibility and effectiveness of the algorithm.
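A minimal sketch of the two stages, assuming an OpenCV Gabor filter bank for the features and orthogonal matching pursuit standing in for the paper's Metaface-learned dictionary coding; classification follows the minimum-residual rule described above:

```python
import numpy as np
import cv2
from sklearn.linear_model import OrthogonalMatchingPursuit

def gabor_features(img, thetas=(0, np.pi/4, np.pi/2, 3*np.pi/4)):
    """Concatenate responses of a small Gabor bank (orientations only)."""
    kernels = [cv2.getGaborKernel((15, 15), 4.0, t, 8.0, 0.5) for t in thetas]
    return np.concatenate([cv2.filter2D(img, cv2.CV_32F, k).ravel() for k in kernels])

def src_classify(x, dictionary, labels):
    """dictionary: (n_features, n_atoms) column dictionary; labels: class per atom."""
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10, fit_intercept=False)
    coef = omp.fit(dictionary, x).coef_
    residuals = {}
    for c in np.unique(labels):
        mask = labels == c                        # keep only class-c coefficients
        residuals[c] = np.linalg.norm(x - dictionary[:, mask] @ coef[mask])
    return min(residuals, key=residuals.get)      # minimum-residual class wins
```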

5.
Human Action Recognition (HAR) is a current research topic in computer vision, driven by an important application: video surveillance. Researchers have introduced various intelligent methods based on deep learning and machine learning, but these still face challenges such as similarity between different actions and redundant features. In this paper, we propose a framework for accurate HAR based on deep learning and an improved feature-optimization algorithm. The framework comprises several critical steps, from deep feature extraction to feature classification. The original video frames are normalized before two fine-tuned deep learning models, MobileNet-V2 and Darknet53, are trained. The pre-trained deep models are used for feature extraction, and their features are fused using the canonical correlation approach. An improved particle swarm optimization (IPSO) algorithm is then used to select the best features, which are finally classified with various classifiers. Experiments on six publicly available datasets, KTH, UT-Interaction, UCF Sports, Hollywood, IXMAS, and UCF YouTube, attained accuracies of 98.3%, 98.9%, 99.8%, 99.6%, 98.6%, and 100%, respectively. Compared with existing techniques, the proposed framework achieves improved accuracy.
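A minimal scikit-learn sketch of the canonical-correlation fusion step named above; the 30-dimensional shared space, the concatenation rule, and the random placeholder features (standing in for MobileNet-V2 and Darknet53 outputs) are our assumptions:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

f_mobilenet = np.random.rand(200, 1280)   # placeholder MobileNet-V2 features
f_darknet = np.random.rand(200, 1024)     # placeholder Darknet53 features

cca = CCA(n_components=30)
a, b = cca.fit_transform(f_mobilenet, f_darknet)  # maximally correlated projections
fused = np.concatenate([a, b], axis=1)            # (200, 60) fused descriptor
```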

6.
This paper proposes a dynamic dictionary-learning algorithm for SAR image target recognition. The algorithm adjusts the representational power and size of the dictionary by automatically deleting and adding dictionary atoms during learning. Deletion targets atoms with high mutual correlation or low utilization, subject to a deletion-cost constraint, while addition targets the principal components of the residual error of signal representation, subject to an addition-cost constraint; alternating the two operations continually optimizes the dictionary until its representational power is maximized. Experiments on the MSTAR dataset verify the algorithm's performance, and corresponding parameter-tuning suggestions are given. The results and analysis show that the algorithm achieves a high recognition rate and stable behavior.
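A minimal numpy sketch of the two dictionary-editing moves described above: deleting an atom that is highly correlated with another (or rarely used), and appending the principal component of the representation residual; the thresholds are illustrative, and the paper's cost constraints are omitted:

```python
import numpy as np

def prune_atoms(D, usage, coherence_thr=0.95, usage_thr=1):
    """D: (dim, n_atoms), columns unit-norm; usage: times each atom was selected."""
    G = np.abs(D.T @ D) - np.eye(D.shape[1])        # pairwise coherence matrix
    redundant = (G.max(axis=1) > coherence_thr) | (usage < usage_thr)
    return D[:, ~redundant]                          # delete redundant atoms

def grow_atom(D, X, codes):
    """Append the principal direction of the residual X - D @ codes as a new atom."""
    R = X - D @ codes                                # residual of current coding
    u, _, _ = np.linalg.svd(R, full_matrices=False)
    return np.column_stack([D, u[:, 0]])             # new unit-norm atom
```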

7.
Human gait recognition (HGR) has received a lot of attention in the last decade as an alternative biometric technique. The main challenges in gait recognition are changes in view angle and covariant factors, the major ones being walking while carrying a bag and walking while wearing a coat. Many deep-learning-based techniques for HGR have been presented in the literature, but an efficient framework is still required for accurate and fast gait recognition. In this work, we propose a fully automated framework for HGR from video sequences based on deep learning and improved ant colony optimization (IACO). The framework consists of four primary steps. First, the database is normalized into video frames. Second, two pre-trained models, ResNet101 and InceptionV3, are selected and modified according to the nature of the dataset. Both modified models are then trained using transfer learning and the features are extracted. The IACO algorithm selects the best of these features, which are passed to a cubic SVM employing a multiclass method for final classification. Experiments on three angles (0°, 18°, and 180°) of the CASIA B dataset gave accuracies of 95.2%, 93.9%, and 98.2%, respectively. A comparison with existing techniques shows that the proposed method outperforms them in both accuracy and computational time.
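A minimal sketch of the final stage: a cubic SVM is a support vector machine with a degree-3 polynomial kernel, here with one-vs-rest multiclass handling; the feature matrix and labels are placeholders, not the paper's IACO-selected features:

```python
import numpy as np
from sklearn.svm import SVC

X = np.random.rand(300, 2048)            # placeholder selected deep features
y = np.random.randint(0, 3, size=300)    # three view-angle classes

cubic_svm = SVC(kernel="poly", degree=3, decision_function_shape="ovr")
cubic_svm.fit(X, y)
pred = cubic_svm.predict(X)
```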

8.
9.
Background: Human Gait Recognition (HGR) is a biometric-based approach widely used for surveillance and adopted by researchers for the past several decades. Several factors affect system performance, such as walking variation due to clothing, a person carrying luggage, and changes in view angle. Proposed: In this work, a new hybrid method is introduced to overcome these HGR problems, combining deep learning with selection of the best features. Four major steps are involved: preprocessing of the video frames, adaptation of the pre-trained CNN model VGG-16 to compute features, removal of redundant features extracted from the CNN model, and classification. For the reduction of irrelevant features, a Principal Score and Kurtosis based approach named PSbK is proposed. The PSbK features are then fused into one matrix. Finally, this fused vector is fed to the One-against-All Multiclass Support Vector Machine (OAMSVM) classifier for the final results. Results: The system is evaluated on the CASIA B database over six angles, 0°, 18°, 36°, 54°, 72°, and 90°, attaining accuracies of 95.80%, 96.0%, 95.90%, 96.20%, 95.60%, and 95.50%, respectively. Conclusion: The comparison with recent methods shows that the proposed method works better.
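A minimal sketch of kurtosis-based feature reduction, approximating one ingredient of the PSbK approach named above (the exact principal-score component is the paper's and is not reproduced here); scipy's kurtosis ranks feature columns and the top fraction is kept:

```python
import numpy as np
from scipy.stats import kurtosis

def select_by_kurtosis(F, keep=0.5):
    """F: (n_samples, n_features); keep the highest-kurtosis half of the columns."""
    scores = kurtosis(F, axis=0)                   # one score per feature column
    k = int(F.shape[1] * keep)
    idx = np.argsort(scores)[::-1][:k]             # indices of the top-k columns
    return F[:, idx], idx

F = np.random.rand(100, 4096)                      # placeholder VGG-16 features
F_sel, kept = select_by_kurtosis(F)
```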

10.
Medical image fusion is considered the best method for obtaining a single image with rich details for efficient medical diagnosis and therapy, and deep learning provides high performance for several medical image analysis applications. This paper proposes a deep learning model, based on a Convolutional Neural Network (CNN), for the medical image fusion process. The basic idea is to extract features from both CT and MR images, process the extracted features further, and then reconstruct the fused feature map to obtain the resulting fused image. Finally, the quality of the fused image is enhanced by various techniques such as Histogram Matching (HM), Histogram Equalization (HE), the fuzzy technique, fuzzy type-II, and Contrast Limited Adaptive Histogram Equalization (CLAHE). The performance of the proposed fusion-based CNN model is measured by various metrics of fusion and enhancement quality on several realistic datasets of different modalities and diseases.
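A minimal OpenCV sketch of one of the enhancement steps listed above, CLAHE; the clip limit and tile size are illustrative defaults rather than the paper's settings, and the input is a synthetic placeholder for a fused CT/MR image:

```python
import numpy as np
import cv2

fused = (np.random.rand(256, 256) * 255).astype("uint8")  # placeholder fused image
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(fused)                              # locally equalized contrast
```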

11.
Artificial intelligence aids for healthcare have received a great deal of attention. Approximately one million patients with gastrointestinal diseases have been diagnosed via wireless capsule endoscopy (WCE), and early diagnosis facilitates appropriate treatment and saves lives. Deep-learning-based techniques have been used to identify gastrointestinal ulcers, bleeding sites, and polyps, but small lesions may be misclassified. We developed a deep-learning-based best-feature method to classify various stomach diseases evident in WCE images. Initially, hybrid contrast enhancement is used to distinguish diseased from normal regions. A pretrained model is then fine-tuned, with further training done via transfer learning. Deep features are extracted from the last two layers and fused using a vector-length-based approach. A genetic algorithm, improved with a kurtosis-based fitness function, selects optimal features that are then graded by a classifier. On a database containing 24,000 WCE images of ulcers, bleeding sites, polyps, and healthy tissue, the cubic support vector machine classifier was optimal, with an average accuracy of 99%.
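A minimal numpy sketch of one plausible reading of the "vector length-based" fusion of the last two layers: the shorter vector is zero-padded to the longer one's length before an entry-wise fusion; this padding-and-max rule is our assumption, not the paper's confirmed definition:

```python
import numpy as np

def length_based_fuse(f1, f2):
    n = max(f1.shape[0], f2.shape[0])
    pad = lambda v: np.pad(v, (0, n - v.shape[0]))  # zero-pad to common length
    return np.maximum(pad(f1), pad(f2))             # entry-wise max fusion

fused = length_based_fuse(np.random.rand(4096), np.random.rand(1000))
```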

12.
刘照邦, 袁明辉. Packaging Engineering (《包装工程》), 2020, 41(1): 149-155
Objective: To enable rapid stocktaking of shelf products, an automatic shelf-product recognition method based on deep neural networks is proposed. Methods: Shelf images captured by a camera are processed by a deep neural network to obtain the SKU and position of every product in the image. For the dense-detection scenario of shelf-product recognition, the method improves on a general deep-learning object-detection algorithm by splitting it into separate detection and classification stages and redesigning parts of the network structure. The method is then compared with traditional shelf-product recognition methods and with general deep-learning object detectors. Results: Experiments show that the detection-stage model reaches a mean average precision of 96.5% and the classification stage reaches 99.9% accuracy; on full images, precision is 97.56% and recall is 99.26%. Conclusion: Compared with earlier work that recognized shelf products with traditional object-detection models or extracted features with hand-crafted operators such as SIFT to classify specific SKUs, the proposed method substantially improves both the product detection rate and the classification accuracy, showing strong application potential.
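A minimal PyTorch sketch of the two-stage idea: a generic detector proposes product boxes, and a separate classifier assigns each crop an SKU. Both models here are untrained stand-ins (torchvision's Faster R-CNN and ResNet-18), not the paper's redesigned networks:

```python
import torch
from torchvision import models
from torchvision.models.detection import fasterrcnn_resnet50_fpn

detector = fasterrcnn_resnet50_fpn(weights=None).eval()  # stage-1 detector stand-in
classifier = models.resnet18(weights=None).eval()        # stage-2 SKU classifier stand-in

img = torch.rand(3, 600, 800)                            # placeholder shelf image
with torch.no_grad():
    boxes = detector([img])[0]["boxes"].round().int()    # stage 1: locate products
    for x1, y1, x2, y2 in boxes.tolist():
        if x2 > x1 and y2 > y1:                          # skip degenerate boxes
            crop = img[:, y1:y2, x1:x2][None]
            crop = torch.nn.functional.interpolate(crop, size=(224, 224))
            sku = classifier(crop).argmax(1)             # stage 2: classify SKU
```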

13.
淡卫波, 朱勇建, 黄毅. Packaging Engineering (《包装工程》), 2023, 44(1): 133-140
Objective: To train a deep-learning object-detection model on cigarette-pack image data and improve the efficiency and accuracy of pack picking on cigarette-pack production lines. Methods: A cigarette-pack recognition and classification model is built by improving the original YOLOv3: a purpose-designed multi-spatial pyramid pooling (M-SPP) structure is added to the network, the 64×64-scale feature map is downsampled and concatenated with the 32×32-scale feature map, and the 16×16-scale prediction layer is removed, improving detection accuracy and speed; the K-means++ algorithm is used to optimize the prior-box (anchor) parameters. Results: Experiments show that the detection model reaches a mean average precision of 99.68% at 70.82 frames per second. Conclusion: The deep-learning-based image recognition and classification model is accurate and fast, effectively meeting the real-time automated inspection requirements of cigarette-pack production lines.
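A minimal sketch of K-means++ anchor optimization: cluster the labelled boxes' (width, height) pairs and use the centroids as prior boxes. The random box data and the nine-anchor count (YOLOv3's convention) are placeholders for the paper's training labels:

```python
import numpy as np
from sklearn.cluster import KMeans

wh = np.random.rand(500, 2) * 416               # placeholder (width, height) pairs
km = KMeans(n_clusters=9, init="k-means++", n_init=10).fit(wh)
anchors = km.cluster_centers_[np.argsort(km.cluster_centers_.prod(axis=1))]
print(anchors)                                   # nine prior boxes sorted by area
```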

14.
Compressive strength of concrete is a significant factor in assessing the health and safety of building structures, and various methods have been developed to evaluate it. However, previous methods are costly, time-consuming, and unsafe. To address these drawbacks, this paper proposes a digital-vision-based model for evaluating concrete compressive strength using a deep convolutional neural network (DCNN). The proposed model offers an alternative approach to evaluating concrete strength and improves efficiency and accuracy. It was developed with 4,000 digital images and 61,996 images extracted from video recordings of concrete samples. The experimental results showed a root mean square error (RMSE) of 3.56 MPa, demonstrating that the proposed model can feasibly predict concrete strength from digital images of concrete surfaces while overcoming the previous limitations. This experiment provides a basis that could be extended to future research on image analysis techniques and artificial neural networks in the diagnosis of concrete building structures.
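A minimal PyTorch sketch of the regression setting: a small CNN maps a surface image to a single strength value and is scored by RMSE. The tiny architecture and random targets are illustrative, not the paper's DCNN or data:

```python
import torch
import torch.nn as nn

class StrengthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(16, 1)                  # single strength output (MPa)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

pred = StrengthNet()(torch.randn(8, 3, 224, 224)).squeeze(1)
target = torch.rand(8) * 60                           # placeholder strengths in MPa
rmse = torch.sqrt(nn.functional.mse_loss(pred, target))
```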

15.
Image retrieval for food ingredients is important but tremendously tiring, tedious, and expensive when done manually. Computer vision systems have made extraordinary advances in image retrieval thanks to CNNs, but applying convolutional neural networks directly to small food datasets is not feasible. In this study, a novel image retrieval approach is presented for small and medium-scale food datasets: it augments images using image transformation techniques to enlarge the datasets, and it improves the average accuracy of food recognition with state-of-the-art deep learning technologies. First, typical image transformation techniques are used to augment the food images. Then, transfer learning based on deep learning is applied to extract image features. Finally, a food recognition algorithm operates on the extracted deep-feature vectors. The presented image retrieval architecture is analyzed on a small-scale food dataset composed of forty-one categories of food ingredients with one hundred pictures per category. Extensive experimental results demonstrate the advantages of the image augmentation architecture for small and medium datasets using deep learning. The novel approach combines image augmentation, ResNet feature vectors, and SMO classification, and comprehensive experiments show its superiority for food detection on small and medium-scale datasets.
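A minimal torchvision sketch of the first two stages: enlarge a small dataset with standard image transformations, then take ResNet feature vectors via transfer learning. The specific transform set is an assumption, not the paper's exact list, and the pretrained weights are downloaded on first use:

```python
import torch
from torchvision import models, transforms

# Augmentation pipeline (applies to PIL images when loading a dataset).
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2),
    transforms.ToTensor(),
])

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()                  # drop the classifier: keep features
resnet.eval()
with torch.no_grad():
    feats = resnet(torch.randn(4, 3, 224, 224))  # (4, 512) deep-feature vectors
```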

16.
Handwritten character recognition systems are now used in every field of life, including shopping malls, banks, and educational institutes. Urdu is the national language of Pakistan and the fourth most spoken language in the world, yet recognizing Urdu handwritten characters remains challenging owing to their cursive nature. This paper presents a Convolutional Neural Network (CNN) model for offline and online Urdu handwritten alphabet recognition (UHAR). Our research also contributes an Urdu handwritten dataset (UHDS) to empower future work in this field. For offline systems, optical readers are used for extracting the alphabets, while diagonal-based extraction methods are implemented in online systems. This addresses the lack of comprehensive, standard Urdu alphabet datasets for research on Urdu text recognition. To this end, we collected 1,000 handwritten samples for each alphabet, 38,000 samples in total, from participants aged 12 to 25, and used them to train our CNN model through online and offline mediums. Detailed character recognition experiments, reported in the results, show that the proposed CNN model outperforms previously published approaches.

17.
Violence recognition is crucial because of its applications in security and law enforcement. Existing semi-automated systems rely on tedious manual surveillance, which causes human errors and makes them less effective. Several approaches have been proposed using trajectory-based, non-object-centric, and deep-learning-based methods. Previous studies have shown that deep learning techniques attain higher accuracy and lower error rates than other methods, but their performance must still be improved. This study explores the state-of-the-art deep learning architectures of convolutional neural networks (CNNs) and Inception V4 to detect and recognize violence using video data. In the proposed framework, a keyframe extraction technique eliminates duplicate consecutive frames; this keyframing phase reduces the training data size and hence the computational cost by avoiding duplicate frames. For the feature selection and classification tasks, the applied sequential CNN uses one kernel size, whereas the Inception V4 CNN uses multiple kernel sizes in different layers of the architecture. For empirical analysis, four widely used standard datasets with diverse activities are used. The results confirm that the proposed approach attains 98% accuracy, reduces the computational cost, and outperforms existing violence detection and recognition techniques.
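A minimal OpenCV sketch of the keyframing phase: a frame is kept only when its mean absolute difference from the last kept frame exceeds a threshold, which drops duplicate consecutive frames; the threshold and file path are illustrative:

```python
import cv2
import numpy as np

def extract_keyframes(path, thr=20.0):
    cap, keep, last = cv2.VideoCapture(path), [], None
    while True:
        ok, frame = cap.read()
        if not ok:                                  # end of video (or missing file)
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
        if last is None or np.abs(gray - last).mean() > thr:
            keep.append(frame)                      # sufficiently novel frame
            last = gray
    cap.release()
    return keep

keyframes = extract_keyframes("surveillance.mp4")   # placeholder video path
```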

18.
Scene text recognition is one of the most important techniques in pattern recognition and machine intelligence due to its numerous practical applications, and it is also a sequence modeling task. The recurrent neural network (RNN) is commonly regarded as the default starting point for sequential models, but because of non-parallel prediction and the vanishing-gradient problem, the performance of RNNs is difficult to improve substantially. In this paper, a new TRDD network architecture based on dilated convolution and residual blocks is proposed, using convolutional neural networks (CNNs) instead of RNNs to perform sequence text recognition. Our model has three advantages over existing scene text recognition methods. First, the text recognition speed of the TRDD network is much faster than that of state-of-the-art RNN-based scene text recognition networks. Second, TRDD is easier to train, avoiding the exploding- and vanishing-gradient problems that are major issues for RNNs. Third, both using larger dilation factors and increasing the filter size are viable ways to change the receptive field size. We benchmark TRDD on four standard datasets; it achieves higher recognition accuracy and faster recognition speed with a smaller model, making it promising for real-time applications.
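A minimal PyTorch sketch of the building block named above: a residual unit of dilated 1-D convolutions over the feature sequence, where either the dilation factor or the kernel size widens the receptive field; channel sizes and dilation are illustrative, not the TRDD paper's exact configuration:

```python
import torch
import torch.nn as nn

class DilatedResidualBlock(nn.Module):
    def __init__(self, channels=256, dilation=2, k=3):
        super().__init__()
        pad = dilation * (k - 1) // 2              # keep sequence length unchanged
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, k, padding=pad, dilation=dilation),
            nn.ReLU(),
            nn.Conv1d(channels, channels, k, padding=pad, dilation=dilation),
        )

    def forward(self, x):                          # x: (batch, channels, seq_len)
        return torch.relu(x + self.body(x))        # residual connection

y = DilatedResidualBlock()(torch.randn(2, 256, 80))
```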

19.
With the development of deep learning and Convolutional Neural Networks (CNNs), the accuracy of automatic food recognition based on visual data has significantly improved. Some studies have shown that the deeper the model, the higher the accuracy; however, very deep neural networks suffer from overfitting and consume huge computing resources. In this paper, a new classification scheme is proposed for automatic food-ingredient recognition based on deep learning. We construct an up-to-date combinational convolutional neural network (CBNet) with a subnet-merging technique. First, two different neural networks are used to learn the features of interest. Then, a well-designed feature fusion component aggregates the features from the subnetworks, extracting richer and more precise features for image classification. To learn more complementary features, corresponding fusion strategies are also proposed, including auxiliary classifiers and hyperparameter settings. Finally, CBNet, built on the well-known VGGNet, ResNet, and DenseNet, is evaluated on a dataset of 41 major categories of food ingredients with 100 images per category. Theoretical analysis and experimental results demonstrate that CBNet achieves promising accuracy for multi-class classification and improves the performance of convolutional neural networks.
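A minimal PyTorch sketch of merging two subnetworks through a fused head plus per-subnet auxiliary classifiers (used as extra training losses); the tiny layers are illustrative stand-ins, not CBNet's actual VGG/ResNet/DenseNet branches:

```python
import torch
import torch.nn as nn

class CBNetSketch(nn.Module):
    def __init__(self, n_classes=41):
        super().__init__()
        self.net_a = nn.Sequential(nn.Conv2d(3, 64, 3, 2), nn.ReLU(),
                                   nn.AdaptiveAvgPool2d(1))
        self.net_b = nn.Sequential(nn.Conv2d(3, 64, 5, 2), nn.ReLU(),
                                   nn.AdaptiveAvgPool2d(1))
        self.aux_a = nn.Linear(64, n_classes)       # auxiliary classifier, branch A
        self.aux_b = nn.Linear(64, n_classes)       # auxiliary classifier, branch B
        self.fused = nn.Linear(128, n_classes)      # main head over merged features

    def forward(self, x):
        fa = self.net_a(x).flatten(1)
        fb = self.net_b(x).flatten(1)
        main = self.fused(torch.cat([fa, fb], 1))   # subnet-merged prediction
        return main, self.aux_a(fa), self.aux_b(fb) # aux outputs aid training

main, aux_a, aux_b = CBNetSketch()(torch.randn(2, 3, 128, 128))
```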

20.

Lip reading is typically regarded as visually interpreting a speaker's lip movements while they speak: the task of decoding text from the speaker's mouth movements. This paper proposes a lip-reading model that helps deaf people and persons with hearing problems understand a speaker by capturing a video of the speaker, feeding it into the proposed model, and obtaining the corresponding subtitles. Deep learning technologies make it easier to extract a large number of different features, which can then be converted into probabilities of letters to obtain accurate results. Recently proposed lip-reading methods are based on sequence-to-sequence architectures designed for neural machine translation and audio speech recognition. In this paper, however, a deep convolutional neural network model called the hybrid lip-reading (HLR-Net) model is developed for lip reading from video. The proposed model includes three stages, namely pre-processing, encoder, and decoder, which produce the output subtitle. Inception, gradient, and bidirectional GRU layers are used to build the encoder, while attention, fully connected, and activation-function layers are used to build the decoder, which performs connectionist temporal classification (CTC). Compared with three recent models, namely the LipNet model, the lip-reading model with cascaded attention (LCANet), and the attention-CTC (A-ACA) model, on the GRID corpus dataset, the proposed HLR-Net model achieves significant improvements: a CER of 4.9%, a WER of 9.7%, and a BLEU score of 92% for unseen speakers, and a CER of 1.4%, a WER of 3.3%, and a BLEU score of 99% for overlapped speakers.
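A minimal PyTorch sketch of the CTC setup over bidirectional-GRU outputs, the core of the encoder/decoder pairing described above; the feature dimension, frame count, and 28-symbol alphabet (letters, space, blank) are assumptions, not the HLR-Net paper's exact values:

```python
import torch
import torch.nn as nn

T, B, C, H = 75, 2, 28, 256                       # frames, batch, classes, hidden
gru = nn.GRU(input_size=512, hidden_size=H, bidirectional=True)
fc = nn.Linear(2 * H, C)

feats = torch.randn(T, B, 512)                    # visual features per video frame
log_probs = fc(gru(feats)[0]).log_softmax(2)      # (T, B, C) frame-wise letter probs

targets = torch.randint(1, C, (B, 20))            # placeholder character labels
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           torch.full((B,), T), torch.tensor([20, 20]))
```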

