Similar literature
 20 similar documents found; search took 31 ms
1.
Human Activity Recognition (HAR) is an active research area due to its applications in pervasive computing, human-computer interaction, artificial intelligence, health care, and social sciences. Moreover, dynamic environments and anthropometric differences between individuals make it harder to recognize actions. This study focuses on human activity in video sequences acquired with an RGB camera because of its vast range of real-world applications. It uses a two-stream ConvNet to extract spatial and temporal information and proposes a fine-tuned deep neural network. Moreover, the transfer learning paradigm is adopted to extract varied and fixed frames while reusing object identification information. Six state-of-the-art pre-trained models are evaluated to find the best model for spatial feature extraction. For the temporal sequence, this study uses dense optical flow following the two-stream ConvNet and a Bidirectional Long Short-Term Memory (BiLSTM) network to capture long-term dependencies. Two state-of-the-art datasets, UCF101 and HMDB51, are used for evaluation. In addition, seven state-of-the-art optimizers are used to fine-tune the proposed network parameters. Furthermore, this study uses an ensemble mechanism to aggregate spatial-temporal features with a four-stream Convolutional Neural Network (CNN), in which two streams use RGB data and the other two use optical flow images. Finally, the proposed ensemble approach using max hard voting outperforms state-of-the-art methods with 96.30% and 90.07% accuracies on the UCF101 and HMDB51 datasets, respectively.
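The voting step of such an ensemble can be illustrated with a minimal sketch. It assumes a plain majority (hard) vote over the class labels predicted by each stream; the abstract does not define its "max hard voting" rule in detail, so the function name `hard_vote`, the stream names, and the tie-breaking behaviour are illustrative assumptions rather than the authors' method.

```python
from collections import Counter


def hard_vote(stream_predictions):
    """Fuse per-stream class labels by majority (hard) voting.

    stream_predictions: list of label lists, one list per stream,
    all with the same number of samples (hypothetical layout).
    """
    n_samples = len(stream_predictions[0])
    if any(len(preds) != n_samples for preds in stream_predictions):
        raise ValueError("every stream must predict the same number of samples")

    fused = []
    # zip(*...) groups the votes that all streams cast for one sample.
    for votes in zip(*stream_predictions):
        label, _count = Counter(votes).most_common(1)[0]
        fused.append(label)
    return fused


# Hypothetical predictions from two RGB streams and two optical-flow streams.
rgb_a = ["run", "jump", "walk"]
rgb_b = ["run", "walk", "walk"]
flow_a = ["jump", "walk", "run"]
flow_b = ["run", "walk", "walk"]
fused_labels = hard_vote([rgb_a, rgb_b, flow_a, flow_b])
```

Hard voting only needs the predicted labels, so streams built on different backbones can be combined without calibrating their scores against each other.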

2.
Human motion recognition plays a crucial role in the video analysis framework. However, a given video may contain a variety of noise, such as an unstable background and redundant actions, that is completely different from the key actions. This noise poses a great challenge to human motion recognition. To solve this problem, we propose a new method based on the 3-Dimensional (3D) Bag of Visual Words (BoVW) framework. Our method includes two parts. The first part is the video action feature extractor, which identifies key actions by analyzing action features. The second part is the video action encoder, in which we analyze the action characteristics of a given video and use a pre-trained deep 3D CNN model to obtain expressive coding information. A classifier with subnetwork nodes is used for the final classification. Extensive experiments demonstrate that our method is effective on complex video analysis. Our approach achieves state-of-the-art performance on the UCF101 (85.3%) and HMDB51 (54.5%) datasets.

3.
In the last decade, there has been a significant increase in medical cases involving brain tumors. Brain tumors are the tenth most common type of tumor, affecting millions of people. However, if a tumor is detected early, the cure rate can increase. Computer vision researchers are working to develop sophisticated techniques for detecting and classifying brain tumors. MRI scans are primarily used for tumor analysis. In this paper, we propose an automated system for brain tumor detection and classification using a saliency map and deep learning feature optimization. The proposed framework is implemented in stages. In the initial phase, a fusion-based contrast enhancement technique is proposed. In the following phase, a tumor segmentation technique based on saliency maps is proposed, whose output is then mapped onto the original images using an active contour. Following that, a pre-trained CNN model named EfficientNetB0 is fine-tuned and trained in two ways: on enhanced images and on tumor localization images. Deep transfer learning is used to train both models, and features are extracted from the average pooling layer. The deep learning features are then fused using an improved fusion approach known as Entropy Serial Fusion. The best features are chosen in the final step using an improved dragonfly optimization algorithm. Finally, the best features are classified using an extreme learning machine (ELM). The experiments are conducted on three publicly available datasets and achieve improved accuracies of 95.14%, 94.89%, and 95.94%, respectively. A comparison with several neural nets shows the improvement of the proposed framework.

4.
Human gait recognition (HGR) has received a lot of attention in the last decade as an alternative biometric technique. The main challenges in gait recognition are changes in view angle and covariant factors. The major covariant factors are walking while carrying a bag and walking while wearing a coat. Deep learning is a machine learning technique that is gaining popularity, and many deep-learning-based techniques for HGR are presented in the literature. An efficient framework is always required for correct and quick gait recognition. In this work, we propose a fully automated deep learning and improved ant colony optimization (IACO) framework for HGR using video sequences. The proposed framework consists of four primary steps. In the first step, the database is normalized in a video frame. In the second step, two pre-trained models, ResNet101 and InceptionV3, are selected and modified according to the nature of the dataset. After that, both modified models are trained using transfer learning and features are extracted. The IACO algorithm is then used to select the best features, which are passed to a Cubic SVM for final classification. The Cubic SVM employs a multiclass method. The experiment was carried out on three angles (0, 18, and 180) of the CASIA B dataset, and the accuracy was 95.2, 93.9, and 98.2 percent, respectively. A comparison with existing techniques is also performed, and the proposed method outperforms them in terms of accuracy and computational time.
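"Cubic SVM" usually denotes a support vector machine with a degree-3 polynomial kernel. The sketch below shows only that kernel function, assuming the common form (gamma * <x, y> + coef0)^3; the parameter names `gamma` and `coef0` and their default values are illustrative assumptions, since the abstract does not give the kernel settings.

```python
def cubic_kernel(x, y, gamma=1.0, coef0=1.0):
    """Degree-3 polynomial kernel: (gamma * <x, y> + coef0) ** 3."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** 3


def gram_matrix(samples, gamma=1.0, coef0=1.0):
    """Pairwise kernel values, the input an SVM solver works with."""
    return [[cubic_kernel(a, b, gamma, coef0) for b in samples] for a in samples]


# <[1, 2], [3, 4]> = 11, so the kernel value is (11 + 1) ** 3.
value = cubic_kernel([1.0, 2.0], [3.0, 4.0])
```

The cubic term lets the classifier separate classes with curved decision boundaries while still solving a linear problem in the implicit feature space.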

5.
In the area of medical image processing, stomach cancer is one of the most important cancers that needs to be diagnosed at an early stage. In this paper, an optimized deep learning method is presented for multiple stomach disease classification. The proposed method works in a few important steps: preprocessing using the fusion of filtered images along with Ant Colony Optimization (ACO), deep transfer learning-based feature extraction, optimization of the deep extracted features using nature-inspired algorithms, and finally fusion of the optimal vectors and classification using a Multi-Layered Perceptron Neural Network (MLNN). In the feature extraction step, a pre-trained Inception V3 is retrained on the selected stomach infection classes using deep transfer learning. Later on, the activation function is applied to the Global Average Pool (GAP) layer for feature extraction. The extracted features are then optimized through two different nature-inspired algorithms: Particle Swarm Optimization (PSO) with a dynamic fitness function and the Crow Search Algorithm (CSA). The outputs of both methods are fused by a maximal value approach, and the fused feature vector is classified by the MLNN. Two datasets are used to evaluate the proposed method, CUI WahStomach Diseases and a Combined dataset, and an average accuracy of 99.5% is achieved. A comparison with existing techniques shows that the proposed method achieves strong performance.
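One simple reading of the "maximal value approach" is an element-wise maximum over two equally long feature vectors. The sketch below makes exactly that assumption; the abstract does not say how vectors of different lengths are aligned, and the name `fuse_max` and the sample values are hypothetical.

```python
def fuse_max(features_a, features_b):
    """Fuse two feature vectors by keeping the larger value at each index.

    Assumes both vectors are already aligned and have the same length.
    """
    if len(features_a) != len(features_b):
        raise ValueError("feature vectors must have the same length")
    return [max(a, b) for a, b in zip(features_a, features_b)]


# Hypothetical outputs of the two optimizers for one image.
pso_selected = [0.2, 0.9, 0.1]
csa_selected = [0.5, 0.3, 0.1]
fused_vector = fuse_max(pso_selected, csa_selected)
```

Unlike concatenation, this kind of fusion keeps the vector length unchanged, which keeps the input layer of the downstream classifier small.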

6.
Identifying fruit disease manually is time-consuming, expensive, and requires an expert; thus, a computer-based automated system is widely required. Fruit diseases affect not only the quality but also the quantity. Computer-based techniques make it possible to detect the disease early on and cure the fruits. However, computer-based methods face several challenges, including low contrast, a lack of datasets for training a model, and inappropriate feature extraction for final classification. In this paper, we propose an automated framework for detecting apple fruit leaf diseases using a CNN and a hybrid optimization algorithm. Data augmentation is performed initially to balance the selected apple dataset. After that, two pre-trained deep models are fine-tuned and trained using transfer learning. Then, a fusion technique named Parallel Correlation Threshold (PCT) is proposed. The fused feature vector is optimized in the next step using a hybrid optimization algorithm. The selected features are finally classified using machine learning algorithms. Four different experiments were carried out on the augmented Plant Village dataset and yielded a best accuracy of 99.8%. The accuracy of the proposed framework is also compared to that of several neural nets, and it outperforms them all.

7.
8.
Activity recognition is a challenging task in computer vision that finds widespread applications in various fields, such as motion capture, video retrieval, security, and video surveillance. The objective of this work is to present a technique for recognizing human activities in videos using a Dragon Deep Belief Network (DDBN) and hybrid features, which comprise shape, coverage factor, and Space-Time Interest (STI) points. Initially, the keyframes are extracted from the input video sequence using the Structural Similarity (SSIM) measure. Then, the features, such as shape, coverage factor, and STI points, are extracted from the keyframes. Based on the extracted feature vector, the proposed DDBN classifier, which is designed as a combination of a DBN and the Dragonfly Algorithm (DA), classifies human activities in videos, such as walking and bending. In the DDBN, the weights of the network are selected optimally using the DA. The weight update using the DA for each incoming feature improves the performance of the DDBN classifier and, in turn, the accuracy of action classification. The proposed DDBN classifier is evaluated on the KTH and Weizmann datasets using three evaluation parameters: accuracy, sensitivity, and specificity. In the performance evaluation, the proposed DDBN classifier attains 98.5% accuracy, 0.96 sensitivity, and 0.959 specificity.
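SSIM-based keyframe extraction can be sketched as follows. This is a simplified illustration under stated assumptions: SSIM is computed once over the whole frame rather than over local windows as in the standard definition, frames are flat lists of grayscale values in [0, 255], and a frame becomes a keyframe when its SSIM to the previous keyframe drops below a hypothetical `threshold`. The abstract does not specify the selection rule, so none of this should be read as the authors' procedure.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM between two equally sized grayscale frames."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Stabilising constants from the usual SSIM formulation.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def select_keyframes(frames, threshold=0.9):
    """Return indices of frames that differ enough from the last keyframe."""
    if not frames:
        return []
    keyframes = [0]  # the first frame is always kept
    for index in range(1, len(frames)):
        if global_ssim(frames[keyframes[-1]], frames[index]) < threshold:
            keyframes.append(index)
    return keyframes


# Three tiny 2x2 "frames": the second repeats the first, the third differs.
frame_a = [10, 20, 30, 40]
frame_b = [10, 20, 30, 40]
frame_c = [200, 10, 150, 90]
kept = select_keyframes([frame_a, frame_b, frame_c])
```

Comparing against the last kept keyframe, rather than the immediately preceding frame, prevents slow drift from slipping past the threshold unnoticed.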

9.
Classification of human actions under video surveillance is gaining a lot of attention from computer vision researchers. In this paper, we present a methodology to recognize human behavior in a thin crowd, which may be very helpful in surveillance. Research has mostly focused on the problems of human detection in a thin crowd, the overall behavior of the crowd, and the actions of individuals in video sequences. Vision-based human behavior modeling is a complex task, as it involves human detection, tracking, and classifying normal and abnormal behavior. The proposed methodology takes an input video and applies a Gaussian-based segmentation technique, followed by post-processing with a hole-filling algorithm that fills holes inside objects. Human detection is then performed by a human detection algorithm, and geometrical features are extracted from the human skeleton using a feature extraction algorithm. Classification is performed using binary and multi-class support vector machines. The proposed technique is validated through accuracy, precision, recall, and F-measure metrics.
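The four validation metrics named above can be computed from a confusion matrix. The sketch below covers the binary case (for example, abnormal versus normal behavior) with standard definitions; the function name `binary_metrics`, the label strings, and the sample data are illustrative and not taken from the paper.

```python
def binary_metrics(y_true, y_pred, positive="abnormal"):
    """Accuracy, precision, recall and F-measure for one positive class."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


truth = ["abnormal", "abnormal", "abnormal", "normal",
         "normal", "normal", "abnormal", "normal"]
guess = ["abnormal", "abnormal", "normal", "normal",
         "normal", "abnormal", "abnormal", "normal"]
scores = binary_metrics(truth, guess)
```

For the multi-class SVM, the same function can be applied once per class in a one-vs-rest fashion and the results averaged.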

10.
罗春梅  张风雷 《声学技术》2021,40(4):503-507
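To improve the recognition performance of neural networks in speaker recognition, a speaker recognition algorithm based on Gaussian increment matrix features and an improved deep convolutional neural network is proposed. The algorithm first uses maximum a posteriori (MAP) estimation to extract a Gaussian mean matrix based on Mel Frequency Cepstrum Coefficient (MFCC) features, and applies noise-adaptive compensation to the features in order to strengthen the inter-frame correlation of the signal and the speaker feature information...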

11.
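To address the problem that 3D-CNNs extract spatio-temporal features from video well but demand heavy computation and memory, this paper designs an efficient 3D convolution block to replace the computationally expensive 3×3×3 convolution layer, and then proposes a dense residual network incorporating 3D convolution blocks (3D-EDRNs) for human action recognition. The efficient 3D convolution block combines a 1×3×3 convolution layer that captures the spatial features of the video with a 3×1×1 convolution layer that captures its temporal features. Placing efficient 3D convolution blocks at multiple positions in the dense residual network exploits the ease of optimization of residual blocks and the feature reuse of densely connected networks, while also shortening training time and improving the efficiency and performance of spatio-temporal feature extraction. Experiments on the classic UCF101 and HMDB51 datasets and on the dynamic multi-view complex 3D human action database (DMV action3D) show that 3D-EDRNs with 3D convolution blocks significantly reduce model complexity and effectively improve classification performance, while requiring fewer computing resources, fewer parameters, and less training time.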
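The parameter saving from replacing a 3×3×3 convolution with a 1×3×3 and a 3×1×1 convolution can be checked with simple arithmetic. The sketch below assumes the two layers are applied in sequence, that the intermediate channel width equals the output width, and that biases are omitted; the abstract specifies none of these, and the channel count of 64 is only an example.

```python
def conv3d_params(c_in, c_out, k_t, k_h, k_w):
    """Weight count of a 3D convolution layer without bias."""
    return c_in * c_out * k_t * k_h * k_w


def full_block_params(c_in, c_out):
    """A single 3x3x3 convolution."""
    return conv3d_params(c_in, c_out, 3, 3, 3)


def factorized_block_params(c_in, c_out, c_mid=None):
    """A 1x3x3 spatial convolution followed by a 3x1x1 temporal one."""
    if c_mid is None:
        c_mid = c_out  # assumed intermediate width
    spatial = conv3d_params(c_in, c_mid, 1, 3, 3)
    temporal = conv3d_params(c_mid, c_out, 3, 1, 1)
    return spatial + temporal


full = full_block_params(64, 64)              # 64 * 64 * 27
factorized = factorized_block_params(64, 64)  # 64 * 64 * 9 + 64 * 64 * 3
ratio = factorized / full
```

With equal channel widths the factorized block uses 12/27 of the weights of the full kernel, a reduction of a little over half, which is in line with the lower complexity the abstract reports.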

12.
Background: Human Gait Recognition (HGR) is a biometric-based approach that is widely used for surveillance and has been studied by researchers for the past several decades. Several factors affect system performance, such as walking variation due to clothes, a person carrying luggage, and variations in the view angle. Proposed: In this work, a new method is introduced to overcome different problems of HGR. A hybrid method is proposed for efficient HGR using deep learning and selection of the best features. Four major steps are involved in this work: preprocessing of the video frames, manipulation of the pre-trained CNN model VGG-16 for the computation of the features, removal of redundant features extracted from the CNN model, and classification. For the reduction of irrelevant features, a Principal Score and Kurtosis based approach, named PSbK, is proposed. After that, the features of PSbK are fused in one matrix. Finally, this fused vector is fed to the One against All Multi Support Vector Machine (OAMSVM) classifier for the final results. Results: The system is evaluated on the CASIA B database using six angles, 0°, 18°, 36°, 54°, 72°, and 90°, and attains accuracies of 95.80%, 96.0%, 95.90%, 96.20%, 95.60%, and 95.50%, respectively. Conclusion: A comparison with recent methods shows that the proposed method works better.

13.
李建明  杨挺  王惠栋 《包装工程》2020,41(7):175-184
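Objective: Current packaging defect detection methods in industrial automated production rely on hand-crafted feature extraction; they are complex, demand considerable expertise, generalize poorly, and are hard to apply with multiple targets and complex backgrounds. This work studies a real-time packaging defect detection method based on deep learning. Methods: For the case of limited sample data, a defect detection method is proposed that combines the deep-learning-based Inception-V3 image classification algorithm with the YOLO-V3 object detection algorithm, and a complete computer-vision-based online packaging defect detection system is designed. Results: Experiments show that the method achieves a recognition accuracy of 99.49% with a variance of 0.0000506, whereas using the Inception-V3 algorithm alone gives an accuracy of 97.70% with a variance of 0.000251. Conclusion: Compared with typical packaging defect detection methods based on hand-crafted feature extraction, the method avoids a complex feature extraction process. Compared with using an image classification algorithm alone, it clearly improves detection accuracy and stability when the defect region occupies a small fraction of the package, and it shows advantages with complex backgrounds and multi-target scenes. The detection system and method can easily be transferred to other similar online inspection problems.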

14.
Violence recognition is crucial because of its applications in activities related to security and law enforcement. Existing semi-automated systems have issues such as tedious manual surveillance, which causes human errors and makes these systems less effective. Several approaches have been proposed using trajectory-based, non-object-centric, and deep-learning-based methods. Previous studies have shown that deep learning techniques attain higher accuracy and lower error rates than other methods. However, their performance must be improved. This study explores the state-of-the-art deep learning architectures of convolutional neural networks (CNNs) and Inception v4 to detect and recognize violence in video data. In the proposed framework, a keyframe extraction technique eliminates duplicate consecutive frames. This keyframing phase reduces the training data size and hence decreases the computational cost. For feature selection and classification tasks, the applied sequential CNN uses one kernel size, whereas the Inception v4 CNN uses multiple kernels for different layers of the architecture. For empirical analysis, four widely used standard datasets with diverse activities are used. The results confirm that the proposed approach attains 98% accuracy, reduces the computational cost, and outperforms existing techniques of violence detection and recognition.

15.
This proposal aims to enhance the accuracy of dermoscopic skin cancer diagnosis with the aid of a novel deep learning architecture. The proposed skin cancer detection model involves four main steps: (a) preprocessing, (b) segmentation, (c) feature extraction, and (d) classification. The dermoscopic images are initially subjected to a preprocessing step that includes image enhancement and hair removal. After preprocessing, the lesion is segmented by an optimized region growing algorithm. In the feature extraction phase, local features, color morphology features, and morphological transformation-based features are extracted. The classification phase uses a modified deep learning algorithm that merges the optimization concept into a recurrent neural network (RNN). As the main contribution, the region growing and the RNN are improved by a modified deer hunting optimization algorithm (DHOA), termed Fitness Adaptive DHOA (FA-DHOA). Finally, an analysis is performed to verify the effectiveness of the proposed method.

16.
Visual tracking is a challenging issue in the field of computer vision due to the intricate appearance variation of objects. To adapt to changes in appearance, multiple channel features, which provide more information, are used. However, low-level features cannot represent the structure of the object. In this paper, a superpixel-based adaptive tracking algorithm using color histogram and Haar-like features is proposed, whose features are classified as middle-level. Based on the superpixel representation of video frames, the Haar-like feature is extracted at the superpixel level as the local feature, and the color histogram feature is combined with a background subtraction method as the frame feature. Then, local features are clustered and weighted according to the target label and the location center. The superpixel-based appearance model is measured using the sum of the voting map, and the candidate with the highest score is selected as the tracking result. Finally, an efficient template updating scheme is introduced to obtain robust results and improve computational efficiency. The proposed algorithm is evaluated on eight challenging video sequences, and experimental results demonstrate that the proposed method achieves better performance under occlusion, illumination variation, and transformation.

17.
李涛  曹辉  郭乐乐 《声学技术》2018,37(4):367-371
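To improve the performance of a continuous speech recognition system, a deep auto-encoder neural network is applied to speech signal feature extraction. Sparse auto-encoders are stacked to form a deep auto-encoder (Deep Auto-Encoding, DAE), which extracts the essential features of the speech signal through two steps, pre-training and fine-tuning. A context-dependent triphone model is used, with the phoneme error rate as the criterion of system performance. Simulation results show that the deep features extracted by the deep auto-encoder are superior to both traditional Mel-Frequency Cepstral Coefficient (MFCC) features and optimized MFCC features.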

18.
Magnetic resonance imaging (MRI) of the brain needs an impeccable analysis to investigate all its structures and patterns. This analysis may be a sharp visual analysis by an experienced medical professional or an analysis by a computer-aided diagnosis system that can help to predict the current condition. Similarly, on the basis of various information and techniques, a system can be designed to detect whether or not a patient is prone to Alzheimer's disease. Detecting abnormalities at an initial stage from brain MRI is a major challenge in the field of neuroscience. The main idea behind our research is to use the deep-layer feature extraction of a deep neural network architecture, without training that demands extensive hardware resources, and to classify the images with a simple machine-learning algorithm on the selected best features in order to reduce workload, classification error, and hardware utilization time. We used convolutional neural network (CNN) layers with an architecture similar to that of AlexNet, with some parametric changes, for the automatic extraction of features from images obtained by slice extraction of whole-brain MRI, whereas 13 manual features based on the gray level co-occurrence matrix were also extracted to test the impact of these features on ranking. When classification was performed using the CNN alone, the misclassification rate was much higher. Feature selection is therefore carried out with feature ranking algorithms such as Mutinffs, ReliefF, Laplacian, and UDFS, and tested with different machine-learning techniques such as Support Vector Machine, K-Nearest Neighbor, and Subspace Ensemble under different testing conditions. The results are satisfactory, with classification accuracy of around 98% to 99% using a 7:3 random holdout partition of training to testing image sets and also using fivefold cross-validation on the same set with a standardized template.
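The two evaluation protocols mentioned at the end, a 7:3 random holdout and fivefold cross-validation, can be sketched as index splits. This is a minimal illustration assuming a plain shuffled (non-stratified) split over sample indices; the function names, the fixed seed, and the sample count of 100 are hypothetical, and the paper may well have stratified by class.

```python
import random


def holdout_split(n_samples, train_fraction=0.7, seed=0):
    """Shuffle indices and cut them into a train part and a test part."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [
            idx
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for idx in fold
        ]
        yield train, test


train_idx, test_idx = holdout_split(100)
cv_splits = list(k_fold_indices(100, k=5))
```

Holdout gives a single estimate from one partition, while cross-validation averages over k partitions and so is less sensitive to which images happen to land in the test set.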

19.
20.
Automatic gastrointestinal (GI) tract disease recognition is an important application of biomedical image processing. Conventionally, microscopic analysis of pathological tissue is used to detect abnormal areas of the GI tract. The procedure is subjective and results in significant inter-/intra-observer variations in disease detection. Moreover, the huge frame rate of video endoscopy makes it burdensome for gastroenterologists to examine every frame in detail. Consequently, there is a huge demand for a reliable computer-aided diagnostic system (CADx) for diagnosing GI tract diseases. In this work, a CADx is proposed for the diagnosis and classification of GI tract diseases. A novel framework is presented in which preprocessing (LAB color space) is performed first; then local binary patterns (LBP), or texture, features and deep learning (inceptionNet, ResNet50, and VGG-16) features are fused serially to improve the prediction of abnormalities in the GI tract. Additionally, principal component analysis (PCA), entropy, and minimum redundancy and maximum relevance (mRMR) feature selection methods were analyzed to acquire the optimized characteristics, and various classifiers were trained using the fused features. Open-source color image datasets (KVASIR, NERTHUS, and stomach ULCER) were used for performance evaluation. The study revealed that the subspace discriminant classifier provided an efficient result with 95.02% accuracy on the KVASIR dataset, which proved to be better than the existing state-of-the-art approaches.
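The LBP texture descriptor can be illustrated with the basic 3×3 operator. The sketch below assumes the simplest variant: each of the eight neighbours is compared with the centre pixel, neighbours greater than or equal to the centre contribute a 1 bit, and the bits are read clockwise from the top-left corner. Bit ordering differs between implementations, and the abstract does not state which LBP variant was used, so the details here are assumptions.

```python
def lbp_code(patch):
    """Basic 8-neighbour LBP code for a 3x3 grayscale patch."""
    center = patch[1][1]
    # Neighbours clockwise, starting at the top-left corner.
    neighbours = [
        patch[0][0], patch[0][1], patch[0][2],
        patch[1][2], patch[2][2], patch[2][1],
        patch[2][0], patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    histogram = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            histogram[lbp_code(patch)] += 1
    return histogram


sample_patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
sample_code = lbp_code(sample_patch)
sample_histogram = lbp_histogram(sample_patch)
```

The histogram is the texture feature vector; serial fusion then amounts to concatenating it with the deep feature vectors before feature selection.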

