Similar Documents
 20 similar documents found (search time: 31 ms)
1.
The two-stream convolutional neural network exhibits excellent performance in video action recognition. The core idea is to train one model on frames clipped from the videos and another on optical flow images pre-extracted from those frames, and then to fuse the outputs of the two models. Nevertheless, the reliance on pre-extracted optical flow impedes the efficiency of action recognition, and the temporal and spatial streams are fused only at the very end, so one stream can fail even while the other succeeds. We propose a novel hidden two-stream collaborative (HTSC) learning network that hides the optical flow extraction step inside the network and greatly speeds up action recognition. Building on the two-stream method, the collaborative learning model captures the interaction between temporal and spatial features to greatly enhance recognition accuracy. Our proposed method achieves a good balance of efficiency and accuracy on large-scale video action recognition datasets.
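
For context, a minimal PyTorch sketch of the conventional two-stream late fusion that this work builds on (not the HTSC network itself); the backbone modules and the fusion weight `alpha` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoStreamLateFusion(nn.Module):
    """Baseline-style late fusion: each stream scores the video
    independently and the scores are only combined at the end."""
    def __init__(self, spatial_net: nn.Module, temporal_net: nn.Module, alpha: float = 0.5):
        super().__init__()
        self.spatial_net = spatial_net    # consumes RGB frames
        self.temporal_net = temporal_net  # consumes stacked optical flow
        self.alpha = alpha                # illustrative fusion weight

    def forward(self, rgb, flow):
        spatial_scores = torch.softmax(self.spatial_net(rgb), dim=1)
        temporal_scores = torch.softmax(self.temporal_net(flow), dim=1)
        # The streams interact only here, which is the limitation HTSC addresses.
        return self.alpha * spatial_scores + (1 - self.alpha) * temporal_scores
```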

2.
Video prediction is the problem of generating future frames by exploiting the spatiotemporal correlation in the past frame sequence. It is a crucial problem in computer vision with many real-world applications, mainly focused on predicting future scenarios to avoid undesirable outcomes. However, modeling future image content and objects is challenging due to the dynamic evolution and complexity of the scene, such as occlusions, camera movements, delay, and illumination changes. Direct frame synthesis and optical-flow estimation are the two common approaches, but most prior work relies on only one of them. Both have limitations: direct frame synthesis usually produces blurry predictions due to complex pixel distributions in the scene, while optical-flow estimation usually produces artifacts under large object displacements or occlusions in the clip. In this paper, we construct a deep Frame Prediction Network (FPNet-OF) with multi-branch inputs (optical flow and original frames) that predicts the future video frame by adaptively fusing future object motion with the future frame generator. The key idea is to jointly optimize direct RGB frame synthesis and dense optical flow estimation to obtain a superior video prediction network. Using various real-world datasets, we experimentally verify that our proposed framework produces higher-quality video frames than other state-of-the-art frameworks.

3.
Human motion recognition plays a crucial role in video analysis frameworks. However, a given video may contain a variety of noise, such as an unstable background and redundant actions that are completely different from the key actions. Such noise poses a great challenge to human motion recognition. To solve this problem, we propose a new method based on the 3-Dimensional (3D) Bag of Visual Words (BoVW) framework. Our method includes two parts. The first is a video action feature extractor that identifies key actions by analyzing action features. In the video action encoder, a pre-trained deep 3D CNN analyzes the action characteristics of a given video to obtain expressive coding information. A classifier with subnetwork nodes performs the final classification. Extensive experiments demonstrate that our method is highly effective for complex video analysis, achieving state-of-the-art performance on the UCF101 (85.3%) and HMDB51 (54.5%) datasets.

4.
Generally, the risks associated with malicious threats are increasing for the Internet of Things (IoT) and its related applications due to their dependency on the Internet and the minimal resources of IoT devices. Thus, anomaly-based intrusion detection models for IoT networks are vital. Distinct detection methodologies need to be developed for the Industrial Internet of Things (IIoT), as threat detection is a significant expectation of stakeholders. Machine learning approaches are evolving techniques that learn from experience and have achieved superior performance in applications such as pattern recognition, outlier analysis, and speech recognition. Traditional techniques and tools are not adequate to secure IIoT networks because industrial systems use a variety of protocols and offer limited upgrade options. The objective of this paper is to develop a two-phase anomaly detection model to enhance the reliability of an IIoT network. In the first phase, SVM and Naïve Bayes classifiers are integrated using an ensemble blending technique; k-fold cross-validation is performed while training on different training-to-testing ratios to obtain optimized training and test sets, and ensemble blending uses a random forest to predict class labels. An Artificial Neural Network (ANN) classifier trained with the Adam optimizer is also used for prediction. In the second phase, both the ANN and random forest results are fed to the model's classification unit, and the result with the highest accuracy is taken as the final output. The proposed model is tested on standard IoT attack datasets, such as WUSTL_IIOT-2018, N_BaIoT, and Bot_IoT, and the highest accuracy obtained is 99%. A comparative analysis against state-of-the-art ensemble techniques demonstrates the superiority of the results and shows that the proposed model outperforms traditional techniques, thus improving the reliability of an IIoT network.
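
A minimal scikit-learn sketch of the two-phase idea described in this abstract: blend SVM and Naive Bayes through a random forest, train an ANN separately, then keep the better result. Hyperparameters, fold counts, and the 5-fold blending scheme are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

def two_phase_anomaly_detector(X_train, y_train, X_test, y_test):
    # Phase 1a: out-of-fold probabilities of the base learners (blending).
    svm_oof = cross_val_predict(SVC(probability=True), X_train, y_train,
                                cv=5, method="predict_proba")
    nb_oof = cross_val_predict(GaussianNB(), X_train, y_train,
                               cv=5, method="predict_proba")
    meta_train = np.hstack([svm_oof, nb_oof])
    blender = RandomForestClassifier(n_estimators=200).fit(meta_train, y_train)

    # Refit base learners on all training data to build test-time meta features.
    svm = SVC(probability=True).fit(X_train, y_train)
    nb = GaussianNB().fit(X_train, y_train)
    meta_test = np.hstack([svm.predict_proba(X_test), nb.predict_proba(X_test)])
    blend_acc = blender.score(meta_test, y_test)

    # Phase 1b: ANN classifier trained with the Adam optimizer.
    ann = MLPClassifier(hidden_layer_sizes=(64, 32), solver="adam", max_iter=500)
    ann_acc = ann.fit(X_train, y_train).score(X_test, y_test)

    # Phase 2: the classification unit keeps the higher-accuracy result.
    return max(blend_acc, ann_acc)
```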

5.
To address the problem that 3D CNNs extract spatiotemporal features from video well but demand large amounts of computation and memory, this paper designs an efficient 3D convolution block to replace the computationally expensive 3×3×3 convolution layers, and proposes a densely connected residual network incorporating these 3D convolution blocks (3D-EDRNs) for human action recognition. The efficient 3D convolution block combines a 1×3×3 convolution layer, which captures the spatial features of the video, with a 3×1×1 convolution layer, which captures its temporal features. Placing the efficient 3D convolution blocks at multiple positions in the densely connected residual network not only exploits the easy optimization of residual blocks and the feature reuse of densely connected networks, but also shortens training time and improves the efficiency and performance of spatiotemporal feature extraction. Experiments on the classic UCF101 and HMDB51 datasets and on the Dynamic Multi-View complex 3D human action database (DMV action3D) verify that 3D-EDRNs combined with the 3D convolution blocks significantly reduce model complexity and effectively improve classification performance, while requiring few computational resources, few parameters, and little training time.
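
A minimal PyTorch sketch of the factorized block described above: a 1×3×3 convolution for spatial features followed by a 3×1×1 convolution for temporal features. The channel sizes and the use of BatchNorm/ReLU are illustrative assumptions.

```python
import torch.nn as nn

class Efficient3DBlock(nn.Module):
    """Factorized replacement for a 3x3x3 convolution layer."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.spatial = nn.Conv3d(in_channels, out_channels,
                                 kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(out_channels, out_channels,
                                  kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.bn = nn.BatchNorm3d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):  # x: (batch, channels, time, height, width)
        x = self.relu(self.spatial(x))
        return self.relu(self.bn(self.temporal(x)))

# Rough parameter comparison: a 3x3x3 layer with C_in = C_out = 64 has
# 64*64*27 weights, while the factorized pair has 64*64*(9+3), about a
# 2.25x reduction before counting the saved multiply-accumulate operations.
```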

6.
Malicious traffic detection over the Internet is a challenging area for researchers seeking to protect network infrastructures from malicious activity. Several shortcomings of a network system can be leveraged by an attacker to gain unauthorized access through malicious traffic. Safeguarding against such attacks requires an efficient automatic system that can detect malicious traffic in time and avoid system damage. Many automated systems can currently detect malicious activity; however, their efficacy and accuracy need further improvement for traffic from multi-domain systems. The present study focuses on detecting malicious traffic with high accuracy using machine learning techniques. The proposed approach uses two datasets, UNSW-NB15 and IoTID20, which contain local network traffic and IoT traffic data, respectively. Both datasets were combined to improve the detection of malicious traffic from local and IoT networks with high accuracy. Merging the datasets requires an equal number of features, which was achieved by reducing the feature count of each dataset to 30 using principal component analysis (PCA). The proposed extra boosting forest (EBF) is a stacked ensemble of tree-based models: an extra trees classifier, a gradient boosting classifier, and a random forest. Empirical results show that EBF performs significantly better than the alternatives, achieving the highest accuracy scores of 0.985 and 0.984 on the multi-domain dataset for two and four classes, respectively.
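
A scikit-learn sketch of the pipeline described above: PCA reduces each dataset to 30 features so the two domains can be merged, and a stacked ensemble of tree models (an EBF-style classifier) is trained on the result. The meta-learner and estimator settings are assumptions; the paper's exact stacking configuration may differ.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression

def build_ebf_on_merged_traffic(X_unsw, y_unsw, X_iotid, y_iotid):
    # Reduce each dataset to 30 principal components so the feature
    # counts match, then merge the two traffic domains row-wise.
    X_a = PCA(n_components=30).fit_transform(X_unsw)
    X_b = PCA(n_components=30).fit_transform(X_iotid)
    X = np.vstack([X_a, X_b])
    y = np.concatenate([y_unsw, y_iotid])

    ebf = StackingClassifier(
        estimators=[("etc", ExtraTreesClassifier(n_estimators=100)),
                    ("gbc", GradientBoostingClassifier()),
                    ("rf", RandomForestClassifier(n_estimators=100))],
        final_estimator=LogisticRegression(max_iter=1000))
    return ebf.fit(X, y)
```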

7.
Advanced microscopy and/or spectroscopy tools play indispensable roles in nanoscience and nanotechnology research, as they provide rich information about material processes and properties. However, the interpretation of imaging data relies heavily on the "intuition" of experienced researchers, and many of the deep graphical features obtained through these tools go unused because of difficulties in processing the data and finding correlations. Such challenges can be well addressed by deep learning. In this work, the optical characterization of 2D materials is used as a case study, and a neural-network-based algorithm is demonstrated for material and thickness identification of 2D materials with high prediction accuracy and real-time processing capability. Further analysis shows that the trained network can extract deep graphical features such as contrast, color, edges, shapes, flake sizes, and their distributions, based on which an ensemble approach is developed to predict the most relevant physical properties of 2D materials. Finally, a transfer learning technique is applied to adapt the pretrained network to other optical identification applications. This artificial-intelligence-based material characterization approach is a powerful tool that can speed up the preparation and initial characterization of 2D materials and other nanomaterials, and potentially accelerate new material discoveries.

8.
Multiple ocular region segmentation plays an important role in applications such as biometrics, liveness detection, healthcare, and gaze estimation. Typically, segmentation techniques focus on a single region of the eye at a time, and despite a number of obvious advantages, very limited research has addressed multiple regions of the eye. Accurate segmentation of multiple eye regions is also necessary in challenging scenarios involving blur, ghost effects, low resolution, off-angle views, and unusual glints, constraints that currently available segmentation methods cannot handle. In this paper, to achieve accurate segmentation of multiple eye regions in unconstrained scenarios, a lightweight outer residual encoder-decoder network suitable for various sensor images is proposed. The proposed method determines the true boundaries of the eye regions from inferior-quality images using the high-frequency information flow in the outer residual encoder-decoder deep convolutional neural network (ORED-Net). Moreover, ORED-Net does not buy its performance with added complexity, parameter count, or network depth: the proposed network is considerably lighter than previous state-of-the-art models. Comprehensive experiments were performed on the challenging SBVPI and UBIRIS.v2 eye-region datasets, on which the proposed ORED-Net achieved mean intersection over union (mIoU) scores of 89.25 and 85.12, respectively.
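
A minimal PyTorch sketch of the outer-residual idea: a skip path carries input-level, high-frequency information around the encoder-decoder bottleneck before the segmentation head. Layer sizes and the exact placement of the skip are illustrative assumptions, not the published ORED-Net.

```python
import torch.nn as nn

class OuterResidualSegmenter(nn.Module):
    """Toy encoder-decoder with an outer residual connection."""
    def __init__(self, in_ch=3, mid_ch=32, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(mid_ch, mid_ch, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.outer_skip = nn.Conv2d(in_ch, mid_ch, 1)  # outer residual path
        self.head = nn.Conv2d(mid_ch, num_classes, 1)

    def forward(self, x):
        decoded = self.decoder(self.encoder(x))
        # Add input-level features back in before the segmentation head,
        # helping preserve fine eye-region boundaries.
        return self.head(decoded + self.outer_skip(x))
```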

9.
With the rapid development of computer technology, millions of images are produced every day by different sources. Efficiently processing these images and accurately discerning the scenes in them is an important but difficult task. In this paper, we propose a novel supervised learning framework based on adaptive binary coding for scene classification. Specifically, we first extract high-level features of the images under consideration using available models trained on public datasets. We then design a binary encoding method, a form of one-hot encoding, to make the feature representation more efficient. Benefiting from the proposed adaptive binary coding, our method requires no time to train or fine-tune a deep network and can effectively handle different applications. Experimental results on three public datasets, i.e., the UIUC sports event dataset, the MIT Indoor dataset, and the UC Merced dataset, with three different classifiers demonstrate that our method outperforms state-of-the-art methods by large margins.
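
A small NumPy sketch of the binarization step described here: high-level features from a pre-trained model are thresholded per dimension into compact binary codes, with no further network training. The median threshold is an illustrative choice for the "adaptive" part.

```python
import numpy as np

def adaptive_binary_code(features, thresholds=None):
    """Binarize pre-extracted deep features against per-dimension thresholds."""
    if thresholds is None:
        thresholds = np.median(features, axis=0)  # data-driven, adaptive threshold
    return (features > thresholds).astype(np.uint8)

# Example: codes for 4 images with 5-dimensional pre-trained features.
feats = np.random.rand(4, 5)
codes = adaptive_binary_code(feats)  # shape (4, 5), values in {0, 1}
```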

10.
To improve the accuracy of image-based object recognition, an RGB-D object recognition algorithm based on an improved two-stream convolutional recurrent neural network (Re-CRNN) is proposed. RGB images are combined with depth information, and the two-stream convolutional neural network (CNN) is improved using residual learning: a top-level feature fusion unit is added to learn joint features from the RGB and depth images, the extracted high-level features of the RGB and depth images are fused across channels, and then So...

11.
Violence recognition is crucial because of its applications in security and law enforcement. Existing semi-automated systems rely on tedious manual surveillance, which causes human errors and makes these systems less effective. Several approaches have been proposed using trajectory-based, non-object-centric, and deep-learning-based methods. Previous studies have shown that deep learning techniques attain higher accuracy and lower error rates than other methods, but their performance still needs improvement. This study explores state-of-the-art deep learning architectures, convolutional neural networks (CNNs) and Inception V4, to detect and recognize violence from video data. In the proposed framework, a keyframe extraction technique eliminates duplicate consecutive frames; this keyframing phase reduces the training data size and hence the computational cost. For feature selection and classification, the sequential CNN uses a single kernel size, whereas the Inception V4 CNN uses multiple kernel sizes across the layers of the architecture. For empirical analysis, four widely used standard datasets with diverse activities are used. The results confirm that the proposed approach attains 98% accuracy, reduces the computational cost, and outperforms existing violence detection and recognition techniques.
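
An OpenCV sketch of the duplicate-frame elimination idea: keep a frame only when its mean absolute difference from the previous grayscale frame is large enough. The difference threshold is an illustrative assumption, not the value used in the paper.

```python
import cv2
import numpy as np

def extract_keyframes(video_path, diff_threshold=12.0):
    """Drop near-duplicate consecutive frames to shrink the training set."""
    cap = cv2.VideoCapture(video_path)
    keyframes, prev_gray = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is None or np.mean(cv2.absdiff(gray, prev_gray)) > diff_threshold:
            keyframes.append(frame)  # keep only frames that differ enough
        prev_gray = gray
    cap.release()
    return keyframes
```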

12.
An intelligent mosquito net employing deep learning has become a hotspot in the Internet of Things field, as it can significantly reduce the spread of mosquito-borne pathogens and help people live well in mosquito-infested areas. In this study, we propose an intelligent mosquito net that produces and transmits data through the Internet of Medical Things. In our method, decision-making is controlled by a deep learning model, and data are collected with infrared sensors and an array of pressure sensors. The ZigBee protocol transmits the pressure map formed by the pressure sensors to the deep learning perception model, which automatically determines whether the user intends to open or close the mosquito net. Optical flow is used to extract pressure-map features, which are subsequently fed to a 3-dimensional convolutional neural network (3D-CNN) classification model. We achieved the expected results using nested cross-validation to evaluate the model. Deep learning adapts better than traditional methods and is more robust to interference from the different bodies of users. This research has the potential to be used in intelligent medical protection and in large-scale sensor-array perception of the environment.

13.
In unconstrained open environments, face detection remains a challenging task because of facial pose variation, complex backgrounds, and motion blur. Targeting the in-plane rotation problem in face detection for video streams, this paper combines facial landmarks with pyramidal optical flow and proposes a rotation-invariant face detection algorithm based on cascaded networks and pyramidal optical flow. First, a cascaded progressive convolutional neural network locates the faces and landmarks in the previous frame of the video stream. Second, to obtain the optical flow mapping between the landmarks and the face candidate boxes, an independent landmark detection network re-localizes the landmarks in the current frame. The optical flow displacement of the landmarks between the two frames is then computed. Finally, the detected faces are rectified through the mapping between the landmark optical flow displacement and the face candidate boxes, achieving in-plane rotation-invariant face detection. Experiments on the public FDDB dataset show high accuracy, and dynamic tests on the Boston face tracking dataset show that the algorithm effectively handles in-plane rotated face detection. Compared with other detection algorithms, it has a clear speed advantage, and the problem of window jitter in video is well resolved.
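
A sketch of the landmark optical flow step described above using OpenCV's pyramidal Lucas-Kanade tracker: landmarks from the previous frame are tracked into the current one, and the resulting per-landmark displacements can then drive the rotation correction. The window size and pyramid depth are illustrative assumptions.

```python
import cv2
import numpy as np

def landmark_flow_displacement(prev_gray, curr_gray, landmarks):
    """Track facial landmarks between frames and return their displacements."""
    prev_pts = np.float32(landmarks).reshape(-1, 1, 2)
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_gray, curr_gray, prev_pts, None,
        winSize=(21, 21), maxLevel=3)          # pyramidal LK optical flow
    good = status.ravel() == 1                 # keep successfully tracked points
    return (curr_pts[good] - prev_pts[good]).reshape(-1, 2)
```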

14.
PM2.5 concentration prediction is of great significance to environmental protection and human health, and achieving accurate prediction has become an important research task. However, PM2.5 pollutants spread through the atmosphere, so different cities influence one another. To effectively capture the air pollution relationships between cities, this paper proposes a novel spatiotemporal model combining a graph attention network (GAT) and a gated recurrent unit (GRU), named GAT-GRU, for PM2.5 concentration prediction. Specifically, GAT is used to learn the spatial dependence of PM2.5 concentration data across cities, and GRU is used to extract the temporal dependence of the long-term data series. The model integrates the learned spatial and temporal dependencies to capture long-term, complex spatiotemporal features. Because air pollution is related to a city's meteorological conditions, knowledge acquired from meteorological data is used to enhance PM2.5 prediction performance; the input of the GAT-GRU model therefore consists of PM2.5 concentration data and meteorological data. To verify the effectiveness of the proposed GAT-GRU prediction model, experiments on real-world datasets are conducted against other baselines. Experimental results show that our model achieves excellent performance in PM2.5 concentration prediction.
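
A minimal single-head PyTorch sketch of the GAT-GRU idea: attention over neighbouring cities at each time step, followed by a GRU over the resulting sequence. The dimensions, single attention head, and prediction head are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class GATGRUSketch(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, hid_dim)
        self.attn = nn.Linear(2 * hid_dim, 1)
        self.gru = nn.GRU(hid_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, 1)  # next-step PM2.5 per city

    def forward(self, x, adj):
        # x: (time, cities, features); adj: (cities, cities) with self-loops.
        T, N, _ = x.shape
        spatial = []
        for t in range(T):
            h = self.proj(x[t])                                   # (N, hid)
            pair = torch.cat([h.unsqueeze(1).expand(N, N, -1),
                              h.unsqueeze(0).expand(N, N, -1)], dim=-1)
            e = self.attn(pair).squeeze(-1)                       # (N, N) attention scores
            e = e.masked_fill(adj == 0, float("-inf"))            # restrict to graph edges
            alpha = torch.softmax(e, dim=-1)
            spatial.append(alpha @ h)                             # attention-weighted neighbour mix
        seq = torch.stack(spatial, dim=1)                         # (N, T, hid)
        hidden, _ = self.gru(seq)                                 # temporal dependence
        return self.out(hidden[:, -1])                            # (N, 1) predictions
```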

15.
Machine learning (ML) algorithms are often used to design effective intrusion detection (ID) systems for appropriate mitigation and effective detection of malicious cyber threats at the host and network levels. However, cybersecurity attacks are still increasing, and an ID system can play a vital role in detecting such threats. Existing ID systems are often unable to detect malicious threats, primarily because they adopt approaches based on traditional ML techniques that pay less attention to accurate classification and feature selection. Developing an accurate and intelligent ID system is therefore a priority. The main objective of this study was to develop a hybrid intelligent intrusion detection system (HIIDS) that learns crucial feature representations efficiently and automatically from massive unlabeled raw network traffic data; many ID datasets are publicly available to the cybersecurity research community for this purpose. We used robust Spark MLlib (machine learning library) classifiers, logistic regression (LR) and extreme gradient boosting (XGB), for anomaly detection, and a state-of-the-art deep learning model, a long short-term memory autoencoder (LSTMAE), for misuse attacks, to develop an efficient HIIDS that detects and classifies unpredictable attacks. Our approach uses the LSTM to capture temporal features and the autoencoder to capture global features more efficiently. To evaluate the efficacy of the proposed approach, experiments were conducted on a publicly available, contemporary real-life dataset, ISCX-UNB. The simulation results demonstrate that the proposed Spark MLlib and LSTMAE-based HIIDS significantly outperforms existing ID approaches, achieving an accuracy of up to 97.52% on the ISCX-UNB dataset under 10-fold cross-validation. The proposed HIIDS is therefore promising for real-world, large-scale use.
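
A Keras sketch of the LSTM autoencoder component: the model reconstructs flow sequences, and sequences with high reconstruction error are flagged. Layer sizes and the thresholding rule are illustrative assumptions; the Spark MLlib classifiers of the first stage are omitted here.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_lstm_autoencoder(timesteps, n_features, latent_dim=32):
    """LSTM autoencoder over network-flow sequences."""
    model = keras.Sequential([
        layers.Input(shape=(timesteps, n_features)),
        layers.LSTM(latent_dim),                           # encoder: temporal features
        layers.RepeatVector(timesteps),
        layers.LSTM(latent_dim, return_sequences=True),    # decoder
        layers.TimeDistributed(layers.Dense(n_features)),  # reconstruct each step
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def is_anomalous(model, x, threshold):
    # Flag sequences whose reconstruction error exceeds a threshold
    # estimated from benign training traffic.
    err = np.mean((model.predict(x) - x) ** 2, axis=(1, 2))
    return err > threshold
```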

16.
Metabolomics experiments involve the simultaneous detection of a large number of metabolites, leading to large multivariate datasets, and computer-based applications are required to extract the relevant biological information. A high-throughput metabolic fingerprinting approach based on ultra-performance liquid chromatography (UPLC) and high-resolution time-of-flight (TOF) mass spectrometry (MS) was developed for the detection of wound biomarkers in the model plant Arabidopsis thaliana, and the resulting high-dimensional data were analysed with chemometric methods. Machine learning classification algorithms are promising tools for deciphering complex metabolic phenotypes, but their application in this research field remains scarce. The present work proposes a comparative evaluation of a diverse set of machine learning schemes on metabolomic data with respect to their ability to provide deeper insight into the metabolite network involved in the wound response. Standalone classifiers, i.e. J48 (decision tree), kNN (instance-based learner), SMO (support vector machine), multilayer perceptron and RBF network (neural networks), and Naive Bayes (probabilistic method), as well as combinations of classification and feature selection algorithms, such as Information Gain, RELIEF-F, Correlation Feature-based Selection and SVM-based methods, are assessed concurrently, and cross-validation resampling procedures are used to avoid overfitting. This study demonstrates that machine learning methods represent valuable tools for the analysis of UPLC-TOF/MS metabolomic data. Remarkable performance was achieved, and the stability of the models demonstrated their robustness and interpretability potential. The results draw attention to both temporal and spatial metabolic patterns in the context of stress signalling and highlight relevant biomarkers not evidenced with standard data treatment.
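
A scikit-learn sketch of the comparative evaluation described above, using common analogues of the named classifiers (DecisionTreeClassifier for J48, SVC for SMO, etc.) with cross-validation to guard against overfitting. The stand-in implementations, hyperparameters, and fold count are assumptions; the study's own feature selection stages are not reproduced here.

```python
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def compare_classifiers(X, y, cv=10):
    """Return mean cross-validated accuracy for each candidate classifier."""
    models = {
        "decision tree (J48-like)": DecisionTreeClassifier(),
        "kNN": KNeighborsClassifier(n_neighbors=3),
        "SVM (SMO-like)": SVC(),
        "multilayer perceptron": MLPClassifier(max_iter=1000),
        "Naive Bayes": GaussianNB(),
    }
    # Cross-validation resampling on the high-dimensional metabolomic
    # feature matrix X keeps the comparison from rewarding overfitting.
    return {name: cross_val_score(m, X, y, cv=cv).mean()
            for name, m in models.items()}
```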

17.
In this paper, a convolutional neural network based on a quaternion transformation is proposed to detect median filtering in color images. Compared with a conventional convolutional neural network, the proposed scheme processes color images in a holistic manner, making full use of the correlation between the RGB channels, while the use of a convolutional neural network effectively avoids the one-sidedness of handcrafted features. Experimental results show that the scheme improves on the state-of-the-art in the accuracy of color image median filtering detection.

18.
Colon cancer is one of the most frequently diagnosed cancers and a leading cause of cancer deaths. Early detection and removal of malignant polyps, which are precursors of colon cancer, can greatly reduce the fatality rate. Detecting and segmenting polyps in colonoscopy is a challenging task even for an experienced colonoscopist, owing to variations in polyp size, shape, and texture and the close resemblance of polyps to the colon lining. Machine-assisted detection, localization, and segmentation of polyps during screening can therefore greatly help clinicians. Autoencoder-based architectures used in polyp segmentation lack efficiency in incorporating both local and long-range pixel dependencies. To address the challenges in the automatic segmentation of colon polyps, we propose an autoencoder architecture augmented with a feature attention module in the decoder. Salient features are extracted from RGB colonoscopic images by a residual, skip-connected autoencoder. The decoder attention module joins the spatial subspace with the feature subspace extracted by the deep residual convolutional neural network and enhances feature weights for precise segmentation of polyp regions. Extensive experiments on four publicly available polyp datasets demonstrate that the proposed architecture provides very impressive performance in terms of segmentation metrics (Dice and Jaccard scores) compared with state-of-the-art polyp segmentation approaches.

19.
Recently, machine learning-based technologies have been developed to automate the classification of wafer map defect patterns during semiconductor manufacturing. Existing approaches to wafer map pattern classification either learn the image directly with a convolutional neural network or apply an ensemble method after extracting image features. This study aims to classify wafer map defects more effectively and to derive algorithms that remain robust even when defect patterns are insufficient, since the number of defects in an actual process may be limited. First, additional data are generated with a convolutional auto-encoder (CAE) to compensate for the insufficient data, and the expanded data are verified using the structural similarity index measure (SSIM). After handcrafted features are extracted, a boosted stacking ensemble model that integrates four base-level classifiers with an extreme gradient boosting classifier as the meta-level classifier is designed and trained on the expanded data for final prediction. Since the proposed algorithm outperforms existing ensemble classifiers even on insufficient defect patterns, the results of this study will contribute to improving product quality and yield in actual semiconductor manufacturing processes.
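
A sketch of the two ideas in this abstract: screening CAE-generated wafer maps with SSIM, and stacking base classifiers under an XGBoost meta-learner. The SSIM threshold and the particular four base classifiers chosen here are assumptions; the paper specifies its own set.

```python
from skimage.metrics import structural_similarity as ssim
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from xgboost import XGBClassifier

def keep_generated_map(original, generated, threshold=0.7):
    """Keep a CAE-generated wafer map only if it is structurally similar
    enough to its source image (threshold is illustrative)."""
    data_range = original.max() - original.min()
    return ssim(original, generated, data_range=data_range) > threshold

def build_boosted_stacking_ensemble():
    """Four base-level classifiers stacked under an XGBoost meta-classifier."""
    return StackingClassifier(
        estimators=[("rf", RandomForestClassifier(n_estimators=100)),
                    ("svc", SVC(probability=True)),
                    ("knn", KNeighborsClassifier()),
                    ("lr", LogisticRegression(max_iter=1000))],
        final_estimator=XGBClassifier(eval_metric="logloss"))
```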

20.
Dataset dependence affects many real-life applications of machine learning: the performance of a model trained on one dataset is significantly worse on samples from another dataset than on new, unseen samples from the original one. This issue is particularly acute for small and somewhat specialized databases in medical applications; the automated recognition of melanoma from skin lesion images is a prime example. We document dataset dependence in dermoscopic skin lesion image classification using three publicly available medium-sized datasets. Standard machine learning techniques aimed at improving the predictive power of a model may enhance performance slightly, but the gain is small, the dataset dependence is not reduced, and the best combination depends on model details. We demonstrate that simple differences in image statistics account for only 5% of the dataset dependence. We suggest a solution with two essential ingredients: using an ensemble of heterogeneous models, and training on a heterogeneous dataset. Our ensemble consists of 29 convolutional networks, some of which are trained on features considered important by dermatologists; the networks' outputs are fused by a trained committee machine. The combined International Skin Imaging Collaboration dataset is suitable for training, as it is multi-source, produced by a collaboration of clinics around the world. Building on the strengths of the ensemble, we also apply it to a related problem: recognizing melanoma from clinical (non-dermoscopic) images. This is a harder problem, as the image quality is lower than that of dermoscopic images and the available public datasets are smaller and scarcer. We explored various training strategies and showed that a balanced accuracy of 79% can be achieved for binary classification averaged over three clinical datasets.

