Similar Literature
1.
    
Handwritten character recognition systems are widely used today in settings such as shopping malls, banks, and educational institutions. Urdu is the national language of Pakistan and the fourth most widely spoken language in the world. However, recognizing handwritten Urdu characters remains challenging because of the script's cursive nature. This paper presents a Convolutional Neural Network (CNN) model for Urdu handwritten alphabet recognition (UHAR) of both offline and online characters. For offline systems, optical readers are used to extract the alphabets, while diagonal-based extraction methods are implemented for online systems. The research also contributes an Urdu handwritten dataset (UHDS) to support future work, addressing the lack of comprehensive, standard Urdu alphabet datasets for Urdu text recognition research. To this end, 1000 handwritten samples were collected for each letter, 38,000 samples in total, from writers aged 12 to 25 through online and offline media, and used to train the CNN model. Detailed character recognition experiments show that the proposed CNN model outperforms previously published approaches.
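The abstract does not define its diagonal-based extraction. One widely cited zone-based variant splits a binary character image into equal zones, sums the foreground pixels along each zone's diagonals, and averages those sums into one feature per zone. The sketch below assumes that variant; `diagonal_features` and the zone size are illustrative choices, not the authors' code.

```python
def diagonal_features(image, zone=3):
    """Zone-based diagonal features for a binary character image (assumed variant).

    `image` is a list of equal-length rows of 0/1 pixels. It is split into
    non-overlapping zone x zone blocks; in each block the pixels on every
    diagonal (constant row + col) are summed, and the 2*zone - 1 diagonal
    sums are averaged into a single feature for that zone.
    """
    rows, cols = len(image), len(image[0])
    if rows % zone or cols % zone:
        raise ValueError("image size must be a multiple of the zone size")
    features = []
    for top in range(0, rows, zone):
        for left in range(0, cols, zone):
            diag_sums = [0] * (2 * zone - 1)
            for r in range(zone):
                for c in range(zone):
                    diag_sums[r + c] += image[top + r][left + c]
            features.append(sum(diag_sums) / len(diag_sums))
    return features
```

Because every pixel lies on exactly one diagonal, each averaged feature equals the zone's foreground count divided by `2*zone - 1`; keeping the individual diagonal sums instead would preserve more stroke-direction information.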

2.
    
This research proposes an improved transfer-learning framework for more precise classification of Protected Indonesia Birds (PIB), which have been identified as endangered species. The framework enhances the transfer-learning baseline model with a proposed sequence of Batch Normalization, Dropout, and Fully-Connected (BNDFC) layers. The main contribution of this work is the BNDFC sequence, which can be applied to any Convolutional Neural Network (CNN) based model to improve classification accuracy, especially for image-based species classification. Experimental results show that the proposed sequence of BNDFC layers outperforms other combinations of these layers, and that adding BNDFC improves performance across ten different CNN-based models. On average, BNDFC improves Accuracy by approximately 19.88%, F-measure by 24.43%, G-mean by 17.93%, Sensitivity by 23.41%, and Precision by 18.76%. Moreover, applying fine-tuning (FT) increases accuracy by 0.85% and reduces validation loss by 18.33%. MobileNetV2 was the best baseline model, with the smallest size (35.9 MB) and the highest validation-set accuracy (88.07%).
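The abstract gives the layer order but not the layer sizes or dropout rate. The sketch below shows the assumed order, batch normalization then dropout then a fully connected layer, as a plain-Python forward pass over a batch of backbone features. The names `bndfc`, `rate`, and the weight layout (one row per output unit) are hypothetical.

```python
import math
import random


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature column of a batch (list of equal-length lists)."""
    n, d = len(batch), len(batch[0])
    out = [row[:] for row in batch]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        for i in range(n):
            out[i][j] = gamma * (batch[i][j] - mean) / math.sqrt(var + eps) + beta
    return out


def dropout(batch, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return [row[:] for row in batch]
    keep = 1.0 - rate
    return [[x / keep if rng.random() < keep else 0.0 for x in row] for row in batch]


def dense(batch, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [[sum(w * x for w, x in zip(w_row, row)) + b
             for w_row, b in zip(weights, bias)] for row in batch]


def bndfc(batch, weights, bias, rate=0.5, rng=None, training=True):
    """Hypothetical BN -> Dropout -> FC head applied to backbone features."""
    if rng is None:
        rng = random.Random(0)
    return dense(dropout(batch_norm(batch), rate, rng, training), weights, bias)
```

At inference (`training=False`) dropout is the identity, which is why inverted dropout rescales the kept units during training instead.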

3.
    

Lip reading is the visual interpretation of a speaker's lip movements during speech, that is, the task of decoding text from the movement of the speaker's mouth. This paper proposes a lip-reading model that helps deaf people and people with hearing impairments understand a speaker: a video of the speaker is captured and fed into the model, which produces the corresponding subtitles. Deep learning makes it easier to extract a large number of diverse features, which can then be converted into letter probabilities to obtain accurate results. Recently proposed lip-reading methods are based on sequence-to-sequence architectures designed for neural machine translation and audio speech recognition. In contrast, this paper develops a deep convolutional neural network model, the hybrid lip-reading (HLR-Net) model, for lip reading from video. The model has three stages (pre-processing, encoder, and decoder) that produce the output subtitle. The encoder is built from inception, gradient, and bidirectional GRU layers, and the decoder, which performs connectionist temporal classification (CTC), is built from attention, fully-connected, and activation-function layers. Compared with three recent models on the GRID corpus, namely LipNet, the lip-reading model with cascaded attention (LCANet), and the attention-CTC (A-ACA) model, HLR-Net achieves significant improvements: a CER of 4.9%, a WER of 9.7%, and a BLEU score of 92% for unseen speakers, and a CER of 1.4%, a WER of 3.3%, and a BLEU score of 99% for overlapped speakers.
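HLR-Net's decoder performs connectionist temporal classification (CTC), and results are reported as CER and WER. The sketch below shows the simplest CTC inference step, greedy (best-path) decoding, together with edit-distance-based CER and WER. The blank index 0, the function names, and the GRID-style example sentences are assumptions for illustration; the paper's decoder may use beam search instead.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Best-path CTC decoding: collapse repeated labels, then drop blanks."""
    decoded, prev = [], None
    for label in frame_labels:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev_row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        row = [i]
        for j, h in enumerate(hyp, start=1):
            row.append(min(prev_row[j] + 1,               # deletion
                           row[j - 1] + 1,                # insertion
                           prev_row[j - 1] + (r != h)))   # substitution or match
        prev_row = row
    return prev_row[-1]


def cer(ref, hyp):
    """Character error rate: character edits divided by reference length."""
    return edit_distance(list(ref), list(hyp)) / len(ref)


def wer(ref, hyp):
    """Word error rate: word edits divided by the number of reference words."""
    return edit_distance(ref.split(), hyp.split()) / len(ref.split())
```

A blank between two identical labels is what lets CTC emit a genuine double letter: `[0, 1, 1, 0, 1, 2, 2, 0]` decodes to `[1, 1, 2]`.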


4.
    

The increasing capabilities of Artificial Intelligence (AI) have led researchers and visionaries to consider machines outperforming humans by attaining intelligence equal to or greater than human intelligence, which may not always benefit society. Rogue AI and the Technological Singularity are major concerns in academia as well as industry. It is therefore necessary to identify the limitations of machines and analyze where they fall short, which could help draw a line between human and machine intelligence. Internet memes are an amalgam of pictures, videos, underlying messages, ideas, sentiments, humor, and experiences, so the way a human perceives a meme may differ from how a machine comprehends it. In this paper, we present experimental evidence that comprehending Internet memes is a challenge for AI. We combine Optical Character Recognition techniques such as Tesseract, PixelLink, and the EAST detector to extract text from memes, and use machine learning algorithms such as Convolutional Neural Networks (CNN), Region-based Convolutional Neural Networks (RCNN), and transfer learning with a pre-trained DenseNet to assess the combined textual and facial emotions. We evaluate performance using sensitivity and specificity. Our results show that comprehending memes is indeed a challenging task, and hence a major limitation of AI. This research should be of particular interest to researchers working on Artificial General Intelligence and the Technological Singularity.
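The evaluation uses sensitivity and specificity. For one class treated as positive (one-vs-rest when there are several classes, which is an assumption here), they are TP / (TP + FN) and TN / (TN + FP). A minimal sketch with a hypothetical helper name:

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for the class `positive`."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```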


5.
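This paper presents the concept of deep learning, a subfield of big data and machine learning, and explains the important role deep learning plays in obtaining valuable information from big data. It describes deep learning frameworks that use graphics processing units (GPUs) for parallel computation on big data, focusing on large-scale convolutional neural networks (CNNs), large-scale deep belief networks (DBNs), and large-scale recurrent neural networks (RNNs). It analyzes the volume, variety, and velocity characteristics of big data and introduces deep learning methods for large-scale, diverse, and high-velocity data. Finally, it discusses the prospects of deep learning in the context of big data, pointing out that in the near future, technologies combining big data and deep learning will achieve breakthroughs in computer vision, machine intelligence, and other fields.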

6.
    
This study develops Artificial Intelligence (AI) based analysis tools that accurately detect and classify COVID-19 lung infections on portable chest X-rays (CXRs). During the COVID-19 pandemic, frontline physicians and radiologists have faced major challenges because of suboptimal image quality and the large volume of CXRs. Publicly available datasets of COVID-19 (N = 1525), non-COVID-19 normal (N = 1525), viral pneumonia (N = 1342), and bacterial pneumonia (N = 2521) images were taken from the Italian Society of Medical and Interventional Radiology (SIRM), Radiopaedia, The Cancer Imaging Archive (TCIA), and Kaggle repositories. A multi-approach strategy was employed, using the deep learning model ResNet101 with and without hyperparameter optimization. Additionally, features extracted from the average pooling layer of ResNet101 were used as input to machine learning (ML) algorithms, so that the pipeline was trained twice. ResNet101 with optimized hyperparameters performed better than with default parameters. Feeding the ResNet101 features to k-nearest neighbor (KNN) and support vector machine (SVM) classifiers yielded the highest three-class classification performance, 99.86% and 99.46%, respectively. The results indicate that the proposed approach can improve the accuracy and diagnostic efficiency of CXR reading, and that the proposed deep learning model has the potential to further improve the efficiency of healthcare systems in the diagnosis and prognosis of COVID-19 lung infection.
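In the second stage, the ResNet101 average-pooling features are classified with KNN. The sketch below is a plain k-nearest-neighbor vote over precomputed feature vectors. The choices of k = 3, Euclidean distance, nearest-member tie-breaking, and the toy two-dimensional features are all assumptions; real ResNet101 pooled features are 2048-dimensional.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training vectors closest to `query`."""
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    best = max(votes.values())
    # Tie-break: among the most-voted labels, return the one with the nearest member.
    for _, label in neighbours:
        if votes[label] == best:
            return label
```

Because KNN has no training step, all the learning in this stage sits in the frozen feature extractor; feature scaling therefore matters a great deal for the distance metric.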

7.
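To address the problem that multiple localization cues are intricately correlated and difficult to extract accurately, a deep-learning-based binaural sound source localization algorithm that takes the complete binaural signals as input is proposed. First, a deep fully connected back-propagation neural network (Deep Back Propagation Neural Network, D-BPNN) and a convolutional neural network (Convolutional Neural Network, CNN) are each used to implement the deep learning framework; then, the models are trained with binaural signals at horizontal-plane azimuth intervals of 15°, 30°, and 45°; finally, the effectiveness of the algorithms is analyzed using the front-back confusion rate, localization accuracy, and training time. The prediction results show that the front-back confusion rate of the CNN model is far lower than that of the D-BPNN; the localization accuracy of the D-BPNN exceeds 87%, while that of the CNN reaches about 98%; under the same experimental conditions, the CNN takes longer to train than the D-BPNN, and the difference in training time grows increasingly significant as the horizontal-plane angular interval decreases.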
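The abstract reports front-back confusion rate and localization accuracy without defining them. A common reading, assumed here, is that a prediction is front-back confused when it is wrong but equals the true direction mirrored across the interaural axis. Azimuths are taken in degrees on the evaluation grid, with 0° in front and 90° to the right; that convention and the function names are assumptions.

```python
def mirror_front_back(azimuth):
    """Reflect an azimuth (degrees, 0 = front, 90 = right) across the interaural axis."""
    return (180 - azimuth) % 360


def localization_metrics(true_az, pred_az):
    """Return (localization accuracy, front-back confusion rate) on a discrete grid."""
    n = len(true_az)
    correct = sum(t % 360 == p % 360 for t, p in zip(true_az, pred_az))
    confused = sum(t % 360 != p % 360 and mirror_front_back(t) == p % 360
                   for t, p in zip(true_az, pred_az))
    return correct / n, confused / n
```

Directions on the interaural axis (90° and 270°) are their own mirror images, so they can never be counted as front-back confusions.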

8.
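To address the single-feature extraction and low classification accuracy of traditional bird sound recognition algorithms, a bird sound recognition method combining a convolutional neural network and a Transformer network is proposed. The method considers both local feature learning and the construction of global contextual dependencies. Short-time Fourier transform (Short Time Fourier Transform, STFT) spectrogram features are extracted from the raw bird sound audio and fed into a convolutional neural network (Convolutional Neural Network, CNN) to extract local spectral feature information. Meanwhile, the log-Mel features of the bird sound signal and their first- and second-order differences are extracted to form a hybrid Mel-frequency cepstral coefficient (Mel Frequency Cepstrum Coefficient, MFCC) feature vector, which is fed into a Transformer network to obtain global sequence feature information. Finally, the extracted features are fused to obtain richer bird sound feature parameters, and the recognition result is produced by a Softmax classifier. In experiments on the Birdsdata and xeno-canto bird sound datasets, the average recognition accuracy reached 97.81% and 89.47%, respectively. The experimental results show that the method achieves higher recognition accuracy than other existing bird sound recognition models.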
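The hybrid feature vector stacks the base features with their first- and second-order differences. The abstract does not say how the differences are computed; the sketch below assumes the common regression formula, d_t = Σ_{n=1..N} n (c_{t+n} − c_{t−n}) / (2 Σ n²), with window N = 2 and edge frames repeated. The function names are hypothetical.

```python
def deltas(frames, width=2):
    """Regression-based deltas of a feature sequence (list of equal-length frames)."""
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[0])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]   # repeat last frame at edge
                behind = frames[max(t - n, 0)][k]               # repeat first frame at edge
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_with_deltas(frames, width=2):
    """Concatenate each frame with its first- and second-order deltas."""
    d1 = deltas(frames, width)
    d2 = deltas(d1, width)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]
```

On a linearly increasing feature the interior first-order delta equals the slope and the second-order delta is zero, which is a quick sanity check.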

9.
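To address motor fault diagnosis, a novel one-dimensional convolutional neural network structure (1D-CNN) is designed, and a motor fault diagnosis method based on acoustic signals and the 1D-CNN is proposed. To verify the effectiveness of the 1D-CNN algorithm for motor fault identification, a set of faulty air-conditioner motors was used as the experimental subject: a motor fault diagnosis platform was built, acoustic signals were collected from air-conditioner motors in 4 states, a motor fault acoustic signal dataset was created, and the 1D-CNN algorithm was applied...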
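The 1D-CNN architecture is not described beyond being novel. The basic stage such a network applies to a raw acoustic frame is a 1-D convolution followed by a nonlinearity and pooling; the sketch shows one stage in plain Python. The kernel values, the ReLU activation, and the pool size are illustrative assumptions.

```python
def conv1d(signal, kernel, bias=0.0, stride=1):
    """Valid 1-D cross-correlation (what CNN libraries call convolution)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(0, len(signal) - k + 1, stride)]


def relu(xs):
    return [max(0.0, x) for x in xs]


def max_pool1d(xs, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


def conv_block(signal, kernel, bias=0.0):
    """One convolution -> ReLU -> max-pooling stage."""
    return max_pool1d(relu(conv1d(signal, kernel, bias)))
```

With the kernel `[1, -1]` the convolution acts as a falling-edge detector, so after ReLU only the descending part of the signal survives.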

10.
Chu Youliang, Li Liang. 声学技术 (Technical Acoustics), 2021, 40(6): 815-821
To address the severe distortion of speech signals transmitted through the air in strong-noise environments, this paper proposes a Deep Bidirectional Long Short-Term Memory-Deep Convolutional Neural Network (DBLSTM-DCNN) based bone-conducted speech conver...

11.
    
Emotion recognition systems are helpful in human–machine interaction and intelligent medical applications. The electroencephalogram (EEG) is closely related to the central nervous system activity of the brain and, compared with other signals, is more closely associated with emotional activity, so EEG-based emotion recognition is worth studying. A common problem in EEG-based emotion recognition is that individual classification results vary greatly under the same recognition scheme, which hampers engineering applications. To improve the overall recognition rate of the emotion classification system, we propose the CSP_VAR_CNN (CVC) emotion recognition system, which classifies emotions from EEG signals with a convolutional neural network (CNN). First, the system uses common spatial patterns (CSP) to reduce the EEG data; then the standardized variance (VAR) is selected as the parameter to form emotion feature vectors; finally, a 5-layer CNN model classifies the EEG signal. The classification results show that the system improves the overall emotion recognition rate: the variance of the results is reduced to 0.0067, a 64% decrease compared with the CSP_VAR_SVM (CVS) system, and the average accuracy reaches 69.84%, 0.79% higher than that of the CVS system. The overall emotion recognition rate of the proposed system is thus both more stable and higher.
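The CVC features come from the variance of CSP-filtered EEG. Computing the CSP filters themselves requires a generalized eigendecomposition of class covariance matrices, which this sketch leaves out: it assumes the spatial filters have already been applied and shows one common way to turn the filtered channels into normalized (optionally log) variance features. Whether the paper's "standardized variance" matches this normalization is an assumption, and `csp_variance_features` is a hypothetical name.

```python
import math


def variance(xs):
    """Population variance of a list of samples."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def csp_variance_features(filtered, log=True):
    """Normalized variance features from CSP-filtered EEG.

    `filtered` holds one list of samples per spatial filter (rows of W^T X).
    Each feature is var_p / sum_i var_i, optionally log-transformed.
    """
    variances = [variance(ch) for ch in filtered]
    total = sum(variances)
    feats = [v / total for v in variances]
    return [math.log(f) for f in feats] if log else feats
```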

12.
    
Classifying fetal ultrasound images into anatomical categories such as the abdomen, brain, femur, and thorax can contribute to the early identification of potential anomalies or dangers during prenatal care, since missing major abnormalities might lead to fetal death or permanent disability. This article proposes a novel method based on a hybrid capsule network architecture for classifying fetal ultrasound images. The architecture improves the precision of fetal image categorization by combining the benefits of a capsule network with those of a convolutional neural network. When tested on a publicly accessible dataset of prenatal ultrasound images, the hybrid model outperforms conventional convolutional network-based techniques with an overall accuracy of 0.989. The results indicate that the proposed hybrid architecture is a promising approach for precisely and consistently classifying fetal ultrasound images, with potential uses in clinical settings.
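The abstract does not specify which capsule formulation the hybrid uses. The nonlinearity that distinguishes capsules from ordinary CNN units in the dynamic-routing capsule network of Sabour et al. (2017) is the "squash" function, v = (‖s‖² / (1 + ‖s‖²)) · s / ‖s‖, which keeps a capsule vector's direction and maps its length into [0, 1) so it can act as a presence probability. A minimal sketch, assuming that variant:

```python
import math


def squash(s, eps=1e-9):
    """Capsule squash: preserve direction, map vector length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    # eps avoids division by zero for an all-zero capsule vector
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq + eps)
    return [scale * x for x in s]
```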

13.
Tong Yu, Pang Xinyu, Wei Zihan. 振动与冲击 (Journal of Vibration and Shock), 2021, (5): 247-253, 260
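To address the problem that one-dimensional signals used as convolutional neural network input cannot fully exploit the correlations between data points, a GADF-CNN bearing fault diagnosis model is proposed. The Gramian angular difference field (GADF) is used to encode the collected vibration signals, which provides an angular perspective from which temporal correlations over different time intervals can easily be identified and corresponding feature maps generated; these maps are then fed into a convolutional neural network (CNN), which adaptively extracts and classifies rolling bearing fault features...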
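As commonly defined, a GADF rescales the series to [-1, 1], maps each value to an angle φ = arccos(x), and sets GADF[i][j] = sin(φ_i − φ_j), turning a length-n signal into an n × n image a 2-D CNN can consume. The sketch follows that definition; the window length and any image resizing used in the paper are not given, and `gadf` is a hypothetical name.

```python
import math


def gadf(series):
    """Gramian angular difference field of a 1-D series (list of numbers)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must not be constant")
    xs = [2 * (x - lo) / (hi - lo) - 1 for x in series]
    xs = [max(-1.0, min(1.0, x)) for x in xs]  # guard against rounding outside [-1, 1]
    phi = [math.acos(x) for x in xs]
    return [[math.sin(pi - pj) for pj in phi] for pi in phi]
```

The resulting matrix is antisymmetric with a zero diagonal, so each entry encodes the angular difference between two time steps rather than the values themselves.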

14.
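The vehicle identification number (VIN) is important for annual vehicle inspection. Because character-level annotations are lacking, the style of individual VIN characters cannot be verified. To address this problem, a single-character detection and recognition framework is designed, together with a weakly supervised learning method for this framework that requires no character-level annotation. First, feature information from all levels of VGG16-BN is fused to obtain a fused feature map carrying single-character location and semantic information; second, a character detection branch is designed...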

15.
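Objective: When cavitation occurs in a hydraulic turbine, unit efficiency drops, component erosion accelerates, and in severe cases safety accidents can occur. Accurately and quickly identifying the cavitation state of hydraulic turbines is therefore essential for the efficient and safe operation of hydropower stations. To address the slow recognition speed of complex convolutional neural network (CNN) models and the low recognition accuracy of simple CNN models, a hydraulic turbine cavitation state recognition method based on knowledge distillation (KD)-CNN is proposed. Methods: First, the teacher-student interaction mechanism from knowledge distillation theory is introduced, with a 3-layer CNN defined as the teacher model and a single-layer CNN as the student model; then, the teacher model is trained on experimentally acquired cavitation acoustic emission signals; finally, the data labels representing cavitation state types are replaced with the teacher model's outputs, and the student model learns from the relabeled dataset so that the cross-entropy is minimized. The trained model is the KD-CNN model, which is used to perform cavitation state recognition experiments on data from each operating condition. Results: The KD-CNN model completes hydraulic turbine cavitation state recognition within 2 s, with recognition accuracy above 97% under every operating condition. Conclusion: The KD-CNN model has a simple structure while combining the recognition speed of the student model with the recognition accuracy of the teacher model, offering a new approach to real-time cavitation monitoring of hydraulic turbines.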
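The Methods step replaces the cavitation-state labels with the teacher's outputs and trains the student to minimize cross-entropy against them. A minimal sketch of that loss follows, assuming the teacher's outputs are softmax probabilities over its logits. The `temperature` parameter comes from the standard Hinton-style formulation and is not mentioned in the abstract; the default of 1.0 leaves the probabilities unchanged.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of the student's prediction against the teacher's soft labels."""
    targets = softmax(teacher_logits, temperature)
    probs = softmax(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(targets, probs))
```

The loss is smallest when the student reproduces the teacher's distribution, at which point it equals the entropy of the teacher's soft labels.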

16.
    
The healthcare industry has been significantly affected by the widespread adoption of advanced technologies such as deep learning (DL) and artificial intelligence (AI). Among their applications, computer-aided diagnosis has become a critical tool for enhancing medical practice. In this research, we introduce a hybrid approach that combines a deep neural model, data collection, and classification methods for CT scans to detect and classify the severity of pulmonary disease and the stage of lung cancer. The proposed lung cancer detector and stage classifier (LCDSC) achieves higher accuracy, sensitivity, specificity, recall, and precision. We employ an active contour model for lung cancer segmentation and a high-resolution net (HRNet) for stage classification. The methodology is validated on the standard benchmark Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset, where it achieves an accuracy of 98.4% in classifying lung cancer stages. The approach offers a promising solution for early lung cancer diagnosis, potentially leading to improved patient outcomes.

17.
    
Diabetic retinopathy (DR) is one of the most severe eye complications of diabetes; it damages the retina and can lead to blindness. DR can be treated well when diagnosed early. Retinal fundus images are used to screen for lesions in the retina, but detecting DR in the early stages is challenging because symptoms are minimal. Diseases linked to the vascular anomalies caused by DR can also aid diagnosis. Nevertheless, identifying lesions manually requires substantial resources, and training Convolutional Neural Networks (CNNs) is time-consuming. This research aims to improve DR diagnosis by developing an enhanced deep learning model (EDLM) for timely DR identification that is potentially more accurate than existing CNN-based models and detects various lesions in retinal images at early stages. First, features are extracted from the retinal fundus image and fed into the EDLM for classification; the EDLM is also used for dimensionality reduction. The classification and feature extraction processes are optimized with the stochastic gradient descent (SGD) optimizer. The EDLM is evaluated on a Kaggle dataset of 3459 retinal images and compared with VGG16, VGG19, ResNet18, ResNet34, and ResNet50. Experimental results show that the EDLM achieves higher average sensitivity than VGG16, VGG19, ResNet18, ResNet34, and ResNet50 by 8.28%, 7.03%, 5.58%, 4.26%, and 2.04%, respectively.

18.
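Underwater acoustic signal classification is an important research direction in underwater acoustics, and an effective feature extraction and classification decision method is crucial to it. In this paper, time-frequency map features are extracted, on the basis of wavelet packet decomposition, from three types of measured underwater acoustic signals (fish sounds, merchant ship radiated noise, and wind-related noise), and a seven-layer convolutional neural network is built as the classifier. The results show that the wavelet packet time-frequency features of the three underwater acoustic signals, combined with the convolutional neural network, can achieve (98... on different test sets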
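The abstract builds time-frequency maps from a wavelet packet decomposition but does not name the wavelet or the depth. The sketch below assumes the Haar wavelet. Unlike the ordinary discrete wavelet transform, a wavelet packet splits both the approximation and the detail branches at every level, so level L yields 2^L equal-width bands whose coefficient rows can be stacked into a band-by-time map. The function names and levels are illustrative.

```python
import math


def haar_step(xs):
    """One orthonormal Haar analysis step: (approximation, detail), each half length."""
    s = 1 / math.sqrt(2)
    approx = [(xs[i] + xs[i + 1]) * s for i in range(0, len(xs) - 1, 2)]
    detail = [(xs[i] - xs[i + 1]) * s for i in range(0, len(xs) - 1, 2)]
    return approx, detail


def wavelet_packet(xs, level):
    """Full packet tree: split every node, not only the approximation branch.

    Returns 2**level coefficient rows in natural (Paley) order, which is not
    sorted by frequency; stacked, they form a band-by-time map.
    """
    nodes = [list(xs)]
    for _ in range(level):
        nxt = []
        for node in nodes:
            a, d = haar_step(node)
            nxt.extend([a, d])
        nodes = nxt
    return nodes


def node_energies(xs, level):
    """Energy of each terminal node; sums to the signal energy (orthonormal basis)."""
    return [sum(c * c for c in node) for node in wavelet_packet(xs, level)]
```

Because the Haar basis is orthonormal, the node energies add up to the energy of the original signal, which makes them a convenient per-band feature.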

19.
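To address the difficulty of assessing rolling bearing performance degradation and identifying bearing life states, a new bearing life-state recognition method based on performance degradation assessment is proposed. The method constructs a bearing performance degradation indicator from a convolutional autoencoder (convolutional autoencoder, CAE) and the multidimensional scaling (multidimensional scaling, MDS) algorithm, and then, based on the constructed indicator and an improved convolutional neural network...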
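Only the start of the method survives: CAE features and multidimensional scaling (MDS) are combined into a degradation indicator. As a hedged illustration of the MDS step alone, the sketch below applies classical MDS to a matrix of pairwise distances (assumed Euclidean, for example between CAE feature vectors of successive vibration snapshots) and keeps one dimension as a scalar indicator. The leading eigenvector is found by power iteration to stay dependency-free; the start vector and iteration count are arbitrary choices, and `classical_mds_1d` is a hypothetical name.

```python
import math


def classical_mds_1d(D, iters=500):
    """Embed points in 1-D from a pairwise distance matrix via classical MDS.

    B = -1/2 * J D^2 J (double centering); the coordinate is the leading
    eigenvector of B scaled by sqrt(leading eigenvalue), via power iteration.
    """
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    row_mean = [sum(r) / n for r in sq]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    B = [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    v = [1.0 + i for i in range(n)]  # non-constant start (constants lie in B's null space)
    lam = 0.0
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return [0.0] * n
        v = [x / norm for x in w]
        lam = norm  # converges to the leading eigenvalue of B
    return [math.sqrt(lam) * x for x in v]
```

Power iteration finds the eigenvalue of largest magnitude; for Euclidean distances B is positive semidefinite, so that is also the largest eigenvalue. The sign of the result is arbitrary, so a degradation indicator built this way would need to be oriented, for example so that it increases over the bearing's run time.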

20.
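Remote Sensing Image Classification Based on a Convolutional Neural Network Model   Total citations: 2, self-citations: 0, citations by others: 2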
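This paper studies the classification of remote sensing images. To address the difficult feature extraction and unsatisfactory classification accuracy of shallow classification models such as the support vector machine (SVM) for remote sensing images, a convolutional neural network (CNN) model is designed. The model consists of an input layer, convolutional layers, fully connected layers, and an output layer, and uses a Softmax classifier for classification. A Landsat TM5 remote sensing image of Fujin City acquired on June 6, 2010 was used as the data source for classification experiments. The experiments show that by using multiple convolution-pooling layers, the model can effectively extract nonlinear, invariant land-cover features, which benefits image classification and target detection. For the selected image, the model achieves a classification accuracy of 94.57%, 5% higher than that of the support vector machine, giving it a greater advantage in remote sensing image classification.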
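The model is described only as input, convolutional, fully connected, and output layers with Softmax, using several convolution-pooling stages. One such stage on a single band might look like the sketch below. The kernel, the ReLU activation (not stated in the abstract), and the 2x2 pooling size are assumptions, and the function names are hypothetical.

```python
def conv2d(image, kernel, bias=0.0):
    """Valid 2-D cross-correlation of a single-band image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw)) + bias
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)]


def max_pool2d(fmap, size=2):
    """Non-overlapping size x size max pooling; partial edge windows are dropped."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def conv_pool(image, kernel, bias=0.0):
    """One convolution -> ReLU -> 2x2 max-pooling stage."""
    fmap = [[max(0.0, x) for x in row] for row in conv2d(image, kernel, bias)]
    return max_pool2d(fmap)
```

Stacking such stages shrinks the spatial size while the pooling makes the response tolerant to small shifts, which is one way to read the abstract's "invariant land-cover features".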


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417