1.
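Diabetic retinopathy (DR) is a complication of diabetes that affects the retina. To address the loss of information about lesion regions such as microaneurysms in DR images when features are extracted during downsampling, a module that fuses DenseNet with a residual structure is proposed. The module first connects two consecutive dense blocks, then uses a residual structure to sum the feature information, fusing the feature maps in parallel to prevent the loss of useful features; finally, two convolutional blocks containing dropout are joined by a residual connection to suppress overfitting. To address the fact that earlier convolution operations did not weight the feature-map channels of lesion regions, a module that fuses SeNet with a residual structure is proposed. This module first connects SeNet and adds the features from global average pooling and global max pooling to make better use of informative channels, then uses a Conv1×1 residual path to preserve the integrity of the feature-map information. Based on these two modules, a DR classification method that fuses DenseNet and SeNet with residual structures is proposed. The model reaches an accuracy of 89.8% and a specificity of 97.0% on the APTOS2019 dataset, and an accuracy of 78.8% and a specificity of 91.9% on the Messidor-2 dataset, effectively improving the classification of lesion severity in retinal images.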
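The channel-weighting step described in this abstract (summing global average and global max pooling, then gating the channels and adding a residual path) can be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function names, the bottleneck weights `w_reduce` and `w_expand`, and the use of an identity shortcut in place of the Conv1×1 residual path are all assumptions made for brevity.

```python
import math

def global_pool(channel):
    """Return (mean, max) over all spatial positions of one 2D channel."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values), max(values)

def dense(weights, vector):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def channel_gates(feature_map, w_reduce, w_expand):
    """Squeeze-and-excitation style gates, one value in (0, 1) per channel."""
    # Squeeze: add the global average and global max of each channel.
    descriptor = [sum(global_pool(ch)) for ch in feature_map]
    # Excitation: bottleneck -> ReLU -> expansion -> sigmoid.
    hidden = [max(0.0, h) for h in dense(w_reduce, descriptor)]
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(w_expand, hidden)]

def se_residual(feature_map, w_reduce, w_expand):
    """Reweight each channel by its gate, then add the input back.

    Assumption: an identity shortcut stands in for the Conv1x1 residual path.
    """
    gates = channel_gates(feature_map, w_reduce, w_expand)
    return [
        [[v * g + v for v in row] for row in ch]
        for ch, g in zip(feature_map, gates)
    ]

# Hypothetical 2-channel, 2x2 feature map with an all-zero bottleneck.
x = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]]]
out = se_residual(x, w_reduce=[[0.0, 0.0]], w_expand=[[0.0], [0.0]])
```

With all-zero weights every gate is 0.5, so each output value is 1.5 times its input; trained weights would instead emphasize the channels that carry lesion information.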
2.
In the context of optical character recognition, Arabic poses additional challenges because of its cursive nature. We propose a system for recognizing a document containing Arabic text, using a pipeline of three neural networks. The first network predicts the font size of an Arabic word; the word is then normalized to an 18pt font size, which is used to train the next two models. The second model segments a word into characters. Word segmentation in Arabic, as in many similar cursive languages, is a challenge for OCR systems. This paper presents a multichannel neural network for offline segmentation of machine-printed Arabic documents. The segmented characters are then fed as input to a convolutional neural network for Arabic character recognition. The font size prediction model produced a test accuracy of 99.1%. The accuracy of the segmentation model using one font is 98.9%, while the four-font model showed 95.5% accuracy. The whole pipeline showed an accuracy of 94.38% on the Arabic Transparent font at size 18pt from the APTI data set.
3.
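Convolutional neural networks have achieved good results on vision-based robotic grasp detection, but most methods have too many parameters to suit resource-constrained systems. To address this, a lightweight grasp detection regression model, SqueezeNet-RM (SqueezeNet Regression Model), is proposed. It builds on the lightweight SqueezeNet network and borrows DenseNet's idea of multiple bypass connections to strengthen feature reuse. SqueezeNet-RM extracts multimodal features from RGB-D images and predicts the optimal grasp pose for a two-finger robotic gripper. On the standard Cornell grasp dataset, compared with classical grasp detection methods, the proposed lightweight network occupies less storage, runs faster, and generalizes better without any loss of detection accuracy. The model requires 86.97% less storage than AlexNet and its average detection speed is 3 times faster, making it suitable for FPGA (Field Programmable Gate Array) or resource-constrained mobile robot grasp detection systems.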
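To give a sense of why SqueezeNet-style networks need so little storage, the sketch below counts the weights of a SqueezeNet fire module (a 1×1 squeeze layer feeding parallel 1×1 and 3×3 expand layers) against a plain 3×3 convolution with the same input and output width. The channel sizes are hypothetical and are not taken from SqueezeNet-RM, whose configuration the abstract does not give; biases are ignored.

```python
def conv_params(c_in, c_out, kernel):
    """Number of weights in a kernel x kernel convolution, ignoring biases."""
    return c_in * c_out * kernel * kernel

def fire_params(c_in, squeeze, expand1x1, expand3x3):
    """Weights in a fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expands."""
    squeeze_w = conv_params(c_in, squeeze, 1)
    expand_w = conv_params(squeeze, expand1x1, 1) + conv_params(squeeze, expand3x3, 3)
    return squeeze_w + expand_w

# Hypothetical sizes: 128 input channels, 128 output channels in both cases.
plain = conv_params(128, 128, 3)      # 147456 weights
fire = fire_params(128, 16, 64, 64)   # 12288 weights
ratio = plain / fire                  # 12.0
```

In this toy configuration the fire module uses one twelfth of the weights because the expensive 3×3 filters only see the 16 squeezed channels.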
4.
When patterns occur in large groups generated by a single source (style-consistent test data), the statistics of the test data differ from those of the training data, which consist of patterns from all sources. We present a Gaussian model for continuously distributed sources under which we develop adaptive classifiers that specialize in the statistics of style-consistent test data. On NIST handwritten digit data, the adaptive classifiers reduce the error rate by more than 50% operating on one writer ( samples/class) at a time.
Received: 14 November 2002, Accepted: 6 March 2003, Published online: 12 September 2003. Correspondence to: George Nagy
5.
D.E. Troxel. Pattern Recognition, 1976, 8(2): 73-76
A scheme suitable for selection of features for OCR systems has been developed. This scheme has been successfully applied to several type fonts resulting in systems which recognize upper and lower case alphanumerics with less than one error per 10 000 processed characters. Data based on a large number of test characters is collected and formatted to provide a basis for the actual selection of features. The error rate for the resulting recognition systems is then verified. Considerable portions of this process have been automated, while retaining adequate opportunity for the OCR systems designer to control and influence the selection process.
6.
Slant estimation algorithm for OCR systems
A slant removal algorithm is presented based on the vertical projection profile of word images and the Wigner–Ville distribution. The slant correction does not affect the connectivity of the word, and the resulting words look natural. The algorithm was evaluated by both subjective and objective means. It has been tested on English and Modern Greek samples from more than 500 writers, taken from the IAM-DB and GRUHD databases. The results look natural and are almost always improved with respect to the original image, even for writing with varying slant. The performance of an existing character recognition system increased by up to 9% on the same data, while the training time was significantly reduced. Due to its simplicity, this algorithm can be easily incorporated into any optical character recognition system.
7.
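Named entity recognition is a key task in natural language processing. Requirements documents contain overly long entities, namely virtual functions, so general-purpose traditional named entity recognition methods cannot effectively recognize complete entities. This paper studies entity recognition models for requirements documents in depth, introduces deep learning, and proposes combining a CNER method based on a deep residual network (ResNet) with a rule-based method to perform word segmentation for Chinese requirements documents. The named entity recognition model in this paper...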
8.
Some objects in specific poses cannot be distinguished using a single view. A model for 3D object recognition based on multiple views is proposed and developed; it was applied to hand posture recognition. A pulse-coupled neural network is used to generate a feature vector for a single view. Two views with different view angles are used; each view generates its own feature vector. The two 2D vectors are then linearly combined into one 3D vector. The hand postures are then combined to construct a dynamic gesture (word). The reconstruction is performed using a best-match search algorithm. The experiment was conducted on 50 words and yielded 96% recognition accuracy, confirming that the object dataset can be extended offline.
9.
Qasem A. Al-Radaideh, Kamal H. Masri. Computer Standards & Interfaces, 2011, 33(1): 108-113
Classical mobile phone keypads, which consist of 12 buttons, are commonly used to write short text messages through two common methods: multi-tap and predictive text entry. On Arabic-language mobile keypads, all Arabic letters are distributed over 8 buttons of the keypad, so that three or more letters share the same button. In this paper, a new text entry environment is proposed. The environment includes two improved approaches for Arabic-language messages that make the multi-tap text entry method faster and easier. The first approach remaps the distribution of Arabic letters on the keypad according to letter frequency. In the second approach, a bigram-based method is used to predict automatically the next letter to be typed. The proposed approaches are evaluated using a corpus of 1514 real Arabic text messages. Several experiments were conducted to evaluate the proposed text entry environment. The results show that the proposed remapped keypad is faster and requires less effort than the classical keypad.
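The two ideas in this abstract, frequency-based remapping and bigram prediction of the next letter, can be illustrated with a small sketch. It is only one plausible reading of the approach: the round-robin assignment, the tie-breaking rule, the function names, and the Latin toy corpus are assumptions, not details from the paper.

```python
from collections import Counter, defaultdict

def remap_keypad(corpus, num_buttons=8):
    """Assign letters to buttons so that frequent letters need fewer taps."""
    counts = Counter(ch for msg in corpus for ch in msg if ch.isalpha())
    # Most frequent first; ties broken by code point for determinism.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    buttons = [[] for _ in range(num_buttons)]
    for i, ch in enumerate(ranked):
        # Round-robin: the top letters take the first slot of each button.
        buttons[i % num_buttons].append(ch)
    return buttons

def taps_needed(buttons, message):
    """Multi-tap cost: a letter in slot n of its button costs n taps."""
    position = {ch: idx + 1 for letters in buttons for idx, ch in enumerate(letters)}
    return sum(position.get(ch, 0) for ch in message)

def train_bigrams(corpus):
    """Count, for each letter, which letters follow it."""
    following = defaultdict(Counter)
    for msg in corpus:
        for a, b in zip(msg, msg[1:]):
            if a.isalpha() and b.isalpha():
                following[a][b] += 1
    return following

def predict_next(following, letter):
    """Most frequent follower of `letter`, or None if it was never seen."""
    options = following.get(letter)
    if not options:
        return None
    return min(options, key=lambda ch: (-options[ch], ch))

# Hypothetical toy corpus; a real evaluation would use Arabic messages.
corpus = ["abab", "abc"]
buttons = remap_keypad(corpus, num_buttons=2)
model = train_bigrams(corpus)
```

Because `str.isalpha` is Unicode-aware, the same functions work unchanged on Arabic text.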
10.
Baligh M. Al-Helali. Cybernetics and Systems, 2016, 47(6): 478-498
Widely used PDAs, touch screens, and tablet PCs are alternatives to keyboards, with the advantages of being friendlier, easier, and more natural to use. A framework for Arabic online character recognition is developed. The framework integrates the different phases of online Arabic text recognition. The data used pose several challenges, such as delayed-stroke handling, connectivity problems, variability, and changes in text style. We process the delayed strokes differently at the different phases to improve the overall performance. This work extracts many features, including several novel statistical features. Experiments on challenging online Arabic characters show encouraging results.
11.
Document recognition is a lively research area with much effort concentrated on optical character recognition. Less attention is paid to locating and extracting text from the general (non-desktop, non-scanner) environment. Such contact-free extraction of text from a general scene has applications in the context of wearable computing, robotic vision, point and click document capture, or as an aid for visually handicapped people. Here, a novel automatic text reading system is introduced using an active camera focused on text regions already located in the scene (using our recent work). Initially, a located region of text is analysed to determine the optimal zoom that would foveate onto it. Then a number of images are captured over the text region to construct a high-resolution mosaic composite of the whole region. This magnified image of the text is suitable for reading by humans or for recognition by OCR, or even for text-to-speech synthesis. Although we employed a low resolution camera, we still obtained very good results.
Correspondence and offprint requests to: Dr M. Mirmehdi, Department of Computer Science, University of Bristol, Bristol BS8 1UB, UK. Email: majid@cs.bris.ac.uk
12.
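To improve pedestrian detection performance, this paper proposes a pedestrian detection algorithm based on an improved YOLOv4 that combines SqueezeNet, attention mechanisms, dilated convolution, and Inception structures. In the feature enhancement part, the improved YOLO introduces residual connections and D-CBAM, an attention module combined with dilated convolution, which selects from the extracted features the information that matters for object detection. In addition, combining SqueezeNet's "squeeze-expand" structure with the multi-scale convolution idea of the Inception network, an Inception-fire module is proposed to replace consecutive convolutional layers in the network; widening the network improves performance while reducing the number of parameters. Finally, the loss function is improved according to the characteristics of the pedestrian detection task and in combination with Focal loss: weighting factors are added for positive versus negative samples and for hard versus easy samples, emphasizing training on positive and hard-to-classify samples and thereby improving detection ability. The improved YOLO algorithm reaches a detection accuracy of 94.95% on the INRIA pedestrian dataset, 4.25% higher than the original YOLOv4, while the parameter count falls by 36.35% and detection speed improves by 13.54%, showing better performance in pedestrian detection.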
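The weighting of positive versus negative and hard versus easy samples mentioned here follows the idea of the standard binary focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t). The sketch below implements that standard form only; the abstract does not spell out the paper's modified loss, and the default values α = 0.25 and γ = 2 are the commonly used ones, assumed here for illustration.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Standard binary focal loss for one sample.

    p: predicted probability of the positive class, strictly between 0 and 1.
    y: ground-truth label, 1 (positive) or 0 (negative).
    alpha balances positive against negative samples.
    gamma down-weights easy samples (those with p_t close to 1).
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confident correct prediction contributes far less than a hard one.
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)
```

Setting `gamma=0` and `alpha=0.5` reduces the expression to half of the ordinary cross-entropy, which is a convenient sanity check.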
13.
We compare kernel estimators, single and multi-layered perceptrons and radial-basis functions for the problems of classification of handwritten digits and speech phonemes. By taking two different applications and employing many techniques, we report here a two-dimensional study that makes a domain-independent assessment of these learning methods possible. We consider a feed-forward network with one hidden layer. As examples of the local methods, we use kernel estimators like k-nearest neighbour (k-nn), Parzen windows, generalised k-nn, and Grow and Learn (Condensed Nearest Neighbour). We have also considered fuzzy k-nn due to its similarity. As distributed networks, we use linear perceptron, pairwise separating linear perceptron and multi-layer perceptrons with sigmoidal hidden units. We also tested the radial-basis function network, which is a combination of local and distributed networks. Four criteria are taken for comparison: correct classification of the test set; network size; learning time; and the operational complexity. We found that perceptrons, when the architecture is suitable, generalise better than local, memory-based kernel estimators, but require longer training and more precise computation. Local networks are simple and learn very quickly and acceptably, but use more memory.
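The trade-off noted for local, memory-based methods is easy to see in a minimal k-nearest-neighbour classifier: there is no training step at all, but every training sample must be stored and scanned at prediction time. The sketch below is a generic k-nn with Euclidean distance and majority voting, not the exact estimator configuration used in the study; the sample points are made up.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples.

    train: list of (feature_tuple, label) pairs, kept in memory as-is.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical 2D samples from two classes.
train = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]
label_near_origin = knn_predict(train, (0.2, 0.3))
label_near_five = knn_predict(train, (5.0, 5.2))
```

`math.dist` requires Python 3.8 or later.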
14.
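Hidden Markov models for offline handwritten digit recognition
When hidden Markov models (HMMs) are applied to offline handwritten digit recognition, how to model the system is a question worth studying. Taking into account the characteristics of handwritten digits and of feature extraction, the training method of the HMM and the choice of model parameters are studied in order to raise the recognition rate. In a bank bill OCR application, used together with a neural-network-based method, the approach lowered the rejection rate for whole bills by 3% and clearly improved the performance of the bank bill OCR system.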
15.
Deep Learning (DL) is regarded as a gold-standard computing paradigm in the learning community and has become an extensively used approach in the ML field, attaining superior outcomes on cognitive tasks relative to human performance. The primary benefit of DL is its ability to learn from massive data. DL-based technologies have grown quickly and are widely adopted as a more resourceful alternative to conventional approaches. Specifically, various DL approaches outperform conventional ML approaches in real-time applications. Various research works are reviewed to understand the significance of the individual DL models, and some computational complexity is observed. This may be due to the broader expertise and knowledge required for handling these models during the prediction process. This research proposes a holistic approach for pneumonia prediction and offers a more appropriate DL model for classification purposes. This work incorporates a novel fused Squeeze and Excitation (SE) block with the ResNet model for pneumonia prediction and better accuracy. The proposed model reduces the human effort during the prediction process and makes intelligent diagnosis easier because the feature learning is adaptive. The experimentation is carried out in Keras, and the model is compared with various advanced approaches. The proposed model gives 90% prediction accuracy, 93% precision, 90% recall and 89% F1-measure, and shows a better trade-off than other approaches. The evaluation is done against the existing standard ResNet model, GoogleNet+ResNet+DenseNet, and different variants of ResNet models.
16.
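Palm vein images are few in number and uneven in quality, which degrades the performance of palm vein recognition systems. To address this, a palm vein image recognition method based on a side-chain connected convolutional neural network is proposed. First, building on the ResNet model, convolutional and pooling layers extract palm vein features. Then the exponential linear unit (ELU) activation function, batch normalization (BN), and dropout are used to improve and optimize the model, alleviating vanishing gradients, preventing overfitting, speeding up convergence, and strengthening generalization. Finally, a densely connected network (DenseNet) is introduced to make the extracted palm vein features richer and more effective. Experiments on two public databases and one self-built database show recognition rates of 99.98%, 97.95%, and 97.96%, respectively. The method effectively improves the performance of palm vein recognition systems and is better suited to practical palm vein recognition.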
17.
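DenseNet is a convolutional neural network widely used for image classification, but it has no memory and cannot capture the relationships among the different feature maps produced by convolution. Applied directly to determining whether rectal cancer has metastasized to the lymph nodes, it cannot compare how rectal cancer CT image features change as they are mapped through the deep network. A novel deep neural network model, DenseNet-GRU (gated recurrent unit), is therefore proposed. Its core idea is to use a GRU to capture the relationships among the different image features extracted by DenseNet, thereby obtaining the feature changes in the same pixel regions across images, and finally to determine whether the lymph nodes of a rectal cancer patient show metastasis. The experimental dataset consists of two kinds of contrast-enhanced CT images in DCM format, abdominal axial arterial-phase and portal-venous-phase images, from 107 patients; the data are preprocessed with data augmentation and threshold segmentation. The DenseNet-GRU model achieves an F-score above 65%, which is of practical significance for clinical computer-aided diagnosis.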
18.
Hossein Khosravi, Ehsanollah Kabir. International Journal on Document Analysis and Recognition, 2009, 12(1): 21-32
An integrated OCR system for Farsi text is proposed. The system uses information from several knowledge sources (KSs) and manages them in a blackboard approach. Some KSs, like classifiers, are acquired a priori through an offline training process, while others, like statistical features, are extracted online during recognition. An arbiter controls the interactions between the solution blackboard and the KSs. The system has been tested on 20 real-life scanned documents with ten popular Farsi fonts, and recognition rates of 97.05% at the word level and 99.03% at the character level have been achieved.
An erratum to this article can be found at
19.
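To address the blind spots left by the large number of fixed surveillance cameras now installed, and the limited hardware performance of mobile devices, an object recognition algorithm for urban management cases that runs on low-performance iOS mobile devices is proposed. First, new hyperparameters are added to MobileNet to optimize the number of channels of the input and output images and the number of feature maps produced per channel. Next, the improved MobileNet is combined with the SSD object recognition framework to form a new recognition algorithm, which is ported to iOS mobile devices. Finally, the algorithm uses the device's built-in camera to record video at the scene and accurately detects 8 specific kinds of urban management case objects. The mean average precision (mAP) of the algorithm is 15.5 percentage points higher than that of the original YOLO and 10.4 percentage points higher than that of the original SSD. Experimental results show that the proposed algorithm runs smoothly on low-performance iOS mobile devices, reduces surveillance blind spots, and provides technical support for urban management officers to speed up case classification and handling.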
20.
The retrieval of information from scanned handwritten documents is becoming vital with the rapid increase of digitized documents, and word spotting systems have been developed to search for words within documents. These systems can be either template-matching or learning-based. This paper presents a coherent learning-based Arabic handwritten word spotting system that adapts to the nature of Arabic handwriting, which can have no clear boundaries between words. Consequently, the system recognizes Pieces of Arabic Words (PAWs), then reconstructs and spots words using language models. The proposed system produced promising results for Arabic handwritten word spotting when tested on the CENPARMI Arabic documents database.