20 similar documents found (search time: 156 ms)
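For detecting and recognizing scene text of arbitrary shape, a new end-to-end scene text detection and recognition algorithm is proposed. First, a text-aware module is introduced: a segmentation-based detection branch detects scene text from the visual features extracted by a convolutional network. Then a recognition branch, composed of a Transformer-based vision module and a Transformer-based language module, encodes text features from the detection results. Finally, a fusion gate in the recognition branch fuses the encoded text features and outputs the scene text. Experiments on the Total-Text, ICDAR2013, and ICDAR2015 benchmark datasets show that the proposed algorithm performs well in recall, precision, and F-measure, with some advantage in time efficiency.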

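Curved-text detection in natural scenes is widely used in smart-tourism applications. Current curved-text detectors are limited by the receptive field size and feature extraction capability of convolutional neural networks, which makes it hard for the network to distinguish text from non-text regions in natural scene images. To address this, a natural-scene text detection method based on an attention mechanism and dilated convolution is proposed (Resnet Squeeze and Excitation Dilation Jaccard Progressive Scale Expansion Network, RSDJ-PSE). RSDJ-PSE introduces SE blocks, a soft attention mechanism, into the backbone of the detection network to strengthen feature extraction. It then introduces dilated convolution into the backbone, which enlarges the receptive field without increasing the number of parameters. Finally, it replaces the Dice coefficient with the Jaccard coefficient in the post-processing algorithm, which improves the F-measure of the method. Detection results on the oriented-text dataset ICDAR2015, the standard curved-text dataset CTW1500, and the Total-Text dataset show that the method has the best text detection performance when compared with eight detection methods.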
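
The abstract above swaps the Dice coefficient for the Jaccard coefficient but does not say how either is applied in post-processing. As a minimal sketch of the two measures only, the following compares them on two pixel sets; the function names and the sample regions are illustrative assumptions, not taken from the paper.

```python
def dice(a: set, b: set) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty regions are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical pixel sets for two overlapping 2x2 regions.
region_a = {(0, 0), (0, 1), (1, 0), (1, 1)}
region_b = {(0, 1), (1, 1), (0, 2), (1, 2)}

d = dice(region_a, region_b)
j = jaccard(region_a, region_b)
print(d, j)
```

For the same pair of regions the two values are related by J = D / (2 - D), so Jaccard is never larger than Dice and penalizes partial overlap more strongly.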

吴炜晨  许衍 《电子设计工程》2023,(8):101-104+109
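As research on fine-grained image classification deepens, user click data has increasingly come to be treated as a reliable semantic feature. Because user click datasets are huge and highly redundant, using click features directly for recognition poses many challenges. This paper proposes using text clustering to reduce the text space and refine the original click features, building a compact text space to represent images; the method merges semantically similar texts more effectively. Extensive experiments on the large Clickture-Dog dataset released by Microsoft show that click vector features outperform traditional visual image features and yield higher accuracy on image recognition; that a propagation algorithm based on visual similarity helps improve the representational power of click features; and that, in large-scale text clustering, the sparse-coding-based clustering approach reaches a recognition rate of 58.24%.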

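Text in video data is an important source of information for video semantic understanding and retrieval. This paper studies the detection, localization, extraction, enhancement, and recognition of text in video. It proposes using a wavelet modulus maxima algorithm to detect the position of text in video frames, and a coarse-to-fine multi-level localization method together with a pyramid model to extract multi-scale static and scrolling Chinese and English text; finally, the text regions are binarized. Experiments show that the method achieves good results.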

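To address the information loss that scene text recognition tends to suffer in long-range modeling, and its weak representation of low-resolution text images, a text recognition algorithm based on multimodal iteration and correction is proposed. The vision model of the algorithm combines CoTNet (contextual transformer networks for visual recognition), a dynamic convolution attention module (DCAM), EA-Encoder (external attention encoder), and a position attention mechanism. CoTNet effectively alleviates the information loss caused by long-range modeling. DCAM strengthens representation and focuses on important features, and it passes those features to EA-Encoder, tightening the connection between CoTNet and EA-Encoder. EA-Encoder learns the most discriminative features over the whole dataset and captures the most semantically informative parts, further strengthening representation. The output of the vision model then passes through a text correction model and a fusion model to produce the final recognition result. Experimental data show that the proposed algorithm performs well on several public scene text datasets, reaching an accuracy of 85.9% on the irregular dataset ICDAR2015 in particular.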

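Driven by advances in deep learning, smart application scenarios place higher demands on text recognition. Existing methods focus on building powerful visual feature extraction networks and neglect the extraction of text sequence features. To address this, a scene text recognition network based on hierarchical self-attention is proposed. Fusing convolution with self-attention establishes and strengthens the connection between text sequence information and visual perception information. Because visual and sequence features interact fully in the global space, the influence of complex background noise on recognition accuracy is effectively reduced, giving robust predictions for both regular and irregular scene text. Experimental results show that the proposed method is competitive on all datasets. On the CUTE dataset in particular it achieves the best accuracy and speed, 81.4% and 6.24 ms, which indicates some potential for application.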

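How to exploit the vast number of patents and find those of interest to a user for recommendation is a problem many patent databases urgently need to solve. Starting from the titles and abstracts of patent texts, this paper proposes a patent recommendation method based on text mining. First, a bag-of-words model converts patent text into data a computer can process. Second, a text clustering algorithm partitions the patent dataset into domains. Third, term frequency-inverse document frequency (TF-IDF) feature weighting is combined with cosine similarity to select suitable inventors for patent recommendation. Finally, patent data from China's logistics industry is used as the dataset to validate and analyze the proposed method. Experimental results show that text-mining-based patent recommendation can deliver personalized recommendations to inventors.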
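
The TF-IDF weighting and cosine similarity step named in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the token lists are invented, the weighting is the plain tf × ln(N/df) variant, and the helper names are hypothetical; the paper's exact weighting and its inventor selection logic are not given in the abstract.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF vectors (dicts)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is zero."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Invented bag-of-words inputs standing in for patent titles/abstracts.
docs = [
    ["warehouse", "robot", "sorting"],
    ["warehouse", "robot", "picking"],
    ["cold", "chain", "transport"],
]
vecs = tf_idf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])    # share two of three terms
sim_unrelated = cosine(vecs[0], vecs[2])  # share no terms
print(sim_related, sim_unrelated)
```

Documents with no terms in common score exactly zero, while terms shared by fewer documents contribute more to the similarity than terms shared by many.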

李健壮 《移动信息》2020,(5):00050-00051
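As an important research direction in modern computer science, text big data and natural language processing are increasingly integrated, with a strong trend toward uniting linguistics, computer science, and mathematics. The article introduces the concepts of text big data and natural language processing and sets out some considerations on applying the two together, in the hope of providing readers with a useful reference.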

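Detecting and recognizing seal text with optical character recognition can speed up the classification and authentication of contracts of all kinds. Exploiting the fact that text on a circular seal is arranged in a ring, this paper preprocesses the seal text by polar-coordinate unwrapping, which overcomes the problem of inconsistent text orientation. For the undulating text regions that remain after unwrapping, a Connectionist Text Proposal Network (CTPN) with angle information detects the seal text regions, and Bézier fitting of the text regions gives accurate detection of the seal region. Finally, an attention transfer mechanism and the paper's matching algorithm recognize the detected text regions and output the seal text. In experiments on a self-built Chinese seal dataset, the text detection F-measure reaches 84.73% and the text recognition recall reaches 84.4%, showing that the algorithm can effectively detect and recognize seal content, which is significant for research on document classification and authentication.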
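
The polar-coordinate unwrapping used as preprocessing above can be sketched as below. This is a rough illustration, not the paper's implementation: it assumes the seal center and the inner and outer radii of the text ring are already known, uses nearest-neighbor sampling on a grayscale image stored as a list of rows, and the function and parameter names are hypothetical.

```python
import math


def unwrap_polar(img, cx, cy, r_inner, r_outer, width):
    """Map the ring between r_inner and r_outer onto a rectangle.

    Rows run from the outer radius (top) toward the inner radius;
    columns sweep the angle from 0 to 2*pi. Pixels that fall outside
    the image are left as 0.
    """
    height = r_outer - r_inner
    out = [[0] * width for _ in range(height)]
    for row in range(height):
        r = r_outer - row
        for col in range(width):
            theta = 2 * math.pi * col / width
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy + r * math.sin(theta)))
            if 0 <= y < len(img) and 0 <= x < len(img[0]):
                out[row][col] = img[y][x]
    return out


# Tiny synthetic 21x21 image with two marked pixels on the outer circle.
img = [[0] * 21 for _ in range(21)]
img[10][19] = 7  # angle 0, radius 9 from center (10, 10)
img[19][10] = 5  # angle pi/2, radius 9
strip = unwrap_polar(img, cx=10, cy=10, r_inner=5, r_outer=9, width=36)
print(len(strip), len(strip[0]), strip[0][0], strip[0][9])
```

After unwrapping, characters that were arranged around the ring lie along a roughly horizontal strip, which is why a line-oriented detector can then be applied.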

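A scene text localization method based on color distance minimization and maximum color difference (MCD) is proposed. First, repeated K-means clustering and color distance minimization are used to extract text connected regions from scene images of varying complexity. Because color clustering is sensitive to illumination, an MCD-based method extracts additional text connected regions as a supplement; by combining color with gradient information, it can overcome the effect of illumination to some extent. The resulting connected regions are then merged into text lines using predefined character merging rules. Candidate text lines usually include falsely detected non-text lines, so to improve detection correctness a final step based on feature extraction and machine learning verifies the candidate text lines and yields the text localization result. The method was tested on the ICDAR2011 and ICDAR2013 public databases. On the ICDAR2011 dataset, the recall, precision, and F-measure obtained are 0.66, 0.77; on the ICDAR2013 dataset, the recall, precision, and F-measure obtained are 0.65, 0.77. Comparison with other text detection algorithms shows that the method is feasible and effective.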

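Overlapping communities are common in social-network big data. Existing overlapping community mining algorithms tend to wrongly split overlapping regions into separate communities and are computationally complex. To address this, a fast overlapping community mining algorithm based on local information measures is proposed (Local information based Fast Overlapped Communities Detection, Li-FOCD). First, two local information measures, community connectivity and neighbor connectivity, are defined for each node to model the relationship between nodes and communities, which narrows the scope of computation. Then reduction, expansion, and deduplication operations are executed iteratively in parallel, with the local measures updated each time. By relaxing the termination condition of each iteration, the algorithm finds an approximately optimal set of communities rather than the optimal communities, for a final complexity of O(m+n). Experiments on real large-scale social network data show that, compared with currently popular overlapping community mining algorithms, Li-FOCD greatly improves computational efficiency without loss of detection quality.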

In real‐world intelligent transportation systems, accuracy in vehicle license plate detection and recognition is considered quite critical. Many algorithms have been proposed for still images, but their accuracy on actual videos is not satisfactory. This stems from several problematic conditions in videos, such as vehicle motion blur, variety in viewpoints, outliers, and the lack of publicly available video datasets. In this study, we focus on these challenges and propose a license plate detection and recognition scheme for videos based on a temporal matching prior network. Specifically, to improve the robustness of detection and recognition accuracy in the presence of motion blur and outliers, forward and bidirectional matching priors between consecutive frames are properly combined with layer structures specifically designed for plate detection. We also built our own video dataset for the deep training of the proposed network. During network training, we perform data augmentation based on image rotation to increase robustness regarding the various viewpoints in videos.

Sarcasm is a type of sentiment in which people express negative feelings using positive or intensified positive words. When speaking, people often use heavy tonal stress and gestural cues such as eye rolling or hand movements to signal sarcasm. In textual data these tonal and gestural cues are missing, which makes sarcasm detection very difficult for an average human. Because of these challenges, researchers have taken an interest in sarcasm detection in social media text, especially tweets. The rapid growth in tweet volume and the analysis of those tweets pose major challenges. In this paper, we propose a Hadoop-based framework that captures real-time tweets and processes them with a set of algorithms that identify sarcastic sentiment effectively. We observe that the elapsed time for analysis and processing under the Hadoop-based framework significantly improves on conventional methods, making the framework better suited to real-time streaming tweets.

Person re-identification (ReID) is an intelligent video surveillance technology that retrieves the same person from different cameras. The task is extremely challenging because of changes in pose, differing camera views, and occlusion. In recent years, deep-learning-based person ReID has received widespread attention thanks to the rapid development and excellent performance of deep learning. In this paper, we first divide deep-learning-based person ReID approaches into seven types: fused hand-crafted feature deep models, representation learning models, metric learning models, part-based deep models, video-based models, GAN-based models, and unsupervised models. We then give a brief overview of each of the seven types. Next, we introduce some commonly used datasets, compare the performance of recent algorithms on image and video datasets, and analyze the advantages and disadvantages of the various methods. Finally, we summarize possible future research directions for person ReID technology.

贺超波  汤庸  张琼  刘双印  刘海 《电子学报》2019,47(5):1086-1093
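Clustering the large volume of short texts produced by social media has significant practical value, but short texts tend to be noisy, fast-growing, and voluminous, which makes them hard for existing algorithms to handle effectively. A short-text online clustering algorithm based on incremental robust non-negative matrix factorization, STOCIRNMF, is proposed. STOCIRNMF builds its short-text clustering model on non-negative matrix factorization, designs the model's optimization objective with the l2,1 norm to improve robustness, and applies incremental iterative update rules to cluster short texts online. Experiments on Sohu news headline and Weibo short-text datasets show that STOCIRNMF not only clusters better than representative existing algorithms but can also effectively detect Weibo topics online.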
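
To illustrate why an l2,1-norm objective is more robust to noisy samples than a squared Frobenius objective, the sketch below evaluates both on a small residual matrix. It assumes the row-wise convention (one row per sample; some papers sum over columns instead), and the residual values are invented; it does not reproduce the STOCIRNMF objective or its update rules.

```python
import math


def l21_norm(matrix):
    """l2,1 norm: the sum of the Euclidean norms of the rows."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def frobenius_squared(matrix):
    """Squared Frobenius norm: the sum of all squared entries."""
    return sum(v * v for row in matrix for v in row)


# Invented residuals X - WH for three samples; the last row is an outlier.
residual = [
    [3.0, 4.0],
    [0.0, 0.0],
    [30.0, 40.0],
]
print(l21_norm(residual))           # 5 + 0 + 50
print(frobenius_squared(residual))  # 25 + 0 + 2500
```

The outlier row contributes in proportion to its norm under l2,1 but in proportion to its squared norm under Frobenius, so a few noisy short texts dominate the objective far less.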

The powerful representation capacity of deep learning has made it inevitable that the underwater image enhancement community would exploit its potential. Work on deep underwater image enhancement networks keeps growing, so a comprehensive survey is timely. In this paper, our aim is two-fold: (1) to provide a comprehensive and in-depth survey of deep learning-based underwater image enhancement, covering perspectives that range from algorithms to open issues, and (2) to conduct a qualitative and quantitative comparison of deep algorithms on diverse datasets to serve as a benchmark, which has barely been explored before. We first introduce the underwater image formation models, which underlie training data synthesis and the design of deep networks and also help in understanding how underwater images degrade. Then we review deep underwater image enhancement algorithms and outline aspects of current networks, including architecture, parameters, training data, loss functions, and training configurations. We also summarize the evaluation metrics and underwater image datasets. Following that, a systematic experimental comparison is carried out to analyze the robustness and effectiveness of deep algorithms. We also point out the shortcomings of current benchmark datasets and evaluation metrics. Finally, we discuss several unsolved open issues and suggest possible research directions. We hope that this paper can serve as a comprehensive reference for future research and encourage the development of deep learning-based underwater image enhancement.

In this paper, we propose a novel framework to extract text regions from scene images with complex backgrounds and multiple text appearances. This framework consists of three main steps: boundary clustering (BC), stroke segmentation, and string fragment classification. In BC, we propose a new bigram-color-uniformity-based method to model both text and attachment surface, and cluster edge pixels based on color pairs and spatial positions into boundary layers. Then, stroke segmentation is performed at each boundary layer by color assignment to extract character candidates. We propose two algorithms to combine the structural analysis of text stroke with color assignment and filter out background interferences. Further, we design a robust string fragment classification based on Gabor-based text features. The features are obtained from feature maps of gradient, stroke distribution, and stroke width. The proposed framework of text localization is evaluated on scene images, born-digital images, broadcast video images, and images of handheld objects captured by blind persons. Experimental results on respective datasets demonstrate that the framework outperforms state-of-the-art localization algorithms.

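This paper proposes a scene text detection method to meet the challenges of detecting text in complex natural scenes. The method adopts dual attention and multi-scale feature fusion: a dual-attention fusion mechanism strengthens the correlation between text feature channels and improves overall detection performance. Considering the semantic information that may be lost when deep feature maps are upsampled and downsampled, a dilated convolution multi-scale feature fusion pyramid structure (MFPN) is proposed, which uses a dual fusion mechanism to strengthen semantic features and overcome the effect of scale variation. To address the semantic conflicts caused by fusing information of different densities and the limited expressiveness of multi-scale features, a multi-scale feature fusion module (MFFM) is introduced. In addition, a feature refinement module (FRM) is introduced for small text that is easily masked by conflicting information. Experiments show that the method is effective for text detection in complex scenes, with F-measures of 85.6%, 87.1%, and 86.3% on the CTW1500, ICDAR2015, and Total-Text datasets respectively.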

Federated Learning (FL) with mobile computing and the Internet of Things (IoT) is an effective cooperative learning approach. However, several technical challenges still need to be addressed. For instance, dividing the training process among several devices may impact the performance of Machine Learning (ML) algorithms, often significantly degrading prediction accuracy compared to centralized learning. One of the primary reasons for this degradation is that each device can access only the small fraction of data that it generates, which limits the efficacy of the local ML model constructed on that device. The degradation can be exacerbated when the participating devices produce different classes of events, which is known as the class balance problem. Moreover, if the participating devices are of different types, each device may never observe the same types of events, which leads to the device heterogeneity problem. In this study, we investigate how data augmentation can be applied to address these challenges and improve detection performance in an anomaly detection task using IoT datasets. Our extensive experimental results with three publicly accessible IoT datasets show a performance improvement of up to 22.9% with data augmentation, compared to the baseline without data augmentation. In particular, stratified random sampling and uniform random sampling show the best improvement in detection performance with only a modest increase in computation time, whereas the data augmentation scheme using Generative Adversarial Networks is the most time-consuming with limited performance benefits.
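
The abstract names stratified random sampling as one of the augmentation schemes but gives no procedure. The following is one plausible reading, sketched under explicit assumptions: each class is a stratum, every stratum receives the same quota, and samples are drawn with replacement. The function name, the quota rule, and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_oversample(samples, labels, per_class, seed=0):
    """Draw `per_class` items (with replacement) from each class stratum."""
    rng = random.Random(seed)  # seeded for reproducibility
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    augmented = []
    for label in sorted(by_class):
        augmented.extend(
            (rng.choice(by_class[label]), label) for _ in range(per_class)
        )
    return augmented


# Toy local dataset on one device: the "attack" class is rare.
samples = [0.1, 0.2, 0.3, 0.4, 9.7]
labels = ["normal", "normal", "normal", "normal", "attack"]
augmented = stratified_oversample(samples, labels, per_class=4)
counts = Counter(label for _, label in augmented)
print(counts)
```

Uniform random sampling would instead draw from the pooled data without regard to class, so a rare class could still be underrepresented in the augmented set.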
