Similar Documents
19 similar documents found.
1.
Objective: Transmission line fittings come in many types and serve many purposes, and they bear directly on the safety of conductors and towers. Assessing the operating state of fittings and diagnosing their faults requires accurate localization and identification of fitting targets on transmission lines; however, as UAV inspections collect ever more data, manually annotating all of it becomes increasingly impractical. To address the problem that unlabeled data cannot be used effectively, a transmission line fitting detection model based on a self-supervised E-Swin Transformer (efficient shifted windows Transformer) is proposed, which exploits unlabeled data to improve detection accuracy. Method: First, to reduce the computational cost of self-attention and improve model efficiency, the Swin Transformer's self-attention computation is optimized, yielding an efficient backbone, E-Swin. Second, to exploit unlabeled fitting data for stronger feature extraction, a lightweight self-supervised method is designed for E-Swin and used for pre-training. Finally, to improve localization accuracy, a detection head with an extra branch is adopted and combined with the pre-trained backbone to form the detection model, which is fine-tuned on a small amount of labeled data to produce the final detection results. Results: On a transmission line fitting dataset, the model achieves a mean average precision over all targets (AP50) of 88.6%, roughly 10% higher than conventional detection models. Conclusion: By improving the backbone's self-attention computation and adopting self-supervised learning, the model extracts features efficiently and makes effective use of unlabeled data; the resulting fitting detection model offers a new way to address the data utilization problem in transmission line fitting detection.
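The abstract does not spell out E-Swin's optimized attention, but the Swin family it builds on restricts self-attention to fixed local windows. A minimal PyTorch sketch of that standard window-partition step, for orientation only (shapes are illustrative, not the paper's):

```python
import torch

def window_partition(x: torch.Tensor, window_size: int) -> torch.Tensor:
    """Split a (B, H, W, C) feature map into non-overlapping windows so that
    self-attention runs within each window instead of over all H*W tokens.
    Returns (num_windows * B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)

windows = window_partition(torch.randn(2, 56, 56, 96), window_size=7)  # -> (128, 7, 7, 96)
```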

2.
To address the redundancy of targets across consecutive video frames and the time and labor cost of manual annotation, a semi-automatic video target annotation framework combining detection and tracking is proposed. An improved YOLO v3 model is trained offline on manually annotated samples and used as the detector for online annotation. During online annotation, the target's position and label are set manually in the initial frame; in subsequent frames the target's position is determined automatically from the IOU (Intersection-Over-Union) between the detection box and the tracking box, and the tracker's response output is used to detect target disappearance and automatically stop annotating the current target. A key-frame extraction algorithm based on target saliency selects key frames. Comparative experiments on the improved YOLO v3 detector were conducted on a self-built ship target dataset, and ship video sequences verified the effectiveness of the proposed semi-automatic annotation method. The results show that the method significantly improves annotation efficiency, quickly generates annotation data, and suits video target annotation tasks in maritime ship scenarios.
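The acceptance rule hinges on the IoU between the detector's box and the tracker's box. A small Python sketch of that check; the 0.5 threshold and the fallback policy are assumptions, since the abstract does not state them:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) form."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

IOU_ACCEPT = 0.5  # assumed threshold; the abstract does not give its value

def fuse_boxes(det_box, trk_box):
    """Keep the detector's box when it agrees with the tracker, else fall back."""
    return det_box if box_iou(det_box, trk_box) >= IOU_ACCEPT else trk_box

print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.143
```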

3.
Objective: To handle tracking failure caused by severe target deformation, especially severe scale change, a target tracking algorithm fusing image saliency and feature point matching is proposed. Method: First, an improved BRISK (binary robust invariant scalable keypoints) detector extracts feature points from the initial frame of the video sequence, establishing the target template and its feature point set. Feature points detected in the current frame are then matched against the template set with FLANN (fast library for approximate nearest neighbors) to obtain a matched subset; the matched points are fused with optical flow feature points to form a reliable feature point set. From the reliable set and the template set, a homography matrix is computed to coarsely determine the target bounding box, which is then refined using LC (local contrast) image saliency; finally, image saliency and the reliable feature points are fused to adaptively determine the target box. When the target deforms severely for three consecutive frames, the target template and its feature point set are updated. Results: For evaluation, 8 video sequences exhibiting deformation, totaling 2,214 frames, were selected from the OTB2013 dataset. The algorithm achieves an average overlap of 0.5671, outperforming current state-of-the-art trackers, and also performs better in the overlap success-rate experiment. A Vega Prime simulation of aerial footage from a fast-approaching UAV, in which the target's maximum deformation exceeds 14 and the maximum inter-frame deformation reaches 1.72, further confirms the tracking quality. The algorithm runs in real time at an average of 48.6 frames/s. Conclusion: The algorithm tracks severely deforming targets, especially those with severe scale change, accurately and in real time.
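A rough OpenCV sketch of the matching pipeline the abstract describes, using stock BRISK and FLANN rather than the paper's improved BRISK; the file paths and the 0.7 ratio test are placeholders:

```python
import cv2
import numpy as np

template = cv2.imread("frame_0001.png", cv2.IMREAD_GRAYSCALE)  # placeholder paths
current = cv2.imread("frame_0002.png", cv2.IMREAD_GRAYSCALE)

brisk = cv2.BRISK_create()
kp_t, des_t = brisk.detectAndCompute(template, None)  # template feature point set
kp_c, des_c = brisk.detectAndCompute(current, None)   # current-frame feature points

# FLANN needs an LSH index for binary descriptors like BRISK's (algorithm=6 is FLANN_INDEX_LSH).
flann = cv2.FlannBasedMatcher(
    dict(algorithm=6, table_number=6, key_size=12, multi_probe_level=1), dict(checks=50))
pairs = flann.knnMatch(des_t, des_c, k=2)
good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < 0.7 * p[1].distance]

# Homography from matched template/current points; RANSAC rejects outliers.
src = np.float32([kp_t[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp_c[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
# Warping the template's corner points by H gives the coarse target box.
```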

4.
Objective: Various disturbances during tracking make the results uncertain, so incorporating the reliability of the samples collected during tracking into the tracking model helps suppress the influence of low-reliability samples. Building on the recent structured support vector machine (SSVM) tracking algorithm, a weighted-margin structured SVM tracking model (WMSSVM) that incorporates sample confidence is proposed to strengthen the SSVM tracker. Method: First, sample reliability is estimated from the classifier score and bounding-box overlap; second, the WMSSVM model is built to handle training on samples of differing confidence, and the tracking model is solved with a dual coordinate descent optimization algorithm. Results: On the OTB100 tracking dataset of 100 videos, the proposed WMSSVM tracker improves on the baseline Scale-DLSSVM by 1% in precision and 2% in success rate, and also compares favorably with recent trackers. Conclusion: This work is the first to incorporate sample reliability into a structured SVM tracking model, proposing a weighted-margin structured SVM tracking model and its optimization method. Validated on the 100-sequence dataset, the algorithm adapts to tracking in complex scenes and performs well on videos with background clutter, target deformation, occlusion, motion blur, out-of-view targets, and fast motion.
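The abstract gives no equations; as a hedged illustration of the core idea only (each sample's margin penalty is scaled by its confidence), here is a per-sample weighted hinge loss in NumPy. The actual WMSSVM is a structured SVM over bounding boxes, which this deliberately simplifies to binary classification:

```python
import numpy as np

def weighted_hinge_loss(w, X, y, conf):
    """Hinge loss where each sample's margin penalty is scaled by its
    confidence conf[i] in [0, 1]; low-reliability samples count for less."""
    margins = 1.0 - y * (X @ w)
    return float(np.mean(conf * np.maximum(0.0, margins)))

rng = np.random.default_rng(0)
X, y = rng.normal(size=(8, 4)), rng.choice([-1.0, 1.0], size=8)
print(weighted_hinge_loss(np.zeros(4), X, y, rng.uniform(size=8)))
```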

5.
Objective: Target tracking has advanced considerably in recent years, but appearance changes caused by scale variation, motion, shape distortion, or occlusion remain a major challenge, so an effective image representation is a key factor in robust tracking. Method: From a mid-level vision perspective, training images are first segmented into superpixels; the resulting feature vectors and their corresponding confidence values serve as input to build a discriminative appearance model for tracking via feature regression. Feeding the feature vectors of a tracked frame into this model yields confidence values for candidate regions, efficiently separating foreground from background and locating the target region. Results: In tracking experiments on public datasets, the algorithm handles appearance changes such as scale variation, pose variation, illumination change, shape distortion, and occlusion well. Compared with mainstream trackers it excels in tracking error, achieving the best results on the carScale, subway, and tiger1 videos with average errors of 12, 3, and 21 pixels; compared with methods of the same type it excels in efficiency, tracking faster on all videos, with a 32x speedup over comparable algorithms on carScale. Conclusion: The experiments show the tracker is efficient and robust, suited to tracking under appearance change. Only a single feature is currently used; fusing multiple features to improve robustness and accuracy is left to future work.
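A hedged sketch of the superpixel-plus-regression idea with off-the-shelf components (SLIC superpixels, ridge regression); the paper's actual mid-level features, regressor, and training data are not specified in the abstract, so everything below is a stand-in:

```python
import numpy as np
from skimage.segmentation import slic
from sklearn.linear_model import Ridge

frame = np.random.rand(120, 160, 3)               # stand-in for a video frame
segments = slic(frame, n_segments=200, compactness=10.0)
# One crude feature per superpixel: its mean color.
feats = np.stack([frame[segments == s].mean(axis=0) for s in np.unique(segments)])

train_feats = np.random.rand(500, 3)              # stand-in training superpixel features
train_conf = np.random.rand(500)                  # stand-in confidence labels
model = Ridge(alpha=1.0).fit(train_feats, train_conf)
conf = model.predict(feats)                       # per-superpixel foreground confidence
```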

6.
Objective: Traditional semi-supervised video segmentation mostly uses optical flow to model feature association between key frames and the current frame, but optical flow is error-prone under occlusion and unusual textures, causing problems in multi-frame fusion. To fuse multi-frame features better, this work extracts appearance features from the first frame and position information from neighboring key frames, and fuses them through a Transformer and an improved PAN (path aggregation network) module, learning and fusing multi-frame features via multi-frame spatio-temporal attention. Method: The approach consists of two parts: video preprocessing (an appearance feature extraction network and a current-frame feature extraction network) and feature fusion based on the Transformer and the improved PAN module. Specifically: an appearance feature extraction network extracts the first frame's appearance information; a current-frame feature extraction network fuses the current frame's features with the first frame's through a Transformer module, using first-frame appearance to guide extraction of the current frame's features; local feature matching between the masks of several neighboring frames and the current feature map selects the frames most correlated with the current frame's position as neighboring key frames, which guide extraction of position information; and the improved PAN feature aggregation module fuses deep and shallow semantic information. Results: The algorithm scores J = 81.5% and F = 80.9% on the DAVIS (densely annotated video segmentation)-2016 dataset and 78.4% and 77.9% on DAVIS-2017, both better than the compared methods. It runs at 22 frames/s, second in the comparison and 1.6% below the PLM (pixel-level matching) algorithm. It also achieves competitive results on YouTube-VOS (video object segmentation), leading the compared methods with a mean J&F of 71.2%. Conclusion: The multi-frame spatio-temporal attention-guided method fuses global and local information effectively while segmenting the target, reducing loss of detail, and improves semi-supervised video segmentation accuracy while remaining efficient.
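The first-frame-guided fusion can be pictured as cross-attention with the current frame as queries and the first frame as keys/values. A minimal PyTorch sketch; the token counts and embedding dimension are assumptions:

```python
import torch
import torch.nn as nn

# First-frame appearance tokens act as keys/values, the current frame as queries,
# so current-frame feature extraction is guided by first-frame appearance.
attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
f_first = torch.randn(1, 900, 256)  # e.g. a flattened 30x30 first-frame feature map
f_curr = torch.randn(1, 900, 256)   # current-frame tokens
fused, _ = attn(query=f_curr, key=f_first, value=f_first)  # (1, 900, 256)
```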

7.
Objective: Level-set contour extraction is widely used for tracking the contours of moving objects. To address the susceptibility of traditional methods to partial occlusion and complex backgrounds, an anti-interference contour tracking algorithm constrained by a prior model (AC-PMC) is proposed. Method: First, the first 5 frames of the sequence are used for tracking training: each frame is segmented into superpixels by color feature, mean clustering builds a cluster set, and the target's prior model is built from that set. Then the target contour is extracted by level-set segmentation, and a decision procedure determines whether the shape prior model needs to be introduced as a constraint, to withstand occlusion and complex backgrounds. Finally, an online model update algorithm adds appropriate feature compensation to the feature set so that the updated target model is more accurate. Results: Compared with several strong contour trackers, the algorithm reaches equal or higher accuracy. On the Fish, Face1, Face2, Shop, Train, and Lemming sequences the average center errors are 3.46, 7.16, 3.82, 13.42, 14.72, and 12.47, the tracking overlap rates are 0.92, 0.74, 0.85, 0.77, 0.73, and 0.82, and the average speeds are 4.27, 4.03, 3.11, 2.94, 2.16, and 1.71 frames/s. Conclusion: The prior-model constraint and the decision procedure during contour extraction make the algorithm accurate and adaptable under partial occlusion, target deformation, target rotation, and complex backgrounds.

8.
Multi-object tracking with joint feature fusion and a discriminative appearance model
Objective: For detection-based tracking, a multi-object tracking algorithm combining multi-feature fusion and a discriminative appearance model is proposed. Method: Detector responses within a sliding time window are first associated across adjacent frames with a dual-threshold method, forming reliable tracklets from which training samples are extracted. Multiple features are fused for robust sample representation, and an Adaboost classifier is trained online to form the target's discriminative appearance model; this model is then used to iteratively associate the reliable tracklets into complete trajectories. Results: Tracking results on four video databases show the algorithm handles inter-target occlusion, target deformation, and background interference well. Quantitative analysis on the TUD-Crossing database gives FAF (average number of falsely tracked targets per frame) of 0.21, MT (fraction of targets successfully tracked in more than 80% of frames) of 84.6%, ML (fraction of targets successfully tracked in fewer than 20% of frames) of 7.7%, Frag (number of times ground-truth trajectories are broken during tracking) of 9, and IDS (number of identity switches during tracking) of 4; compared with similar multi-object trackers, the algorithm stands out on FAF and Frag. Conclusion: Fused features represent targets comprehensively and the discriminative appearance model applies effectively to tracklet association; the algorithm tracks multiple objects in complex scenes and can serve as preprocessing for higher-level tasks, such as trajectory retrieval in action recognition.
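A hypothetical sketch of the dual-threshold association step: a detection is linked to a tracklet only when its affinity is both high and clearly better than the runner-up. The thresholds are assumed values, not the paper's:

```python
import numpy as np

def associate(affinity, t_high=0.8, t_margin=0.2):
    """Link tracklet i to detection j only if the best affinity is high AND
    clearly better than the second best; thresholds are assumptions."""
    links = []
    for i, row in enumerate(affinity):
        j = int(np.argmax(row))
        second = np.partition(row, -2)[-2] if row.size > 1 else 0.0
        if row[j] >= t_high and row[j] - second >= t_margin:
            links.append((i, j))
    return links

print(associate(np.array([[0.9, 0.3], [0.6, 0.5]])))  # -> [(0, 0)]
```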

9.
Objective: Existing video object segmentation (VOS) algorithms cannot adaptively update sample weights, and their use of excessive redundant feature information wastes space and time. A lightweight VOS algorithm with adaptive weight updating is proposed. Method: First, to build a model with strong target discrimination, the algorithm adaptively assigns weights to features according to the representational quality of the extracted features; second, to remove redundant information and speed up the algorithm, a lightweight memory module is built by optimizing the information storage strategy. Results: On the public DAVIS2016 (densely annotated video segmentation) and DAVIS2017 datasets, the algorithm reaches means of region similarity and contour accuracy (J&F) of 85.8% and 78.3% respectively, a clear advantage over the compared VOS algorithms. Conclusion: Rational, redundancy-free use of historical frame information improves the generalization of the algorithm's target modeling and yields higher-quality target masks.

10.
Objective: In complex backgrounds, traditional model-matching trackers consider only the target's own features and not its relation to the surrounding image; especially when the target is occluded, tracking drift or outright target loss occurs. To address this, a foreground-discriminating local model matching (FDLM) tracking algorithm is proposed. Method: First, the first m frames of the sequence are used for tracking training, each segmented into superpixels. All superpixels are then organized into vector clusters, and a discriminative appearance model builds the superpixel-based target model. Finally, this model serves as the matching template: expectation maximization (EM) estimates the image's foreground information, and local model matching with foreground discrimination determines the tracked target. Results: The algorithm adapts accurately and effectively to complex changes of target state in foreground discrimination and model matching, largely resolving tracking drift under various disturbances. Compared with strong trackers it reaches equal or higher accuracy: on the Girl, Lemming, Liquor, Shop, Woman, Bolt, CarDark, David, and Basketball sequences the average center errors are 9.76, 28.65, 19.41, 5.22, 8.26, 7.69, 8.13, 11.36, and 7.66, and the tracking overlap rates are 0.69, 0.61, 0.77, 0.74, 0.80, 0.79, 0.79, 0.75, and 0.69. Conclusion: The experiments show the algorithm adaptively updates noise model parameters in real time and estimates foreground information fairly accurately, excluding background interference, and tracks accurately and adaptably under partial occlusion, target deformation, illumination change, and complex backgrounds.

11.
Objective: With the advent of deep neural networks, visual tracking has developed rapidly, yet the spatio-temporal properties of video in tracking tasks, especially temporal appearance consistency, leave great room for exploration. A novel, simple, and practical tracking algorithm, the temporal-aware network (TAN), is proposed, which encodes the temporal and spatial features of a sequence simultaneously from a video perspective. Method: TAN embeds a new temporal aggregation module (TAM) to exchange and fuse information from multiple historical frames, adapting to target appearance changes such as deformation and rotation without any model update strategy. To build a simple, practical tracking framework, a target estimation strategy is designed: the target's 4 corner points are detected, the diagonals form two groups of candidate boxes, and a box selection strategy determines the final target position, which copes effectively with difficulties such as occlusion. After offline training, and without any model update, the proposed tracker TAN performs tracking by fully feed-forward inference. Results: On OTB (online object tracking: a benchmark) 50, OTB100, TrackingNet, LaSOT (a high-qua...
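A hypothetical sketch of the corner-based target estimation described above: the four detected corners yield two diagonal pairs, each defining one candidate box for the selection strategy to choose from. The exact pairing and selection rules are assumptions:

```python
def boxes_from_corners(tl, tr, bl, br):
    """Each diagonal corner pair defines one axis-aligned candidate box
    (x1, y1, x2, y2); a selection strategy then picks the final target box."""
    cand1 = (tl[0], tl[1], br[0], br[1])  # top-left with bottom-right
    cand2 = (bl[0], tr[1], tr[0], bl[1])  # bottom-left with top-right
    return cand1, cand2

print(boxes_from_corners((1, 1), (9, 1), (1, 9), (9, 9)))  # identical boxes here
```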

12.
Large-scale labeled datasets are of key importance for the development of automatic video analysis tools: on the one hand they allow training multi-class classifiers, and on the other they support the algorithms' evaluation phase. This is widely recognized by the multimedia and computer vision communities, as witnessed by the growing number of available datasets; however, research still lacks annotation tools able to meet user needs, since generating high-quality ground truth data demands a great deal of human concentration. Nevertheless, it is not feasible to collect large video ground truths, covering as many scenarios and object categories as possible, by exploiting only the effort of isolated research groups. In this paper we present a collaborative web-based platform for video ground truth annotation. It features an easy and intuitive user interface that allows straightforward video annotation and instant sharing/integration of the generated ground truths, in order not only to alleviate a large part of the effort and time needed, but also to increase the quality of the generated annotations. The tool has been online for the last four months and, at the current date, we have collected about 70,000 annotations. A comparative performance evaluation has also shown that our system outperforms existing state-of-the-art methods in terms of annotation time, annotation quality, and system usability.

13.
We present an extensive three-year study on economically annotating video with crowdsourced marketplaces. Our public framework has annotated thousands of real world videos, including massive data sets unprecedented for their size, complexity, and cost. To accomplish this, we designed a state-of-the-art video annotation user interface and demonstrate that, despite common intuition, many contemporary interfaces are sub-optimal. We present several user studies that evaluate different aspects of our system and demonstrate that minimizing the cognitive load of the user is crucial when designing an annotation platform. We then deploy this interface on Amazon Mechanical Turk and discover expert and talented workers who are capable of annotating difficult videos with dense and closely cropped labels. We argue that video annotation requires specialized skill; most workers are poor annotators, mandating robust quality control protocols. We show that traditional crowdsourced micro-tasks are not suitable for video annotation and instead demonstrate that deploying time-consuming macro-tasks on MTurk is effective. Finally, we show that by extracting pixel-based features from manually labeled key frames, we are able to leverage more sophisticated interpolation strategies to maximize performance given a fixed budget. We validate the power of our framework on difficult, real-world data sets and we demonstrate an inherent trade-off between the mix of human and cloud computing used vs. the accuracy and cost of the labeling. We further introduce a novel, cost-based evaluation criterion that compares vision algorithms by the budget required to achieve an acceptable performance. We hope our findings will spur innovation in the creation of massive labeled video data sets and enable novel data-driven computer vision applications.
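The key-frame strategy above rests on interpolating labels between manually annotated frames. As a baseline illustration only (the paper leverages more sophisticated, feature-driven strategies), linear box interpolation looks like this:

```python
import numpy as np

def interpolate_box(key_frames, boxes, t):
    """Linearly interpolate an (x1, y1, x2, y2) box at frame t between the
    two nearest labeled key frames (t assumed inside the labeled range)."""
    k0 = max(k for k in key_frames if k <= t)
    k1 = min(k for k in key_frames if k >= t)
    if k0 == k1:
        return np.asarray(boxes[k0], dtype=float)
    a = (t - k0) / (k1 - k0)
    return (1 - a) * np.asarray(boxes[k0], float) + a * np.asarray(boxes[k1], float)

print(interpolate_box([0, 10], {0: (0, 0, 10, 10), 10: (10, 10, 30, 30)}, 5))
# -> [ 5.  5. 20. 20.]
```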

14.
Transformer-based visual tracking algorithms capture the target's global information well, but there is still room to improve their representation of target features. To strengthen the expression of target features, a Transformer visual tracking algorithm based on hybrid attention is proposed. First, a hybrid attention module is introduced to capture target features in the spatial and channel dimensions, modeling the contextual dependencies among target features; then, feature maps are sampled by several parallel dilated convolutions with different dilation rates to obtain multi-scale image features and enhance local feature representation; finally, the constructed convolutional positional encoding layer is added to the Transformer encoder, providing the tracker with precise, length-adaptive positional encoding and improving localization accuracy. Extensive experiments on OTB100, VOT2018, LaSOT, and other datasets show that learning the relations among features through the hybrid-attention Transformer network represents target features better: the proposed algorithm outperforms other mainstream trackers and runs in real time at 26 frames/s.
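A sketch of the parallel dilated (atrous) convolutions described above for multi-scale features; the channel count and dilation rates are assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class ParallelDilated(nn.Module):
    """Parallel 3x3 convolutions with different dilation rates, concatenated
    and fused by a 1x1 convolution to produce multi-scale features."""
    def __init__(self, c=256, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c, c, 3, padding=r, dilation=r) for r in rates)
        self.fuse = nn.Conv2d(c * len(rates), c, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

y = ParallelDilated()(torch.randn(1, 256, 20, 20))  # -> (1, 256, 20, 20)
```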

15.
RGBD images with high-quality annotations, both in the form of geometric (i.e., segmentation) and structural (i.e., how do the segments mutually relate in 3D) information, provide valuable priors for a diverse range of applications in scene understanding and image manipulation. While it is now simple to acquire RGBD images, annotating them, automatically or manually, remains challenging. We present Smart Annotator, an interactive system to facilitate annotating raw RGBD images. The system performs the tedious tasks of grouping pixels, creating potential abstracted cuboids, inferring object interactions in 3D, and generates an ordered list of hypotheses. The user simply has to flip through the suggestions for segment labels, finalize a selection, and the system updates the remaining hypotheses. As annotations are finalized, the process becomes simpler with fewer ambiguities to resolve. Moreover, as more scenes are annotated, the system makes better suggestions based on the structural and geometric priors learned from previous annotation sessions. We test the system on a large number of indoor scenes across different users and experimental settings, validate the results on existing benchmark datasets, and report significant improvements over low-level annotation alternatives. (Code and benchmark datasets are publicly available on the project page.)

16.
Modern data-driven spoken language systems (SLS) require manual semantic annotation for training spoken language understanding parsers. Multilingual porting of SLS demands significant manual effort and language resources, as this manual annotation has to be replicated. Crowdsourcing is an accessible and cost-effective alternative to traditional methods of collecting and annotating data. The application of crowdsourcing to simple tasks has been well investigated. However, complex tasks, like cross-language semantic annotation transfer, may generate low judgment agreement and/or poor performance. The most serious issue in cross-language porting is the absence of reference annotations in the target language; thus, crowd quality control and the evaluation of the collected annotations are difficult. In this paper we investigate targeted crowdsourcing for semantic annotation transfer that delegates to the crowd a complex task such as segmenting and labeling concepts taken from a domain ontology, with evaluation against source-language annotation. To test the applicability and effectiveness of the crowdsourced annotation transfer we have considered the case of close and distant language pairs: Italian–Spanish and Italian–Greek. The corpora annotated via crowdsourcing are evaluated against source- and target-language expert annotations. We demonstrate that the two evaluation references (source and target) highly correlate with each other, thus drastically reducing the need for target-language reference annotations.

17.
Image annotation datasets are becoming larger and larger, with tens of millions of images and tens of thousands of possible annotations. We propose a strongly performing method that scales to such datasets by simultaneously learning to optimize precision at k of the ranked list of annotations for a given image and learning a low-dimensional joint embedding space for both images and annotations. Our method outperforms several baseline methods while being faster and consuming less memory. We also demonstrate how our method learns an interpretable model, where annotations with alternate spellings or even languages are close in the embedding space. Hence, even when our model does not predict the exact annotation given by a human labeler, it often predicts similar annotations, a fact that we try to quantify by measuring the newly introduced "sibling" precision metric, where our method also obtains excellent results.
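Precision at k, the quantity this method learns to optimize, is simple to state; a small NumPy sketch of the metric:

```python
import numpy as np

def precision_at_k(scores, relevant, k):
    """Fraction of the k highest-scoring annotations that are ground truth."""
    topk = np.argsort(scores)[::-1][:k]
    return len(set(topk.tolist()) & set(relevant)) / k

print(precision_at_k(np.array([0.9, 0.1, 0.8, 0.3]), {0, 2}, k=2))  # -> 1.0
```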

18.
To support effective multimedia information retrieval, video annotation has become an important topic in video content analysis. Existing video annotation methods focus either on the analysis of low-level features or on simple semantic concepts, and they cannot bridge the gap between low-level features and high-level concepts. In this paper, we propose an innovative method for semantic video annotation through integrated mining of the visual features, speech features, and frequent semantic patterns existing in the video. The proposed method consists of two main phases: 1) construction of four kinds of predictive annotation models from annotated videos, namely speech-association, visual-association, visual-sequential, and statistical models; 2) fusion of these models to annotate un-annotated videos automatically. The main advantage of the proposed method lies in considering all visual features, speech features, and semantic patterns simultaneously. Moreover, the use of high-level rules effectively complements the weakness of statistics-based methods in handling complex and broad keyword identification in video annotation. Through empirical evaluation on NIST TRECVID video datasets, the proposed approach is shown to substantially enhance annotation performance in terms of precision, recall, and F-measure.

19.
Social tags provide additional descriptions of web pages and are intuitively valuable for search. This paper proposes a novel retrieval method that exploits the category properties of social tags: crowd tagging information is modeled as high-level categories to estimate a topic model, which is then used to smooth the document language model. This modeling reduces the impact of tag sparsity and expresses tag meaning effectively, thereby improving retrieval performance. Experiments on a dataset built from TREC evaluations show that the method outperforms LDA-based retrieval and other existing retrieval methods based on tag data.
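The abstract does not give the paper's exact estimator; for orientation only, one standard form of topic-model smoothing of a document language model is

```latex
p(w \mid d) = \lambda \, p_{\mathrm{ml}}(w \mid d)
            + (1 - \lambda) \sum_{z} p(w \mid z) \, p(z \mid d)
```

where $p_{\mathrm{ml}}$ is the maximum-likelihood document model, $z$ ranges over tag-derived topics, and $\lambda$ is an interpolation weight.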
