Similar Literature
20 similar documents found (search time: 359 ms)
1.
2.
Comparing the similarity between different augmentations of the same image is the key to the remarkable success of contrastive learning. Conventional contrastive methods use only two different views of an image. To learn more information from each image and improve classification accuracy, a multi-view momentum contrastive learning algorithm is proposed on top of MoCo (momentum contrast for unsupervised visual representation learning). In each iteration, one query encoder and several momentum encoders extract features from multiple augmentations of the image, so that more augmentations and negative samples are available per iteration. An optimized noise-contrastive estimation loss (InfoNCE) is used so that the query encoder learns representations that are more useful for downstream tasks. The query encoder is updated by backpropagation, while each momentum encoder is updated with an improved momentum update rule to strengthen generalization. Experimental results show that multi-view momentum contrastive learning effectively improves classification accuracy.
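As a rough illustration of two standard MoCo-style building blocks referenced in this abstract (the momentum encoder update and the InfoNCE loss), here is a minimal PyTorch sketch; it is not the authors' multi-view variant, and the function names, momentum value, and temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def momentum_update(key_encoder, query_encoder, m=0.999):
    """EMA update of a momentum (key) encoder from the query encoder;
    only the query encoder is trained by backpropagation."""
    for k_p, q_p in zip(key_encoder.parameters(), query_encoder.parameters()):
        k_p.data.mul_(m).add_(q_p.data, alpha=1 - m)

def info_nce(q, k_pos, queue, temperature=0.07):
    """InfoNCE loss: q and k_pos are L2-normalised (N, D) embeddings of two
    augmentations of the same images; `queue` holds (K, D) negative keys."""
    l_pos = (q * k_pos).sum(dim=1, keepdim=True)   # similarity to the positive
    l_neg = q @ queue.t()                          # similarities to negatives
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)         # positive key is class 0
```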

3.
郭玉慧  梁循 《计算机学报》2022,45(1):98-114
Recognizing different structural appearances of the same object is a relatively difficult task for a machine. Taking easily deformed banknotes as an example, this paper proposes a recognition method for warped partial-view banknotes based on heterogeneous feature aggregation. First, the gray-gradient co-occurrence matrix, the Haishoku algorithm, and circular LBP are used to obtain texture style, color-spectrum style, and texture, respectively; these features describe the partial banknote image from different angles. Then, through VGG-16, ResN…
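One of the hand-crafted features named above, circular LBP, can be computed with scikit-image; the sketch below is a generic texture descriptor under assumed parameter values (P, R), not the paper's heterogeneous aggregation pipeline.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def circular_lbp_histogram(gray, P=8, R=1.0):
    """Circular LBP texture descriptor: each pixel is coded by comparing it
    with P neighbours on a circle of radius R; the image is then summarised
    as a normalised histogram of the rotation-invariant 'uniform' codes."""
    codes = local_binary_pattern(gray, P, R, method="uniform")
    n_bins = P + 2                      # 'uniform' mapping yields P + 2 codes
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
    return hist
```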

4.
Feature grouping and local soft match for mobile visual search
More powerful mobile devices stimulate mobile visual search to become a popular and unique image retrieval application. A number of challenges come with such an application, resulting from appearance variations in mobile images. Performance of state-of-the-art image retrieval systems is improved using bag-of-words approaches. However, for visual search with mobile images exhibiting large variations, there are at least two critical unsolved issues: (1) the loss of the features' discriminative power due to quantization; and (2) the underuse of spatial relationships among visual words. To address both issues, this paper presents a novel visual search method based on feature grouping and local soft match, which considers properties of mobile images and couples visual and spatial information consistently. First, features of the query image are grouped using both matched visual features and their spatial relationships; then the grouped features are softly matched to alleviate quantization loss. An efficient scoring scheme is devised to utilize the inverted file index, and it is compared with vocabulary-guided pyramid kernels. Finally, experiments on the Stanford mobile visual search database and a collected database with more than one million images show that the proposed method achieves promising improvement over the approach with a vocabulary tree, especially when large variations exist in query images.
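To make the soft-match idea concrete, here is a minimal sketch of soft assignment of local descriptors to multiple visual words, a generic way of alleviating quantization loss; it is not the paper's grouped-feature matching, and the codebook, r, and sigma are placeholder assumptions.

```python
import numpy as np

def soft_assign(descriptors, codebook, r=3, sigma=0.2):
    """Each local descriptor votes for its r nearest visual words with
    distance-based weights instead of a single hard word, producing a
    soft bag-of-words histogram for the image."""
    # pairwise squared distances between descriptors (N, D) and centres (K, D)
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    hist = np.zeros(len(codebook))
    for row in d2:
        nearest = np.argsort(row)[:r]
        w = np.exp(-row[nearest] / (2 * sigma ** 2))
        hist[nearest] += w / w.sum()          # normalised soft votes
    return hist
```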

5.
Multi-view stereoscopic video synthesis involves large data volumes, demanding image-processing speed requirements, and a limited number of supported viewing zones; these problems have never been well solved and have become a bottleneck for the industrialization of multi-view stereoscopic video. To address this, a stereoscopic video processing system based on a stereo image fusion algorithm and an eye-tracking algorithm is proposed. First, every frame of the stereoscopic video is read in sequence; then the stereo image fusion algorithm synthesizes each frame, and the fused images are displayed and played back in their original order. An eye-tracking algorithm is also incorporated so that the image of the corresponding viewing zone is delivered in real time according to the position of the viewer's eyes. Combining image fusion with eye tracking effectively enlarges the stereoscopic viewing angle. Experimental results show that the method presents multi-view video on a stereoscopic display as glasses-free 3D, lets the viewer move freely in front of the screen without degrading the stereoscopic effect, plays back smoothly, and gives the audience a fairly realistic sense of depth.

6.
In this paper, we propose a multimodal query suggestion method for video search which can leverage multimodal processing to improve the quality of search results. When users type general or ambiguous textual queries, our system MQSS provides keyword suggestions and representative image examples in an easy-to-use drop-down manner, which helps users specify their search intent more precisely and effortlessly. It is a powerful complement to initial queries. After a query is formulated as a multimodal query (i.e., text and image), it is fed to individual search models, such as text-based, concept-based, and visual example-based search models. We then apply a multimodal fusion method to aggregate the resulting search results. The effectiveness of MQSS is demonstrated by evaluations over a web video data set.
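The aggregation step can be pictured as a simple weighted late fusion of per-modality ranking scores; the sketch below is a generic illustration under assumed fixed weights, not MQSS's actual fusion method.

```python
import numpy as np

def fuse_scores(score_lists, weights):
    """Late fusion: min-max normalise each modality's scores (e.g. text-based,
    concept-based, visual example-based) and combine them with fixed weights
    into a single ranking score per video."""
    fused = np.zeros(len(score_lists[0]), dtype=float)
    for scores, w in zip(score_lists, weights):
        s = np.asarray(scores, dtype=float)
        rng = s.max() - s.min()
        norm = (s - s.min()) / rng if rng > 0 else np.zeros_like(s)
        fused += w * norm
    return fused          # higher fused score means higher rank
```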

7.
Group-by aggregation queries have become one of the core research problems in the data warehouse field, and materialized views are an effective means of improving their performance. Using the hierarchical relationships among dimension attributes, this paper extends query rewriting over materialized views in the general sense, discusses the conditions under which a query can be rewritten with a single view, and gives the rewriting method. On this basis, an optimized selection algorithm that rewrites a query using multiple materialized views is proposed, and experiments show that the algorithm further improves the efficiency of group-by aggregation queries.
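The core idea, answering a coarser group-by from a finer-grained materialized view by rolling up along dimension hierarchies, can be illustrated in a few lines of pandas; the tables, column names, and hierarchies below are invented toy data, not the paper's rewriting algorithm.

```python
import pandas as pd

# Materialized view: sales pre-aggregated at (month, city) granularity.
sales = pd.DataFrame({
    "month": ["2024-01", "2024-01", "2024-02"],
    "city":  ["Hefei", "Nanjing", "Hefei"],
    "amount": [100.0, 80.0, 120.0],
})
mv = sales.groupby(["month", "city"], as_index=False)["amount"].sum()

# Dimension hierarchies: month -> quarter, city -> province.
month_dim = pd.DataFrame({"month": ["2024-01", "2024-02"],
                          "quarter": ["2024-Q1", "2024-Q1"]})
city_dim = pd.DataFrame({"city": ["Hefei", "Nanjing"],
                         "province": ["Anhui", "Jiangsu"]})

# Rewrite a (quarter, province) aggregation query against the finer-grained
# view instead of the base table: join the hierarchies and roll up.
answer = (mv.merge(month_dim, on="month")
            .merge(city_dim, on="city")
            .groupby(["quarter", "province"], as_index=False)["amount"].sum())
print(answer)
```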

8.
9.
莫宏伟  田朋 《控制与决策》2021,36(12):2881-2890
Visual scene understanding includes detecting and recognizing objects, reasoning about the visual relationships between detected objects, and describing image regions with sentences. To achieve a more comprehensive and accurate understanding of scene images, object detection, visual relationship detection, and image captioning are treated as three visual tasks at different semantic levels of scene understanding, and an image understanding model based on multi-level semantic features is proposed in which the three semantic levels are connected to jointly solve the scene understanding task. The model iteratively updates the semantic features of objects, relationship phrases, and image captions through a message-passing graph; the updated features are used to classify objects and visual relationships and to generate scene graphs and captions, and a fused attention mechanism is introduced to improve caption accuracy. Experimental results on the Visual Genome and COCO datasets show that the proposed method outperforms existing methods on scene graph generation and image captioning.

10.
Visual normalization is a key technique in multi-view image stitching. Building on a study of a large number of image-processing algorithms, a visual normalization method for stitching images from multiple cameras is proposed. The method consists of two modules: image color correction and image edge blending. In the color correction module, an image-region partitioning strategy and adaptive color adjustment factors are introduced so that different pixels receive different adjustment factors, and the color correlation between adjacent images is fully exploited to adaptively correct the color of the target image. In the edge blending module, the overlap region of the stitched images is computed with an inverse mapping matrix, and adaptive edge-blending factors are used to blend the edges of the overlap region. Experimental results show that the method noticeably reduces, and in some cases eliminates, the visual differences between stitched images and improves the visual quality of the stitched result.
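A heavily simplified sketch of the two modules is given below: a global colour gain estimated from the overlap plus linear feathering across the overlap band. The paper's per-pixel adaptive factors and inverse-mapping overlap computation are not reproduced, and the image layout (side by side, equal height, float values in [0, 255]) is an assumption.

```python
import numpy as np

def blend_overlap(left, right, overlap):
    """left/right: float images (H, W, 3) already warped into a common frame
    with `overlap` shared columns. A crude global gain matches the mean
    colour of the two images, then a linear ramp blends the overlap band."""
    l_band = left[:, -overlap:]
    r_band = right[:, :overlap]
    gain = (l_band.mean(axis=(0, 1)) + 1e-6) / (r_band.mean(axis=(0, 1)) + 1e-6)
    right = np.clip(right * gain, 0, 255)      # colour-correct the right image

    # linear feathering weights across the overlap band
    alpha = np.linspace(1.0, 0.0, overlap)[None, :, None]
    blended = l_band * alpha + right[:, :overlap] * (1 - alpha)
    return np.concatenate([left[:, :-overlap], blended, right[:, overlap:]], axis=1)
```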

11.
Finding an object inside a target image by querying multimedia data is desirable, but remains a challenge. The effectiveness of region-based representation for content-based image retrieval is extensively studied in the literature. One common weakness of region-based approaches is that they perform detection using low-level visual features within the region, and the homogeneous image regions have little correspondence to the semantic objects. Thus, the retrieval results are often far from satisfactory. In addition, the performance is significantly affected by consistency in the segmented regions of the target object from the query and database images. Instead of solving these problems independently, this paper proposes region-based object retrieval using the generalized Hough transform (GHT) and adaptive image segmentation. The proposed approach has two phases. First, a learning phase identifies and stores stable parameters for segmenting each database image. In the retrieval phase, the adaptive image segmentation process is also performed to segment a query image into regions for retrieving visual objects inside database images through the GHT with a modified voting scheme to locate the target visual object under a certain affine transformation. The learned parameters make the segmentation results of query and database images more stable and consistent. Computer simulation results show that the proposed method gives good performance in terms of retrieval accuracy, robustness, and execution speed.
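For readers unfamiliar with the GHT itself, here is a minimal R-table build-and-vote sketch for the classic translation-only case (grayscale uint8 images and Canny/Sobel thresholds are assumptions); the paper's modified voting scheme and affine handling are not reproduced.

```python
import numpy as np
import cv2

def build_r_table(template, n_bins=36):
    """R-table: for each gradient-orientation bin, store the offsets from
    template edge points to a reference point (the edge centroid)."""
    edges = cv2.Canny(template, 50, 150)
    gx = cv2.Sobel(template, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(template, cv2.CV_64F, 0, 1)
    ys, xs = np.nonzero(edges)
    ref = np.array([ys.mean(), xs.mean()])
    table = [[] for _ in range(n_bins)]
    for y, x in zip(ys, xs):
        theta = np.arctan2(gy[y, x], gx[y, x])               # gradient orientation
        b = int((theta + np.pi) / (2 * np.pi) * n_bins) % n_bins
        table[b].append(ref - np.array([y, x]))              # offset to reference
    return table

def ght_vote(image, table, n_bins=36):
    """Accumulate votes for the reference-point location in a query image;
    the accumulator peak indicates the detected object's position."""
    edges = cv2.Canny(image, 50, 150)
    gx = cv2.Sobel(image, cv2.CV_64F, 1, 0)
    gy = cv2.Sobel(image, cv2.CV_64F, 0, 1)
    acc = np.zeros(image.shape[:2], dtype=np.int32)
    for y, x in zip(*np.nonzero(edges)):
        theta = np.arctan2(gy[y, x], gx[y, x])
        b = int((theta + np.pi) / (2 * np.pi) * n_bins) % n_bins
        for dy, dx in table[b]:
            ry, rx = int(y + dy), int(x + dx)
            if 0 <= ry < acc.shape[0] and 0 <= rx < acc.shape[1]:
                acc[ry, rx] += 1
    return acc
```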

12.
In this paper, we propose a multi-view object detection methodology using a specific extended class of Haar-like filters, which detects objects with high accuracy in unconstrained environments. There are several object detection techniques that work well in restricted environments, where illumination is constant and the view angle of the object is limited. The proposed methodology successfully detects faces, cars, and logos at any size and pose with high accuracy in real-world conditions. To cope with angle variation, we propose multiple trained cascades built with the proposed filters, which improve detection further by spanning a different range of orientations in each cascade. We tested the proposed approach on still images from image databases and conducted evaluations on video from an IP camera placed outdoors, detecting faces, logos, and vehicles in different environments. The experimental results show that the proposed method yields higher classification performance than Viola and Jones's detector, which uses a single feature for each weak classifier. Despite using fewer features, our detector detects any face, object, or vehicle at 15 fps on 4-megapixel images with 95% accuracy on an Intel i7 2.8 GHz machine.
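As a point of reference for the Viola-Jones baseline mentioned above, the snippet below runs OpenCV's stock pre-trained frontal-face Haar cascade; the image file names are placeholders, and this is the standard single-cascade detector, not the authors' extended multi-cascade filters.

```python
import cv2

# Load OpenCV's pre-trained frontal-face Haar cascade (Viola-Jones style).
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("input.jpg")                      # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Slide the cascade over an image pyramid; each stage rejects easy negatives.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                      minSize=(30, 30))
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detected.jpg", img)
```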

13.
State-of-the-art visual search systems allow small rigid objects to be retrieved efficiently in very large datasets. They are usually based on the query-by-window paradigm: a user selects any image region containing an object of interest, and the system returns a ranked list of images that are likely to contain other instances of the query object. Users' perception of these tools is, however, affected by the fact that many submitted queries actually return nothing or only junk results (complex non-rigid objects, higher-level visual concepts, etc.). In this paper, we address the problem of suggesting only the object queries that actually have relevant matches in the dataset. This requires first discovering accurate object clusters in the dataset (as an offline process) and then selecting the most relevant objects according to the user's intent (as an online process). We therefore introduce a new object-instance clustering framework based on a major contribution: a bipartite shared-neighbours clustering algorithm used to gather object seeds discovered by adaptive and weighted sampling-based matching. Shared-nearest-neighbours methods had not been studied beforehand for bipartite graphs and had never been used in the context of object discovery. Experiments show that this new method outperforms state-of-the-art object mining and retrieval results on the Oxford Building dataset. We finally describe two object-based visual query suggestion scenarios using the proposed framework and show examples of suggested object queries.
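For intuition, a plain (non-bipartite) shared-nearest-neighbour similarity can be computed as below; the paper's bipartite variant and seed-gathering procedure are considerably more involved, and k here is an arbitrary assumption.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def snn_similarity(X, k=10):
    """Shared-nearest-neighbour similarity: two items are similar when their
    k-NN lists overlap heavily, regardless of raw distances."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                 # idx[i] includes i itself
    neigh = [set(row[1:]) for row in idx]     # drop the self-neighbour
    n = len(X)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = len(neigh[i] & neigh[j]) / k
    return sim
```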

14.
Objective: The performance of conventional visual place recognition (VPR) algorithms depends on the imaging quality of optical images, so the image degradation caused by high-speed motion and high-dynamic-range scenes further degrades VPR performance. To address this problem, a VPR algorithm that fuses an event camera is proposed; it exploits the event camera's low latency and high dynamic range to improve recognition in such extreme conditions. Method: The proposed method first uses an image feature extraction module to extract features from good-quality reference images, then uses a multimodal feature fusion module to extract fused features from the query image and the event data within its exposure interval, and finally finds the reference image most similar to the query through feature matching. Results: Experiments on the MVSEC (multi-vehicle stereo event camera) dataset and the RobotCar dataset show a clear advantage over existing VPR algorithms in high-speed and high-dynamic-range scenes: on MVSEC, recall and precision improve by 5.39% and 8.55%, respectively, over the best competing algorithm, and on RobotCar by 3.36% and 4.41%. Conclusion: The proposed event-camera-fused VPR algorithm exploits the imaging advantages of event cameras in high-speed and high-dynamic-range scenes and effectively improves place recognition performance under those conditions.
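The final matching step amounts to nearest-neighbour retrieval in descriptor space; a minimal cosine-similarity sketch is shown below, where the descriptors themselves (the fused image-plus-event features) are assumed to be given by upstream modules.

```python
import numpy as np

def retrieve(query_desc, ref_descs):
    """query_desc: (D,) fused descriptor of the query; ref_descs: (N, D)
    descriptors of reference images. Returns the index of the most similar
    reference and all cosine similarities."""
    q = query_desc / (np.linalg.norm(query_desc) + 1e-12)
    r = ref_descs / (np.linalg.norm(ref_descs, axis=1, keepdims=True) + 1e-12)
    sims = r @ q
    return int(np.argmax(sims)), sims
```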

15.
卫星  李佳  孙晓  刘邵凡  陆阳 《自动化学报》2021,47(11):2623-2636
Multi-view image generation, i.e., generating images from multiple other viewpoints given an image from one viewpoint, is a fundamental problem in multi-view display and virtual reality object modeling and has attracted wide attention. In recent years, generative adversarial networks (GANs) have achieved good results on multi-view image generation, but current mainstream methods are limited to a fixed domain, are difficult to transfer to other scenarios, and produce images that suffer from blur and distortion. This paper therefore proposes ViewGAN, a multi-view image generation model based on a hybrid generative adversarial network; it contains multiple generators and one multi-class discriminator and can be flexibly transferred to multiple multi-view generation scenarios. In ViewGAN, multiple generators are trained simultaneously, each aiming to generate images of a different viewpoint. In addition, a penalty mechanism based on Monte Carlo search is proposed to push each generator toward high-quality images and to make each generator focus on its designated viewpoint. Extensive experiments on the DeepFashion, Dayton, and ICG Lab6 datasets show that the model outperforms current mainstream models on Inception score and Top-k accuracy, and improves structural similarity (SSIM) by 32.29%, peak signal-to-noise ratio (PSNR) by 14.32%, and sharpness difference (SD) by 10.18%.

16.
An object recognition method based on a visual knowledge processing model is proposed. The model combines object localization, template filtering, and MFF-HMAX (hierarchical model and X based on multi-feature fusion) to learn from images, forming a visual knowledge base that is then used to guide recognition. First, the Itti model is used to obtain the salient regions of an image, and initial candidate object regions are determined by combining position and size cues from the What and Where pathways of the visual system with the localization knowledge in the knowledge base. Then, a two-step denoising process yields the candidate object regions, and the MFF-HMAX model extracts knowledge features such as color, brightness, texture, contour, and size, which are fused for recognition. Finally, comparative experiments against single features and current popular methods show that the proposed method not only achieves good recognition performance but can also build a visual knowledge base by imitating how the human brain learns visual knowledge.

17.
For the university course timetabling problem, a parallel multi-view search algorithm based on improved iterated local search is proposed. A multi-neighborhood set containing eight basic neighborhoods is designed according to the characteristics of the problem, and the selection probability of each basic neighborhood is set according to its improvement-speed ratio. During iterated local search, a multi-view learning strategy shares views across multiple local search steps so that the search direction can be adjusted in time to improve search efficiency, and parallel computing is used to speed up the convergence of the multi-view search. Experimental results show that the proposed algorithm achieves better solution quality and has excellent scalability and parallel efficiency.
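A bare-bones iterated local search loop with probabilistic neighbourhood selection is sketched below for intuition; the eight timetabling neighbourhoods, the improvement-speed-ratio probabilities, the multi-view sharing, and the parallelization from the abstract are not modelled, and all parameters are illustrative.

```python
import random

def iterated_local_search(init, neighborhoods, probs, cost, iters=1000, perturb=None):
    """Minimal iterated local search: at each step a neighbourhood move is
    drawn according to `probs` and accepted if it does not worsen the cost;
    an optional perturbation occasionally restarts from a shaken best solution."""
    best = current = init
    for _ in range(iters):
        move = random.choices(neighborhoods, weights=probs, k=1)[0]
        candidate = move(current)
        if cost(candidate) <= cost(current):
            current = candidate
            if cost(current) < cost(best):
                best = current
        elif perturb is not None and random.random() < 0.05:
            current = perturb(best)     # escape local optima occasionally
    return best
```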

18.
System performance assessment and comparison are fundamental for large-scale image search engine development. This article documents a set of comprehensive empirical studies that explore the effects of multiple query evidences on large-scale social image search. The search performance based on social tags, different kinds of visual features, and their combinations is systematically studied and analyzed. To quantify visual query complexity, a novel quantitative metric is proposed and applied to assess the influence of different visual queries based on their complexity levels. We also study the effects on retrieval performance of automatic text query expansion with social tags using a pseudo relevance feedback method. Our analysis of the experimental results yields a few key findings: (1) social tag-based retrieval methods can achieve much better results than content-based retrieval methods; (2) a combination of textual and visual features can significantly and consistently improve search performance; (3) the complexity of image queries has a strong correlation with the quality of retrieval results: more complex queries lead to poorer search effectiveness; and (4) query expansion based on social tags frequently causes search topic drift and consequently leads to performance degradation.
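Pseudo relevance feedback with social tags can be pictured as below: the top-ranked results of the initial query are assumed relevant, and their most frequent tags are appended to the query. This is a generic sketch; the cut-offs and data structures are assumptions, not the study's exact protocol.

```python
from collections import Counter

def expand_query_with_tags(query_terms, ranked_doc_tags, top_docs=10, top_tags=5):
    """Pseudo relevance feedback: treat the top-ranked documents as relevant
    and add their most frequent social tags to the original text query."""
    counts = Counter()
    for tags in ranked_doc_tags[:top_docs]:
        counts.update(t for t in tags if t not in query_terms)
    return list(query_terms) + [t for t, _ in counts.most_common(top_tags)]
```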

19.
3D face reconstruction is an efficient method for pedestrian recognition in non-cooperative environments because of its outstanding performance in robust face recognition under uncontrolled pose and illumination changes. Visual sensor networks are widely used in target surveillance as powerful unattended distributed measurement systems. This paper proposes a collaborative multi-view non-cooperative 3D face reconstruction method for visual sensor networks. A peer-to-peer paradigm-based visual sensor network is employed for distributed pedestrian tracking and optimal face image acquisition. Gaussian probability distribution-based multi-view data fusion is used for target localization, and a Kalman filter is applied for target tracking. A lightweight face image quality evaluation method is presented to search for optimal face images. A self-adaptive morphable model is designed for multi-view 3D face reconstruction; the optimal face images and their pose estimates are used to adjust it. Cooperative chaotic particle swarm optimization is employed to optimize the parameters of the self-adaptive morphable model. Experimental results on real data show that the proposed method can acquire optimal face images and achieve non-cooperative 3D reconstruction efficiently.
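The tracking component is a standard Kalman filter; a minimal constant-velocity predict/update sketch in the image plane is given below. The state layout, time step, and noise levels are assumptions, and the Gaussian multi-view fusion is not included.

```python
import numpy as np

class ConstantVelocityKF:
    """2-D constant-velocity Kalman filter; state = [x, y, vx, vy]."""
    def __init__(self, dt=1.0, q=1e-2, r=1.0):
        self.x = np.zeros(4)
        self.P = np.eye(4)
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = q * np.eye(4)      # process noise
        self.R = r * np.eye(2)      # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.x       # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)               # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```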

20.
刘冬  秦瑞  陈曦  李庆 《计算机科学》2017,44(4):302-305
Bird's-eye panoramic images generated with a single homography matrix suffer from severe information loss and blurred, distorted edges. By adding a constraint on the 3D points, restricting them to a 2D surface, an image can be uniquely transformed from one viewpoint to another. Assuming the 3D points lie on a surface that wraps around the vehicle, so that the projection ray of every pixel in the original camera image intersects the surface, multi-view panoramas can be generated by observing the surface and the vehicle from different virtual viewpoints. The overlapping regions are also given special treatment. Experiments show that the generated multi-view panoramas make full use of the original image information to observe the scene around the vehicle from different viewpoints, while reducing edge blur and distortion, and the transitions across overlapping regions appear more natural and smooth.
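For contrast, the single-homography bird's-eye warp that the paper criticises can be reproduced with a few OpenCV calls; the file name and the four ground-plane point correspondences below are hypothetical, and the paper's wrap-around surface model is not implemented here.

```python
import cv2
import numpy as np

img = cv2.imread("camera_view.jpg")                 # placeholder camera frame

# Four points on the ground plane in the camera image (assumed pixel
# coordinates) and where they should land in the bird's-eye view.
src = np.float32([[420, 560], [860, 560], [1200, 720], [80, 720]])
dst = np.float32([[300, 50], [500, 50], [500, 400], [300, 400]])

H = cv2.getPerspectiveTransform(src, dst)           # 3x3 homography
birdseye = cv2.warpPerspective(img, H, (800, 450))  # re-project onto ground plane
cv2.imwrite("birdseye.jpg", birdseye)
```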
