20 similar documents found (search time: 15 ms)
1.
Guangyu Zhu, Yefeng Zheng, D. Doermann, S. Jaeger. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009, 31(11): 2015-2031
As one of the most pervasive methods of individual identification and document authentication, signatures present convincing evidence and provide an important form of indexing for effective document image processing and retrieval in a broad range of applications. However, detection and segmentation of free-form objects such as signatures from cluttered background is currently an open document analysis problem. In this paper, we focus on two fundamental problems in signature-based document image retrieval. First, we propose a novel multiscale approach to jointly detecting and segmenting signatures from document images. Rather than focusing on local features that typically have large variations, our approach captures the structural saliency using a signature production model and computes the dynamic curvature of 2D contour fragments over multiple scales. This detection framework is general and computationally tractable. Second, we treat the problem of signature retrieval in the unconstrained setting of translation, scale, and rotation invariant nonrigid shape matching. We propose two novel measures of shape dissimilarity based on anisotropic scaling and registration residual error and present a supervised learning framework for combining complementary shape information from different dissimilarity metrics using LDA. We quantitatively study state-of-the-art shape representations, shape matching algorithms, measures of dissimilarity, and the use of multiple instances as query in document image retrieval. We further demonstrate our matching techniques in offline signature verification. Extensive experiments using large real-world collections of English and Arabic machine-printed and handwritten documents demonstrate the excellent performance of our approaches.
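The abstract does not define the signature production model or the exact form of the dynamic curvature. As a rough, generic illustration of measuring the curvature of a 2D contour at several scales, the Python sketch below computes the discrete turning angle at each contour point using neighbours `scale` points away. The function names and the turning-angle estimator are assumptions made for illustration. They are not the authors' method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def turning_angle(points: Sequence[Point], i: int, scale: int) -> float:
    """Signed turning angle at points[i] of a closed contour, using the
    neighbours `scale` steps before and after it (hypothetical estimator)."""
    n = len(points)
    x0, y0 = points[(i - scale) % n]
    x1, y1 = points[i]
    x2, y2 = points[(i + scale) % n]
    incoming = math.atan2(y1 - y0, x1 - x0)
    outgoing = math.atan2(y2 - y1, x2 - x1)
    d = outgoing - incoming
    # Wrap the difference into (-pi, pi] so left and right turns keep their sign.
    while d <= -math.pi:
        d += 2 * math.pi
    while d > math.pi:
        d -= 2 * math.pi
    return d


def multiscale_curvature(points: Sequence[Point], scales=(1, 2, 4)) -> List[List[float]]:
    """One curvature profile per scale; larger scales smooth out small wiggles."""
    return [[turning_angle(points, i, s) for i in range(len(points))] for s in scales]
```

For a closed, counter-clockwise simple polygon, the scale-1 angles sum to 2π. Straight runs give 0 and corners give large values, which is the kind of structural cue a curvature-based detector can track across scales.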
2.
3.
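To address similarity computation and plagiarism detection between documents written in Uyghur, a content-based Uyghur plagiarism detection (U-PD) method is proposed. First, a preprocessing stage segments Uyghur text into words, removes stop words, extracts stems, and replaces synonyms; stemming is implemented with an N-gram statistical model. Next, the BKDRhash algorithm computes a hash value for each text block, and these values form the hash fingerprint of the whole document. Finally, using the hash fingerprints, the RKR-GST matching algorithm matches the document against the document collection at the document, paragraph, and sentence levels to obtain document similarity, which is used to detect plagiarism. Experimental evaluation on Uyghur documents shows that the proposed method accurately detects plagiarized documents and is feasible and effective.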
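The abstract names BKDRhash for hashing text blocks but does not say how the blocks are formed. This is a minimal Python sketch. It assumes the text has already been preprocessed into tokens and that a "text block" is a window of `k` consecutive tokens. A simple set-overlap (Jaccard) score stands in for the RKR-GST matching step, which is not reproduced here. The BKDR hash uses the usual seed of 131 with 32-bit wraparound, masked to 31 bits as in the common C version.

```python
from typing import List, Set


def bkdr_hash(s: str, seed: int = 131) -> int:
    """BKDR string hash: h = h * seed + code point, with 32-bit wraparound,
    returned as a non-negative 31-bit value."""
    h = 0
    for ch in s:
        h = (h * seed + ord(ch)) & 0xFFFFFFFF
    return h & 0x7FFFFFFF


def fingerprint(tokens: List[str], k: int = 3) -> Set[int]:
    """Hash every window of k consecutive tokens (assumed block definition)."""
    return {bkdr_hash(" ".join(tokens[i:i + k])) for i in range(len(tokens) - k + 1)}


def resemblance(a: Set[int], b: Set[int]) -> float:
    """Jaccard overlap of two fingerprints; a stand-in for RKR-GST matching."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```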
4.
When streaming packetized media data over a lossy packet network, it is desirable to use transmission strategies that minimize the expected distortion subject to a constraint on the expected transmission rate. Because the computation of such optimal strategies is usually an intractable problem, fast heuristic techniques are often used. We first show that when the graph that gives the decoding dependencies between the data packets is reducible to a tree, optimal transmission strategies can be efficiently computed with dynamic programming algorithms. The proposed algorithms are much faster than other exact algorithms developed for arbitrary dependency graphs. They are slower than previous heuristic techniques but can provide much better solutions. We also show how to apply our algorithms to find high-quality approximate solutions when the dependency graph is not tree reducible. To validate our approach, we run simulations for MPEG-1 and H.264 video data. We first consider a simulated packet erasure channel. Then we implement a real video streaming system and provide experimental results for an Internet connection.
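To illustrate why a tree-shaped dependency graph admits dynamic programming, here is a deliberately simplified Python sketch with these assumptions: the channel is lossless, each packet is either sent or not, a packet reduces distortion only if its parent is also sent, and the rate constraint is folded into a Lagrangian score `gain - lam * size`. The names `gain`, `size`, and `lam` are illustrative. The paper's algorithms also model packet loss and transmission policies, and those are not covered here.

```python
from typing import Dict, List, Tuple


def best_subtree(node: str, children: Dict[str, List[str]], gain: Dict[str, float],
                 size: Dict[str, float], lam: float) -> Tuple[float, List[str]]:
    """Best Lagrangian value and packet set for the subtree rooted at `node`.
    A child can only contribute if `node` itself is sent."""
    value = gain[node] - lam * size[node]
    chosen = [node]
    for child in children.get(node, []):
        child_value, child_set = best_subtree(child, children, gain, size, lam)
        value += child_value
        chosen += child_set
    if value <= 0:
        return 0.0, []  # sending this subtree does not pay for its rate
    return value, chosen


def plan(roots: List[str], children, gain, size, lam: float) -> Tuple[float, List[str]]:
    """Solve each independent tree and combine the results."""
    total, sent = 0.0, []
    for r in roots:
        v, s = best_subtree(r, children, gain, size, lam)
        total += v
        sent += s
    return total, sent
```

Each subtree is solved once and the results are combined bottom-up, so the cost grows linearly with the number of packets. Sweeping `lam` trades expected distortion against rate.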
5.
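To identify plagiarism within a local document collection, an approximation of a generalized edit distance based on backtrack counts and forward-jump counts is defined as the document plagiarism distance. The paper analyzes sufficient conditions under which this distance satisfies the triangle inequality and the weak triangle inequality. It also proposes a fast full-text identification algorithm that finds all ordered pairs of documents in the collection suspected of plagiarism. Experimental results show that, compared with other algorithms, this algorithm is 3 to 5 times more efficient while maintaining recall.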
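The abstract does not define the backtrack and forward-jump costs, so the generalized distance cannot be reproduced here. For orientation, this Python sketch shows ordinary Levenshtein distance over characters or tokens. That distance is a metric. It also shows how the triangle inequality, the property the paper studies for its approximation, yields a cheap lower bound that lets a detector skip document pairs. The `lower_bound` helper is a hypothetical illustration.

```python
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance over characters or tokens (unit-cost edits)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def lower_bound(d_ab: int, d_bc: int) -> int:
    """For a metric, d(a, c) >= |d(a, b) - d(b, c)|. If this bound already
    exceeds the plagiarism threshold, the pair (a, c) need not be compared."""
    return abs(d_ab - d_bc)
```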
6.
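A clinical data sharing platform is an important direction in the development of healthcare informatization in China. With medical data growing geometrically, the management, sharing, and efficient querying and retrieval of massive multi-center clinical data is an important research topic. The system uses HL7 CDA XML as the standard for describing electronic medical records and uses a hybrid relational-XML database to provide indexing and XQuery query tools. To improve query efficiency and concurrency, it uses BerkeleyDB as a key-value storage data layer and deploys Memcached as a cache layer for query data, which improves the availability of the overall system. The result is a standard, general-purpose, and efficient clinical data sharing platform.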
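The abstract says Memcached caches query data in front of a BerkeleyDB key-value layer, but it does not say which caching policy is used. A common choice is the cache-aside pattern. In this Python sketch, plain dictionaries stand in for both systems, and no real Memcached or BerkeleyDB client calls are made. The class name is hypothetical.

```python
from typing import Dict, Optional


class CacheAsideStore:
    """Cache-aside reads: try the cache first, fall back to the backing
    store on a miss, then populate the cache with what was read."""

    def __init__(self, backing: Dict[str, str]) -> None:
        self.backing = backing            # stands in for the key-value data layer
        self.cache: Dict[str, str] = {}   # stands in for the query cache
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.backing.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key: str, value: str) -> None:
        self.backing[key] = value
        self.cache.pop(key, None)  # invalidate so the next read sees the new value
```

Invalidating on write, rather than updating the cache in place, keeps the cache from serving a value that the backing store never accepted.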
7.
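To enable high-speed retrieval over Internet-scale audio big data, this paper proposes a robust and efficient retrieval method that combines audio fingerprinting with a filter-and-refine strategy. Building on the classic Philips audio fingerprint, it introduces an intermediate filtering fingerprint based on bag-of-features (BoF) that rapidly narrows the search range. This fingerprint makes retrieval about 130 times faster than Fibonacci Hashing retrieval. The paper also designs a threshold-based fixed-interval sampling matching method that greatly reduces matching computation, speeding up retrieval by up to a further 140 times. In experiments on batch retrieval of audio clips of various durations from about 100,000 audio tracks, the average retrieval time was under 1 s. After MP3 conversion, resampling, and random cropping, recall remained above 99.47%, and the theoretical precision approached 100%.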
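The bit derivation below follows the widely published Philips (Haitsma-Kalker) scheme. Each sub-fingerprint bit encodes the sign of an energy difference across adjacent frequency bands and consecutive frames, so 33 bands yield 32-bit sub-fingerprints. The abstract does not specify the BoF filter or the exact sampling rule. `sampled_ber` is therefore only a hypothetical illustration of threshold-based fixed-interval sampling: it compares every `step`-th sub-fingerprint and stops early once the error budget is exceeded. The 0.35 default threshold is an assumption.

```python
from typing import List, Optional, Sequence


def sub_fingerprints(energy: Sequence[Sequence[float]]) -> List[int]:
    """Philips-style bits from a frames x bands energy matrix:
    bit (n, m) = 1 if (E[n][m] - E[n][m+1]) - (E[n-1][m] - E[n-1][m+1]) > 0."""
    fps = []
    for n in range(1, len(energy)):
        cur, prev = energy[n], energy[n - 1]
        bits = 0
        for m in range(len(cur) - 1):
            diff = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            bits = (bits << 1) | (1 if diff > 0 else 0)
        fps.append(bits)
    return fps


def sampled_ber(query: Sequence[int], ref: Sequence[int], step: int = 4,
                bits: int = 32, max_ber: float = 0.35) -> Optional[float]:
    """Bit error rate over every `step`-th aligned sub-fingerprint.
    Returns None as soon as the accumulated errors exceed the budget."""
    idx = range(0, min(len(query), len(ref)), step)
    if len(idx) == 0:
        return None
    budget = max_ber * bits * len(idx)
    errors = 0
    for i in idx:
        errors += bin(query[i] ^ ref[i]).count("1")
        if errors > budget:
            return None  # early rejection: this candidate can no longer pass
    return errors / (bits * len(idx))
```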
8.
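Advances in space technology produce large volumes of spatial data that are hard to process in a timely manner. To address this, the paper proposes a fast data-structure pattern matching method for data retrieval. The network is divided into several regions, each assigned a region head that collects that region's information. All collected information is sent to a central management system to obtain matching content at different levels. From the matching content, a similarity function is obtained for each matching mode, and fast data-structure pattern matching is completed based on the similarity results. Experimental results show that the proposed method effectively reduces communication overhead, increases matching speed, improves matching accuracy, and achieves the desired matching performance.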
9.
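Retrieval of Engineering Drawings in Engineering Data Management Systems (total citations: 3; self-citations: 0; citations by others: 3)
Engineering data management systems (EDMS) are being adopted by more and more modern enterprises. They replace an enterprise's traditional paper engineering drawings and related information with digitized electronic drawings, enabling truly paperless, computer-based management of its engineering documents. Electronic retrieval of engineering drawings is a very important function of an EDMS: it lets authorized users obtain useful information from the system's shared database over the network quickly, conveniently, flexibly, and securely.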
10.
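Concept Retrieval Technology for XML Documents (total citations: 11; self-citations: 1; citations by others: 11)
Information retrieval over XML documents is an important research topic. This paper introduces structural indexing of structured documents and the "context co-occurrence analysis" technique used in semantic retrieval. On this basis, it proposes a prototype concept retrieval system for XML documents and analyzes several key issues to consider in designing and implementing the system.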
11.
In this paper, we present a novel approach to image indexing by incorporating a neural network model, Kohonen’s Self-Organising
Map (SOM), for content-based image retrieval. The motivation stems from the idea of finding images by regarding users’ specifications
or requirements imposed on the query, which has been ignored in most existing image retrieval systems. An important and unique
aspect of our interactive scheme is to allow the user to select a Region-Of-Interest (ROI) from the sample image, and subsequent
query concentrates on matching the regional colour features to find images containing similar regions as indicated by the
user. The SOM algorithm is capable of adaptively partitioning each image into several homogeneous regions for representing
and indexing the image. This is achieved by unsupervised clustering and classification of pixel-level features, called Local
Neighbourhood Histograms (LNH), without a priori knowledge about the data distribution in the feature space. The indexes generated from the resultant prototypes of SOM learning
demonstrate fairly good performance over an experimental image database, and therefore suggest the effectiveness and significant
potential of our proposed indexing and retrieval strategy for application to content-based image retrieval.
Received: 4 June 1998; received in revised form: 7 January 1999; accepted: 7 January 1999
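The abstract does not define Local Neighbourhood Histograms precisely. This Python sketch uses one plausible reading: a normalized histogram of quantized colour indices over each pixel's (2r+1) x (2r+1) window, clipped at the image border. It assumes the input image has already been quantized to `n_colours` colour labels. The resulting per-pixel vectors are the kind of features an SOM could then cluster into homogeneous regions.

```python
from typing import List


def local_neighbourhood_histograms(labels: List[List[int]], n_colours: int,
                                   radius: int = 1) -> List[List[List[float]]]:
    """For every pixel, the fraction of each colour label in its window
    (hypothetical LNH definition; windows are clipped at the image border)."""
    h, w = len(labels), len(labels[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            hist = [0.0] * n_colours
            count = 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    hist[labels[yy][xx]] += 1
                    count += 1
            row.append([v / count for v in hist])
        out.append(row)
    return out
```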
12.
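As grids move from scientific computing to enterprise applications, databases must provide a variety of services to support richer resource sharing and applications. Databases on a grid can be accessed only through grid services, and their data can be read and written only through grid service interfaces. Efficiently retrieving data directly from geographically distributed databases in a grid environment is therefore a pressing problem. This paper first proposes an architecture for data retrieval in a grid environment. For Top-k queries over numeric data under this architecture, it then presents the GrangM algorithm, which effectively merges query results from different data sources. A simulated implementation shows that the algorithm merges results retrieved from multiple grid nodes quickly and efficiently. It also reduces the size of intermediate join results and lowers the communication volume of sending query requests.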
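The abstract does not describe GrangM's internals. This Python sketch shows a generic baseline for the same merging problem. It assumes each grid node returns its local top-k results as `(score, id)` pairs sorted by descending score. Every global top-k item must appear in some node's local top-k, so merging these lists and keeping the first k gives the exact global answer while each node ships only k rows.

```python
import heapq
from itertools import islice
from typing import List, Tuple


def merge_topk(node_results: List[List[Tuple[float, str]]], k: int) -> List[Tuple[float, str]]:
    """Merge per-node result lists (each sorted by descending score) and keep
    the k highest-scoring entries overall."""
    merged = heapq.merge(*node_results, key=lambda t: t[0], reverse=True)
    return list(islice(merged, k))
```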
13.
14.
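Research on Extracting Text Data from PDF Documents in Lucene Applications (total citations: 1; self-citations: 0; citations by others: 1)
Lucene-based search is widely used in many application systems, but Lucene provides only a function library for full-text search. This paper studies methods for extracting text data from PDF documents. Their advantage is that they extract text from PDF documents quickly, yielding the text data of the PDF documents hosted on a site.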
15.
Many content-based multimedia data retrieval problems can be transformed into the near neighbor searching problem in multidimensional feature space. An efficient near neighbor searching algorithm is needed when developing a multimedia database system. In this paper, we propose an approach to efficiently solve the near neighbor searching problem. In this approach, along each dimension an index is constructed according to the values of feature points of multimedia objects. A user can pose a content-based query by specifying a multimedia query example and a similarity measure. The specified query example will be transformed into a query point in the multidimensional feature space. The possible result points in each dimension are then retrieved by searching the value of the query point in the corresponding dimension. The sets of the possible result points are merged one by one by removing the points which are not within the query radius. The resultant points and their distances from the query point form the answer of the query. To show the efficiency of our approach, a series of experiments compares it with related approaches.
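The steps in this abstract can be sketched in Python as follows: one index per dimension, a per-dimension range lookup, intersection of candidate sets, then filtering by true distance. The sketch assumes Euclidean distance and a fixed query radius. Sorted lists searched with `bisect` stand in for whatever per-dimension index structure the paper uses. The intersection is a safe filter because any point within Euclidean radius r of q lies in [q_d - r, q_d + r] in every dimension d.

```python
import bisect
import math
from typing import List, Optional, Sequence, Set, Tuple


class PerDimensionIndex:
    def __init__(self, points: Sequence[Sequence[float]]) -> None:
        self.points = [tuple(p) for p in points]
        dims = len(self.points[0])
        # One sorted list of (coordinate value, point id) per dimension.
        self.axes = [sorted((p[d], i) for i, p in enumerate(self.points))
                     for d in range(dims)]

    def range_query(self, q: Sequence[float], radius: float) -> List[Tuple[int, float]]:
        candidates: Optional[Set[int]] = None
        for d, axis in enumerate(self.axes):
            # Ids are >= 0 and < len(points), so these sentinels bracket every value in range.
            lo = bisect.bisect_left(axis, (q[d] - radius, -1))
            hi = bisect.bisect_right(axis, (q[d] + radius, len(self.points)))
            ids = {i for _, i in axis[lo:hi]}
            candidates = ids if candidates is None else candidates & ids
            if not candidates:
                break  # no point survives every dimension
        result = []
        for i in sorted(candidates or ()):
            dist = math.dist(q, self.points[i])
            if dist <= radius:  # final exact check
                result.append((i, dist))
        return result
```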
16.
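This paper describes how to use recursive calls in PowerBuilder 8.0 to process a multi-level, pyramid-like data structure, and explains the main functions and statements involved. The method has been successfully applied in the front-end user interface of software for analyzing and computing aircraft electrical networks.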
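The abstract does not show any PowerBuilder code. The same idea, recursively walking a multi-level parent/child structure, is sketched here in Python. The sketch assumes the hierarchy is stored as flat `(id, parent_id, name)` rows, with `None` marking top-level nodes. The row layout and the example names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, Optional[int], str]


def build_outline(rows: Sequence[Row], parent: Optional[int] = None,
                  depth: int = 0) -> List[Tuple[int, str]]:
    """Depth-first listing of the children of `parent` as (depth, name),
    recursing into each child's own children."""
    out = []
    for node_id, parent_id, name in rows:
        if parent_id == parent:
            out.append((depth, name))
            out.extend(build_outline(rows, node_id, depth + 1))
    return out
```

Each call scans all rows, so this version is O(n²). Grouping rows by `parent_id` first would make the traversal linear.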
17.
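In the era of big data, multi-source, heterogeneous, massive data is becoming mainstream in all kinds of applications. Multi-source heterogeneity inevitably produces duplicate data, and the sheer data volume places very high demands on the efficiency of duplicate detection. Traditional techniques do not handle duplicate detection on high-dimensional data well in a big-data environment. To address this problem, the paper analyzes the shortcomings of traditional SNM-type methods and generalizes duplicate detection as a special kind of clustering problem. It builds an efficient index with an R-tree, uses properties of the clusters to reduce the number of comparisons in R-tree leaves, and uses the Apriori property of duplicate detection to process high-dimensional datasets in parallel. Experimental results show that the proposed algorithm effectively improves the efficiency of duplicate detection on high-dimensional data.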
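For context, the sorted-neighbourhood method (SNM) that the abstract critiques sorts records by a single key and compares each record only with its neighbours inside a sliding window. This Python sketch shows that baseline, not the proposed R-tree method. The caller supplies `key`, `is_dup`, and `window`, and these parameters are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

R = TypeVar("R")


def sorted_neighbourhood(records: Sequence[R], key: Callable[[R], float],
                         is_dup: Callable[[R, R], bool],
                         window: int = 3) -> List[Tuple[int, int]]:
    """Classic SNM: sort by `key`, then compare each record with the next
    `window - 1` records in sorted order. Returns index pairs judged duplicates."""
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1: pos + window]:
            if is_dup(records[i], records[j]):
                pairs.append(tuple(sorted((i, j))))
    return pairs
```

SNM never compares duplicates whose sort keys land more than `window - 1` positions apart, and a single key represents high-dimensional records poorly. These are the weaknesses the clustering-based approach targets.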
18.
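Software, 2019, (11): 4-8
The goal is to detect plagiarized Structured Query Language (SQL) statements in database lab courses. Because SQL statements are short, existing code-detection techniques did not give the expected results, so the authors propose an SQL plagiarism detection algorithm based on coding habits. The algorithm collects and classifies students' historical coding data and determines the category of the code under examination. It then extracts coding-habit features from that code and from the student's other code in the same category. The resulting feature matrix is used to compare the similarity between pieces of code. Suspected plagiarized code is filtered, and the algorithm determines whether the student actually wrote it. Experimental results show that the algorithm effectively identifies student plagiarism and also overcomes the difficulty of detecting plagiarism in short code.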
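The abstract does not list the coding-habit features. Purely as an illustration, this Python sketch extracts four hypothetical style features from an SQL statement: keyword capitalization, multi-line layout, use of `AS` aliases, and a trailing semicolon. It then compares two statements' feature vectors with cosine similarity. The keyword list and the features are assumptions.

```python
import math
import re
from typing import List

KEYWORDS = {"select", "from", "where", "group", "order", "join", "as"}  # hypothetical subset


def habit_features(sql: str) -> List[float]:
    """Hypothetical style vector: [keyword upper-case ratio, multi-line,
    uses AS aliases, ends with semicolon]."""
    words = re.findall(r"[A-Za-z_]+", sql)
    kws = [w for w in words if w.lower() in KEYWORDS]
    upper_ratio = sum(w.isupper() for w in kws) / len(kws) if kws else 0.0
    lines = [ln for ln in sql.splitlines() if ln.strip()]
    multiline = 1.0 if len(lines) > 1 else 0.0
    uses_as = 1.0 if any(w.lower() == "as" for w in words) else 0.0
    ends_semicolon = 1.0 if sql.rstrip().endswith(";") else 0.0
    return [upper_ratio, multiline, uses_as, ends_semicolon]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```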
19.
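A Document Clustering Algorithm Based on SOM and K-means (total citations: 9; self-citations: 0; citations by others: 9)
This paper proposes a combined clustering algorithm that integrates the self-organizing feature map (SOM) with K-means. Documents are first clustered with SOM. The SOM output weights are then used to initialize the K-means cluster centers, and K-means clusters the documents. Experimental results show that the combined algorithm improves document clustering performance.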
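The abstract does not give the SOM topology, the document representation, or the training schedule. The NumPy sketch below makes several illustrative choices. Documents are assumed to be numeric vectors already (for example tf-idf). The SOM is 1-D with exactly `k` units, so its weight vectors can seed `k` K-means centers. Training uses a Gaussian neighbourhood, and both the learning rate and the neighbourhood width decay linearly.

```python
import numpy as np


def train_som(X: np.ndarray, n_units: int, epochs: int = 20, lr0: float = 0.5,
              sigma0: float = 0.0, seed: int = 0) -> np.ndarray:
    """1-D SOM: returns an (n_units, dim) weight matrix."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), n_units, replace=False)].astype(float)
    sigma0 = sigma0 or n_units / 2
    positions = np.arange(n_units)
    total, t = epochs * len(X), 0
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            frac = t / total
            lr = lr0 * (1 - frac)                       # decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.5)       # shrinking neighbourhood
            bmu = np.argmin(((W - x) ** 2).sum(axis=1))  # best-matching unit
            h = np.exp(-((positions - bmu) ** 2) / (2 * sigma ** 2))
            W += lr * h[:, None] * (x - W)
            t += 1
    return W


def assign(X: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Index of the nearest center for every row of X."""
    return np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2), axis=1)


def kmeans(X: np.ndarray, centers: np.ndarray, iters: int = 50):
    """Standard K-means started from the given centers (here: SOM weights)."""
    C = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = assign(X, C)
        new_C = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else C[k]
                          for k in range(len(C))])
        if np.allclose(new_C, C):
            break
        C = new_C
    return assign(X, C), C
```

Seeding K-means with SOM weights replaces random initialization with centers that already reflect the data's topology, which is the motivation the abstract implies for combining the two.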
20.
Efficient Phrase-Based Document Similarity for Clustering (total citations: 1; self-citations: 0; citations by others: 1)
In this paper, we propose a phrase-based document similarity to compute the pairwise similarities of documents based on the Suffix Tree Document (STD) model. By mapping each node in the suffix tree of STD model into a unique feature term in the Vector Space Document (VSD) model, the phrase-based document similarity naturally inherits the term tf-idf weighting scheme in computing the document similarity with phrases. We apply the phrase-based document similarity to the group-average Hierarchical Agglomerative Clustering (HAC) algorithm and develop a new document clustering approach. Our evaluation experiments indicate that the new clustering approach is very effective on clustering the documents of two standard document benchmark corpora OHSUMED and RCV1. The quality of the clustering results significantly surpasses the results of traditional single-word tf-idf similarity measure in the same HAC algorithm, especially in large document data sets. Furthermore, by studying the property of STD model, we conclude that the feature vector of phrase terms in the STD model can be considered as an expanded feature vector of the traditional single-word terms in the VSD model. This conclusion explains why the phrase-based document similarity works much better than the single-word tf-idf similarity measure.
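In the STD model, each suffix-tree node corresponds to a phrase (a contiguous word sequence), and that phrase becomes one feature term. This Python sketch skips the suffix tree. It approximates the node set by directly enumerating contiguous word n-grams up to `max_len`, weights them with tf-idf, and compares documents with cosine similarity. The `log(N / df)` idf variant is an assumption. With `max_len=1` the sketch reduces to the single-word tf-idf baseline, which mirrors the abstract's point that the phrase vector extends the single-word vector.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Phrase = Tuple[str, ...]


def phrases(tokens: List[str], max_len: int = 3) -> Counter:
    """Counts of every contiguous word n-gram with 1 <= n <= max_len."""
    return Counter(tuple(tokens[i:i + n])
                   for n in range(1, max_len + 1)
                   for i in range(len(tokens) - n + 1))


def tfidf_vectors(docs: List[List[str]], max_len: int = 3) -> List[Dict[Phrase, float]]:
    tfs = [phrases(d, max_len) for d in docs]
    df = Counter(p for tf in tfs for p in tf)  # document frequency of each phrase
    n_docs = len(docs)
    return [{p: c * math.log(n_docs / df[p]) for p, c in tf.items()} for tf in tfs]


def cosine(u: Dict[Phrase, float], v: Dict[Phrase, float]) -> float:
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```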