Found 20 similar documents; search took 113 ms
1.
Kanungo T. Haralick R.M. 《IEEE transactions on pattern analysis and machine intelligence》1999,21(2):179-183
Character groundtruth for real, scanned document images is crucial for evaluating the performance of OCR systems, training OCR algorithms, and validating document degradation models. Unfortunately, manual collection of accurate groundtruth for characters in a real (scanned) document image is not practical because (i) accuracy in delineating groundtruth character bounding boxes is not high enough, (ii) it is extremely laborious and time consuming, and (iii) the manual labor required for this task is prohibitively expensive. We describe a closed-loop methodology for collecting very accurate groundtruth for scanned documents. We first create ideal documents using a typesetting language. Next we create the groundtruth for the ideal document. The ideal document is then printed, photocopied and then scanned. A registration algorithm estimates the global geometric transformation and then performs a robust local bitmap match to register the ideal document image to the scanned document image. Finally, groundtruth associated with the ideal document image is transformed using the estimated geometric transformation to create the groundtruth for the scanned document image. This methodology is very general and can be used for creating groundtruth for documents typeset in any language, layout, font, and style. We have demonstrated the method by generating groundtruth for English, Hindi, and FAX document images. The cost of creating groundtruth using our methodology is minimal. If character, word or zone groundtruth is available for any real document, the registration algorithm can be used to generate the corresponding groundtruth for a rescanned version of the document.
2.
F.S. Brundick Ann E.M. Brodeen Malcolm S. Taylor 《International Journal on Document Analysis and Recognition》2002,4(3):170-176
In this paper we consider a statistical approach to augment a limited database of groundtruth documents for use in evaluation
of optical character recognition software. A modified moving-blocks bootstrap procedure is used to construct surrogate documents
for this purpose which prove to serve effectively and, in some regards, indistinguishably from groundtruth. The proposed method
is validated through a rigorous statistical procedure.
Received: March 30, 2000 / Revised: September 14, 2001
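The abstract above names a modified moving-blocks bootstrap but does not describe the modification. As a rough illustration only, the standard (unmodified) moving-blocks bootstrap can be sketched as follows; the function name `moving_blocks_bootstrap` and its parameters are hypothetical and are not taken from the paper.

```python
import random


def moving_blocks_bootstrap(sequence, block_len, rng=None):
    """Build a surrogate sequence from randomly chosen overlapping blocks.

    This is the textbook moving-blocks bootstrap, NOT the authors'
    modified procedure (which the abstract does not specify).
    """
    rng = rng or random.Random()
    n = len(sequence)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(sequence)")
    # Every overlapping block starts at an index in 0 .. n - block_len.
    starts = range(n - block_len + 1)
    surrogate = []
    while len(surrogate) < n:
        s = rng.choice(starts)
        # Copy one contiguous block so local dependence is preserved.
        surrogate.extend(sequence[s:s + block_len])
    # Trim the last block so the surrogate has the original length.
    return surrogate[:n]
```

Resampling whole blocks rather than single items keeps short-range dependence (for example, runs of neighbouring characters) intact inside each block, which is why a block scheme is a natural fit for sequential data such as text.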
3.
4.
Ada Wai-chee Fu Polly Mei-shuen Chan Yin-Ling Cheung Yiu Sang Moon 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(2):154-173
Abstract. For some multimedia applications, it has been found that domain objects cannot be represented as feature vectors in a multidimensional
space. Instead, pair-wise distances between data objects are the only input. To support content-based retrieval, one approach
maps each object to a k-dimensional (k-d) point and tries to preserve the distances among the points. Then, existing spatial access index methods such as the R-trees
and KD-trees can support fast searching on the resulting k-d points. However, information loss is inevitable with such an approach since the distances between data objects can only
be preserved to a certain extent. Here we investigate the use of a distance-based indexing method. In particular, we apply
the vantage point tree (vp-tree) method. There are two important problems for the vp-tree method that warrant further investigation,
the n-nearest neighbors search and the updating mechanisms. We study an n-nearest neighbors search algorithm for the vp-tree, which is shown by experiments to scale up well with the size of the dataset
and the desired number of nearest neighbors, n. Experiments also show that the searching in the vp-tree is more efficient than that for the -tree and the M-tree. Next, we propose solutions for the update problem for the vp-tree, and show by experiments that the algorithms are
efficient and effective. Finally, we investigate the problem of selecting vantage points, propose a few alternative methods,
and study their impact on the number of distance computations.
Received June 9, 1998 / Accepted January 31, 2000
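To make the vp-tree idea in the abstract above concrete, here is a minimal sketch of a generic vp-tree with an n-nearest-neighbors search that uses only pairwise distances. It follows the commonly described vp-tree scheme (random vantage point, median split, triangle-inequality pruning) and is not claimed to be the algorithm studied in the paper; all names (`VPNode`, `build_vp_tree`, `knn_search`, `euclidean`) are hypothetical.

```python
import heapq
import math
import random


def euclidean(a, b):
    """Example metric; any distance obeying the triangle inequality works."""
    return math.dist(a, b)


class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point stored at this node
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # subtree of points with distance <= radius
        self.outside = outside  # subtree of points with distance > radius


def build_vp_tree(points, dist, rng=None):
    rng = rng or random.Random(0)
    points = list(points)
    if not points:
        return None
    # Assumption: the vantage point is picked at random.
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0.0, None, None)
    dists = [dist(vp, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(points, dists) if d <= radius]
    outside = [p for p, d in zip(points, dists) if d > radius]
    return VPNode(vp, radius,
                  build_vp_tree(inside, dist, rng),
                  build_vp_tree(outside, dist, rng))


def knn_search(root, query, n, dist):
    """Return the n nearest (distance, point) pairs, closest first."""
    if n <= 0:
        return []
    heap = []    # max-heap on distance, stored as (-distance, tiebreak, point)
    counter = 0  # tiebreak so points themselves are never compared

    def tau():
        # Distance of the current n-th best candidate (infinite until full).
        return -heap[0][0] if len(heap) == n else float("inf")

    def visit(node):
        nonlocal counter
        if node is None:
            return
        d = dist(query, node.point)
        if len(heap) < n:
            heapq.heappush(heap, (-d, counter, node.point))
            counter += 1
        elif d < tau():
            heapq.heapreplace(heap, (-d, counter, node.point))
            counter += 1
        # Search the more promising side first, then the other side only
        # if the triangle inequality cannot rule it out.
        if d <= node.radius:
            visit(node.inside)
            if d + tau() > node.radius:
                visit(node.outside)
        else:
            visit(node.outside)
            if d - tau() <= node.radius:
                visit(node.inside)

    visit(root)
    return sorted(((-nd, p) for nd, _, p in heap), key=lambda t: t[0])
```

A call such as `knn_search(build_vp_tree(points, euclidean), query, 5, euclidean)` returns the five closest points; the pruning tests are what reduce the number of distance computations relative to a linear scan.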
5.
6.
Samia Boukir Patrick Bouthemy François Chaumette Didier Juvin 《Machine Vision and Applications》1998,10(5-6):321-330
This paper presents a local approach for matching contour segments in an image sequence. This study has been primarily motivated
by work concerned with the recovery of 3D structure using active vision. The method to recover the 3D structure of the scene
requires tracking contour segments in an image sequence in real time. Here, we propose an original and robust approach that
is ideally suited for this problem. It is also of more general interest and can be used in any context requiring matching
of line boundaries over time. This method only involves local modeling and computation of moving edges dealing “virtually”
with a contour segment primitive representation. Such an approach brings robustness to contour segmentation instability and
to occlusion, and ease of implementation. Parallelism has also been investigated using an SIMD-based real-time image-processing
system. This method has been validated with experiments on several real-image sequences. Our results show quite satisfactory
performance and the algorithm runs in a few milliseconds.
Received: 11 December 1996 / Accepted: 8 August 1997
7.
Yi-Ping Hung Chu-Song Chen Kuan-Chung Hung Yong-Sheng Chen Chiou-Shann Fuh 《Machine Vision and Applications》1998,10(5-6):280-291
This paper presents a new multi-pass hierarchical stereo-matching approach for generation of digital terrain models (DTMs)
from two overlapping aerial images. Our method consists of multiple passes which compute stereo matches with a coarse-to-fine
and sparse-to-dense paradigm. An image pyramid is generated and used in the hierarchical stereo matching. Within each pass,
the DTM is refined by using the image pyramid from the coarse to the fine level. At the coarsest level of the first pass,
a global stereo-matching technique, the intra-/inter-scanline matching method, is used to generate a good initial DTM for
the subsequent stereo matching. Thereafter, hierarchical block matching is applied to image locations where features are detected
to refine the DTM incrementally. In the first pass, only the feature points near salient edge segments are considered in block
matching. In the second pass, all the feature points are considered, and the DTM obtained from the first pass is used as the
initial condition for local searching. For the passes after the second pass, 3D interactive manual editing can be incorporated
into the automatic DTM refinement process whenever necessary. Experimental results have shown that our method can successfully
provide accurate DTMs from aerial images. The success of our approach and system has also been demonstrated with flight simulation
software.
Received: 4 November 1996 / Accepted: 20 October 1997
8.
Yihong Gong 《Multimedia Systems》1999,7(6):449-457
In this paper, we propose a novel system that strives to achieve advanced content-based image retrieval using seamless combination
of two complementary approaches: on the one hand, we propose a new color-clustering method to better capture color properties
of the original images; on the other hand, expecting that image regions acquired from the original images inevitably contain
many errors, we make use of the available erroneous, ill-segmented image regions to accomplish the object-region-based image
retrieval. We also propose an effective image-indexing scheme to facilitate fast and efficient image matching and retrieval.
The carefully designed experimental evaluation shows that our proposed image retrieval system surpasses other methods under
comparison in terms of not only quantitative measures, but also image retrieval capabilities.
9.
Direct linear sub-pixel correlation by incorporation of neighbor pixels' information and robust estimation of window transformation Total citations: 1, self-citations: 0, citations by others: 1
Standard methods for sub-pixel matching are iterative and nonlinear; they are also sensitive to false initialization and
window deformation. In this paper, we present a linear method that incorporates information from neighboring pixels. Two algorithms
are presented: one ‘fast’ and one ‘robust’. They both start from an initial rough estimate of the matching. The fast one is
suitable for pairs of images requiring negligible window deformation. The robust method is slower but more general and more
precise. It eliminates false matches in the initialization by using robust estimation of the local affine deformation. The
first algorithm attains an accuracy of 0.05 pixels for interest points and 0.06 for random points in the translational case.
For the general case, if the deformation is small, the second method gives an accuracy of 0.05 pixels; while for large deformation,
it gives an accuracy of about 0.06 pixels for points of interest and 0.10 pixels for random points. There are very few false
matches in all cases, even if there are many in the initialization.
Received: 24 July 1997 / Accepted: 4 December 1997
10.
Query by video clip Total citations: 15, self-citations: 0, citations by others: 15
Typical digital video search is based on queries involving a single shot. We generalize this problem by allowing queries
that involve a video clip (say, a 10-s video segment). We propose two schemes: (i) retrieval based on key frames follows the traditional approach of identifying shots, computing key frames from a video, and then extracting image features
around the key frames. For each key frame in the query, a similarity value (using color, texture, and motion) is obtained
with respect to the key frames in the database video. Consecutive key frames in the database video that are highly similar
to the query key frames are then used to generate the set of retrieved video clips. (ii) In retrieval using sub-sampled frames, we uniformly sub-sample the query clip as well as the database video. Retrieval is based on matching color and texture features
of the sub-sampled frames. Initial experiments on two video databases (basketball video with approximately 16,000 frames and
a CNN news video with approximately 20,000 frames) show promising results. Additional experiments using segments from one
basketball video as query and a different basketball video as the database show the effectiveness of feature representation
and matching schemes.
11.
12.
We describe a process of word recognition that has high tolerance for poor image quality, tunability to the lexical content
of the documents to which it is applied, and high speed of operation. This process relies on the transformation of text images
into character shape codes, and on special lexica that contain information on the shape of words. We rely on the structure
of English and the high efficiency of mapping between shape codes and the characters in the words. Remaining ambiguity is
reduced by template matching using exemplars derived from surrounding text, taking advantage of the local consistency of font,
face and size as well as image quality. This paper describes the effects of lexical content, structure and processing on the
performance of a word recognition engine. Word recognition performance is shown to be enhanced by the application of an appropriate
lexicon. Recognition speed is shown to be essentially independent of the details of lexical content provided the intersection
of the occurrences of words in the document and the lexicon is high. Word recognition accuracy is dependent on both intersection
and specificity of the lexicon.
Received May 1, 1998 / Revised October 20, 1998
13.
We have developed a novel approach to the extraction of cloud base height (CBH) from pairs of whole-sky imagers (WSIs). The
core problem is to spatially register cloud fields from widely separated WSIs; once this is complete, triangulation provides the CBH
measurements. The wide camera separation and the self-similarity of clouds defeat standard matching algorithms when applied
to static views of the sky. In response, we use optical flow methods that exploit the fact that modern WSIs provide image
sequences. We will describe the algorithm, a confidence metric for its performance, a method to correct the severe projective
effects of the WSI camera, and results on real data.
14.
Justin E. Harlow III Franc Brglez 《International Journal on Software Tools for Technology Transfer (STTT)》2001,3(2):193-206
Traditional approaches to the measurement of performance for CAD algorithms involve the use of sets of so-called “benchmark
circuits.” In this paper, we demonstrate that current procedures do not produce results which accurately characterize the
behavior of the algorithms under study. Indeed, we show that the apparent advances in algorithms which are documented by traditional
benchmarking may well be due to chance, and not due to any new properties of the algorithms. As an alternative, we introduce
a new methodology for the characterization of CAD heuristics which employs well-studied design of experiments methods. We
show through numerous examples how such methods can be applied to evaluate the behavior of heuristics used in BDD variable
ordering.
Published online: 15 May 2001
15.
K. Selçuk Candan Eric Lemar V.S. Subrahmanian 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(2):131-153
Abstract. Though there has been extensive work on multimedia databases in the last few years, there is no prevailing notion of a multimedia
view, nor are there techniques to create, manage, and maintain such views. Visualizing the results of a dynamic multimedia
query or materializing a dynamic multimedia view corresponds to assembling and delivering an interactive multimedia presentation
in accordance with the visualization specifications. In this paper, we suggest that a non-interactive multimedia presentation
is a set of virtual objects with associated spatial and temporal presentation constraints. A virtual object is either an object, or the result of a query.
As queries may have different answers at different points in time, scheduling the presentation of such objects is nontrivial.
We then develop a probabilistic model of interactive multimedia presentations, extending the non-interactive model described
earlier. We also develop a probabilistic model of interactive visualization where the probabilities reflect the user profiles,
or the likelihood of certain user interactions. Based on this probabilistic model, we develop three utility-theoretic based
types of prefetching algorithms that anticipate how users will interact with the presentation. These prefetching algorithms
allow efficient visualization of the query results in accordance with the underlying specification. We have built a prototype
system that incorporates these algorithms. We report on the results of experiments conducted on top of this implementation.
Received June 10, 1998 / Accepted November 10, 1999
16.
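To improve the robustness of keypoint-based planar object tracking in complex scenes, a planar object tracking algorithm that incorporates optical flow is proposed. Keypoints and their descriptors are detected for the target object and the input image, and a set of keypoint matches between the target and the image is built by nearest-neighbor matching. Optical flow is used to establish keypoint correspondences between two adjacent images. The keypoint match set and the optical-flow correspondences are then fused by a weighted-average strategy to obtain a corrected set of keypoint matches, from which the homography of the target object in the current image is estimated, completing the tracking. Experimental results on the POT dataset show that, compared with algorithms such as SIFT and FERNS, the algorithm reaches an average tracking precision of 66.67% over all image sequences at an alignment error threshold of 5, indicating good tracking performance.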
17.
Automatic text segmentation and text recognition for video indexing Total citations: 13, self-citations: 0, citations by others: 13
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval
is the text appearing in them. It enables content-based browsing. We present our new methods for automatic segmentation of
text in digital videos. The algorithms we propose make use of typical characteristics of text in videos in order to enable
and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their
complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single
bitmap. The output of the text segmentation step is then directly passed to a standard OCR software package in order to translate
the segmented text into ASCII. Also, a straightforward indexing and retrieval scheme is introduced. It is used in the experiments
to demonstrate that the proposed text segmentation algorithms together with existing text recognition algorithms are suitable
for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging
and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher level semantics
in videos.
18.
Algorithms for coplanar camera calibration Total citations: 5, self-citations: 0, citations by others: 5
Abstract. Coplanar camera calibration is the process of determining the extrinsic and intrinsic camera parameters from a given set
of image and world points, when the world points lie on a two-dimensional plane. Noncoplanar calibration, on the other hand,
involves world points that do not lie on a plane. While optimal solutions for both the camera-calibration procedures can be
obtained by solving a set of constrained nonlinear optimization problems, there are significant structural differences between
the two formulations. We investigate the computational and algorithmic implications of such underlying differences, and provide
a set of efficient algorithms that are specifically tailored for the coplanar case. More specifically, we offer the following:
(1) four algorithms for coplanar calibration that use linear or iterative linear methods to solve the underlying nonlinear
optimization problem, and produce sub-optimal solutions. These algorithms are motivated by their computational efficiency
and are useful for real-time low-cost systems. (2) Two optimal solutions for coplanar calibration, including one novel nonlinear
algorithm. A constraint for the optimal estimation of extrinsic parameters is also given. (3) A Lyapunov type convergence
analysis for the new nonlinear algorithm. We test the validity and performance of the calibration procedures with both synthetic
and real images. The results consistently show significant improvements over less complete camera models.
Received: 30 September 1998 / Accepted: 12 January 2000
19.
Distributed Cognition in an Emergency Co-ordination Center Total citations: 1, self-citations: 1, citations by others: 0
*Formerly at the Department of Communication Studies, Linköping University, Sweden. Most of this work was conducted during
the author’s employment at the Department of Communication Studies. Recent research concerning the control of complex systems
stresses the systemic character of the work of the controlling system, including the number of people and artefacts as well
as the environment. This study adds to the growing body of knowledge by focusing on the internal working of such a system.
Our vantage point is the theoretical framework of distributed cognition. Through a field study of an emergency co-ordination
centre we try to demonstrate how the team’s cognitive tasks, to assess an event and to dispatch adequate resources, are achieved
by mutual awareness, joint situation assessment, and the co-ordinated use of the technology and the physical arrangement of
the co-ordination room.
20.
We present an efficient and accurate method for retrieving images based on color similarity with a given query image or histogram.
The method matches the query against parts of the image using histogram intersection. Efficient searching for the best matching
subimage is done by pruning the set of subimages using upper bound estimates. The method is fast, has high precision and recall
and also allows queries based on the positions of one or more objects in the database image. Experimental results showing
the efficiency of the proposed search method, and high precision and recall of retrieval are presented.
Received: 20 January 1997 / Accepted: 5 January 1998
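The histogram-intersection matching mentioned in the abstract above can be illustrated with a small sketch. It scores every window of an image against a query histogram by exhaustive search; the upper-bound pruning the abstract describes is deliberately omitted because its details are not given. The image is assumed to be a grid of already-quantized color labels, and the names `intersection` and `best_subimage` are hypothetical.

```python
from collections import Counter


def intersection(query_hist, image_hist):
    """Histogram intersection, normalised by the query's total count."""
    total = sum(query_hist.values())
    if total == 0:
        return 0.0
    shared = sum(min(count, image_hist.get(color, 0))
                 for color, count in query_hist.items())
    return shared / total


def best_subimage(image, query_hist, win_h, win_w):
    """Exhaustively score every win_h x win_w window of `image`.

    `image` is a list of rows of quantised colour labels (an assumption).
    Returns (best_score, (top, left)). No pruning is performed here.
    """
    height, width = len(image), len(image[0])
    best_score, best_pos = -1.0, None
    for top in range(height - win_h + 1):
        for left in range(width - win_w + 1):
            window = Counter(image[r][c]
                             for r in range(top, top + win_h)
                             for c in range(left, left + win_w))
            score = intersection(query_hist, window)
            if score > best_score:
                best_score, best_pos = score, (top, left)
    return best_score, best_pos
```

Because the score is normalized by the query histogram, a window that contains all of the query's colors in at least the queried amounts scores 1.0 regardless of what else it contains, which is what makes matching against parts of an image possible.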