
1.

An automatic closed-loop methodology for generating character groundtruth for scanned documents

Kanungo T. Haralick R.M. 《IEEE transactions on pattern analysis and machine intelligence》1999,21(2):179-183

Character groundtruth for real, scanned document images is crucial for evaluating the performance of OCR systems, training OCR algorithms, and validating document degradation models. Unfortunately, manual collection of accurate groundtruth for characters in a real (scanned) document image is not practical because (i) accuracy in delineating groundtruth character bounding boxes is not high enough, (ii) it is extremely laborious and time consuming, and (iii) the manual labor required for this task is prohibitively expensive. We describe a closed-loop methodology for collecting very accurate groundtruth for scanned documents. We first create ideal documents using a typesetting language. Next we create the groundtruth for the ideal document. The ideal document is then printed, photocopied and then scanned. A registration algorithm estimates the global geometric transformation and then performs a robust local bitmap match to register the ideal document image to the scanned document image. Finally, groundtruth associated with the ideal document image is transformed using the estimated geometric transformation to create the groundtruth for the scanned document image. This methodology is very general and can be used for creating groundtruth for documents typeset in any language, layout, font, and style. We have demonstrated the method by generating groundtruth for English, Hindi, and FAX document images. The cost of creating groundtruth using our methodology is minimal. If character, word or zone groundtruth is available for any real document, the registration algorithm can be used to generate the corresponding groundtruth for a rescanned version of the document.
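The final step of the abstract above, carrying groundtruth boxes from the ideal image into the scanned image, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the estimated global transformation is affine (the abstract only says "geometric transformation"), and the names `apply_affine`, `transform_box`, and the sample parameters are hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
# Assumed affine model: x' = a*x + b*y + tx, y' = c*x + d*y + ty
Affine = Tuple[float, float, float, float, float, float]  # (a, b, tx, c, d, ty)


def apply_affine(params: Affine, p: Point) -> Point:
    """Map one point from ideal-image to scanned-image coordinates."""
    a, b, tx, c, d, ty = params
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)


def transform_box(params: Affine, box: Box) -> Box:
    """Map the four corners and return their axis-aligned bounding box."""
    x0, y0, x1, y1 = box
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    mapped = [apply_affine(params, p) for p in corners]
    xs = [p[0] for p in mapped]
    ys = [p[1] for p in mapped]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical groundtruth for one character in the ideal image.
ideal_boxes: Dict[str, Box] = {"A": (10.0, 20.0, 18.0, 32.0)}
# Hypothetical estimate: a pure translation by (+5, -3) pixels.
params: Affine = (1.0, 0.0, 5.0, 0.0, 1.0, -3.0)

scanned_boxes = {ch: transform_box(params, b) for ch, b in ideal_boxes.items()}
```

Mapping all four corners, rather than only two, keeps the result a valid bounding box when the transformation includes rotation or shear.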

2.

A statistical approach to the generation of a database for evaluating OCR software

F.S. Brundick Ann E.M. Brodeen Malcolm S. Taylor 《International Journal on Document Analysis and Recognition》2002,4(3):170-176

In this paper we consider a statistical approach to augment a limited database of groundtruth documents for use in evaluation of optical character recognition software. A modified moving-blocks bootstrap procedure is used to construct surrogate documents for this purpose which prove to serve effectively and, in some regards, indistinguishably from groundtruth. The proposed method is validated through a rigorous statistical procedure. Received: March 30, 2000 / Revised: September 14, 2001
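For readers unfamiliar with the technique named above, here is a minimal sketch of the plain moving-blocks bootstrap applied to a token sequence. The paper uses a modified variant whose details the abstract does not give, so this shows only the textbook procedure; the function name and the sample sentence are hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def moving_blocks_bootstrap(
    seq: Sequence[T], block_len: int, rng: random.Random
) -> List[T]:
    """Resample overlapping blocks of length block_len, with replacement,
    and concatenate them until the surrogate is as long as the original."""
    n = len(seq)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(seq)")
    n_starts = n - block_len + 1  # number of distinct overlapping blocks
    out: List[T] = []
    while len(out) < n:
        start = rng.randrange(n_starts)
        out.extend(seq[start:start + block_len])
    return out[:n]  # truncate the last block if it overshoots


words = "the quick brown fox jumps over the lazy dog".split()
surrogate = moving_blocks_bootstrap(words, block_len=3, rng=random.Random(0))
```

Sampling whole blocks instead of single tokens preserves short-range dependence (here, local word order) in the surrogate sequence.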

3.

Contour representation of binary images using run-type direction codes

Takafumi Miyatake Hitoshi Matsushima Masakazu Ejiri 《Machine Vision and Applications》1997,9(4):193-200


4.

Dynamic vp-tree indexing for n-nearest neighbor search given pair-wise distances (total citations: 1, self-citations: 0, citations by others: 1)

Ada Wai-chee Fu Polly Mei-shuen Chan Yin-Ling Cheung Yiu Sang Moon 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(2):154-173

Abstract. For some multimedia applications, it has been found that domain objects cannot be represented as feature vectors in a multidimensional space. Instead, pair-wise distances between data objects are the only input. To support content-based retrieval, one approach maps each object to a k-dimensional (k-d) point and tries to preserve the distances among the points. Then, existing spatial access index methods such as the R-trees and KD-trees can support fast searching on the resulting k-d points. However, information loss is inevitable with such an approach since the distances between data objects can only be preserved to a certain extent. Here we investigate the use of a distance-based indexing method. In particular, we apply the vantage point tree (vp-tree) method. There are two important problems for the vp-tree method that warrant further investigation: the n-nearest neighbors search and the updating mechanisms. We study an n-nearest neighbors search algorithm for the vp-tree, which is shown by experiments to scale up well with the size of the dataset and the desired number of nearest neighbors, n. Experiments also show that the searching in the vp-tree is more efficient than that for the -tree and the M-tree. Next, we propose solutions for the update problem for the vp-tree, and show by experiments that the algorithms are efficient and effective. Finally, we investigate the problem of selecting vantage points, propose a few alternative methods, and study their impact on the number of distance computations. Received June 9, 1998 / Accepted January 31, 2000
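The idea of a vp-tree with n-nearest-neighbor search can be sketched as below. This is a textbook-style illustration under stated assumptions, not the algorithm from the paper: vantage points are chosen at random, the tree is static (no updates), and all names (`VPNode`, `build`, `nearest`) are hypothetical. The only access to the data is through a distance function, matching the "pair-wise distances only" setting.

```python
import heapq
import random
from typing import Any, Callable, List, Optional, Tuple

Dist = Callable[[Any, Any], float]


class VPNode:
    def __init__(self, point: Any, radius: float,
                 inside: Optional["VPNode"], outside: Optional["VPNode"]):
        self.point, self.radius = point, radius
        self.inside, self.outside = inside, outside


def build(points: List[Any], dist: Dist, rng: random.Random) -> Optional[VPNode]:
    """Pick a random vantage point; split the rest at the median distance."""
    if not points:
        return None
    points = list(points)
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0.0, None, None)
    dists = [dist(vp, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(points, dists) if d <= radius]
    outside = [p for p, d in zip(points, dists) if d > radius]
    return VPNode(vp, radius, build(inside, dist, rng), build(outside, dist, rng))


def nearest(root: Optional[VPNode], query: Any, n: int,
            dist: Dist) -> List[Tuple[float, Any]]:
    """Return the n closest points as (distance, point), nearest first."""
    heap: List[Tuple[float, int, Any]] = []  # max-heap via negated distance
    counter = 0  # tiebreaker so points themselves are never compared

    def tau() -> float:  # current n-th best distance (pruning radius)
        return -heap[0][0] if len(heap) == n else float("inf")

    def visit(node: Optional[VPNode]) -> None:
        nonlocal counter
        if node is None:
            return
        d = dist(query, node.point)
        if len(heap) < n:
            heapq.heappush(heap, (-d, counter, node.point))
            counter += 1
        elif d < tau():
            heapq.heapreplace(heap, (-d, counter, node.point))
            counter += 1
        # Triangle inequality decides which side can still hold candidates.
        if d <= node.radius:
            visit(node.inside)
            if d + tau() > node.radius:
                visit(node.outside)
        else:
            visit(node.outside)
            if d - tau() <= node.radius:
                visit(node.inside)

    visit(root)
    return sorted((-neg_d, p) for neg_d, _, p in heap)


points = [float(i) for i in range(10)]
metric = lambda a, b: abs(a - b)
tree = build(points, metric, random.Random(1))
result = nearest(tree, 4.2, 3, metric)
```

The pruning tests rely on the distance function being a metric; with a non-metric dissimilarity the search may miss true neighbors.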

5.

Software architecture of PSET: a page segmentation evaluation toolkit

Song Mao Tapas Kanungo 《International Journal on Document Analysis and Recognition》2002,4(3):205-217


6.

A local method for contour matching and its parallel implementation

Samia Boukir Patrick Bouthemy François Chaumette Didier Juvin 《Machine Vision and Applications》1998,10(5-6):321-330

This paper presents a local approach for matching contour segments in an image sequence. This study has been primarily motivated by work concerned with the recovery of 3D structure using active vision. The method to recover the 3D structure of the scene requires tracking contour segments in an image sequence in real time. Here, we propose an original and robust approach that is ideally suited for this problem. It is also of more general interest and can be used in any context requiring matching of line boundaries over time. This method only involves local modeling and computation of moving edges dealing “virtually” with a contour segment primitive representation. Such an approach brings robustness to contour segmentation instability and to occlusion, and ease of implementation. Parallelism has also been investigated using an SIMD-based real-time image-processing system. This method has been validated with experiments on several real-image sequences. Our results show quite satisfactory performance and the algorithm runs in a few milliseconds. Received: 11 December 1996 / Accepted: 8 August 1997

7.

Multipass hierarchical stereo matching for generation of digital terrain models from aerial images

Yi-Ping Hung Chu-Song Chen Kuan-Chung Hung Yong-Sheng Chen Chiou-Shann Fuh 《Machine Vision and Applications》1998,10(5-6):280-291

This paper presents a new multi-pass hierarchical stereo-matching approach for generation of digital terrain models (DTMs) from two overlapping aerial images. Our method consists of multiple passes which compute stereo matches with a coarse-to-fine and sparse-to-dense paradigm. An image pyramid is generated and used in the hierarchical stereo matching. Within each pass, the DTM is refined by using the image pyramid from the coarse to the fine level. At the coarsest level of the first pass, a global stereo-matching technique, the intra-/inter-scanline matching method, is used to generate a good initial DTM for the subsequent stereo matching. Thereafter, hierarchical block matching is applied to image locations where features are detected to refine the DTM incrementally. In the first pass, only the feature points near salient edge segments are considered in block matching. In the second pass, all the feature points are considered, and the DTM obtained from the first pass is used as the initial condition for local searching. For the passes after the second pass, 3D interactive manual editing can be incorporated into the automatic DTM refinement process whenever necessary. Experimental results have shown that our method can successfully provide accurate DTMs from aerial images. The success of our approach and system has also been demonstrated with flight simulation software. Received: 4 November 1996 / Accepted: 20 October 1997

8.

Advancing content-based image retrieval by exploiting image color and region features

Yihong Gong 《Multimedia Systems》1999,7(6):449-457

In this paper, we propose a novel system that strives to achieve advanced content-based image retrieval using seamless combination of two complementary approaches: on the one hand, we propose a new color-clustering method to better capture color properties of the original images; on the other hand, expecting that image regions acquired from the original images inevitably contain many errors, we make use of the available erroneous, ill-segmented image regions to accomplish the object-region-based image retrieval. We also propose an effective image-indexing scheme to facilitate fast and efficient image matching and retrieval. The carefully designed experimental evaluation shows that our proposed image retrieval system surpasses other methods under comparison in terms of not only quantitative measures, but also image retrieval capabilities.

9.

Direct linear sub-pixel correlation by incorporation of neighbor pixels' information and robust estimation of window transformation (total citations: 1, self-citations: 0, citations by others: 1)

Zhong-Dan Lan Roger Mohr 《Machine Vision and Applications》1998,10(5-6):256-268

Standard methods for sub-pixel matching are iterative and nonlinear; they are also sensitive to false initialization and window deformation. In this paper, we present a linear method that incorporates information from neighboring pixels. Two algorithms are presented: one ‘fast’ and one ‘robust’. They both start from an initial rough estimate of the matching. The fast one is suitable for pairs of images requiring negligible window deformation. The robust method is slower but more general and more precise. It eliminates false matches in the initialization by using robust estimation of the local affine deformation. The first algorithm attains an accuracy of 0.05 pixels for interest points and 0.06 for random points in the translational case. For the general case, if the deformation is small, the second method gives an accuracy of 0.05 pixels; while for large deformation, it gives an accuracy of about 0.06 pixels for points of interest and 0.10 pixels for random points. There are very few false matches in all cases, even if there are many in the initialization. Received: 24 July 1997 / Accepted: 4 December 1997

10.

Query by video clip (total citations: 15, self-citations: 0, citations by others: 15)

Anil K. Jain Aditya Vailaya Xiong Wei 《Multimedia Systems》1999,7(5):369-384

Typical digital video search is based on queries involving a single shot. We generalize this problem by allowing queries that involve a video clip (say, a 10-s video segment). We propose two schemes: (i) Retrieval based on key frames follows the traditional approach of identifying shots, computing key frames from a video, and then extracting image features around the key frames. For each key frame in the query, a similarity value (using color, texture, and motion) is obtained with respect to the key frames in the database video. Consecutive key frames in the database video that are highly similar to the query key frames are then used to generate the set of retrieved video clips. (ii) In retrieval using sub-sampled frames, we uniformly sub-sample the query clip as well as the database video. Retrieval is based on matching color and texture features of the sub-sampled frames. Initial experiments on two video databases (a basketball video with approximately 16,000 frames and a CNN news video with approximately 20,000 frames) show promising results. Additional experiments using segments from one basketball video as query and a different basketball video as the database show the effectiveness of feature representation and matching schemes.

11.

A fast algorithm for skew detection of document images using morphology (total citations: 1, self-citations: 0, citations by others: 1)

A.K. Das B. Chanda 《International Journal on Document Analysis and Recognition》2001,4(2):109-114


12.

Shape-based word recognition

A. Lawrence Spitz 《International Journal on Document Analysis and Recognition》1999,1(4):178-190

We describe a process of word recognition that has high tolerance for poor image quality, tunability to the lexical content of the documents to which it is applied, and high speed of operation. This process relies on the transformation of text images into character shape codes, and on special lexica that contain information on the shape of words. We rely on the structure of English and the high efficiency of mapping between shape codes and the characters in the words. Remaining ambiguity is reduced by template matching using exemplars derived from surrounding text, taking advantage of the local consistency of font, face and size as well as image quality. This paper describes the effects of lexical content, structure and processing on the performance of a word recognition engine. Word recognition performance is shown to be enhanced by the application of an appropriate lexicon. Recognition speed is shown to be essentially independent of the details of lexical content provided the intersection of the occurrences of words in the document and the lexicon is high. Word recognition accuracy is dependent on both intersection and specificity of the lexicon. Received May 1, 1998 / Revised October 20, 1998

13.

The computation of cloud base height from paired whole-sky imaging cameras

Mark C. Allmen Philip Kegelmeyer Jr. 《Machine Vision and Applications》1997,9(4):160-165

We have developed a novel approach to the extraction of cloud base height (CBH) from pairs of whole-sky imagers (WSIs). The core problem is to spatially register cloud fields from widely separated WSIs; once this is complete, triangulation provides the CBH measurements. The wide camera separation and the self-similarity of clouds defeat standard matching algorithms when applied to static views of the sky. In response, we use optical flow methods that exploit the fact that modern WSIs provide image sequences. We will describe the algorithm, a confidence metric for its performance, a method to correct the severe projective effects of the WSI camera, and results on real data.

14.

Design of experiments and evaluation of BDD ordering heuristics

Justin E. Harlow III Franc Brglez 《International Journal on Software Tools for Technology Transfer (STTT)》2001,3(2):193-206

Traditional approaches to the measurement of performance for CAD algorithms involve the use of sets of so-called “benchmark circuits.” In this paper, we demonstrate that current procedures do not produce results which accurately characterize the behavior of the algorithms under study. Indeed, we show that the apparent advances in algorithms which are documented by traditional benchmarking may well be due to chance, and not due to any new properties of the algorithms. As an alternative, we introduce a new methodology for the characterization of CAD heuristics which employs well-studied design of experiments methods. We show through numerous examples how such methods can be applied to evaluate the behavior of heuristics used in BDD variable ordering. Published online: 15 May 2001

15.

View management in multimedia databases

K. Selçuk Candan Eric Lemar V.S. Subrahmanian 《The VLDB Journal The International Journal on Very Large Data Bases》2000,9(2):131-153

Abstract. Though there has been extensive work on multimedia databases in the last few years, there is no prevailing notion of a multimedia view, nor are there techniques to create, manage, and maintain such views. Visualizing the results of a dynamic multimedia query or materializing a dynamic multimedia view corresponds to assembling and delivering an interactive multimedia presentation in accordance with the visualization specifications. In this paper, we suggest that a non-interactive multimedia presentation is a set of virtual objects with associated spatial and temporal presentation constraints. A virtual object is either an object, or the result of a query. As queries may have different answers at different points in time, scheduling the presentation of such objects is nontrivial. We then develop a probabilistic model of interactive multimedia presentations, extending the non-interactive model described earlier. We also develop a probabilistic model of interactive visualization where the probabilities reflect the user profiles, or the likelihood of certain user interactions. Based on this probabilistic model, we develop three utility-theoretic based types of prefetching algorithms that anticipate how users will interact with the presentation. These prefetching algorithms allow efficient visualization of the query results in accordance with the underlying specification. We have built a prototype system that incorporates these algorithms. We report on the results of experiments conducted on top of this implementation. Received June 10, 1998 / Accepted November 10, 1999

16.

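Planar object tracking algorithm based on keypoints and optical flow

Ji Haoxuanye Liang Pengpeng Chai Yumei Wang Liming 《计算机工程》 (Computer Engineering) 2021,47(4):234-240

To improve the robustness of keypoint-based planar object tracking algorithms in complex scenes, a planar object tracking algorithm that incorporates optical flow is proposed. Keypoints and their corresponding descriptors are detected for the target object and the input image, and a set of keypoint matches between the target and the image is built by nearest-neighbor matching. Optical flow is used to establish correspondences between keypoints in two adjacent images. The keypoint match set and the optical-flow correspondences are then fused by a weighted-average strategy to obtain a corrected keypoint match set. The homography of the target object in the current image is estimated from the keypoint matches, which completes the tracking. Experimental results on the POT dataset show that, compared with algorithms such as SIFT and FERNS, at an alignment error threshold of 5 the algorithm reaches an average tracking precision of 66.67% over all image sequences, showing good tracking performance.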
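The weighted-average fusion step described above can be illustrated with a small sketch. The abstract does not give the weighting scheme, so everything here is an assumption: a single fixed weight `w` for the descriptor-based match, a fallback to whichever estimate exists when only one source has a correspondence, and the hypothetical function name `fuse_matches`.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def fuse_matches(
    descriptor_matches: Dict[int, Point],
    flow_matches: Dict[int, Point],
    w: float = 0.5,
) -> Dict[int, Point]:
    """Fuse two position estimates per target keypoint id.

    descriptor_matches: position in the current image found by
        nearest-neighbor descriptor matching.
    flow_matches: position predicted by optical flow from the previous image.
    w: assumed weight of the descriptor-based estimate, in [0, 1].
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    fused: Dict[int, Point] = {}
    for kp_id in descriptor_matches.keys() | flow_matches.keys():
        m = descriptor_matches.get(kp_id)
        f = flow_matches.get(kp_id)
        if m is not None and f is not None:
            fused[kp_id] = (w * m[0] + (1 - w) * f[0],
                            w * m[1] + (1 - w) * f[1])
        else:
            # Only one source has this keypoint: keep its estimate as is.
            fused[kp_id] = m if m is not None else f
    return fused


fused = fuse_matches(
    {1: (10.0, 10.0), 2: (50.0, 40.0)},
    {1: (12.0, 14.0), 3: (7.0, 9.0)},
    w=0.5,
)
```

The fused correspondences would then feed a homography estimator; that step is omitted here.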

17.

Automatic text segmentation and text recognition for video indexing (total citations: 13, self-citations: 0, citations by others: 13)

Rainer Lienhart Wolfgang Effelsberg 《Multimedia Systems》2000,8(1):69-81

Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in them. It enables content-based browsing. We present our new methods for automatic segmentation of text in digital videos. The algorithms we propose make use of typical characteristics of text in videos in order to enable and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single bitmap. The output of the text segmentation step is then directly passed to a standard OCR software package in order to translate the segmented text into ASCII. Also, a straightforward indexing and retrieval scheme is introduced. It is used in the experiments to demonstrate that the proposed text segmentation algorithms together with existing text recognition algorithms are suitable for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher level semantics in videos.

18.

Algorithms for coplanar camera calibration (total citations: 5, self-citations: 0, citations by others: 5)

Chanchal Chatterjee Vwani P. Roychowdhury 《Machine Vision and Applications》2000,12(2):84-97

Abstract. Coplanar camera calibration is the process of determining the extrinsic and intrinsic camera parameters from a given set of image and world points, when the world points lie on a two-dimensional plane. Noncoplanar calibration, on the other hand, involves world points that do not lie on a plane. While optimal solutions for both the camera-calibration procedures can be obtained by solving a set of constrained nonlinear optimization problems, there are significant structural differences between the two formulations. We investigate the computational and algorithmic implications of such underlying differences, and provide a set of efficient algorithms that are specifically tailored for the coplanar case. More specifically, we offer the following: (1) four algorithms for coplanar calibration that use linear or iterative linear methods to solve the underlying nonlinear optimization problem, and produce sub-optimal solutions. These algorithms are motivated by their computational efficiency and are useful for real-time low-cost systems. (2) Two optimal solutions for coplanar calibration, including one novel nonlinear algorithm. A constraint for the optimal estimation of extrinsic parameters is also given. (3) A Lyapunov type convergence analysis for the new nonlinear algorithm. We test the validity and performance of the calibration procedures with both synthetic and real images. The results consistently show significant improvements over less complete camera models. Received: 30 September 1998 / Accepted: 12 January 2000

19.

Distributed Cognition in an Emergency Co-ordination Center (total citations: 1, self-citations: 1, citations by others: 0)

H. Artman Y. Wærn 《Cognition, Technology & Work》1999,1(4):237-246

*Formerly at the Department of Communication Studies, Linköping University, Sweden. Most of this work was conducted during the author’s employment at the Department of Communication Studies. Recent research concerning the control of complex systems stresses the systemic character of the work of the controlling system, including the number of people and artefacts as well as the environment. This study adds to the growing body of knowledge by focusing on the internal working of such a system. Our vantage point is the theoretical framework of distributed cognition. Through a field study of an emergency co-ordination centre we try to demonstrate how the team’s cognitive tasks, to assess an event and to dispatch adequate resources, are achieved by mutual awareness, joint situation assessment, and the co-ordinated use of the technology and the physical arrangement of the co-ordination room.

20.

Image retrieval using efficient local-area matching

V.V. Vinod Hiroshi Murase 《Machine Vision and Applications》1998,11(1):7-15

We present an efficient and accurate method for retrieving images based on color similarity with a given query image or histogram. The method matches the query against parts of the image using histogram intersection. Efficient searching for the best matching subimage is done by pruning the set of subimages using upper bound estimates. The method is fast, has high precision and recall and also allows queries based on the positions of one or more objects in the database image. Experimental results showing the efficiency of the proposed search method, and high precision and recall of retrieval are presented. Received: 20 January 1997 / Accepted: 5 January 1998
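Histogram intersection, the similarity measure named in the abstract above, can be sketched in a few lines. This shows only the standard measure normalized by the query histogram; the paper's upper-bound pruning over subimages is not reproduced, and the function name and sample histograms are hypothetical.

```python
from typing import Sequence


def histogram_intersection(query: Sequence[float],
                           candidate: Sequence[float]) -> float:
    """Sum of bin-wise minima, normalized by the query histogram's mass.

    Returns a score in [0, 1]; 1 means every query bin is fully covered.
    """
    if len(query) != len(candidate):
        raise ValueError("histograms must have the same number of bins")
    total = sum(query)
    if total <= 0:
        raise ValueError("query histogram must have positive mass")
    return sum(min(q, c) for q, c in zip(query, candidate)) / total


# Hypothetical 4-bin color histograms (pixel counts per color bin).
query = [4, 0, 2, 2]
region_a = [3, 1, 2, 5]
region_b = [0, 6, 1, 0]

score_a = histogram_intersection(query, region_a)
score_b = histogram_intersection(query, region_b)
best = "region_a" if score_a >= score_b else "region_b"
```

Because the score is normalized by the query rather than the candidate, extra pixels in a region (such as background) do not lower its score.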