首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
This paper presents an online learning approach to video-based face recognition that does not make any assumptions about the pose, expressions or prior localization of facial landmarks. Learning is performed online while the subject is imaged and gives near realtime feedback on the learning status. Face images are automatically clustered based on the similarity of their local features. The learning process continues until the clusters have a required minimum number of faces and the distance of the farthest face from its cluster mean is below a threshold. A voting algorithm is employed to pick the representative features of each cluster. Local features are extracted from arbitrary keypoints on faces as opposed to pre-defined landmarks and the algorithm is inherently robust to large scale pose variations and occlusions. During recognition, video frames of a probe are sequentially matched to the clusters of all individuals in the gallery and its identity is decided on the basis of best temporally cohesive cluster matches. Online experiments (using live video) were performed on a database of 50 enrolled subjects and another 22 unseen impostors. The proposed algorithm achieved a recognition rate of 97.8% and a verification rate of 100% at a false accept rate of 0.0014. For comparison, experiments were also performed using the Honda/UCSD database and 99.5% recognition rate was achieved.  相似文献   

2.
Automatic text segmentation and text recognition for video indexing   总被引:13,自引:0,他引:13  
Efficient indexing and retrieval of digital video is an important function of video databases. One powerful index for retrieval is the text appearing in them. It enables content-based browsing. We present our new methods for automatic segmentation of text in digital videos. The algorithms we propose make use of typical characteristics of text in videos in order to enable and enhance segmentation performance. The unique features of our approach are the tracking of characters and words over their complete duration of occurrence in a video and the integration of the multiple bitmaps of a character over time into a single bitmap. The output of the text segmentation step is then directly passed to a standard OCR software package in order to translate the segmented text into ASCII. Also, a straightforward indexing and retrieval scheme is introduced. It is used in the experiments to demonstrate that the proposed text segmentation algorithms together with existing text recognition algorithms are suitable for indexing and retrieval of relevant video sequences in and from a video database. Our experimental results are very encouraging and suggest that these algorithms can be used in video retrieval applications as well as to recognize higher level semantics in videos.  相似文献   

3.
A database for handwritten text recognition research   总被引:4,自引:0,他引:4  
An image database for handwritten text recognition research is described. Digital images of approximately 5000 city names, 5000 state names, 10000 ZIP Codes, and 50000 alphanumeric characters are included. Each image was scanned from mail in a working post office at 300 pixels/in in 8-bit gray scale on a high-quality flat bed digitizer. The data were unconstrained for the writer, style, and method of preparation. These characteristics help overcome the limitations of earlier databases that contained only isolated characters or were prepared in a laboratory setting under prescribed circumstances. Also, the database is divided into explicit training and testing sets to facilitate the sharing of results among researchers as well as performance comparisons  相似文献   

4.
This paper presents an end-to-end system for reading handwritten page images. Five functional modules included in the system are introduced in this paper: (i) pre-processing, which concerns introducing an image representation for easy manipulation of large page images and image handling procedures using the image representation; (ii) line separation, concerning text line detection and extracting images of lines of text from a page image; (iii) word segmentation, which concerns locating word gaps and isolating words from a line of text image obtained efficiently and in an intelligent manner; (iv) word recognition, concerning handwritten word recognition algorithms; and (v) linguistic post-pro- cessing, which concerns the use of linguistic constraints to intelligently parse and recognize text. Key ideas employed in each functional module, which have been developed for dealing with the diversity of handwriting in its various aspects with a goal of system reliability and robustness, are described in this paper. Preliminary experiments show promising results in terms of speed and accuracy. Received October 30, 1998 / Revised January 15, 1999  相似文献   

5.
Scene text recognition has received increasing attention in the research community.Text in the wild often possesses irregular arrangements,which typically inclu...  相似文献   

6.
A hybrid algorithm for text recognition is described. The algorithm combines a Markov method with a dictionary method. Experiments were conducted with the hybrid algorithm using as data texts of different nature comprising handprinted letters of varying quality. The performance and computational complexity of the algorithm are deliberated. The computational complexity of the hybrid algorithm is shown to be less than that for the Predictor-Corrector Algorithm,(1) for similar performance.  相似文献   

7.
In this paper, we examine image and video-based recognition applications where the underlying models have a special structure—the linear subspace structure. We discuss how commonly used parametric models for videos and image sets can be described using the unified framework of Grassmann and Stiefel manifolds. We first show that the parameters of linear dynamic models are finite-dimensional linear subspaces of appropriate dimensions. Unordered image sets as samples from a finite-dimensional linear subspace naturally fall under this framework. We show that an inference over subspaces can be naturally cast as an inference problem on the Grassmann manifold. To perform recognition using subspace-based models, we need tools from the Riemannian geometry of the Grassmann manifold. This involves a study of the geometric properties of the space, appropriate definitions of Riemannian metrics, and definition of geodesics. Further, we derive statistical modeling of inter and intraclass variations that respect the geometry of the space. We apply techniques such as intrinsic and extrinsic statistics to enable maximum-likelihood classification. We also provide algorithms for unsupervised clustering derived from the geometry of the manifold. Finally, we demonstrate the improved performance of these methods in a wide variety of vision applications such as activity recognition, video-based face recognition, object recognition from image sets, and activity-based video clustering.  相似文献   

8.
In this paper, we present a recognition system for on-line handwritten texts acquired from a whiteboard. The system is based on the combination of several individual classifiers of diverse nature. Recognizers based on different architectures (hidden Markov models and bidirectional long short-term memory networks) and on different sets of features (extracted from on-line and off-line data) are used in the combination. In order to increase the diversity of the underlying classifiers and fully exploit the current state-of-the-art in cursive handwriting recognition, commercial recognition systems have been included in the combined system, leading to a final word level accuracy of 86.16%. This value is significantly higher than the performance of the best individual classifier (81.26%).  相似文献   

9.
This paper investigates the automatic reading of unconstrained omni-writer handwritten texts. It shows how to endow the reading system with learning faculties necessary to adapt the recognition to each writer's handwriting. In the first part of this paper, we explain how the recognition system can be adapted to a current handwriting by exploiting the graphical context defined by the writer's invariants. This adaptation is guaranteed by activating interaction links over the whole text between the recognition procedures of word entities and those of letter entities. In the second part, we justify the need of an open multiple-agent architecture to support the implementation of such a principle of adaptation. The proposed platform allows to plug expert treatments dedicated to handwriting analysis. We show that this platform helps to implement specific collaboration or cooperation schemes between agents which bring out new trends in the automatic reading of handwritten texts.  相似文献   

10.
Handwritten text recognition systems commonly combine character classification confidence scores and context models for evaluating candidate segmentation-recognition paths, and the classification confidence is usually optimized at character level. In this paper, we investigate into different confidence-learning methods for handwritten Chinese text recognition and propose a string-level confidence-learning method, which estimates confidence parameters by directly optimizing the performance of character string recognition. We first compare the performances of parametric (class-dependent and class-independent parameters) and nonparametric (isotonic regression) confidence-learning methods. Then, we propose two regularized confidence estimation methods and particularly, a string-level confidence-learning method under the minimum classification error criterion. In experiments of online handwritten Chinese text recognition, the string-level confidence-learning method is shown to effectively improve the string recognition performance. Using three character classifiers, the character correct rates are improved from 92.39, 90.24 and 88.69 % to 92.76, 90.91 and 89.93 %, respectively.  相似文献   

11.
Zhang  Hanning  Dong  Bo  Zheng  Qinghua  Feng  Boqin  Xu  Bo  Wu  Haiyu 《Multimedia Tools and Applications》2022,81(20):28327-28346

With the development of the economy, the number of financial tickets is increasing. The traditional invoice reimbursement and entry work bring more and more burden to financial accountants. However, standard OCR technology weakly supports financial tickets with various layouts and mixed Chinese and English characters. In view of this problem, this paper designs a method of financial ticket all-content text information detection and recognition based on deep learning. This method can effectively suppress the common noise of ticket image and extract financial information from ticket image in batch. At the same time, aiming at the problem of multi-character mixed character recognition, we propose a financial ticket character recognition framework (FTCRF), which can improve the accuracy of multi-character mixed character recognition and make the detection and recognition of financial ticket surface information more efficient. The experimental results show that the average recognition accuracy of the character sequence is 91.75%. The average recognition accuracy of the whole ticket is 87%, which significantly improves the efficiency of the financial accounting system.

  相似文献   

12.
This paper investigates rejection strategies for unconstrained offline handwritten text line recognition. The rejection strategies depend on various confidence measures that are based on alternative word sequences. The alternative word sequences are derived from specific integration of a statistical language model in the hidden Markov model based recognition system. Extensive experiments on the IAM database validate the proposed schemes and show that the novel confidence measures clearly outperform two baseline systems which use normalised likelihoods and local n-best lists, respectively.  相似文献   

13.
Li  Jiahui  Wang  Siwei  Wang  Yongtao  Tang  Zhi 《Multimedia Tools and Applications》2019,78(20):29183-29196
Multimedia Tools and Applications - Most of the existing datasets for scene text recognition merely consist of a few thousand training samples with a very limited vocabulary, which cannot meet the...  相似文献   

14.
In many real-world applications, pattern recognition systems are designed a priori using limited and imbalanced data acquired from complex changing environments. Since new reference data often becomes available during operations, performance could be maintained or improved by adapting these systems through supervised incremental learning. To avoid knowledge corruption and sustain a high level of accuracy over time, an adaptive multiclassifier system (AMCS) may integrate information from diverse classifiers that are guided by a population-based evolutionary optimization algorithm. In this paper, an incremental learning strategy based on dynamic particle swarm optimization (DPSO) is proposed to evolve heterogeneous ensembles of classifiers (where each classifier corresponds to a particle) in response to new reference samples. This new strategy is applied to video-based face recognition, using an AMCS that consists of a pool of fuzzy ARTMAP (FAM) neural networks for classification of facial regions, and a niching version of DPSO that optimizes all FAM parameters such that the classification rate is maximized. Given that diversity within a dynamic particle swarm is correlated with diversity within a corresponding pool of base classifiers, DPSO properties are exploited to generate and evolve diversified pools of FAM classifiers, and to efficiently select ensembles among the pools based on accuracy and particle swarm diversity. Performance of the proposed strategy is assessed in terms of classification rate and resource requirements under different incremental learning scenarios, where new reference data is extracted from real-world video streams. Simulation results indicate the DPSO strategy provides an efficient way to evolve ensembles of FAM networks in an AMCS. Maintaining particle diversity in the optimization space yields a level of accuracy that is comparable to AMCS using reference ensemble-based and batch learning techniques, but requires significantly lower computational complexity than assessing diversity among classifiers in the feature or decision spaces.  相似文献   

15.
This paper presents an effective approach for unsupervised language model adaptation (LMA) using multiple models in offline recognition of unconstrained handwritten Chinese texts. The domain of the document to recognize is variable and usually unknown a priori, so we use a two-pass recognition strategy with a pre-defined multi-domain language model set. We propose three methods to dynamically generate an adaptive language model to match the text output by first-pass recognition: model selection, model combination and model reconstruction. In model selection, we use the language model with minimum perplexity on the first-pass recognized text. By model combination, we learn the combination weights via minimizing the sum of squared error with both L2-norm and L1-norm regularization. For model reconstruction, we use a group of orthogonal bases to reconstruct a language model with the coefficients learned to match the document to recognize. Moreover, we reduce the storage size of multiple language models using two compression methods of split vector quantization (SVQ) and principal component analysis (PCA). Comprehensive experiments on two public Chinese handwriting databases CASIA-HWDB and HIT-MW show that the proposed unsupervised LMA approach improves the recognition performance impressively, particularly for ancient domain documents with the recognition accuracy improved by 7 percent. Meanwhile, the combination of the two compression methods largely reduces the storage size of language models with little loss of recognition accuracy.  相似文献   

16.
This paper describes a robust context integration model for on-line handwritten Japanese text recognition. Based on string class probability approximation, the proposed method evaluates the likelihood of candidate segmentation–recognition paths by combining the scores of character recognition, unary and binary geometric features, as well as linguistic context. The path evaluation criterion can flexibly combine the scores of various contexts and is insensitive to the variability in path length, and so, the optimal segmentation path with its string class can be effectively found by Viterbi search. Moreover, the model parameters are estimated by the genetic algorithm so as to optimize the holistic string recognition performance. In experiments on horizontal text lines extracted from the TUAT Kondate database, the proposed method achieves the segmentation rate of 0.9934 that corresponds to a f-measure and the character recognition rate of 92.80%.  相似文献   

17.
Stop word location and identification for adaptive text recognition   总被引:2,自引:0,他引:2  
Abstract. We propose a new adaptive strategy for text recognition that attempts to derive knowledge about the dominant font on a given page. The strategy uses a linguistic observation that over half of all words in a typical English passage are contained in a small set of less than 150 stop words. A small dictionary of such words is compiled from the Brown corpus. An arbitrary text page first goes through layout analysis that produces word segmentation. A fast procedure is then applied to locate the most likely candidates for those words, using only widths of the word images. The identity of each word is determined using a word shape classifier. Using the word images together with their identities, character prototypes can be extracted using a previously proposed method. We describe experiments using simulated and real images. In an experiment using 400 real page images, we show that on average, eight distinct characters can be learned from each page, and the method is successful on 90% of all the pages. These can serve as useful seeds to bootstrap font learning. Received October 8, 1999 / Revised March 29, 2000  相似文献   

18.
19.
Video text detection and segmentation for optical character recognition   总被引:1,自引:0,他引:1  
In this paper, we present approaches to detecting and segmenting text in videos. The proposed video-text-detection technique is capable of adaptively applying appropriate operators for video frames of different modalities by classifying the background complexities. Effective operators such as the repeated shifting operations are applied for the noise removal of images with high edge density. Meanwhile, a text-enhancement technique is used to highlight the text regions of low-contrast images. A coarse-to-fine projection technique is then employed to extract text lines from video frames. Experimental results indicate that the proposed text-detection approach is superior to the machine-learning-based (such as SVM and neural network), multiresolution-based, and DCT-based approaches in terms of detection and false-alarm rates. Besides text detection, a technique for text segmentation is also proposed based on adaptive thresholding. A commercial OCR package is then used to recognize the segmented foreground text. A satisfactory character-recognition rate is reported in our experiments.Published online: 14 December 2004  相似文献   

20.
Text displayed in a video is an essential part for the high-level semantic information of the video content. Therefore, video text can be used as a valuable source for automated video indexing in digital video libraries. In this paper, we propose a workflow for video text detection and recognition. In the text detection stage, we have developed a fast localization-verification scheme, in which an edge-based multi-scale text detector first identifies potential text candidates with high recall rate. Then, detected candidate text lines are refined by using an image entropy-based filter. Finally, Stroke Width Transform (SWT)- and Support Vector Machine (SVM)-based verification procedures are applied to eliminate the false alarms. For text recognition, we have developed a novel skeleton-based binarization method in order to separate text from complex backgrounds to make it processible for standard OCR (Optical Character Recognition) software. Operability and accuracy of proposed text detection and binarization methods have been evaluated by using publicly available test data sets.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号