Similar Documents
20 similar documents found (search time: 31 ms)
1.
A Nom historical document recognition system is being developed for digital archiving that uses image binarization, character segmentation, and character recognition. It incorporates two versions of off-line character recognition: one for automatic recognition of scanned and segmented character patterns (7660 categories) and the other for user handwritten input (32,695 categories). This separation is used because including rarely occurring categories in automatic recognition increases the misrecognition rate, given the lack of reliable statistics on the Nom language; moreover, a user must be able to check the results, identify the correct categories from an extended set of categories, and input characters by hand. Both versions use the same recognition method but are trained on different sets of training patterns. Recursive XY cut and Voronoi diagrams are used for segmentation; a k-d tree and generalized learning vector quantization are used for coarse classification; and the modified quadratic discriminant function is used for fine classification. The system provides an interface through which a user can check the results, change binarization methods, rectify segmentation, and input correct character categories by hand. Evaluation on a limited number of Nom historical documents, after ground truths were prepared for them, showed that the two stages of recognition, together with user checking and correction, improved the recognition results significantly.
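To make the coarse-to-fine classification pipeline concrete, here is a minimal Python sketch: a k-d tree shortlists candidate categories (the coarse stage), and a simple per-class quadratic (Gaussian) score stands in for the modified quadratic discriminant function in the fine stage. The prototype vectors, feature dimensionality, and class statistics are synthetic assumptions for illustration, not the system's actual models.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical data: one prototype feature vector per character category.
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(7660, 64))   # 7660 categories, 64-d features (assumed)
class_means = prototypes                   # stand-in class statistics
class_vars = np.ones_like(prototypes)      # stand-in per-class variances

tree = cKDTree(prototypes)                 # coarse-classification index

def classify(x, k=20):
    """Coarse-to-fine classification: k-d tree shortlist, then a
    quadratic (Gaussian) score standing in for MQDF."""
    _, candidates = tree.query(x, k=k)     # coarse stage: k nearest prototypes
    # Fine stage: quadratic discriminant score per candidate category.
    scores = [np.sum((x - class_means[c]) ** 2 / class_vars[c]) for c in candidates]
    return candidates[int(np.argmin(scores))]

query = rng.normal(size=64)
print("predicted category index:", classify(query))
```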

2.
Motivated by the need to automatically index and analyze a huge number of documents in Ottoman divan poetry, and to discover new knowledge that preserves and keeps this heritage alive, we propose a novel method for segmenting and retrieving words in Ottoman divans. Documents in Ottoman are difficult to segment into words without prior knowledge of the words. Using the idea that divans have multiple copies (versions) produced by different writers in different writing styles, and that word segmentation in some of those versions may be considerably easier than in others, the versions that are difficult, if not impossible, to segment with traditional techniques are segmented using information carried over from the simpler version. One version of a document is used as the source dataset and another version of the same document as the target dataset. Words in the source dataset are automatically extracted and used as queries to be spotted in the target dataset for detecting word boundaries. We present the idea of cross-document word matching for the novel task of segmenting historical documents into words, propose a matching scheme based on possible combinations of sequences of sub-words, and improve the performance of simple features by considering the words in context. The method is applied to two versions of the Layla and Majnun divan by Fuzuli. The results show that the proposed word-matching-based segmentation method is promising for finding word boundaries and retrieving words across documents.
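The cross-document word-spotting step can be pictured with a toy sketch: a word from the source version, represented as a sequence of sub-word feature vectors, is slid over the sub-word sequence of the target version, and the best-matching span becomes a word-boundary hypothesis. The feature vectors, the plain Euclidean distance, and the acceptance threshold are illustrative assumptions; the paper's scheme matches combinations of sub-word sequences and uses contextual features.

```python
import numpy as np

def spot_word(query_subwords, target_subwords, max_dist=5.0):
    """Slide the query (a sequence of sub-word feature vectors) over the
    target sequence and return the best-matching span (start, end, distance),
    or None if no span is close enough. A toy stand-in for the paper's
    combination-of-sub-words matching."""
    q = np.asarray(query_subwords)
    t = np.asarray(target_subwords)
    n, m = len(t), len(q)
    best = None
    for start in range(n - m + 1):
        window = t[start:start + m]
        dist = float(np.linalg.norm(window - q, axis=1).sum())
        if best is None or dist < best[2]:
            best = (start, start + m, dist)
    if best is not None and best[2] <= max_dist * m:
        return best
    return None

rng = np.random.default_rng(1)
target = rng.normal(size=(50, 16))                         # 50 sub-word descriptors (assumed)
query = target[10:13] + 0.05 * rng.normal(size=(3, 16))    # a noisy copy of a source word
print(spot_word(query, target))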

3.
4.
5.
An algorithm for the recognition of handwritten characters based on the position-width-pulse method of curve recognition is presented in this article. An algorithm for transforming characters into curves is given, and a recognition procedure is described. Numerical estimators of the proximity S_w between the curves that graphically map the characters to be recognized and the printed characters used as references are calculated. A recognized character is assigned to the corresponding reference character on the basis of the minimum value of S_w, subject to a specified reliability.
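As a hedged illustration of the decision rule, the sketch below computes a simple proximity score between a sampled curve and each reference curve and assigns the character to the reference with the minimum score, rejecting the decision when the score exceeds a reliability threshold. The mean-absolute-difference proximity here is only a stand-in for the article's S_w estimator.

```python
import numpy as np

def proximity(curve, reference):
    """A simple proximity estimator between two sampled curves
    (mean absolute difference); a stand-in for the article's S_w."""
    return float(np.mean(np.abs(np.asarray(curve) - np.asarray(reference))))

def recognize(curve, references, reliability=0.5):
    """Assign the curve to the reference character with the minimum
    proximity, rejecting the decision if it exceeds a reliability threshold."""
    scores = {label: proximity(curve, ref) for label, ref in references.items()}
    label, s_w = min(scores.items(), key=lambda kv: kv[1])
    return (label, s_w) if s_w <= reliability else (None, s_w)

refs = {"A": np.linspace(0, 1, 32), "B": np.sin(np.linspace(0, 3, 32))}
sample = np.linspace(0, 1, 32) + 0.02   # a slightly perturbed copy of reference "A"
print(recognize(sample, refs))
```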

6.
7.
8.
9.
New healthcare technologies are emerging as society ages, and the development of smart homes for monitoring elders' activities is at their center. Identifying a resident's activities in an apartment is an important module of such systems. The dense sensing approach embeds sensors in the environment to report detected events continuously; the events are segmented and analyzed by classifiers to identify the corresponding activity. Although several methods have been introduced in recent years for detecting simple activities, recognizing complex ones requires more effort. Because each activity has a different duration and event density, finding the best segment size is one challenge in activity detection; using classifiers capable of detecting both simple and interleaved activities is another. In this paper, we devise a two-phase approach called CARER (Complex Activity Recognition using Emerging patterns and Random forest). In the first phase, emerging patterns are mined and various features of the activities are extracted to build a model using the random forest technique. In the second phase, the sequences of events are segmented dynamically by considering their recency and sensor correlation, and the segments are analyzed by the model from the previous phase to recognize both simple and complex activities. We examined the performance of the approach on the CASAS dataset. First, we investigated several classifiers; the combination of emerging patterns and the random forest provided the highest accuracy. Then we compared CARER with a static window approach that uses a Hidden Markov Model; to make the comparison fair, we replaced the dynamic segmentation module of CARER with the static one. The results showed more than 12% improvement in f-measure. Finally, we compared our work with Dynamic sensor segmentation for real-time activity recognition, which uses dynamic segmentation; the f-measure showed up to 12.73% improvement.
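The classification half of such a pipeline can be sketched with a random forest over per-segment features, as below. The feature vectors, activity labels, and forest size are synthetic assumptions; CARER additionally feeds mined emerging patterns into the model and segments the event stream dynamically.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
# Hypothetical per-segment features (e.g., event counts per sensor, duration,
# time of day); the real CARER features also include mined emerging patterns.
X_train = rng.random((500, 12))
y_train = rng.integers(0, 5, size=500)      # five activity labels, synthetic

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

segment = rng.random((1, 12))               # features of one dynamically cut segment
print("predicted activity:", model.predict(segment)[0])
```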

10.
A new approach to topology-preserving image segmentation using the Beltrami representation of a shape is proposed in this paper. With the proposed model, the target object can be segmented from the input image by a region of user-prescribed topology. Given a target image I, a template image J is constructed and then deformed with respect to the Beltrami representation. The deformation of J is designed so that the topology of the segmented region is preserved as that of the object interior in J. The topology-preserving property of the deformation is guaranteed by imposing only one constraint on the Beltrami representation, which is easy to handle. Introducing the Beltrami representation also allows large deformations of the topological prior J, so that it can be a very simple image, such as an image of disks, a torus, or disjoint disks. Hence, prior shape information about I is unnecessary for the proposed model. Additionally, the proposed model can easily be combined with selective segmentation, in which landmark constraints can be imposed interactively to meet practical needs (e.g., in medical imaging). The high accuracy and stability of the proposed model on different segmentation tasks are validated by numerical experiments on both artificial and real images.
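For orientation, the topology-preserving constraint on a Beltrami representation is typically that the Beltrami coefficient of the deformation has modulus strictly below 1 everywhere. The sketch below computes a discrete Beltrami coefficient of a 2-D map by finite differences and checks that bound; the grid, the test deformation, and the finite-difference scheme are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def beltrami_coefficient(u, v, hx=1.0, hy=1.0):
    """Finite-difference Beltrami coefficient mu = f_zbar / f_z of the 2-D map
    f = (u, v). |mu| < 1 everywhere indicates an orientation-preserving
    (hence topology-preserving) deformation."""
    uy, ux = np.gradient(u, hy, hx)
    vy, vx = np.gradient(v, hy, hx)
    fx = ux + 1j * vx
    fy = uy + 1j * vy
    fz = 0.5 * (fx - 1j * fy)
    fzbar = 0.5 * (fx + 1j * fy)
    return fzbar / (fz + 1e-12)   # small constant avoids division by zero

# A gentle deformation of the identity map on a grid (assumed example).
y, x = np.mgrid[0:64, 0:64].astype(float)
u = x + 2.0 * np.sin(y / 10.0)
v = y + 1.0 * np.cos(x / 12.0)
mu = beltrami_coefficient(u, v)
print("max |mu| =", float(np.abs(mu).max()), "(topology preserved if < 1)")
```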

11.
The current generation of portable mobile devices incorporates various types of sensors that open up new areas for the analysis of human behavior. In this paper, we propose a method for recognizing human physical activity from time series collected by a single tri-axial accelerometer of a smartphone. The method first solves a problem of online time series segmentation, assuming that each meaningful segment corresponds to one fundamental period of motion. To extract the fundamental period, we construct the phase trajectory matrix and apply principal component analysis. The obtained segments correspond to various types of human physical activity. To recognize these activities, we use the k-nearest neighbor algorithm, with a neural network as an alternative. We verify the accuracy of the proposed algorithms by testing them on the WISDM dataset of labeled accelerometer time series from thirteen users. The results show that our method achieves high precision, with nearly 96% recognition accuracy when the segmentation and k-nearest neighbor algorithms are combined.
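A simplified reading of the pipeline is sketched below: the signal is embedded into a phase-trajectory (Hankel) matrix, PCA via SVD extracts the dominant component, the fundamental period is estimated from its autocorrelation, and per-segment features are classified with k-nearest neighbors. The synthetic signal, the feature set, and the labels are assumptions for illustration and do not reproduce the WISDM experiments.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fundamental_period(signal, window=50):
    """Estimate the fundamental period of a quasi-periodic signal: embed it
    into a phase-trajectory (Hankel) matrix, project onto the first principal
    component, and locate the first autocorrelation peak."""
    n = len(signal) - window + 1
    traj = np.stack([signal[i:i + window] for i in range(n)])   # trajectory matrix
    traj = traj - traj.mean(axis=0)
    _, _, vt = np.linalg.svd(traj, full_matrices=False)         # PCA via SVD
    pc1 = traj @ vt[0]                                          # first principal component
    ac = np.correlate(pc1, pc1, mode="full")[len(pc1) - 1:]
    peaks = [i for i in range(1, len(ac) - 1) if ac[i] > ac[i - 1] and ac[i] > ac[i + 1]]
    return peaks[0] if peaks else window

# Synthetic accelerometer magnitude: walking-like oscillation plus noise.
rng = np.random.default_rng(3)
t = np.arange(1000)
signal = np.sin(2 * np.pi * t / 40) + 0.1 * rng.normal(size=t.size)
period = fundamental_period(signal)
print("estimated fundamental period:", period)

# Per-segment features (mean, std, energy) classified with k-NN; labels are synthetic.
segments = signal[: (len(signal) // period) * period].reshape(-1, period)
features = np.stack([segments.mean(1), segments.std(1), (segments ** 2).mean(1)], axis=1)
labels = rng.integers(0, 2, size=len(features))
knn = KNeighborsClassifier(n_neighbors=3).fit(features, labels)
print("predicted activity for first segment:", knn.predict(features[:1])[0])
```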

12.
13.
In this paper, we investigate the relative effect of two strategies of language resource addition for Japanese morphological analysis, a joint task of word segmentation and part-of-speech tagging. The first strategy is adding entries to the dictionary, and the second is adding annotated sentences to the training corpus. The experimental results show that adding annotated sentences to the training corpus is better than adding entries to the dictionary. In particular, adding annotated sentences is especially efficient when we add new words with contexts of several real occurrences as partially annotated sentences, i.e., sentences in which only some words are annotated with word boundary information. Based on this finding, we performed real annotation experiments on invention disclosure texts and observed word segmentation accuracy. Finally, we investigated various cases of language resource addition and introduced the notions of non-maleficence, asymmetricity, and additivity of language resources for a task. In the word segmentation case, we found that language resource addition is non-maleficent (adding new resources causes no harm in other domains) and sometimes additive (adding new resources helps other domains). We conclude that it is reasonable for us, as NLP tool providers, to distribute only one general-domain model trained from all the language resources we have.

14.
This paper proposes a new fuzzy approach to image segmentation. L-interval-valued intuitionistic fuzzy sets (IVIFSs) are constructed from two L-fuzzy sets that correspond to the foreground (object) and the background of an image, where L denotes the number of gray levels in the image. The length of the membership interval of the IVIFS quantifies the influence of ignorance in the construction of the membership function. The threshold for an image is chosen by finding the IVIFS with the least entropy. Contributions also include a comparative study with ten other image segmentation techniques. The results obtained by each method have been systematically evaluated using well-known measures of segmentation quality, and the proposed method shows better results overall on these measures. Experiments also show that the results obtained by the proposed method are highly correlated with the ground truth images.
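A heavily simplified sketch of minimum-entropy fuzzy thresholding is given below: for every candidate threshold, gray-level memberships are derived from distances to the class means, an interval is added to model ignorance, and the threshold with the lowest aggregate fuzziness is kept. The membership, interval, and entropy formulas here are placeholders and differ from the paper's IVIFS construction.

```python
import numpy as np

def min_entropy_threshold(image, levels=256):
    """Pick the threshold that minimizes a simple interval-based fuzziness.
    The concrete membership/entropy definitions of the paper differ."""
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    g = np.arange(levels, dtype=float)
    best_t, best_e = None, np.inf
    for t in range(1, levels - 1):
        wb, wf = hist[:t].sum(), hist[t:].sum()
        if wb == 0 or wf == 0:
            continue
        mb = (g[:t] * hist[:t]).sum() / wb              # background mean gray level
        mf = (g[t:] * hist[t:]).sum() / wf              # foreground mean gray level
        mean = np.where(g < t, mb, mf)
        mu = 1.0 / (1.0 + np.abs(g - mean) / (levels - 1))   # lower membership
        delta = 0.1 * (1.0 - mu)                        # interval length (ignorance)
        nu = np.clip(1.0 - mu - delta, 0.0, 1.0)        # non-membership
        fuzziness = 1.0 - np.abs(mu - nu)               # entropy-like term per level
        e = (fuzziness * hist).sum() / hist.sum()
        if e < best_e:
            best_t, best_e = t, e
    return best_t

rng = np.random.default_rng(4)
img = np.clip(np.concatenate([rng.normal(60, 10, 5000), rng.normal(180, 15, 5000)]), 0, 255)
print("selected threshold:", min_entropy_threshold(img))
```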

15.
There has been growing interest in applying human computation – particularly crowdsourcing techniques – to assist in the solution of multimedia, image processing, and computer vision problems that are still too difficult to solve with fully automatic algorithms yet relatively easy for humans. In this paper we focus on a specific problem – object segmentation within color images – and compare different solutions that combine color image segmentation algorithms with human effort, either in the form of an explicit interactive segmentation task or through the implicit collection of valuable human traces with a game. We use Click’n’Cut, a friendly, web-based, interactive segmentation tool that allows segmentation tasks to be assigned to many users, and Ask’nSeek, a game with a purpose designed for object detection and segmentation. The two main contributions of this paper are: (i) we use the results of Click’n’Cut campaigns with different groups of users to examine and quantify the crowdsourcing loss incurred when an interactive segmentation task is assigned to paid crowd-workers, comparing their results with those obtained when computer vision experts perform the same tasks; (ii) since interactive segmentation tasks are inherently tedious and prone to fatigue, we compare the quality of the results obtained with Click’n’Cut against those obtained with a fun, interactive, and potentially less tedious game designed for the same purpose. We call this contribution the assessment of the gamification loss, since it refers to how much segmentation quality may be lost when switching to a game-based approach to the same task. We demonstrate that the crowdsourcing loss is significant when all the data points from workers are used, but decreases substantially (and becomes comparable to the quality of expert users performing similar tasks) after a modest amount of data analysis and the filtering out of users whose data are clearly not useful. We also show that, on the other hand, the gamification loss is significantly more severe: the quality of the results drops roughly by half when switching from a focused (yet tedious) task to a more fun and relaxed game environment.
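The loss measurements themselves reduce to comparing segmentation quality scores between user groups. The toy sketch below computes a Jaccard index against a ground-truth mask and a simple difference of mean scores as the "loss"; the masks and score values are invented for illustration, and the paper's evaluation uses its own quality measures and datasets.

```python
import numpy as np

def jaccard(mask_a, mask_b):
    """Jaccard index between two binary segmentation masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 1.0

def quality_loss(expert_scores, crowd_scores):
    """Average drop in segmentation quality when moving from experts to a
    crowd or to a game: the 'crowdsourcing loss' / 'gamification loss' idea,
    reduced here to a difference of mean scores."""
    return float(np.mean(expert_scores) - np.mean(crowd_scores))

# Assumed toy masks and scores, purely for illustration.
gt = np.zeros((10, 10), bool); gt[2:8, 2:8] = True
worker = np.zeros((10, 10), bool); worker[3:8, 2:9] = True
print("worker Jaccard vs. ground truth:", round(jaccard(worker, gt), 3))
print("loss:", quality_loss([0.92, 0.95], [0.81, 0.78]))
```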

16.
In this paper, we identify and solve a multi-join optimization problem for Arbitrary Feature-based social image Similarity JOINs (AFS-JOIN). Given two collections (R and S) of social images that carry visual, spatial, and textual (i.e., tag) information, the multi-join based on arbitrary features retrieves the pairs of images from different users that are visually or textually similar, or spatially close. To address this problem, we propose three methods to facilitate multi-join processing: 1) two baseline approaches (a naïve join approach and a maximal-threshold (MT)-based approach), and 2) a Batch Similarity Join (BSJ) method. In the BSJ method, given m users’ join requests, the requests are first converted and grouped into m″ clusters, which correspond to m″ join boxes, where m > m″. To speed up BSJ processing, the feature distance space is partitioned into cubes according to four segmentation schemes, and the image pairs falling in the cubes are indexed by a cube tree index; BSJ processing is thus transformed into searching, with the aid of the index, for the image pairs falling in the cubes affected by the m″ AFS-JOINs. An extensive experimental evaluation on real and synthetic datasets shows that the proposed BSJ technique outperforms the state-of-the-art solutions.
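A much simplified analogue of the cube-based speed-up is sketched below: the feature space is partitioned into cells of side eps, one collection is indexed by cell, and each item of the other collection is compared only against items in its own and neighboring cells. The single Euclidean feature space, the cell side, and the data are assumptions; the paper handles multiple feature types, join boxes, and a cube tree index.

```python
import numpy as np
from collections import defaultdict
from itertools import product

def grid_similarity_join(feats_r, feats_s, eps):
    """Toy feature-space similarity join: bucket S by grid cell of side eps,
    then compare each R item only against S items in its own and adjacent
    cells (sufficient because matching pairs lie within distance eps)."""
    index = defaultdict(list)
    for j, v in enumerate(feats_s):
        index[tuple((v // eps).astype(int))].append(j)
    pairs = []
    offsets = list(product((-1, 0, 1), repeat=feats_r.shape[1]))
    for i, u in enumerate(feats_r):
        cell = tuple((u // eps).astype(int))
        for off in offsets:
            for j in index.get(tuple(c + o for c, o in zip(cell, off)), []):
                if np.linalg.norm(u - feats_s[j]) <= eps:
                    pairs.append((i, j))
    return pairs

rng = np.random.default_rng(5)
R, S = rng.random((200, 3)), rng.random((200, 3))   # assumed 3-d feature vectors
print("similar pairs found:", len(grid_similarity_join(R, S, eps=0.05)))
```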

17.
Rapid advances in image acquisition and storage technology underline the need for real-time algorithms capable of solving large-scale image processing and computer vision problems. The minimum s-t cut problem, a classical combinatorial optimization problem, is a prominent building block in many vision and imaging algorithms such as video segmentation, co-segmentation, stereo vision, multi-view reconstruction, and surface fitting, to name a few. Finding a real-time algorithm that optimally solves this problem is therefore of great importance. In this paper, we introduce to computer vision Hochbaum’s pseudoflow (HPF) algorithm, which optimally solves the minimum s-t cut problem. We compare the performance of HPF, in terms of execution time and memory utilization, with three leading published algorithms: (1) Goldberg and Tarjan’s push-relabel (PRF); (2) Boykov and Kolmogorov’s augmenting paths (BK); and (3) Goldberg’s partial augment-relabel. While the common practice in computer vision is to use either the BK or the PRF algorithm for solving the problem, our results demonstrate that, in general, the HPF algorithm is more efficient and uses less memory than these three algorithms. This strongly suggests that HPF is a good option for many real-time computer vision problems that require solving the minimum s-t cut problem.
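To show how a minimum s-t cut is used for segmentation in practice, here is a minimal sketch that segments a toy 1-D "image" with a graph cut. It relies on networkx's generic minimum_cut routine rather than HPF, BK, or PRF, and the intensity-based capacities are assumptions chosen only for illustration.

```python
import networkx as nx
import numpy as np

# A tiny 1-D "image": dark pixels belong to the background, bright ones to the object.
pixels = np.array([0.1, 0.2, 0.15, 0.8, 0.9, 0.85])

G = nx.DiGraph()
for i, p in enumerate(pixels):
    # Terminal links: affinity to the object (source "s") and background (sink "t").
    G.add_edge("s", i, capacity=float(p))
    G.add_edge(i, "t", capacity=float(1.0 - p))
for i in range(len(pixels) - 1):
    # Smoothness links between neighboring pixels, in both directions.
    w = float(np.exp(-10 * (pixels[i] - pixels[i + 1]) ** 2))
    G.add_edge(i, i + 1, capacity=w)
    G.add_edge(i + 1, i, capacity=w)

cut_value, (obj_side, bkg_side) = nx.minimum_cut(G, "s", "t")
print("cut value:", cut_value)
print("object pixels:", sorted(n for n in obj_side if n != "s"))
```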

18.
Text representation is the essential task of transforming textual input into features that can later be used for further text mining and information retrieval tasks. The commonly used text representation models are Bag-of-Words (BOW) and the N-gram model. Nevertheless, these models have known issues that should be investigated: inaccurate semantic representation of text and the high dimensionality of word combinations. A pattern-based model named Frequent Adjacent Sequential Pattern (FASP) is introduced to represent text using a set of sequences of adjacent words that are frequently used across the document collection. The purpose of this study is to discover the similarity of textual patterns between documents, which can later be converted to a set of rules describing the main news event. FASP is based on the Pattern-Growth divide-and-conquer strategy, the main difference between FASP and the prior technique being in the pattern generation phase. The approach is tested against the BOW and N-gram text representation models using Malay- and English-language news datasets with different term weightings in the Vector Space Model (VSM). The findings demonstrate that the FASP model performs promisingly in finding similarities between documents, with an average vector size reduction of 34% against BOW and 77% against the N-gram model on the Malay dataset. Results on the English dataset are consistent, indicating that the FASP approach is also language independent.
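A bare-bones illustration of mining frequent adjacent word sequences is given below: patterns are grown length by length and kept when they occur in at least a minimum number of documents. The tiny document collection and support threshold are assumptions, and the paper's Pattern-Growth-based generation phase is more elaborate than this direct counting.

```python
from collections import Counter

def frequent_adjacent_patterns(documents, min_support=2, max_len=4):
    """Mine sequences of adjacent words that occur in at least `min_support`
    documents, growing patterns length by length (a simplified illustration
    of the FASP idea; the paper's generation phase is more involved)."""
    tokenized = [doc.lower().split() for doc in documents]
    patterns = {}
    for n in range(1, max_len + 1):
        counts = Counter()
        for words in tokenized:
            seen = {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
            counts.update(seen)                    # document frequency, not raw count
        frequent = {p: c for p, c in counts.items() if c >= min_support}
        if not frequent:
            break
        patterns.update(frequent)
    return patterns

docs = [
    "flood warning issued for the river basin",
    "the river basin flood warning was lifted",
    "heavy rain caused a flood in the river basin",
]
for pattern, df in sorted(frequent_adjacent_patterns(docs).items(), key=lambda kv: -len(kv[0]))[:5]:
    print(" ".join(pattern), "->", df)
```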

19.
We initiate a new line of investigation into online property-preserving data reconstruction. Consider a dataset which is assumed to satisfy various (known) structural properties; e.g., it may consist of sorted numbers, or points on a manifold, or vectors in a polyhedral cone, or codewords from an error-correcting code. Because of noise and errors, however, an (unknown) fraction of the data is deemed unsound, i.e., in violation of the expected structural properties. Can one still query the dataset in an online fashion and always be provided with sound data? In other words, can one design a filter which, when given a query for any item I in the dataset, returns a sound item J that, although not necessarily in the dataset, differs from I as infrequently as possible? No preprocessing is allowed and queries must be answered online. We consider the case of a monotone function. Specifically, the dataset encodes a function f : {1,…,n} → R that is at (unknown) distance ε from monotone, meaning that f can, and must, be modified at εn places to become monotone. Our main result is a randomized filter that can answer any query in O(log² n · log log n) time while modifying the function f at only O(εn) places. The amortized time over n function evaluations is O(log n). The filter works as stated with probability arbitrarily close to 1. We also provide an alternative filter with O(log n) worst-case query time and O(εn log n) function modifications. For reconstructing d-dimensional monotone functions of the form f : {1,…,n}^d → R, we present a filter that takes 2^O(d) (log n)^(4d−2) log log n time per query and modifies at most O(εn^d) function values (for constant d).
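For intuition about the filter contract (always return values consistent with some monotone function, online, without preprocessing), here is a deliberately naive sketch: it answers a query i with the running maximum of f over 1..i. It is always sound, but it offers neither the paper's polylogarithmic query time nor its O(εn) bound on the number of modified values.

```python
def make_naive_monotone_filter(f, n):
    """Naive online filter: on a query i, return g(i) = max(f(1), ..., f(i)).
    The answers are always consistent with a single monotone function g,
    regardless of query order, but each query costs O(i) time and the number
    of modified values is not bounded by O(eps*n) as in the paper's filter."""
    def g(i):
        assert 1 <= i <= n
        return max(f(j) for j in range(1, i + 1))
    return g

# f is sorted except for one corrupted position (index 4).
data = {1: 3, 2: 5, 3: 7, 4: 2, 5: 9, 6: 11}
f = data.__getitem__
g = make_naive_monotone_filter(f, 6)
print([g(i) for i in range(1, 7)])   # [3, 5, 7, 7, 9, 11] -- monotone, one value changed
```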

20.
