Similar articles
 20 similar articles found (search time: 15 ms)
1.
N. Tripathy, U. Pal. Sadhana, 2006, 31(6): 755-769
Segmentation of handwritten text into lines, words and characters is an important step in handwritten text recognition. In this paper we propose a water reservoir concept-based scheme for segmenting unconstrained Oriya handwritten text into individual characters. The text image is first segmented into lines, and the lines are then segmented into individual words. For line segmentation, the document is divided into vertical stripes; the stripe width is calculated by analysing the heights of the water reservoirs obtained from different components of the document. Stripe-wise horizontal histograms are then computed, and the relationship between the peak and valley points of the histograms is used for line segmentation. Text lines are segmented into words based on vertical projection profiles and structural features of Oriya characters. For character segmentation, the isolated and connected (touching) characters in a word are first detected. Touching characters are then segmented using structural, topological and water reservoir concept-based features. In our experiments, the proposed “touching character” segmentation module achieved 96.7% accuracy on two-character touching strings.
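The histogram step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' scheme: it assumes a binary image given as a list of rows (1 = ink), uses a single full-width stripe instead of several vertical stripes, and treats rows with no ink as valleys. The names `horizontal_histogram` and `segment_lines` are hypothetical.

```python
def horizontal_histogram(image):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(image, threshold=0):
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    A row belongs to a text line when its ink count exceeds `threshold`;
    rows at or below the threshold are treated as valleys between lines.
    """
    lines = []
    start = None
    for y, count in enumerate(horizontal_histogram(image)):
        if count > threshold and start is None:
            start = y                      # entering a text line
        elif count <= threshold and start is not None:
            lines.append((start, y))       # leaving a text line
            start = None
    if start is not None:                  # line runs to the bottom edge
        lines.append((start, len(image)))
    return lines


# Toy 5x4 "page": two text lines separated by one empty row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
print(segment_lines(page))  # [(1, 3), (4, 5)]
```

A stripe-wise variant would apply the same function to column slices of the image and then reconcile the per-stripe boundaries, which is where the peak-valley relationship mentioned in the abstract comes in.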

2.
The Imaging Science Journal (《成像科学杂志》), 2013, 61(3): 177-182
In composite document images, handwritten and printed text often overlaps printed lines. The problem becomes critical when the lines are obscure and broken at multiple positions. Line removal is therefore an unavoidable pre-processing stage in the development of robust object recognisers. Moreover, restoring characters damaged by line removal remains an open problem. This paper presents a new approach that detects and removes unwanted printed lines at any position in the text image without distorting characters, so that no restoration stage is needed. The proposed technique is based on connected component analysis. Experiments are conducted on single-line images that were scanned and manually extracted from several documents and forms. It is demonstrated that the approach handles line removal equally well for printed and handwritten text in any language while circumventing the restoration stage. Promising results are reported in comparison with state-of-the-art methods from other researchers.

3.
4.
Text mining has become a major research topic, and text classification is an important task within it for finding relevant information in new documents. This paper presents a semantic word processing technique for text categorization that uses semantic keywords instead of independent features of the keywords in the documents, which reduces the dimensionality of the search space. The Back Propagation Lion algorithm (BPLion) is also proposed to overcome the problem of updating the neuron weights. The proposed text classification methodology is evaluated on two data sets, 20 Newsgroup and Reuters. The performance of BPLion is analysed in terms of sensitivity, specificity, and accuracy, and compared with that of existing works. The results show that the proposed BPLion algorithm and semantic processing methodology classify documents with less training time and a higher classification accuracy of 90.9%.

5.
This article presents a new method for the binarization of color document images. First, the colors of the document image are reduced to a small number using a new color reduction technique. Specifically, this technique estimates the dominant colors and then assigns the original image colors to them so that the background and text components become uniform. Each dominant color defines a color plane in which the connected components (CCs) are extracted. Next, a CC filtering procedure is applied in each color plane, followed by a grouping procedure. At the end of this stage, blocks of CCs are constructed, which are then redefined by obtaining the direction of connection (DOC) property for each CC. Using the DOC property, the blocks of CCs are classified as text or nontext. The identified text blocks are binarized using suitable binarization techniques, and the remaining pixels are treated as background. The final result is a binary image that always contains black characters on a white background, independently of the original colors of each text block. The proposed approach can also be used to binarize noisy color (or gray-scale) document images. Several experiments that confirm the effectiveness of the proposed technique are presented. © 2007 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 16, 262–274, 2006

6.
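Liang Yongdi (梁勇第). Packaging Engineering (《包装工程》), 2018, 39(24): 136-140
Objective: To explore how, in modern logo design, a work can carry a rich sense of national tradition while still feeling contemporary. Methods: The formal language of seal carving is drawn upon, including the typical layouts of red-character (zhuwen) and white-character (baiwen) seals, the contrast between sparse and dense areas on the seal face, and stacked spatial arrangements. Conclusion: Works designed with Chinese characters as elements closely resemble seal carving in design conception and spatial composition, since both develop a creative idea “within a square inch”. Logo design can therefore draw effectively on the formal rules of seal carving.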

7.
8.
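State space method for the analysis of skew frame-shear wall-thin-walled tube structures   Cited by: 8 (self-citations: 1, citations by others: 7)
Hu Qiping (胡启平), Zhang Hua (张华). Engineering Mechanics (《工程力学》), 2006, 23(4): 125-129
Using continuization along the height, a continuous computational model is established for the cooperative analysis of skew frame-shear wall-thin-walled tube structures. The coordinate along the height of the structure is treated as a time coordinate, from which the state space formulation of the problem is derived. The state vector is obtained by the methods of state space theory, and the initial state vector is determined from the boundary conditions of the structure, which yields the lateral displacements and internal forces of each structural element. A numerical example is given and compared with the results of other methods. The method has a low computational cost and relatively high accuracy, is convenient to apply, and extends readily to building structures with variable cross-sections.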
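The core idea of treating the height coordinate like time can be sketched generically: for a linear system dy/dx = A y, the state at height x follows from the initial state as y(x) = exp(A x) y(0). The sketch below is an illustration under stated assumptions, not the paper's formulation: the system matrix `A` is a placeholder (the abstract does not give the actual matrix), the initial state is supplied directly instead of being solved from boundary conditions, and the matrix exponential is a truncated Taylor series that is only adequate for small, well-scaled matrices.

```python
import math


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mat_exp(a, terms=30):
    """Approximate exp(a) by a truncated Taylor series: sum of a^k / k!."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]          # current term a^k / k!, starts at I
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


def propagate_state(a, y0, x):
    """Return y(x) = exp(a * x) y0 for the linear system dy/dx = a y."""
    transfer = mat_exp([[v * x for v in row] for row in a])
    return [sum(transfer[i][j] * y0[j] for j in range(len(y0))) for i in range(len(y0))]


# Placeholder 2x2 system with a known closed-form solution:
# y(x) = [cos x, -sin x] for y(0) = [1, 0].
A = [[0.0, 1.0], [-1.0, 0.0]]
y = propagate_state(A, [1.0, 0.0], 1.0)
print(y)
```

In a structural application the unknown components of y(0) would be found by writing the transfer relation between the base and the top of the structure and imposing the boundary conditions at both ends.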

9.
A new technique for color reduction of complex document images is presented in this article. It significantly reduces the number of colors in the document image (to fewer than 15 in most cases) so as to obtain solid characters and uniform local backgrounds. The technique can therefore be used as a preprocessing step by text information extraction applications. Specifically, using the edge map of the document image, a representative set of samples is chosen to construct a 3D color histogram. Based on these samples in the 3D color space, a relatively large number of colors (usually no more than 100) is obtained with a simple clustering procedure. The final colors are obtained by applying a mean-shift based procedure. In addition, an edge-preserving smoothing filter is used as a preprocessing stage, which significantly enhances the quality of the initial image. Experimental results demonstrate the method's capability of producing correctly segmented complex color documents in which the character elements can easily be extracted as connected components. © 2009 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 19, 14–26, 2009

10.
In this article, we develop a Bayesian model to characterize text line and text block structures in document images using text word bounding boxes. We pose the extraction problem as finding the text lines and text blocks that maximize the Bayesian probability of the text lines and text blocks given the text word bounding boxes. In particular, we derive the probabilistic linear displacement model (PLDM) to model text line structures from text word bounding boxes. We also develop an augmented PLDM to characterize text block structures from text line bounding boxes. By systematically gathering statistics from a large population of document images, we validate our models experimentally and determine the proper model parameters. We design and implement an iterative algorithm that uses these probabilistic models to extract the text lines and text blocks. The quantitative performance of the algorithm is reported in terms of the rates of missed, false, correct, splitting, merging, and spurious detections of text lines and text blocks. © 1996 John Wiley & Sons, Inc.

11.
Document image analysis: A primer   Cited by: 1 (self-citations: 0, citations by others: 1)

12.
In this paper, we investigate whether a semantic representation of patent documents adds value to a multi-dimensional visual exploration of a patent landscape, compared with traditional approaches that use tf–idf (term frequency–inverse document frequency). Word embeddings from a pre-trained word2vec model created from patent text are used to calculate pairwise similarities in order to represent each document in the semantic space. A hierarchical clustering method is then applied to create several semantic aggregation levels for a collection of patent documents. For visual exploration, we have integrated multiple interaction metaphors that combine semantics and additional metadata to improve hierarchical exploration of large document collections.
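For readers unfamiliar with the tf–idf baseline mentioned above, the following is a minimal sketch of tf–idf weighting with cosine similarity between documents. It is a textbook variant (relative term frequency times log(N / document frequency)), not necessarily the weighting used in the paper, and it assumes non-empty, already tokenized documents. The toy documents and function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (dict: term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    ["battery", "electrode", "cell"],
    ["battery", "anode", "cell"],
    ["engine", "piston", "valve"],
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```

Because tf–idf only matches identical terms, documents 0 and 2 score exactly zero here; an embedding-based representation would instead assign nonzero similarity to related but different words, which is the contrast the abstract investigates.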

13.
In recent years, the volume of information in digital form has increased tremendously owing to the popularity of the World Wide Web. As a result, techniques for extracting useful information from large collections of data, and particularly documents, have become more necessary and more challenging. Text clustering is one such technique: it divides a set of text documents into clusters (groups) so that documents within the same cluster are closely related, whereas documents in different clusters are as different as possible. Clustering depends on measuring the content (i.e., words) of a document in terms of relevance. However, because documents usually contain a large number of words, some of them may be redundant or irrelevant to the topic under consideration. This can confuse and complicate the clustering process and make it less accurate. Feature selection methods have therefore been employed to reduce data dimensionality by selecting the most relevant features. In this study, we developed a text document clustering optimization model using a novel genetic frog-leaping algorithm that efficiently clusters text documents based on selected features. The proposed approach is based on two metaheuristic algorithms: a genetic algorithm (GA) and a shuffled frog-leaping algorithm (SFLA). The GA performs feature selection, and the SFLA performs clustering. To evaluate its effectiveness, the proposed approach was tested on a well-known text document dataset: the “20Newsgroup” dataset from the University of California Irvine Machine Learning Repository. Comparison and analysis of multiple experiments showed that the proposed algorithm clustered the 20Newsgroup documents considerably better than classical K-means clustering. This improvement, however, comes at the cost of longer computation time.

14.
15.
The paper presents the large-deformation flexural response of composite laminated skew plates subjected to uniform transverse pressure. Third-order shear deformation theory (TSDT) and von Kármán nonlinearity are used for the analysis. The skew domain is mapped onto a square domain, and a finite-degree double Chebyshev series is used to discretize the space domain. The solution technique requires no grid generation. The nonlinear equations are linearized using a quadratic extrapolation technique, and the behavior of moderately thick laminated composite skew plates is studied. The effects of geometric nonlinearity, transverse shear, boundary conditions, aspect ratio and modular ratio on the behavior of laminated composite skew plates are discussed in detail.

16.
A finite element dynamic stability analysis of laminated composite skew structures subjected to in-plane pulsating forces is carried out based on higher-order shear deformation theory (HSDT). The two boundaries of the instability regions are determined using the method proposed by Bolotin. The numerical results obtained for square and skew plates with or without a central cutout are in good agreement with those reported by other investigators. The new results for laminated skew plate structures with a cutout mainly show the effect of interactions between the skew angle and other parameters, for example cutout size, layer fiber angle and thickness-to-length ratio. The effect of the magnitude of the periodic in-plane load on the dynamic instability index is also investigated.

17.
An efficient solution technique is proposed for the three-dimensional boundary element modelling of half-space problems. The proposed technique uses alternative fundamental solutions of the half-space (Mindlin's solutions for the isotropic case) and full-space (Kelvin's solutions) problems. Three-dimensional infinite boundary elements are frequently employed when stresses at internal points must be evaluated. In contrast to published works, the proposed technique avoids strongly singular line integrals, and the discretization of the infinite elements is independent of the finite boundary elements. The algorithm also improves numerical accuracy while reducing computation time. Illustrative numerical examples for typical isotropic and transversely isotropic half-space problems demonstrate the potential applications of the proposed formulations. Incidentally, the results of these examples also provide a parametric study for the imperfect contact problem. Copyright © 2001 John Wiley & Sons, Ltd.

18.
A textual database deals with the retrieval and manipulation of documents. It allows a user to search complete documents or parts of documents on-line, rather than attributes of documents. Just as a formatted database uses a data model as its underlying structure, a textual database has to be built upon a document model. In this paper, a document model called the ECHO model is proposed. The ECHO model provides a document representation, called the ECHO structure, for expressing documents, together with operations on that representation for expressing queries and manipulations on documents. It can provide multiple document structures for a document, a flexible search unit for retrieving textual information, and subrange search on a textual database. In addition, the ECHO structure is relatively easy to maintain. An architecture for a textual database based on the ECHO model is also proposed. To improve query performance, a refined character inversion method, called ARCIM, is proposed as the text-access method of the Chinese textual database. ARCIM retrieves texts faster than a simple inversion method and requires less space overhead.
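The abstract does not describe how ARCIM works, so the sketch below shows only the simple character inversion baseline it is compared against, under assumptions of my own: each character maps to a set of (document id, position) postings, and a query string matches where its characters occur at consecutive positions. The function names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_char_index(docs):
    """Map each character to a set of (doc_id, position) postings."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for pos, ch in enumerate(text):
            index[ch].add((doc_id, pos))
    return index


def search(index, query):
    """Return sorted ids of documents that contain `query` as a contiguous string."""
    if not query:
        return []
    # Start from every occurrence of the first character ...
    candidates = index.get(query[0], set())
    # ... and keep only those followed by the remaining characters in order.
    for offset, ch in enumerate(query[1:], start=1):
        postings = index.get(ch, set())
        candidates = {(d, p) for (d, p) in candidates if (d, p + offset) in postings}
    return sorted({d for d, _ in candidates})


docs = ["文件检索系统", "全文检索", "数据库"]
index = build_char_index(docs)
print(search(index, "检索"))  # [0, 1]
```

Character-level postings suit Chinese text because no word segmentation is needed before indexing; the cost is large posting sets for frequent characters, which is the kind of space and time overhead a refined method would aim to reduce.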

19.
Journal of the Chinese Institute of Engineers (《中国工程学刊》), 2012, 35(5): 509-514
Current classification methods are mostly based on the vector space model, which accounts only for term frequency in the documents and ignores important semantic relationships between key terms. We propose a system that uses integrated ontologies and natural language processing techniques to index texts. The traditional word matrix is replaced by a concept-based matrix. For this purpose, we have developed fully automated methods for mapping keywords to their corresponding ontology concepts. The support vector machine, a successful machine learning technique, is used for classification. Experimental results show that the proposed method improves text classification performance significantly.
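The step from a word matrix to a concept-based matrix can be sketched as follows. This is only an illustration: the paper maps keywords to ontology concepts automatically, whereas the sketch assumes a small hand-written lookup table, and the mapping `KEYWORD_TO_CONCEPT` and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical keyword -> concept table standing in for a real ontology.
KEYWORD_TO_CONCEPT = {
    "car": "vehicle",
    "truck": "vehicle",
    "apple": "fruit",
    "pear": "fruit",
}


def concept_vector(tokens, keyword_to_concept):
    """Count ontology concepts for one document; unmapped tokens are ignored."""
    return Counter(keyword_to_concept[t] for t in tokens if t in keyword_to_concept)


def concept_matrix(docs, keyword_to_concept):
    """Return (sorted concept list, one row of concept counts per document)."""
    vectors = [concept_vector(doc, keyword_to_concept) for doc in docs]
    concepts = sorted(set(keyword_to_concept.values()))
    return concepts, [[vec[c] for c in concepts] for vec in vectors]


docs = [["car", "truck", "apple"], ["pear", "apple", "the"]]
concepts, matrix = concept_matrix(docs, KEYWORD_TO_CONCEPT)
print(concepts)  # ['fruit', 'vehicle']
print(matrix)    # [[1, 2], [2, 0]]
```

Rows of such a matrix could then be passed to any classifier in place of term-frequency vectors; "car" and "truck" now contribute to the same feature, which is the semantic relationship a plain vector space model misses.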

20.
This paper concerns the development and use of a new interdisciplinary graphical approach to the statistical analysis of sentence-structure complexity for scientometric purposes. A scheme in three-dimensional space (a barycentric plot) graphically represents, for the sentences of several monolingual corpora of scientific research text, the correlations between the number of characters, the number of words, and the number of complex-syllable words. The barycentric plots not only drastically increase the visual information content of a given corpus; under equal text-corpus conditions, they also support comparative analysis across subject, section, author style, journal, field, etc. As the present study illustrates, the proposed graphical approach can have broad implications and practical applications not only in scientometrics but also in statistical linguistics, stylistic text research, and informetric research. The article also explores interdisciplinary research and applications across different areas of knowledge.
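One common way to build a barycentric (ternary) representation of three counts is to normalize them so they sum to 1 and then map the resulting weights onto a triangle. The sketch below assumes that construction; the abstract does not state the exact normalization used, and how "complex-syllable words" are counted is left to the caller. The function names are hypothetical.

```python
import math


def barycentric(chars, words, complex_words):
    """Normalize three non-negative counts into weights that sum to 1."""
    total = chars + words + complex_words
    if total == 0:
        raise ValueError("at least one count must be positive")
    return (chars / total, words / total, complex_words / total)


def to_triangle_xy(weights):
    """Map barycentric weights (a, b, c) to 2D coordinates.

    Triangle vertices: a -> (0, 0), b -> (1, 0), c -> (0.5, sqrt(3) / 2).
    """
    _, b, c = weights
    return (b + 0.5 * c, (math.sqrt(3) / 2) * c)


# One sentence with 80 characters, 15 words and 5 complex-syllable words.
weights = barycentric(80, 15, 5)
print(weights)
print(to_triangle_xy(weights))
```

Plotting one point per sentence then gives a cloud whose position and spread inside the triangle can be compared across authors, journals or fields.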
