Similar Documents
20 similar documents found (search time: 15 ms)
1.
《微型机与应用》2018,(2):55-57
This paper studies a semantics-based method for segmenting the components of court judgment documents. Building on semantic understanding, it extracts from judgment documents information such as judgment metadata, plaintiff information, defendant information, case facts, plaintiff claims, defendant defenses, court-admitted evidence, court opinions, legal basis, and the verdict, thereby reducing judges' workload and helping people better understand cases. The construction of the system and the implementation of the segmentation are detailed in two parts: system construction mainly covers offline collection of judgment documents and corpus training, plus an online analysis module.

2.
Pattern Analysis and Applications - This paper presents a new perspective of text area segmentation from document images using a novel adaptive thresholding for image enhancement. Using sliding...

3.
4.
Unsupervised texture segmentation using Gabor filters   (cited by 88: 0 self-citations, 88 by others)
This paper presents a texture segmentation algorithm inspired by the multi-channel filtering theory for visual information processing in the early stages of the human visual system. The channels are characterized by a bank of Gabor filters that nearly uniformly covers the spatial-frequency domain, and a systematic filter selection scheme is proposed, based on reconstruction of the input image from the filtered images. Texture features are obtained by subjecting each (selected) filtered image to a nonlinear transformation and computing a measure of "energy" in a window around each pixel. A square-error clustering algorithm is then used to integrate the feature images and produce a segmentation. A simple procedure to incorporate spatial information in the clustering process is proposed. A relative index is used to estimate the "true" number of texture categories.
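The pipeline described in item 4 — Gabor filtering, a nonlinear transformation, and a windowed "energy" feature — can be sketched compactly. The code below is a minimal single-filter illustration in NumPy, not the authors' implementation; the kernel size, sigma, tanh gain `alpha`, and window size are illustrative choices.

```python
import numpy as np

def gabor_kernel(freq, theta, sigma, size=31):
    """Real part of a Gabor filter: a sinusoid under a Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_theta = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * freq * x_theta)

def conv2_same(img, ker):
    """'Same'-size 2D convolution via the FFT with zero padding."""
    H, W = img.shape
    kh, kw = ker.shape
    F = np.fft.rfft2(img, (H + kh, W + kw)) * np.fft.rfft2(ker, (H + kh, W + kw))
    full = np.fft.irfft2(F, (H + kh, W + kw))
    return full[kh // 2:kh // 2 + H, kw // 2:kw // 2 + W]

def gabor_energy(image, freq, theta, sigma=4.0, alpha=0.25, win=9):
    """Filter, squash with tanh (the nonlinear transformation), then
    average the magnitude in a local window to get the 'energy' feature."""
    filtered = conv2_same(image, gabor_kernel(freq, theta, sigma))
    squashed = np.tanh(alpha * filtered)
    box = np.ones((win, win)) / float(win * win)
    return conv2_same(np.abs(squashed), box)
```

On a synthetic image whose left half carries stripes matching the filter's frequency, the energy map is high on the textured half and near zero elsewhere, which is what the subsequent clustering step exploits.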

5.
Despite the advantages of the traditional vector space model (VSM) representation, there are known deficiencies concerning the term independence assumption. The high dimensionality and sparsity of the text feature space, and phenomena such as polysemy and synonymy, can only be handled if a way is provided to measure term similarity. Many approaches have been proposed that map document vectors onto a new feature space where learning algorithms can achieve better solutions. This paper presents the global term context vector-VSM (GTCV-VSM) method for text document representation. It is an extension to VSM that: (i) captures local contextual information for each term occurrence in the term sequences of documents; (ii) combines the local contexts of a term's occurrences to define that term's global context; (iii) constructs a semantic matrix from the global contexts of all terms; (iv) uses this matrix to linearly map traditional VSM (Bag of Words—BOW) document vectors onto a ‘semantically smoothed’ feature space where problems such as text document clustering can be solved more efficiently. We present an experimental study demonstrating the improvement of clustering results when the proposed GTCV-VSM representation is used compared with traditional VSM-based approaches.
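One hedged reading of steps (i)–(iv) above — a local context per term occurrence, averaged into a global context per term, assembled into a semantic matrix that smooths BOW vectors — might look like the sketch below. The context window size and the identity-plus-context construction are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def semantic_matrix(docs, vocab, window=2):
    """Build a term-by-term semantic matrix from global term contexts.

    docs: list of token lists; vocab: list of terms in fixed order.
    A term occurrence's local context is the bag of terms within
    `window` positions; averaging over all occurrences of the term
    gives its global context vector (one row of the matrix).
    """
    idx = {t: i for i, t in enumerate(vocab)}
    V = len(vocab)
    ctx = np.zeros((V, V))
    counts = np.zeros(V)
    for doc in docs:
        for pos, term in enumerate(doc):
            i = idx[term]
            counts[i] += 1
            lo, hi = max(0, pos - window), min(len(doc), pos + window + 1)
            for j in range(lo, hi):
                if j != pos:
                    ctx[i, idx[doc[j]]] += 1
    seen = counts > 0
    ctx[seen] /= counts[seen][:, None]          # average the local contexts
    S = np.eye(V) + ctx                          # keep each term's own identity
    return S / np.linalg.norm(S, axis=1, keepdims=True)

def smooth(bow, S):
    """Linearly map a BOW vector onto the 'semantically smoothed' space."""
    return bow @ S
```

The effect the paper targets: two terms that never co-occur but share contexts (e.g. "cat" and "dog" both near "pet") end up with nonzero similarity after smoothing, whereas their plain BOW vectors are orthogonal.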

6.
Segmentation of a document image plays an important role in automatic document processing. In this paper, we propose a consensus-based clustering approach for document image segmentation. In this method, the foreground regions of a document image are grouped into a set of primitive blocks, and a set of features is extracted from them. Similarities among the blocks are computed on each feature using a hypothesis test-based similarity measure. Based on the consensus of these similarities, clustering is performed on the primitive blocks. This clustering approach is used iteratively with a classifier to label each primitive block. Experimental results show the effectiveness of the proposed method. It is further shown in the experimental results that the dependency of classification performance on the training data is significantly reduced.

7.
Text compression for dynamic document databases   (cited by 2: 0 self-citations, 2 by others)
For compression of text databases, semi-static word-based methods provide good performance in terms of both speed and disk space, but two problems arise. First, the memory requirements for the compression model during decoding can be unacceptably high. Second, the need to handle document insertions means that the collection must be periodically recompressed if compression efficiency is to be maintained on dynamic collections. The authors show that with careful management the impact of both of these drawbacks can be kept small. Experiments with a word-based model and over 500 Mb of text show that excellent compression rates can be retained even in the presence of severe memory limitations on the decoder, and after significant expansion in the amount of stored text.
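A toy illustration of the semi-static word-based idea: one pass collects word frequencies (the static model), a second pass codes the tokens. The paper's actual coder and its memory-management and recompression techniques are not reproduced here; this is just a plain word-level Huffman sketch.

```python
import heapq
import re
from collections import Counter

def huffman_code(freqs):
    """Build a prefix-free code over words from their frequencies."""
    heap = [[f, i, {w: ''}] for i, (w, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate single-symbol model
        return {w: '0' for w in freqs}
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {w: '0' + c for w, c in c1.items()}
        merged.update({w: '1' + c for w, c in c2.items()})
        tiebreak += 1
        heapq.heappush(heap, [f1 + f2, tiebreak, merged])
    return heap[0][2]

def encode(tokens, code):
    return ''.join(code[t] for t in tokens)

def decode(bits, code):
    """Greedy decode; correct because the code is prefix-free."""
    inv = {v: k for k, v in code.items()}
    out, cur = [], ''
    for b in bits:
        cur += b
        if cur in inv:
            out.append(inv[cur])
            cur = ''
    return out
```

Tokenizing with `re.findall(r'\w+|\W+', text)` keeps words and separators as alternating symbols, so the round trip reconstructs the text exactly, and frequent tokens (typically the space) receive the shortest codewords.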

8.
We present an algorithm for layout-independent document page segmentation based on document texture using multiscale feature vectors and fuzzy local decision information. Multiscale feature vectors are classified locally using a neural network to allow soft/fuzzy multi-class membership assignments. Segmentation is performed by integrating soft local decision vectors to reduce their "ambiguities".

9.
This paper proposes a novel method for matching images. The results can be used for a variety of applications: fully automatic morphing, object recognition, stereo photogrammetry, and volume rendering. Optimal mappings between the given images are computed automatically using multiresolutional nonlinear filters that extract the critical points of the images at each resolution. Parameters are set completely automatically by dynamical computation analogous to human visual systems. No prior knowledge about the objects is necessary. The matching results can be used to generate intermediate views when given two different views of objects. When used for morphing, our method automatically transforms the given images. There is no need for manually specifying the correspondence between the two images. When used for volume rendering, our method reconstructs the intermediate images between cross-sections accurately, even when the distance between them is long and the cross-sections vary widely in shape. A large number of experiments have been carried out to show the usefulness and capability of our method.

10.
杨洋  平西建 《微计算机信息》2006,22(13):224-225
To meet the real-time requirements of office automation, this paper proposes an improved top-down text/graphics segmentation algorithm. The method uses the distance between text-line baselines to adaptively determine the size of the structuring elements, overcoming the drawback that top-down algorithms require prior knowledge of the page. Experiments show that the proposed algorithm segments accurately and runs fast.

11.
This paper describes a method for texture-based segmentation. Texture features are extracted by applying a bank of Gabor filters using a two-sided convolution strategy. The texture probability model is represented by a Gaussian mixture trained with the Expectation-Maximization algorithm. The texture similarity obtained this way is used as the input to a graph-cut method. We show that the combination of texture analysis and the graph-cut method produces good results.

12.
A procedure for image segmentation involving no image-dependent thresholds is described. The method involves not only detection of edges but also production of closed region boundaries. The method has been developed and tested on head and shoulder images.

13.

This work presents the application of a multistrategy approach to some document processing tasks. The application is implemented in an enhanced version of the incremental learning system INTHELEX. This learning module has been embedded as a learning component in the system architecture of the EU project COLLATE, which deals with the annotation of cultural heritage documents. Indeed, the complex shape of the material handled in the project has suggested that the addition of multistrategy capabilities is needed to improve effectiveness and efficiency of the learning process. Results proving the benefits of these strategies in specific classification tasks are reported in the experimentation presented in this work.

14.
Compression distances have been applied to a broad range of domains because of their parameter-free nature, wide applicability and leading efficacy. However, they have a characteristic that can be a drawback under particular circumstances: when they are used to compare two objects of very different sizes, they do not consider them similar even if they are related by a substring relationship. This work focuses on addressing this issue when compression distances are used to calculate similarities between documents. The approach proposed in this paper combines document segmentation and document distortion. On the one hand, document segmentation is used to tackle the above-mentioned drawback. On the other hand, document distortion helps compression distances obtain more reliable similarities. The results show that combining both techniques provides better results than not applying them or applying them separately. These results are consistent across datasets of diverse nature.
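The size-mismatch drawback, and the segmentation remedy, are easy to see with the standard normalized compression distance (NCD). The sketch below uses `zlib` as the compressor and a fixed-length segmentation of the longer document; the paper's distortion step is omitted, and the segment length choice is an illustrative assumption.

```python
import zlib

def C(data: bytes) -> int:
    """Compressed size of a byte string (zlib at maximum level)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: 0 for near-identical, ~1 for unrelated."""
    cx, cy = C(x), C(y)
    return (C(x + y) - min(cx, cy)) / max(cx, cy)

def segmented_ncd(short: bytes, long: bytes, seg_len=None) -> float:
    """Compare the short document against same-sized segments of the long one
    and keep the best match, so a substring relationship is rewarded."""
    seg_len = seg_len or len(short)
    segs = [long[i:i + seg_len] for i in range(0, len(long), seg_len)] or [long]
    return min(ncd(short, s) for s in segs)
```

If `long` is `short` followed by a large amount of unrelated data, plain `ncd(short, long)` is close to 1 (the size mismatch dominates), while `segmented_ncd` finds the matching segment and reports a small distance.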

15.
Clustering is a very powerful data mining technique for topic discovery from text documents. Partitional clustering algorithms, such as the family of k-means, are reported to perform well on document clustering. They treat the clustering problem as an optimization process of grouping documents into k clusters so that a particular criterion function is minimized or maximized. Usually, the cosine function is used to measure the similarity between two documents in the criterion function, but it may not work well when the clusters are not well separated. To solve this problem, we applied the concepts of neighbors and link, introduced in [S. Guha, R. Rastogi, K. Shim, ROCK: a robust clustering algorithm for categorical attributes, Information Systems 25 (5) (2000) 345–366], to document clustering. If two documents are similar enough, they are considered neighbors of each other, and the link between two documents is the number of their common neighbors. Instead of considering only pairwise similarity, neighbors and links incorporate global information into the measurement of the closeness of two documents. In this paper, we propose to use neighbors and links for the family of k-means algorithms in three aspects: a new method to select initial cluster centroids based on the ranks of candidate documents; a new similarity measure which uses a combination of the cosine and link functions; and a new heuristic function for selecting a cluster to split based on the neighbors of the cluster centroids. Our experimental results on real-life data sets demonstrate that the proposed methods can significantly improve the performance of document clustering in terms of accuracy without much increase in execution time.
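A minimal sketch of the neighbors-and-link idea follows. The neighbor threshold `theta` and mixing weight `alpha` are illustrative parameters, and the normalization of the link count is an assumption; the paper's exact combined function may differ.

```python
import numpy as np

def cosine_sim(X):
    """Pairwise cosine similarities of the rows of a document-term matrix."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1, norms)
    return Xn @ Xn.T

def neighbors_and_link(X, theta=0.3):
    """docs i, j are neighbors if cos(i, j) >= theta;
    link(i, j) counts their common neighbors."""
    sim = cosine_sim(X)
    nbr = (sim >= theta).astype(int)
    link = nbr @ nbr
    return sim, nbr, link

def combined_sim(X, theta=0.3, alpha=0.5):
    """Mix the cosine with the (normalized) link count: global information
    supplements the purely pairwise similarity."""
    sim, _, link = neighbors_and_link(X, theta)
    n = X.shape[0]
    return alpha * (link / n) + (1 - alpha) * sim
```

The payoff: two documents that share no terms (cosine 0) but overlap a common third document get a positive combined similarity through their shared neighbor.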

16.
Adaptive document block segmentation and classification   (cited by 3: 0 self-citations, 3 by others)
This paper presents an adaptive block segmentation and classification technique for daily-received office documents having complex layout structures, such as multiple columns and mixed-mode contents of text, graphics, and pictures. First, an improved two-step block segmentation algorithm based on run-length smoothing is performed to decompose any document into single-mode blocks. Then, a rule-based block classification is used to classify each block into the text, horizontal/vertical line, graphics, or picture type. The document features and rules used are independent of character font and size and the scanning resolution. Experimental results show that our algorithms are capable of correctly segmenting and classifying different types of mixed-mode printed documents.
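Run-length smoothing, the core of the block segmentation step above, can be sketched as follows. This follows the classic RLSA pattern (horizontal smear, vertical smear, AND, then a light horizontal pass to reconnect within-line gaps); the thresholds are illustrative, not the paper's adaptive values.

```python
import numpy as np

def rls_1d(row, threshold):
    """Run-length smoothing of one binary row: fill 0-runs between
    foreground pixels whose length is at most `threshold`."""
    out = row.copy()
    ones = np.flatnonzero(row)
    for a, b in zip(ones[:-1], ones[1:]):
        if 0 < b - a - 1 <= threshold:
            out[a + 1:b] = 1
    return out

def rlsa(img, h_thr, v_thr):
    """Smear horizontally and vertically, AND the two results, then apply
    one more horizontal pass so characters merge into line blocks."""
    horiz = np.array([rls_1d(r, h_thr) for r in img])
    vert = np.array([rls_1d(c, v_thr) for c in img.T]).T
    combined = horiz & vert
    return np.array([rls_1d(r, h_thr) for r in combined])
```

On a toy page with two "text lines", gaps between characters within a line are closed (producing one solid block per line) while the larger vertical gap between the lines survives, which is what makes the subsequent block classification possible.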

17.
Image segmentation is a major task of handwritten document image processing. Many of the proposed techniques for image segmentation are complementary in the sense that each of them, using a different approach, can solve different difficult problems, such as overlapping or touching components and the influence of author or font style. In this paper, a method for combining different segmentation techniques is presented. Our goal is to exploit the segmentation results of complementary techniques and specific features of the initial image so as to generate improved segmentation results. Experimental results on line segmentation methods for handwritten documents demonstrate the effectiveness of the proposed combination method.

18.
Microelectromechanical filters for signal processing   (cited by 4: 0 self-citations, 4 by others)
Microelectromechanical filters based on coupled lateral microresonators are demonstrated. This new class of microelectromechanical systems (MEMS) has potential signal-processing applications for filters which require narrow bandwidth (high Q), good signal-to-noise ratio, and stable temperature and aging characteristics. The microfilters presented in this paper are made by surface-micromachining technologies and tested using an off-chip modulation technique. The frequency range of these filters is from approximately 5 kHz to on the order of 1 MHz for polysilicon microstructures with suspension beams having a 2-μm-square cross section. A series-coupled resonator pair, designed for operation at atmospheric pressure, has a measured center frequency of 18.7 kHz and a pass bandwidth of 1.2 kHz. A planar hermetic sealing process has been developed to enable high quality factors for these mechanical filters and make wafer-level vacuum encapsulation possible. This process uses a low-stress silicon nitride shell for vacuum sealing, and experimental results show that a measured quality factor of 2200 for comb-shape microresonators can be achieved.

19.
Achieving illumination invariance in the presence of large pose changes remains one of the most challenging aspects of automatic face recognition from low resolution imagery. In this paper, we propose a novel recognition methodology for their robust and efficient matching. The framework is based on outputs of simple image processing filters that compete with unprocessed greyscale input to yield a single matching score between two individuals. Specifically, we show how the discrepancy of the illumination conditions between query input and training (gallery) data set can be estimated implicitly and used to weight the contributions of the two competing representations. The weighting parameters are representation-specific (i.e. filter-specific), but not gallery-specific. Thus, the computationally demanding, learning stage of our algorithm is offline-based and needs to be performed only once, making the added online overhead minimal. Finally, we describe an extensive empirical evaluation of the proposed method in both a video and still image-based setup performed on five databases, totalling 333 individuals, over 1660 video sequences and 650 still images, containing extreme variation in illumination, pose and head motion. On this challenging data set our algorithm consistently demonstrated a dramatic performance improvement over traditional filtering approaches. We demonstrate a reduction of 50–75% in recognition error rates, the best performing method-filter combination correctly recognizing 97% of the individuals.
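The core fusion idea in this abstract — a single matching score formed as a convex combination of the unprocessed-greyscale score and a filtered-representation score, with a filter-specific weight `w` learned offline — can be sketched as below. Normalized cross-correlation is used here only as a stand-in matcher; the matcher, the filter, and the weight value are all illustrative assumptions, not the paper's method.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two images (stand-in matcher)."""
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float((a * b).mean())

def fused_similarity(q_raw, g_raw, q_filt, g_filt, w):
    """Single matching score: the filtered representation competes with the
    raw greyscale one, weighted by the offline-learned parameter w."""
    return (1 - w) * ncc(q_raw, g_raw) + w * ncc(q_filt, g_filt)
```

Under the paper's scheme, `w` would be pushed toward the filtered score when the estimated illumination discrepancy between query and gallery is large, and toward the raw score when it is small.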

20.