首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Legal text retrieval traditionally relies upon external knowledge sources such as thesauri and classification schemes, and an accurate indexing of the documents is often manually done. As a result not all legal documents can be effectively retrieved. However a number of current artificial intelligence techniques are promising for legal text retrieval. They sustain the acquisition of knowledge and the knowledge-rich processing of the content of document texts and information need, and of their matching. Currently, techniques for learning information needs, learning concept attributes of texts, information extraction, text classification and clustering, and text summarization need to be studied in legal text retrieval because of their potential for improving retrieval and decreasing the cost of manual indexing. The resulting query and text representations are semantically much richer than a set of key terms. Their use allows for more refined retrieval models in which some reasoning can be applied. This paper gives an overview of the state of the art of these innovativetechniques and their potential for legal text retrieval.  相似文献   

2.
Most of the common techniques in text retrieval are based on the statistical analysis terms (words or phrases). Statistical analysis of term frequency captures the importance of the term within a document only. Thus, to achieve a more accurate analysis, the underlying model should indicate terms that capture the semantics of text. In this case, the model can capture terms that represent the concepts of the sentence, which leads to discovering the topic of the document. In this paper, a new concept-based retrieval model is introduced. The proposed concept-based retrieval model consists of conceptual ontological graph (COG) representation and concept-based weighting scheme. The COG representation captures the semantic structure of each term within a sentence. Then, all the terms are placed in the COG representation according to their contribution to the meaning of the sentence. The concept-based weighting analyzes terms at the sentence and document levels. This is different from the classical approach of analyzing terms at the document level only. The weighted terms are then ranked, and the top concepts are used to build a concept-based document index for text retrieval. The concept-based retrieval model can effectively discriminate between unimportant terms with respect to sentence semantics and terms which represent the concepts that capture the sentence meaning. Experiments using the proposed concept-based retrieval model on different data sets in text retrieval are conducted. The experiments provide comparison between traditional approaches and the concept-based retrieval model obtained by the combined approach of the conceptual ontological graph and the concept-based weighting scheme. The evaluation of results is performed using three quality measures, the preference measure (bpref), precision at 10 documents retrieved (P(10)) and the mean uninterpolated average precision (MAP). All of these quality measures are improved when the newly developed concept-based retrieval model is used, confirming that such model enhances the quality of text retrieval.  相似文献   

3.
王春兰 《计算机时代》2021,(11):60-62,67
文件恢复对电子取证行业非常重要,而文件签名恢复文件是一种常用的文件恢复方法.其过程是,搜索文件签名以确定文件起始扇区号;根据文件签名尾或大小来估算文件的结尾扇区号;把起始和结尾扇区之间的内容复制生成一个新文件即可得到被删文件.如果文件在数据区中是连续存放的,该恢复方法的成功率非常高.文章以FAT32文件系统为例,把一个JPG文件彻底删除,再借助WinHex软件成功地对其进行恢复.  相似文献   

4.
We describe a shape-based visual interface for information retrieval and interactive exploration that exploits shape recognition. Our exploratory system uses procedurally generated shapes coupled with an underlying text-retrieval engine. A visual interface based on 3D shapes (glyphs) enhances traditional text-based queries and summarization. Our interface lets users visualize multidimensional relationships among documents and perceive more information than with conventional text-based interfaces. It promotes information overview and drill-down in support of analysis. Before describing our visual interface and application, we introduce information retrieval within the context of data mining and provide a brief overview of procedural shape generation. We then describe our current system and give a few relevant examples. Finally, we offer some ideas for future enhancements and direction  相似文献   

5.
In this paper, we present a query-driven indexing/retrieval strategy for efficient full text retrieval from large document collections distributed within a structured P2P network. Our indexing strategy is based on two important properties: (1) the generated distributed index stores posting lists for carefully chosen indexing term combinations that are frequently present in user queries, and (2) the posting lists containing too many document references are truncated to a bounded number of their top-ranked elements. These two properties guarantee acceptable latency and bandwidth requirements, essentially because the number of indexing term combinations remains scalable and the posting lists transmitted during retrieval never exceed a constant size. A novel index update mechanism efficiently handles adding of new documents to the document collection. Thus, the generated distributed index corresponds to a constantly evolving query-driven indexing structure that efficiently follows current information needs of the users and changes in the document collection.We show that the size of the index and the generated indexing/retrieval traffic remains manageable even for Web-size document collections at the price of a marginal loss in precision for rare queries. Our theoretical analysis and experimental results provide convincing evidence about the feasibility of the query-driven indexing strategy for large scale P2P text retrieval.  相似文献   

6.
Cross-lingual text retrieval (CLTR) is a technique for locating relevant documents in different languages. The authors have developed fuzzy conceptual indexing (FCI) to extend CLTR to include documents that share concepts but don't contain exact translations of query terms. In FCI, documents and queries are represented as a function of language-independent concepts, thus enabling direct mapping between them across multiple languages. Experimental results suggest that concept-based CLTR outperforms translation-based CLTR in identifying conceptually relevant documents.  相似文献   

7.
Automatic text categorization and its application to text retrieval   总被引:4,自引:0,他引:4  
We develop an automatic text categorization approach and investigate its application to text retrieval. The categorization approach is derived from a combination of a learning paradigm known as instance-based learning and an advanced document retrieval technique known as retrieval feedback. We demonstrate the effectiveness of our categorization approach using two real-world document collections from the MEDLINE database. Next, we investigate the application of automatic categorization to text retrieval. Our experiments clearly indicate that automatic categorization improves the retrieval performance compared with no categorization. We also demonstrate that the retrieval performance using automatic categorization achieves the same retrieval quality as the performance using manual categorization. Furthermore, detailed analysis of the retrieval performance on each individual test query is provided  相似文献   

8.
9.
门限签名是一种特殊的数字签名,它在现实生活中具有广泛的用途。一个(t,n)门限签名方案是指n个成员组成的群中,群中任何不少于t个成员合作就能产生签名,然而任何少于t个成员合作都无法伪造签名。但是,现有的许多签名算法都存在一个普遍的缺陷,即不能抵抗合谋攻击,换句话说,任意t个成员合谋就可以恢复出秘密系统参数,从而就可以伪造其他签名小组签名。针对较小的n和t以及较大n和t分别提出两种有效的抗合谋攻击的门限签名方案,当n和t较小时,给出了一种基于分组秘密共享的RSA门限签名算法;当n和t比较大时,提供了一种具有指定签名者的方案来解决合谋攻击问题。  相似文献   

10.
With the great advantages of digitization, more and more documents are being transformed into digital representations. Most content digitization of documents is performed by scanners or digital cameras. However, the transformation might degrade the image quality caused by lighting variations, i.e. uneven illumination distribution. In this paper we describe a new approach for text images to compensate uneven illumination distribution with a high degree of text recognition. Our proposed scheme is implemented by enhancing the contrast of the scanned documents, and then generating an edge map from the contrast-enhanced image for locating text area. With the information of the text location, a light distribution image (background) is created to assist the producing of the final light balanced image. Simulation results demonstrate that our approach is superior to the previous works of Hsia et al. (2005, 2006).  相似文献   

11.
The growing amount of on-line data demands efficient parallel and distributed indexing mechanisms to manage large resource requirements and unpredictable system failures. Parallel and distributed indices built using commodity hardware like personal computers (PCs) can substantially save cost because PCs are produced in bulk, achieving the scale of economy. However, PCs have limited amount of random access memory (RAM) and the effective utilization of RAM for in-memory inversion is crucial. This paper presents an analytical investigation and an empirical evaluation of storage-efficient inmemory extensible inverted files, which are represented by fixed- or variable-sized linked list nodes. The size of these linked list nodes is determined by minimizing the storage wastes or maximizing storage utilization under different conditions, which lead to different storage allocation schemes. Minimizing storage wastes also reduces the number of address indirections (i.e., chaining). We evaluated our storage allocation schemes using a number of reference collections. We found that the arrival rate scheme is the best in terms of both storage utilization and the mean number of chainings per term. The final storage utilization can be over 90% in our evaluation if there is a sufficient number of documents indexed. The mean number of chainings is not large (less than 2.6 for all the reference collections). We have also showed that our best storage allocation scheme can be used for our extensible compressed inverted file. The final storage utilization of the extensible compressed inverted file can be over 90% in our evaluation provided that there is a sufficient number of documents indexed. The proposed storage allocation schemes can also be used by compressed extensible inverted files with word positions  相似文献   

12.
A video-on-demand (VOD) server needs to store hundreds of movie titles and to support thousands of concurrent accesses. This, technically and economically, imposes a great challenge on the design of the disk storage subsystem of a VOD server. Due to different demands for different movie titles, the numbers of concurrent accesses to each movie can differ a lot. We define access profile as the number of concurrent accesses to each movie title that should be supported by a VOD server. The access profile is derived based on the popularity of each movie title and thus serves as a major design goal for the disk storage subsystem. Since some popular (hot) movie titles may be concurrently accessed by hundreds of users and a current high-end magnetic disk array (disk) can only support tens of concurrent accesses, it is necessary to replicate and/or stripe the hot movie files over multiple disk arrays. The consequence of replication and striping of hot movie titles is the potential increase on the required number of disk arrays. Therefore, how to replicate, stripe, and place the movie files over a minimum number of magnetic disk arrays such that a given access profile can be supported is an important problem. In this paper, we formulate the problem of the video file allocation over disk arrays, demonstrate that it is a NP-hard problem, and present some heuristic algorithms to find the near-optimal solutions. The result of this study can be applied to the design of the storage subsystem of a VOD server to economically minimize the cost or to maximize the utilization of disk arrays.  相似文献   

13.
The continually growing Web challenges information retrieval systems to deliver data quickly. The authors' technique combines several data compression features to provide economical storage, faster indexing, and accelerated searches  相似文献   

14.
互联网文本数量持续爆炸式增长,用户通过互联网查找信息变得更加困难,响应时间得不到满足。针对藏文本身的语言学特点,探讨一种面向信息搜索的藏文文本索引建立策略,建立一种高效的藏文文本索引,以提高藏文信息检索速度。  相似文献   

15.
16.
Efficiency of full text retrieval using signatures depends on the number of filtering and the reduction of the original text, but there has been no discussion how a signature is constructed keeping the worst-case filtering ratio. In order to consider this problem, we present a technique of constructing signatures by using an appearance probability of strings in a textual data. It enables us to retrieve any keywords in expected worst-case searching time.

A partial appearance probability is proposed because the overall probability for the whole text takes a lot of time building signatures. From the simulation result, it turns can't that the worst-case filtering ratio of the presented method can keep the expected ratio while that of the traditional method degrades zero.  相似文献   

17.
Imaged document text retrieval without OCR   总被引:6,自引:0,他引:6  
We propose a method for text retrieval from document images without the use of OCR. Documents are segmented into character objects. Image features, namely the vertical traverse density (VTD) and horizontal traverse density (HTD), are extracted. An n-gram-based document vector is constructed for each document based on these features. Text similarity between documents is then measured by calculating the dot product of the document vectors. Testing with seven corpora of imaged textual documents in English and Chinese as well as images from the UW1 (University of Washington 1) database confirms the validity of the proposed method  相似文献   

18.
19.
通过巧妙设置公共参数,提出一种新的基于身份的高效环签名方案,利用签名参数和身份的线性关系减少验证时双线性对的个数.在标准模型下证明其能抵抗签名伪造攻击,且具有无条件匿名性.与现有标准模型下基于身份的环签名方案相比,对于n个成员的环,签名验证仅需要2个双线性对运算,因此签名验证效率有很大提高.  相似文献   

20.
Xuan Hong 《Information Sciences》2009,179(24):4243-4248
Mobile agents can migrate across different execution environments through the network. One important task of a mobile agent is to act as a proxy signer to sign a digital signature on behalf of the agent owner. As the agent and the remote hosts are not trustworthy, or are probably malicious, there are great challenges for the task. In this paper, we propose an efficient, secure (t,n) threshold proxy signature scheme based on the RSA cryptosystem. The proposed scheme shares the proxy signing key with a simple Lagrange formula. However, it does not reveal any secret information. Owing to its simple algorithm and few parameter requirements, the proposed scheme requires few calculations and few transactions. The proxy signature generation stage and the proxy signature combining stage are completely non-interactive. Furthermore, the size of the partial proxy signing key and that of the partial proxy signature are constant and independent of the number of proxy signers.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号