Legal text retrieval traditionally relies on external knowledge sources such as thesauri and classification schemes, and accurate indexing of the documents is often done manually. As a result, not all legal documents can be retrieved effectively. However, a number of current artificial intelligence techniques are promising for legal text retrieval. They support the acquisition of knowledge, the knowledge-rich processing of the content of document texts and information needs, and the matching of the two. Techniques for learning information needs, learning concept attributes of texts, information extraction, text classification and clustering, and text summarization currently need to be studied in legal text retrieval because of their potential to improve retrieval and reduce the cost of manual indexing. The resulting query and text representations are semantically much richer than a set of key terms, and their use allows for more refined retrieval models in which some reasoning can be applied. This paper gives an overview of the state of the art of these innovative techniques and their potential for legal text retrieval.

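The agricultural knowledge-base retrieval system aims to "serve farmers, help farmers increase their income, and raise the level of agricultural science and technology". It makes full use of the important role and great potential of agricultural information in agricultural and rural development, providing people who work in agriculture with efficient and accurate agricultural science and technology knowledge. For agricultural workers in the ethnic-minority areas of Xinjiang, in-depth research on cross-language retrieval over agricultural knowledge bases is essential. This paper introduces the key technologies of Chinese-Uyghur cross-language agricultural knowledge-base retrieval: Lucene full-text retrieval, Uyghur text processing, and the construction and querying of a Chinese-Uyghur bilingual inverted index. It then describes the main design points of the system: the design of the Chinese-Uyghur bilingual agricultural knowledge base, the channels for knowledge entry, and the design of the semantic retrieval model. Finally, the implemented system is demonstrated. The system has been put into use and has produced good results in practice.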
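
The bilingual inverted index mentioned above can be illustrated with a minimal pure-Python sketch. It does not use Lucene's API; it assumes documents are already tokenized (Chinese word segmentation and Uyghur preprocessing are out of scope) and uses a hypothetical one-entry Chinese-to-Uyghur lexicon to expand query terms into the other language. The class and method names are illustrative only.

```python
from collections import defaultdict


class BilingualInvertedIndex:
    """Toy inverted index whose postings also record each document's language."""

    def __init__(self):
        # term -> set of (doc_id, lang) postings
        self.postings = defaultdict(set)

    def add(self, doc_id, lang, tokens):
        """Index a document whose text has already been tokenized (assumed)."""
        for tok in tokens:
            self.postings[tok].add((doc_id, lang))

    def search(self, query_tokens, lexicon, langs=None):
        """Return ids of docs containing a query term or one of its translations.

        lexicon maps a term to its translations in the other language (hypothetical);
        langs optionally restricts results to documents in those languages.
        """
        hits = set()
        for tok in query_tokens:
            for term in [tok, *lexicon.get(tok, [])]:
                # .get avoids inserting empty entries into the defaultdict
                for doc_id, lang in self.postings.get(term, ()):
                    if langs is None or lang in langs:
                        hits.add(doc_id)
        return sorted(hits)


# Hypothetical toy data: one Chinese and one Uyghur document about wheat.
index = BilingualInvertedIndex()
index.add("zh-001", "zh", ["小麦", "施肥"])
index.add("ug-001", "ug", ["بۇغداي"])
zh_to_ug = {"小麦": ["بۇغداي"]}  # hypothetical one-entry lexicon
print(index.search(["小麦"], zh_to_ug))
print(index.search(["小麦"], zh_to_ug, langs={"ug"}))
```

A Chinese query thus reaches the Uyghur document through the lexicon, and the optional language filter restricts results to one side of the collection.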

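With the growing demand for multilingual information on the Internet, cross-lingual word embeddings have become an important basic tool and have been applied successfully in natural language processing areas such as machine translation, information retrieval, and text sentiment analysis. Cross-lingual word embeddings are a natural extension of monolingual word embeddings: by mapping different languages into a shared low-dimensional vector space, cross-lingual word representations transfer knowledge between languages and thus capture word meaning accurately in multilingual settings. Research on cross-lingual word embedding models has been productive in recent years, and many methods for generating them have been proposed. This paper reviews the literature on existing cross-lingual word embedding models and surveys recent developments in their models, methods, and techniques. According to how the embeddings are trained, the methods are divided into three classes (supervised, unsupervised, and semi-supervised learning), and the principles and representative work of each class are summarized and compared in detail. Finally, the paper outlines the evaluation and applications of cross-lingual word embeddings and analyzes the open challenges and future directions.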
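
As an illustration of the supervised family described above, the sketch below learns a linear map W from a small seed dictionary so that mapped source vectors land near their translations, the translation-matrix idea popularized by Mikolov et al. (2013); many later methods additionally constrain W to be orthogonal. It fits W by plain gradient descent and translates by cosine nearest neighbour. The 2-D vectors and word lists are invented toy data, not drawn from any surveyed model.

```python
import math


def vec_mat(v, W):
    """Row vector v times matrix W (given as a list of rows)."""
    return [sum(v[i] * W[i][j] for i in range(len(v))) for j in range(len(W[0]))]


def fit_linear_map(pairs, src, tgt, dim, lr=0.1, epochs=500):
    """Learn W minimising the mean of ||x W - z||^2 over seed-dictionary pairs."""
    W = [[0.0] * dim for _ in range(dim)]
    for _ in range(epochs):
        grad = [[0.0] * dim for _ in range(dim)]
        for s, t in pairs:
            x, z = src[s], tgt[t]
            err = [p - q for p, q in zip(vec_mat(x, W), z)]
            # Gradient of ||x W - z||^2 with respect to W is 2 * outer(x, err).
            for i in range(dim):
                for j in range(dim):
                    grad[i][j] += 2 * x[i] * err[j]
        for i in range(dim):
            for j in range(dim):
                W[i][j] -= lr * grad[i][j] / len(pairs)
    return W


def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))


def translate(word, src, tgt, W):
    """Map a source word into the target space; return the nearest target word."""
    mapped = vec_mat(src[word], W)
    return max(tgt, key=lambda t: cosine(mapped, tgt[t]))


# Toy 2-D spaces: the "French" space is the "English" one rotated by 90 degrees.
en = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "fish": [0.6, 0.8]}
fr = {"chat": [0.0, 1.0], "chien": [-1.0, 0.0], "poisson": [-0.8, 0.6]}
seed = [("cat", "chat"), ("dog", "chien")]  # "fish" is held out
W = fit_linear_map(seed, en, fr, dim=2)
print(translate("fish", en, fr, W))
```

The held-out word is translated correctly only because the toy spaces are related by an exact linear map; real embedding spaces are only approximately isomorphic.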

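As the volume of legal documents grows, information overload is becoming increasingly severe, and fast, accurate retrieval over massive collections of legal documents is essential. Legal texts are a special type of text: they are long, structurally complex, and highly specialized. Traditional keyword-based retrieval cannot meet users' needs when they search for legal information and tends to return irrelevant or incomplete results. Moreover, most semantics-based retrieval methods rely on supervised learning over large amounts of annotated legal text, and manual annotation of legal data depends heavily on expert knowledge, which makes it very costly in human effort. This paper proposes an unsupervised legal document retrieval model that performs multi-granularity unsupervised text matching at three levels (legal concepts, words, and phrases), avoiding the cold-start problem caused by the lack of training data. Retrieval experiments on a dataset of legal judgment documents show that, compared with the baseline models, the proposed model achieves significant improvements in MAP, MRR, and NDCG@10, demonstrating its effectiveness.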
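
The three reported metrics can be computed as follows. This is a minimal sketch using common textbook definitions (binary relevance for MAP and MRR; linear graded gains with a log2 discount for NDCG@10), not the paper's evaluation code; the document ids and relevance judgments are hypothetical.

```python
import math


def average_precision(ranking, relevant):
    """Mean of precision@k over the ranks k where a relevant document appears.

    Normalised by the total number of relevant documents, so relevant documents
    that were never retrieved contribute zero.
    """
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant document, or 0 if none was retrieved."""
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / k
    return 0.0


def ndcg_at_k(ranking, gains, k=10):
    """NDCG@k with linear gains and a log2(rank + 1) discount."""
    def dcg(values):
        return sum(g / math.log2(i + 2) for i, g in enumerate(values[:k]))

    ideal = dcg(sorted(gains.values(), reverse=True))
    actual = dcg([gains.get(doc, 0) for doc in ranking])
    return actual / ideal if ideal > 0 else 0.0


# Hypothetical runs: (ranked doc ids, graded relevance judgments) per query.
runs = [
    (["c1", "c7", "c3"], {"c1": 2, "c3": 1}),
    (["c9", "c4", "c2"], {"c4": 1}),
]
MAP = sum(average_precision(r, set(g)) for r, g in runs) / len(runs)
MRR = sum(reciprocal_rank(r, set(g)) for r, g in runs) / len(runs)
NDCG10 = sum(ndcg_at_k(r, g) for r, g in runs) / len(runs)
print(f"MAP={MAP:.4f} MRR={MRR:.4f} NDCG@10={NDCG10:.4f}")
```

Each metric is computed per query and then averaged over queries, which is how MAP, MRR, and mean NDCG@10 are usually reported.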

To help design an environment in which professionals without legal training can make effective use of public-sector legal information on planning and the environment (for Add-Wijzer, a European e-government project), we evaluated their perceptions of usefulness and usability. In concurrent think-aloud usability tests, lawyers and non-lawyers carried out information retrieval tasks on a range of online legal databases. We found that non-lawyers reported twice as many difficulties as those with legal training (p = 0.001), that the number of difficulties and the choice of database affected successful task completion, and that the non-lawyers had surprisingly few problems understanding legal terminology. Instead, they had more problems understanding the syntactic structure of legal documents and collections. The results support the constraint attunement hypothesis (CAH) of the effects of expertise on information retrieval, with implications for the design of systems that support the effective understanding and use of information.

Search and retrieval are gaining importance in the ink domain due to the increasing availability of online handwritten data. However, the problem is challenging because handwriting varies across writers, digitizers, and writing conditions. In this paper, we propose a retrieval mechanism for online handwriting that can handle different writing styles, specifically for Indian languages. In addition to pen-based and example-based queries, the proposed approach provides a keyboard-based search interface that enables users to search handwritten data from any platform. A major advantage of this framework is that information retrieval techniques such as relevance ranking, stopword detection, and word-form control can be extended to search and retrieval in the ink domain. The framework also allows cross-lingual document retrieval across Indian languages.

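Legal artificial intelligence has attracted wide attention in recent years because of its efficiency and convenience. Legal documents are the most common form in which law appears in social life, and applying natural language understanding (NLU) methods to process their content intelligently is an important direction for research and applications. This paper reviews and summarizes NLU techniques for legal documents. It first introduces five types of NLU tasks for legal documents: legal information extraction, similar-case retrieval, judicial question answering, legal document summarization, and judgment prediction. It then discusses the main challenges in applying existing NLU techniques to legal documents, pointing out the need to handle the differences in expression between legal documents and everyday language, to model the reasoning and argumentation structures specific to legal documents, and to incorporate legal knowledge such as statutes and reasoning patterns into NLU models.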

Considerable attention has been given to the accessibility of legal documents, such as legislation and case law, both in legal information retrieval (query formulation, search algorithms), in legal information dissemination practice (numerous examples of on-line access to formal sources of law), and in legal knowledge-based systems (by translating the contents of those documents to ready-to-use rule and case-based systems). However, within AI & law, it has hardly ever been tried to make the contents of sources of law, and the relations among them, more accessible to those without a legal education. This article presents a theory about translating sources of law into information accessible to persons without a legal education. It illustrates the theory by providing two elaborated examples of such translation ventures. In the first example, formal sources of law in the domain of exchanging police information are translated into rules of thumb useful for policemen. In the second example, the goal of providing non-legal professionals with insight into legislative procedures is translated into a framework for making available sources of law through an integrated legislative calendar. Although the theory itself does not support automating the several stages described, in this article some hints are given as to what such automation would have to look like.
Laurens Mommers

Juris-Data is one of the largest case-study bases in France. Its case studies are indexed using a legal classification developed by the Juris-Data Group. Knowledge engineering was used to design an intelligent information retrieval interface based on this classification, with the aim of helping users find the case study most relevant to their own. The approach is potentially very useful, but to standardize it for other legal document bases, a legal classification must first be extracted from the primary documents. A methodology for constructing such classifications was therefore designed, together with a framework for index construction. Building on the accumulated experimentation and these methodologies, the project led to the implementation of a Legal Case Studies Engineering Framework: a set of computerized tools that support the life cycle of legal documents, from their processing by legal experts to their consultation by clients.

In certain bilingual and multilingual societies, translated legal documents are as important as the originals because they have the same legal status. However, little work has been reported on the retrieval and management of bilingual legal documents. We describe the design and development of a bilingual document retrieval and management prototype, called ELDoS, which is used by court interpreters and judges in the Hong Kong Judiciary. Since retrieval speed is a major concern for user acceptance, and therefore for widespread deployment of the system, the prototype's architecture is designed to balance the workload between client and server. Extensible Markup Language (XML) is used to mark up the bilingual legal documents for a variety of document retrieval and management tasks. XML makes it possible to use Extensible Stylesheet Language Transformations (XSLT) to align bilingual data on the client instead of the server; on a high-end PC, with no concurrent access to the server, this improves alignment speed linearly with respect to document size. The interface design was continually improved after extensive consultation with court interpreters and after the user acceptance tests. In our evaluation, the facilities for highlighting translated terms achieved a macro-averaged precision of 90+% and a macro-averaged recall of 80+%, which our users considered acceptable. We believe that the experience gained in designing and developing this prototype is applicable to other language pairs as well as to other domains. Copyright © 2002 John Wiley & Sons, Ltd.
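
Macro-averaging, as used in the evaluation above, can be sketched as follows, assuming each document contributes one set of highlighted terms and one gold-standard set. Precision and recall are computed per document and then averaged, so every document carries equal weight regardless of how many terms it contains. The terms are invented for illustration.

```python
def precision_recall(predicted, gold):
    """Set-based precision and recall for a single document."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


def macro_average(items):
    """Average per-document precision and recall, weighting every document equally."""
    scores = [precision_recall(pred, gold) for pred, gold in items]
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n


# Hypothetical highlighted terms vs. gold-standard terms for two documents.
items = [
    ({"appeal", "plaintiff"}, {"appeal", "plaintiff", "bail"}),  # P = 1, R = 2/3
    ({"judgment", "costs"}, {"judgment"}),                        # P = 1/2, R = 1
]
macro_p, macro_r = macro_average(items)
print(f"macro P = {macro_p:.3f}, macro R = {macro_r:.3f}")
```

Micro-averaging would instead pool the true positives, predictions, and gold terms across all documents before dividing.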

Translated or cross-lingual plagiarism is defined as translating someone else's work or words without marking them as such or without crediting the original author. Cross-lingual plagiarism is not new, but only in recent years, thanks to the rapid development of natural language processing, have the first algorithms appeared that tackle the difficult task of detecting it. Most of these algorithms use machine translation to compare texts written in different languages. We propose a different method that can effectively detect translations between language pairs for which machine translation still produces low-quality results. The new algorithm presented in this paper is based on information retrieval (IR) and a dictionary-based similarity metric. Preprocessing the candidate documents for IR is computationally intensive but easily parallelizable, and we propose a desktop Grid solution for this task. Because the application is time-sensitive and desktop Grid peers are unreliable, a resubmission mechanism ensures that all jobs in a batch finish within a reasonable time without dramatically increasing the load on the whole system.
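
The abstract does not give the paper's dictionary-based similarity metric, so the following is only a hypothetical stand-in showing the general idea: score a candidate document by the share of translatable words in the suspicious text that have a dictionary translation in the candidate, then rank candidates by that score. The lexicon, documents, and scoring formula are all assumptions.

```python
def dictionary_similarity(src_tokens, tgt_tokens, lexicon):
    """Share of translatable source words that have a translation in the target text.

    lexicon maps a source word to a set of candidate target-language words
    (hypothetical); source words missing from the lexicon are ignored.
    """
    tgt = set(tgt_tokens)
    translatable = [w for w in src_tokens if w in lexicon]
    if not translatable:
        return 0.0
    matched = sum(1 for w in translatable if lexicon[w] & tgt)
    return matched / len(translatable)


def rank_candidates(src_tokens, candidates, lexicon):
    """Rank candidate documents (id -> tokens) by dictionary similarity, best first."""
    scored = [(dictionary_similarity(src_tokens, toks, lexicon), doc_id)
              for doc_id, toks in candidates.items()]
    return sorted(scored, reverse=True)


# Hypothetical English -> German toy lexicon and documents.
lexicon = {"red": {"rot"}, "house": {"haus"}, "big": {"groß"}}
suspect = ["the", "red", "house", "is", "big"]
candidates = {
    "de-1": ["das", "rote", "haus", "ist", "groß"],
    "de-2": ["der", "hund", "ist", "groß"],
}
print(rank_candidates(suspect, candidates, lexicon))
```

The unmatched pair "red"/"rote" shows why inflected languages usually need stemming or lemmatization before dictionary lookup.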

We discuss the development of factual and bibliographical databases and database systems, including management systems and information retrieval systems, with special reference to the International Bibliography on Computers and Law, covering both its current situation and future plans. Rosa Maria Di Giorgi has, since 1982, been a researcher at the Istituto per la Documentazione Giuridica of Florence of the Italian National Research Council. She took her degree in Letters and Philosophy at the University of Florence in 1979 and has completed an advanced course in computer applications. Her research concentrates on legal informatics, advanced automated legal documentation systems, and advisory legal expert systems. She is an editor of Informatica e Diritto and of the International Bibliography on Computers and the Law.

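This paper studies matrix-factorization-based word embedding methods, proposes a unified descriptive model, and applies it to Chinese-English cross-lingual word embedding. Using a bilingual aligned corpus as the knowledge source, it proposes a method for computing cross-lingually associated words and two ways of computing pointwise association measures: cross-lingual co-occurrence counts and cross-lingual pointwise mutual information (PMI). An objective function is designed for each measure to learn Chinese-English cross-lingual word embeddings. Experiments varying the objective function, the corpus data, and the vector dimensionality show that, in Chinese-English cross-lingual document classification, using the former (co-occurrence counts) as the association measure achieves a best accuracy of 87.04%; in Chinese-English cross-lingual word similarity, the latter (PMI) performs better, and in English-English word similarity its performance is slightly higher than that of mainstream English word embeddings.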
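
The two pointwise association measures can be sketched as follows, under one simple counting assumption that the abstract does not confirm: a Chinese word and an English word co-occur when they appear in the same aligned sentence pair, and probabilities are estimated as sentence-pair frequencies. The toy corpus is invented, and the paper's actual counting window, smoothing, and objective functions are not reproduced.

```python
import math
from collections import Counter
from itertools import product


def cross_lingual_counts(aligned_pairs):
    """Count, per aligned sentence pair, Chinese words, English words, and (zh, en) pairs."""
    zh_count, en_count, joint = Counter(), Counter(), Counter()
    for zh_tokens, en_tokens in aligned_pairs:
        zh_set, en_set = set(zh_tokens), set(en_tokens)
        zh_count.update(zh_set)
        en_count.update(en_set)
        joint.update(product(zh_set, en_set))  # every cross-lingual word pair
    return zh_count, en_count, joint, len(aligned_pairs)


def cross_lingual_pmi(zh, en, counts):
    """PMI = log(p(zh, en) / (p(zh) p(en))), probabilities as sentence-pair frequencies."""
    zh_count, en_count, joint, n = counts
    if joint[(zh, en)] == 0:
        return float("-inf")  # never co-occur; real systems often clip this (e.g. PPMI)
    return math.log(joint[(zh, en)] * n / (zh_count[zh] * en_count[en]))


corpus = [
    (["我", "喜欢", "苹果"], ["i", "like", "apples"]),
    (["他", "喜欢", "香蕉"], ["he", "likes", "bananas"]),
    (["我", "吃", "苹果"], ["i", "eat", "apples"]),
]
counts = cross_lingual_counts(corpus)
print(counts[2][("苹果", "apples")])               # cross-lingual co-occurrence count
print(cross_lingual_pmi("苹果", "apples", counts))  # cross-lingual PMI
```

In a matrix-factorization setting, one of these measures would fill a Chinese-by-English association matrix whose factorization yields the embeddings; that step is not shown here.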

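Translation equivalent pairs are widely used in lexicography, machine translation, and cross-language information retrieval. This paper extracts translation equivalent pairs from the translation equivalence trees of bilingual sentence pairs. Automatically acquired pairs are evaluated using features such as the literal translation rate, the phrase alignment probability, and the length difference between target-language and source-language phrases. A pair-evaluation method based on a multiple linear regression model is proposed and combined with an N-best strategy to filter candidate translation equivalent pairs. Experimental results show that, in open tests, the regression-based evaluation and filtering method outperforms the other methods.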
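
The regression-then-N-best pipeline can be sketched as below, assuming the three features named in the abstract have already been computed for each candidate pair and that training targets are quality scores from some labelled sample. Both the training data and the targets here are invented; the fit is ordinary least squares solved through the normal equations.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept, solving (A^T A) b = A^T y.

    A is X with a leading column of ones; the normal equations are solved by
    Gauss-Jordan elimination with partial pivoting.
    """
    rows = [[1.0, *x] for x in X]
    d = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(d)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for k in range(d):
            if k != col:
                f = M[k][col] / M[col][col]
                M[k] = [a - f * b for a, b in zip(M[k], M[col])]
    return [M[i][d] / M[i][i] for i in range(d)]


def predict(beta, features):
    return beta[0] + sum(b * x for b, x in zip(beta[1:], features))


def n_best(candidates, beta, n):
    """Keep the n highest-scoring target phrases for each source phrase."""
    by_source = {}
    for src, tgt, feats in candidates:
        by_source.setdefault(src, []).append((predict(beta, feats), tgt))
    return {src: [tgt for _, tgt in sorted(scored, reverse=True)[:n]]
            for src, scored in by_source.items()}


# Hypothetical training data. Features per pair:
# (literal translation rate, phrase alignment probability, length difference);
# targets are invented quality scores for illustration only.
X = [(0.9, 0.8, 0.0), (0.2, 0.3, 2.0), (0.6, 0.5, 1.0), (0.4, 0.9, 0.0), (0.8, 0.1, 3.0)]
y = [0.87, -0.08, 0.40, 0.66, -0.06]
beta = fit_linear_regression(X, y)

candidates = [
    ("经济", "economy", (0.9, 0.8, 0.0)),
    ("经济", "economic growth", (0.6, 0.5, 1.0)),
    ("经济", "the", (0.2, 0.3, 2.0)),
]
print(n_best(candidates, beta, n=2))
```

Solving the normal equations directly is fine for three features; larger feature sets are usually fitted with a QR- or SVD-based least-squares routine for numerical stability.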

Information retrieval (IR) is the science of identifying documents or sub-documents in a collection of information or a database. Such a collection need not be in a single language, because information is not tied to any one language. Monolingual IR retrieves information in the query language, whereas cross-lingual information retrieval (CLIR) retrieves information in a language that differs from the query language. There is currently strong demand for CLIR systems because they let users broaden the international scope of their search for relevant documents. Compared with monolingual IR, one of the biggest problems of CLIR is poor retrieval performance caused by query mismatch, multiple representations of query terms, and untranslated query terms. Query expansion (QE) is the technique of adding related terms to the original query to reformulate it. The purpose of QE is to improve the performance and the quality of the information retrieved by a CLIR system. In this paper, QE is explored for Hindi-English CLIR, in which Hindi queries are used to search English documents. We used Okapi BM25 for document ranking and then expanded the translated queries using the term selection value. All experiments were performed on the FIRE 2012 dataset. Our results show that the relevance of Hindi-English CLIR results can be improved by adding the lowest-frequency term.
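
A minimal sketch of BM25 ranking followed by one round of pseudo-relevance-feedback expansion, assuming the Hindi query has already been translated into English tokens. The IDF is the non-negative variant log(1 + (N - df + 0.5)/(df + 0.5)), and the term-scoring rule in `expand_query` is a simplified stand-in for a term selection value, not the exact formula used in the paper. The collection is invented.

```python
import math
from collections import Counter


def idf(df_t, n_docs):
    """Non-negative BM25 IDF variant: log(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (n_docs - df_t + 0.5) / (df_t + 0.5))


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        norm = k1 * (1 - b + b * len(d) / avgdl)  # document-length normalization
        score = sum(idf(df[q], n_docs) * tf[q] * (k1 + 1) / (tf[q] + norm)
                    for q in query if tf[q] > 0)
        scores.append(float(score))
    return scores


def expand_query(query, docs, top_k=2, n_terms=1):
    """Pseudo-relevance feedback: append the best new terms from the top_k documents.

    Each candidate term is scored as (number of top docs containing it) * idf,
    a simplified stand-in for a term selection value.
    """
    scores = bm25_scores(query, docs)
    top = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:top_k]
    df = Counter(t for d in docs for t in set(d))
    r = Counter(t for i in top for t in set(docs[i]) if t not in query)
    best = sorted(r, key=lambda t: r[t] * idf(df[t], len(docs)), reverse=True)
    return list(query) + best[:n_terms]


# Hypothetical English collection; "flood" stands for an already translated Hindi query term.
docs = [
    ["flood", "relief", "assam", "army"],
    ["flood", "damage", "assam", "rain"],
    ["cricket", "match", "mumbai"],
    ["rain", "forecast", "mumbai"],
]
print(bm25_scores(["flood"], docs))
print(expand_query(["flood"], docs))
```

The expanded query would then be re-run through `bm25_scores` to produce the final ranking.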

International crime and terrorism have drawn increasing attention in recent years. Retrieving relevant information from criminal records and suspect communications is important in combating them. However, most of this information is written in languages other than English and is stored in various locations, so information sharing between countries presents the challenge of cross-lingual semantic interoperability. In this work, we propose a new approach, the associate constraint network, to generate a cross-lingual concept space from a parallel corpus, and benchmark it against a previously developed technique, the Hopfield network. The associate constraint network is a constraint-programming-based algorithm in which generating the cross-lingual concept space is formulated as a constraint satisfaction problem. Nodes and arcs in the network represent terms extracted from parallel corpora and their associations. Constraints are defined for the nodes, and node consistency and network satisfaction are also defined. A backmarking search is developed to find a feasible solution. Our experimental results show that the associate constraint network outperforms the Hopfield network in precision, recall, and efficiency. The cross-lingual concept space generated with this method can help crime analysts determine the relevance of criminals, crimes, locations, and activities across multiple languages, information that is not available in traditional thesauri and dictionaries.

The importance of reasoning in law is pointed out. Law and jurisprudence belong to the reasoning-conscious disciplines. Accordingly, there is a long tradition of logic in law. The specific methods of professional work in law are to be seen in close connection with legal reasoning. The advent of computers at first did not touch upon legal reasoning (or the professional work in law). At first computers could be used only for general auxiliary functions (e.g., numerical calculations in tax law). Gradually, the use of computers for auxiliary functions in law has become more specific and more sophisticated (e.g., legal information retrieval), touching more closely upon professional legal work. Moreover, renewed interest in AI has also fostered interest in AI in law, especially for legal expert systems. AI techniques can be used in support of legal reasoning. Yet until now legal expert systems have remained in the research and development stage and have hardly succeeded in becoming a profitable tool for the profession. Therefore it is hoped that the two lines of computer support, for auxiliary functions in law and for immediate support of legal reasoning, may unite in the future. Herbert Fiedler is professor of Legal Informatics, general theory of law and penal law in the Department of Economics and Law at the University of Bonn.

In this article we evaluate the BankXX program from several perspectives. BankXX is a case-based legal argument program that retrieves cases and other legal knowledge pertinent to a legal argument through a combination of heuristic search and knowledge-based indexing. The program is described in detail in a companion article in Artificial Intelligence and Law 4: 1-71, 1996. Three perspectives are used to evaluate BankXX: (1) classical information retrieval measures of precision and recall applied against a hand-coded baseline; (2) knowledge-representation and case-based reasoning, where the baseline is provided by the functionality of a well-known case-based argument program, HYPO (Ashley, 1990); and (3) search, in which the performance of BankXX run with various parameter settings, for instance, resource limits, is compared. In this article we report on an extensive series of experiments performed to evaluate the program. We also describe two additional experiments concerning (1) the program's search behavior; and (2) the use of a modified form of precision and recall based on case similarity. Finally, we offer some general conclusions that might be drawn from these particular experiments.

Web legal information retrieval systems need the capability to reason with the knowledge modeled by legal ontologies. Using this knowledge it is possible to represent and to make inferences about the semantic content of legal documents. In this paper a methodology for applying NLP techniques to automatically create a legal ontology is proposed. The ontology is defined in the OWL semantic web language and it is used in a logic programming framework, EVOLP+ISCO, to allow users to query the semantic content of the documents. ISCO allows an easy and efficient integration of declarative, object-oriented and constraint-based programming techniques with the capability to create connections with external databases. EVOLP is a dynamic logic programming framework allowing the definition of rules for actions and events. An application of the proposed methodology to the legal web information retrieval system of the Portuguese Attorney General’s Office is described.

A method to identify ontology components is presented in this article. The method relies on Natural Language Processing (NLP) techniques to extract concepts and the relations among them. It is applied in the legal field to build an ontology dedicated to information retrieval. The legal texts on which the method is run are carefully chosen because they describe and conceptualize the legal domain. We suggest that this method can help legal ontology designers and may be used to build ontologies dedicated to tasks other than information retrieval.
