Similar Documents
20 similar documents found (search time: 36 ms)
1.
BankXX: Supporting legal arguments through heuristic retrieval
The BankXX system models the process of perusing and gathering information for argument as a heuristic best-first search for relevant cases, theories, and other domain-specific information. As BankXX searches its heterogeneous and highly interconnected network of domain knowledge, information is incrementally analyzed and amalgamated into a dozen desirable ingredients for argument (called argument pieces), such as citations to cases, applications of legal theories, and references to prototypical factual scenarios. At the conclusion of the search, BankXX outputs the set of argument pieces filled with harvested material relevant to the input problem situation. This research explores the appropriateness of the search paradigm as a framework for harvesting and mining information needed to make legal arguments. In this article, we describe how legal research fits the heuristic search framework and detail how this model is used in BankXX. We describe the BankXX program with emphasis on its representation of legal knowledge and legal argument. We describe the heuristic search mechanism and evaluation functions that drive the program. We give an extended example of the processing of BankXX on the facts of an actual legal case in BankXX's application domain — the good-faith question of Chapter 13 personal bankruptcy law. We discuss closely related research on legal knowledge representation and retrieval and the use of search for case retrieval or tasks related to argument creation. Finally, we review what we believe are the contributions of this research to the understanding of the diverse disciplines it addresses. This research was supported in part by grant No. 90-0359 from the Air Force Office of Sponsored Research and NSF grant No. EEC-9209623, State/University/Industry Cooperative Research on Intelligent Information Retrieval.
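To make the search paradigm concrete, here is a minimal Python sketch of a heuristic best-first search that harvests "argument pieces" while expanding a knowledge network; the node structure, scoring, and harvesting functions are hypothetical stand-ins, not the actual BankXX implementation.

```python
import heapq

def best_first_search(start, neighbors, score, harvest, max_nodes=100):
    """Heuristic best-first search over a knowledge network.

    neighbors(node) -> adjacent nodes (cases, theories, fact scenarios, ...)
    score(node)     -> heuristic promise of a node; higher is better
    harvest(node)   -> argument pieces contributed by the node, if any
    """
    frontier = [(-score(start), 0, start)]  # max-heap via negated scores
    seen, tiebreak = {start}, 0
    argument_pieces = []
    while frontier and len(seen) <= max_nodes:
        _, _, node = heapq.heappop(frontier)
        argument_pieces.extend(harvest(node))
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                tiebreak += 1  # avoids comparing node objects on score ties
                heapq.heappush(frontier, (-score(nxt), tiebreak, nxt))
    return argument_pieces

# Toy network: the evaluation function simply favors case nodes.
graph = {"start": ["case-A", "theory-T"], "case-A": ["case-B"],
         "theory-T": [], "case-B": []}
print(best_first_search(
    "start",
    neighbors=lambda n: graph[n],
    score=lambda n: 1.0 if "case" in n else 0.1,
    harvest=lambda n: [f"citation:{n}"] if "case" in n else []))
# ['citation:case-A', 'citation:case-B']
```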

2.
Developing search strategies for detecting relevant experiments
Our goal is to analyze the optimality of search strategies for use in systematic reviews of software engineering experiments. Retrieving relevant studies is an important problem in any evidence-based discipline, but this question has not yet been examined for evidence-based software engineering. We ran several searches exercising different terms denoting experiments, and evaluated their recall and precision. Based on our evaluation, we propose using a high-recall strategy when resources are plentiful or the results need to be exhaustive. For any other case, we propose optimal, or even acceptable, search strategies. As a secondary goal, we analysed trends and weaknesses in the terminology used in articles reporting software engineering experiments. We found that it is impossible for a search strategy to retrieve 100% of the experiments of interest (as happens in other experimental disciplines), because of the shortage of reporting standards in the community.
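The recall/precision trade-off the authors evaluate can be computed directly. A minimal sketch, assuming the set of relevant experiments is known from a gold standard (all document ids below are made up):

```python
def evaluate_strategy(retrieved, relevant):
    """Recall and precision of one search strategy.

    retrieved: set of document ids returned by the search
    relevant:  set of ids of all known relevant experiments
    """
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision

# A broad (high-recall) vs. a narrow (high-precision) strategy.
relevant = {1, 2, 3, 4, 5}
print(evaluate_strategy({1, 2, 3, 4, 6, 7, 8, 9}, relevant))  # (0.8, 0.5)
print(evaluate_strategy({1, 2}, relevant))                    # (0.4, 1.0)
```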

3.
This paper presents the QA-Pagelet as a fundamental data preparation technique for large-scale data analysis of the deep Web. To support QA-Pagelet extraction, we present the Thor framework for sampling, locating, and partitioning the QA-Pagelets from the deep Web. Two unique features of the Thor framework are 1) the novel page clustering for grouping pages from a deep Web source into distinct clusters of control-flow dependent pages and 2) the novel subtree filtering algorithm that exploits the structural and content similarity at subtree level to identify the QA-Pagelets within highly ranked page clusters. We evaluate the effectiveness of the Thor framework through experiments using both simulation and real data sets. We show that Thor performs well over millions of deep Web pages and over a wide range of sources, including e-commerce sites, general and specialized search engines, corporate Web sites, medical and legal resources, and several others. Our experiments also show that the proposed page clustering algorithm achieves low-entropy clusters, and the subtree filtering algorithm identifies QA-Pagelets with excellent precision and recall.
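The paper's subtree filtering algorithm is not reproduced here; the following is only a minimal sketch of one plausible building block for structural similarity at the subtree level, using the Jaccard overlap of root-to-node tag paths as a proxy (the tree encoding is hypothetical):

```python
def tag_paths(node, prefix=()):
    """Enumerate root-to-node tag paths of a subtree.

    node: (tag, [children]) tuples standing in for parsed HTML.
    """
    tag, children = node
    path = prefix + (tag,)
    yield path
    for child in children:
        yield from tag_paths(child, path)

def subtree_similarity(a, b):
    """Structural similarity as Jaccard overlap of tag-path sets."""
    pa, pb = set(tag_paths(a)), set(tag_paths(b))
    return len(pa & pb) / len(pa | pb)

qa_rows = ("table", [("tr", [("td", []), ("td", [])])])
mixed = ("table", [("tr", [("td", []), ("th", [])])])
print(subtree_similarity(qa_rows, mixed))  # 0.75
```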

4.
Slot filling (SF) aims to mine specific attribute information about a given entity (called a query) from a large-scale document collection. Entity search is an important component of SF: it retrieves the documents containing the given query (called relevant documents), from which downstream modules extract attribute information. So far, entity search has received little attention in SF research; the Boolean retrieval models in use ignore the characteristics of entity queries, rely only on the surface form of the query, and, hampered by query ambiguity, deliver low-precision results. To address this problem, this paper proposes an entity search model based on cross-document coreference resolution (CDCR). The method applies CDCR to candidate results with high recall but low precision, filtering out documents that contain no entity coreferent with the given entity and thereby improving precision. To reduce the recall loss caused by filtering, pseudo-relevance feedback is used to enrich the description of the query entity. Experimental results show that, compared with the baseline system, the method effectively improves retrieval: precision and F1 rise by 5.63% and 2.56%, respectively.
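A minimal sketch of the retrieve-then-filter idea: pseudo-relevance feedback builds an entity profile, and a crude lexical test stands in for full cross-document coreference resolution (all names, stopword lists, and thresholds below are illustrative):

```python
from collections import Counter

STOP = {"the", "a", "an", "and", "of", "with", "on", "in", "to", "is", "from"}

def expand_query(entity, feedback_docs, k=5):
    """Pseudo-relevance feedback: enrich the entity description with the
    k most frequent content words from top-ranked feedback documents."""
    counts = Counter(w for doc in feedback_docs for w in doc.lower().split()
                     if w not in STOP and w != entity.lower())
    return [w for w, _ in counts.most_common(k)]

def corefers(entity, profile, doc, threshold=2):
    """Crude lexical stand-in for CDCR: keep a document only if the entity
    name appears together with enough words from the entity profile."""
    words = set(doc.lower().split())
    return entity.lower() in words and \
        sum(w in words for w in profile) >= threshold

def filter_candidates(entity, candidates, feedback_docs):
    profile = expand_query(entity, feedback_docs)
    return [d for d in candidates if corefers(entity, profile, d)]

feedback = ["Jordan led the Bulls to six NBA championships",
            "Jordan retired from basketball and the Bulls"]
candidates = ["Jordan signed a new contract with the Bulls in the NBA",
              "Jordan is a country on the east bank of the Jordan River"]
print(filter_candidates("Jordan", candidates, feedback))
# keeps the basketball document, filters the ambiguous country sense
```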

5.
This article describes recent jurisprudential accounts of analogical legal reasoning and compares them in detail to the computational model of case-based legal argument in CATO. The jurisprudential models provide a theory of relevance based on low-level legal principles generated in a process of case-comparing reflective adjustment. The jurisprudential critique focuses on the problems of assigning weights to competing principles and dealing with erroneously decided precedents. CATO, a computerized instructional environment, employs Artificial Intelligence techniques to teach law students how to make basic legal arguments with cases. The computational model helps students test legal hypotheses against a database of legal cases, draws analogies to problem scenarios from the database, and composes arguments by analogy with a set of argument moves. The CATO model accounts for a number of the important features of the jurisprudential accounts, including implementing a kind of reflective adjustment. It also avoids some of the problems identified in the critique; for instance, it deals with weights in a non-numeric, context-sensitive manner. The article concludes by describing the contributions AI research can make to jurisprudential investigations of complex cognitive phenomena of legal reasoning. For instance, unlike the jurisprudential models, CATO provides a detailed account of how to generate multiple interpretations of a cited case, downplaying or emphasizing the legal significance of distinctions in terms of the purposes of the law as the argument context demands.

6.

The distribution of documents over two classes in the binary text categorization problem is generally uneven, and resampling approaches are shown to improve F1 scores. The improvement achieved is mainly due to the gain in recall, while precision may deteriorate. Since precision is the primary concern in some applications, achieving higher F1 scores with a desired level of trade-off between precision and recall is important. In this study, we present an analytical comparison between unanimity and majority voting rules. It is shown that the unanimity rule can provide better F1 scores compared to majority voting when an ensemble of high-recall but low-precision classifiers is considered. Then, category-based undersampling is proposed to generate high-recall members. The experiments conducted on three datasets have shown that superior F1 scores can be realized compared to the support vector machine (SVM)-based baseline system and voting over a random undersampling-based ensemble.
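A minimal sketch of the two voting rules on one document's ensemble predictions (1 = positive); with high-recall, low-precision members, unanimity suppresses false positives that majority voting lets through:

```python
def majority_vote(predictions):
    """Positive iff more than half of the ensemble members say positive."""
    return sum(predictions) > len(predictions) / 2

def unanimity_vote(predictions):
    """Positive iff every ensemble member says positive."""
    return all(predictions)

votes = [1, 1, 0]  # two of three members say positive
print(majority_vote(votes), unanimity_vote(votes))  # True False
```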


7.
Detecting topics from Twitter streams has become an important task as it is used in various fields including natural disaster warning, user opinion assessment, and traffic prediction. In this article, we outline different types of topic detection techniques and evaluate their performance. We categorize the topic detection techniques into five categories: clustering, frequent pattern mining, Exemplar-based, matrix factorization, and probabilistic models. For clustering techniques, we discuss and evaluate nine different techniques: sequential k-means, spherical k-means, kernel k-means, scalable kernel k-means, incremental batch k-means, DBSCAN, spectral clustering, document pivot clustering, and Bngram. Moreover, for matrix factorization techniques, we analyze five different techniques: sequential Latent Semantic Indexing (LSI), stochastic LSI, Alternating Least Squares (ALS), Rank-one Downdate (R1D), and Column Subset Selection (CSS). Additionally, we evaluate several other techniques in the frequent pattern mining, Exemplar-based, and probabilistic model categories. Results on three Twitter datasets show that Soft Frequent Pattern Mining (SFM) and Bngram achieve the best term precision, while CSS achieves the best term recall and topic recall in most cases. Moreover, Exemplar-based topic detection obtains a good balance between term recall and term precision, while achieving good topic recall and running time.
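As one example from the clustering category, here is a minimal numpy sketch of spherical k-means (unit-normalized vectors clustered by cosine similarity); the naive first-k initialization and toy data are illustrative, not any surveyed paper's exact variant:

```python
import numpy as np

def spherical_kmeans(X, k, iters=20):
    """Spherical k-means: k-means with cosine similarity on unit vectors.

    X: (n_docs, n_terms) nonnegative array, e.g. tf-idf vectors of tweets.
    Naive initialization (first k rows); k-means++-style seeding is better.
    """
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # project onto sphere
    centroids = X[:k].copy()
    for _ in range(iters):
        labels = (X @ centroids.T).argmax(axis=1)     # nearest by cosine
        for j in range(k):
            members = X[labels == j]
            if len(members):                          # keep old centroid if empty
                c = members.sum(axis=0)
                centroids[j] = c / np.linalg.norm(c)  # renormalize to unit length
    return labels, centroids

docs = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
                 [0.0, 1.0, 1.0], [0.0, 0.9, 1.1]])
labels, _ = spherical_kmeans(docs, k=2)
print(labels)  # [0 0 1 1]: the two "topic" pairs cluster together
```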

8.
This paper studies the use of hypothetical and value-based reasoning in US Supreme Court cases concerning the United States Fourth Amendment. Drawing upon formal AI & Law models of legal argument, a semi-formal reconstruction is given of parts of the Carney case, which has been studied previously in AI & Law research on case-based reasoning. As part of the reconstruction, a semi-formal proposal is made for extending the formal AI & Law models with forms of metalevel reasoning in several argument schemes. The result is compared with Rissland's (1989) analysis in terms of dimensions and Ashley's (2008) analysis in terms of his process model of legal argument with hypotheticals.

9.
Arguments and cases: An inevitable intertwining
We discuss several aspects of legal arguments, primarily arguments about the meaning of statutes. First, we discuss how the requirements of argument guide the specification and selection of supporting cases and how an existing case base influences argument formation. Second, we present our evolving taxonomy of patterns of actual legal argument. This taxonomy builds upon our much earlier work on argument moves and also on our more recent analysis of how cases are used to support arguments for the interpretation of legal statutes. Third, we show how the theory of argument used by CABARET, a hybrid case-based/rule-based reasoner, can support many of the argument patterns in our taxonomy. This work was supported in part by the National Science Foundation, contract IRI-890841, the Air Force Office of Sponsored Research under contract 90-0359, the Office of Naval Research under a University Research Initiative Grant, contract N00014-87-K-0238, and a grant from GTE Laboratories, Inc., Waltham, Mass.

10.
Automatically generating a brief summary for legal-related public opinion news (LPO-news, which contains legal words or phrases) plays an important role in rapid and effective public opinion disposal. For LPO-news, the critical case elements, which are significant parts of the summary, may be mentioned several times in the reader comments. Consequently, we investigate the task of comment-aware abstractive text summarization for LPO-news, which can generate a salient summary by learning pivotal case elements from the reader comments. In this paper, we present a hierarchical comment-aware encoder (HCAE), which contains four components: 1) a traditional sequence-to-sequence framework as our baseline; 2) a selective denoising module to filter the noise in comments and distinguish the case elements; 3) a merge module coupling the source article and comments to yield a comment-aware context representation; 4) a recoding module to capture the interaction among the source article words conditioned on the comments. Extensive experiments are conducted on a large dataset of legal public opinion news collected from micro-blogs, and results show that the proposed model outperforms several existing state-of-the-art baseline models under the ROUGE metrics.
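The HCAE architecture itself is not reproduced here; the following numpy sketch only illustrates the merge idea, attending from article tokens to comment tokens, with a scalar gate standing in for the learned selective-denoising module (all shapes and values are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def merge_with_comments(article, comments, gate=0.5):
    """Mix comment context into article token representations.

    article:  (n_art, d) token embeddings of the source article
    comments: (n_com, d) token embeddings of reader comments
    gate:     scalar stand-in for a learned selective-denoising gate
    """
    attn = softmax(article @ comments.T / np.sqrt(article.shape[1]))
    context = attn @ comments        # comment-aware context per article token
    return article + gate * context  # merged, comment-aware representation

art = np.random.default_rng(0).normal(size=(6, 8))   # 6 article tokens
com = np.random.default_rng(1).normal(size=(4, 8))   # 4 comment tokens
print(merge_with_comments(art, com).shape)           # (6, 8)
```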

11.
李文, 陈叶旺, 彭鑫, 赵文耘. 《计算机科学》 2010, 37(10): 138-142
Term-concept mapping is a key step in ontology-based semantic retrieval and strongly affects both its precision and its recall. Traditional keyword-matching approaches usually compute term-concept relatedness from the degree of term-concept co-occurrence; this ignores concept attributes and attribute values, i.e., it discards the concept's semantic information. To address this problem, a term-concept mapping method is proposed. Based on the results of annotating documents with ontology triples, it exploits the dual concept-document and term-document relations: it first computes term-concept relatedness and confidence scores, and then performs the term-concept mapping. Experimental results show that the method effectively improves retrieval quality.
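A minimal sketch of co-occurrence-style relatedness and confidence scores computed from concept-document and term-document relations; the exact formulas below are illustrative, not the paper's:

```python
def relatedness(term_docs, concept_docs):
    """Relatedness of a term and a concept as the Jaccard overlap of the
    document sets they are annotated with."""
    union = term_docs | concept_docs
    return len(term_docs & concept_docs) / len(union) if union else 0.0

def confidence(term_docs, concept_docs):
    """Confidence of mapping the term to the concept: the fraction of the
    term's documents that also carry the concept annotation."""
    if not term_docs:
        return 0.0
    return len(term_docs & concept_docs) / len(term_docs)

term, concept = {"d1", "d2", "d3"}, {"d2", "d3", "d4"}
print(relatedness(term, concept), round(confidence(term, concept), 3))
# 0.5 0.667
```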

12.
Due to the large number of spelling variants found in historical texts, standard methods of Information Retrieval (IR) fail to produce satisfactory results on historical document collections. In order to improve recall for search engines, modern words used in queries have to be associated with corresponding historical variants found in the documents. In the literature, the use of (1) special matching procedures and (2) lexica for historical language have been suggested as two alternative ways to solve this problem. In the first part of the paper, we show how the construction of matching procedures and lexica may benefit from each other, leading the way to a combination of both approaches. A tool is presented where matching rules and a historical lexicon are built in an interleaved way based on corpus analysis. In the second part of the paper, we ask if matching procedures alone suffice to lift IR on historical texts to a satisfactory level. Since historical language changes over centuries, it is not simple to obtain an answer. We present experiments where the performance of matching procedures in text collections from four centuries is studied. After classifying missed vocabulary, we measure precision and recall of the matching procedure for each period. Results indicate that for earlier periods, matching procedures alone do not lead to satisfactory results. We then describe experiments where the gain for recall obtained from historical lexica of distinct sizes is estimated.
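A minimal sketch of the combination the paper argues for: rule-generated spelling variants plus a historical lexicon. The rewrite rules and lexicon entries below are invented German-flavored examples; real rule sets are induced from corpus analysis, as the paper describes.

```python
# Illustrative modern->historical rewrite patterns.
RULES = [("t", "th"), ("ei", "ey"), ("u", "uo")]

LEXICON = {"theyl": "teil"}  # historical form -> modern lemma

def historical_variants(word, rules=RULES):
    """Generate candidate historical spellings of a modern query word by
    applying each rewrite rule to every variant found so far."""
    variants = {word}
    for modern, historical in rules:
        variants |= {v.replace(modern, historical) for v in variants}
    return variants

def expand_query_term(word):
    """Combine rule-generated variants with historical-lexicon lookups."""
    return historical_variants(word) | \
        {h for h, m in LEXICON.items() if m == word}

print(sorted(expand_query_term("teil")))
# ['teil', 'teyl', 'theil', 'theyl']
```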

13.
Mining Web informative structures and contents based on entropy analysis
We study the problem of mining the informative structure of a news Web site that consists of thousands of hyperlinked documents. We define the informative structure of a news Web site as a set of index pages (also referred to as TOC, i.e., table-of-contents, pages) and a set of article pages linked by these TOC pages. Based on the Hyperlink-Induced Topic Search (HITS) algorithm, we propose an entropy-based analysis mechanism (LAMIS) for analyzing the entropy of anchor texts and links to eliminate the redundancy of the hyperlinked structure, so that the complex structure of a Web site can be distilled. However, to increase the value and the accessibility of pages, most content sites tend to publish their pages with intrasite redundant information, such as navigation panels, advertisements, and copyright announcements. To further eliminate such redundancy, we propose another mechanism, called InfoDiscoverer, which applies the distilled structure to identify sets of article pages. InfoDiscoverer also employs entropy information to analyze the information measures of article sets and to extract informative content blocks from these sets. Our result is useful for search engines, information agents, and crawlers to index, extract, and navigate significant information from a Web site. Experiments on several real news Web sites show that the precision and the recall of our approaches are much superior to those obtained by conventional methods in mining the informative structures of news Web sites. On average, the augmented LAMIS leads to prominent performance improvement, increasing precision by 122 to 257 percent when the desired recall falls between 0.5 and 1. In comparison with manual heuristics, the precision and the recall of InfoDiscoverer are greater than 0.956.
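Both LAMIS and InfoDiscoverer build on Shannon entropy over anchor texts, links, or feature distributions. A minimal sketch of the underlying measure (the way the papers apply it, per feature across pages, is more involved):

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy of the empirical distribution of a list of values
    (e.g., anchor texts or link targets)."""
    counts = Counter(values)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())

print(entropy(["home"] * 4))          # 0.0: one repeated value, no information
print(entropy(["a", "b", "c", "d"]))  # 2.0: maximally varied
```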

14.
15.
FastMap, SparseMap, and BoostMap are widely regarded as embedding methods applicable to any metric space. However, earlier work overestimated their applicability: they do not work well in keyword-based metric spaces. To evaluate their suitability for keyword spaces, we instantiated them in a keyword-based similarity search scenario, combining the embedding methods with locality-sensitive hashing, and studied the quality of the resulting embeddings. We report experimental results on several datasets, focusing on precision, recall, stress, and the efficiency of distance preservation. The embeddings turn out to perform poorly in keyword-based metric spaces; we conclude that these methods are not applicable to all metric spaces and analyze the reasons for the poor performance.
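For reference, the core FastMap step projects an object onto the line through two pivot objects using only the metric. A minimal sketch instantiated with keyword sets under the Jaccard distance (illustrative, not the paper's experimental setup):

```python
def fastmap_coordinate(x, a, b, dist):
    """One FastMap coordinate: project object x onto the line through
    pivots a and b, using only pairwise distances."""
    d_ab = dist(a, b)
    if d_ab == 0:
        return 0.0
    return (dist(a, x) ** 2 + d_ab ** 2 - dist(b, x) ** 2) / (2 * d_ab)

def jaccard_dist(s, t):
    """Jaccard distance between two keyword sets."""
    return 1 - len(s & t) / len(s | t) if s | t else 0.0

a, b = {"legal", "case"}, {"web", "mining"}
x = {"legal", "mining"}
print(fastmap_coordinate(x, a, b, jaccard_dist))  # 0.5
```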

16.
Query expansion is an important technique for improving the recall of information retrieval. This paper expands query terms by drawing on the complementary strengths of Wikipedia and a search engine, aiming to improve the precision of the top-N retrieval results. Wikipedia articles contain a large number of hyperlinks whose entries are closely related to the article's topic; extracting these entries yields the Wikipedia-based expansion. Using search-engine pseudo-relevance-feedback expansion as the baseline, we evaluate both a monolingual expansion system and a Chinese-English cross-language expansion system. Experimental results show that, compared with the baseline, MAP improves by 6.41% in the monolingual system and Top10-precision improves by 10.90% in the cross-language system.
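A minimal sketch of the Wikipedia side of the expansion: pull candidate terms from the [[...]] hyperlinks of an article's wikitext (the parsing below is deliberately simplistic, and the sample article text is invented):

```python
import re

def wiki_link_terms(wikitext, limit=10):
    """Extract candidate expansion terms from the [[...]] hyperlinks of a
    Wikipedia article body (piped links keep the link target)."""
    targets = re.findall(r"\[\[([^\]|#]+)", wikitext)
    seen, terms = set(), []
    for t in (t.strip() for t in targets):
        if t.lower() not in seen:
            seen.add(t.lower())
            terms.append(t)
    return terms[:limit]

article = ("'''Information retrieval''' relies on [[precision and recall]], "
           "[[Query expansion|expanded queries]] and [[search engine]]s.")
print(wiki_link_terms(article))
# ['precision and recall', 'Query expansion', 'search engine']
```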

17.
Concept-based similarity retrieval of research papers
The ever-growing number of papers on the Web raises a new problem: how to retrieve the papers that meet a user's needs. Traditional retrieval based on query-term matching often fails to find such papers accurately. This paper presents a concept-based paper-similarity retrieval method that effectively improves on traditional paper retrieval. A hierarchical clustering algorithm for paper keywords is introduced: keywords are first clustered into concepts, producing a concept tree; each paper is then represented by a concept vector, corresponding to a subtree of the concept tree. At retrieval time, a modified cosine similarity over concept vectors measures paper similarity, and the papers most similar to a given paper are returned to the user. The algorithm performs concept-based similarity retrieval well and overcomes the shortcomings of query-term matching; experiments show high recall and precision.
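A minimal sketch of the retrieval step: papers as sparse concept vectors compared by cosine similarity (plain cosine here, whereas the paper uses a modified variant; the concept names and weights are invented):

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse concept vectors ({concept: weight})."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Papers as weights over concepts produced by keyword clustering.
p1 = {"machine-learning": 0.8, "retrieval": 0.6}
p2 = {"retrieval": 0.9, "ontology": 0.4}
print(round(cosine(p1, p2), 3))  # 0.548
```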

18.
The objective of this study was to determine the quality of MEDLINE searches done by physicians, physician trainees, and expert searchers (clinicians and librarians). Its design was an analytic survey with independent replication, in a setting of self-service online searching from medical wards, an intensive care unit, a coronary care unit, an emergency room, and an ambulatory clinic in a 300-bed teaching hospital. Participants were all M.D. clinical clerks, house staff, and attending staff responsible for patients in the above settings. The intervention for all participants consisted of a 2-h small-group class and a 1-h practice session on MEDLINE searching (GRATEFUL MED) before free access to MEDLINE. Search questions from 104 randomly selected novice searches were given to 1 of 13 clinicians with prior search experience and 1 of 3 librarians to run independent searches (triplicated searches). For measurements and main results, the unique citations from the triplicated searches were sent to expert clinicians to rate for relevance (7-point scale). Recall (the number of relevant citations retrieved by an individual search divided by the total number of relevant citations from all searches on the same topic) and precision (the proportion of relevant citations retrieved in each search) were calculated. Librarians were significantly better than novices on both. Librarians had recall equivalent to, and precision better than, experienced end-users. Unexpectedly, only 20% of relevant citations were retrieved by more than one search of the set of three. The conclusion is that novice searchers on MEDLINE via GRATEFUL MED after brief training have relatively low recall and precision. Recall improves with experience but precision remains suboptimal. Further research is needed to determine the "learning curve," evaluate training interventions, and explore the non-overlapping retrieval of relevant citations by different searchers.

19.
Computational models of relevance in case-based legal reasoning have traditionally been based on algorithms for comparing the facts and substantive legal issues of a prior case to those of a new case. In this paper we argue that robust models of case-based legal reasoning must also consider the broader social and jurisprudential context in which legal precedents are decided. We analyze three aspects of legal context: the teleological relations that connect legal precedents to the social values and policies they serve, the temporal relations between prior and subsequent cases in a legal domain, and the procedural posture of legal cases, which defines the scope of their precedential relevance. Using real examples drawn from appellate courts of New York and Massachusetts, we show with the courts' own arguments that the doctrine of stare decisis (i.e., similar facts should lead to similar results) is subject to contextual constraints and influences. For each of the three aspects of legal context, we outline an expanded computational framework for case-based legal reasoning that encompasses the reasoning of the examples, and provides a foundation for generating a more robust set of legal arguments.

20.
Rules often contain terms that are ambiguous, poorly defined or not defined at all. In order to interpret and apply rules containing such terms, appeal must be made to their previous constructions, as in the interpretation of legal statutes through relevant legal cases. We describe a system CABARET (CAse-BAsed REasoning Tool) that provides a domain-independent shell that integrates reasoning with rules and reasoning with previous cases in order to apply rules containing ill-defined terms. The integration of these two reasoning paradigms is performed via a collection of control heuristics, which suggest how to interleave case-based methods and rule-based methods to construct an argument to support a particular interpretation. CABARET is currently instantiated with cases and rules from an area of income tax law, the so-called "home office deduction". An example of CABARET's processing of an actual tax case is provided in some detail. The advantages of CABARET's hybrid approach to interpretation stem from the synergy derived from interleaving case-based and rule-based tasks.
