Similar Documents
Found 20 similar documents (search time: 15 ms)
1.
Legal text retrieval traditionally relies upon external knowledge sources such as thesauri and classification schemes, and accurate indexing of the documents is often done manually. As a result, not all legal documents can be retrieved effectively. However, a number of current artificial intelligence techniques are promising for legal text retrieval. They support knowledge acquisition and the knowledge-rich processing of document content and information needs, and of their matching. Techniques for learning information needs, learning concept attributes of texts, information extraction, text classification and clustering, and text summarization currently need to be studied in legal text retrieval because of their potential for improving retrieval and decreasing the cost of manual indexing. The resulting query and text representations are semantically much richer than a set of key terms. Their use allows for more refined retrieval models in which some reasoning can be applied. This paper gives an overview of the state of the art of these innovative techniques and their potential for legal text retrieval.

2.
Text retrieval techniques have long focused on the topic of texts rather than the pragmatic role they play. In this article, we address two other aspects of text processing that could enhance text retrieval: (a) the detection of functional style in retrieved texts, and (b) the detection of the writer's attitude towards a given topic in retrieved texts. The former is justified by the fact that current text databases have become highly heterogeneous in terms of document inclusion, while the latter is dictated by the need for advanced and intelligent retrieval tools. Towards this aim, two generalised methodologies are presented for implementing the findings in these two aspects of text processing. The first is fully developed and is therefore analysed and evaluated in detail, while for the second the theoretical framework is given for subsequent computational implementation. Both approaches are as language-independent as possible, empirically driven, and can be used, apart from information retrieval purposes, in various natural language processing applications. These include grammar and style checking, natural language generation, summarisation, style verification in real-world texts, recognition of style shift between adjacent portions of text, and author identification.

3.
Legal artificial intelligence has attracted broad attention in recent years for its efficiency and convenience. Legal documents are the most common embodiment of law in social life, and applying natural language understanding (NLU) methods to intelligently process the content of legal documents is an important research and application direction. This paper surveys and summarizes NLU techniques for legal documents. It first introduces five task forms of legal-document NLU: legal document information extraction, similar-case retrieval, judicial question answering, legal document summarization, and judgment prediction. It then discusses the main challenges in applying existing NLU techniques to legal document understanding, pointing out the need to handle the expressive differences between legal documents and everyday language, to model the reasoning and argumentation structures peculiar to legal documents, and to incorporate legal knowledge such as statutes and reasoning patterns into NLU models.

4.
There is no task that computers regularly perform that is more affected by the nature of human language than the retrieval of texts in response to a human need. Despite this, the techniques actually in use for this task, as well as most of the techniques proposed by information retrieval (IR) researchers, make little use of knowledge about language. In this article we take the view that IR is an inference task, and that natural language processing (NLP) techniques can produce text representations that enable more accurate inferences about document content. By considering previous work on language-based and knowledge-based techniques from this perspective, some clear lessons are apparent, and we are applying these lessons in the ADRENAL (Augmented Document REtrieval using NAtural Language processing) project. Our initial experiments with hand-coded representations suggest that using NLP-produced representations can result in significant performance increases in IR systems, and also demonstrate the attention that must be given to representational issues in language-oriented IR.

5.
Document ranking and the vector-space model
Efficient and effective text retrieval techniques are critical in managing the increasing amount of textual information available in electronic form. Yet text retrieval is a daunting task because it is difficult to extract the semantics of natural language texts. Many problems must be resolved before natural language processing techniques can be effectively applied to a large collection of texts. Most existing text retrieval techniques rely on indexing keywords. Unfortunately, keywords or index terms alone cannot adequately capture the document contents, resulting in poor retrieval performance. Yet keyword indexing is widely used in commercial systems because it is still the most viable way by far to process large amounts of text. Using several simplifications of the vector-space model for text retrieval queries, the authors seek the optimal balance between processing efficiency and retrieval effectiveness as expressed in relevant document rankings.
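The keyword-indexing trade-off the abstract describes can be illustrated with a minimal vector-space sketch. This is not the authors' implementation; the tokenization, the particular TF-IDF weighting, and the toy documents are illustrative assumptions:

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Rank documents against a query by cosine similarity of TF-IDF vectors."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    # Document frequency of each term across the collection.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    def vec(tokens):
        # Log-scaled term frequency times a smoothed inverse document frequency.
        tf = Counter(tokens)
        return {t: (1 + math.log(c)) * math.log(1 + n / df[t])
                for t, c in tf.items() if t in df}

    def cosine(u, v):
        dot = sum(u[t] * v[t] for t in u if t in v)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = vec(query.lower().split())
    scores = [(cosine(q, vec(toks)), i) for i, toks in enumerate(tokenized)]
    return [i for _, i in sorted(scores, reverse=True)]

docs = ["natural language text retrieval",
        "keyword indexing of large text collections",
        "image compression algorithms"]
print(tfidf_rank("text retrieval", docs))  # indices of docs, best match first
```

The efficiency appeal is visible even here: scoring needs only term lookups and dot products, with no semantic analysis of the texts.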

6.
Probabilistic topic models can be used to extract low-dimension aspects from document collections and capture how the aspects change over time. However, such models without any human knowledge often produce aspects that are not interpretable. In recent years, a number of knowledge-based topic models and dynamic topic models have been proposed, but they cannot process the concept knowledge and temporal information in Wikipedia. In this paper, we fill this gap by proposing a new probabilistic modeling framework which combines a data-driven topic model with Wikipedia knowledge. With the supervision of Wikipedia knowledge, we can grasp more coherent aspects, namely concepts, and detect the trends of concepts more accurately; the detected concept trends reflect bursty content in text and people's concerns. Our method can detect events and discover event-specific entities in text. Experiments on New York Times and TechCrunch datasets show that our framework outperforms two baselines.

7.
The keyphrases of a text entity are a set of words or phrases that concisely describe the main content of that text. Automatic keyphrase extraction plays an important role in natural language processing and information retrieval tasks such as text summarization, text categorization, full-text indexing, and cross-lingual text reuse. However, automatic keyphrase extraction is still a complicated task, and the performance of current keyphrase extraction methods is low. Automatic discovery of high-quality and meaningful keyphrases requires the application of useful information and suitable mining techniques. This paper proposes the Topical and Structural Keyphrase Extractor (TSAKE) for the task of automatic keyphrase extraction. TSAKE combines the prior knowledge about the input language learned by an N-gram topical model (TNG) with the co-occurrence graph of the input text to form topical graphs. Unlike most recent keyphrase extraction models, TSAKE uses the topic model to weight the edges instead of the nodes of the co-occurrence graph. Moreover, while TNG represents the general topics of the language, TSAKE applies network analysis techniques to each topical graph to detect finer-grained sub-topics and extract the more important words of each sub-topic. The use of these informative words in the ranking process of the candidate keyphrases improves the quality of the final keyphrases proposed by TSAKE. The results of our experimental studies conducted on three manually annotated datasets show the superiority of the proposed model over three baseline techniques and six state-of-the-art models.
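The co-occurrence-graph ranking idea at the core of such extractors can be sketched as follows. This omits TSAKE's topical edge weighting (which requires the trained TNG model) and uses a plain PageRank-style walk over an unweighted graph; the window size, damping factor, and sample text are illustrative assumptions:

```python
from collections import defaultdict

def cooccurrence_keywords(tokens, window=2, iters=30, d=0.85):
    """Score words by a PageRank-style random walk over a co-occurrence graph.

    Edges connect words that appear within `window` tokens of each other.
    Returns the words sorted by descending score.
    """
    graph = defaultdict(set)
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[i] != tokens[j]:
                graph[tokens[i]].add(tokens[j])
                graph[tokens[j]].add(tokens[i])
    # Iterate the standard damped update until (approximately) converged.
    score = {w: 1.0 for w in graph}
    for _ in range(iters):
        score = {w: (1 - d) + d * sum(score[u] / len(graph[u]) for u in graph[w])
                 for w in graph}
    return sorted(score, key=score.get, reverse=True)

text = ("keyphrase extraction ranks candidate keyphrase words using a "
        "co-occurrence graph of words").split()
print(cooccurrence_keywords(text)[:3])
```

TSAKE's departure from this baseline is that the topic model reweights the *edges* of this graph per topical graph, rather than biasing the node scores.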

8.
Entity disambiguation, a key supporting technology for applications such as knowledge base construction and information retrieval, plays an important role in natural language processing. In short-text settings, however, the traditional disambiguation approach of modeling an entity's contextual features struggles to extract enough features for disambiguation. Targeting the characteristics of short texts, this paper proposes a graph-model disambiguation method for Chinese short texts based on entity-topic relations. First, the TextRank algorithm is used to infer topics over a corpus built from knowledge base information, and the inference results serve as the representation of relations between entities. Then, a disambiguation network graph is constructed for the text to be disambiguated, combined with the disambiguation scores given by a BERT-based semantic matching model. Finally, the disambiguation result is obtained through search and ranking. Evaluated on the dataset provided by the CCKS2020 short-text entity linking task, the method outperforms other approaches on short-text entity disambiguation and effectively solves Chinese short-text entity disambiguation in the absence of knowledge base entity relations.

9.
Multi-document summarization is a natural language processing technique that condenses the main information described by multiple texts on the same topic into a single text at a given compression ratio; it can mine web information from a global perspective. Facing rapidly growing web resources, one of the main difficulties for multi-document summarization is how to summarize massive data sources accurately and efficiently. MapReduce, a distributed parallel computing method proposed by Google, can be deployed on a cluster of ordinary commodity computers; it effectively coordinates the computing tasks of the machines in the cluster, fully exploits the cluster's processing power, and can analyze massive data effectively. This paper proposes an experimental model that applies the MapReduce distributed parallel framework to multi-document summarization. Experimental results show that MapReduce effectively improves the processing performance of summary extraction while preserving summary quality.
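A minimal in-process sketch of the map/reduce decomposition for frequency-based sentence extraction follows. A real deployment would run the mapper and reducer on a Hadoop-style cluster, and the scoring function here is an illustrative assumption, not the paper's model:

```python
from collections import Counter
from functools import reduce

def map_phase(sentence):
    """Mapper: emit term counts for one sentence (runs independently per node)."""
    return Counter(sentence.lower().split())

def reduce_phase(a, b):
    """Reducer: merge partial term counts from two mappers."""
    return a + b

def summarize(sentences, k=1):
    """Extract the k sentences whose words are most frequent collection-wide."""
    counts = reduce(reduce_phase, map(map_phase, sentences))
    def score(s):
        toks = s.lower().split()
        return sum(counts[t] for t in toks) / len(toks)
    return sorted(sentences, key=score, reverse=True)[:k]

sents = ["mapreduce coordinates cluster tasks",
         "mapreduce processes massive data",
         "summaries compress massive document collections"]
print(summarize(sents, k=1))
```

Because each mapper touches only its own sentences and the reducer is associative, the counting step parallelizes across a cluster with no coordination beyond the final merge, which is where the reported performance gain comes from.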

10.
Textual Data Mining to Support Science and Technology Management
This paper surveys applications of data mining techniques to large text collections, and illustrates how those techniques can be used to support the management of science and technology research. Specific issues that arise repeatedly in the conduct of research management are described, and a textual data mining architecture that extends a classic paradigm for knowledge discovery in databases is introduced. That architecture integrates information retrieval from text collections, information extraction to obtain data from individual texts, data warehousing for the extracted data, data mining to discover useful patterns in the data, and visualization of the resulting patterns. At the core of this architecture is a broad view of data mining—the process of discovering patterns in large collections of data—and that step is described in some detail. The final section of the paper illustrates how these ideas can be applied in practice, drawing upon examples from the recently completed first phase of the textual data mining program at the Office of Naval Research. The paper concludes by identifying some research directions that offer significant potential for improving the utility of textual data mining for research management applications.

11.
Automatic information extraction and similarity retrieval for Chinese text
Information extraction has become an important means of providing high-quality information services. This paper proposes an automatic extraction and similarity retrieval mechanism for Chinese text. The basic idea is to represent user interests as semantic templates and expand keywords conceptually; an initial candidate text set is obtained through a search engine, and then, on the basis of a concept-triggering mechanism and partial parsing techniques, a mapping from semantic relations to template slots is used to fill the text semantic templates, forming a structured text database. Given the fuzziness of textual data representation, a similarity relation between user queries and text semantic templates is defined, realizing similarity retrieval and satisfying users' information needs more comprehensively.

12.

This work introduces a novel approach to extracting meaningful content information from video through the collaborative integration of image understanding and natural language processing. We developed a person browser system that associates faces with overlaid name texts in videos. The approach takes news videos as a knowledge source, then automatically extracts faces and the associated name texts as content information. The proposed framework consists of a text detection module, a face detection module, and a person indexing database module. The successful results of person extraction reveal that the proposed methodology of integrated use of image understanding and natural language processing techniques is headed in the right direction towards our goal of accessing the real content of multimedia information.

13.
A survey of multi-document summarization
秦兵  刘挺  李生 《中文信息学报》2005,19(6):15-20,56
Multi-document summarization is a natural language processing technique that condenses the main information described by multiple texts on the same topic into a single text at a given compression ratio. With the growing richness of information on the Internet, multi-document summarization has become a new research focus. This paper introduces the origin and application background of multi-document summarization, explains its relation to other natural language processing techniques, and analyzes the state of research at home and abroad. On this basis it summarizes the basic research路线 —— the basic approaches and key techniques of multi-document summarization —— and discusses its future development trends.

14.
林泽琦  邹艳珍  赵俊峰  曹英魁  谢冰 《软件学报》2019,30(12):3714-3729
Documents in the form of natural language text are an important part of software projects. How to help developers locate information efficiently and accurately within large document collections is an important research problem in the field of software reuse. This paper proposes a semantic search method for software documents based on code structure knowledge. The method parses a code structure graph from a software project's source code and uses it as domain-specific knowledge to help machines understand the semantics of natural language text. This semantic information is combined with information retrieval techniques to realize semantic retrieval of software documents. Experiments on a StackOverflow Q&A document dataset show that, compared with several text retrieval methods, the approach improves mean average precision (MAP) by at least 13.77%.

15.
张刚  周昭涛  王斌 《计算机工程》2006,32(12):80-81,84
This paper presents a topic-based distributed information retrieval method and analyzes its effectiveness in depth. Using text clustering, documents are partitioned by topic; experiments show that query answers are clearly concentrated in a small number of document collections. The results indicate that topic-based distributed information retrieval significantly improves retrieval effectiveness over traditional distributed information retrieval.

16.
17.
Addressing the fact that in distributed information retrieval different collections contribute differently to the final retrieval results, this paper proposes a collection selection method based on the LDA topic model. The method first obtains description information for each collection using query-based sampling. Second, it builds an LDA topic model to compute the topic relevance between the query and documents. Third, it estimates the overall relevance between the query and the documents in the sample set by combining keyword relevance with topic relevance, and from this estimates the relevance between the query and each collection. Finally, it selects the M most relevant collections for retrieval. The experiments use R_m, P@n, and MAP as evaluation metrics to verify the performance of the collection selection method. The results show that the method locates collections containing many relevant documents more accurately, improving the recall and precision of the retrieval results.
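The final combine-and-select step the abstract describes can be sketched as a weighted interpolation. The weight `alpha` and the per-collection score pairs are illustrative assumptions; the paper's query-based sampling and LDA estimation of those scores are not reproduced here:

```python
def select_collections(query_scores, m=2, alpha=0.6):
    """Pick the M collections with the highest combined relevance.

    query_scores maps collection name -> (keyword_rel, topic_rel), both
    assumed to be estimated from query-based samples of each collection.
    """
    combined = {c: alpha * kw + (1 - alpha) * tp
                for c, (kw, tp) in query_scores.items()}
    return sorted(combined, key=combined.get, reverse=True)[:m]

# Hypothetical relevance estimates for three collections.
scores = {"news":   (0.8, 0.4),
          "papers": (0.4, 0.9),
          "blogs":  (0.2, 0.1)}
print(select_collections(scores, m=2))
```

Note how the topic score rescues "papers", which pure keyword matching would rank well below "news"; this is the effect the combined estimate is meant to capture.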

18.
Web information retrieval based on anchor text and its context
The hyperlink structure among documents is one of the biggest differences between Web information retrieval and traditional information retrieval, and it has given rise to retrieval techniques based on hyperlink structure. This paper describes the concept of link description documents and, on that basis, studies the role of anchor text and its context in retrieval. Tests on a large-scale real dataset of more than 1.69 million web pages, using the relevant documents and evaluation methodology provided by TREC 2001, yield the following conclusions. First, link description documents summarize page topics with high precision, but describe page content very incompletely. Second, compared with traditional retrieval methods, using anchor text improves system performance by 96% on the known-page finding task, but anchor text and its context cannot improve retrieval performance on unknown-information query tasks. Finally, combining the anchor-text-based method with traditional methods improves retrieval performance by nearly 16%.
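The combination of anchor-text evidence with traditional content scores can be sketched as a linear interpolation. The `weight` parameter and the example scores are illustrative assumptions rather than the paper's tuned combination:

```python
def combined_score(content_sim, anchor_sim, weight=0.5):
    """Interpolate a page's content-based score with the score of the
    anchor texts (link description documents) pointing at it."""
    return (1 - weight) * content_sim + weight * anchor_sim

def rank_pages(pages, weight=0.5):
    """pages: {url: (content_sim, anchor_sim)} -> urls ranked best-first."""
    return sorted(pages, key=lambda u: combined_score(*pages[u], weight),
                  reverse=True)

# Hypothetical similarity scores for three candidate pages.
pages = {"a.html": (0.7, 0.1),   # strong content match, weakly described by anchors
         "b.html": (0.3, 0.9),   # pointed at by well-matching anchor texts
         "c.html": (0.2, 0.2)}
print(rank_pages(pages))
```

The paper's finding fits this picture: for known-page finding, the anchor term dominates usefully (pages like "b.html" surface), while for general queries the anchors' incompleteness limits what the interpolation can add.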

19.
Text classification, as a core problem of text mining, has become an important topic in natural language processing. Short-text classification, characterized by sparsity, real-time requirements, and irregular language, has become one of the pressing problems in text classification. In certain scenarios, short texts carry a large amount of implicit semantics, which challenges tasks such as mining implicit semantic features within limited text. Existing short-text classification methods mainly adopt traditional machine learning or deep learning algorithms, but building models with such algorithms is complex and labor-intensive, and efficiency is low. In addition, short texts contain little effective information and are highly colloquial, demanding strong feature-learning ability from the model. To address these problems, this paper proposes the KAeRCNN model, which builds on the TextRCNN model and integrates knowledge awareness with a dual attention mechanism. Knowledge awareness comprises knowledge graph entity linking and knowledge graph embedding, which introduce external knowledge to obtain semantic features, while the dual attention mechanism improves the model's efficiency in extracting effective information from short texts. Experimental results show that the KAeRCNN model significantly outperforms traditional machine learning algorithms in classification accuracy, F1 score, and practical application effect. The performance and adaptability of the algorithm were verified: accuracy reached 95.54% and F1 reached 0.901; compared with four traditional machine learning algorithms, accuracy improved by about 14% on average and F1 by about 13%. Compared with TextRCNN, the KAeRCNN model improved accuracy by about 3%...

20.
廖祥文  刘德元  桂林  程学旗  陈国龙 《软件学报》2018,29(10):2899-2914
Opinion retrieval is a hot research topic in natural language processing. During retrieval, existing opinion retrieval models often cannot abstract words to the knowledge or concept level according to context; at the semantic level they ignore the semantic relations between words, and at the opinion level they lack the ability to generalize opinions. This paper therefore proposes an opinion retrieval method that fuses text conceptualization with network embedding. The method first uses a knowledge graph to conceptualize the user query and the text into the correct concept space, and uses network embedding to represent the word nodes in the knowledge graph as low-dimensional vectors; it then derives query and text vectors from the word vectors and computes the relevance between the user query and the text with the cosine formula; next, a classification method based on statistical machine learning is introduced to mine the opinions in the text. Finally, features are built from the concept space, the network embedding space, and the opinion analysis results, and serve the opinion retrieval model. Experiments show that the proposed retrieval model effectively improves the opinion retrieval performance of multiple retrieval models. The opinion retrieval method based on a unified relevance model improves MAP over the baseline by 6.1% and 9.3% on two experimental datasets, and the learning-to-rank-based opinion retrieval method improves MAP over the baseline by 2.3% and 14.6% on the two datasets.
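The relevance computation described above (deriving query and text vectors from word vectors, then comparing them with the cosine formula) can be sketched as follows. The toy two-dimensional embeddings stand in for the network-embedding vectors and are illustrative assumptions:

```python
def text_vector(tokens, word_vecs):
    """Average the embedding vectors of a text's in-vocabulary words."""
    dim = len(next(iter(word_vecs.values())))
    vecs = [word_vecs[t] for t in tokens if t in word_vecs]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical low-dimensional word vectors learned by network embedding.
word_vecs = {"opinion": [1.0, 0.0], "retrieval": [0.8, 0.2], "cat": [0.0, 1.0]}
q = text_vector(["opinion", "retrieval"], word_vecs)
doc = text_vector(["opinion"], word_vecs)
print(round(cosine(q, doc), 3))
```

Averaging is only the simplest way to lift word vectors to text vectors; the conceptualization step in the paper additionally maps words into the concept space before this comparison.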
