Similar Articles
19 similar articles found (search time: 234 ms)
1.
The sheer volume of information on today's Internet makes it hard for users to choose from lists of search results. Recommendation algorithms are an effective answer to this kind of information overload, and the ordering of the recommendation list directly affects recommendation accuracy. This paper studies the ranking of recommendation lists: it builds a user-object bipartite graph model and applies a resource-allocation dynamics algorithm on this graph to learn a recommendation relevance score for each user and object. By constraining and smoothing these relevance scores, the proposed algorithm improves recommendation accuracy by 20% over existing methods.
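The resource-allocation dynamics described above (often called ProbS in the network-recommendation literature) can be sketched as a two-step spreading process on the user-object bipartite graph. The function and data names below are illustrative, and the paper's constraint/smoothing step is omitted:

```python
from collections import defaultdict

def probs_scores(user_items, target):
    """Two-step resource allocation (ProbS-style) on a user-object bipartite graph.

    user_items: dict user -> set of objects; target: the user to recommend for.
    Returns object -> relevance score (higher = more recommended).
    """
    # Object degree = the set of users who collected it.
    item_users = defaultdict(set)
    for u, items in user_items.items():
        for it in items:
            item_users[it].add(u)

    # Step 1: each object collected by `target` holds 1 unit of resource,
    # spread equally among the users who collected that object.
    user_resource = defaultdict(float)
    for it in user_items[target]:
        share = 1.0 / len(item_users[it])
        for u in item_users[it]:
            user_resource[u] += share

    # Step 2: each user spreads their resource equally over their objects.
    item_score = defaultdict(float)
    for u, res in user_resource.items():
        share = res / len(user_items[u])
        for it in user_items[u]:
            item_score[it] += share
    return dict(item_score)
```

Objects the target user has not yet collected are then ranked by this score to form the recommendation list.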

2.
周丰 《数字社区&智能家居》2013,(11):2605-2606,2617
Science and technology have developed rapidly in the 21st century, information technology above all. With the spread of computer networks and their ever-broader influence, the information explosion has flooded people's lives with data of every kind. How to tell the useful from the useless, and how to extract the most needed and most valuable information from this flood, has long been a focus of Internet research. Traditional search-ranking algorithms increasingly show their limitations and can no longer meet users' needs. This paper examines search ranking based on users' query preferences, taking the user's perspective and analyzing its advantages over traditional ranking algorithms so as to better satisfy user needs.

3.
Research on Temporal-Information-Based Sentence Ordering Strategies in Multi-Document Summarization (Cited by: 1)
Sentence ordering is a key step in multi-document summarization, directly affecting the fluency and readability of the summary. Temporal information processing is the bottleneck limiting the quality of ordering algorithms: lacking accurate temporal information, traditional ordering strategies sidestep the problem, and none achieves consistently high-quality results. Starting from temporal information processing in Chinese text, this paper first proposes algorithms for temporal information extraction, semantic computation, and temporal reasoning. Building on these, and drawing on the classic majority-ordering idea and sentence-relatedness computation, it then proposes a temporal-information-based sentence ordering algorithm. Experiments show its quality is clearly better than traditional majority ordering and chronological ordering.

4.
To meet the demand for highly accurate, broadly applicable recommendation in Internet recommender systems, this paper proposes a user-object bipartite graph model on which a resource-allocation dynamics algorithm learns a recommendation relevance probability for each user and object, used as the basis for ranking. The model automatically mines relatively objective user-preference information, without supervision, from historical records of users' object selections, giving it better generality than existing content-based recommendation algorithms. Experimental results show that, by constraining and smoothing the relevance of each user and object, the algorithm supports effective real-time recommendation and improves accuracy by 20% over existing methods.

5.
The development of information technology, exemplified by the Internet, has made obtaining information more convenient than ever, while also posing the challenge of using it effectively. Automatic summarization, which selects representative sentences from a document, can greatly improve the efficiency of information use. In recent years, automatic summarization for English and Chinese has attracted wide attention and made great progress, whereas research on minority languages such as Uyghur remains insufficient. This paper builds an automatic summarization system for Uyghur. Documents are first preprocessed using Uyghur linguistic knowledge; keywords are then extracted and used for extractive summarization. A comparison of TF-IDF-based and TextRank-based keyword extraction shows that TextRank keywords are better suited to summarization. The study demonstrates that, with Uyghur-specific linguistic information fully taken into account, keyword-based automatic summarization can achieve satisfactory results.
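TextRank keyword extraction, one of the two methods compared above, can be sketched minimally as PageRank over a word co-occurrence graph. Tokenization and Uyghur-specific preprocessing are omitted here, and the window/damping values are common defaults rather than the paper's settings:

```python
from collections import defaultdict

def textrank_keywords(tokens, window=2, damping=0.85, iters=50, top_k=3):
    """Rank words by PageRank over a co-occurrence graph:
    an edge links any two distinct words within `window` positions."""
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != w:
                neighbors[w].add(tokens[j])
                neighbors[tokens[j]].add(w)

    # Power iteration of the PageRank recurrence on the undirected graph.
    score = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        score = {
            w: (1 - damping) + damping * sum(score[v] / len(neighbors[v])
                                             for v in neighbors[w])
            for w in neighbors
        }
    return sorted(score, key=score.get, reverse=True)[:top_k]
```

The top-ranked words then serve as the keywords that drive sentence selection in the extractive summarizer.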

6.
郑志蕴  刘博李伦  王振飞 《计算机科学》2015,42(7):234-239, 249
With the massive growth of Semantic Web data, the efficiency of querying RDF graphs has drawn increasing attention, and querying RDF data graphs directly by keyword matching has become a research hotspot. To address the redundancy and drift common in keyword-query results, this paper proposes a keyword-based query model for RDF data graphs. The model first applies a proposed iterative subgraph-matching algorithm (ISGR) to the query keywords to obtain a unique and maximal set of result subgraphs; then, using structural information between the keyword graph and the result subgraphs together with a statistical language model, it gives a result-subgraph ranking method (SimLM). Comparative experiments show the proposed query model and ranking method outperform traditional models in consistency and relevance.

7.
Improving the accuracy of automatic summarization helps people obtain valuable information quickly and effectively. Exploiting the strongly structured nature of government documents, this paper proposes an automatic summarization algorithm based on sentence weights and discourse structure. First, a cursor-based character-interception sentence-splitting algorithm collects precise statistics on the sentences and words in a document, yielding a basic picture of its content and structure. On this basis, structure-aware word-weight and sentence-weight computations are proposed, and sentences are ranked by weight. Candidate summary sentences are then selected according to the target summary size, post-processed, and output as the summary. Experiments show that, compared with automatic summarization algorithms of the same type and the summarization tool in Word 2003, the proposed algorithm achieves clearly higher precision and recall.
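The weight-then-select pipeline above can be sketched generically. This assumes whitespace tokenization, and the position boost and frequency weighting are illustrative stand-ins, not the paper's exact discourse-structure weights:

```python
def summarize(sentences, top_n=2):
    """Generic extractive scoring: sentence weight = mean corpus frequency
    of its words, boosted for sentences near the start of the document."""
    freq = {}
    for s in sentences:
        for w in s.split():
            freq[w] = freq.get(w, 0) + 1

    scored = []
    for i, s in enumerate(sentences):
        words = s.split()
        tf = sum(freq[w] for w in words) / len(words)
        pos = 1.0 + 1.0 / (i + 1)          # earlier sentences weigh more
        scored.append((tf * pos, i, s))

    top = sorted(scored, reverse=True)[:top_n]
    return [s for _, i, s in sorted(top, key=lambda t: t[1])]  # original order
```

Selected sentences are re-sorted into document order, mirroring the post-processing step before the summary is output.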

8.
The most widely used technique in current genome-assembly software is De Bruijn graph based assembly, which must process sequencing data billions of base pairs long. For such massive data, fast, efficient, and scalable assembly algorithms are essential. Although some parallel assemblers (e.g. YAGA) have begun to address these issues, the two most time- and space-consuming steps, graph construction and simple-path compaction, remain the main computational bottlenecks at this scale. Existing work handles these steps with parallel list ranking, which repeatedly performs distributed sorts over the De Bruijn graph's huge vertex set and generates heavy communication between compute nodes. Since simple-path compaction can instead be completed by a single depth-first traversal of the De Bruijn graph, with no list ranking, this paper proposes an optimization based on distributed traversal of massive graphs. It greatly reduces inter-processor communication and data movement between compute nodes and thus scales well, with computational complexity O(g/p) and communication complexity O(g), where g is the length of the reference sequence and p the number of processor cores. On the E. coli and Yeast datasets, scaling from 8 to 512 cores yields speedups of 13x and 10x; on the C. elegans and human chromosome 1 (chr1) datasets, scaling from 32 to 512 cores yields speedups of 7x and 10x.
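The two bottleneck steps can be illustrated in miniature: building the De Bruijn graph from reads, then compacting a non-branching chain by a single traversal (the simplification that replaces list ranking). This serial sketch ignores the distributed setting entirely:

```python
from collections import defaultdict

def build_debruijn(reads, k):
    """Node = (k-1)-mer; one edge u -> v for every k-mer u + v[-1] in the reads."""
    edges = defaultdict(list)
    indeg = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            u, v = read[i:i + k - 1], read[i + 1:i + k]
            edges[u].append(v)
            indeg[v] += 1
    return edges, indeg

def compact_path(edges, indeg, start):
    """Follow a non-branching chain from `start`, merging nodes into one
    unitig string; stops at any branch or merge point."""
    unitig, node = start, start
    while len(edges[node]) == 1:
        nxt = edges[node][0]
        if indeg[nxt] != 1:        # a merge point: another chain joins here
            break
        unitig += nxt[-1]
        node = nxt
    return unitig
```

In the paper's setting the same traversal is performed over a distributed graph, so the cost is dominated by graph-partition communication rather than repeated distributed sorts.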

9.
方萍 《计算机应用研究》2021,38(9):2657-2661
Most recent automatic summarization algorithms rely on supervised learning, overlooking the tedium of manually labeling corpora; moreover, most summarization models cannot exploit context when embedding sentences, so semantic information is under-expressed and the text's global information is ignored. To address these problems, an extractive summarization model is proposed that combines an improved BERT bidirectional pre-trained language model with a graph ranking algorithm. Sentences are mapped to structured sentence vectors according to their position and context, and a graph ranking algorithm selects the most influential sentences as a provisional summary; to avoid high redundancy, redundancy elimination is then applied to this provisional summary. Experiments on the public CNN/DailyMail dataset show that the proposed model improves summary scores and outperforms other improved graph-ranking extraction algorithms.

10.
A Privilege-Escalation-Based Method for Generating Network Attack Graphs (Cited by: 7)
After studying existing attack-graph generation methods, this paper proposes a privilege-escalation-based attack-graph correlation algorithm and implements an effective tool for automatically generating network attack graphs. The tool models network attacks in a database using three attributes: host description, network connectivity, and exploit rules. It automatically writes network configuration and host information into the database and generates an attack-event graph via a correlation algorithm combining breadth-first forward and backward search, enabling holistic analysis of network security.
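A toy sketch of the forward breadth-first correlation over privilege-escalation rules follows. The state and rule encoding (host names, integer privilege levels) is an assumption for illustration, not the paper's database schema, and the backward search is omitted:

```python
from collections import deque, defaultdict

# Ordered privilege levels on a host.
NONE, USER, ROOT = 0, 1, 2

def attack_graph(reach, vulns, rules, start):
    """Forward BFS correlation: from attacker state (host, priv), apply
    exploit rules on reachable hosts to derive new states.

    reach: host -> set of hosts it can connect to
    vulns: host -> set of vulnerability ids present on that host
    rules: vuln id -> (priv required on source host, priv gained on target)
    start: (host, priv) initial attacker state
    Returns the set of attack edges ((host, priv) -> (host', priv')).
    """
    edges, seen, queue = set(), {start}, deque([start])
    best = defaultdict(int)            # highest privilege obtained per host
    best[start[0]] = start[1]
    while queue:
        host, priv = queue.popleft()
        for target in reach.get(host, ()):
            for v in vulns.get(target, ()):
                req, gained = rules[v]
                if priv >= req and gained > best[target]:
                    best[target] = gained
                    nxt = (target, gained)
                    edges.add(((host, priv), nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return edges
```

Pruning states that do not raise the best-known privilege on a host keeps the graph free of redundant attack events, which is the point of privilege-escalation-based correlation.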

11.
A document-sensitive graph model for multi-document summarization (Cited by: 1)
In recent years, graph-based models and ranking algorithms have drawn considerable attention from the extractive document summarization community. Most existing approaches take into account sentence-level relations (e.g. sentence similarity) but neglect the difference among documents and the influence of documents on sentences. In this paper, we present a novel document-sensitive graph model that emphasizes the influence of global document set information on local sentence evaluation. By exploiting document–document and document–sentence relations, we distinguish intra-document sentence relations from inter-document sentence relations. In such a way, we move towards the goal of truly summarizing multiple documents rather than a single combined document. Based on this model, we develop an iterative sentence ranking algorithm, namely DsR (Document-Sensitive Ranking). Automatic ROUGE evaluations on the DUC data sets show that DsR outperforms previous graph-based models in both generic and query-oriented summarization tasks.

12.
《Computers in Industry》2007,58(4):304-312
Among existing feature-recognition approaches, graph-based and hint-based approaches are the most popular. While graph-based algorithms are quite successful at recognizing isolated features, hint-based approaches intrinsically perform better on interacting features. In this paper, feature traces as defined by hint-based approaches are implemented and represented as concave graphs, aiding the recognition of interacting features with less computational effort. The concave graphs are also used to handle curved 2.5D features, whereas many previous graph-based approaches have dealt only with polyhedral features. The method begins by decomposing the part graph to generate a set of concave sub-graphs. A feature is then recognized based on the properties of the whole concave graph or the properties of its nodes. Graph-based approaches are not intrinsically suited to providing a volumetric representation of features, yet the complete boundary information of a feature is more effectively obtained volumetrically. Therefore, this research also proposes a method to generate feature volumes for the recognized sub-graphs. The approach shows better recognition ability than sub-graph isomorphism methods.

13.

Text summarization presents several challenges, such as accounting for semantic relationships among words and dealing with redundancy and information-diversity issues. Seeking to overcome these problems, we propose in this paper a new graph-based Arabic summarization system that combines statistical and semantic analysis. The proposed approach utilizes an ontology's hierarchical structure and relations to provide a more accurate similarity measurement between terms in order to improve the quality of the summary. The proposed method is based on a two-dimensional graph model that makes use of statistical and semantic similarities. The statistical similarity is based on the content overlap between two sentences, while the semantic similarity is computed from semantic information extracted from a lexical database, whose use enables the system to reason by measuring the semantic distance between real human concepts. The weighted ranking algorithm PageRank is run on the graph to produce a significance score for every document sentence, and each sentence's final score is obtained by adding further statistical features. In addition, we address redundancy and information diversity by using an adapted version of the Maximal Marginal Relevance method. Experimental results on the EASC and our own datasets showed the effectiveness of the proposed approach over existing summarization systems.
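The Maximal Marginal Relevance re-ranking used for redundancy control can be sketched generically: greedily pick the item that best trades off relevance against similarity to what is already selected. The similarity and relevance inputs here are toy placeholders, not the system's actual measures:

```python
def mmr(candidates, sim, relevance, lam=0.7, k=2):
    """Maximal Marginal Relevance: greedily select k items, each maximizing
    lam * relevance - (1 - lam) * max similarity to already-selected items."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def marginal(c):
            redundancy = max((sim(c, s) for s in selected), default=0.0)
            return lam * relevance[c] - (1 - lam) * redundancy
        best = max(pool, key=marginal)
        selected.append(best)
        pool.remove(best)
    return selected
```

With lam close to 1 the selection favors raw relevance; lowering it penalizes sentences that repeat already-chosen content, which is how the summary stays diverse.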


14.
刘静  郑铜亚  郝沁汾 《软件学报》2024,35(2):675-710
Graph data, such as citation networks, social networks, and traffic networks, are ubiquitous in real life. Graph neural networks (GNNs) have attracted wide attention for their expressive power and excel in a wide variety of graph-analysis applications. However, their excellent performance depends on labeled data and complex models, while labels are hard to obtain and computation is expensive. To address label sparsity and model complexity, knowledge distillation has been introduced to GNNs: a small student model is trained under the soft-label supervision of a larger, better-performing teacher model, aiming at better performance and accuracy. How to apply knowledge distillation to graph data has therefore become a major research challenge, yet no survey of graph knowledge distillation exists. This paper provides the first comprehensive, systematic review of graph-oriented knowledge distillation, filling that gap. It first introduces the background of graphs and knowledge distillation. It then surveys three classes of graph knowledge distillation methods, namely distillation for deep neural networks, distillation for graph neural networks, and graph-knowledge-based self-distillation, further dividing each class into output-layer, intermediate-layer, and constructed-graph-knowledge methods. It goes on to analyze and compare the design ideas of the algorithms in each class, summarizing their strengths and weaknesses with experimental results, and lists applications of graph knowledge distillation in computer vision, natural language processing, recommender systems, and other fields. Finally, it summarizes the field and looks ahead to its development. The collected literature on graph knowledge distillation is publicly available at: https://github.com/liujing1023/Graph-based-Knowledge-Distillation.

15.
Automatic text summarization is a field at the intersection of natural language processing and information retrieval. Its main objective is to automatically produce a condensed, representative form of documents. This paper presents ArA*summarizer, an automatic system for Arabic single-document summarization. The system is based on an unsupervised hybrid approach that combines statistical, cluster-based, and graph-based techniques. The main idea is to divide the text into subtopics, then select the most relevant sentences in the most relevant subtopics. The selection is done by an A* algorithm executed on a graph representing the lexical–semantic relationships between sentences. Experiments are conducted on the Essex Arabic Summaries Corpus using the recall-oriented understudy for gisting evaluation (ROUGE), automatic summarization engineering (AutoSummENG), merged model graphs (MeMoG), and n-gram graph powered evaluation via regression (NPowER) metrics. The evaluation results showed the good performance of our system compared with existing works.

16.
The goal of abstractive multi-document summarization is to automatically produce a condensed version of the document text while maintaining the significant information. Most graph-based extractive methods represent a sentence as a bag of words and use a content-similarity measure, which may fail to detect semantically equivalent redundant sentences. On the other hand, graph-based abstractive methods depend on a domain expert to build a semantic graph from a manually created ontology, which requires time and effort. This work presents a semantic-graph approach with an improved ranking algorithm for abstractive summarization of multiple documents. The semantic graph is built from the source documents so that graph nodes denote predicate argument structures (PASs), the semantic structure of a sentence, identified automatically by semantic role labeling, while graph edges carry similarity weights computed from PAS semantic similarity. To reflect the impact of both the document and the document set on PASs, each edge is further augmented with PAS-to-document and PAS-to-document-set relationships. The important graph nodes (PASs) are ranked using the improved graph-ranking algorithm. Redundant PASs are reduced by using maximal marginal relevance to re-rank them, and finally summary sentences are generated from the top-ranked PASs using language generation. Experiments use DUC-2002, a standard dataset for document summarization. The findings show that the proposed approach outperforms other summarization approaches.

17.
Data sparsity, a common problem in neighbor-based collaborative filtering, usually complicates item recommendation. The problem is more serious in the collaborative ranking domain, where computing user similarities and recommending items are based on ranking data. Some graph-based approaches have been proposed to address data sparsity, but they suffer from two flaws: first, they fail to correctly model users' priorities, and second, they cannot be used when the only available data is a set of rankings instead of rating values. In this paper, we propose a novel graph-based approach, called GRank, designed for the collaborative ranking domain. GRank correctly models users' priorities in a new tripartite graph structure and analyzes it to directly infer a recommendation list. The experimental results show a significant improvement in recommendation quality compared to state-of-the-art graph-based recommendation algorithms and other collaborative ranking techniques.

18.
With the explosive growth of information on the Internet, improving the efficiency of knowledge acquisition has become especially important. Automatic text summarization compresses and refines information, providing a valuable aid to its rapid acquisition. Existing automatic summarization methods suffer from low accuracy on long documents and cannot deliver performance that satisfies users. This paper therefore proposes TP-AS, a new two-stage automatic summarization method for long text: key sentences are first extracted with a hybrid graph-model-based text-similarity measure, and a recurrent-neural-network encoder-decoder combining pointer and attention mechanisms then generates the summary. Experiments on real large-scale long texts from the financial domain verify the effectiveness of TP-AS, whose summarization accuracy reaches 36.6% (word-level) and 33.9% (character-level) under ROUGE-1, clearly outperforming existing methods.

19.
Information on the Internet is fragmented and spread across different data sources, which makes automatic knowledge harvesting and understanding formidable for machines, and even for humans. Knowledge graphs have become prevalent in both industry and academia in recent years as one of the most efficient and effective approaches to knowledge integration. Knowledge-graph construction techniques can mine information from structured, semi-structured, or even unstructured data sources, and finally integrate it into knowledge represented as a graph. Furthermore, a knowledge graph organizes information in an easy-to-maintain, easy-to-understand, and easy-to-use manner. In this paper, we summarize techniques for constructing knowledge graphs. We review existing knowledge-graph systems developed by both academia and industry, discuss the process of building knowledge graphs in detail, and survey state-of-the-art techniques for automatic knowledge-graph checking and expansion via logical inference and reasoning. We also review graph data management, introducing knowledge data models and graph databases, especially from a NoSQL point of view. Finally, we give an overview of current knowledge-graph systems and discuss future research directions.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号