Similar Documents

20 similar documents found (search time: 196 ms)
1.
Query expansion is an effective way to improve retrieval effectiveness, but many expansion methods select expansion terms without fully considering the correlations among terms and between terms and documents, so expansion may inject too much irrelevant information and degrade retrieval performance. By computing inter-document and inter-term correlations, documents and terms are linked together to build a Markov network retrieval model; term cliques are then extracted from the mapping between the term subspace and the document subspace, and the extracted clique information is used for query expansion, making the expanded query content more relevant. Experiments show that the Markov retrieval model based on document-clique dependence effectively improves retrieval effectiveness.
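
A minimal sketch of the term-clique idea in Python, assuming term-term correlation is approximated by document co-occurrence counts; the co-occurrence threshold and the use of networkx maximal cliques are illustrative stand-ins for the paper's subspace-mapping construction.

```python
import itertools
import networkx as nx

# Toy corpus: each document is a list of terms.
docs = [
    ["query", "expansion", "retrieval"],
    ["query", "term", "document", "retrieval"],
    ["markov", "network", "term", "document"],
    ["markov", "network", "retrieval", "expansion"],
]

# Estimate term-term correlation as document co-occurrence frequency.
cooc = {}
for doc in docs:
    for a, b in itertools.combinations(sorted(set(doc)), 2):
        cooc[(a, b)] = cooc.get((a, b), 0) + 1

# Build the term graph, keeping only sufficiently correlated pairs.
G = nx.Graph()
G.add_edges_from(pair for pair, n in cooc.items() if n >= 2)

# Extract maximal cliques; cliques containing a query term supply
# the candidate expansion terms.
query = {"query"}
expansion = set()
for clique in nx.find_cliques(G):
    if query & set(clique):
        expansion |= set(clique) - query
print("expansion terms:", expansion)
```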

2.
Correctly establishing traceability between software documentation and code is essential for program comprehension and software maintenance. Most recent work on document-code traceability relies on lexical similarity between texts and does not fully exploit the structural information carried by documentation and code. To address this, a method is proposed that combines software structure information with an information retrieval model for document-code traceability analysis: analyzing the structure of documents and code improves preprocessing and refines the similarity computation, which raises the effectiveness of the approach as a whole. Experimental results show that the method improves both recall and precision over a purely IR-based approach and recovers more traceability links.
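
A hedged sketch of the combination, assuming TF-IDF cosine similarity as the IR model and identifier-word overlap as the structural signal; the bonus weight of 0.2 is an arbitrary illustration, not the paper's calibration.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

doc_sections = ["parse the configuration file", "render the report table"]
code_units = ["def parse_config(path): ...", "def render_table(rows): ..."]

# Textual similarity from the IR model (TF-IDF + cosine).
matrix = TfidfVectorizer().fit_transform(doc_sections + code_units)
text_sim = cosine_similarity(matrix[:2], matrix[2:])

def struct_bonus(doc, code, weight=0.2):
    # Structural signal: split code identifiers (snake_case etc.)
    # into word parts and count their overlap with the doc's words.
    parts = set(re.split(r"[^a-z]+", code.lower())) - {""}
    return weight * len(set(doc.lower().split()) & parts)

for i, doc in enumerate(doc_sections):
    for j, code in enumerate(code_units):
        score = text_sim[i, j] + struct_bonus(doc, code)
        print(f"doc {i} -> code {j}: {score:.3f}")
```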

3.
Mining traceability links between software process artifacts is vital to many areas such as software maintenance and requirements tracing. A method is therefore proposed that extracts association information between program code and Chinese documentation based on latent semantic indexing (LSI). The method improves on the vector space model by determining relatedness from the latent semantic structure of the texts rather than relying on term matching. Experimental results show that the method does not depend on a predefined thesaurus or knowledge base for the code and documentation, and improves recall and precision to a certain extent.
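
A minimal LSI sketch with scikit-learn; real Chinese documentation would first be word-segmented (and possibly translated) into terms shared with the code corpus, which is assumed away in this toy example.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Two code units and two documentation sections, already reduced to terms.
code = ["open file read buffer close", "sort list compare swap elements"]
docs = ["read file content into buffer", "order list by compare rule"]

tfidf = TfidfVectorizer().fit_transform(code + docs)

# LSI: project into a low-rank latent semantic space, so relatedness
# comes from co-occurrence structure rather than literal term matching.
lsi = TruncatedSVD(n_components=2, random_state=0)
latent = lsi.fit_transform(tfidf)

# Code-to-document similarity in the latent space.
print(cosine_similarity(latent[:2], latent[2:]))
```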

4.
The code search task aims to analyze user requirements and intent in order to find software components that satisfy them, strengthening software reuse while improving development and maintenance efficiency and reducing cost. Unlike traditional document retrieval, program features are often implicit in identifiers and code structure, so understanding program functionality is the key to effective code search. This survey defines the code search task from the perspective of deep program understanding, summarizes recent progress in code search research, organizes current evaluation methods and datasets, and, in view of open problems, offers an outlook on future code search research as a reference for later researchers.

5.
To identify the true author of source code within a corpus, a neural network model, CPNN, is proposed that combines code coupling metrics with program dependence graph (PDG) features. First, coupling is computed from features extracted from the source code such as parameters, fan-in, and fan-out. Next, control and data dependences are extracted from the converted program dependence graph, preprocessing is applied to convert the PDG features into small instances with frequency details, and the inverse document frequency technique is used to amplify each…
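
A small sketch of the coupling features named above (fan-in, fan-out) computed from a toy call graph; the Henry-Kafura-style coupling product is one common formula, used here purely for illustration rather than as CPNN's exact definition.

```python
from collections import defaultdict

# Toy call graph: caller -> callees, as might be extracted from source.
calls = {
    "main": ["parse", "run"],
    "run": ["parse", "report"],
    "parse": [],
    "report": [],
}

# Fan-out: distinct functions called; fan-in: distinct callers.
fan_out = {f: len(set(cs)) for f, cs in calls.items()}
fan_in = defaultdict(int)
for f, cs in calls.items():
    for callee in set(cs):
        fan_in[callee] += 1

# A simple per-function coupling indicator (Henry-Kafura style).
for f in calls:
    coupling = (fan_in[f] * fan_out[f]) ** 2
    print(f, "fan-in:", fan_in[f], "fan-out:", fan_out[f],
          "coupling:", coupling)
```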

6.
《计算机科学与探索》2017,(10):1591-1598
Developers typically search for relevant software Q&A documents through the search engines of Q&A websites. Among the results, documents containing high-quality code snippets (usage examples) are usually preferred, but measuring the quality of the code snippets in these documents remains a major challenge. To address this, a code-pattern-based optimization method for software Q&A document retrieval is proposed. Starting from the current search results, the method extracts the code snippets in each document, mines the common code patterns among them, measures snippet quality against those patterns, and recommends high-quality Q&A documents to the user from the original results. Experiments based on real problems encountered by software developers in practice show that, compared with Stack Overflow's search results, the method improves the accuracy metric NDCG@5 by 40%.
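
The reported metric, NDCG@5, can be computed as follows; this is the standard definition with toy relevance grades.

```python
import math

def dcg(rels):
    # Graded relevance with a log2 position discount.
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg_at_k(rels, k=5):
    # Normalize by the DCG of the ideal (descending) ordering.
    ideal = dcg(sorted(rels, reverse=True)[:k])
    return dcg(rels[:k]) / ideal if ideal > 0 else 0.0

# Relevance grades of the top retrieved Q&A documents (toy data).
ranked = [3, 0, 2, 1, 0, 2]
print(f"NDCG@5 = {ndcg_at_k(ranked):.3f}")
```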

7.
Program comprehension plays an important role in software maintenance and reuse, and compiler-based program information extraction and analysis is the main technique behind program comprehension tools. To reduce the cost of information extraction and analysis and to improve the quality and construction efficiency of such tools, this paper uses the Java document object model as the code structure model and proposes and implements JPATH, an information query language for Java code. By constructing JPATH query expressions, extraction and analysis programs can locate the positions of elements of interest within the code structure model. JPATH is further extended with an object-relational query mechanism that lets programmers extract combinations of syntactic objects with specific semantic relationships.

8.
Program-Document Unification and Dynamic Documentation
洪海  廖静 《软件世界》1999,(10):60-61
Many enterprises have built large computer management systems and keep rolling out new ones. Meeting business needs requires continuously maintaining and modifying these systems without disrupting ongoing operations, so a complete mechanism must be established to evaluate, control, and carry out system maintenance. For the software maintenance process, the concept of unifying programs with documentation is proposed: dynamic documentation is built alongside software development. On the current state of software: programs and documentation are separated in form, not only stored independently but also written and retrieved with different tools at different times, so that documentation offers no convenient help during program maintenance and cannot be updated in step. Programs and documentation are also separated in content, because they use different…

9.
Source code retrieval is an important research problem in software engineering, whose main task is to retrieve and reuse software project APIs (application program interfaces). As software projects grow larger and more complex, source code retrieval needs both higher accuracy for natural-language API queries and a way to locate and present the associations between a target API and its related code, so as to help users understand the API's implementation logic and usage scenarios. To this end, a graph-embedding-based source code retrieval method for software projects is proposed. The method automatically builds a code structure graph from a project's source code and represents the code through graph embedding. On this basis, users can pose natural-language questions and retrieve a connected code subgraph consisting of the relevant APIs and their associated information, improving the efficiency of API retrieval and reuse. In retrieval experiments on the open-source projects Apache Lucene and POI, the method's F1 score is 10% higher than that of an existing shortest-path-based method, while the average response time is significantly reduced.
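
A sketch of the retrieval output described above: match query words to API nodes in a code structure graph and return the surrounding connected subgraph. Simple name matching stands in for the paper's graph-embedding similarity, and the Lucene-flavored node names are made up for illustration.

```python
import networkx as nx

# Toy code structure graph: API nodes joined by call/containment edges.
G = nx.Graph()
G.add_edges_from([
    ("IndexWriter", "addDocument"),
    ("IndexWriter", "commit"),
    ("IndexSearcher", "search"),
    ("search", "TopDocs"),
])

def retrieve(query):
    # Match query terms to API node names (embedding lookup in the paper).
    hits = [n for n in G if any(w.lower() in n.lower()
                                for w in query.split())]
    # Return the matched APIs plus their neighbors as a connected
    # code subgraph giving usage context.
    nodes = set(hits)
    for h in hits:
        nodes |= set(G.neighbors(h))
    return G.subgraph(nodes)

sub = retrieve("search index")
print(sorted(sub.nodes), sorted(sub.edges))
```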

10.
A matrix-weighted itemset support computation method is first proposed, along with a matrix-weighted association pattern mining algorithm oriented to cross-language query expansion. On this basis, a post-translation query expansion algorithm for cross-language retrieval based on matrix-weighted association rule mining is presented. An initial cross-language retrieval pass is run with the help of machine translation to obtain the top-ranked documents, and relevance feedback documents are obtained after user relevance judgments. Matrix-weighted frequent itemsets containing the original query terms are mined from the feedback documents by computing support; association rules containing the original query terms are then extracted from the frequent itemsets under a confidence-interestingness evaluation framework. The consequents or antecedents of the rules serve as expansion terms, with rule confidence and interestingness measuring each term's importance, completing the post-translation cross-language query expansion. Experiments on the NTCIR-5 CLIR standard test collection show that the algorithm effectively improves cross-language query expansion performance and benefits cross-language retrieval with long queries, with post-translation consequent expansion outperforming antecedent expansion.
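
A sketch of the expansion step, with plain (unweighted) support standing in for the paper's matrix-weighted support, and confidence alone standing in for the confidence-interestingness framework; the thresholds and feedback documents are toy values.

```python
# Relevance-feedback documents (already translated), as term sets.
feedback = [
    {"virus", "infection", "vaccine"},
    {"virus", "vaccine", "trial"},
    {"virus", "infection", "symptom"},
    {"market", "trade"},
]
query_term = "virus"
min_sup, min_conf = 0.5, 0.6

def support(itemset):
    # Fraction of feedback documents containing the whole itemset.
    return sum(itemset <= d for d in feedback) / len(feedback)

# Mine 2-itemsets containing the original query term; each rule
# query_term -> t nominates t as a candidate expansion term.
vocab = set().union(*feedback)
for t in sorted(vocab - {query_term}):
    sup = support({query_term, t})
    conf = sup / support({query_term})
    if sup >= min_sup and conf >= min_conf:
        print(f"{query_term} -> {t}: sup={sup:.2f}, conf={conf:.2f}")
```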

11.
The aim of probabilistic models is to define a retrieval strategy within which documents can be optimally ranked according to their probability of relevance to a given request. In this scheme, the underlying probabilities are estimated from a history of past queries along with their relevance judgments. These estimations, refined over the last twenty years, take both document frequency and within-document frequency into account.

In the current study, we suggest representing documents not only by index term vectors, as in previous probabilistic models, but also by relevance hypertext links. These relationships, which provide additional evidence about document content, are established from requests and relevance judgments, and may improve the ranking of the retrieved records into a sequence more likely to fulfill user intent. Thus, to enhance retrieval effectiveness, our learning retrieval scheme modifies: (1) the weight assigned to each indexing term, (2) the importance attached to each search term, and (3) the relationships between documents. Implemented as a simple additive scheme applied after a ranked list of documents has been produced by a probabilistic retrieval strategy, our proposed solution is well suited to a hypertext system. We built a hypertext over the CACM test collection (3,204 documents) and the CISI corpus (1,460 documents) and evaluated the proposed retrieval scheme, which yields promising retrieval effectiveness on both collections.
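
One standard instance of estimating term weights from relevance judgments, as described above, is the Robertson-Sparck Jones relevance weight with 0.5 smoothing; a minimal sketch (the paper's exact estimation may differ):

```python
import math

def rsj_weight(r, R, n, N):
    """Robertson-Sparck Jones relevance weight with 0.5 smoothing.
    r: relevant docs containing the term, R: total relevant docs,
    n: docs containing the term,        N: collection size."""
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

# Toy history: 10 judged-relevant docs in a 1,000-doc collection,
# 8 of which contain the term; 50 docs overall contain it.
print(rsj_weight(r=8, R=10, n=50, N=1000))
```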

12.
Although very important in software engineering, establishing traceability links between software artifacts is extremely tedious, error-prone, and requires significant effort. Even when approaches for automated traceability recovery exist, they provide the requirements analyst with a usually very long ranked list of candidate links that must be inspected manually. In this paper we introduce an approach called Estimation of the Number of Remaining Links (ENRL), which aims at estimating, via Machine Learning (ML) classifiers, the number of remaining positive links in a ranked list of candidate traceability links produced by a Natural Language Processing (NLP)-based recovery approach. We have evaluated the accuracy of the ENRL approach with several ML classifiers and NLP techniques on three datasets from industry and academia, covering traceability links among different kinds of software artifacts, including requirements, use cases, design documents, source code, and test cases. Results from our study indicate that: (i) specific estimation models are able to provide accurate estimates of the number of remaining positive links; (ii) the estimation accuracy depends on the choice of the NLP technique; and (iii) univariate estimation models outperform multivariate ones.
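
A minimal sketch of the estimation idea, assuming the only feature is each candidate link's similarity score and using logistic regression as one illustrative choice among the ML classifiers the paper compares; the training data here is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Training data: candidate-link similarity scores with oracle labels.
scores = rng.uniform(size=300).reshape(-1, 1)
labels = (scores.ravel() + rng.normal(0, 0.2, 300) > 0.6).astype(int)
clf = LogisticRegression().fit(scores, labels)

# New ranked list: estimate how many true links remain below the
# already-inspected prefix by summing predicted positive probabilities.
new_scores = np.sort(rng.uniform(size=50))[::-1].reshape(-1, 1)
inspected = 10
remaining = clf.predict_proba(new_scores[inspected:])[:, 1].sum()
print(f"estimated remaining true links: {remaining:.1f}")
```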

13.
Geographic Information Retrieval is concerned with retrieving documents in response to a spatially related query. This paper addresses the ranking of documents by both textual and spatial relevance. To this end, we introduce multi-dimensional scattered ranking, where textually and spatially similar documents are spread out in the ranked list instead of appearing consecutively, so that documents close together in the list carry less redundant information. We present various ranking methods of this type, efficient algorithms to implement them, and experiments showing the outcome of the methods. (This research is supported by the EU-IST Project No. IST-2001-35047 (SPIRIT).)
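
A sketch of the scattering effect via a greedy re-ranker that trades relevance against similarity to already-selected documents (MMR-style); the paper's multi-dimensional methods treat textual and spatial similarity separately, which is collapsed into one similarity function here.

```python
def scatter_rank(docs, sim, relevance, alpha=0.5):
    """Greedy re-ranking: balance relevance against similarity to
    already-selected documents so near-duplicates spread out."""
    selected, remaining = [], list(docs)
    while remaining:
        best = max(remaining,
                   key=lambda d: alpha * relevance[d] - (1 - alpha) *
                   max((sim(d, s) for s in selected), default=0.0))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example: docs 0 and 1 are near-duplicates; scattering separates
# them in the output ordering.
relevance = {0: 0.9, 1: 0.88, 2: 0.5}
pairs = {(0, 1): 0.95, (0, 2): 0.1, (1, 2): 0.1}
sim = lambda a, b: pairs.get((min(a, b), max(a, b)), 0.0)
print(scatter_rank([0, 1, 2], sim, relevance))  # -> [0, 2, 1]
```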

14.
We propose a new integral-based source selection algorithm for uncooperative distributed information retrieval environments. The algorithm functions by modeling each source as a plot, using the relevance score and the intra-collection position of its sampled documents in reference to a centralized sample index. Based on this modeling, the algorithm locates the collections that contain the most relevant documents. A number of transformations are applied to the original plot in order to reward collections that have higher-scoring documents and to dampen the effect of collections returning an excessive number of documents. The family of linear interpolant functions passing through the points of the modified plot is computed for each available source, and the area they cover in the rank-relevance space is calculated. Information sources are ranked by the area they cover. Using this novel metric of collection relevance, the algorithm is tested in a variety of testbeds in both recall- and precision-oriented settings, and its performance is found to be better than or at least equal to previous state-of-the-art approaches, overall constituting a very effective and robust solution.
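
The core metric reduces to the area under the linear interpolant through each source's (rank, relevance) points; a sketch with toy sample data, omitting the plot transformations the paper applies before integration:

```python
# Sampled documents per source: (intra-collection rank, relevance score),
# as positioned against the centralized sample index. Toy numbers.
sources = {
    "A": [(1, 0.9), (3, 0.7), (8, 0.2)],
    "B": [(1, 0.6), (2, 0.5), (4, 0.4), (9, 0.3)],
}

def area(points):
    # Trapezoid rule over the linear interpolant in rank-relevance space.
    pts = sorted(points)
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# Rank sources by the area they cover (larger area = more relevant).
for name, pts in sorted(sources.items(), key=lambda kv: -area(kv[1])):
    print(name, round(area(pts), 3))
```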

15.
Secure Network Coding Against Omniscient Attacks
徐光宪  付晓 《计算机科学》2012,39(8):88-91,114
A secure network coding algorithm that can resist an omniscient attacker is proposed. Even when the adversary can eavesdrop on all nodes and channels and pollute z0 links, the algorithm applies a sparse-matrix transform to the source information to strengthen its resistance to eavesdropping, and uses list decoding at the sink to detect and eliminate pollution attacks. Theoretical analysis and simulation results show that the algorithm can be constructed in polynomial time, resists eavesdropping, pollution, and other security attacks, and lets the original random network coding meet the weak-security requirement with high probability, while also increasing the coding rate and reducing storage overhead. Most importantly, the algorithm only modifies the source and sink on top of the original random coding scheme; the intermediate nodes remain unchanged.

16.
Parallel Algorithms for Time-Dependent Monte Carlo Transport Problems
This paper presents parallel algorithms for time-dependent Monte Carlo (MC) transport problems and discusses and optimizes the loading and execution modes of the parallel program. A parallel random number generator requiring, in the ideal case, no communication is designed for MC parallel computation. Dynamic MC transport problems involve a large amount of I/O; reading the residual-particle data files in particular consumes substantial I/O time, so three parallel I/O algorithms are proposed to address this. Finally, performance measurements of the parallel algorithms are reported: compared with serial execution, the parallel computation time on 64 processors is 30 times shorter.
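
A sketch of the communication-free parallel RNG idea: every processor derives an independent stream from a common root seed before the run, so no coordination is needed during the computation. NumPy's SeedSequence.spawn stands in for the paper's generator design, and the "transport step" is a toy computation.

```python
from multiprocessing import Pool

import numpy as np

def track_particles(seed_seq, n_particles):
    # Each worker builds its own independent stream; no communication.
    rng = np.random.default_rng(seed_seq)
    # Toy transport step: sample a free path per particle and total them.
    return rng.exponential(scale=1.0, size=n_particles).sum()

if __name__ == "__main__":
    root = np.random.SeedSequence(12345)
    children = root.spawn(4)  # one independent stream per processor
    with Pool(4) as pool:
        totals = pool.starmap(track_particles,
                              [(c, 10_000) for c in children])
    print("total path length:", sum(totals))
```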

17.
A component-based visual program artifact processing architecture with an open API and script programmability is proposed. By loading different components, it can generate code, configuration, and modeling text for different domains. By calling the components' open API from scripts, the artifact output of each graphical symbol can be defined flexibly, and the parts tied to the underlying software and hardware are factored out of the visual software's source program as scripts, enabling smooth upgrades and flexible extension and improving the generality and secondary-development capability of visual programming software. The feasibility and effectiveness of this scheme are demonstrated through AC/DC engineering applications and EMTDC simulation.

18.
梁鹏鹏  柴玉梅  王黎明 《计算机工程》2011,37(21):124-125,130
To address the problem that traditional text classification methods do not adequately account for the relationships between documents, an iTopicModel-based classification algorithm for linked text (TC-iTM) is proposed. The class represented by each topic is determined from the topic-membership probabilities of documents whose classes are known, and an unlabeled document is then classified using its own topic-membership probabilities together with its textual information. Experimental results show that when the links between documents strongly influence class information, TC-iTM outperforms traditional text classification methods.

19.
Software developers rely on a fast build system to incrementally compile their source code changes and produce modified deliverables for testing and deployment. Header files, which tend to trigger slow rebuild processes, are most problematic if they also change frequently during the development process, and hence, need to be rebuilt often. In this paper, we propose an approach that analyzes the build dependency graph (i.e., the data structure used to determine the minimal list of commands that must be executed when a source code file is modified), and the change history of a software system to pinpoint header file hotspots—header files that change frequently and trigger long rebuild processes. Through a case study on the GLib, PostgreSQL, Qt, and Ruby systems, we show that our approach identifies header file hotspots that, if improved, will provide greater improvement to the total future build cost of a system than just focusing on the files that trigger the slowest rebuild processes, change the most frequently, or are used the most throughout the codebase. Furthermore, regression models built using architectural and code properties of source files can explain 32–57 % of these hotspots, identifying subsystems that are particularly hotspot-prone and would benefit the most from architectural refinement.
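
A sketch of the hotspot notion, assuming each header's rebuild cost comes from the build dependency graph and its change count from version history; ranking by their product (expected future rebuild time) is an illustrative prioritization, and the header names and numbers are made up.

```python
# Per header: rebuild cost in seconds (from the build dependency graph)
# and number of changes in the version history (toy numbers).
headers = {
    "glibconfig.h": {"rebuild_cost": 540, "changes": 48},
    "gtypes.h": {"rebuild_cost": 610, "changes": 3},
    "gutils.h": {"rebuild_cost": 12, "changes": 95},
}

# Hotspots are headers that change often AND are expensive to rebuild;
# cost * frequency approximates the total future rebuild time at stake.
ranked = sorted(headers.items(),
                key=lambda kv: kv[1]["rebuild_cost"] * kv[1]["changes"],
                reverse=True)
for name, h in ranked:
    print(name, h["rebuild_cost"] * h["changes"])
```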

20.
In this paper, we study the effect of taking the user into account in a query-by-example handwritten word spotting framework. Several off-the-shelf query fusion and relevance feedback strategies have been tested in the handwritten word spotting context. The increase in terms of precision when the user is included in the loop is assessed using two datasets of historical handwritten documents and two baseline word spotting approaches both based on the bag-of-visual-words model. We finally present two alternative ways of presenting the results to the user that might be more attractive and suitable to the user's needs than the classic ranked list.
