Similar Documents
1.
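Traditional paper retrieval methods lack semantic understanding, and the results they return are often of low relevance. To address this, the paper adopts a semantic-network-based model for representing document semantics and proposes a retrieval method based on a domain ontology. First, a domain ontology is built with the help of a subject classification scheme, and the papers are semantically indexed. Next, the ontology knowledge and index information are used to construct the semantic-network-based document representation. Finally, the algorithm for computing relevance between a user query and the semantic network is improved, and results are ranked by combining keyword-based and semantic methods. Experimental results show that the method effectively improves the precision and recall of paper retrieval.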
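
The last step, fusing keyword matching with semantic matching to rank results, can be illustrated with a small sketch. It makes two assumptions. First, the ontology is reduced to a plain dict that maps terms to concept labels. Second, the two scores are mixed linearly with a hypothetical weight `alpha`. The abstract does not specify the paper's own semantic-network relevance algorithm, so that algorithm is not reproduced here.

```python
# Toy sketch: fuse a keyword-match score with an ontology-based semantic score.
# Assumption: the ontology is a flat dict mapping terms to concept labels.

def keyword_score(query_terms, doc_terms):
    """Fraction of query terms that occur literally in the document."""
    if not query_terms:
        return 0.0
    doc = set(doc_terms)
    return sum(t in doc for t in query_terms) / len(query_terms)


def semantic_score(query_terms, doc_terms, ontology):
    """Jaccard overlap of the ontology concepts reached by query and document terms."""
    q = {ontology[t] for t in query_terms if t in ontology}
    d = {ontology[t] for t in doc_terms if t in ontology}
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


def hybrid_rank(query_terms, docs, ontology, alpha=0.5):
    """Rank docs by alpha * keyword score + (1 - alpha) * semantic score."""
    scored = []
    for doc_id, terms in docs.items():
        s = (alpha * keyword_score(query_terms, terms)
             + (1 - alpha) * semantic_score(query_terms, terms, ontology))
        scored.append((doc_id, s))
    # Highest score first; ties broken by document id for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


ontology = {"retrieval": "IR", "search": "IR", "ontology": "KR", "semantic network": "KR"}
docs = {"d1": ["search", "ontology"], "d2": ["retrieval", "database"], "d3": ["cooking"]}
print(hybrid_rank(["retrieval", "ontology"], docs, ontology))
# [('d1', 0.75), ('d2', 0.5), ('d3', 0.0)]
```

Document d1 matches only one query term literally. It still ranks first because both of its terms map to the query's concepts.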

2.
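Automatic text summarization is an important step in information and knowledge acquisition that follows information retrieval, and high-quality document summaries are of great value. This paper takes the sentence as the basic unit of extraction and weights sentences by two features: their position and the title keywords they contain. Sentences are then clustered by latent semantics, which yields a summarization method based on semantic structure. The paper also presents a relatively objective and effective method for evaluating summaries. Experiments demonstrate the effectiveness of the method.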
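
The position and title-keyword weighting features can be sketched as follows. The linear scoring formula, the weights `w_pos` and `w_title`, and the top-k selection are assumptions made for illustration. The latent-semantic clustering step described in the abstract is omitted.

```python
# Toy sketch: score sentences by position and title-keyword overlap, then
# keep the top-k sentences in their original order. Weights are assumptions.

def _words(text):
    return [w.strip(".,;:!?\"'") for w in text.lower().split()]


def score_sentences(sentences, title, w_pos=1.0, w_title=1.0):
    n = len(sentences)
    title_words = set(_words(title))
    scores = []
    for i, sent in enumerate(sentences):
        words = _words(sent)
        position = 1.0 - i / n  # earlier sentences get a higher position weight
        overlap = sum(w in title_words for w in words) / len(words) if words else 0.0
        scores.append(w_pos * position + w_title * overlap)
    return scores


def extract_summary(sentences, title, k=2):
    scores = score_sentences(sentences, title)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(top)]  # restore document order


sentences = [
    "Topic models help retrieval.",
    "The weather was nice.",
    "Summaries use topic models too.",
]
print(extract_summary(sentences, "Topic models"))
# ['Topic models help retrieval.', 'Summaries use topic models too.']
```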

3.
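This paper aims to support semantics-based search over encrypted data and to make such search more accurate and efficient. To that end, it proposes a multi-keyword ranked symmetric searchable encryption scheme based on the biterm topic model (BTM), called BTM-MRSE. A topic model captures the latent semantics between keywords and documents. The user submits the probability distribution of the query keywords as the search trapdoor, and the most relevant documents are returned according to semantic relevance scores between the query keywords and the documents. The scheme replaces specific keywords in ciphertext search with semantics-based topics, which separates keywords from document identifiers and so strengthens privacy protection for both document keywords and query keywords. To reduce index size, we propose a two-layer index structure. A balanced binary tree holds the secure keyword-topic index, and an inverted index holds the secure topic-document index. The topic model reduces the dimensionality of the vectors stored in the index nodes, which improves search efficiency, and the two-level indexing mechanism based on the balanced binary tree improves it further. Security analysis shows that the scheme is secure and effective. Comparative experiments on real datasets show that it substantially improves both the accuracy and the efficiency of ciphertext search.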
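
Topic-based relevance scoring and the topic-document inverted index can be illustrated in plaintext. The sketch performs no encryption and builds no trapdoors. It rests on these assumptions:

- Trained topic distributions p(z|w) and p(z|d) are already available.
- The balanced-binary-tree keyword-topic layer is replaced by an ordinary dict.
- The relevance score is the inner product of topic vectors.
- The threshold and `top_k` values are hypothetical.

```python
from collections import defaultdict

# Plaintext toy sketch only: no encryption or trapdoor generation is shown.
# Assumption: p(z|w) and p(z|d) come from an already trained topic model.

def query_topic_vector(query_words, word_topic):
    """Average p(z|w) over the query words that have a topic distribution."""
    known = [word_topic[w] for w in query_words if w in word_topic]
    if not known:
        return []
    k = len(known[0])
    return [sum(v[z] for v in known) / len(known) for z in range(k)]


def build_topic_index(doc_topic, threshold=0.2):
    """Inverted index: topic id -> ids of docs with p(z|d) >= threshold."""
    index = defaultdict(set)
    for doc_id, dist in doc_topic.items():
        for z, p in enumerate(dist):
            if p >= threshold:
                index[z].add(doc_id)
    return index


def topic_search(query_words, word_topic, doc_topic, index, top_k=2, threshold=0.2):
    q = query_topic_vector(query_words, word_topic)
    if not q:
        return []
    # Only documents listed under the query's salient topics are scored.
    candidates = set()
    for z, p in enumerate(q):
        if p >= threshold:
            candidates |= index.get(z, set())
    scored = [(sum(a * b for a, b in zip(q, doc_topic[d])), d) for d in candidates]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [d for _, d in scored[:top_k]]


word_topic = {"cipher": [0.9, 0.1], "privacy": [0.7, 0.3], "recipe": [0.05, 0.95]}
doc_topic = {"d1": [0.8, 0.2], "d2": [0.3, 0.7], "d3": [0.1, 0.9]}
index = build_topic_index(doc_topic)
print(topic_search(["cipher", "privacy"], word_topic, doc_topic, index))  # ['d1', 'd2']
```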

4.
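Traditional XML document retrieval relies mainly on keyword matching. It ignores both the semantics of keywords and the latent information embedded in combinations of information. To address this, the paper proposes an algorithm based on Dempster-Shafer (D-S) evidence theory for acquiring latent information from XML documents. The algorithm introduces an ontology to define the semantic relations between concepts and the ways in which information is combined. It then proposes a retrieval model based on D-S evidence theory and a method for computing indicator weights. Finally, it designs a dynamic threshold using the plausibility function. Together these effectively eliminate the uncertainty in semantic matching and solve the problem of acquiring latent information from information combinations. The algorithm was also applied to detecting sensitive personal and enterprise information in e-government, where experiments showed higher precision and recall than traditional methods.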
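
The core of D-S evidence theory is Dempster's rule for combining mass functions, together with the belief and plausibility functions derived from them. The sketch below implements only these standard definitions. The frame {relevant, not_relevant} and the two mass assignments are toy values. The paper's retrieval model, indicator weights and dynamic threshold are not reproduced.

```python
from itertools import product

# Dempster's rule of combination for two mass functions whose focal
# elements are frozensets over the same frame of discernment.

def combine(m1, m2):
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by 1 - K, where K is the total conflicting mass.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def belief(m, hypothesis):
    """Bel(A): total mass of focal elements contained in A."""
    return sum(v for s, v in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    """Pl(A): total mass of focal elements that intersect A."""
    return sum(v for s, v in m.items() if s & hypothesis)


R, N = frozenset({"relevant"}), frozenset({"not_relevant"})
RN = R | N
m1 = {R: 0.6, RN: 0.4}           # toy evidence source 1
m2 = {R: 0.5, N: 0.3, RN: 0.2}   # toy evidence source 2
m = combine(m1, m2)
print(round(belief(m, R), 3), round(plausibility(m, R), 3))  # 0.756 0.854
```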

5.
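Academic citation recommendation supplies a query paper with a list of closely matched candidate citations, based on matching relationships between papers, and so makes researchers' work more efficient. Existing methods rely mainly on short-text matching, such as keywords and titles. They cannot represent a paper's structure or overall semantics, so the retrieved results are often semantically unrelated. Starting from the deep data characteristics of long texts, this paper proposes a citation recommendation algorithm based on hierarchical interactive attention matching, with three components:

- A hierarchical word-sentence-document representation framework, built on deep neural networks, improves the structured representation of long texts.
- An intra-attention mechanism enhances the internal semantic representation of each academic paper.
- An interactive attention mechanism mines fine-grained matching features between citing and cited papers.

Experiments were run on academic literature datasets from computer science, natural language processing, medicine and other fields. The proposed method outperforms short-text matching models on metrics such as ACC and F1, which indicates that hierarchical interactive attention yields better citation matching.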
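
The word-sentence-document hierarchy can be illustrated by applying attention pooling at two levels. In this sketch, hand-made word vectors and context vectors stand in for learned parameters, and documents are compared by cosine similarity. The interactive (cross-document) attention described in the abstract is not shown.

```python
import math

# Toy sketch of hierarchical attention pooling (word -> sentence -> document).
# Assumption: word and context vectors are hand-made; a real model learns them.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_pool(vectors, context):
    """Weight each vector by softmax(vector . context) and return the weighted sum."""
    weights = softmax([dot(v, context) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def encode_document(doc, word_ctx, sent_ctx):
    """doc is a list of sentences; each sentence is a list of word vectors."""
    sentence_vecs = [attention_pool(words, word_ctx)[0] for words in doc]
    doc_vec, _ = attention_pool(sentence_vecs, sent_ctx)
    return doc_vec


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


paper_a = [[[1.0, 0.0], [0.8, 0.2]], [[0.9, 0.1]]]
paper_b = [[[0.0, 1.0]], [[0.1, 0.9], [0.2, 0.8]]]
ctx = [1.0, 1.0]
print(cosine(encode_document(paper_a, ctx, ctx), encode_document(paper_b, ctx, ctx)))
```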

6.
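This paper proposes a method for assessing the reading value of documents based on user behavior analysis. It combines a document's intrinsic objective value with its subjective value to the querying user, in three steps:

1. User behavior is analyzed, and a latent semantic space is built from the titles of the documents the user has downloaded. The method computes the semantic relevance between a document's title and those downloaded titles. It also counts how often title words of the downloaded documents occur in the document's abstract and keywords. These measures give the document's relevance to the user's behavior.
2. Indicators such as journal impact factor, weighted citation count and a time factor quantify the document's intrinsic value.
3. The quantitative subjective and objective results are combined to assess the document's reading value.

Experiments show that the proposed method is more reasonable and effective than traditional methods that assess reading value from a single factor.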

7.
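A Discourse-Structure-Guided Method for Automatic Summarization of Chinese Web Documents   Total citations: 29; self-citations: 0; citations by others: 29
The "abstract" and "keywords" of a document are metadata that briefly summarize its content, and they play an important role in Web information retrieval. This paper addresses the needs of Web information retrieval and the characteristics of Web documents. Imitating the way people read, it proposes an automatic summarization method guided by discourse structure. The method analyzes the semantic relations between the contents of paragraphs and divides the document into topic levels, which yields its discourse structure. Guided by this structure, the method uses statistical methods and heuristic rules to extract the document's keywords and key sentences and to generate its summary. In the experimental evaluation, the method achieved satisfactory summary quality and speed.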

8.
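Existing multi-document extraction methods make poor use of the topic and semantic information in sentences. To address this, the paper proposes a multi-document summarization method based on a sentence graph that fuses multiple kinds of information:

1. A sentence graph is built with sentences as nodes.
2. A sentence-based Bayesian topic model gives each sentence's topic probability distribution, and a word-vector model gives sentence semantic similarity. The two are fused into a final sentence relevance, so that topic and semantic information together serve as the graph's edge weights.
3. The multi-document summary is produced by a summarization method based on the minimum dominating set of the sentence graph.

Through this multi-information sentence graph, the method combines topic, semantic and relational information between sentences. Experimental results show that it effectively improves the overall performance of extractive summarization.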
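
The graph construction and summary selection steps can be sketched as follows. The linear fusion weight `lam`, the edge threshold and the toy similarity values are assumptions. Finding a minimum dominating set is NP-hard, so the sketch uses the standard greedy approximation rather than whatever exact procedure the paper may use.

```python
# Toy sketch: fused edge weights, a thresholded sentence graph, and a greedy
# approximation of a minimum dominating set (the exact problem is NP-hard).

def fuse(topic_sim, semantic_sim, lam=0.5):
    """Linear fusion of topic and semantic similarity (lam is an assumption)."""
    return lam * topic_sim + (1 - lam) * semantic_sim


def build_graph(n, weights, threshold=0.5):
    """weights maps (i, j) sentence pairs to fused similarity."""
    adj = {i: set() for i in range(n)}
    for (i, j), w in weights.items():
        if w >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def greedy_dominating_set(adj):
    uncovered = set(adj)
    chosen = []
    while uncovered:
        # Pick the sentence that covers the most still-uncovered sentences;
        # ties go to the lowest index because max() keeps the first maximum.
        best = max(sorted(adj), key=lambda v: len(({v} | adj[v]) & uncovered))
        chosen.append(best)
        uncovered -= {best} | adj[best]
    return sorted(chosen)


weights = {
    (0, 1): fuse(0.9, 0.7), (0, 2): fuse(0.8, 0.6), (1, 2): fuse(0.7, 0.5),
    (3, 4): fuse(0.9, 0.9), (2, 3): fuse(0.1, 0.1),
}
print(greedy_dominating_set(build_graph(5, weights)))  # [0, 3]
```

Each chosen sentence is adjacent to every sentence it covers, so the chosen set (here sentences 0 and 3) acts as a compact summary of the whole graph.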

9.
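Ontology-Based Extraction of Citation Metadata from Documents   Total citations: 5; self-citations: 6; citations by others: 5
Guo Zhixin. Microcomputer Information, 2006, 22(18): 304-306
This paper uses ontology technology to propose a new method for extracting citation metadata from documents. The method extracts information such as authors, titles and dates by pattern matching and formalizes it in the OWL ontology language. This lays a foundation for later semantic search and semantic storage. Experimental data demonstrate the method's effectiveness.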
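
Pattern-matching extraction of citation metadata can be illustrated with a regular expression. The sketch assumes a single hypothetical reference layout ("Authors. Title. Venue, Year."). The example reference is made up, and the OWL formalization step is not shown.

```python
import re

# Hypothetical reference layout assumed here: "Authors. Title. Venue, Year."
# Real reference strings vary far more; this only illustrates pattern matching.
REF_PATTERN = re.compile(
    r"^(?P<authors>[^.]+)\.\s+(?P<title>[^.]+)\.\s+(?P<venue>[^,]+),\s*(?P<year>\d{4})\.?$"
)


def parse_reference(text):
    """Return author/title/venue/year fields, or None if the layout does not match."""
    match = REF_PATTERN.match(text.strip())
    if match is None:
        return None
    return {
        "authors": [a.strip() for a in match.group("authors").split(",")],
        "title": match.group("title").strip(),
        "venue": match.group("venue").strip(),
        "year": int(match.group("year")),
    }


print(parse_reference("Alice Smith, Bob Jones. A study of citation parsing. Journal of Examples, 2006."))
```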

10.
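The body of Mongolian text is growing quickly, and users need to retrieve the topical information they want from it quickly and accurately. To this end, the paper proposes a method that combines the LDA topic model with a language model:

1. Unigram and bigram language models are built over the Mongolian texts to obtain each text's language probability distribution.
2. An LDA topic model is built. Its parameters are estimated by Gibbs sampling, which yields each document's latent topic probability distribution.
3. The method takes a linear combination of each document's topic distribution and language distribution. It uses the combined distribution to compute the similarity between document topics and the query keywords, and returns the documents most topically relevant to the query.

The language model makes full use of Mongolian grammatical features, while LDA generalizes well for latent-semantic mining and topic discovery. Combining the two therefore better supports topic-semantic retrieval of Mongolian documents and improves retrieval accuracy. Experimental results show that the combined approach captures topic semantics better than either model alone.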
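
The linear combination of a language-model distribution and an LDA topic distribution can be sketched as a query-likelihood score. The sketch makes these assumptions:

- The unigram model uses Jelinek-Mercer smoothing.
- The interpolation weight `lam` is hypothetical.
- The topic distributions have already been estimated (for example by Gibbs sampling).
- Only unigrams are used, whereas the abstract also mentions a bigram model.

```python
import math
from collections import Counter

# Toy sketch: query likelihood with P(w|d) = lam * P_LM(w|d) + (1 - lam) * P_LDA(w|d).
# Assumptions: unigrams only, Jelinek-Mercer smoothing, pre-trained topic distributions.

def lm_prob(word, doc_counts, doc_len, coll_counts, coll_len, mu=0.5):
    """Jelinek-Mercer smoothed unigram probability P_LM(w | d)."""
    p_doc = doc_counts[word] / doc_len
    p_coll = coll_counts[word] / coll_len
    return mu * p_doc + (1 - mu) * p_coll


def topic_prob(word, doc_topics, topic_words):
    """P_LDA(w | d) = sum over topics z of P(w | z) * P(z | d)."""
    return sum(p_z * topic_words[z].get(word, 0.0) for z, p_z in enumerate(doc_topics))


def score(query, doc_tokens, doc_topics, topic_words, coll_counts, coll_len, lam=0.7):
    doc_counts = Counter(doc_tokens)
    total = 0.0
    for w in query:
        p = (lam * lm_prob(w, doc_counts, len(doc_tokens), coll_counts, coll_len)
             + (1 - lam) * topic_prob(w, doc_topics, topic_words))
        if p == 0.0:
            return float("-inf")  # query word unseen in the collection and topics
        total += math.log(p)
    return total


docs = {
    "d1": ["topic", "model", "search", "topic"],
    "d2": ["grammar", "morphology", "search"],
}
coll_counts = Counter(t for toks in docs.values() for t in toks)
coll_len = sum(coll_counts.values())
topic_words = [
    {"topic": 0.5, "model": 0.4, "search": 0.1},
    {"grammar": 0.5, "morphology": 0.4, "search": 0.1},
]
doc_topics = {"d1": [0.9, 0.1], "d2": [0.1, 0.9]}
query = ["topic", "search"]
ranked = sorted(
    docs,
    key=lambda d: score(query, docs[d], doc_topics[d], topic_words, coll_counts, coll_len),
    reverse=True,
)
print(ranked)  # ['d1', 'd2']
```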

11.
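Most current expert-finding methods based on scientific literature acquire expert information statically. Methods that analyze dynamic evolution rarely consider external information such as a paper's authors and the authors it cites, and they have seldom been applied to expert finding. Building on the CAT and ToT models, this work constructs a citation-author-topic evolution model (CAToT). It also gives a Gibbs sampling method for estimating the model's parameters and a method for applying the model to expert finding. CAToT integrates the strengths of CAT and ToT. It can reveal the latent topics in scientific papers and the authors and cited authors associated with each topic. It can also uncover how topics change over time and how expert rankings evolve. The experimental data are 1,557 conference papers from ACL, CoNLL and EMNLP, and a comparison with the CAT model verifies the feasibility and effectiveness of CAToT.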

12.
Academic researchers increasingly need to retrieve patents as well as research papers, because applying for patents is now considered an important research activity. However, retrieving patents by keyword is laborious for researchers. The terms used in patents are chosen to broaden the scope of the claims, so they are generally more abstract than those used in research papers. We have therefore built a framework that facilitates patent retrieval for researchers. It integrates research papers and patents by analysing the citation relationships between them. We identified research papers cited in patents in two steps:

1. Detecting sentences that contain bibliographic information.
2. Extracting the bibliographic information from those sentences.

We conducted two experiments to evaluate the method. For Step 1, we prepared 42,073 sentences, among which a human subject manually identified 1,476 sentences containing citations of papers. For Step 2, we prepared 3,000 sentences in which the titles, authors and other bibliographic information were manually identified. Step 1 achieved a precision of 91.6% and a recall of 86.9%. Step 2 achieved a precision of 86.2% and a recall of 85.1%. Finally, we built an information retrieval system that offers two ways to retrieve research papers and patents: by query, and by following the citation relationships between them.
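
The reported figures can be summarized with the F1 score, the harmonic mean of precision and recall. The helper below computes set-based precision, recall and F1. The abstract itself does not report F1; the values printed here are derived arithmetically from its precision and recall figures.

```python
# Set-based precision, recall and F1 for a detection step such as
# "which sentences contain a bibliographic citation".

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f1_from(precision, recall)


# F1 implied by the reported Step 1 and Step 2 precision/recall.
print(round(f1_from(0.916, 0.869), 3), round(f1_from(0.862, 0.851), 3))  # 0.892 0.856
```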

13.
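Academic corruption has become a growing focus of public attention. Besides strengthening researchers' ethical self-discipline, technical means of oversight are also needed. This paper therefore analyzes and improves the vector space model from information retrieval and uses the improved model to build a system that identifies similarity between academic papers. Institutions can use the system to screen papers for plagiarism, which improves their efficiency, helps root out academic corruption and fosters a better environment for innovation.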
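
A vector-space-model similarity check of this kind can be sketched with TF-IDF weights and cosine similarity. The sketch uses the standard model rather than the paper's improved variant. The smoothed IDF formula, the whitespace tokenization and the 0.8 flagging threshold are assumptions.

```python
import math
from collections import Counter

# Toy vector-space-model similarity check: TF-IDF weights plus cosine similarity.
# This is the standard model, not the improved variant described in the abstract.

def tfidf_vectors(docs):
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        # Smoothed IDF so that terms present in every document keep a small weight.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def flag_similar_pairs(texts, threshold=0.8):
    """Return (i, j, similarity) for every pair at or above the threshold."""
    vectors = tfidf_vectors([t.lower().split() for t in texts])
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            sim = cosine(vectors[i], vectors[j])
            if sim >= threshold:
                pairs.append((i, j, round(sim, 3)))
    return pairs


texts = [
    "vector space model for paper similarity",
    "vector space model for paper similarity",
    "genetic algorithm for topic selection",
]
print(flag_similar_pairs(texts))  # [(0, 1, 1.0)]
```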

14.
O'Leary, Daniel. IEEE Software, 2009, 26(1): 12-14
This paper investigates the most cited papers in IEEE Software over its 25-year history. I find that the most cited paper has 135 citations. I use the H-Index to determine how many papers to analyze, and find an H-Index of 35 for IEEE Software when self-citations are included and 33 when they are omitted. Omitting self-citations reduced the total citations to the top papers (35 and 33 papers, respectively) by a smaller percentage than it reduced the H-Index.
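
The H-Index used here is the largest h such that h papers each have at least h citations. The sketch below computes it with and without self-citations on made-up data.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Toy data: total citations and self-citations for four papers.
total = [5, 5, 4, 4]
self_cites = [1, 1, 1, 0]
print(h_index(total), h_index([t - s for t, s in zip(total, self_cites)]))  # 4 3
```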

15.
Context: According to the search reported in this paper, as of this writing (May 2015) more than 70,000 papers have been published in Software Engineering (SE) since the field's inception in 1968. Citations are crucial in any research area for positioning one's work and building on the work of others. Identifying and characterizing highly-cited papers is common practice and is regularly reported in many disciplines.
Objective: This study aims to identify the SE papers that have most influenced others, as measured by citation count. Studying highly-cited SE papers shows researchers the kinds of approaches and research methods such papers present and apply. Researchers can learn from these to write higher-quality papers that are more likely to be highly cited.
Method: We conducted a study comprising five research questions. It identifies and classifies the top-100 highly-cited SE papers by two metrics: total number of citations and average annual number of citations.
Results: By total number of citations, the top paper is "A metrics suite for object-oriented design", published in 1994 and cited 1817 times. By average annual number of citations, the top paper is "QoS-aware middleware for Web services composition", published in 2004 and cited 154.2 times per year on average.
Conclusion: It is important both to identify the highly-cited SE papers and to characterize the overall citation landscape of the field. We hope this paper will encourage further discussion in the SE community toward analysis and formal characterization of highly-cited SE papers.
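
Average annual citations can be computed as total citations divided by years since publication. The abstract does not state which convention the study used. The sketch below assumes that the publication year counts as one year of exposure, and it ranks toy papers by both metrics.

```python
# Average annual citations, assuming the publication year itself counts as one
# year of exposure; the abstract does not state which convention the study used.

def average_annual_citations(total, pub_year, as_of_year):
    years = as_of_year - pub_year + 1
    return total / years


def top_papers(papers, as_of_year, k=1):
    """papers: list of (title, total_citations, pub_year) tuples."""
    by_total = sorted(papers, key=lambda p: p[1], reverse=True)[:k]
    by_annual = sorted(
        papers,
        key=lambda p: average_annual_citations(p[1], p[2], as_of_year),
        reverse=True,
    )[:k]
    return [p[0] for p in by_total], [p[0] for p in by_annual]


toy = [("Paper A", 1000, 1990), ("Paper B", 300, 2010)]
print(top_papers(toy, as_of_year=2015))  # (['Paper A'], ['Paper B'])
```

The two metrics can disagree, as here. An older paper leads on total citations, while a recent paper with a faster citation rate leads on the annual average.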

16.
This study integrates information retrieval and a genetic algorithm into a new method for constructing knowledge structures. The method is applied to papers published in the journal Expert Systems with Applications (ESWA). Its purpose is to explore the major topics, and the related techniques and methods, of papers published in different periods, and to help reveal research trends among those papers. We use the vector space model to represent the published papers and their features, and the chi-square test to examine the independence of topics. We then apply a genetic algorithm to automate the selection of the topics used to construct the journal's knowledge structures.

We selected ESWA as the source of samples mainly because its papers make extensive use of methods and techniques that have been widely applied in many domains. It is also an SCI-indexed journal. In recent years, more and more outstanding scholars have published in it, and its papers are frequently cited, so it can be regarded as an internationally prominent journal.

In our experiment, knowledge structures are constructed and analyzed. We also evaluate how representative the selected topics are and whether the published papers have been classified into appropriate topics. The results show that the constructed knowledge structures effectively present representative topics, related techniques and issues, and also help reveal research trends.
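
The chi-square test of independence can be sketched for a contingency table such as term presence versus topic membership. The table values are toy data. The value 3.841 is the critical value of the chi-square distribution with one degree of freedom at the 0.05 level.

```python
# Pearson chi-square statistic for an r x c contingency table, e.g. rows =
# "term present / absent" and columns = "paper in topic / not in topic".
# Assumes every row and column total is nonzero.

def chi_square(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


table = [[10, 20], [30, 40]]
stat = chi_square(table)
# Compare with the df = 1, alpha = 0.05 critical value.
print(round(stat, 4), stat > 3.841)  # 0.7937 False
```

Here the statistic is below the critical value, so the toy table gives no evidence against independence at the 0.05 level.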

17.
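Large state-owned enterprises, and even private ones, have grown rapidly, and the number of their research projects and scientific and technological achievements has grown exponentially. Yet novelty searches within these enterprises are still done by hand and remain difficult. To make novelty searches over internal achievements more efficient, this paper studies the design and implementation of a novelty search system built on the Solr search application server. The paper is organized in three parts:

1. A brief introduction to Solr's concepts, features and system architecture.
2. The functional structure and system architecture of the Solr-based retrieval and novelty search system.
3. The implementation of the system's interface and functions, in particular the design and implementation of the novelty search and side-by-side comparison functions.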

18.
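Liu Dayou, Xue Ruiqing, Qi Hong. Acta Automatica Sinica, 2012, 38(10): 1654-1662
A paper citation network is dynamic, with new papers continually joining it. Traditional paper evaluation criteria such as citation counts and PageRank values are "lifetime" criteria that crowd out new nodes. Finding valuable papers that continue to attract attention among massive numbers of papers has therefore become a question of interest. Sayyadi proposed the FutureRank algorithm, which addresses this by predicting each paper's ranking by citation count and by PageRank value over a future period. However, FutureRank requires PageRank values to be computed in advance, which consumes considerable computation time. We therefore try to predict papers' future citation-count and PageRank rankings without computing their current PageRank values. Instead, we use the authority scores of each paper's authors and citers. Experimental results show that, compared with FutureRank, our algorithm both reduces computation time and improves prediction accuracy.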
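
FutureRank depends on precomputed PageRank values. The standard PageRank iteration over a citation graph is sketched below, with a damping factor of 0.85. Rank held by papers that cite nothing is redistributed uniformly. This is only the baseline computation; neither FutureRank nor the authors' author-authority method is reproduced.

```python
def pagerank(cites, damping=0.85, tol=1e-10, max_iter=100):
    """Power iteration; cites maps each paper to the list of papers it cites."""
    nodes = set(cites) | {v for outs in cites.values() for v in outs}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by papers with no outgoing citations is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not cites.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            outs = cites.get(v, [])
            for u in outs:
                new[u] += damping * rank[v] / len(outs)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


graph = {"A": ["C"], "B": ["C"], "C": [], "D": ["A", "C"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # C
```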

19.
This study examines the scientometric side of the "Matthew Effect": papers by Russian and foreign scientists published in the same journals receive different numbers of citations. Publications in foreign journals of physics and chemistry are considered. The study calculates a "Matthew Index", which characterizes how unevenly citations are distributed across countries. It concludes that Russian articles in chemistry are poorly "competitive" and that Russian physics publications do not adequately meet the world level.

20.
This paper reports work intended to reveal the connection between the topics investigated in conference papers and those in journal papers. The study selected hundreds of papers on data mining and information retrieval from well-known databases. It showed that the topics covered by conference papers in a given year often lead to similar topics in journal papers the following year, and vice versa. Using existing algorithms and combinations of them, the study proposes a new detection procedure that helps researchers spot new trends and gather academic intelligence from conferences and journals.
The goal of this research is fourfold:

1. To investigate whether the themes of conference papers lead those of journal papers.
2. To examine how new research themes can be identified from conference papers.
3. To use a specific area, information retrieval and data mining, as an illustration.
4. To study any inconsistencies in the correlation between conference papers and journal papers.

This study explores the connections between academic publications. Information retrieval and data mining methods can reveal relationships between published papers across all topics. Once the connections between conference and journal papers are known, researchers can identify academic intelligence and make their own research more effective.
The study identifies the topics of conference papers to determine whether they represent new trends that later appear in journal papers. It also proposes an automatic procedure based on information retrieval and data mining, which minimizes the time and human effort needed to predict future research developments. The study develops this procedure and collects a dataset to examine these questions.
Analytical results show that the topics of conference papers and journal papers are similar from year to year, and that conference papers clearly influence journal papers published over a three-year span. About 87.23% of the data points from papers published in 1991–2007 support this assumption. The research is intended to help researchers identify new trends in their fields and focus on pressing topics. This is particularly valuable for researchers who are new to a field or who wish to perform cross-domain studies.
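
The lead-lag relation between conference and journal topics can be sketched as a lagged Pearson correlation between yearly topic shares. The series below are toy values, not data from the study. The one-year lag and the handling of constant series are assumptions.

```python
import math

# Toy sketch: correlate a topic's share of conference papers in year t with its
# share of journal papers in year t + lag (Pearson correlation).

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant series; report no correlation
    return cov / (sx * sy)


def lagged_correlation(conference, journal, lag=1):
    """Both series are indexed by year; pair conference[t] with journal[t + lag]."""
    xs = conference[: len(conference) - lag]
    ys = journal[lag:]
    if len(xs) < 2:
        raise ValueError("need at least two overlapping years")
    return pearson(xs, ys)


# Toy yearly topic shares (fractions of papers), not data from the study.
conference = [0.10, 0.30, 0.20, 0.50, 0.40]
journal = [0.05, 0.10, 0.30, 0.20, 0.50]
print(round(lagged_correlation(conference, journal, lag=1), 3))  # 1.0
```

Swapping the arguments, as in `lagged_correlation(journal, conference)`, tests the reverse direction, in which journal topics lead conference topics.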
