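Information Extraction from Chinese Research Papers Based on Hidden Markov Models   Total citations: 1 (self-citations: 1, citations by others: 0)
With large numbers of research papers appearing on the Internet, accurately extracting header information and citation information from them has become very important. This paper proposes an algorithm based on hidden Markov models (HMMs) for extracting header and citation information from Chinese research papers, and analyzes how the model structure is learned and how its parameters are estimated. During extraction, formatting cues such as delimiters and specific identifiers are used to split the text into blocks, and the HMM is then used to extract the specified fields. Experimental results show that the algorithm achieves good precision and recall.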
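
As a rough illustration of how an HMM can assign field labels to a sequence of text blocks, the sketch below runs Viterbi decoding over a toy model. The abstract does not give the paper's states, features, or parameters, so the state names, observation labels, and all probabilities here are invented for illustration only.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state (field) sequence for the observed blocks."""
    # best[t][s] = (log-probability of the best path ending in s at step t, previous state)
    best = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
        for s in states
    }]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

# Hypothetical toy model: three header fields and three coarse block features.
STATES = ["title", "author", "affiliation"]
START_P = {"title": 0.8, "author": 0.1, "affiliation": 0.1}
TRANS_P = {
    "title": {"title": 0.2, "author": 0.7, "affiliation": 0.1},
    "author": {"title": 0.05, "author": 0.25, "affiliation": 0.7},
    "affiliation": {"title": 0.05, "author": 0.15, "affiliation": 0.8},
}
EMIT_P = {
    "title": {"long_text": 0.8, "short_names": 0.1, "has_org_word": 0.1},
    "author": {"long_text": 0.1, "short_names": 0.8, "has_org_word": 0.1},
    "affiliation": {"long_text": 0.1, "short_names": 0.1, "has_org_word": 0.8},
}

# One feature per text block, in document order.
blocks = ["long_text", "short_names", "has_org_word"]
print(viterbi(blocks, STATES, START_P, TRANS_P, EMIT_P))
```

Log-probabilities are used so that long block sequences do not underflow; this requires every probability in the toy tables to be nonzero.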

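To address researchers' and developers' need to search patents frequently during innovation work, and the limited intelligence of current patent search, this paper proposes an intelligent patent recommendation algorithm and develops corresponding software. The algorithm takes the user's search terms as input; its output includes not only the patents returned by the retrieval system but also a set of recommended patents. The algorithm first establishes associations between patents, then computes a degree of association between them, and ranks the recommended patents by that degree to form an ordered set of recommendations. Experiments show that the recommended patents are indeed related to the search terms.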

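Traditional automatic paper recommendation algorithms classify papers from a single view only. They lack feature fusion and multi-view semantic knowledge, make little use of contextual information and long-distance dependencies, and struggle to mine deep textual features, which limits the accuracy of academic paper recommendation. To address these problems, this paper proposes an automatic paper recommendation model based on multi-view fusion TextRCNN. The model fuses features from three views (paper title, keywords, and abstract) and is built from a convolutional neural network (CNN), a bidirectional long short-term memory network (BiLSTM), and an attention mechanism, so that papers from different disciplines can be classified and recommended automatically. Experimental results show that the model improves precision, recall, and F1, exceeding machine learning methods by 3.40%, 3.57%, and 3.49% on average, respectively, and also outperforms single-view models and existing classic deep learning methods. By making effective use of multi-view knowledge and contextual semantic information, the method improves recommendation accuracy, saves researchers the time and effort of searching for papers, and recommends papers that match their research needs; it has good academic value and potential for wider application.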

In this paper, we present a novel image representation method that captures the spatial relationships between objects in a picture. Our method is more powerful than all previous methods in terms of accuracy, flexibility, and the ability to discriminate between pictures. In addition, it provides different degrees of granularity for reasoning about directional relations in both 8- and 16-direction reference frames. For similarity retrieval, our system provides twelve types of similarity measures to support flexible matching between the query picture and the database pictures. Using a database of 3600 pictures, we demonstrate the effectiveness of our image retrieval system. Experimental results show that a precision of 97.8% can be achieved while maintaining a recall of 62.5%, and a recall of 97.9% can be achieved while maintaining a precision of 51.7%. On average, a precision of 86.1% and a recall of 81.2% can be achieved simultaneously if the threshold is set to 0.5 or 0.6. This performance is considered very good for an information retrieval system.

This paper compares bibliographic retrieval using current MeSH (Medical Subject Headings) to bibliographic retrieval using explicitly coded semantic relationships between index terms. In a previous study, ten lists of abstracts, each list containing 20–40 papers discussing a specific pair of terms, were analyzed to identify the specific relationship(s) between those terms discussed in each paper. In the present study, we analyze how well current MeSH coding, using topical subheadings and check tags, can selectively retrieve those papers discussing each semantic relationship.

A controlled experiment was conducted comparing information retrieval using a Galois lattice structure with two more conventional retrieval methods: navigating in a manually built hierarchical classification and Boolean querying with index terms. No significant performance difference was found between Boolean querying and the Galois lattice retrieval method for subject searching with the three measures used for the experiment: user searching time, recall and precision. However, hierarchical classification retrieval did show significantly lower recall compared to the other two methods. This experiment suggests that retrieval using a Galois lattice structure may be an attractive alternative since it combines a good performance for subject searching along with browsing potential.

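To address data sparsity in paper recommendation, researchers usually bring in auxiliary information about papers. However, most existing work focuses on the semantic relatedness of auxiliary information and does not consider that different kinds of auxiliary information differ in importance to a paper. In addition, in the network representation of papers, random-walk methods ignore the influence of paper attributes on citation relationships. To address these two problems, a recommendation method based on citation auxiliary information embedding (CERec) is proposed. First, several quality factors of each paper are extracted to form an influence value, which serves as the paper's weight in constructing an influence network. Next, paper influence is combined with citation information, and graph embedding is performed using the papers' multiple kinds of auxiliary information. Finally, recommendations are obtained from the cosine similarity of the paper embedding vectors. Offline experiments show that methods that incorporate auxiliary information outperform those that do not, and that CERec improves recall and NDCG by 5.054% and 5.246% on average, respectively, over currently popular vector-representation recommendation algorithms.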
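
The final step described above, ranking candidates by the cosine similarity of embedding vectors, can be sketched as follows. This is only a minimal illustration of that one step, not CERec itself: the paper identifiers and the three-dimensional vectors are made up, and the influence network and graph embedding stages are not modeled.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend(query_id, embeddings, top_k=2):
    """Return the top_k paper ids most similar to the query paper."""
    query = embeddings[query_id]
    scored = [
        (paper_id, cosine_similarity(query, vector))
        for paper_id, vector in embeddings.items()
        if paper_id != query_id  # never recommend the query paper itself
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [paper_id for paper_id, _ in scored[:top_k]]

# Hypothetical embeddings; real ones would come from the graph embedding step.
embeddings = {
    "paper_a": [1.0, 0.0, 1.0],
    "paper_b": [0.9, 0.1, 1.1],
    "paper_c": [0.0, 1.0, 0.0],
    "paper_d": [1.0, 1.0, 0.0],
}
print(recommend("paper_a", embeddings))
```

Cosine similarity ignores vector length, so two papers with embeddings pointing in the same direction score highly regardless of magnitude.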

This paper reports work intended to reveal the connection between the topics investigated by conference papers and by journal papers. Hundreds of papers on data mining and information retrieval were selected from well-known databases, and the analysis showed that the topics covered by conference papers in one year often lead to similar topics covered by journal papers in the subsequent year, and vice versa. The study combines existing algorithms to propose a new detection procedure that researchers can use to detect new trends and obtain academic intelligence from conferences and journals. The goal of this research is fourfold: First, the research investigates whether the themes of conference papers lead those of journal papers. Second, the research examines how new research themes can be identified from conference papers. Third, the research looks at a specific area, information retrieval and data mining, as an illustration. Fourth, the research studies any inconsistencies in the correlation between conference papers and journal papers. This study explores the connections between academic publications. The methodologies of information retrieval and data mining can be exploited to discover the relationships between published papers across all topics. By discovering the connections between conference papers and journal papers, researchers can improve the effectiveness of their research by identifying academic intelligence. This study discusses how conference papers and journal papers are related. The topics of conference papers are identified to determine whether they represent new trends discussed in journal papers. An automatic examination procedure based on information retrieval and data mining is also proposed to minimize the time and human resources required to predict further research developments. This study develops a new procedure and collects a dataset to verify these questions.
Analytical results demonstrate that the topics of conference papers and journal papers are similar each year. Conference papers certainly affect the journal papers published over three years. About 87.23% of data points from papers published in 1991–2007 support our assumption. The research is intended to help researchers identify new trends in their research fields and focus on the urgent topics. This is particularly valuable for researchers who are new to a field or who wish to perform cross-domain studies.

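Research on an Information Retrieval Quality Evaluation Model Based on Expectation and K-th Order Variance   Total citations: 1 (self-citations: 0, citations by others: 1)
Recall and precision are the two basic criteria for evaluating the retrieval quality of an information retrieval system. Many evaluation methods have long been built on these two criteria, but most of them process recall and precision only in simple ways: they reflect average retrieval quality, do not analyze retrieval stability, and lack a scientific, systematic evaluation framework. To address this, the paper borrows the ideas of expectation and variance from probability theory and rigorously defines, in mathematical terms, the concepts of recall expectation, precision expectation, K-th order recall variance, and K-th order precision variance. Based on these concepts, it gives criteria for evaluating information retrieval quality. Compared with other models, this model reflects system performance in terms of both average retrieval quality and retrieval stability, and therefore evaluates retrieval quality more completely and comprehensively.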
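
To show why a mean alone can hide instability, the sketch below compares two hypothetical systems with the same average recall but different spread across queries. It uses the ordinary mean and population variance from Python's standard library; the paper's own definitions of recall expectation and K-th order variance are not given in the abstract and are not reproduced here, and the per-query recall values are invented.

```python
from statistics import mean, pvariance

def summarize(per_query_scores):
    """Return (mean, population variance) of per-query scores."""
    return mean(per_query_scores), pvariance(per_query_scores)

# Hypothetical per-query recall for two systems with the same average.
system_a_recall = [0.7, 0.7, 0.7]  # stable across queries
system_b_recall = [0.5, 0.7, 0.9]  # same average, less stable

for name, scores in (("A", system_a_recall), ("B", system_b_recall)):
    expectation, variance = summarize(scores)
    print(f"system {name}: mean recall={expectation:.3f}, variance={variance:.4f}")
```

Both systems average 0.7 recall, but only the variance separates the stable system from the unstable one, which is the kind of distinction the abstract says average-only measures miss.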

Patents are a type of intellectual property with ownership and monopolistic rights. They are publicly accessible published documents, often with illustrations, registered by governments and international organizations. The registration allows people familiar with the domain to understand how to re-create the new and useful invention but restricts its manufacture unless the owner licenses the patent or enters into a legal agreement to sell ownership of it. Patents reward the costly research and development efforts of inventors while spreading new knowledge and accelerating innovation. This research uses natural language processing, deep learning techniques, and machine learning algorithms to extract the essential knowledge of patent documents within a given domain as a means to evaluate their worth and technical advantage. Manual patent abstraction is a time-consuming, labor-intensive, and subjective process that becomes ineffective in both cost and outcome as the size of the patent knowledge domain increases. This research develops an intelligent patent summarization methodology using machine learning approaches to allow extremely large patent domains to be summarized effectively and objectively, especially in cases where the cost and time requirements of manual summarization are infeasible. The system learns to automatically summarize patent documents in natural language for any given technical domain. The machine learning solution identifies key technical terminology (words, phrases, and sentences) in the context of the semantic relationships among training patents and their corresponding summaries as the core of the summarization system. To ensure the high performance of the proposed methodology, ROUGE metrics are used to evaluate the precision, recall, accuracy, and consistency of the knowledge generated by the summarization system.
The smart machinery technology domain, with its sub-domains of control intelligence, sensor intelligence, and intelligent decision-making, provides the case studies for training the patent summarization system. The cases use 1708 training pairs of patents and summaries, while testing uses 30 randomly selected patents. The case implementation and verification show that the summary reports achieve average precision and recall ratios of 90% and 84%, respectively.

《Information & Management》2002,39(7):559-570
Search performance can be greatly improved by using domain knowledge to assist users in developing a problem specification tailored to the information contained in the system. A methodology is presented for utilizing intelligent information retrieval techniques and domain-specific knowledge to improve user searching. For databases involving a relatively narrow domain, a “system thesaurus” combined with expert systems technology can be used to create an intelligent front end to assist the user in retrieving information with greater precision and recall. Evaluation of the prototype showed greatly improved search effectiveness and satisfaction over the traditional catalog system.

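Component Retrieval Based on Specification Matching   Total citations: 14 (self-citations: 0, citations by others: 14)
In component/architecture-based software development, source-level component assembly is a very important step. Retrieving components that satisfy assembly requirements with traditional component retrieval techniques, such as faceted retrieval and keyword retrieval, suffers from low recall and precision; retrieval based on component specifications is an effective way to overcome these shortcomings. Building on the Jade Bird Component Description Language (JBCDL), this paper studies in detail the basic principles and matching strategies of specification syntax matching, proposes the concepts of component interface matching degree and redundancy degree for evaluating retrieval results, and finally presents a normalized representation of component interfaces that improves the response speed of specification syntax matching. The results also apply to component specifications such as Ada, COM, and CORBA.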

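Automatic phrasal paraphrase extraction is an important research topic in natural language processing and has been widely used in tasks such as information retrieval, question answering, and document classification. Patent corpora, as carriers of human knowledge and technology, are rich in content, so extracting phrasal paraphrases automatically from Chinese-English parallel patent corpora can help improve natural language processing tasks related to technical topics. This paper uses phrasal paraphrase extraction based on statistical machine translation (SMT) to extract paraphrases from Chinese-English parallel patent corpora, and filters the results with a chunking-based technique. In addition, to handle extraction errors caused by alignment errors and translation ambiguity, the extracted paraphrases are re-ranked by distributional similarity. Experiments show that SMT-based extraction achieves accuracies of 43.20% for Chinese and 43.60% for English, rising to 75.50% and 52.40%, respectively, after chunking-based filtering. The re-ranking algorithm based on distributional similarity also effectively improves extraction quality.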

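Based on the characteristics of component retrieval and the idea of tree matching from pattern analysis, a component tree matching model is proposed. On this basis, a tree-matching component retrieval algorithm is implemented for XML-based faceted component descriptions. The algorithm can effectively improve component recall while maintaining precision. Experimental results demonstrate its feasibility and effectiveness.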

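In recent years, the rise of research social networks has to some extent changed how researchers communicate and collaborate, and these networks are popular with researchers. However, the surge in research output on these networks makes it hard for researchers to find the papers that truly interest them, so recommending such papers has become an important task. Given the particular nature of researchers' paper-reading data on research social networks, this paper treats paper recommendation as a one-class collaborative filtering problem. On the one hand, researchers' tag information is used to sample negative examples more precisely, and researcher activity is then taken into account to determine the number of negatives. On the other hand, probabilistic matrix factorization is performed on the researcher-paper rating matrix after negatives have been added; during factorization, the researcher tag association matrix and paper similarity information are incorporated as constraints to mitigate the adverse effect of data sparsity on the final result. Finally, experiments are conducted on the research social network 科研之友 (ScholarMate), using four metrics (precision, recall, mean average precision, and mean reciprocal rank) to assess the accuracy and ranking of the recommendations. The results show that the proposed method outperforms mainstream methods, improving precision by 4.19%, which confirms the effectiveness of treating paper recommendation as a one-class collaborative filtering problem and the value of social information as an aid to recommendation. The method also scales well within recommender systems and can recommend papers effectively to researchers on research social networks.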

Legal texts comprise many kinds of documents, such as contracts, patents and treaties. These texts usually include a huge quantity of unstructured information written in natural language. Thanks to automatic analysis and Information Retrieval (IR) techniques, it is possible to filter out information that is not relevant and, therefore, to reduce the number of documents that users need to browse to find the information they are looking for. In this paper we adapt the JIRS passage retrieval system to work with three kinds of legal texts (treaties, patents and contracts) and study the issues related to processing this kind of information. In particular, we study how a passage retrieval system might be linked to automated analysis based on logic and algebraic programming for the detection of conflicts in contracts. In our set-up, a contract is translated into formal clauses, which are analysed by means of a model checking tool; then, the passage retrieval system is used to extract conflicting sentences from the original contract text.

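Syntactic parsing of Tibetan interrogative sentences has broad application prospects in Tibetan question answering systems, search engines, and information extraction and retrieval. By analyzing how Tibetan interrogative sentences are formed, this paper classifies them, summarizes the structural features of each class, and then parses them with a probabilistic context-free grammar (PCFG). In tests, precision, recall, and F1 reach 97.6%, 97.3%, and 97.4% on the closed test set, and 96.0%, 95.4%, and 95.7% on the open test set.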
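
For readers who want to see how the three reported figures relate, F1 is the harmonic mean of precision and recall. The short sketch below recomputes F1 from the precision and recall values quoted in the abstract; the function name and the set labels are illustrative choices, not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall (in percent) as quoted for the two test sets.
reported = {
    "closed": (97.6, 97.3),
    "open": (96.0, 95.4),
}
for name, (precision, recall) in reported.items():
    print(f"{name} test set: F1 = {f1_score(precision, recall):.2f}%")
```

The recomputed values agree with the reported F1 figures of 97.4% and 95.7% to within rounding.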

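Searching for Chinese Personal Information on the Internet   Total citations: 5 (self-citations: 0, citations by others: 5)
This paper builds PersonIndexer, an experimental system for automatically generating an index of personal information on the Internet. Preliminary experiments on two CERNET sites show that PersonIndexer's average recall and precision are 97.8% and 61.9% for Chinese personal names, 100% and 64.5% for pinyin personal names, and 94.5% and 92.1% for Chinese organization names, while both recall and precision are 100% for e-mail addresses and for telephone and fax numbers. Since Internet information retrieval and natural language processing each place demands on the other, we believe that combining Chinese analysis techniques for large-scale real text with the Internet will be a new research focus in Chinese information processing over the next few years.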
