Similar literature
 20 similar documents found
1.
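At present, provinces, municipalities, and departments each maintain their own dedicated expert information systems, which separately manage expert information for their own regions and fields; the systems differ in form, and the expert information is distributed and heterogeneous. To address the difficulty of sharing expert information in practice and the low efficiency of keyword-based queries, a semantics-based expert information system is proposed. The 5W1H analysis method is used to summarize the concepts and relations of the domain ontology; a 5W1H-based conceptual model of the expert ontology is built and an expert domain ontology is generated, giving a unified model of expert semantic information. A four-layer system architecture is constructed, a method for defining semantic inference rules is studied, and a semantics-based rule inference algorithm and a SPARQL-based semantic query algorithm are designed, so that the system supports semantics-based query and inference. Experiments show that semantics-based expert information queries outperform keyword-based queries in both recall and precision.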

2.
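When the same character has more than one internal computer code, two or more glyphs look like the same character to a human reader while the computer treats them as different characters. This "human-machine disagreement" causes confusion in language information processing: incomplete retrieval results, inaccurate statistics, and inconsistent classification and sorting of characters and words. Using examples from Unicode, this paper discusses the problem of Chinese characters that share a shape but differ in code on current computers, including those that arise from (a) privately created characters being made public, (b) compatibility encodings, (c) separately established stroke and radical tables, and (d) separate encoding of half-width and full-width forms, and it explores ways to solve the problem.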
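
As an illustration of cases (b), (c), and (d) above, the following minimal sketch folds some same-shape, different-code characters onto one code point with Unicode NFKC normalization from Python's standard `unicodedata` module. It is not the method of the paper, whose proposed solutions are not detailed in the abstract; the function name `fold_same_shape` is hypothetical, and normalization does not help with case (a), because private-use code points have no standard decomposition.

```python
import unicodedata


def fold_same_shape(text: str) -> str:
    """Map compatibility variants to a single code point via NFKC.

    Hypothetical helper for illustration only. NFKC folds full-width
    forms, Kangxi radicals, and CJK compatibility ideographs, but it
    leaves private-use characters untouched.
    """
    return unicodedata.normalize("NFKC", text)


# Full-width Latin letters (U+FF21..U+FF23) fold to ASCII.
fullwidth_abc = "\uFF21\uFF22\uFF23"
# KANGXI RADICAL ONE (U+2F00) folds to the unified ideograph U+4E00.
kangxi_one = "\u2F00"
# CJK COMPATIBILITY IDEOGRAPH U+F900 folds to U+8C48.
compat_ideograph = "\uF900"

print(fold_same_shape(fullwidth_abc))
print(fold_same_shape(kangxi_one) == "\u4E00")
print(fold_same_shape(compat_ideograph) == "\u8C48")
```

Applying the same folding to both the indexed text and the query is one way to keep retrieval and counting consistent across such variants.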

3.
The goal of object retrieval is to rank a set of images by the similarity of their contents to those of a query image. However, it is difficult to measure image content similarity due to visual changes caused by varying viewpoint and environment. In this paper, we propose a simple, efficient method to more effectively measure content similarity from image measurements. Our method is based on the ranking information available from existing retrieval systems. We observe that images within the set which, when used as queries, yield similar ranking lists are likely to be relevant to each other and vice versa. In our method, ranking consistency is used as a verification method to efficiently refine an existing ranking list, in much the same fashion that spatial verification is employed. The efficiency of our method is achieved by a list-wise min-Hash scheme, which allows rapid calculation of an approximate similarity ranking. Experimental results demonstrate the effectiveness of the proposed framework and its applications.
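
The idea of comparing ranking lists with min-Hash can be sketched as follows. This is a plain set-based min-Hash that estimates the Jaccard overlap of two top-k lists; it ignores rank positions, so it is only an approximation of the idea and not the list-wise scheme of the paper. The function names and the choice of BLAKE2b as the hash family are assumptions made for illustration.

```python
import hashlib


def _hash(item: str, seed: int) -> int:
    """Seeded 64-bit hash built from BLAKE2b (illustrative choice)."""
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=64):
    """Min-Hash signature of a non-empty collection of image ids."""
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots, an estimate of Jaccard overlap."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Top-k lists returned when two different images are used as queries.
list_a = ["img3", "img7", "img9", "img12"]
list_b = ["img9", "img3", "img7", "img40"]
sig_a = minhash_signature(list_a)
sig_b = minhash_signature(list_b)
print(estimated_jaccard(sig_a, sig_b))  # estimate of the true overlap 3/5
```

Signatures are short and fixed in length, so many list pairs can be compared without touching the full lists, which is where the speed of such schemes comes from.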

4.
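The effectiveness of information retrieval depends largely on whether users can enter a suitable query to describe their information need. Many queries are short and ambiguous, and some even contain noise. Query recommendation helps users refine their queries and describe their information needs accurately. To obtain high-quality query recommendations, a random walk on a large-scale "query-link" bipartite graph is used to generate a candidate set. Snippet click information is then used to rerank the candidate list so that queries reflecting the user's intent are ranked higher. Finally, a learning-based algorithm filters out noise that may remain in the recommended queries. Experiments on real user behavior data show that the method achieves good results.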
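
The candidate-generation step can be illustrated with a two-step random walk (query to link, then link to query) on a click graph. This is a minimal sketch under assumptions: the abstract does not give the walk length or the transition weights, so click counts are used as edge weights here, and the function name `two_step_walk` and the toy data are hypothetical.

```python
from collections import defaultdict


def two_step_walk(clicks, start):
    """Rank other queries by the probability of reaching them from `start`
    in two steps: start query -> clicked link -> query.

    clicks maps (query, url) pairs to click counts (assumed edge weights).
    """
    query_to_urls = defaultdict(dict)
    url_to_queries = defaultdict(dict)
    for (query, url), count in clicks.items():
        query_to_urls[query][url] = count
        url_to_queries[url][query] = count

    scores = defaultdict(float)
    start_total = sum(query_to_urls[start].values())
    for url, count in query_to_urls[start].items():
        p_query_to_url = count / start_total
        url_total = sum(url_to_queries[url].values())
        for other, other_count in url_to_queries[url].items():
            scores[other] += p_query_to_url * other_count / url_total

    scores.pop(start, None)  # do not recommend the original query
    return sorted(scores.items(), key=lambda kv: -kv[1])


toy_clicks = {
    ("jaguar", "a.example"): 2,
    ("jaguar car", "a.example"): 2,
    ("jaguar", "b.example"): 2,
    ("jaguar cat", "b.example"): 6,
}
print(two_step_walk(toy_clicks, "jaguar"))
```

The reranking by snippet clicks and the learned noise filter described in the abstract would operate on the candidate list this step produces.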

5.
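Hashing is regarded as the most promising approach to similarity search and is suitable for searching large-scale multimedia data. To address low retrieval efficiency on large-scale image collections, an inverted index tree structure based on segmented hash codes is proposed: each hash code is split into segments, an inverted index tree is maintained for each segment, and an efficient Bloom filter is combined with these trees to build the hash index. To further improve retrieval accuracy, an accurate rank fusion algorithm is designed: a weighted undirected graph is built for the ranking results of each of several hashing algorithms, and the fusion of the resulting ranking lists, based on the idea of PageRank, is described in detail. Experimental results show that the inverted index tree structure based on segmented hash codes greatly increases retrieval speed. In addition, compared with conventional ranking by a single hashing algorithm, fusing the ranking lists of multiple hashing algorithms gives a markedly higher retrieval accuracy.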
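
The segmentation idea can be sketched with one exact-match table per code segment. This is a simplified illustration, not the structure of the paper: plain dictionaries stand in for the inverted index trees, the Bloom filter is omitted, and the class name `SegmentedHashIndex` and its parameters are hypothetical.

```python
from collections import defaultdict


class SegmentedHashIndex:
    """Index binary hash codes by segment; an item becomes a candidate
    when at least one of its segments equals the query's segment."""

    def __init__(self, code_bits=16, num_segments=4):
        if code_bits % num_segments != 0:
            raise ValueError("code_bits must be divisible by num_segments")
        self.seg_bits = code_bits // num_segments
        self.num_segments = num_segments
        self.tables = [defaultdict(set) for _ in range(num_segments)]
        self.codes = {}

    def _segments(self, code):
        mask = (1 << self.seg_bits) - 1
        return [(code >> (i * self.seg_bits)) & mask
                for i in range(self.num_segments)]

    def add(self, item_id, code):
        self.codes[item_id] = code
        for table, segment in zip(self.tables, self._segments(code)):
            table[segment].add(item_id)

    def query(self, code, top_k=5):
        candidates = set()
        for table, segment in zip(self.tables, self._segments(code)):
            candidates |= table.get(segment, set())
        # Rank candidates by Hamming distance to the query code.
        ranked = sorted(
            candidates,
            key=lambda i: (bin(self.codes[i] ^ code).count("1"), i),
        )
        return ranked[:top_k]


index = SegmentedHashIndex()
index.add("a", 0b1010101010101010)
index.add("b", 0b1010101010101011)  # differs from "a" in one bit
index.add("c", 0b0101010101010101)  # differs from "a" in every segment
print(index.query(0b1010101010101010))
```

Only items that share a segment with the query are compared in full, so most of the collection is never scanned.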

6.
This paper describes a method of simplifying inductively generated discrimination trees using a measure of tree quality based on the principle of information economy, which takes into account both the size of the tree and the size of the outcome data after (notional) encoding by that tree. Results of testing this method on a selection of data sets show that it has some practical advantages over previously used techniques for tree-pruning. Some of the theoretical implications of the present method are also discussed.

7.
Due to their storage efficiency and fast query speed, cross-media hashing methods have attracted much attention for retrieving semantically similar data over heterogeneous datasets. Supervised hashing methods, which utilize label information to improve the quality of hashing functions, achieve promising performance. However, existing supervised methods generally focus on coarse semantic information between samples (e.g. similar or dissimilar) and ignore fine semantic information between samples, which may degrade the quality of the hashing functions. Accordingly, in this paper, we propose a supervised hashing method for cross-media retrieval which utilizes coarse-to-fine semantic similarity to learn a shared space. The inter-category and intra-category semantic similarities are effectively preserved in the shared space. An iterative descent scheme is then proposed to obtain an optimal relaxed solution, and hashing codes are generated by quantizing the relaxed solution. Finally, to further improve the discrimination of the hashing codes, an orthogonal rotation matrix is learned by minimizing the quantization loss while preserving the optimality of the relaxed solution. Extensive experiments on the widely used Wiki and NUS-WIDE datasets demonstrate that the proposed method outperforms existing methods.

8.
Traditional Chinese text retrieval systems return a ranked list of documents in response to a user's request. While a ranked list of documents may be an appropriate response for the user, frequently it is not. Usually it would be better for the system to provide the answer itself instead of requiring the user to search for the answer in a set of documents. Since Chinese text retrieval has been developed only recently, and due to various specific characteristics of the Chinese language, the approaches to its retrieval are quite different from those proposed for Western languages. Thus, an architecture that augments existing search engines is developed to support Chinese natural language question answering. In this paper a new approach to building a Chinese question-answering system is described, resulting in a general-purpose, fully automated Chinese question-answering system available on the web. In the approach, we attempt to represent Chinese text by its characteristics, convert the Chinese text into ERE (E: entity, R: relation) relation data lists, and then answer the question through the ERE relation model. The system performs quite well given the simplicity of the techniques being utilized. Experimental results show that question-answering accuracy can be greatly improved by analyzing more matching ERE relation data lists. Simple ERE relation data extraction techniques work well in our system, making it efficient to use with many backend retrieval engines.

9.
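This paper uses the Union Entropy (UE) algorithm to obtain a preliminary set of Mongolian stop words, then removes Mongolian entity nouns and homographs from that set, and finally determines the Mongolian stop word list by comparing the parts of speech of English stop words and Mongolian stop words. Comparative document retrieval experiments are then carried out with the Mongolian stop word list and an English stop word list. The results show that Mongolian document retrieval using the stop word list determined by the proposed method achieves higher accuracy than retrieval using English stop words translated into Mongolian.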

10.
We propose a new image retrieval system using partitioned iterated function system (PIFS) codes. In PIFS encoding, a compression code contains mapping information between similar regions in the same image. This mapping information can be treated as vectors, and representative vectors can be generated using them. Representative vectors describe the features of the image. Hence, the similarity between images is directly calculable from representative vectors. This similarity is applicable to image retrieval. In this article, we explain this scheme and demonstrate its efficiency experimentally. This work was presented, in part, at the 8th International Symposium on Artificial Life and Robotics, Oita, Japan, January 24–26, 2003.

11.
To facilitate access to information, companies usually try to anticipate and answer customers' most typical questions by creating Frequently Asked Questions (FAQ) lists. In this scenario, FAQ retrieval is the area of study concerned with recovering the most relevant question/answer pairs contained in FAQ compilations. Despite the amount of effort that has been devoted to investigating FAQ retrieval methods, how to create and maintain high-quality FAQs has received less attention. In this article, we propose an entire framework to use, create, and maintain intelligent FAQs. Usage mining techniques have been developed to take advantage of usage information in order to provide FAQ managers with meaningful information to improve their FAQs. Usage mining techniques include weakness detection and knowledge gap discovery. In this way, the management of the FAQ is no longer directed only by expert knowledge but also by users' requirements.

12.
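A large amount of relational information exists in all kinds of Web lists, but it is hard to find with current search engines. This paper proposes a method based on semantics and data features for identifying and extracting relational information from Web lists. We first build a model that describes the desired relational information, then look for lists on the Web and estimate whether they contain that information; when the estimate is high enough, the desired relational information is extracted from the list.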

13.
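张思思, 刘宇, 赵志滨. 《计算机科学》, 2015, 42(12): 292-296, 311
Fractal codes describe the cross-scale self-similar redundancy within an image. Image features are recorded in fractal codes and used for image similarity judgment and image retrieval. Based on adaptive quadtree partitioning, a fast fractal image coding method is proposed. The method extracts fractal codes quickly by testing the similarity of fixed blocks within a neighborhood, which reduces the number of partitioning levels, shortens encoding time, and preserves decoding quality. A new distance formula for quickly identifying similar blocks between images is also proposed, which improves the accuracy of image similarity judgment. Experimental results show that, compared with gray-level histogram discrimination, the algorithm substantially improves the recall and precision of image retrieval. Compared with fractal retrieval algorithms in the literature, it shortens encoding time and reduces the number of partition blocks, thereby improving retrieval efficiency.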

14.
In order to make a recommendation, a recommender system typically first predicts a user's ratings for items and then recommends to the user a list of items with high predicted ratings. The quality of predictions is measured by accuracy, that is, how close the predicted ratings are to the actual ratings. The quality of recommendation lists, on the other hand, is evaluated from more than one perspective. Since accuracy of predicted ratings is not enough for customer satisfaction, metrics such as novelty, serendipity, and diversity are also used to measure the quality of recommendation lists. Aggregate diversity is one of these metrics; it measures the diversity of items across the recommendation lists of all users. Increasing aggregate diversity is important because it leads to a more even distribution of items in the recommendation lists, which mitigates the long-tail problem. In this study, we propose two novel methods to increase the aggregate diversity of a recommender system. The first method is a reranking approach which takes a user's ranked list of recommendations and reranks it to increase aggregate diversity. While the reranking approach is applied after model generation as a wrapper, the second method is applied in the model generation phase, which has the advantage of being more efficient in generating recommendation lists. We compare our methods with well-known methods in the field and show the superiority of our methods using real-world datasets.
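
To make the metric concrete, the sketch below computes aggregate diversity under one common formulation: the number of distinct items that appear in at least one user's recommendation list. The abstract does not state the exact definition used by the authors, so this formulation, the function name `aggregate_diversity`, and the toy data are assumptions.

```python
def aggregate_diversity(rec_lists):
    """Count distinct items recommended to at least one user.

    rec_lists maps a user id to that user's recommendation list.
    """
    return len({item for items in rec_lists.values() for item in items})


# Every user receives the same popular items: low aggregate diversity.
popular_only = {"u1": ["a", "b"], "u2": ["a", "b"], "u3": ["a", "b"]}
# After a hypothetical reranking step, less popular items appear too.
reranked = {"u1": ["a", "c"], "u2": ["b", "d"], "u3": ["a", "e"]}

print(aggregate_diversity(popular_only))
print(aggregate_diversity(reranked))
```

A reranking method of the kind described above would aim to raise this count while giving up as little rating accuracy as possible.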

15.
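袁涛, 曲强, 姜青山. 《集成技术》, 2024, 13(3): 4-24
In the era of massive data, DNA is a promising new medium for information storage. Compared with traditional physical storage media, it has inherent advantages such as low energy consumption, high storage density, and long storage life. As DNA storage technology develops rapidly, ensuring information security under the new technology is essential. To this end, this paper combines research on encryption with research on DNA coding and proposes a DNA encryption coding method based on chaotic systems and fountain codes. Using the encryption principle of chaotic systems, encryption is performed during DNA fountain code encoding, which secures the encoded information while retaining the properties of DNA fountain codes. The method applies to data of any type and supports DNA coding with high information density under arbitrary constraints. Simulation experiments further show that the method effectively resists a variety of cryptographic attacks and has some capacity to correct data errors introduced during DNA storage.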

16.
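Traditional recommendation algorithms mostly aim to optimize the accuracy of the recommendation list and neglect another important metric, diversity. A new method for improving the diversity of recommendation lists is proposed. The method turns list generation into N rounds of probabilistic selection, each consisting of two steps: type selection and item selection. In type selection, item type information is introduced; a probability matrix is computed from the user's preferences for different item types, and a type is chosen according to that matrix. In item selection, the final score of each item is recomputed from three factors (the item's predicted rating, its historical popularity, and its recommendation popularity), and the item with the highest score is recommended to the user. A threshold TR adjusts the trade-off between diversity and accuracy. Finally, comparative experiments demonstrate the effectiveness of the method.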

17.
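Information retrieval is an important function that a dataspace must provide. This paper presents an indexing scheme for the information retrieval subsystem of a dataspace. Using the relationships among data in the dataspace, related data are extracted and grouped into basic information units, and an extended inverted index based on these units is built over multi-source, heterogeneous data. Experimental results show that, with the index based on basic information units, the system returns query results whose semantic information is relatively complete.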

18.
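As applied basic research in the field of information processing, expert retrieval has important application value in settings such as scientific research and enterprise management, and in recent years it has received wide attention and sustained study. This survey summarizes the research goals, content, and methods of expert retrieval and points out open problems, so as to provide a reference for other research. Specifically, it analyzes and discusses expert retrieval models, the application of latent topic models to expert retrieval, and expert retrieval test collections, and it groups the research methods into expert modeling, link analysis, query expansion, and expert evidence identification. Finally, future trends in expert retrieval are discussed.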

19.
In this paper, we present a query-driven indexing/retrieval strategy for efficient full text retrieval from large document collections distributed within a structured P2P network. Our indexing strategy is based on two important properties: (1) the generated distributed index stores posting lists for carefully chosen indexing term combinations that are frequently present in user queries, and (2) the posting lists containing too many document references are truncated to a bounded number of their top-ranked elements. These two properties guarantee acceptable latency and bandwidth requirements, essentially because the number of indexing term combinations remains scalable and the posting lists transmitted during retrieval never exceed a constant size. A novel index update mechanism efficiently handles the addition of new documents to the document collection. Thus, the generated distributed index corresponds to a constantly evolving query-driven indexing structure that efficiently follows the current information needs of the users and changes in the document collection. We show that the size of the index and the generated indexing/retrieval traffic remain manageable even for Web-size document collections, at the price of a marginal loss in precision for rare queries. Our theoretical analysis and experimental results provide convincing evidence of the feasibility of the query-driven indexing strategy for large-scale P2P text retrieval.
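
Property (2) above, bounded posting lists, can be sketched in a few lines. This is a single-machine illustration only: the P2P distribution, the selection of term combinations from query logs, and the update mechanism are not modeled, and the names `truncate_posting_list`, `build_bounded_index`, and the toy scores are hypothetical.

```python
import heapq


def truncate_posting_list(postings, max_len):
    """Keep only the max_len highest-scoring (doc_id, score) entries,
    returned in descending score order."""
    return heapq.nlargest(max_len, postings, key=lambda entry: entry[1])


def build_bounded_index(postings_by_key, max_len):
    """Build an index keyed by term combinations with bounded lists."""
    return {
        key: truncate_posting_list(postings, max_len)
        for key, postings in postings_by_key.items()
    }


raw_postings = {
    frozenset({"p2p", "retrieval"}): [("d1", 0.2), ("d2", 0.9), ("d3", 0.5)],
    frozenset({"index"}): [("d4", 0.7)],
}
bounded = build_bounded_index(raw_postings, max_len=2)
print(bounded[frozenset({"p2p", "retrieval"})])
```

Because no stored list exceeds `max_len` entries, the data sent per lookup is bounded by a constant, which is the bandwidth argument made in the abstract.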

20.
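Efficient retrieval is one of the core services of a digital library, and ranking is the core problem of efficient information retrieval. Given a series of book lists, a ranking model is used to generate a ranked list of target books. When learning-to-rank algorithms are applied to information retrieval, the usual approach is to optimize the ranking model by minimizing a pairwise loss function. However, existing results show that minimizing the pairwise loss does not necessarily yield the best ranking performance attainable by listwise algorithms, and combining online learning to rank with listwise algorithms is very difficult. A listwise online learning-to-rank algorithm is proposed that aims to realize online learning to rank while retaining the performance advantage of listwise algorithms, thereby reducing retrieval complexity. First, the problem of combining online learning to rank with the listwise approach is solved; next, the ranking model is optimized by minimizing a loss function defined on the predicted list and the ground-truth list; finally, an adaptive learning rate based on the online-listwise algorithm is proposed. Experimental results show that the proposed algorithm achieves good retrieval performance and retrieval speed.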
