Similar Literature
Found 20 similar documents; search took 15 ms
1.
This paper gives a systematic discussion of the state-of-the-art theory and applications of fine-grained (massive) parallelism, which is steadily being adopted in computational mathematics. All known models of fine-grained computation (cellular automata, neural and cellular neural networks, statistical automata, etc.) are represented in terms of a single formalism, the so-called parallel substitution algorithm (PSA), which makes it possible, on the one hand, to highlight the common properties of the models and, on the other hand, to demonstrate the expressive power of the PSA. Theoretical and experimental results of studies on applying fine-grained algorithms to modeling the spatial dynamics of reaction–diffusion and molecular processes are presented. The promise of these algorithms is substantiated both for building special-purpose processors that implement them and for implementing them on multiprocessor systems.

2.
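In computational advertising, returning ads relevant to a user's query has long been a research focus. However, user queries are generally short, and ads are represented only by brief creatives and a few bid terms, so returning ads that match the user's query intent is very difficult. To address this problem, this paper proposes ad query expansion based on multi-feature fusion. The query is first submitted to a search engine, and the top-k returned web pages serve as the external resource from which expansion terms are drawn. Because conventional feature selection methods rely on a single kind of feature, lack semantic information, and are prone to topic drift, the paper computes the co-occurrence of candidate expansion terms and query terms in the returned pages and fuses it with the traditional TF feature and part-of-speech information to obtain expansion terms that are semantically related to the original query. Experiments on a real advertising corpus show that selecting ad expansion terms by multi-feature fusion effectively improves the relevance of the returned ads.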
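
The fusion of co-occurrence, TF, and part-of-speech features described in this abstract could be sketched roughly as follows. This is only an illustration, not the authors' method: the function name `score_expansion_terms`, the linear combination, the mixing weights, and the rule that nouns receive a bonus are all assumptions made for the example.

```python
from collections import Counter


def score_expansion_terms(query_terms, documents, pos_tags,
                          weights=(0.5, 0.3, 0.2)):
    """Rank candidate expansion terms drawn from top-k result pages.

    documents: list of token lists, one per retrieved page.
    pos_tags:  dict mapping a term to a coarse part-of-speech label.
    weights:   hypothetical mixing weights for (co-occurrence, TF, POS).
    """
    query = set(query_terms)
    tf = Counter()    # raw term frequency over all pages
    cooc = Counter()  # number of pages where the term appears with a query term
    for tokens in documents:
        tf.update(t for t in tokens if t not in query)
        unique = set(tokens)
        if unique & query:
            # Count each candidate once per page that also contains a query term.
            cooc.update(unique - query)
    if not tf:
        return []
    max_tf = max(tf.values())
    n_docs = len(documents)
    w_cooc, w_tf, w_pos = weights
    scores = {}
    for term, freq in tf.items():
        # Assumption for this sketch: nouns are preferred as expansion terms.
        pos_bonus = 1.0 if pos_tags.get(term) == "noun" else 0.0
        scores[term] = (w_cooc * cooc[term] / n_docs
                        + w_tf * freq / max_tf
                        + w_pos * pos_bonus)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))
```

A term that is frequent but never appears alongside a query term receives no co-occurrence credit, which is one simple way to damp topic drift.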

3.
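Shen Xi, Wang Yongcheng. Computer Simulation (《计算机仿真》), 2006, 23(2): 222-226
Using speech recognition to give a search engine a spoken query interface makes entering query concepts easier. However, because query concepts contain many proper nouns and names, recognition accuracy is often low, which hurts the precision of search results. This paper proposes a method for the news domain that uses news-domain knowledge to improve the recognition rate of query concepts: candidate results are determined by computing the phonetic similarity between the speech recognition output and the concepts in a news concept base, and the final query concept is determined by computing the news relevance between the candidates and auxiliary concepts. Experiments show that the method is effective at correcting query concepts for a news search engine.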

4.
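A Survey of Query Intent Classification Techniques for Web Search   Total citations: 8, self-citations: 1, citations by others: 7
Query classification has been a research hotspot in information retrieval in recent years and has attracted wide attention in many fields. This survey focuses on work that classifies queries by their intent, reviewing the background of query classification, its key techniques, the classification methods used, and the evaluation methods, and it identifies the problems and challenges facing query intent classification. It argues that the key open problems are the lack of an authoritative evaluation benchmark, performance that has not been fully tested on large-scale data sets, how to obtain query features accurately, and how to demonstrate that a classification scheme is complete and that its categories are independent.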

5.
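A Study of Session Segmentation in Web Search Engine Query Logs   Total citations: 4, self-citations: 1, citations by others: 3
A session in a search engine query log (hereafter, a session) is the continuous sequence of search actions that a particular user performs over a period of time to satisfy a single information need. Correct session segmentation is an important foundation for work such as user search behavior analysis, yet there has been no systematic study of sessions so far. To address the problems in related work, this paper gives a unified redefinition of the session concept and carries out an exploratory and comparative study, concluding that (1) statistical language models are unsuitable for session segmentation because of data sparsity, and (2) a decision tree that uses multiple attributes gives fairly good results, reaching an F-measure of 78.6% when evaluated at the session level.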

6.
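There are many specialized search engines on the Web, and ordinary users are often at a loss when choosing among them. This paper presents an automatic query routing system that directs a user's query to a suitable specialized search engine, resolving this difficulty.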

7.
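A Multimodal Web Query Refinement Method Based on Semi-Supervised Learning   Total citations: 1, self-citations: 0, citations by others: 1
Web search systems often refine queries through interaction with the user to improve search performance. Besides text, web pages contain a large amount of information in other modalities, such as images, audio, and video. Previous work on query refinement has rarely exploited multimodal information. This paper proposes M2S2QR, a multimodal Web query refinement method based on semi-supervised learning, which casts Web query refinement as a machine learning problem. First, a learner is trained for each modality on the pages the user has judged; next, pages the user has not judged are used to improve the learners; finally, the learners for the different modalities are combined. Experiments verify the effectiveness of the method.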

8.
Large web search engines have to answer thousands of queries per second with interactive response times. Due to the sizes of the data sets involved, often in the range of multiple terabytes, a single query may require the processing of hundreds of megabytes or more of index data. To keep up with this immense workload, large search engines employ clusters of hundreds or thousands of machines, and a number of techniques such as caching, index compression, and index and query pruning are used to improve scalability. In particular, two-level caching techniques cache results of repeated identical queries at the frontend, while index data for frequently used query terms are cached in each node at a lower level. We propose and evaluate a three-level caching scheme that adds an intermediate level of caching for additional performance gains. This intermediate level attempts to exploit frequently occurring pairs of terms by caching intersections or projections of the corresponding inverted lists. We propose and study several offline and online algorithms for the resulting weighted caching problem, which turns out to be surprisingly rich in structure. Our experimental evaluation based on a large web crawl and real search engine query log shows significant performance gains for the best schemes, both in isolation and in combination with the other caching levels. We also observe that a careful selection of cache admission and eviction policies is crucial for best overall performance. Work supported by NSF CAREER Award CCR-0093400 and the New York State Center for Advanced Technology in Telecommunications (CATT) at Polytechnic University.
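
The three cache levels this abstract describes (query results, pairwise list intersections, and inverted lists) can be illustrated with a toy lookup path. This is a minimal sketch under loud assumptions: it uses plain LRU eviction rather than the weighted caching policies the paper studies, it caches only full intersections (not projections), and the class names `LRUCache` and `ThreeLevelSearch` and the in-memory `index` dictionary are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache (a stand-in for a weighted policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry


class ThreeLevelSearch:
    def __init__(self, index, capacity=100):
        self.index = index                  # term -> sorted doc ids ("disk")
        self.results = LRUCache(capacity)   # level 1: whole-query results
        self.pairs = LRUCache(capacity)     # level 2: term-pair intersections
        self.lists = LRUCache(capacity)     # level 3: inverted lists
        self.pair_hits = 0

    def _postings(self, term):
        cached = self.lists.get(term)
        if cached is None:
            cached = self.index.get(term, [])
            self.lists.put(term, cached)
        return cached

    def _pair(self, a, b):
        cached = self.pairs.get((a, b))
        if cached is None:
            cached = sorted(set(self._postings(a)) & set(self._postings(b)))
            self.pairs.put((a, b), cached)
        else:
            self.pair_hits += 1
        return cached

    def search(self, query):
        terms = tuple(sorted(set(query.split())))
        cached = self.results.get(terms)
        if cached is not None:
            return cached
        if not terms:
            result = []
        elif len(terms) == 1:
            result = list(self._postings(terms[0]))
        else:
            # Start from the (possibly cached) intersection of the first pair.
            docs = set(self._pair(terms[0], terms[1]))
            for term in terms[2:]:
                docs &= set(self._postings(term))
            result = sorted(docs)
        self.results.put(terms, result)
        return result
```

In this sketch, two different queries that share a frequent term pair reuse the cached intersection even though neither hits the result cache, which is the gap the intermediate level is meant to fill.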

9.
Identifying ambiguous queries is crucial to research on personalized Web search and search result diversity. Intuitively, query logs contain valuable information on how many intentions users have when issuing a query. However, previous work showed that user clicks alone are misleading in judging whether a query is ambiguous. In this paper, we address the problem of learning a query ambiguity model by using search logs. First, we propose enriching a query by mining the documents clicked by users and the relevant follow-up queries in a session. Second, we use a text classifier to map the documents and the queries into predefined categories. Third, we propose extracting features from the processed data. Finally, we apply a state-of-the-art algorithm, Support Vector Machine (SVM), to learn a query ambiguity classifier. Experimental results verify that the sole use of click-based features or session-based features performs worse than the previous work based on top retrieved documents. When we combine the two sets of features, our proposed approach achieves the best effectiveness, specifically 86% in terms of accuracy. It significantly improves the click-based method by 5.6% and the session-based method by 4.6%.

10.
Search engines continue to struggle with the challenges presented by Web search: vague queries, impatient users and an enormous and rapidly expanding collection of unmoderated, heterogeneous documents all make for an extremely hostile search environment. In this paper we argue that conventional approaches to Web search -- those that adopt a traditional, document-centric, information retrieval perspective -- are limited by their refusal to consider the past search behaviour of users during future search sessions. In particular, we argue that in many circumstances the search behaviour of users is repetitive and regular; the same sort of queries tend to recur and the same type of results are often selected. We describe how this observation can lead to a novel approach to a more adaptive form of search, one that leverages past search behaviours as a means to re-rank future search results in a way that recognises the implicit preferences of communities of searchers. We describe and evaluate the I-SPY search engine, which implements this approach to collaborative, community-based search. We show that it offers potential improvements in search performance, especially in certain situations where communities of searchers share similar information needs and use similar queries to express these needs. We also show that I-SPY benefits from important advantages when it comes to user privacy. In short, we argue that I-SPY strikes a useful balance between search personalization and user privacy, by offering a unique form of anonymous personalization, and in doing so may very well provide privacy-conscious Web users with an acceptable approach to personalized search.

11.
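Effective routing of multi-keyword queries is a key problem in P2P Web search. This paper proposes a query processing method based on the benefit-to-cost ratio. Built on a DHT-based P2P overlay network, the method mines the correlation among keywords and the coverage and overlap among nodes. Overlap is detected using min-wise independent permutations, which avoids redundant routing to the same records. Experiments show that the method significantly reduces query time while improving recall and precision.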
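
Overlap detection with min-wise independent permutations, as mentioned in this abstract, is commonly approximated with MinHash signatures. The sketch below is a generic illustration, not the paper's implementation: seeded BLAKE2b hashes stand in for true min-wise independent permutations, and the function names and the choice of 64 hash functions are assumptions.

```python
import hashlib


def minhash_signature(items, num_hashes=64):
    """Return a MinHash signature for a non-empty collection of record ids.

    Each seeded hash function approximates one random permutation of the
    universe; the signature keeps the minimum hash value per function.
    """
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(),
                                digest_size=8).digest(),
                "big")
            for item in items))
    return signature


def estimated_overlap(sig_a, sig_b):
    """Estimate the Jaccard overlap of two sets from their signatures."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)
```

Two nodes can exchange fixed-size signatures instead of full record lists; a high estimated overlap suggests that routing the query to both nodes would mostly return the same records.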

12.
Conventional approaches to finding related search engine queries rely on the common terms shared by two queries to measure their relatedness. However, search engine queries are usually short and the term overlap between two queries is very small. Using query terms as a feature space cannot accurately estimate relatedness. Alternative feature spaces are needed to enrich the term-based search queries. In this paper, given a search query, we first extract the Web pages accessed by users from Japanese Web access logs, which store the users' individual and collective behavior. From these accessed Web pages we can usually get two kinds of feature spaces, i.e., content-sensitive (e.g., nouns) and content-ignorant (e.g., URLs), to enrich the expressions of search queries. Then, the relatedness between search queries can be estimated on their enriched expressions. Our experimental results show that the URL feature space produces much lower precision scores than the noun feature space, which, however, is not applicable to non-text pages, dynamic pages and so on. It is crucial to improve the quality of the URL (content-ignorant) feature space since it is generally available in all types of Web pages. We propose a novel content-ignorant feature space, called Web community, which is created from a Japanese Web page archive by exploiting link analysis. Experimental results show that the proposed Web community feature space generates much better results than the URL feature space.

13.
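For the problem of caching and prefetching search engine query results, this paper proposes a method based on user characteristics, in contrast to traditional methods based on query characteristics, to improve search engine performance; the effect is especially pronounced for a subset of users. An analysis of the query contributions of users of a well-known Chinese commercial search engine shows that user contributions follow a long-tail distribution. Based on this property, a query result prediction model is designed to drive prefetching and partitioned caching. Experiments on two months of large-scale real user query logs from that search engine show that, compared with typical query-characteristic-based methods, the method improves the hit rate by 3.03% to 4.17%, and by 20.52% to 28.2% for the 0.25% of users who contribute the most queries.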

14.
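Clustering Web Search Results Using Link Analysis   Total citations: 3, self-citations: 0, citations by others: 3
With the rapid growth of information on the Web, how to obtain high-quality Web information effectively has become a popular research topic in many fields, such as databases and information retrieval. In information retrieval, Web search engines are the most commonly used tools, but today's search engines are still far from satisfactory. This paper proposes a new method that uses link analysis to cluster Web search results. Unlike clustering algorithms in information retrieval that rely on keywords or terms shared between texts, the method applies citation and coupling analysis from bibliometrics, based on the common links that two Web pages share and match. It also extends the standard K-means clustering algorithm to make it better suited to handling noisy pages and applies it to clustering Web result pages. A preliminary experiment was carried out to verify its effectiveness, and the results show that clustering Web search results through link analysis achieves the expected effect.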

15.
As modern search engines are approaching the ability to deal with queries expressed in natural language, full support of natural language interfaces seems to be the next step in the development of future systems. The vision is that of users being able to tell a computer what they would like to find, using any number of sentences and as many details as requested. In this article we describe our effort to move towards this future using currently available technology. The Semantic Web framework was chosen as the best means to achieve this goal. We present our approach to building a complete Semantic Web Search Using Natural Language (SWSNL) system. We cover the complete process which includes preprocessing, semantic analysis, semantic interpretation, and executing a SPARQL query to retrieve the results. We perform an end-to-end evaluation on a domain dealing with accommodation options. The domain data come from an existing accommodation portal and we use a corpus of queries obtained by a Facebook campaign. In our paper we work with written texts in the Czech language. In addition to that, the Natural Language Understanding (NLU) module is evaluated on another domain (public transportation) and language (English). We expect that our findings will be valuable for the research community as they are strongly related to issues found in real-world scenarios. We struggled with inconsistencies in the actual Web data, with the performance of the Semantic Web engines on a decently sized knowledge base, and others.

16.
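Some digital signal processing programs have strong data dependences. When such programs are partitioned across a multi-core DSP, fine-grained parallelism must be exploited, which in turn requires a fast inter-core communication mechanism. This paper proposes a new fast inter-core communication mechanism for multi-core DSPs: the tagged shared register file (TSRF). The TSRF is shared by all DSP cores, and each register in the file is associated with a valid tag bit that provides synchronization for inter-core communication. A cycle-accurate simulator of a four-core DSP prototype integrating the TSRF mechanism was built. Through detailed simulation, the performance of the TSRF mechanism was tested using digital signal processing algorithms with strong data dependences, namely IIR filtering and ADPCM encoding and decoding. Compared with a single-core DSP, the TSRF mechanism achieves speedups of about 1.8, 1.2, and 1.9, respectively.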

17.
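Chen Haiyan. Computer Science (《计算机科学》), 2015, 42(1): 261-267
Computing the semantic similarity of words plays an important role in Web-related tasks such as web browsing and query suggestion. Traditional taxonomy-based methods cannot handle the new words that keep appearing. Because Web data hides a great deal of noise and redundancy, robustness and accuracy remain a challenge, so a search-engine-based method for computing lexical semantic similarity is proposed. Semantic snippets and the page counts of search results are used to remove noise and redundancy from the similarity computation. In addition, a method is proposed to integrate page counts, semantic snippets, and the number of displayed search results; it requires no prior knowledge or ontology. Experimental results show that the proposed method achieves a correlation coefficient of 0.851 on the Rubenstein-Goodenough test set, outperforming existing Web-based methods for computing lexical semantic similarity, and that it also works well in the query expansion task for search engines.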
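
As a rough illustration of how search-result page counts alone can yield a similarity score, the sketch below computes a Jaccard-style coefficient from three counts. It is not the method of this paper, which also uses snippets and the number of displayed results; the function name `web_jaccard`, the noise threshold `min_cooccurrence`, and the idea of passing counts in as plain integers (rather than querying any real search API) are assumptions.

```python
def web_jaccard(count_p, count_q, count_pq, min_cooccurrence=5):
    """Jaccard-style similarity from search-engine page counts.

    count_p:  pages returned for word P alone.
    count_q:  pages returned for word Q alone.
    count_pq: pages returned for the conjunctive query "P AND Q".
    min_cooccurrence: hypothetical threshold; very small joint counts are
        treated as noise and mapped to zero similarity.
    """
    if count_pq <= min_cooccurrence:
        return 0.0
    return count_pq / (count_p + count_q - count_pq)
```

For example, with 1000 pages for P, 500 for Q, and 250 for both, the score is 250 / (1000 + 500 - 250) = 0.2.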

18.
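Focused Web Entity Search Based on Link Path Prediction   Total citations: 1, self-citations: 1, citations by others: 0
Entity search is a promising research area because it can provide users with more detailed Web information. Collecting the web pages that contain entities of a specific domain quickly and completely is a key problem in entity search. To solve it, a Web site is modeled as a graph of interconnected states, and a link path prediction learning algorithm, LPC, is proposed. The model learns the optimal path from the home page of a large Web site to the target pages, guiding the crawler to quickly locate target pages that contain Web entities. LPC has two stages. First, a probabilistic undirected graphical model, the CRF, is used to learn a model of the link paths from the site's home page to the target pages; the CRF can fuse various features of hyperlinks and web pages, including state features and transition features. Second, reinforcement learning is combined with the trained CRF model to assign priority scores to the hyperlinks in the crawl frontier queue: a discounted-reward method from reinforcement learning computes the reward of each link using the CRF model learned in the path classification stage. Experiments on a large amount of real data from several domains show that LPC, the link path prediction crawling algorithm guided by the CRF model, clearly outperforms other focused crawling algorithms.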

19.
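The Deep Web contains a large amount of high-quality data, but this data can be obtained only by submitting queries to Web databases through Web query interfaces, so automatically extracting the schema of a Web query interface is key to Web database integration. The extraction of a Web query interface schema is treated as a lexical analysis process and is carried out by constructing an EGLM-FA (element grouping and label matching finite-state automaton). First, an HTML rendering engine parses the page that contains the Web query interface, and the DOM nodes in the interface's form, together with their coordinates, are used to build the corresponding NSSs (node space structures). All the NSSs are then assembled into an NSS list, which is fed to the EGLM-FA as input to extract the schema of the Web query interface.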

20.
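An ASP-Based Implementation of Database Queries on the WWW   Total citations: 3, self-citations: 2, citations by others: 3
Integrating IIS on Microsoft Windows NT Server 4.0, with ASP scripts invoking ADO operations to connect the database server to the Web server through an ODBC driver, is an effective way to publish a database on the Internet. This paper describes a programming technique that uses this approach to implement general-purpose database queries and provides the source code.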
