期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Exploring large-scale small file storage for search engines

Weizhe Zhang Gangzhao Lu Hui He Qizhen Zhang Chuanliang Yu 《The Journal of supercomputing》2016,72(8):2911-2923

相似文献

2.

A graph-based cache for large-scale similarity search engines

Veronica Gil-Costa Mauricio Marin Carolina Bonacic Roberto Solar 《The Journal of supercomputing》2018,74(5):2006-2034

Large-scale similarity search engines are complex systems devised to process unstructured data like images and videos. These systems are deployed on clusters of distributed processors communicated through high-speed networks. To process a new query, a distance function is evaluated between the query and the objects stored in the database. This process relays on a metric space index distributed among the processors. In this paper, we propose a cache-based strategy devised to reduce the number of computations required to retrieve the top-k object results for user queries by using pre-computed information. Our proposal executes an approximate similarity search algorithm, which takes advantage of the links between objects stored in the cache memory. Those links form a graph of similarity among pre-computed queries. Compared to the previous methods in the literature, the proposed approach reduces the number of distance evaluations up to 60%. 相似文献

3.

Using manual and automated annotations to search images by semantic similarity

Jo?o Magalh?es Stefan Rüger 《Multimedia Tools and Applications》2012,56(1):109-129

Finding semantically similar images is a problem that relies on image annotations manually assigned by amateurs or professionals, or automatically computed by some algorithm using low-level image features. These image annotations create a keyword space where a dissimilarity function quantifies the semantic relationship among images. In this setting, the objective of this paper is two-fold. First, we compare amateur to professional user annotations and propose a model of manual annotation errors, more specifically, an asymmetric binary model. Second, we examine different aspects of search by semantic similarity. More specifically, we study the accuracy of manual annotations versus automatic annotations, the influence of manual annotations with different accuracies as a result of incorrect annotations, and revisit the influence of the keyword space dimensionality. To assess these aspects we conducted experiments on a professional image dataset (Corel) and two amateur image datasets (one with 25,000 Flickr images and a second with 269,648 Flickr images) with a large number of keywords, with different similarity functions and with both manual and automatic annotation methods. We find that Amateur-level manual annotations offers better performance for top ranked results in all datasets (MP@20). However, for full rank measures (MAP) in the real datasets (Flickr) retrieval by semantic similarity with automatic annotations is similar or better than amateur-level manual annotations. 相似文献

4.

融合《知网》和搜索引擎的词汇语义相似度计算

张硕望欧阳纯萍阳小华刘永彬刘志明《计算机应用》2017,37(4):1056-1060

针对当前《知网》的词语语义描述与人们对词汇的主观认知之间存在诸多不匹配的问题,在充分利用丰富的网络知识的背景下,提出了一种融合《知网》和搜索引擎的词汇语义相似度计算方法。首先,考虑了词语与词语义原之间的包含关系,利用改进的概念相似度计算方法得到初步的词语语义相似度结果;然后,利用基于搜索引擎的相关性双重检测算法和点互信息法得出进一步的语义相似度结果;最后,设计了拟合函数并利用批量梯度下降法学习权值参数,融合前两步的相似度计算结果。实验结果表明,与单纯的基于《知网》和基于搜索引擎的改进方法相比,融合方法的斯皮尔曼系数和皮尔逊系数均提升了5%,同时提升了具体词语义描述与人们对词汇的主观认知之间的匹配度,验证了将网络知识背景融入到概念相似度计算方法中能有效提高中文词汇语义相似度的计算性能。相似文献

5.

Creating explorable extended reality environments with semantic annotations

Flotyński Jakub 《Multimedia Tools and Applications》2021,80(5):6959-6989

Multimedia Tools and Applications - The main element of extended reality (XR) environments is behavior-rich 3D content consisting of objects that act and interact with one another as well as with... 相似文献

6.

Mining temporal explicit and implicit semantic relations between entities using web search engines

《Future Generation Computer Systems》2014

In this paper, we study the problem of mining temporal semantic relations between entities. The goal of the studied problem is to mine and annotate a semantic relation with temporal, concise, and structured information, which can release the explicit, implicit, and diversity semantic relations between entities. The temporal semantic annotations can help users to learn and understand the unfamiliar or new emerged semantic relations between entities. The proposed temporal semantic annotation structure integrates the features from IEEE and Renlifang. We propose a general method to generate temporal semantic annotation of a semantic relation between entities by constructing its connection entities, lexical syntactic patterns, context sentences, context graph, and context communities. Empirical experiments on two different datasets including a LinkedIn dataset and movie star dataset show that the proposed method is effective and accurate. Different from the manually generated annotation repository such as Wikipedia and LinkedIn, the proposed method can automatically mine the semantic relation between entities and does not need any prior knowledge such as ontology or the hierarchical knowledge base. The proposed method can be used on some applications, which proves the effectiveness of the proposed temporal semantic relations on many web mining tasks. 相似文献

7.

Paid search [search engines]

Jansen B.J. 《Computer》2006,39(7):88-90

With paid search, the content provider, search engine, and user have mutually supporting goals. With paid or sponsored search, content providers pay Web search engines to display sponsored links in response to user queries alongside the algorithmic links, also known as organic or nonsponsored links. 相似文献

8.

Comparing Internet search engines

Kingoff A. 《Computer》1997,30(4):117-118

Search engines are sophisticated utilities designed expressly to find information on the global Internet. An expensive combination of high-speed computer networks and specialized software, they are usually created by large corporations and occasionally by universities. They are freely available to anyone with Internet access, and there are no search restrictions. With more than 150 search engines available, choosing the right one (or ones) is important. As with most products, no single engine is best for all searches and all users all the time. After comparing 50 of the most popular and powerful engines, I narrowed the field down to the four I found most useful: Alta Vista, Deja News, Excite and Yahoo 相似文献

9.

Theory of search engines 总被引：4，自引：0，他引：4

K. K. Nambiar 《Computers & Mathematics with Applications》2001,42(12):1523-1526

Four different stochastic matrices, useful for ranking the pages of the web are defined. The theory is illustrated with examples. 相似文献

10.

Tailored semantic annotation for semantic search

《Journal of Web Semantics》2015

This paper presents a novel method for semantic annotation and search of a target corpus using several knowledge resources (KRs). This method relies on a formal statistical framework in which KR concepts and corpus documents are homogeneously represented using statistical language models. Under this framework, we can perform all the necessary operations for an efficient and effective semantic annotation of the corpus. Firstly, we propose a coarse tailoring of the KRs w.r.t the target corpus with the main goal of reducing the ambiguity of the annotations and their computational overhead. Then, we propose the generation of concept profiles, which allow measuring the semantic overlap of the KRs as well as performing a finer tailoring of them. Finally, we propose how to semantically represent documents and queries in terms of the KRs concepts and the statistical framework to perform semantic search. Experiments have been carried out with a corpus about web resources which includes several Life Sciences catalogs and Wikipedia pages related to web resources in general (e.g., databases, tools, services, etc.). Results demonstrate that the proposed method is more effective and efficient than state-of-the-art methods relying on either context-free annotation or keyword-based search. 相似文献

11.

User reactions to search engines logos: investigating brand knowledge of web search engines

Bernard J. Jansen Lu Zhang Anna S. Mattila 《Electronic Commerce Research》2012,12(4):429-454

In this work, we investigate consumer reaction to web search engine logos. Our research is motivated by a small number of search engines dominating a market in which there are little switching costs. The major research goal is to investigate the effect that brand logos have on search engine brand knowledge, which includes brand image and brand awareness. To investigate this goal, we employ a survey of 207 participants and use a mixed method approach of sentiment analysis and mutual information statistic to investigate our research questions. Our findings reveal that some search engines have logos that do not communicate a clear meaning, resulting in a confused brand message. Brand image varies among the top search engines, with consumers possessing generally extremely positive or negative brand opinions. Google elicited a string of positive comments from the participants, to the point of several uses of the term ‘love.’ This is in line with the ultimate brand equity that Google has achieved (i.e., the generic term for web search). Most of the other search engines, including Microsoft, had primarily negative terms associated with them, although AOL, Ask, and Yahoo! had a mix of both positive and negative comments. Implications are that the brand logo may be an important interplay component with the technology for both established search engines and those entering the market. 相似文献

12.

Neural memories and search engines

Eduardo Mizraji 《国际通用系统杂志》2013,42(6):715-732

In this article, we show the existence of a formal convergence between the matrix models of biological memories and the vector space models designed to extract information from large collections of documents. We first show that, formally, the term-by-document matrix (a mathematical representation of a set of codified documents) can be interpreted as an associative memory. In this framework, the dimensionality reduction of the term-by-document matrices produced by the latent semantic analysis (LSA) has a common factor with the matrix biological memories. This factor consists in the generation of a statistical ‘conceptualisation’ of data using little dispersed weighted averages. Then, we present a class of matrix memory that built up thematic blocks using multiplicative contexts. The thematic memories define modular networks that can be acceded using contexts as passwords. This mathematical structure emphasises the contacts between LSA and matrix memory models and invites to interpret LSA, and similar procedures, as a reverse engineering applied on context-deprived cognitive products, or on biological objects (e.g. genomes) selected during large evolutionary processes. 相似文献

13.

Analyzing the emotional outcomes of the online search behavior with search engines

Carlos Flavián-Blanco Raquel Gurrea-Sarasa Carlos Orús-Sanclemente 《Computers in human behavior》2011,27(1):540-551

The affective component has been acknowledged as critical to understand information search behavior and user-computer interactions. There is a lack of studies that analyze the emotions that the user feels when searching for information about products with search engines. The present study analyzes the emotional outcomes of the online search process, taking into account the user’s (a) perceptions of success and effort exerted on the search process, (b) initial affective state, and (c) emotions felt during the search process. In addition, we identify profiles of online searchers based on the emotional outcomes of the search process, which allow us to differentiate the emotional processes and behavioral patterns that lead to such emotions. The results of the study stress the importance of the affective component of the online search behavior, given that these emotional outcomes are likely to influence all the subsequent actions that users perform on the Web. 相似文献

14.

Improving the freshness of the search engines by a probabilistic approach based incremental crawler

G.?Pavai Email author T.?V.?Geetha 《Information Systems Frontiers》2017,19(5):1013-1028

Web is flooded with data. While the crawler is responsible for accessing these web pages and giving it to the indexer for making them available to the users of search engine, the rate at which these web pages change has created the necessity for the crawler to employ refresh strategies to give updated/modified content to the search engine users. Furthermore, Deep web is that part of the web that has alarmingly abundant amounts of quality data (when compared to normal/surface web) but not technically accessible to a search engine’s crawler. The existing deep web crawl methods helps to access the deep web data from the result pages that are generated by filling forms with a set of queries and accessing the web databases through them. However, these methods suffer from not being able to maintain the freshness of the local databases. Both the surface web and the deep web needs an incremental crawl associated with the normal crawl architecture to overcome this problem. Crawling the deep web requires the selection of an appropriate set of queries so that they can cover almost all the records in the data source and in addition the overlapping of records should be low so that network utilization is reduced. An incremental crawl adds to an increase in the network utilization with every increment. Therefore, a reduced query set as described earlier should be used in order to minimize the network utilization. Our contributions in this work are the design of a probabilistic approach based incremental crawler to handle the dynamic changes of the surface web pages, adapting the above mentioned method with a modification to handle the dynamic changes in the deep web databases, a new evaluation measure called the ‘Crawl-hit rate’ to evaluate the efficiency of the incremental crawler in terms of the number of times the crawl is actually necessary in the predicted time and a semantic weighted set covering algorithm for reducing the queries so that the network cost is reduced for every increment of the crawl without any compromise in the number of records retrieved. The evaluation of incremental crawler shows a good improvement in the freshness of the databases and a good Crawl-hit rate (83 % for web pages and 81 % for deep web databases) with a lesser over head when compared to the baseline. 相似文献

15.

On-chip real-time feature extraction using semantic annotations for object recognition

Ying-Hao Yu Tsu-Tian Lee Pei-Yin Chen Ngaiming Kwok 《Journal of Real-Time Image Processing》2018,15(2):249-264

Describing image features in a concise and perceivable manner is essential to focus on candidate solutions for classification purpose. In addition to image recognition with geometric modeling and frequency domain transformation, this paper presents a novel 2D on-chip feature extraction named semantics-based vague image representation (SVIR) to reduce the semantic gap of content-based image retrieval. The development of SVIR aims at successively deconstructing object silhouette into intelligible features by pixel scans and then evolves and combines piecewise features into another pattern in a linguistic form. In addition to semantic annotations, SVIR is free of complicated calculations so that on-chip designs of SVIR can attain real-time processing performance without making use of a high-speed clock. The effectiveness of SVIR algorithm was demonstrated with timing sequences and real-life operations based on a field-programmable-gate-array (FPGA) development platform. With low hardware resource consumption on a single FPGA chip, the design of SVIR can be used on portable machine vision for ambient intelligence in the future. 相似文献

16.

Accessibility-based reranking in multimedia search engines

Kalamaras Ilias Dimitriou Nikolaos Drosou Anastasios Tzovaras Dimitrios 《Multimedia Tools and Applications》2017,76(14):15923-15949

Multimedia Tools and Applications - Traditional multimedia search engines retrieve results based mostly on the query submitted by the user, or using a log of previous searches to provide... 相似文献

17.

Web search engines. Part 1

Hawking D. 《Computer》2006,39(6):86-88

In this article, we go behind the scenes and explain how this data processing "miracle" is possible. We focus on whole-of-Web search but note that enterprise search tools and portal search interfaces use many of the same data structures and algorithms. Search engines cannot and should not index every page on the Web. After all, thanks to dynamic Web page generators such as automatic calendars, the number of pages is infinite. To provide a useful and cost-effective service, search engines must reject as much low-value automated content as possible. In addition, they can ignore huge volumes of Web-accessible data, such as ocean temperatures and astrophysical observations, without harm to search effectiveness. Finally, Web search engines have no access to restricted content, such as pages on corporate intranets. What follows is not an inside view of any particular commercial engine - whose precise details are jealously guarded secrets - but a characterization of the problems that whole-of-Web search services face and an explanation of the techniques available to solve these problems. 相似文献

18.

Introducing lateral thinking in search engines

Yann Landrin-Schweitzer Pierre Collet Evelyne Lutton 《Genetic Programming and Evolvable Machines》2006,7(1):9-31

Decomposing a very complex problem into smaller subproblems that are much easier to solve is not a new idea. The “Parisian Approach”[9] applies this principle extensively to shatter complexity by cutting down the original problem into many small subproblems that are then globally optimized thanks to an evolutionary algorithm. This paper describes how this approach has been used to interactively evolve a user profile to be used by a search engine. User queries are rewritten thanks to the evolved profile, resulting in an increased diversity in the retrieved documents that is showing an interesting property: even though precision is lost, retrieved documents relate both to the user’s query and to his areas of interest in a manner that evokes “lateral thinking”. This paper describes ELISE, an Evolutionary Learning Interactive Search Engine that interactively evolves rewriting modules and rules (some kind of elaborated user profile) along a Parisian Approach. Results obtained over a public domain benchmark (Cystic Fibrosis Database) are presented and discussed. This research is partly funded by Novartis-Pharma (IK@N/KE) 相似文献

19.

Location-aware query reformulation for search engines

Zhipeng Huang Yuqiu Qian Nikos Mamoulis 《GeoInformatica》2018,22(4):869-893

Query reformulation, including query recommendation and query auto-completion, is a popular add-on feature of search engines, which provide related and helpful reformulations of a keyword query. Due to the dropping prices of smartphones and the increasing coverage and bandwidth of mobile networks, a large percentage of search engine queries are issued from mobile devices. This makes it possible to improve the quality of query recommendation and auto-completion by considering the physical locations of the query issuers. However, limited research has been done on location-aware query reformulation for search engines. In this paper, we propose an effective spatial proximity measure between a query issuer and a query with a location distribution obtained from its clicked URLs in the query history. Based on this, we extend popular query recommendation and auto-completion approaches to our location-aware setting, which suggest query reformulations that are semantically relevant to the original query and give results that are spatially close to the query issuer. In addition, we extend the bookmark coloring algorithm for graph proximity search to support our proposed query recommendation approaches online, and we adapt an A* search algorithm to support our query auto-completion approach. We also propose a spatial partitioning based approximation that accelerates the computation of our proposed spatial proximity. We conduct experiments using a real query log, which show that our proposed approaches significantly outperform previous work in terms of quality, and they can be efficiently applied online. 相似文献

20.

Very large-scale neighborhood search

R.K. Ahuja J.B. Orlin D. Sharma 《International Transactions in Operational Research》2000,7(4-5):301-317

Neighborhood search algorithms are often the most effective approaches available for solving partitioning problems, a difficult class of combinatorial optimization problems arising in many application domains including vehicle routing, telecommunications network design, parallel machine scheduling, location theory, and clustering. A critical issue in the design of a neighborhood search algorithm is the choice of the neighborhood structure, that is, the manner in which the neighborhood is defined. Currently, the two-exchange neighborhood is the most widely used neighborhood for solving partitioning problems. The paper describes the cyclic exchange neighborhood , which is a generalization of the two-exchange neighborhood in which a neighbor is obtained by performing a cyclic exchange . The cyclic exchange neighborhood has substantially more neighbors compared to the two-exchange neighborhood. This paper outlines a network optimization based methodology to search the neighborhood efficiently and presents a proof of concept by applying it to the capacitated minimum spanning tree problem, an important problem in telecommunications network design. 相似文献