Similar documents
20 similar documents found (search time: 31 ms)
1.
Service scheduling is one of the crucial issues in the E-commerce environment. E-commerce web servers often get overloaded as they have to deal with a large number of customers’ requests—for example, browse, search, and pay, in order to make purchases or to get product information from E-commerce web sites. In this paper, we propose a new approach to effectively handle high traffic loads and to improve the web server’s performance. Our solution is to exploit networking techniques and to classify customers’ requests into different classes such that some requests are prioritised over others. We contend that such classification is financially beneficial to E-commerce services, as in these services some requests are more valuable than others. For instance, the processing of a “browse” request should get less priority than a “payment” request, as the latter is considered to be more valuable to the service provider. Our approach analyses the arrival process of distinct requests and employs a priority scheduling service at the network nodes that gives preferential treatment to high priority requests. The proposed approach is tested through various experiments, which show a significant decrease in the response time of high priority requests. It also reduces the probability of a web server dropping high priority requests, thus enabling service providers to generate more revenue.
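The class-based scheduling idea in this abstract can be pictured as a simple priority queue in which "payment" requests are dispatched before "browse" or "search" requests. The class names, priority values, and `Request` structure below are illustrative assumptions for a minimal sketch, not the authors' implementation.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Assumed priority classes: lower number = served first (illustrative values).
PRIORITY = {"payment": 0, "search": 1, "browse": 2}

@dataclass(order=True)
class Request:
    priority: int
    seq: int                      # tie-breaker keeps FIFO order within a class
    kind: str = field(compare=False)
    url: str = field(compare=False)

class PriorityScheduler:
    """Serve high-value request classes (e.g. payment) ahead of low-value ones."""
    def __init__(self):
        self._queue = []
        self._counter = itertools.count()

    def submit(self, kind: str, url: str) -> None:
        prio = PRIORITY.get(kind, max(PRIORITY.values()))
        heapq.heappush(self._queue, Request(prio, next(self._counter), kind, url))

    def next_request(self) -> Request | None:
        return heapq.heappop(self._queue) if self._queue else None

# Usage: a queued payment is dispatched before an earlier browse request.
sched = PriorityScheduler()
sched.submit("browse", "/catalog")
sched.submit("payment", "/checkout/pay")
print(sched.next_request().kind)   # -> payment
```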

2.
Service-oriented computing (SOC) suggests that the Internet will be an open repository of many modular capabilities realized as web services. Organizations may be able to leverage this SOC paradigm if their employees are able to ubiquitously incorporate such capabilities and their resulting information into their daily practices. It is impractical to assume that human users will be able to manually search vast distributed repositories in real time. This paper presents an architecture, Software Agent-Based Groupware using E-services (SAGE), that incorporates the use of intelligent agents to integrate human users with web services. SAGE provides background search and discovery approaches, thus enabling human users to exploit service-based capabilities that were previously too time-consuming to locate and integrate. We present a multi-agent system where each agent learns the rule-based preferences of a human user with regard to their current operational “context” and manages the incorporation of relevant web services.

3.
For people with non-ordinary interests, it is hard to search for information on the Internet because search engines are not personalized and are more focused on “average” individuals with “standard” preferences. In order to improve web search for a community of people with similar but specific interests, we propose to use the implicit knowledge contained in the search behavior of groups of users. We developed a multi-agent recommendation system called Implicit, which supports web search for groups or communities of people. In Implicit, agents observe the behavior of their users to learn about the “culture” of the community with specific interests. They facilitate the sharing of knowledge about relevant links within the community by means of recommendations. The agents also recommend contacts, i.e., who in the community is the right person to ask about a specific topic. Experimental evaluation shows that Implicit improves the quality of web search in terms of precision and recall.

4.
Conformance metrics for the mobile web can play a crucial role in engineering mobile websites, especially if they are obtained automatically. In this way, developers can get an idea, in numeric terms, of how suitable their developments are for mobile devices. However, there is a plethora of devices, each with its own particular features (screen size, format support, etc.), which restricts a unified automatic assessment process. This paper proposes a tool-supported method for device-tailored assessment in terms of conformance with Mobile Web Best Practices 1.0, including the definition of five quantitative metrics for automatically measuring mobile web conformance: Navigability, Page layout, Page definition, User input and Overall score. The behaviour of these metrics was analysed for different devices and different web paradigms, both mobile web pages and their equivalent desktop pages. As expected, the results show that mobile web pages score higher on more capable devices. In addition, 20 users took part in an experiment aimed at discovering how conformance-based scores relate to usability dimensions. The results demonstrate that automatic scoring approaches strongly correlate with usability scores obtained by direct observation, such as task completion time and user satisfaction. This correlation is even stronger for the device-tailored assessment than for the one that assumes a general profile for all devices. For instance, the results show a strong negative correlation between Overall score and task completion time: ρ(9) = −0.81 (p < 0.05) for the generalist approach and ρ(9) = −0.88 for the device-tailored one, which entails that mobile web guidelines, and the metrics based on their conformance, capture usability aspects. This result challenges the widely accepted belief that conformance to guidelines does not imply more usable web pages, at least for web accessibility conformance.
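The reported relationship between Overall score and task completion time is a Spearman rank correlation with nine degrees of freedom (eleven observations). The sketch below shows how such a coefficient could be computed with SciPy; the data values are invented placeholders, not the study's measurements.

```python
# Spearman's rho between an automatic conformance score and task completion time.
# The numbers below are invented placeholders, not the paper's data.
from scipy.stats import spearmanr

overall_score = [0.91, 0.85, 0.78, 0.72, 0.66, 0.60, 0.55, 0.49, 0.41, 0.35, 0.30]
completion_s  = [  18,   22,   25,   31,   30,   38,   41,   47,   52,   58,   63]

rho, p_value = spearmanr(overall_score, completion_s)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")  # strongly negative, as in the study
```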

5.
Mining linguistic browsing patterns in the world wide web
World-wide-web applications have grown very rapidly and have made a significant impact on computer systems. Among them, web browsing for useful information may be the most commonly seen. Due to the tremendous amount of use, efficient and effective web retrieval has become a very important research topic in this field. Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for a certain purpose. In this paper, we use data mining techniques to discover relevant browsing behavior from log data in web servers, which can help form rules for the retrieval of web pages. The browsing time of a customer on each web page is used to analyze the retrieval behavior. Since the data collected are numeric, fuzzy concepts are used to process them and to form linguistic terms. A sophisticated web-mining algorithm is then proposed to find relevant browsing behavior from the linguistic data. Each page uses only the linguistic term with the maximum cardinality in later mining processes, thus making the number of fuzzy regions to be processed the same as the number of pages. Computational time can thus be greatly reduced. The mined patterns exhibit the browsing behavior and can be used to provide appropriate suggestions to web-server managers.
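A minimal sketch of the fuzzification step described here: browsing times are mapped to linguistic terms (short/middle/long) with triangular membership functions, and each page keeps only the term with the maximum total cardinality. The term names, membership breakpoints, and log records are assumptions for illustration, not the paper's parameters.

```python
from collections import defaultdict

# Assumed triangular membership functions over browsing time in seconds.
def memberships(seconds: float) -> dict[str, float]:
    def tri(x, a, b, c):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return {
        "short":  tri(seconds, -1, 0, 60),
        "middle": tri(seconds, 30, 90, 150),
        "long":   tri(seconds, 120, 240, 10**9),
    }

# Log records: (page, browsing time in seconds) — invented sample data.
log = [("/news", 20), ("/news", 45), ("/news", 130), ("/shop", 200), ("/shop", 95)]

cardinality = defaultdict(lambda: defaultdict(float))
for page, t in log:
    for term, mu in memberships(t).items():
        cardinality[page][term] += mu          # fuzzy count (cardinality) per term

# Keep only the max-cardinality term per page, so later mining handles
# one fuzzy region per page, as the abstract describes.
representative = {p: max(terms, key=terms.get) for p, terms in cardinality.items()}
print(representative)   # e.g. {'/news': 'short', '/shop': 'middle'}
```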

6.
A service-oriented architecture (SOA) for web applications is often implemented using web services (WSs) and consists of different operations whose executions are perceived as events. The order and time-appropriateness of the occurrences of these events play a vital role in the proper functioning of a real-time SOA. This paper presents an event-based approach to modeling and testing the functional behavior of WSs with event sequence graphs (ESGs). Nodes of an ESG represent events, e.g., “request” or “response”, and arcs give the sequence of these events. For representing parameter values, e.g., for the time-out of operation calls, ESGs are augmented with decision tables. A case study carried out on a commercial web system with an SOA validates the approach and analyzes its characteristic issues. The novelty of the approach stems from (i) its simplicity and lucidity in representing complex real-time web applications based on WSs in an SOA, and (ii) its modeling that also considers testing and thus enables comfortable fault management, leading to a holistic view.
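The event sequence graph idea can be sketched as a small directed graph of events from which complete event sequences, i.e. candidate test cases, are enumerated by walking arcs from an entry event to an exit event. The event names, graph, and length bound below are invented for illustration and are not the case study's model.

```python
# Hypothetical ESG: nodes are events, arcs give the allowed event order.
ESG = {
    "[":        ["request"],            # "[" marks the entry pseudo-event
    "request":  ["response", "timeout"],
    "timeout":  ["request", "]"],
    "response": ["]"],                  # "]" marks the exit pseudo-event
    "]":        [],
}

def event_sequences(graph, node="[", path=None, max_len=6):
    """Enumerate complete event sequences (entry -> exit) up to a length bound."""
    path = (path or []) + [node]
    if node == "]":
        yield path
        return
    if len(path) >= max_len:
        return
    for nxt in graph[node]:
        yield from event_sequences(graph, nxt, path, max_len)

for seq in event_sequences(ESG):
    print(" -> ".join(seq))
# e.g. [ -> request -> response -> ]
#      [ -> request -> timeout -> request -> response -> ]
```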

7.
Content distribution networks (CDNs) improve scalability and reliability by replicating content to the “edge” of the Internet. Apart from the pure networking issues of CDNs relevant to the establishment of the infrastructure, some very crucial data management issues must be resolved to exploit the full potential of CDNs to reduce “last mile” latencies. A very important issue is the selection of the content to be prefetched to the CDN servers. All the approaches developed so far assume the existence of adequate content popularity statistics to drive the prefetch decisions. Such information, though, is not always available, or it is extremely volatile, rendering such methods problematic. To address this issue, we develop self-adaptive techniques to select the outsourced content in a CDN infrastructure, which require no a priori knowledge of request statistics. We identify clusters of “correlated” Web pages in a site, called Web site communities, and make these communities the basic outsourcing unit. Through a detailed simulation environment, using both real and synthetic data, we show that the proposed techniques are very robust and effective in reducing the user-perceived latency, performing very close to an unfeasible, off-line policy that has full knowledge of the content popularity.

8.
In this work we propose a model to represent the web as a directed hypergraph (instead of a graph), where links connect pairs of disjoint sets of pages. The web hypergraph is derived from the web graph by dividing the set of pages into non-overlapping blocks and using the links between pages of distinct blocks to create hyperarcs. A hyperarc connects a block of pages to a single page, in order to provide more reliable information for link analysis. We use the hypergraph model to create the hypergraph versions of the Pagerank and Indegree algorithms, referred to as HyperPagerank and HyperIndegree, respectively. The hypergraph is derived from the web graph by grouping pages according to two different partition criteria: grouping together the pages that belong to the same web host or to the same web domain. We compared the original page-based algorithms with the host-based and domain-based versions of the algorithms, considering a combination of the page reputation, the textual content of the pages and the anchor text. Experimental results using three distinct web collections show that the HyperPagerank and HyperIndegree algorithms may yield better results than the original graph versions of the Pagerank and Indegree algorithms. We also show that the hypergraph versions of the algorithms were slightly less affected by noise links and spamming.
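A rough sketch of the block-level idea: pages are grouped by host, links between pages of different blocks become hyperarcs from a block to a single target page, and a PageRank-style power iteration is run over that structure. The tiny link graph, damping factor, and score-propagation details are invented simplifications; the paper's algorithms differ in detail.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Tiny invented link graph: page URL -> pages it links to.
LINKS = {
    "http://a.com/1": ["http://b.com/x", "http://c.com/y"],
    "http://a.com/2": ["http://b.com/x"],
    "http://b.com/x": ["http://c.com/y"],
    "http://c.com/y": ["http://a.com/1"],
}

def block_of(page: str) -> str:
    return urlparse(page).netloc          # partition criterion: the web host

def hyper_pagerank(links, d=0.85, iters=50):
    pages = list(links)
    # Hyperarcs: a block (all pages of one host) -> a single target page,
    # built only from links that cross block boundaries.
    block_targets = defaultdict(set)
    for src, outs in links.items():
        for dst in outs:
            if block_of(src) != block_of(dst):
                block_targets[block_of(src)].add(dst)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        block_score = defaultdict(float)
        for p, r in rank.items():
            block_score[block_of(p)] += r              # lump page scores into their block
        new = {p: (1 - d) / len(pages) for p in pages}
        for blk, targets in block_targets.items():
            share = d * block_score[blk] / len(targets)
            for t in targets:
                new[t] += share                        # block authority flows over its hyperarcs
        rank = new
    return rank

for page, score in sorted(hyper_pagerank(LINKS).items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {page}")
```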

9.
The analysis of web usage has mostly focused on sites composed of conventional static pages. However, huge amounts of information available on the web come from databases or other data collections and are presented to the users in the form of dynamically generated pages. The query interfaces of such sites allow the specification of many search criteria. Their generated results support navigation to pages of results combining cross-linked data from many sources. For the analysis of visitor navigation behaviour in such web sites, we propose the web usage miner (WUM), which discovers navigation patterns subject to advanced statistical and structural constraints. Since our objective is the discovery of interesting navigation patterns, we do not focus on accesses to individual pages. Instead, we construct conceptual hierarchies that reflect the query capabilities used in the production of those pages. Our experiments with a real web site that integrates data from multiple databases, the German SchulWeb, demonstrate the appropriateness of WUM in discovering navigation patterns and show how those discoveries can help in assessing and improving the quality of the site.

10.
The World Wide Web (WWW) has been recognized as the ultimate and unique source of information for the information retrieval and knowledge discovery communities. Tremendous amounts of knowledge are recorded using various types of media, producing an enormous number of web pages in the WWW. Retrieval of required information from the WWW is thus an arduous task. Different schemes for retrieving web pages have been used by the WWW community. One of the most widely used schemes is to traverse predefined web directories to reach a user's goal. These web directories are compiled or classified folders of web pages and are usually organized into hierarchical structures. The classification of web pages into proper directories and the organization of directory hierarchies are generally performed by human experts. In this work, we provide a corpus-based method that applies text mining techniques to a corpus of web pages to automatically create web directories and organize them into hierarchies. The method is based on the self-organizing map learning algorithm and requires no human intervention during the construction of web directories and hierarchies. The experiments show that our method can produce comprehensible and reasonable web directories and hierarchies.
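A minimal self-organizing map sketch over toy document vectors illustrates the learning step: each map unit becomes a candidate directory, and pages are assigned to their best-matching unit. The vocabulary, map size, and learning-rate schedule are placeholders, not the parameters used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "page" vectors (e.g. tf-idf over a tiny assumed vocabulary).
pages = {
    "sports-1":  [1.0, 0.9, 0.0, 0.0],
    "sports-2":  [0.9, 1.0, 0.1, 0.0],
    "finance-1": [0.0, 0.1, 1.0, 0.8],
    "finance-2": [0.1, 0.0, 0.9, 1.0],
}
X = np.array(list(pages.values()))

# A 2x2 self-organizing map; each unit becomes a candidate directory node.
grid = np.array([(i, j) for i in range(2) for j in range(2)], dtype=float)
weights = rng.random((4, X.shape[1]))

for t in range(200):
    lr, radius = 0.5 * (1 - t / 200), 1.0 * (1 - t / 200) + 0.1
    x = X[rng.integers(len(X))]
    bmu = np.argmin(((weights - x) ** 2).sum(axis=1))         # best-matching unit
    dist = ((grid - grid[bmu]) ** 2).sum(axis=1)              # grid distance to BMU
    h = np.exp(-dist / (2 * radius ** 2))                     # neighborhood function
    weights += lr * h[:, None] * (x - weights)                # pull units toward x

# Assign each page to its best-matching unit -> an automatically formed directory.
for name, vec in pages.items():
    unit = np.argmin(((weights - np.array(vec)) ** 2).sum(axis=1))
    print(name, "-> directory unit", int(unit))
```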

11.
Data-intensive web-based information systems usually employ database systems to store the contents forming the basis for web page construction. Generating web pages on the fly, especially at peak times, can lead to severe performance problems. Thus, pre-generation of web pages has been suggested to be ready for prime time, allowing several hundred pre-generated pages per second to be delivered reliably. Maintaining the consistency of these web pages with respect to changes within the database in an efficient way, however, represents a major challenge. This paper presents a novel approach for “self-maintaining” web pages that, unlike previous approaches, is characterized by a simple (and thus easy to maintain) database-to-web-page mapping and very low page re-generation costs. This is achieved by utilizing fragmentation techniques from distributed databases, by allocating parameterized fragment classes to web page classes (rather than individual fragments to single web pages), and by using the Extensible Markup Language (XML) as an intermediate layer between the database and the final web pages.

12.
An XML-enabled data extraction toolkit for web sources
The amount of useful semi-structured data on the web continues to grow at a stunning pace. Often, interesting web data are not in database systems but in HTML pages, XML pages, or text files. Data in these formats are not directly usable by standard SQL-like query processing engines that support sophisticated querying and reporting beyond keyword-based retrieval. Hence, web users or applications need a smart way of extracting data from these web sources. One of the popular approaches is to write wrappers around the sources, either manually or with software assistance, to bring the web data within the reach of more sophisticated query tools and general mediator-based information integration systems. In this paper, we describe the methodology and the software development of an XML-enabled wrapper construction system—XWRAP for semi-automatic generation of wrapper programs. By XML-enabled we mean that the metadata about information content that are implicit in the original web pages will be extracted and encoded explicitly as XML tags in the wrapped documents. In addition, the query-based content filtering process is performed against the XML documents. The XWRAP wrapper generation framework has three distinct features. First, it explicitly separates the tasks of building wrappers that are specific to a web source from the tasks that are repetitive for any source, and uses a component library to provide basic building blocks for wrapper programs. Second, it provides inductive learning algorithms that derive or discover wrapper patterns by reasoning about sample pages or sample specifications. Third, and most importantly, we introduce and develop a two-phase code generation framework. The first phase utilizes an interactive interface facility to encode the source-specific metadata knowledge identified by individual wrapper developers as declarative information extraction rules. The second phase combines the information extraction rules generated in the first phase with the XWRAP component library to construct an executable wrapper program for the given web source.
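A toy flavor of the rule-driven wrapping step: declarative extraction rules (here just regular expressions, a deliberate simplification of XWRAP's rule language) pull implicit metadata out of an HTML page and re-emit it with explicit XML tags. The page, rule, and tag names are invented for illustration.

```python
import re
from xml.etree.ElementTree import Element, SubElement, tostring

# Invented source page; a real wrapper would fetch this from the web source.
html = "<li><b>Widget A</b> - $9.99</li><li><b>Widget B</b> - $14.50</li>"

# Declarative extraction rules: target XML tag -> pattern with named groups.
RULES = {"product": re.compile(r"<b>(?P<name>[^<]+)</b>\s*-\s*\$(?P<price>[\d.]+)")}

root = Element("catalog")
for tag, pattern in RULES.items():
    for match in pattern.finditer(html):
        item = SubElement(root, tag)
        for fieldname, value in match.groupdict().items():
            SubElement(item, fieldname).text = value   # implicit metadata -> explicit XML tags

print(tostring(root, encoding="unicode"))
# <catalog><product><name>Widget A</name><price>9.99</price></product>...</catalog>
```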

13.
Users of web sites often do not know exactly what information they are looking for, nor what the site has to offer. The purpose of their interaction is not only to fulfill but also to articulate their information needs. In these cases users need to pass through a series of pages before they can use the information that will eventually answer their questions. Current systems that support navigation predict which pages are interesting for the users on the basis of commonalities in the contents or the usage of the pages. They do not take into account the order in which the pages must be visited. In this paper we propose a method to automatically divide the pages of a web site, on the basis of user logs, into sets of pages that correspond to navigation stages. The method searches for an optimal number of stages and assigns each page to a stage. The stages can be used in combination with the pages’ topics to give better recommendations or to structure or adapt the site. The resulting navigation structures guide the users step by step through the site, providing pages that not only match the topic of the user’s search but also the current stage of the navigation process.
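One way to picture the stage idea: estimate from session logs how early or late in a session each page tends to be visited, and cut the pages into a fixed number of stages. This average-position heuristic with an assumed stage count is a simplification for illustration, not the optimization procedure the paper describes.

```python
from collections import defaultdict

# Invented session logs: ordered page visits per user session.
sessions = [
    ["/home", "/search", "/list", "/product", "/order"],
    ["/home", "/list", "/product", "/order"],
    ["/home", "/search", "/product"],
]

# Average relative position (0 = start of session, 1 = end) of each page.
positions = defaultdict(list)
for s in sessions:
    for i, page in enumerate(s):
        positions[page].append(i / (len(s) - 1))
avg_pos = {p: sum(v) / len(v) for p, v in positions.items()}

# Cut pages into a fixed number of stages (assumed to be 3 here).
N_STAGES = 3
stage = {p: min(int(pos * N_STAGES), N_STAGES - 1) for p, pos in avg_pos.items()}
for p, s in sorted(stage.items(), key=lambda kv: kv[1]):
    print(f"stage {s}: {p}")
```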

14.
We present Juxtaposed approximate PageRank (JXP), a distributed algorithm for computing PageRank-style authority scores of Web pages on a peer-to-peer (P2P) network. Unlike previous algorithms, JXP allows peers to have overlapping content and requires no a priori knowledge of other peers’ content. Our algorithm combines locally computed authority scores with information obtained from other peers by means of random meetings among the peers in the network. This computation is based on a Markov-chain state-lumping technique and iteratively approximates global authority scores. The algorithm scales with the number of peers in the network, and we show that the JXP scores converge to the true PageRank scores that one would obtain with a centralized algorithm. Finally, we show how to deal with misbehaving peers by extending JXP with a reputation model.
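A highly simplified sketch of the local part of such a computation: a peer runs PageRank over its own pages plus a single lumped "world" node standing in for everything it has not seen, which is the flavor of the state-lumping idea; the random peer meetings and score exchange of JXP are omitted. The graph and parameters are invented.

```python
# Local PageRank over one peer's pages plus a lumped "world" state ("W"),
# in the spirit of state lumping; peer meetings / score merging are omitted.
LOCAL = {                       # invented local link graph of one peer
    "p1": ["p2", "W"],          # "W" stands in for all pages outside this peer
    "p2": ["p1"],
    "W":  ["p1", "p2", "W"],    # the world links back with assumed uniform weights
}

def local_pagerank(graph, d=0.85, iters=100):
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - d) / len(nodes) for n in nodes}
        for src, outs in graph.items():
            for dst in outs:
                new[dst] += d * rank[src] / len(outs)
        rank = new
    return rank

print(local_pagerank(LOCAL))    # scores for p1, p2 and the lumped world node
```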

15.
Organizations considering the adoption of the web services framework for their Information Technology (IT) applications are confronted with a period of technological ferment, as standards for supporting non-trivial business process functionality are not yet in place. Evolving standardization poses challenges in the form of inter-temporal dependencies, as organizations’ conformance to the standards that emerge in the future is contingent on their current design choices, which need to be made ex ante without complete information on how standards will evolve. At the same time, there are significant early-mover benefits to be gained by executing an IT strategy using web services as a cornerstone. This paper draws upon coordination theory to develop a conceptual framework outlining three approaches for organizations to deal with changing standardization regimes: (a) the dependencies across components conforming to different standardization regimes are continually bridged through intermediary services (e.g., using a protocol adapter that translates to an unanticipated emergent standard); (b) the dependencies across components are minimized through loose coupling so that standardization regime changes for any component have a minimal impact on other components (e.g., encapsulating the functionality susceptible to design change into a module with abstract interfaces); and (c) the impacted components are rapidly reconfigurable as and when the standardization regime changes (e.g., by building “extension” features into applications). The risk for organizations investing in web services can be further managed by mechanisms such as the organization’s attention to signals from the periphery, undertaking low-risk experiments to learn in different areas, and bricolage-like improvisations of their legacy components at hand.

16.
17.
Fragment caching is one of the effective solutions for accelerating the delivery of dynamic web pages, but implementing it requires an effective mechanism for detecting shared fragments. To address this, an efficient shared-fragment detection algorithm is proposed and a fragment-cache-based dynamic web page delivery model is presented. The model automatically identifies shared fragments and effective cache units, eliminates redundant data more thoroughly, and improves the cache hit ratio. Experiments and analysis show that, compared with the existing schemes ESI and Silo, the model effectively saves bandwidth and shortens the response time of user requests.
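A naive sketch of shared-fragment detection: split each generated page into fragments, hash them, and mark fragments that recur across pages as shared cache units. The splitting rule (on <div> boundaries) and the sample pages are assumptions for illustration; the paper's detection algorithm is more sophisticated.

```python
import hashlib
import re
from collections import defaultdict

# Invented dynamic pages; real pages would come from the server's responses.
pages = {
    "/news/1": "<div>HEADER</div><div>story one</div><div>FOOTER</div>",
    "/news/2": "<div>HEADER</div><div>story two</div><div>FOOTER</div>",
}

def fragments(html: str):
    """Naive fragmentation: each <div>...</div> block is one candidate fragment."""
    return re.findall(r"<div>.*?</div>", html)

seen = defaultdict(set)                       # fragment hash -> pages containing it
for url, html in pages.items():
    for frag in fragments(html):
        seen[hashlib.sha1(frag.encode()).hexdigest()].add(url)

shared = [h for h, urls in seen.items() if len(urls) > 1]
print(f"{len(shared)} shared fragments detected")   # the HEADER and FOOTER blocks
```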

18.
19.
Although caching has been shown to be an efficient technique for reducing the delay in generating web pages to meet page requests from web users, it becomes less effective if the pages are dynamic and contain dynamic content. In this paper, instead of using caching, we study the effectiveness of using pre-fetching to resolve the problems in handling dynamic web pages. Pre-fetching is a proactive caching scheme, since a page is cached before the receipt of any page request for it. In addition to the problem of which pages to pre-fetch, another equally important question is when to perform the pre-fetching. To resolve the prediction and timing problems, we explore the temporal properties of the dynamic web pages and the timing issues in accessing the pages to determine which pages to pre-fetch and the best time to pre-fetch them so as to maximize the cache hit probability of the pre-fetched pages. If valid copies of the required pages can be found in the cache, the response times of the requests can be greatly reduced. The proposed scheme is called temporal pre-fetching (TPF); it prioritizes pre-fetching requests based on the predicted usability of the pages to be pre-fetched. To minimize the impact of incorrect predictions on the processing of on-demand page requests, a qualifying examination is performed to remove unnecessary and low-usability pre-fetching requests both while they are waiting to be processed and just before their processing. We have implemented the proposed TPF scheme in a web server system, and experiments have been performed to study its performance characteristics compared with a conventional cache-only scheme, using a benchmark auction application under different system and application settings. As shown in the experimental results, the overall system performance, i.e., response time, is improved as more page requests can be served immediately using pre-fetched pages.
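The queueing idea can be sketched as a prefetch queue ordered by predicted usability, with a qualifying check that drops requests whose pages would already be stale, or too unlikely to be used, by the time they are processed. The usability scores, validity windows, threshold, and page names below are invented placeholders, not the TPF implementation.

```python
import heapq
import time

PREFETCH_THRESHOLD = 0.3          # assumed minimum predicted usability

class TemporalPrefetcher:
    """Prefetch queue ordered by predicted usability, with a qualifying check."""
    def __init__(self):
        self._queue = []          # entries: (-usability, page, valid_until)

    def enqueue(self, page: str, usability: float, valid_for_s: float) -> None:
        if usability >= PREFETCH_THRESHOLD:          # drop low-usability requests early
            heapq.heappush(self._queue, (-usability, page, time.time() + valid_for_s))

    def next_to_prefetch(self) -> str | None:
        while self._queue:
            neg_u, page, valid_until = heapq.heappop(self._queue)
            if time.time() <= valid_until:           # qualifying check just before processing
                return page                          # still worth generating and caching
        return None

pf = TemporalPrefetcher()
pf.enqueue("/auction/item/42", usability=0.9, valid_for_s=30)
pf.enqueue("/auction/home", usability=0.2, valid_for_s=30)    # rejected: below threshold
print(pf.next_to_prefetch())                                  # -> /auction/item/42
```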

20.
Fake content is flourishing on the Internet, ranging from basic random word salads to web scraping. Most of this fake content is generated for the purpose of nourishing fake web sites aimed at biasing search engine indexes: at the scale of a search engine, using automatically generated texts renders such sites harder to detect than using copies of existing pages. In this paper, we present three methods aimed at distinguishing natural texts from artificially generated ones: the first method uses basic lexicometric features, the second one uses standard language models, and the third one is based on a relative entropy measure which captures short-range dependencies between words. Our experiments show that lexicometric features and language models are efficient at detecting most generated texts, but fail to detect texts that are generated with high-order Markov models. By comparison, our relative entropy scoring algorithm, especially when trained on a large corpus, allows us to detect these “hard” text generators with a high degree of accuracy.
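As a rough stand-in for the scoring idea, the sketch below rates a text by the average negative log-probability of its bigrams under a reference corpus model; this is related in spirit to, but not the same as, the paper's relative entropy measure. The miniature corpus, smoothing constant, and sample texts are illustrative only.

```python
import math
from collections import Counter, defaultdict

def bigram_model(tokens, alpha=0.1):
    """Smoothed next-word distributions P(w2 | w1) estimated from a token list."""
    vocab = set(tokens)
    counts = defaultdict(Counter)
    for w1, w2 in zip(tokens, tokens[1:]):
        counts[w1][w2] += 1
    def prob(w1, w2):
        total = sum(counts[w1].values())
        return (counts[w1][w2] + alpha) / (total + alpha * len(vocab))
    return prob, vocab

def bigram_surprise(text_tokens, ref_prob, ref_vocab):
    """Average -log P of the text's bigrams under the reference model;
    high values suggest word sequences unlike natural reference text."""
    pairs = [(w1, w2) for w1, w2 in zip(text_tokens, text_tokens[1:]) if w1 in ref_vocab]
    if not pairs:
        return float("inf")
    return sum(-math.log(ref_prob(w1, w2)) for w1, w2 in pairs) / len(pairs)

# Invented miniature reference corpus and candidate texts.
reference = "the cat sat on the mat and the dog sat on the rug".split()
ref_prob, ref_vocab = bigram_model(reference)

natural   = "the dog sat on the mat".split()
generated = "mat the on dog rug the".split()          # word-salad style text

print(bigram_surprise(natural, ref_prob, ref_vocab))     # lower score
print(bigram_surprise(generated, ref_prob, ref_vocab))   # higher score
```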
