Similar literature
 20 similar documents found (search time: 125 ms)
1.
More and more (semi-)structured information is becoming available on the web in the form of documents embedding metadata (e.g., RDF, RDFa, Microformats and others). There are already hundreds of millions of such documents accessible and their number is growing rapidly. This calls for large-scale systems providing effective means of searching and retrieving this semi-structured information, with the ultimate goal of making it exploitable by humans and machines alike. This article examines the shift from the traditional web document model to a web data object (entity) model and studies the challenges faced in implementing a scalable and high-performance system for searching semi-structured data objects over a large heterogeneous and decentralised infrastructure. Towards this goal, we define an entity retrieval model, develop novel methodologies for supporting this model and show how to achieve a high-performance entity retrieval system. We introduce an indexing methodology for semi-structured data which offers a good compromise between query expressiveness, query processing and index maintenance compared to other approaches. We address high performance by optimising the index data structure using appropriate compression techniques. Finally, we demonstrate that the resulting system can index billions of data objects and provides keyword-based as well as more advanced search interfaces for retrieving relevant data objects in sub-second time. This work has been part of the Sindice search engine project at the Digital Enterprise Research Institute (DERI), NUI Galway. The Sindice system currently maintains more than 200 million pages downloaded from the web and is being used actively by many researchers within and outside of DERI.

2.
Over the past few years, a large and ever increasing number of Web sites have incorporated one or more social login platforms and have encouraged users to log in with their Facebook, Twitter, Google, or other social networking identities. Research results suggest that more than two million Web sites have already adopted Facebook’s social login platform, and the number is increasing sharply. Although one might theoretically refrain from such social login features and cross-site interactions, usage statistics show that more than 250 million people might not fully realize the privacy implications of opting in. To make matters worse, certain Web sites do not offer even the minimum of their functionality unless users meet their demands for information and social interaction. At the same time, in a large number of cases, it is unclear why these sites require all that personal information for their purposes. In this paper, we mitigate this problem by designing and developing a framework for minimum information disclosure in social login interactions with third-party sites. Our example case is Facebook, which combines a very popular single sign-on platform with information-rich social networking profiles. Whenever users want to browse to a Web site that requires authentication or social interaction using a Facebook identity, our system employs, by default, a Facebook session that reveals the minimum amount of information necessary. Users have the option to explicitly elevate that Facebook session in a manner that reveals more or all of the information tied to their social identity. This enables users to disclose the minimum possible amount of personal information during their browsing experience on third-party Web sites.

3.
Current object-oriented approaches to distributed programs may be criticized in several respects. First, method calls are generally synchronous, which leads to much waiting in distributed and unstable networks. Second, the common model of thread concurrency makes reasoning about program behavior very challenging. Models based on concurrent objects communicating by asynchronous method calls have been proposed to combine object orientation and distribution in a more satisfactory way. In this paper, a high-level language and proof system are developed for such a model, emphasizing simplicity and modularity. In particular, the proof system is used to derive external specifications of observable behavior for objects, encapsulating their state. A simple and compositional proof system is paramount to allow verification of real programs. The proposed proof rules are derived from the Hoare rules of a standard sequential language by a semantic encoding preserving soundness and relative completeness. Thus, the paper demonstrates that these models address not only the first criticism above, but also the second.

4.
Recently, the number of online databases and the amount of personal information they store have escalated; the potential uses and misuses of these databases have consequently multiplied, and in Europe, there are now calls for a renewed public discussion of their legal status. In this context, this paper uses survey data from a Portuguese sample to investigate some psychosocial processes involved in decision-making related to the disclosure of personal data to governmental institutions. The study tests (1) what societal-level variables (e.g. Post-Materialistic values and System Justification motives) help predict trust and concern felt towards public institutions; (2) whether these societal-level variables are better predictors of disclosure-related decisions than are socio-demographic aspects and knowledge of the legal framework; and (3) the capacity of both societal-level variables and trust and concern for predicting the willingness to disclose personal data and to complain about the misuse of data by governmental institutions. Findings show that trust is a key incentive for the disclosure of personal data to governmental institutions and is linked to a more passive engagement with citizenship. Concern, in turn, being negatively linked to system justification and positively to willingness to complain, seems associated with a more active civic citizenship. Implications of this pattern are discussed.

5.
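High-temperature friction and wear testers are mainly used to measure the friction and wear performance of friction-pair materials under high-temperature conditions. Using LabVIEW, and driving the data acquisition card by calling a dynamic link library, a virtual-instrument test system for a high-temperature friction and wear tester was designed and developed. The system provides acquisition of friction force, temperature, load, rotational speed and other data, as well as data storage, data processing, graphical display, and generation of test reports. The system was used on a high-temperature friction and wear tester to measure and record the friction coefficient, inspect the acquired data, and run repeated tests. The experimental results show that the test system has high stability, accuracy, and reliability.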

6.
A method for the introspection of virtual machines is proposed. The main distinctive feature of this method is that it makes it possible to obtain information about the system operation using the minimum knowledge about its internal organization. The proposed approach uses rarely changing parts of the application binary interface, such as identifiers and parameters of system calls, calling conventions, and the formats of executable files. The method is lightweight because it minimizes the required knowledge about the system and because of its high performance. The introspection infrastructure is based on the QEMU emulator, version 2.8. Presently, monitoring of file operations, processes, and API function calls is implemented. The available introspection tools (RTKDSM, Panda, and DECAF) get data for the analysis using kernel structures. All the data obtained (addresses of structures, etc.) is written to special profiles. Since the addresses and offsets strongly depend not only on the version of the operating system but also on the parameters of its build, these tools have to store a large number of profiles. We propose to use parts of the application binary interface because they are rarely modified and it is often possible to use one profile for a family of OSs. The main idea underlying the proposed method is to intercept the system and library function calls and read parameters and returned values. The processor provides special instructions for calling system and user-defined functions. The capabilities of QEMU are extended by an instrumentation mechanism that makes it possible to track each executed instruction and find the instructions of interest among them. When a system call occurs, control is passed to the system call detector, which checks the number of the call and decides to which module the job should be forwarded. In the case of an API function call, the situation is similar, but the API function detector checks the function address.
An introspection tool consisting of a set of modules is developed. These modules are dynamic libraries that are plugged into QEMU. The modules can interact by exchanging data.
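The dispatch step described in this abstract (a detector checks the system call number and forwards the job to a module) can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' QEMU plugin code: the class name, the handler signature, and the call numbers are all hypothetical, and the real modules are dynamic libraries loaded into QEMU.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical handler type: receives the call number and the raw argument values.
Handler = Callable[[int, List[int]], str]


class SyscallDetector:
    """Toy stand-in for the system call detector described in the abstract."""

    def __init__(self) -> None:
        # Maps a system call number to the module interested in it.
        self._modules: Dict[int, Handler] = {}

    def register(self, number: int, handler: Handler) -> None:
        """Let a monitoring module subscribe to one system call number."""
        self._modules[number] = handler

    def on_syscall(self, number: int, args: List[int]) -> Optional[str]:
        """Forward an intercepted call to its module, or ignore it."""
        handler = self._modules.get(number)
        if handler is None:
            return None  # no module is interested in this call
        return handler(number, args)


def file_monitor(number: int, args: List[int]) -> str:
    # Illustrative module: only reports what it was given.
    return f"file module: call {number} with {len(args)} argument(s)"


detector = SyscallDetector()
detector.register(2, file_monitor)  # 2 is an arbitrary illustrative number
print(detector.on_syscall(2, [0x1000, 0]))
print(detector.on_syscall(99, []))
```

An API function detector could follow the same shape, keyed by function address instead of call number.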

7.
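To address the inability of current methods to capture and analyze information such as system call parameters and return values, a system for real-time monitoring of system calls inside a guest machine was built on top of Nitro. By modifying the hardware specification and rewriting instructions, the system captures and analyzes the fast system call entry and exit instructions. It then parses each parameter by combining the VCPU context information with semantic templates of the system calls; once a system call exit instruction is captured, it parses the return value from the VCPU register information. Experiments show that, compared with similar system call capture methods, the system can capture the sequence of system calls inside the guest in real time and recover complete system call information, including the system call name, system call number, parameters, and return value. The system can also distinguish system calls issued by different processes, and it introduces a performance overhead of no more than 15% on the host.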

8.
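Smartphone users' demand for communication security is growing. This paper surveys the current state of nuisance-call filtering systems, proposes a design based on cloud security technology, and studies the technologies involved in depth. The system integrates a cluster of filtering servers and a large number of mobile phones into a single cloud security architecture, enabling fast and effective filtering of nuisance calls.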

9.
While display designers tend to agree that the communication of large amounts of quantitative information calls for the use of graphs, there is less consensus about whether graphs should be used for small, summarized data sets. In the present study, three groups of 16 subjects viewed 11 sets of time series data presented as tables, bar charts, or line graphs. Data sets varied in size (4, 7, or 13 values) and complexity (number and type of departures from linearity). Subjects provided written interpretations of each of the data sets, and these interpretations were scored for (1) overall number of propositions pertaining to the data set as a whole (global content), (2) number of propositions describing relations within a subset of the data (local content), and (3) number of references to specific data values (numeric content). For the larger (7- and 13-point) data sets, interpretations based on bar charts included the greatest overall global content, but line graph interpretations proved to be most sensitive to the actual information content (complexity) of the data sets. The greater sensitivity of the line graphs was still obtained with four-point data sets; however, this advantage was greater for men than for women. For data sets of all sizes, but especially for the smallest sets, gender differences in interpretation content were obtained. These differences are discussed within the context of more general individual differences presumed to exist in graph-reading strategies.

10.
The collaborative emergency call-taking information system in the Czech Republic forms a network of cooperating emergency call centres processing emergency calls to the European 112 emergency number. Large amounts of various incident records are stored in its databases. The data can be used for mining spatial and temporal anomalies, as well as for the monitoring and analysis of the performance of the emergency call-taking system. In this paper we describe a method for knowledge discovery and visualisation targeted at the performance analysis of the system with respect to the organisation of the emergency call-taking information system and its data characteristics. The method is based on the Kohonen Self-Organising Map (SOM) algorithm and its extension, the Growing Grid algorithm.

11.
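A system call anomaly detection algorithm based on extended data sources   Cited by: 1 (self-citations: 0, citations by others: 1)
The data sources of traditional anomaly detection algorithms are extended by incorporating system call parameters and system call frequency information into the detection algorithm. The new algorithm gathers system call frequency statistics during training and builds a model of the file access distribution during normal program execution; building on traditional system-call-based anomaly detection, it uses the information obtained in training to determine the priority of attacks. Experimental results show that the method effectively improves on the original method in metrics such as detection rate and false alarm rate.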

12.
A multi-layer model is used to study the effect of management structure on the performance of connection-oriented packet-switched networks managed via fixed threshold call admission policies. Call admission decisions are based on estimates of the number and characteristics of currently held calls, which may be inaccurate due to uncertainties in the measurement process or to the use of untimely information. Let the state estimate be represented as the true state value offset by a noise factor. The standard deviation of the estimation error serves as a key parameter in representing the complexity, coverage, extensiveness, and cost of the implemented network management and information collection procedure. For a single-domain network the effects of Gaussian noise on blocking, throughput, and the probability of excess calls are examined and used to define a measure of performance, the throughput capacity trajectory, which gives maximum throughput levels for fixed packet blocking, packet delay, and probability of excess call constraints. In a multi-domain network a particular network manager/controller may have complete information about its own domain, but limited, aggregated, or untimely information about other domains. Tradeoffs between centralized and distributed decision-making are discussed and a mechanism is provided for comparing various management structures as well as determining good values for admission control thresholds.

13.
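Identifying information related to function calls is the basis of static analysis of binary code and an important clue in malware analysis. Binary code obfuscation hides function information in code by obfuscating the call instruction, the parameter passing process, and the call return process. This greatly increases the difficulty of reverse engineering a program, and the technique is widely used in metamorphic and polymorphic viruses to evade antivirus software. This paper presents a static analysis method: it introduces the concept of an abstract stack graph, gives an algorithm for constructing it, and uses it to effectively identify obfuscated function calls in code.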

14.
While display designers tend to agree that the communication of large amounts of quantitative information calls for the use of graphs, there is less consensus about whether graphs should be used for small, summarized data sets. In the present study, three groups of 16 subjects viewed 11 sets of time series data presented as tables, bar charts, or line graphs. Data sets varied in size (4, 7, or 13 values) and complexity (number and type of departures from linearity). Subjects provided written interpretations of each of the data sets, and these interpretations were scored for (1) overall number of propositions pertaining to the data set as a whole (global content), (2) number of propositions describing relations within a subset of the data (local content), and (3) number of references to specific data values (numeric content). For the larger (7- and 13-point) data sets, interpretations based on bar charts included the greatest overall global content, but line graph interpretations proved to be most sensitive to the actual information content (complexity) of the data sets. The greater sensitivity of the line graphs was still obtained with four-point data sets; however, this advantage was greater for men than for women. For data sets of all sizes, but especially for the smallest sets, gender differences in interpretation content were obtained. These differences are discussed within the context of more general individual differences presumed to exist in graph-reading strategies.

15.
A problem with the location-free nature of cell phones is that callers have difficulty predicting receivers' states, leading to inappropriate calls. One promising solution involves helping callers decide when to interrupt by providing them contextual information about receivers. We tested the effectiveness of different kinds of contextual information by measuring the degree of agreement between receivers' desires and callers' decisions. In a simulation, five groups of participants played the role of 'Callers', choosing between making calls or leaving messages, and a sixth group played the role of 'Receivers', choosing between receiving calls or receiving messages. Callers were provided different contextual information about Receivers' locations, their cell phones' ringer state, the presence of others, or no information at all. Callers provided with contextual information made significantly more accurate decisions than those without it. Our results suggest that different contextual information generates different kinds of improvements: more appropriate interruptions or better avoidance of inappropriate interruptions. We discuss the results and implications for practice in the light of other important considerations, such as privacy and technological simplicity.

16.
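Traffic volume is an important parameter measuring how heavily users use telephone equipment. Because current traffic distributions are markedly three-dimensional, multi-service, and non-Poisson, the Erlang B formula cannot be applied directly. To address this, an evolutionary neural computing method based on the PSO algorithm is proposed from the perspective of computational intelligence; its main steps are acquiring the traffic volume and its related parameters, optimizing the neural network structure, training the network with the PSO algorithm, and computing the traffic volume. In a detailed study of the PHS (小灵通) service in a city in Hebei Province, a model was built from sample data made up of the traffic volume together with parameters such as the radio congestion rate, incoming call completion rate, and call drop rate over roughly the preceding half year. The computed traffic volume is highly accurate, indicating that the method is practical and effective.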
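For context, the classical Erlang B formula mentioned in this abstract gives the blocking probability B(A, N) = (A^N / N!) / sum_{k=0..N} (A^k / k!) for an offered load of A erlangs on N channels, under the assumption of Poisson call arrivals. The sketch below computes it with the standard recurrence B(A, 0) = 1, B(A, n) = A·B(A, n-1) / (n + A·B(A, n-1)). It illustrates only the baseline formula that the abstract says does not fit non-Poisson traffic; it is not the paper's PSO-based neural method, and the function and parameter names are chosen here for illustration.

```python
def erlang_b(offered_load: float, channels: int) -> float:
    """Blocking probability for `offered_load` erlangs offered to `channels` circuits.

    Uses the numerically stable recurrence instead of factorials.
    Assumes Poisson arrivals and blocked calls cleared.
    """
    if offered_load < 0 or channels < 0:
        raise ValueError("offered_load and channels must be non-negative")
    blocking = 1.0  # B(A, 0) = 1: with no circuits, every call is blocked
    for n in range(1, channels + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


# Example: 2 erlangs offered to 2 channels.
# Direct formula: (2^2 / 2!) / (1 + 2 + 2) = 2 / 5.
print(round(erlang_b(2.0, 2), 4))  # 0.4
```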

17.
Applications of multi-objective genetic algorithms (MOGAs) in engineering optimization problems often require numerous function calls. One way to reduce the number of function calls is to use an approximation in lieu of function calls. An approximation involves two steps: design of experiments (DOE) and metamodeling. This paper presents a new approach where both DOE and metamodeling are integrated with a MOGA. In particular, the DOE method reduces the number of generations in a MOGA, while the metamodeling reduces the number of function calls in each generation. In the present approach, the DOE locates a subset of design points that is estimated to better sample the design space, while the metamodeling assists in estimating the fitness of design points. Several numerical and engineering examples are used to demonstrate the applicability of this new approach. The results from these examples show that the proposed approach requires significantly fewer function calls and obtains similar solutions compared to a conventional MOGA and a recently developed metamodeling-assisted MOGA.

18.
In recent decades, an increasing number of employers and job seekers have been relying on Web resources to get in touch and to find a job. If appropriately retrieved and analyzed, the huge number of job vacancies available today on online job portals can provide detailed and valuable information about the Web Labor Market dynamics and trends. In particular, this information can be useful to all actors, public and private, who play a role in the European Labor Market. This paper presents WoLMIS, a system aimed at collecting and automatically classifying multilingual Web job vacancies with respect to a standard taxonomy of occupations. The proposed system has been developed for the Cedefop European agency, which supports the development of European Vocational Education and Training (VET) policies and contributes to their implementation. In particular, WoLMIS allows analysts and Labor Market specialists to make sense of Labor Market dynamics and trends of several countries in Europe, by overcoming linguistic boundaries across national borders. A detailed experimental evaluation is also provided for a set of about 2 million job vacancies, collected from a set of UK and Irish Web job sites from June to September 2015.

19.
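Unknown virus detection based on the Win32 API   Cited by: 3 (self-citations: 1, citations by others: 2)
陈亮, 郑宁, 郭艳华, 徐明, 胡永涛. 计算机应用 (Journal of Computer Applications), 2008, 28(11): 2829-2831
A virus detection method based on behavioral feature vectors is proposed. Each dimension of the feature vector represents one malicious behavior event, and each event is represented by the corresponding Win32 application programming interface (API) calls and their parameters. An automated behavior tracing system (Argus) was implemented to extract the behavioral features. In the experiments, the sample data were analyzed and mutual information was used for attribute reduction of the feature vectors, reducing the number of feature dimensions. The results show that the reduced model still performs well in detecting virus programs that exhibit more than one behavior event.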

20.
Message forwarding (e.g., retweeting on Twitter.com) is one of the most popular functions in many existing microblogs, and a large number of users participate in the propagation of information for any given message. While this large number can generate notable diversity, not all users have the same ability to diffuse messages, which makes it challenging to find the users with truly higher spreadability, those generally rated as interesting and authoritative in diffusing messages. In this paper, a novel method called SpreadRank is proposed to measure the spreadability of users in microblogs, considering both the time interval of retweets and the location of users in information cascades. Experiments were conducted on a real dataset from Twitter containing about 0.26 million users and 10 million tweets, and the results showed that our method is consistently better than the PageRank method applied to the network of retweets and the retweetNum method, which measures spreadability according to the number of retweets. Moreover, we find that a user with more tweets or followers does not always have stronger spreadability in microblogs.
