Similar Documents
20 similar documents found (search time: 62 ms)
1.

This paper is about people: about understanding how learning and communication mutually influence one another, allowing people to infer each other's communicative behavior. To understand how people learn to communicate, we draw on existing theories: logical theories of learning and communication, situated cognition, and activity theory. Thus, this paper applies existing theories of conversation analysis, human learning, and memory to a range of scenarios of actual human conversations, and introduces a new way of analyzing conversations. We recorded and observed actual human communications on the Web and applied these theories to analyze the resulting communication scenarios. We describe preliminary results of these analyses, in particular our analysis of the recorded conversational structures, and illustrate how the re-enacting and re-sequencing of conversational structures is adapted to the context (i.e., the environment) moment by moment. From our analyses, we found that people have internal rules (e.g., a combinatorial rule system) that can be related to how a person learns, adapts, and merges protocols situated in their context of communication. Our long-term goal is to use these analyses to improve human communication on the Grid.

2.
How can we discover interesting patterns from time-evolving high-speed data streams? How can we analyze the data streams quickly and accurately, with little space overhead? How can we guarantee that the found patterns are self-consistent? High-speed data streams have been receiving increasing attention due to their wide applications such as sensors, network traffic, and social networks. The most fundamental task on a data stream is frequent pattern mining; in particular, focusing on recency is important in real applications. In this paper, we develop two algorithms for discovering recently frequent patterns in data streams. First, we propose TwMinSwap to find top-k recently frequent items in data streams, a deterministic version of our motivating algorithm TwSample, which provides theoretical guarantees based on item sampling. TwMinSwap improves on TwSample in terms of speed, accuracy, and memory usage. Both require only O(k) memory space and do not require any prior knowledge of the stream, such as its length or the number of distinct items. Second, we propose TwMinSwap-Is to find top-k recently frequent itemsets in data streams. We especially focus on keeping the discovered itemsets self-consistent, the most important property for reliable results, while using O(k) memory space under the assumption of a constant itemset size. Through extensive experiments, we demonstrate that TwMinSwap outperforms all competitors in terms of accuracy and memory usage, with fast running time. We also show that TwMinSwap-Is is more accurate than its competitor and discovers recently frequent itemsets of reasonably large size (at most 5–7, depending on the dataset). Thanks to TwMinSwap and TwMinSwap-Is, we report interesting discoveries in real-world data streams, including differences in trends between the winning and losing U.S. presidential candidates, and temporal human contact patterns.
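The abstract's O(k)-memory top-k setting can be illustrated with the classic Space-Saving counter scheme. This is only a generic sketch of that setting: TwMinSwap's actual swap rule is not given here, and it additionally weights counts toward recent items.

```python
def space_saving_topk(stream, k):
    """Track approximate top-k frequent items with only O(k) counters.

    Classic Space-Saving sketch: when a new item arrives and all k
    counters are taken, the smallest counter is evicted and its count
    (plus one) is inherited, so counts may overestimate but never
    underestimate the true frequency of a tracked item.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # Evict the minimum counter and inherit its count + 1.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters

print(space_saving_topk(list("abracadabra"), 2))
```

With k=2 the true top item "a" (frequency 5) survives in the counter table, with a count that upper-bounds its true frequency.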

3.
In this paper, we present ParaMiner, a generic and parallel algorithm for closed pattern mining. ParaMiner is built on the principles of pattern enumeration in strongly accessible set systems. Its efficiency is due to a novel dataset reduction technique (which we call EL-reduction), combined with a novel technique for performing dataset reduction in a parallel execution on a multi-core architecture. We illustrate ParaMiner's genericity by using it to solve three different pattern mining problems: frequent itemset mining, frequent connected relational graph mining, and gradual itemset mining. We prove the soundness and completeness of ParaMiner. Furthermore, our experiments show that despite being a generic algorithm, ParaMiner can compete with specialized state-of-the-art algorithms designed for the pattern mining problems mentioned above. Moreover, for the particular problem of gradual itemset mining, ParaMiner outperforms the state-of-the-art algorithm by two orders of magnitude.

4.
Given a large collection of co-evolving online activities, such as searches for the keywords “Xbox”, “PlayStation” and “Wii”, how can we find patterns and rules? Are these keywords related? If so, are they competing against each other? Can we forecast the volume of user activity for the coming month? We conjecture that online activities compete for user attention in the same way that species in an ecosystem compete for food. We present EcoWeb (i.e., Ecosystem on the Web), an intuitive model designed as a non-linear dynamical system for mining large-scale co-evolving online activities. Our second contribution is a novel, parameter-free, and scalable fitting algorithm, EcoWeb-Fit, that estimates the parameters of EcoWeb. Extensive experiments on real data show that EcoWeb is effective, in that it can capture long-range dynamics and meaningful patterns such as seasonalities, and practical, in that it can provide accurate long-range forecasts. EcoWeb consistently outperforms existing methods in terms of both accuracy and execution speed.
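The ecosystem analogy can be made concrete with a textbook Lotka-Volterra competition model. This is only a generic sketch of the kind of non-linear dynamical system the abstract describes, not EcoWeb's actual equations or its EcoWeb-Fit estimator; all parameter values below are invented.

```python
def simulate_competition(x0, r, K, alpha, steps, dt=0.01):
    """Discrete-time Lotka-Volterra competition dynamics.

    Each activity x_i grows logistically toward its carrying capacity
    K[i] while being suppressed by competitors through the competition
    matrix alpha (alpha[i][i] = 1 is self-limitation).
    """
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        nxt = []
        for i in range(n):
            pressure = sum(alpha[i][j] * x[j] for j in range(n))
            nxt.append(x[i] + dt * r[i] * x[i] * (1 - pressure / K[i]))
        x = nxt
    return x

# Two keywords competing symmetrically for user attention.
final = simulate_competition([1.0, 2.0], r=[1.0, 1.0], K=[10.0, 10.0],
                             alpha=[[1.0, 0.5], [0.5, 1.0]], steps=5000)
print(final)
```

With competition coefficients below 1 the two activities coexist: despite different starting volumes, both converge to the same equilibrium K/(1 + 0.5) ≈ 6.67.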

5.
ReFlO is a framework and interactive tool to record and systematize domain knowledge used by experts to derive complex pipe-and-filter (PnF) applications. Domain knowledge is encoded as transformations that alter PnF graphs by refinement (adding more details), flattening (removing modular boundaries), and optimization (substituting inefficient PnF graphs with more efficient ones). All three kinds of transformations arise in reverse-engineering legacy PnF applications. We present the conceptual foundation and tool capabilities of ReFlO, illustrate how parallel PnF applications are designed and generated, and how domain-specific libraries of transformations are developed.

6.
PeopleViews is a Human-Computation-based environment for constructing constraint-based recommenders. Constraint-based recommender systems support the handling of complex items, where constraints (e.g., between user requirements and item properties) can be taken into account. When applying such systems, users articulate their requirements and the recommender identifies solutions on the basis of the constraints in a recommendation knowledge base. In this paper, we provide an overview of the PeopleViews environment and show how recommendation knowledge can be collected from its users on the basis of micro-tasks. We also show how PeopleViews exploits this knowledge to automatically generate recommendation knowledge bases. In this context, we compare the prediction quality of the recommendation approaches integrated in PeopleViews using a DSLR camera dataset.
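The core recommendation step (matching user requirements against item properties through constraints) can be sketched in a few lines. The catalog, property names, and constraints below are all invented for illustration; PeopleViews' knowledge bases are learned from micro-tasks rather than hand-written.

```python
# Hypothetical DSLR catalog with invented properties.
cameras = [
    {"id": "A", "price": 450, "megapixels": 24, "video": True},
    {"id": "B", "price": 900, "megapixels": 30, "video": False},
    {"id": "C", "price": 300, "megapixels": 18, "video": True},
]

# A toy recommendation knowledge base: constraints linking user
# requirements to item properties.
constraints = [
    lambda req, item: item["price"] <= req["max_price"],
    lambda req, item: item["megapixels"] >= req["min_megapixels"],
    lambda req, item: item["video"] or not req["needs_video"],
]

def recommend(requirements, catalog):
    """Return the ids of items satisfying every constraint."""
    return [item["id"] for item in catalog
            if all(c(requirements, item) for c in constraints)]

print(recommend({"max_price": 500, "min_megapixels": 20,
                 "needs_video": True}, cameras))
```

Only camera "A" satisfies all three constraints for these requirements; an empty result would trigger the diagnosis/relaxation step typical of constraint-based recommenders.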

7.
In this paper we study the construction and enumeration of binary words in $\{0,1\}^*$ having more 1's than 0's and avoiding a set of cross-bifix-free patterns. We give a particular succession rule, called the jumping and marked succession rule, which describes the growth of such words according to their number of ones. Moreover, the problem of associating a word to a path in the generating tree obtained from the succession rule is solved by introducing an algorithm which constructs all binary words having more 1's than 0's and then kills those containing the forbidden patterns. Finally, we give the generating function of such words according to the number of ones.
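The objects being enumerated can be checked by brute force for small lengths (the paper derives the counts via a succession rule and a generating function instead). The forbidden factor "11" below is just an illustrative pattern, not one from the paper.

```python
from itertools import product

def count_words(n, forbidden):
    """Count length-n binary words with more 1's than 0's that avoid
    every pattern in `forbidden` as a factor (contiguous substring)."""
    total = 0
    for bits in product("01", repeat=n):
        w = "".join(bits)
        if w.count("1") > w.count("0") and not any(p in w for p in forbidden):
            total += 1
    return total

# Avoiding "11": only words like "1", "101", "10101", ... survive.
print([count_words(n, forbidden=["11"]) for n in range(1, 7)])
```

For this pattern the sequence alternates 1, 0, 1, 0, …: a word with more 1's than 0's and no two adjacent 1's must interleave them exactly, which is only possible at odd lengths.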

8.
Precision-oriented search results, such as those typically returned by the major search engines, are vulnerable to issues of polysemy. When the same term refers to different things, the dominant sense is preferred in the rankings of search results. In this paper, we propose a novel two-box technique for Web search that utilizes contextual terms provided by users for query disambiguation, making it possible to prefer other senses without altering the original query. A prototype system, Bobo, has been implemented. In Bobo, contextual terms are used to capture domain knowledge from users, help estimate the relevance of search results, and route them towards a user-intended domain. A major advantage of Bobo is that a wide range of domain knowledge can be effectively utilized: helpful contextual terms do not even need to co-occur with query terms on any page. We have extensively evaluated Bobo on benchmark datasets, demonstrating the utility and effectiveness of our approach.
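The two-box idea can be sketched as a re-ranking step: the query itself stays unchanged, and a second box of contextual terms steers the ordering of results. This toy overlap score stands in for Bobo's actual relevance estimation, which is more involved.

```python
def rerank(results, context_terms):
    """Re-rank result snippets by word overlap with the user-supplied
    contextual terms from the second box; the original query (and hence
    the candidate set) is untouched."""
    context = set(t.lower() for t in context_terms)
    def score(doc):
        return len(set(doc.lower().split()) & context)
    return sorted(results, key=score, reverse=True)

# The dominant "car" sense of an ambiguous query is demoted once the
# user supplies wildlife-oriented context terms.
results = [
    "jaguar car dealership prices",
    "jaguar habitat rainforest predator",
]
print(rerank(results, ["animal", "rainforest", "predator"]))
```

Note that the context term "animal" never appears in either snippet; only "rainforest" and "predator" contribute, mirroring the point that helpful contextual terms need not all co-occur with the query.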

9.
In this paper, we present Esteem (Emergent Semantics and cooperaTion in multi-knowledgE EnvironMents), a community-based P2P platform for supporting semantic collaboration among a set of independent peers without prior reciprocal knowledge or predefined relationships. The goal of Esteem is to go beyond existing state-of-the-art solutions for P2P knowledge sharing and to provide an integrated platform for both data and service discovery. A distinguishing feature of Esteem is the use of semantic communities to explicitly give shape to the collective knowledge and expertise of peer groups with similar interests. The key techniques of Esteem presented in the paper concern shuffling-based communication, ontology and service matchmaking, context management, and quality-aware data integration. An application example of data and service discovery in the health-care domain is presented, along with a discussion of system and user evaluation results.
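Esteem's shuffling-based communication is not detailed in this abstract; as a rough illustration of the generic gossip-shuffle pattern such P2P platforms build on, here is a toy exchange in which two peers swap random subsets of their partial views of the network (all peer names invented).

```python
import random

def shuffle_exchange(cache_a, cache_b, subset_size, rng):
    """One gossip-shuffle step between two peers: each sends a random
    subset of its partial view (cache of known peer ids) and merges
    what it receives, so views mix over time without any global index."""
    out_a = rng.sample(sorted(cache_a), min(subset_size, len(cache_a)))
    out_b = rng.sample(sorted(cache_b), min(subset_size, len(cache_b)))
    return cache_a | set(out_b), cache_b | set(out_a)

rng = random.Random(0)
a, b = shuffle_exchange({"p1", "p2", "p3"}, {"p4", "p5"}, 2, rng)
print(sorted(a), sorted(b))
```

Real shuffle protocols also bound the cache size and drop the entries that were sent away; this sketch only shows the merge direction of the exchange.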

10.
Forecasting air-pollutant levels is an important issue, due to their adverse effects on public health, and often a legislative necessity. The advantage of Bayesian methods is their ability to provide density predictions which can easily be transformed into ordinal or binary predictions given a set of thresholds. We develop a Bayesian approach to forecasting PM\(_{10}\) and O\(_3\) levels that efficiently deals with extensive amounts of input parameters, and test whether it outperforms classical models and experts. The new approach is used to fit models for PM\(_{10}\) and O\(_3\) level forecasting that can be used in daily practice. We also introduce a novel approach for comparing models to experts based on estimated cost matrices. The results for diverse air quality monitoring sites across Slovenia show that Bayesian models outperform classical models in both PM\(_{10}\) and O\(_3\) predictions. The proposed models perform better than experts in PM\(_{10}\) and are on par with experts in O\(_3\) predictions—where experts already base their predictions on predictions from a statistical model. A Bayesian approach—especially using Gaussian processes—offers several advantages: superior performance, robustness to overfitting, more information, and the ability to efficiently adapt to different cost matrices.
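The point that a density forecast converts directly into binary predictions given a threshold can be shown with a Gaussian predictive density. The numbers below are invented for illustration (the paper's models are Gaussian-process based; 50 µg/m³ is the EU daily PM10 limit).

```python
from math import erf, sqrt

def exceed_prob(mu, sigma, threshold):
    """P(X > threshold) when the predictive density is N(mu, sigma^2),
    via the Gaussian CDF expressed with the error function."""
    z = (threshold - mu) / sigma
    return 0.5 * (1 - erf(z / sqrt(2)))

# Hypothetical forecast density for tomorrow's PM10: mean 42, sd 8.
p = exceed_prob(42.0, 8.0, 50.0)
print(round(p, 3))

# Binary exceedance alarm given a decision threshold derived from a
# cost matrix (the 0.3 cutoff is invented).
print(p > 0.3)
```

Changing the cost matrix only moves the probability cutoff; the fitted density itself is reused, which is the efficiency the abstract's last sentence refers to.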

11.
12.
The NetMine framework allows the characterization of traffic data by means of data mining techniques. NetMine performs generalized association rule extraction to profile communications, detect anomalies, and identify recurrent patterns. Association rule extraction is a widely used exploratory technique to discover hidden correlations among data. However, it is usually driven by frequency constraints on the extracted correlations. Hence, it entails either (i) generating a huge number of rules which are difficult to analyze, or (ii) pruning rare itemsets even if their hidden knowledge might be relevant. To overcome these issues, NetMine exploits a novel algorithm to efficiently extract generalized association rules, which provide a high-level abstraction of the network traffic and allow the discovery of unexpected and more interesting traffic rules. The proposed technique exploits (user-provided) taxonomies to drive the pruning phase of the extraction process. Extracted correlations are automatically aggregated into more general association rules according to a frequency threshold. Finally, extracted rules are classified into groups according to their semantic meaning, allowing a domain expert to focus on the most relevant patterns. Experiments performed on different network dumps show the efficiency and effectiveness of the NetMine framework in characterizing traffic data.
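The taxonomy-driven generalization step can be sketched as follows: items whose support falls below a threshold are lifted to their taxonomy parent, so their evidence contributes to more general rules instead of being pruned. The taxonomy, transactions, and threshold are invented; NetMine interleaves this with the rule-extraction process itself.

```python
from collections import Counter

def generalize(transactions, taxonomy, min_support):
    """Replace items with relative support below min_support by their
    taxonomy parent (repeatedly, while a parent exists), so rare items
    feed generalized association rules rather than being discarded."""
    n = len(transactions)
    support = Counter(i for t in transactions for i in set(t))
    def lift(item):
        while support[item] / n < min_support and item in taxonomy:
            item = taxonomy[item]
        return item
    return [sorted({lift(i) for i in t}) for t in transactions]

# Toy traffic: individual hosts are rare, their subnet is not.
taxonomy = {"10.0.0.1": "subnet-10", "10.0.0.2": "subnet-10"}
txs = [["10.0.0.1", "dns"], ["10.0.0.2", "dns"], ["10.0.0.1", "http"]]
print(generalize(txs, taxonomy, min_support=0.7))
```

After generalization, a rule such as "subnet-10 → dns" becomes frequent even though no single host address is.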

13.
We present a technique for numerically solving convection-diffusion problems in domains $\varOmega$ with curved boundary. The technique consists in approximating the domain $\varOmega$ by polyhedral subdomains $\mathsf{D}_h$ on which a finite element method is used to solve for the approximate solution. The approximation is then suitably extended to the remaining part of the domain $\varOmega$. This approach allows for the use of only polyhedral elements; there is no need to fit the boundary in order to obtain an accurate approximation of the solution. To achieve this, the boundary condition on the border of $\varOmega$ is transferred to the border of $\mathsf{D}_h$ by using simple line integrals. We apply this technique to the hybridizable discontinuous Galerkin method and provide extensive numerical experiments showing that, whenever the distance of $\mathsf{D}_h$ to $\partial\varOmega$ is of the order of the mesh size $h$, the convergence properties of the resulting method are the same as those for the case $\varOmega = \mathsf{D}_h$. We also show numerical evidence indicating that the ratio of the $L^2(\varOmega)$ norm of the error in the scalar variable computed with $d>0$ to that computed with $d=0$ remains constant (and fairly close to one) whenever the distance $d$ is proportional to $\min\{h, Pe^{-1}\}/(k+1)^2$, where $Pe$ is the so-called Péclet number.

14.
Mining interesting user behavior patterns in mobile commerce environments
Discovering user behavior patterns from mobile commerce environments is an essential topic with wide applications, such as planning physical shopping sites, maintaining e-commerce on mobile devices, and managing online shopping websites. Mobile sequential pattern mining is an emerging issue in this area: it considers users' moving paths and purchased items in mobile commerce environments to find the complete set of mobile sequential patterns. However, an important factor, namely users' interests, has not been considered in past studies. In practical applications, users may only be interested in patterns satisfying some user-specified constraints. Traditional methods that ignore such constraints pose two crucial problems: (1) users may need to filter out uninteresting patterns from a huge number of patterns, and (2) finding the complete set of patterns, including the uninteresting ones, incurs high computational cost and long runtimes. In this paper, we address the problem of mining mobile sequential patterns with two kinds of constraints, namely importance constraints and pattern constraints. Here, we consider the importance of an item to be its utility (i.e., profit) in the mobile commerce environment. An efficient algorithm, IM-Span (Interesting Mobile Sequential Pattern mining), is proposed for dealing with the two kinds of constraints. Several effective strategies are employed to reduce the search space and computational cost. Experimental results show that the proposed algorithm significantly outperforms state-of-the-art algorithms under various conditions.
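The two constraint kinds can be illustrated with a toy post-filter over already-mined (location, item) patterns. All names, utilities, and thresholds below are invented, and IM-Span pushes these constraints into the mining process to prune the search space rather than filtering afterwards, which is exactly the cost problem the abstract describes.

```python
# Invented per-item utilities (i.e., profits).
utility = {"coffee": 2, "phone": 120, "case": 15}

# Hypothetical mined mobile sequential patterns: (location, item) steps.
patterns = [
    [("A", "coffee"), ("B", "phone")],
    [("A", "coffee"), ("C", "coffee")],
    [("B", "phone"), ("B", "case")],
]

def interesting(pattern, min_utility, required_item=None):
    """Importance constraint: total utility >= min_utility.
    Pattern constraint: the pattern must contain required_item."""
    total = sum(utility[item] for _, item in pattern)
    has_required = required_item is None or any(
        item == required_item for _, item in pattern)
    return total >= min_utility and has_required

kept = [p for p in patterns
        if interesting(p, min_utility=100, required_item="phone")]
print(len(kept))
```

The low-profit coffee-only pattern is dropped even though it may be highly frequent, which is the point of utility-based importance constraints.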

15.
The problem of mining collections of trees to identify common patterns, called frequent subtrees (FSTs), arises often when trying to interpret the results of phylogenetic analysis. FST mining generalizes the well-known maximum agreement subtree problem. Here we present EvoMiner, a new algorithm for mining frequent subtrees in collections of phylogenetic trees. EvoMiner is an Apriori-like levelwise method, which uses a novel phylogeny-specific constant-time candidate generation scheme, an efficient fingerprinting-based technique for downward closure, and a lowest-common-ancestor-based support counting step that requires neither costly subtree operations nor database traversal. Our algorithm achieves speedups of up to 100 times or more over Phylominer, the current state-of-the-art algorithm for mining phylogenetic trees. EvoMiner can also work in depth-first enumeration mode to use less memory at the expense of speed. We demonstrate the utility of FST mining as a way to extract meaningful phylogenetic information from collections of trees when compared to maximum agreement subtrees and majority-rule trees—two commonly used approaches in phylogenetic analysis for extracting consensus information from a collection of trees over a common leaf set.

16.
In this paper, we identify the problems of current semantic and hybrid search systems, which seek to bridge structured and unstructured data, and propose solutions. We introduce a novel input mechanism for hybrid semantic search that combines the clean and concise input mechanisms of keyword-based search engines with the expressiveness of the input mechanisms provided by semantic search engines. This interactive input mechanism can be used to formulate ontology-aware search queries without prior knowledge of the ontology. Furthermore, we propose a system architecture for automatically fetching relevant unstructured data, complementing structured data stored in a knowledge base, to create a combined index. This combined index can be used to conduct hybrid semantic searches which leverage information from structured and unstructured sources. We present the reference implementation Hybrid Semantic Search System (\(HS^3\)), which uses the combined index to put hybrid semantic search into practice and implements the interactive ontology-enhanced keyword-based input mechanism. For demonstration purposes, we apply \(HS^3\) to the tourism domain. We present performance test results and the results of a user evaluation. Finally, we provide instructions on how to apply \(HS^3\) to arbitrary domains.

17.
18.
Sequential pattern mining (SPM) is an important data mining problem with broad applications. SPM is hard due to the huge number of intermediate subsequences to be considered. State-of-the-art approaches for SPM (e.g., PrefixSpan; Pei et al. 2001) are largely based on the pattern-growth approach, where for each frequent prefix subsequence only its related suffix subsequences need to be considered, and the database is recursively projected into smaller ones. Many authors have promoted the use of constraints to focus on the most promising patterns according to the interests of the end user. The top-k SPM problem is also used to cope with the difficulty of thresholding and to control the number of solutions. Methods developed for SPM and top-k SPM, though efficient, are locked into a rather rigid search strategy and suffer from a lack of declarativity and flexibility. Indeed, adding new constraints usually amounts to changing the data structures used in the core of the algorithm, and combining new constraints often requires new development. Recent works (e.g., Kemmar et al. 2014; Négrevergne and Guns 2015) have investigated the use of Constraint Programming (CP) for SPM. However, despite their nice declarative aspects, all these models have scaling problems due to the huge size of their constraint networks. To address this issue, we propose the Prefix-Projection global constraint, which encapsulates both the subsequence relation and the frequency constraint. Its filtering algorithm relies on the principle of projected databases, which keeps in the variables' domains only values leading to a frequent pattern in the database. The Prefix-Projection filtering algorithm enforces domain consistency on the variable succeeding the current frequent prefix in polynomial time.
This global constraint also allows for a straightforward implementation of additional constraints such as size, item membership, regular expressions, and any combination of them. Experimental results show that our approach clearly outperforms existing CP approaches and competes well with state-of-the-art methods on large datasets for mining frequent sequential patterns, sequential patterns under various constraints, and top-k sequential patterns. Unlike existing CP methods, our approach achieves better scalability.
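The projected-database principle behind both PrefixSpan and the Prefix-Projection constraint can be sketched in a few lines. This is a simplified single-item projection over sequences of items (real implementations handle itemset sequences and project on a full prefix pattern):

```python
def project(db, item):
    """Project a sequence database on `item`: for each sequence that
    contains it, keep only the suffix after its first occurrence.
    Support of a pattern extended by `item` can then be counted on
    this smaller projected database."""
    out = []
    for seq in db:
        if item in seq:
            out.append(seq[seq.index(item) + 1:])
    return out

db = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"]]
proj = project(db, "a")
print(proj)  # suffixes after the first 'a' in each sequence

# Support of the pattern <a, c> = projected sequences containing 'c'.
print(sum(1 for s in proj if "c" in s))
```

Recursing on the projected database is what keeps only values leading to frequent patterns: items absent from the projection can never extend the current prefix frequently.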

19.
Source code terms such as method names and variable types are often different from conceptual words mentioned in a search query. This vocabulary mismatch problem can make code search inefficient. In this paper, we present COde voCABUlary (CoCaBu), an approach to resolving the vocabulary mismatch problem when dealing with free-form code search queries. Our approach leverages common developer questions and the associated expert answers to augment user queries with the relevant, but missing, structural code entities in order to improve the performance of matching relevant code examples within large code repositories. To instantiate this approach, we build GitSearch, a code search engine, on top of GitHub and Stack Overflow Q&A data. We evaluate GitSearch in several dimensions to demonstrate that (1) its code search results are correct with respect to user-accepted answers; (2) the results are qualitatively better than those of existing Internet-scale code search engines; (3) our engine is competitive against web search engines, such as Google, in helping users solve programming tasks; and (4) GitSearch provides code examples that are acceptable or interesting to the community as answers for Stack Overflow questions.

20.
The NYU Tvoc project applies the method of translation validation to verify that optimized code is semantically equivalent to the unoptimized code by establishing, for each run of the optimizing compiler, a set of verification conditions (VCs) whose validity implies the correctness of the optimized run. The core of Tvoc is Tvoc-sp, which handles structure-preserving optimizations, i.e., optimizations that do not alter the inner loop structure. The underlying proof rule, Val, on whose soundness Tvoc-sp is based, requires, among other things, generating invariants at each “cutpoint” of the control graph of both the source and target code. The current implementation of Tvoc-sp employs somewhat naïve fixpoint computations to obtain the invariants. In this paper, we propose an alternative method to compute invariants based on simple data-flow analysis techniques.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.) | 京ICP备09084417号