Similar Documents
20 similar documents found (search time: 859 ms)
1.
We present a directed Markov random field (MRF) model that combines n-gram models, probabilistic context-free grammars (PCFGs), and probabilistic latent semantic analysis (PLSA) for the purpose of statistical language modeling. Even though the composite directed MRF model potentially has an exponential number of loops and becomes a context-sensitive grammar, we are nevertheless able to estimate its parameters in cubic time using an efficient modified Expectation-Maximization (EM) method, the generalized inside–outside algorithm, which extends the inside–outside algorithm to incorporate the effects of the n-gram and PLSA language models. We generalize various smoothing techniques to alleviate the sparseness of n-gram counts in cases where there are hidden variables. We also derive an analogous algorithm to find the most likely parse of a sentence and to calculate the probability of an initial subsequence of a sentence, both generated by the composite language model. Our experimental results on the Wall Street Journal corpus show significant reductions in perplexity compared to the state-of-the-art baseline trigram model with Good–Turing and Kneser–Ney smoothing.
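The generalized inside–outside algorithm above is too involved to reproduce here; as a point of reference only, the sketch below shows the baseline side of the comparison: a trigram model with simple linear-interpolation smoothing (a stand-in for the Good–Turing/Kneser–Ney smoothing the abstract mentions) evaluated by perplexity. All weights and names are illustrative assumptions, not values from the paper.

```python
import math
from collections import Counter

def train_trigram(tokens):
    # Count unigram, bigram, and trigram occurrences in a token stream.
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return uni, bi, tri, len(tokens)

def interp_prob(w1, w2, w3, model, lambdas=(0.1, 0.3, 0.6)):
    # Linear interpolation of unigram, bigram, and trigram estimates.
    # The lambda weights are illustrative, not tuned values.
    uni, bi, tri, n = model
    l1, l2, l3 = lambdas
    p1 = uni[w3] / n if n else 0.0
    p2 = bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0
    p3 = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    return l1 * p1 + l2 * p2 + l3 * p3

def perplexity(tokens, model):
    # Perplexity of the model on a token stream (lower is better).
    logp, count = 0.0, 0
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        logp += math.log(max(interp_prob(w1, w2, w3, model), 1e-12))
        count += 1
    return math.exp(-logp / count)

tokens = "the cat sat on the mat and the cat ran".split()
print(perplexity(tokens, train_trigram(tokens)))
```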

2.
The semantic web vision is one in which rich, ontology-based semantic markup will become widely available. The availability of semantic markup on the web opens the way to novel, sophisticated forms of question answering. AquaLog is a portable question-answering system which takes queries expressed in natural language and an ontology as input, and returns answers drawn from one or more knowledge bases (KBs). We say that AquaLog is portable because the configuration time required to customize the system for a particular ontology is negligible. AquaLog presents an elegant solution in which different strategies are combined in a novel way. It makes use of the GATE NLP platform, string metric algorithms, WordNet and a novel ontology-based relation similarity service to make sense of user queries with respect to the target KB. Moreover, it includes a learning component which ensures that the performance of the system improves over time, in response to the particular community jargon used by end users.
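As a hedged illustration of just the string-metric step mentioned above (AquaLog's real pipeline also involves GATE, WordNet, and its relation similarity service), the sketch below matches a user's phrasing against ontology relation labels; the labels and threshold are invented for the example.

```python
import difflib

# Hypothetical ontology relation labels -- this only illustrates the
# string-metric step, not AquaLog's full query-interpretation pipeline.
RELATION_LABELS = ["worksFor", "locatedIn", "authorOf", "memberOf"]

def best_label(query_term, labels=RELATION_LABELS, threshold=0.6):
    # Return the ontology label most similar to the user's term,
    # or None if nothing clears the similarity threshold.
    scored = [(difflib.SequenceMatcher(None, query_term.lower(),
                                       l.lower()).ratio(), l) for l in labels]
    score, label = max(scored)
    return label if score >= threshold else None

print(best_label("works for"))  # -> "worksFor"
```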

3.
After having recalled some well-known shortcomings linked with the Semantic Web approach to the creation of (application oriented) systems of "rules" – e.g., limited expressiveness, adoption of an Open World Assumption (OWA) paradigm, absence of variables in the original definition of OWL – this paper examines the technical solutions successfully used for implementing advanced reasoning systems according to the NKRL methodology. NKRL (Narrative Knowledge Representation Language) is a conceptual meta-model and a Computer Science environment expressly created to deal, in an 'intelligent' and complete way, with complex and content-rich non-fictional 'narrative' data sources. These include corporate memory documents, news stories, normative and legal texts, medical records, surveillance videos, actuality photos for newspapers and magazines, etc. In this context, we first expound the need for distinguishing between "plain/static" and "structured/dynamic" knowledge and for introducing appropriate (and different) knowledge representation structures for these two types of knowledge. In a structured/dynamic context, we then show how the introduction of "functional roles" – associated with the possibility of making use of n-ary structures – allows us to build up highly 'expressive' rules whose "atoms" can directly represent complex situations, actions, etc., without being restricted to the use of binary clauses. In an NKRL context, "functional roles" are primitive symbols interpreted as "relations" – like "subject", "object", "source", "beneficiary", etc. – that link a semantic predicate with its arguments within an n-ary conceptual formula. Functional roles thus contrast with the "semantic roles" that are equated to ordinary concepts like "student", to be inserted into the "non-sortal" (no direct instances) branch of a traditional ontology.
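To make the notion of functional roles concrete, here is a minimal sketch of an n-ary formula as a data structure: a semantic predicate plus a role-to-argument map. The class layout and instance names are illustrative, not NKRL's actual syntax.

```python
from dataclasses import dataclass, field

@dataclass
class PredicativeOccurrence:
    # One n-ary NKRL-style formula: a semantic predicate plus the
    # functional roles linking it to its arguments. The role names follow
    # the examples in the abstract; the layout itself is illustrative.
    predicate: str
    roles: dict = field(default_factory=dict)  # role name -> argument

event = PredicativeOccurrence(
    predicate="MOVE",
    roles={"subject": "company_1",
           "object": "headquarters_1",
           "source": "paris_",
           "beneficiary": "employees_group_1"},
)
print(event)
```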

4.
With social media at the forefront of today's media context, citizens may perceive they don't need to actively seek news because they will be exposed to news and remain well-informed through their peers and social networks. We label this the "news-finds-me perception," and test its implications for news seeking and political knowledge: "news-finds-me effects." U.S. panel-survey data show that individuals who perceive news will find them are less likely to use traditional news sources and are less knowledgeable about politics over time. Although the news-finds-me perception is positively associated with news exposure on social media, this behavior doesn't facilitate political learning. These results suggest news continues to enhance political knowledge best when actively sought.

5.
Recent years have witnessed the development of large knowledge bases (KBs). Due to the lack of information about the content and schema semantics of KBs, users are often not able to correctly formulate KB queries that return the intended result. In this paper, we consider the problem of failing RDF queries, i.e., queries that return an empty set of answers. Query relaxation is one cooperative technique proposed to solve this problem. In the context of RDF data, several works proposed query relaxation operators and ranking models for relaxed queries. But none of them tried to find the causes of an RDF query failure, given by Minimal Failing Subqueries (MFSs), or the successful queries that have a maximal number of triple patterns, named maXimal Succeeding Subqueries (XSSs). Inspired by previous work in the context of relational databases and recommender systems, we propose two complementary approaches to fill this gap. The lattice-based approach (LBA) leverages the theoretical properties of MFSs and XSSs to efficiently explore the subquery lattice of the failing query. The matrix-based approach computes a matrix that records alternative answers to the failing query together with the triple patterns they satisfy. The skyline of this matrix directly gives the XSSs of the failing query. This matrix can also be used as an index to improve the performance of LBA. The practical interest of these two approaches is shown via a set of experiments conducted on the LUBM benchmark and a comparative study with baseline and related-work algorithms.
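A brute-force sketch of the MFS/XSS notions defined above (it enumerates the whole subquery lattice, ignoring the pruning that makes the paper's LBA efficient; `succeeds` is an assumed oracle that runs a subquery against the KB and reports whether it returns any answer):

```python
from itertools import combinations

def subqueries(patterns):
    # All non-empty proper subsets of a set of triple patterns,
    # largest first (a top-down walk of the subquery lattice).
    n = len(patterns)
    for k in range(n - 1, 0, -1):
        for combo in combinations(patterns, k):
            yield frozenset(combo)

def xss_mfs(patterns, succeeds):
    # Brute-force computation of the XSSs (maximal succeeding subqueries)
    # and MFSs (minimal failing subqueries) of a failing query.
    xss, mfs = [], []
    for sq in subqueries(frozenset(patterns)):
        if succeeds(sq):
            if not any(sq < x for x in xss):
                xss.append(sq)          # not contained in a larger success
        elif all(succeeds(sub) for sub in subqueries(sq)):
            mfs.append(sq)              # fails, but every proper part succeeds
    return xss, mfs

# Toy failing query {t1, t2, t3}: pattern t3 matches nothing in the KB.
succeeds = lambda sq: "t3" not in sq
print(xss_mfs(["t1", "t2", "t3"], succeeds))
# -> XSS = [{t1, t2}], MFS = [{t3}]
```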

6.
We propose multicontext systems (MC systems) as a formal framework for the specification of complex reasoning. MC systems provide the ability to structure the specification of "global" reasoning in terms of "local" reasoning subpatterns. Each subpattern is modeled as a deduction in a context, formally defined as an axiomatic formal system. The global reasoning pattern is modeled as a concatenation of contextual deductions via bridge rules, i.e., inference rules that infer a fact in one context from facts asserted in other contexts. Besides the formal framework, in this article we propose a three-layer architecture designed to specify and automate complex reasoning. At the first level we have object-level contexts (called s-contexts) for domain specifications. Problem-solving principles and, more generally, meta-level knowledge about the application domain are specified in a distinct context, called the Problem-Solving Context (PSC). On top of the s-contexts and PSC, we have a further context, called MT, where it is possible to specify strategies to control multicontext reasoning spanning the s-contexts and PSC. We show how GETFOL can be used as a computer tool for the implementation of MC systems and for the automation of multicontext deductions. © 1995 John Wiley & Sons, Inc.
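A minimal sketch of a bridge rule at work, with three contexts named after the architecture described above; the facts and the single rule are invented for the example.

```python
# Each context holds its own set of facts; a bridge rule infers a fact in
# one context from facts established in other contexts.
contexts = {
    "s1": {"a"},          # object-level context with an asserted fact
    "PSC": {"solve(a)"},  # problem-solving context
    "MT": set(),          # strategy/meta context, initially empty
}

# Bridge rule: if "a" holds in s1 and "solve(a)" holds in PSC,
# then conclude "plan(a)" in MT.
bridge_rules = [((("s1", "a"), ("PSC", "solve(a)")), ("MT", "plan(a)"))]

changed = True
while changed:  # naive saturation: apply bridge rules until a fixpoint
    changed = False
    for premises, (ctx, fact) in bridge_rules:
        if all(f in contexts[c] for c, f in premises) and fact not in contexts[ctx]:
            contexts[ctx].add(fact)
            changed = True

print(contexts["MT"])  # -> {'plan(a)'}
```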

7.
This paper gives a precise mathematical analysis of the behaviour of "hiring above the median" strategies for a problem in the context of "on-line selection under uncertainty" that is known (at least in the computer science literature) as the "hiring problem". Here a sequence of candidates is interviewed sequentially, and based on the "score" of the current candidate an immediate decision whether to hire him or not has to be made. Under "hiring above the median" selection rules, a new candidate is hired if he has a score better than the median score of the already recruited candidates. Under the natural probabilistic model assuming that the ranks of the first n candidates form a random permutation, we show exact and asymptotic results for various quantities of interest that describe the dynamics of the hiring process and the quality of the hired staff. In particular, we characterize the limiting distribution of the number of hired candidates in a sequence of n candidates, which reveals the somewhat surprising effect that slight changes in the selection rule – taking the "lower" or the "upper" median as the threshold – have a strong influence on the asymptotic behaviour. We thus considerably extend previous analyses of such selection rules (Krieger et al., Ann. Appl. Probab., 17:360–385, 2007; Broder et al., Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1184–1193, ACM/SIAM, New York/Philadelphia, 2008; Archibald and Martinez, Proceedings of the 21st International Conference on Formal Power Series and Algebraic Combinatorics (FPSAC 2009), Discrete Mathematics and Theoretical Computer Science, pp. 63–76, 2009). Furthermore, we discuss connections between the hiring process and the Chinese restaurant process introduced by Pitman (Combinatorial Stochastic Processes, Springer, Berlin, 2006).
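The selection rule itself is simple to state in code. Below is a hedged simulation sketch of "hiring above the median", including the lower-vs-upper median switch whose asymptotic impact the paper analyzes; it only samples the process and does not reproduce the exact or limiting results.

```python
import random

def hire_above_median(scores, use_lower_median=True):
    # The first candidate is always hired; each later candidate is hired iff
    # their score beats the median of the already-hired scores. With an even
    # number of hires, either the lower or the upper median serves as the
    # threshold -- the choice the paper shows to matter asymptotically.
    hired = [scores[0]]
    for s in scores[1:]:
        hired.sort()
        m = len(hired)
        if m % 2 == 1:
            median = hired[m // 2]
        else:
            median = hired[m // 2 - 1] if use_lower_median else hired[m // 2]
        if s > median:
            hired.append(s)
    return len(hired)

# Ranks of the first n candidates as a random permutation, as in the model.
n = 10000
scores = random.sample(range(n), n)
print(hire_above_median(scores, use_lower_median=True))
print(hire_above_median(scores, use_lower_median=False))
```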

8.
Fuzzy control is a methodology that translates "if"-"then" rules, Aj1(x1) & … & Ajn(xn) → Bj(u), formulated in terms of a natural language, into an actual control strategy u(x). Implication of uncertain statements is much more difficult to understand than "and," "or," and "not." So, fuzzy control methodologies usually start by translating "if"-"then" rules into statements that contain only "and," "not," and "or." The first such translation was proposed by Mamdani in his pioneering article on fuzzy control. According to this article, a fuzzy control is reasonable iff one of the rules is applicable, i.e., either the first rule is applicable (A11(x1) & … & A1n(xn) & B1(u)), or the second one is applicable, etc. This approach turned out to be very successful, and it is still used in the majority of fuzzy control applications. However, as R. Yager noticed, in some cases this approach is not ideal: namely, if for some x we know what u(x) should be, and add this crisp rule to our rules, then the resulting fuzzy control for this x may differ from the desired value u(x). To overcome this drawback, Yager proposed to assign priorities to the rules, so that crisp rules get the highest priority, and to use these priorities when translating the rules into a control strategy u(x). In this article, we show that a natural modification of Mamdani's approach can solve this problem without adding any ad hoc priorities. © 1995 John Wiley & Sons, Inc.
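For reference, here is a toy sketch of the Mamdani-style translation described above for one input x: each rule fires with strength A_j(x), consequents are clipped at that strength, and the clipped sets are combined with "or" (max) before defuzzification. The triangular membership functions and rule base are invented for the example.

```python
def tri(a, b, c):
    # Triangular membership function with peak at b on [a, c].
    return lambda v: max(0.0, min((v - a) / (b - a), (c - v) / (c - b)))

# Rule base: (antecedent A_j, consequent B_j) pairs, both toy triangles.
rules = [
    (tri(0.0, 0.2, 0.5), tri(0.0, 0.3, 0.6)),
    (tri(0.3, 0.6, 1.0), tri(0.4, 0.7, 1.0)),
]

def control(x, resolution=101):
    # Clip each consequent at the rule's firing strength, aggregate the
    # clipped sets with max ("one of the rules is applicable"), then
    # defuzzify by the centroid of the aggregated set.
    us = [i / (resolution - 1) for i in range(resolution)]
    agg = [max(min(A(x), B(u)) for A, B in rules) for u in us]
    total = sum(agg)
    return sum(u * m for u, m in zip(us, agg)) / total if total else 0.0

print(round(control(0.25), 3))
```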

9.
Text is composed of words and phrases. In the bag-of-words model, phrases in text are split into words. This may discard the semantics of phrases, which, in turn, may give an inconsistent relatedness score between two texts. Our objective is to apply phrase relatedness in conjunction with word relatedness to the text relatedness task, to improve text relatedness performance. We adopt two existing word relatedness measures, based on Google n-grams and Global Vectors for Word Representation (GloVe) respectively, and incorporate each with an existing Google n-gram-based phrase relatedness method to compute text relatedness. The combination of Google n-gram-based word and phrase relatedness performs better than Google n-gram-based word relatedness alone, achieving a higher weighted mean of Pearson's r (0.639 vs. 0.619) on the 14 data sets from the series of Semantic Evaluation workshops SemEval-2012, SemEval-2013, and SemEval-2015. Similarly, the combination of GloVe-based word relatedness and Google n-gram-based phrase relatedness performs better than GloVe-based word relatedness alone (0.619 vs. 0.605) on the same 14 data sets. On the SemEval-2012, SemEval-2013, and SemEval-2015 data sets, the text relatedness results obtained from the combination of Google n-gram-based word and phrase relatedness ranked 24th, 3rd, and 31st out of 89, 90, and 73 text relatedness systems, respectively.
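A toy sketch of the general recipe (word relatedness lifted to text relatedness by greedy alignment); `word_rel` is a stub standing in for the Google n-gram or GloVe-based measures, and the averaging scheme is illustrative rather than the paper's exact combination.

```python
def word_rel(w1, w2):
    # Placeholder word relatedness measure; a real system would use the
    # Google n-gram or GloVe-based scores mentioned in the abstract.
    return 1.0 if w1 == w2 else 0.0

def text_rel(t1, t2):
    # Align each word with its best match in the other text, both ways,
    # and average the two directional scores.
    w1, w2 = t1.lower().split(), t2.lower().split()
    s12 = sum(max(word_rel(a, b) for b in w2) for a in w1) / len(w1)
    s21 = sum(max(word_rel(a, b) for b in w1) for a in w2) / len(w2)
    return (s12 + s21) / 2

print(text_rel("a cat sat", "the cat sat down"))
```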

10.
Strategies for Expressing Concise, Helpful Answers
An intelligent help system needs to take into account the user's knowledge when formulating answers. This allows the system to provide more concise answers, because it can avoid telling users things that they already know. Since these concise answers concentrate exclusively on pertinent new information, they are also easier to understand. Information about the user's knowledge also allows the system to take advantage of the user's prior knowledge in formulating explanations. The system can provide better answers by referring to the user's prior knowledge in the explanation (e.g., through use of similes). This process of refining answers is called answer expression. The process of answer expression has been implemented in the UCExpress component of UC (UNIX Consultant), a natural language system that helps the user solve problems in using the UNIX operating system. UCExpress separates answer expression into two phases: pruning and formatting. In the pruning phase, subconcepts of the answer are pruned by being marked as already known by the user (and hence do not need to be generated), or marked as candidates for generating anaphora or ellipsis (since they are part of the conversational context). In the formatting phase, UCExpress uses information about the user's prior domain knowledge to select among specialized expository formats, such as similes and examples, for expressing information to the user. These formats allow UCExpress to present different types of information to the user in a clear, concise manner. The result of UCExpress' answer expression process is an internal form that a tactical level generator can easily use to produce good English.
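A minimal sketch of the pruning phase as described above: concepts the user already knows are dropped, and concepts in the conversational context are flagged for anaphora/ellipsis. The user model and answer representation are illustrative, not UC's internal form.

```python
def prune(answer_concepts, user_knows, context):
    # Drop known subconcepts; flag contextual ones for elliptical generation.
    kept = []
    for c in answer_concepts:
        if c in user_knows:
            continue                      # no need to generate what is known
        mode = "ellipsis" if c in context else "full"
        kept.append((c, mode))
    return kept

answer = ["rm deletes files", "open a shell", "type rm <file>"]
print(prune(answer, user_knows={"open a shell"}, context={"rm deletes files"}))
```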

11.
It is commonly believed that "intelligent" manipulation of natural language (NL) requires the translation of texts into some internal form (e.g., deep structure). However, many disadvantages also arise from the use of internal forms. In order to avoid most of them, reasoning directly on texts should be considered. But this alternative has its own drawbacks and is not generally taken seriously, since it entails a dramatic increase in the already "explosive" nature of the process. We discuss here a pattern-matching technique and a strategy, called caricature, that are shown to counteract the effect of this explosion, and we give some results and prospects.

12.
Adjectives are common in natural language, and their usage and semantics have been studied broadly. In recent years, with the rapid growth of knowledge bases (KBs), many knowledge-based question answering (KBQA) systems have been developed to answer users' natural language questions over KBs. A fundamental task of such systems is to transform natural language questions into structured queries, e.g., SPARQL queries. Thus, such systems require knowledge about how natural language expressions are represented in KBs, including adjectives. In this paper, we specifically address the problem of representing adjectives over KBs. We propose a novel approach, called Adj2SP, to represent adjectives as SPARQL query patterns. Adj2SP contains a statistic-based approach and a neural network-based approach, both of which effectively reduce the search space for adjective representations and overcome the lexical gap between input adjectives and their target representations. Two adjective representation datasets are built for evaluation, with adjectives used in QALD and Yahoo! Answers, as well as their representations over DBpedia. Experimental results show that Adj2SP can generate representations of high quality and significantly outperforms several alternative approaches in F1-score. Furthermore, we publish Lark, a lexicon for adjective representations over KBs. Current KBQA systems show an improvement of over 24% in F1-score by integrating Adj2SP.
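To fix intuitions, here is a hedged sketch of what "an adjective as a SPARQL query pattern" can look like. The adjective-to-pattern table is hand-written for the example (Adj2SP learns such mappings automatically), and the DBpedia properties shown are plausible targets, not outputs of the paper.

```python
# Illustrative adjective -> SPARQL graph-pattern templates over DBpedia-style
# data; "?x" is the slot to be bound to the question's focus variable.
ADJ_PATTERNS = {
    "American": "?x dbo:birthPlace dbr:United_States .",
    "French":   "?x dbo:birthPlace dbr:France .",
}

def adjective_to_sparql(adj, var="?x"):
    # Instantiate the learned pattern for an adjective with a target variable.
    pattern = ADJ_PATTERNS.get(adj)
    if pattern is None:
        raise KeyError(f"no pattern for adjective {adj!r}")
    return pattern.replace("?x", var)

# "American actors" becomes a type constraint plus the adjective's pattern.
query = f"SELECT ?p WHERE {{ ?p a dbo:Actor . {adjective_to_sparql('American', '?p')} }}"
print(query)
```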

13.
This article explores readings of (micro)blogging services as outlets for playful, "imperfect" language. Adopting a transcultural approach, it examines a blog category that has attracted scarce academic attention to date: the creative worker's blog. Through a qualitative analysis of metalinguistic statements by 14 Russian writer-bloggers, the author tests two interdependent hypotheses: (H1) through metalinguistic statements and pragmatic strategies, writers present language play and "imperfect" language as prototypical for new media; and (H2) if H1 is correct, the writer-blogger's preference for "imperfect" language caters to a broader cultural-philosophical anxiety – one of foregrounding imperfection as an aesthetic counter-response to digital perfection.

14.
The aim of the present study (n = 113) was to examine how (objective and subjective) information on peers' preparation, confidence, and past performance can support students in answering correctly in audience response systems (aka clickers). The analysis shows that on the "challenging" questions, in which answers diverged, students who received additional information about peers' self-reported preparation and/or confidence outperformed students who were only given the objective percentages, with or without past-performance feedback. In addition, students expressed a positive attitude towards the activity, commenting on its usefulness for better understanding course material and identifying misconceptions.

15.
This article examines global and indigenous knowledge sharing with a focus on electronic information exchange in Nepal's development sector. Drawing on lessons from two local examples, a strategic framework is presented for realising the potential of Information and Communications Technology (ICT) in countries where knowledge sharing and access are constrained in a variety of ways.

The "iCAPACITY framework" outlined for the South Asian context integrates the interdependent themes of Content, Access, and Partnership, highlighting the critical components that require consideration when building the capacity for ICT usage and knowledge sharing in a developing-country context. Practical initial steps are put forward that recognise the primary concern of holistically addressing economic, social and environmental issues, with the overall priority of alleviating poverty through broad-based participation.

The paper concludes that developing countries such as Nepal currently occupy what may be metaphorically referred to as "the thin air of cyberspace", where the essential knowledge that needs to be shared locally or globally is not yet widely available or accessible. In this context, particular care has to be taken in formulating localised strategies and models that can improve the quality of this "air" and lead to a situation where development efforts can truly be enhanced by the IT revolution.

16.
The 2003 heat wave killed nearly 15,000 people in France. It was a stealth killer. "We did not notice anything", as the Minister of Health declared to the Parliamentary Commission. It is of crucial importance to understand the keys to this collective failure, which has much in common with the Chicago experience in 1995 – the lessons of which had been neither grasped nor learned. A four-layered challenge explains the fiasco. The emergency challenge, which is not the realm of bureaucracies outside the "9/11" bodies. The crisis management challenge, largely documented since the 80s and the 90s, but still poorly known by most organisations, in France and elsewhere. The unconventional crisis challenge, emerging more and more today with "outside-of-the-box" scenarios – and for which very few are ready to prepare, in any country in the world. The "texture" challenge, when the whole fabric of our complex systems (rather than just some specific segment) is suddenly deeply affected — an entirely new front line in the crisis world, which urges a switch from a mechanical or architectural to a more "biological" approach to read, seize, and handle emerging crises. The 2003 heat fiasco compels us to prepare for far more than climate-related crises. It calls for a fresh and bold look at our crisis paradigms. As General Foch said: "Gunfire kills, but so do outdated visions".

17.
The relationship between programs and the set of partial correctness assertions that they satisfy constitutes a Galois connection. The topology resulting from this Galois connection is closely related to the Lindenbaum topology for the language in which these partial correctness assertions are stated. This relationship provides us with a tool for understanding the incompleteness of Hoare logics and for answering certain natural questions about the connection between the relational semantics and the partial correctness assertion semantics for programs, especially in connection with the question of modularity of programs. Two questions to which we shall find topological answers in this paper are "When is a language expressive for a program?" and "When can we have rules of inference which are adequate to infer the properties of the complex program α#β from those of its components α, β?". We also obtain a natural answer to the question "What can the set {(A, B) | {A}α{B} is true} look like for arbitrary α?".
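As a hedged reconstruction (not the paper's exact definitions), the Galois connection in question can be written as follows, with Th mapping a set of programs to the assertions they all satisfy, and Mod mapping a set of assertions to the programs satisfying all of them:

```latex
\[
  \mathrm{Th}(P) = \{\, (A,B) \mid \forall \alpha \in P .\ \{A\}\,\alpha\,\{B\} \text{ is true} \,\},
  \qquad
  \mathrm{Mod}(S) = \{\, \alpha \mid \forall (A,B) \in S .\ \{A\}\,\alpha\,\{B\} \text{ is true} \,\}.
\]
\[
  S \subseteq \mathrm{Th}(P) \ \Longleftrightarrow\ P \subseteq \mathrm{Mod}(S)
\]
% (Th, Mod) is an antitone Galois connection between the two power sets;
% the closed sets of the induced closure operators give rise to the
% topology discussed in the abstract.
```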

18.
In this paper, we present a mining algorithm to improve the efficiency of finding large itemsets. Based on the concept of prediction proposed in the (n, p) algorithm, our method considers the data dependency in the given transactions to predict promising and non-promising candidate itemsets. Our method estimates a different support threshold for each level, derived from a data-dependency parameter, and determines directly whether an item should be included in a promising candidate itemset. In this way, we maintain the efficiency of finding large itemsets by reducing both the number of scans of the input dataset and the number of candidate itemsets. Experimental results show our method is more efficient than the Apriori and (n, p) algorithms when the minimum support value is small.
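For contrast with the prediction-based method, here is a compact sketch of the classical level-wise Apriori baseline: one scan of the transactions per level, with (k+1)-candidates joined from frequent k-itemsets (the subset-pruning step is omitted for brevity). The toy transactions are invented.

```python
from itertools import combinations

def apriori(transactions, min_support):
    # Level-wise search: count candidates against the transactions (one scan
    # per level), keep those meeting min_support, and join survivors into
    # candidates one item larger.
    transactions = [frozenset(t) for t in transactions]
    level = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        kept = {c for c, n in counts.items() if n >= min_support}
        frequent.update({c: counts[c] for c in kept})
        # Join step: unions of frequent k-itemsets that form (k+1)-itemsets.
        level = {a | b for a, b in combinations(kept, 2)
                 if len(a | b) == len(a) + 1}
    return frequent

txns = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(apriori(txns, min_support=3))
```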

19.
The problem of managing and querying inconsistent databases has been deeply investigated in the last few years. As the problem of consistent query answering is hard in the general case, most of the techniques proposed so far have an exponential complexity. Polynomial techniques have been proposed only for restricted forms of constraints (such as functional dependencies) and queries. In this paper, a technique for computing "approximate" consistent answers in polynomial time is proposed, which works in the presence of a wide class of constraints (namely, full constraints) and Datalog queries. The proposed approach is based on a repairing strategy where update operations assigning an undefined truth value to the "reliability" of tuples are allowed, along with updates inserting or deleting tuples. The result of a repair can be viewed as a three-valued database which satisfies the specified constraints. In this regard, a new semantics (namely, partial semantics) is introduced for constraint satisfaction in the context of three-valued databases, which aims at capturing the intuitive meaning of constraints under three-valued logic. It is shown that, in order to compute "approximate" consistent query answers, it suffices to evaluate queries by taking into account a unique repair (called the deterministic repair), which in some sense "summarizes" all the possible repairs. The answers so obtained are "approximate" in the sense that they are safe (true and false atoms in the answers are, respectively, true and false under the classical two-valued semantics), but not complete.
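A toy sketch of the repair idea described above: tuples involved in a constraint violation receive the third truth value "undefined" for their reliability rather than being deleted outright. The relation, the key constraint, and the labels are illustrative only; real full constraints are more general than this single-key check.

```python
TRUE, FALSE, UNDEF = "true", "false", "undefined"

def deterministic_repair(rows, key_index):
    # rows: list of tuples; key_index: column that must be a key.
    # Tuples in a key-violating group keep their data but get an
    # "undefined" reliability, yielding a three-valued database.
    labels = {r: TRUE for r in rows}
    groups = {}
    for r in rows:
        groups.setdefault(r[key_index], []).append(r)
    for group in groups.values():
        if len(group) > 1:            # key violated: reliability unknown
            for r in group:
                labels[r] = UNDEF
    return labels

rows = [("john", 30), ("john", 31), ("mary", 25)]
print(deterministic_repair(rows, key_index=0))
```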

20.
This paper focuses on the techniques used in an NKRL environment (NKRL = Narrative Knowledge Representation Language) to deal with a general problem affecting so-called "semantic/conceptual annotation" techniques. These techniques, mainly ontology-based, aim at "annotating" multimedia documents by representing, in some way, the "inner meaning/deep content" of these documents. For documents of sufficient size, the content modeling operations are executed separately on 'significant fragments' of the documents, e.g., "sentences" for natural language texts or "segments" (minimal units for story advancement) in a video context. The general problem then concerns the possibility of collecting all the partial conceptual representations into a global one. This integration operation must, moreover, be carried out in such a way that the meaning of the full document can go beyond the simple addition of the 'meanings' conveyed by the single fragments. In this context, NKRL makes use of second-order knowledge representation structures, "completive construction" and "binding occurrences", for collecting within the conceptual annotation of a whole "narrative" the basic building blocks corresponding to the representation of its composing elementary events. These solutions, of a quite general nature, are discussed in some depth in this paper, which also includes a short "state of the art" of the annotation domain and some comparisons with the different methodologies proposed in the past for solving the above 'integration' problem.
