Similar literature
 Found 20 similar documents; search took 929 ms
1.
Imperfect information inevitably appears in real situations for a variety of reasons. Although efforts have been made to incorporate imperfect data into classification techniques, there are still many limitations as to the type of data, uncertainty, and imprecision that can be handled. In this paper, we present a Fuzzy Random Forest ensemble for classification and show its ability to handle imperfect data in both the learning and the classification phases. We then describe the types of imperfect data it supports. We devise an augmented ensemble that can operate with other types of imperfect data: crisp, missing, probabilistic uncertainty, and imprecise (fuzzy and crisp) values. Additionally, we perform experiments with imperfect datasets created for this purpose and datasets used in other papers to show the advantage of being able to express the true nature of imperfect information.

2.
Traditional supervised learning requires ground-truth labels for the training data, which can be difficult to collect in many cases. In contrast, crowdsourcing learning collects noisy annotations from multiple non-expert workers and infers the latent true labels through some aggregation approach. In this paper, we observe that existing deep crowdsourcing work does not sufficiently model worker correlations, even though previous non-deep learning approaches have shown them to be helpful for learning. We propose a deep generative crowdsourcing learning approach that incorporates the strengths of Deep Neural Networks (DNNs) and exploits worker correlations. The model comprises a DNN classifier as a prior and an annotation generation process. A mixture model of workers' capabilities within each class is introduced into the annotation generation process for worker correlation modeling. For an adaptive trade-off between model complexity and data fitting, we implement fully Bayesian inference. Based on the natural-gradient stochastic variational inference techniques developed for the Structured Variational AutoEncoder (SVAE), we combine variational message passing for conjugate parameters and stochastic gradient descent for DNN parameters into a unified framework for efficient end-to-end optimization. Experimental results on 22 real crowdsourcing datasets demonstrate the effectiveness of the proposed approach.

3.
Credit scoring models are commonly built on a sample of accepted applicants whose repayment and behaviour information is observable once the loan has been issued. However, in practice these models are regularly applied to new applicants, which may cause sample bias. This bias is even more pronounced in online lending, where over 90% of total loan requests are rejected. Reject inference is a technique to infer the outcomes for rejected applicants and incorporate them in the scoring system, with the expectation that predictive accuracy is improved. This paper extends previous studies in two main ways: firstly, we propose a new method involving machine learning to solve the reject inference problem; secondly, the Semi-supervised Support Vector Machines model is found to improve the performance of scoring models compared to the industrial benchmark of logistic regression, based on 56,626 accepted and 563,215 rejected online consumer loans.

4.
According to efficient markets theory, information is an important factor that affects market performance and serves as a source of first-hand evidence in decision making, in particular with the rapid rise of Internet technologies in recent years. However, a lack of knowledge and inference ability prevents current decision support systems from processing the wide range of available information. In this paper, we propose a common-sense knowledge-supported news model. Compared with previous work, our model is the first to incorporate broad common-sense knowledge into a decision support system, thereby improving the news analysis process through the application of a graphic random-walk framework. A prototype and experiments based on Hong Kong stock market data have demonstrated that common-sense knowledge is an important factor in building financial decision models that incorporate news information.

5.
PrDB: managing and exploiting rich correlations in probabilistic databases   Cited by: 2 in total, 0 self-citations, 2 by others
Due to numerous applications producing noisy data, e.g., sensor data, experimental data, data from uncurated sources, information extraction, etc., there has been a surge of interest in the development of probabilistic databases. Most probabilistic database models proposed to date, however, fail to meet the challenges of real-world applications on two counts: (1) they often restrict the kinds of uncertainty that the user can represent; and (2) the query processing algorithms often cannot scale up to the needs of the application. In this work, we define a probabilistic database model, PrDB, that uses graphical models, a state-of-the-art probabilistic modeling technique developed within the statistics and machine learning community, to model uncertain data. We show how this results in a rich, complex yet compact probabilistic database model, which can capture the commonly occurring uncertainty models (tuple uncertainty, attribute uncertainty), more complex models (correlated tuples and attributes) and allows compact representation (shared and schema-level correlations). In addition, we show how query evaluation in PrDB translates into inference in an appropriately augmented graphical model. This allows us to easily use any of a myriad of exact and approximate inference algorithms developed within the graphical modeling community. While probabilistic inference provides a generic approach to solving queries, we show how the use of shared correlations, together with a novel inference algorithm that we developed based on bisimulation, can speed query processing significantly. We present a comprehensive experimental evaluation of the proposed techniques and show that even with a few shared correlations, significant speedups are possible.

6.
Prior research in botnet detection has used the bot lifecycle to build detection systems. These systems, however, use rule-based decision engines which lack automated adaptability and learning, accuracy tunability, the ability to cope with gaps in training data, and the ability to incorporate local security policies. To counter these limitations, we propose to replace the rigid decision engines in contemporary bot detectors with a more formal Bayesian inference engine. Bottleneck, our prototype implementation, builds confidence in bot infections based on the causal bot lifecycle encoded in a Bayesian network. We evaluate Bottleneck by applying it as a post-processing decision engine on lifecycle events generated by two existing bot detectors (BotHunter and BotFlex) on two independently-collected datasets. Our experimental results show that Bottleneck consistently achieves comparable or better accuracy than the existing rule-based detectors when the test data is similar to the training data. For differing training and test data, Bottleneck, due to its automated learning and inference models, easily surpasses the accuracies of rule-based systems. Moreover, Bottleneck’s stochastic nature allows its accuracy to be tuned with respect to organizational needs. Extending Bottleneck’s Bayesian network into an influence diagram allows for local security policies to be defined within our framework. Lastly, we show that Bottleneck can also be extended to incorporate evidence trustscore for false alarm reduction.

7.
A general framework for adaptive processing of data structures   Cited by: 2 in total, 0 self-citations, 2 by others
A structured organization of information is typically required by symbolic processing. On the other hand, most connectionist models assume that data are organized according to relatively poor structures, like arrays or sequences. The framework described in this paper is an attempt to unify adaptive models like artificial neural nets and belief nets for the problem of processing structured information. In particular, relations between data variables are expressed by directed acyclic graphs, where both numerical and categorical values coexist. The general framework proposed in this paper can be regarded as an extension of both recurrent neural networks and hidden Markov models to the case of acyclic graphs. In particular, we study the supervised learning problem as the problem of learning transductions from an input structured space to an output structured space, where transductions are assumed to admit a recursive hidden state-space representation. We introduce a graphical formalism for representing this class of adaptive transductions by means of recursive networks, i.e., cyclic graphs where nodes are labeled by variables and edges are labeled by generalized delay elements. This representation makes it possible to incorporate the symbolic and subsymbolic nature of data. Structures are processed by unfolding the recursive network into an acyclic graph called the encoding network. In so doing, inference and learning algorithms can be easily inherited from the corresponding algorithms for artificial neural networks or probabilistic graphical models.

8.
We formulate the problem of 3D human pose estimation and tracking as one of inference in a graphical model. Unlike traditional kinematic tree representations, our model of the body is a collection of loosely-connected body parts. In particular, we model the body using an undirected graphical model in which nodes correspond to parts and edges to kinematic, penetration, and temporal constraints imposed by the joints and the world. These constraints are encoded using pair-wise statistical distributions that are learned from motion-capture training data. Human pose and motion estimation is formulated as inference in this graphical model and is solved using Particle Message Passing (PaMPas). PaMPas is a form of non-parametric belief propagation that uses a variation of particle filtering that can be applied over a general graphical model with loops. The loose-limbed model and decentralized graph structure allow us to incorporate information from “bottom-up” visual cues, such as limb and head detectors, into the inference process. These detectors enable automatic initialization and aid recovery from transient tracking failures. We illustrate the method by automatically tracking people in multi-view imagery using a set of calibrated cameras and present a quantitative evaluation using the HumanEva dataset.

9.
In this work we develop a new multivariate technique to produce regressions with interpretable coefficients that are close to and of the same signs as the pairwise regression coefficients. Using a multiobjective approach to incorporate multiple and pairwise regressions into one objective, we reduce this technique to an eigenproblem that represents a hybrid between regression and principal component analyses. We show that our approach corresponds to a specific scheme of ridge regression with a total matrix added to the matrix of correlations. Scope and purpose: One of the main goals of multiple regression modeling is to assess the importance of predictor variables in determining the prediction. However, in practical applications inference about the coefficients of regression can be difficult because real data is correlated and multicollinearity causes instability in the coefficients. In this paper we present a new technique to create a regression model that maintains the interpretability of the coefficients. We show with real data that it is possible to generate a model with coefficients that are similar to easily interpretable pairwise relations of predictors with the dependent variable, and this model is similar to the regular multiple regression model in predictive ability.

10.
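Fuzzy set theory is suited to modeling the uncertainty and vagueness found in some experimental data, and fuzzy inference systems offer structured knowledge representation in the form of fuzzy IF-THEN rules but lack adaptability. Neural networks are inherently highly adaptive to their environment and have a mechanism for learning from past data, but fuzzy neural network (FNN) models based on linear inference, used as a fuzzy inference method, can neither capture the final relationships among parameters nor influence the resulting fuzzy sets. We therefore propose a multi-level fuzzy neural network (Multi-FNN) that uses hard C-means clustering and evolutionary fuzzy granulation, and treats linear inference as a form of approximate reasoning, to obtain the relationships between information granules and fuzzy sets.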

11.
Malicious users can exploit the correlation among data to infer sensitive information from a series of seemingly innocuous data accesses. Thus, we develop an inference violation detection system to protect sensitive data content. Based on data dependency, database schema and semantic knowledge, we construct a semantic inference model (SIM) that represents the possible inference channels from any attribute to the pre-assigned sensitive attributes. The SIM is then instantiated to a semantic inference graph (SIG) for query-time inference violation detection. For a single-user case, when a user poses a query, the detection system will examine his/her past query log and calculate the probability of inferring sensitive information. The query request will be denied if the inference probability exceeds the prespecified threshold. For multi-user cases, the users may share their query answers to increase the inference probability. Therefore, we develop a model to evaluate collaborative inference based on the query sequences of collaborators and their task-sensitive collaboration levels. Experimental studies reveal that information authoritativeness, communication fidelity and honesty in collaboration are three key factors that affect the level of achievable collaboration. An example is given to illustrate the use of the proposed technique to prevent multiple collaborative users from deriving sensitive information via inference.

12.
Until recently, the lack of ground truth data has hindered the application of discriminative structured prediction techniques to the stereo problem. In this paper we use ground truth data sets that we have recently constructed to explore different model structures and parameter learning techniques. To estimate parameters in Markov random fields (MRFs) via maximum likelihood one usually needs to perform approximate probabilistic inference. Conditional random fields (CRFs) are discriminative versions of traditional MRFs. We explore a number of novel CRF model structures including a CRF for stereo matching with an explicit occlusion model. CRFs require expensive inference steps for each iteration of optimization and inference is particularly slow when there are many discrete states. We explore belief propagation, variational message passing and graph cuts as inference methods during learning and compare with learning via pseudolikelihood. To accelerate approximate inference we have developed a new method called sparse variational message passing which can reduce inference time by an order of magnitude with negligible loss in quality. Learning using sparse variational message passing improves upon previous approaches using graph cuts and allows efficient learning over large data sets when energy functions violate the constraints imposed by graph cuts.

13.
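Word embeddings play an important role in natural language processing and have attracted growing attention from researchers in recent years. However, traditional word embedding learning methods often rely on large unannotated text corpora while ignoring semantic information about words, such as the semantic relations between them. To make full use of existing domain knowledge bases, which contain rich word-level semantic information, this paper proposes a word embedding learning method that incorporates semantic information (KbEMF). The method adds a domain-knowledge constraint term to a matrix-factorization model for learning word embeddings, so that word pairs with strong semantic relations obtain similar embeddings. Results on word analogy reasoning and word similarity tasks on real data show that KbEMF yields a clear performance improvement over existing models.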

14.
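Data association is one of the fundamental problems in joint surveillance systems based on visual sensor networks. To address data association in visual sensor networks in the presence of missed detections, this paper proposes a high-order spatio-temporal observation model and, on that basis, formulates the data association problem as a dynamic Bayesian network. An exact inference algorithm for data association is given and its computational complexity is analyzed. Two approximate inference algorithms, based on different independence assumptions, are then proposed to reduce the computational cost. The proposed inference algorithms are embedded in the EM framework so that they can be applied when the target appearance model is unknown. Simulation and experimental results demonstrate the effectiveness of the proposed method.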

15.
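Scientific databases consist mainly of data and programs. To combine the available data resources and scientific programs into a powerful information management and computing system, it is necessary to identify and exploit the semantic relations between them. An ontology-based framework is proposed to capture these relations and to support decisions about how to combine scientific resources effectively. On this basis, a layered open architecture is created that not only addresses traditional design difficulties but also generates workflows dynamically and minimizes manual intervention, with the aim of improving the efficiency with which scientific databases are used.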

16.
17.
18.
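Traditional semantic data stream reasoning uses forward or backward chaining to produce deterministic answers, but it is inefficient for complex transitive-rule reasoning and cannot meet the timeliness requirements of real-time data stream processing scenarios. A knowledge representation method based on a joint embedding model is therefore proposed and applied to semantic data stream processing. Rules and fact triples are jointly embedded and trained with a deep learning model. In the reasoning stage, reasoning templates are built from the rules involved in a query, and the deep learning model predicts and classifies the triples generated by these templates; the results are output as the query and reasoning answers. Experiments show that, for complex rule reasoning, real-time semantic data stream reasoning based on knowledge representation learning effectively reduces latency while maintaining good reasoning accuracy and hit rate.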

19.
In this paper we describe a technique for controlling inference, called meta-level inference, and a program for algebraic manipulation, PRESS, which embodies this technique. In PRESS, algebraic expressions are manipulated by a series of methods. The appropriate method is chosen by meta-level inference and itself uses meta-level reasoning to select and apply rewrite rules to the current expression. The use of meta-level inference is shown to drastically cut down on search, lead to clear and modular programs, aid the proving of properties of the program and enable the automatic learning of both new algebraic facts and new control information.

20.