Found 20 similar documents (search time: 0 ms)
1.
Automated model searches using information criteria are used for the estimation of linear single equation models. Genetic algorithms are described and used for this purpose. These algorithms are shown to be a practical method for model selection when the number of sub-models is very large. Several examples are presented, including tests for bivariate Granger causality and seasonal unit roots. Automated selection of an autoregressive distributed lag model for the consumption function in the US is also undertaken. JEL classifications: C32, C69
2.
Given a set of models and some training data, we would like to find the model that best describes the data. Finding the model with the lowest generalization error is a computationally expensive process, especially if the number of testing points is high or if the number of models is large. Optimization techniques such as hill climbing or genetic algorithms are helpful but can end up with a model that is arbitrarily worse than the best one or cannot be used because there is no distance metric on the space of discrete models. In this paper we develop a technique called racing that tests the set of models in parallel, quickly discards those models that are clearly inferior and concentrates the computational effort on differentiating among the better models. Racing is especially suitable for selecting among lazy learners since training requires negligible expense, and incremental testing using leave-one-out cross-validation is efficient. We use racing to select among various lazy learning algorithms and to find relevant features in applications ranging from robot juggling to lesion detection in MRI scans.
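The racing idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes per-point errors bounded in [0, 1] and uses Hoeffding-style confidence intervals to decide when a model is clearly inferior. The names `race`, `hoeffding_radius`, `delta`, and the toy error streams are hypothetical.

```python
import math

def hoeffding_radius(n, delta, error_range=1.0):
    """Half-width of a Hoeffding confidence interval after n observations."""
    return error_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def race(error_streams, delta=0.05):
    """Race models on shared test points and drop those that are clearly worse.

    error_streams maps a model name to a list of per-point errors in [0, 1]
    (an assumption of this sketch). Returns the sorted names of the survivors.
    """
    alive = set(error_streams)
    totals = {name: 0.0 for name in error_streams}
    n_points = min(len(errors) for errors in error_streams.values())
    for i in range(n_points):
        # Only surviving models are evaluated on the next test point.
        for name in alive:
            totals[name] += error_streams[name][i]
        n = i + 1
        radius = hoeffding_radius(n, delta)
        # Smallest upper bound on the mean error among surviving models.
        best_upper = min(totals[name] / n + radius for name in alive)
        # A model survives only if its lower bound does not exceed that bound.
        alive = {name for name in alive
                 if totals[name] / n - radius <= best_upper}
    return sorted(alive)

# Toy, deterministic error streams (hypothetical data).
streams = {
    "good": [0.1] * 200,
    "close": [0.15] * 200,
    "bad": [0.9] * 200,
}
survivors = race(streams)
```

With these toy streams the clearly inferior model is discarded after a dozen points, while the two close competitors keep receiving test points, which is where the computational saving comes from.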
3.
4.
Information Filtering: Selection Mechanisms in Learning Systems (total citations: 2; self-citations: 2; citations by others: 2)
Knowledge has traditionally been considered to have a beneficial effect on the performance of problem solvers, but recent studies indicate that knowledge acquisition is not necessarily a monotonically beneficial process, because additional knowledge sometimes leads to a deterioration in system performance. This paper is concerned with the problem of harmful knowledge: that is, knowledge whose removal would improve a system's performance. In the first part of the paper a unifying framework, called the information filtering model, is developed to define the various alternative methods for eliminating such knowledge from a learning system where selection processes, called filters, may be inserted to remove potentially harmful knowledge. These filters are termed selective experience, selective attention, selective acquisition, selective retention, and selective utilization. The framework can be used by developers of learning systems as a guide for selecting an appropriate filter to reduce or eliminate harmful knowledge. In the second part of the paper, the framework is used to identify a suitable filter for solving a problem caused by the acquisition of harmful knowledge in a learning system called Lassy. Lassy is a system that improves the performance of a PROLOG interpreter by utilizing acquired domain-specific knowledge in the form of lemmas stating previously proved results. It is shown that the particular kind of problems that arise with this system are best solved using a novel utilization filter that blocks the use of lemmas in attempts to prove subgoals that have a high probability of failing.
5.
Geometric Information Criterion for Model Selection (total citations: 3; self-citations: 0; citations by others: 3)
Kenichi Kanatani 《International Journal of Computer Vision》1998,26(3):171-189
In building a 3-D model of the environment from image and sensor data, one must fit to the data an appropriate class of models, which can be regarded as a parametrized manifold, or geometric model, defined in the data space. In this paper, we present a statistical framework for detecting degeneracies of a geometric model by evaluating its predictive capability in terms of the expected residual and derive the geometric AIC. We show that it allows us to detect singularities in a structure-from-motion analysis without introducing any empirically adjustable thresholds. We illustrate our approach by simulation examples. We also discuss the application potential of this theory for a wide range of computer vision and robotics problems.
6.
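To address the fact that, in distributed information retrieval, different collections contribute differently to the final retrieval results, this paper proposes a collection selection method based on the LDA topic model. The method first obtains a description of each collection through query-based sampling. Second, it builds an LDA topic model to compute the topical relevance between the query and documents. Third, it estimates the combined relevance between the query and the documents in the sample set by combining keyword relevance with topical relevance, and from this estimates the relevance between the query and each collection. Finally, it selects the M most relevant collections for retrieval. The experiments use Rm, P@n, and MAP as evaluation metrics to verify the performance of the collection selection method. The results show that the method locates the collections containing many relevant documents more accurately, improving both the recall and the precision of the retrieval results.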
7.
James P. Hoffmann, Christopher D. Ellingwood, Osei M. Bonsu, Daniel E. Bentil 《Genetic Programming and Evolvable Machines》2004,5(2):229-241
This paper describes an evolutionary algorithm-based approach to model selection and demonstrates its effectiveness in using the information content of ecological data to choose the correct model structure. Experiments with a modified genetic algorithm are described that combine parsimony with a novel gene regulation mechanism. This combination creates evolvable switches that implement functional variable-length genomes in the GA that allow for simultaneous model selection and parameter fitting. In effect, the GA orchestrates a competition among a community of models. Parsimony is implemented via the Akaike Information Criterion, and gene regulation uses a modulo function to overload the gene values and create an evolvable binary switch. The approach is shown to successfully specify the correct model structure in experiments with a nested set of polynomial test models and complex biological simulation models, even when Gaussian noise is added to the data.
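The fitness side of such a scheme can be illustrated with a short sketch. It is not the authors' encoding: it assumes integer genes in which `gene % 2` acts as the on/off switch and `gene // 2` (times a hypothetical `scale`) gives the coefficient of a polynomial term, and it uses the common least-squares form of the AIC, n ln(RSS/n) + 2k. The genetic search itself (selection, crossover, mutation) is omitted.

```python
import math

def decode(genome, scale=0.5):
    """Split each integer gene into (active, coefficient).

    Assumed encoding for this sketch: the parity of the gene is the binary
    switch, and the integer quotient by 2, scaled, is the coefficient.
    """
    return [(gene % 2 == 1, (gene // 2) * scale) for gene in genome]

def predict(genome, x):
    """Evaluate the polynomial built from the active terms only."""
    return sum(coef * x ** power
               for power, (active, coef) in enumerate(decode(genome))
               if active)

def aic(genome, xs, ys):
    """Least-squares AIC: n * ln(RSS / n) + 2k, with k = active terms."""
    n = len(xs)
    rss = sum((y - predict(genome, x)) ** 2 for x, y in zip(xs, ys))
    k = sum(1 for active, _ in decode(genome) if active)
    # Guard against log(0) for a perfect fit.
    return n * math.log(max(rss, 1e-12) / n) + 2 * k

# Toy data close to y = 1 + 2x (hypothetical).
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.1, 6.9, 9.1]

parsimonious = [5, 9, 0]   # terms 1 and 2x switched on, x^2 switched off
redundant = [5, 9, 1]      # same fit, but x^2 switched on with coefficient 0
aic_parsimonious = aic(parsimonious, xs, ys)
aic_redundant = aic(redundant, xs, ys)
```

Both genomes fit the data equally well, so the AIC penalty of 2 per active term is what makes the genome with the unused term switched off the fitter one.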
8.
Advertisement (ad) selection plays an important role in sponsored search, since it is an upstream component and will heavily influence the effectiveness of the subsequent auction mechanism. However, mo...
9.
Zhiwu Lu 《Neural Processing Letters》2007,25(1):17-30
In Gaussian mixture modeling, it is crucial to select the number of Gaussians, or the mixture model, for a sample data set. Under regularization theory, we aim to solve this kind of model selection problem by implementing entropy regularized likelihood (ERL) learning on Gaussian mixtures via a batch gradient learning algorithm. Simulation experiments demonstrate that this gradient ERL learning algorithm can select an appropriate number of Gaussians automatically during parameter learning on a sample data set and leads to a good estimate of the parameters of the actual Gaussian mixture, even when two or more of the actual Gaussians overlap strongly. We further give an adaptive gradient implementation of ERL learning on Gaussian mixtures, followed by a theoretical analysis, and find a mechanism of generalized competitive learning implied in ERL learning.
10.
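Ranking is a very important topic in information retrieval. Although algorithms for learning-to-rank models have long been studied in depth, feature selection for learning-to-rank algorithms has received little attention. In practice, many feature selection methods designed for classification are applied directly to learning to rank. Because ranking and classification differ significantly, feature selection algorithms designed specifically for ranking are needed. After introducing the commonly used feature selection methods for learning to rank, this paper proposes a new feature selection method for learning to rank that is suited to QA problems: tournament ranking feature selection. Experimental results show that the new method brings significant improvements both in feature extraction efficiency and in reducing the dimensionality of the feature vectors.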
11.
Graph neural networks (GNNs) have shown great power in learning on graphs. However, it is still a challenge for GNNs to model information far away from the source node. The ability to preserve global information can enhance graph representation and hence improve classification precision. In this paper, we propose a new learning framework named G-GNN (Global information for GNN) to address the challenge. First, the global structure and global attribute features of each node are obtained via unsupervised pre-training, and those global features preserve the global information associated with the node. Then, using the pre-trained global features and the raw attributes of the graph, a set of parallel kernel GNNs is used to learn different aspects from these heterogeneous features. Any general GNN can be used as a kernel and easily obtain the ability to preserve global information, without having to alter its own algorithm. Extensive experiments have shown that state-of-the-art models, e.g., GCN, GAT, GraphSAGE and APPNP, can achieve improvement with G-GNN on three standard evaluation datasets. In particular, we establish new benchmark precision records on Cora (84.31%) and Pubmed (80.95%) when learning on attributed graphs.
12.
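Feature selection is a very important preprocessing step in machine learning, and neighborhood mutual information is an effective method that can directly handle continuous or discrete features. However, feature selection methods based on neighborhood mutual information generally adopt a heuristic greedy strategy, so the quality of the resulting feature subset is hard to guarantee. Based on the idea of three-way decisions, a three-way neighborhood mutual information feature selection method (NMI-TWD) is proposed. It expands three potential candidate feature subsets and maintains the diversity among them in order to obtain a higher-quality feature subset. Ensemble learning is performed on the three diverse feature subsets to build a three-way co-decision model that further improves classification performance. Experiments on UCI data show that the feature selection results and classification performance of the new method are better than those of other methods, demonstrating its effectiveness.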
13.
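Automatic question answering systems help people quickly extract useful information from massive amounts of text, and answer selection, as a key step, largely determines the performance of such systems. To address the inaccurate capture of key answer information in existing answer selection models, this paper proposes a multi-stage attention answer selection model that fuses semantic information with key question information. The method first uses a bidirectional LSTM to build semantic representations of the question and of the candidate answers. It then uses the key information of the question, including the question type and the question focus word, to enhance the candidate answer set through an attention mechanism and select the top K candidate answers. Next, it uses the semantic information of the question, again through an attention mechanism, to enhance the top K candidate answers and select the best answer. By combining the key information and the semantic information of the question with the semantic representations of the candidate answers in stages, the model captures the key information of candidate answers more effectively and thereby improves the performance of the answer selection system. The proposed model is validated on three datasets; compared with the best known models of the same kind, the largest performance improvement is 1.95%.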
14.
We present a system for rapidly and easily building instructable and self-adaptive software agents that retrieve and extract information. Our Wisconsin Adaptive Web Assistant (WAWA) constructs intelligent agents by accepting user preferences in the form of instructions. These user-provided instructions are compiled into neural networks that are responsible for the adaptive capabilities of an intelligent agent. The agent’s neural networks are modified via user-provided and system-constructed training examples. Users can create training examples by rating Web pages (or documents), but more importantly WAWA’s agents use techniques from reinforcement learning to internally create their own examples. Users can also provide additional instruction throughout the life of an agent. Our experimental evaluations on a ‘home-page finder’ agent and a ‘seminar-announcement extractor’ agent illustrate the value of using instructable and adaptive agents for retrieving and extracting information.
15.
隆益民 (Long Yimin) 《计算机测量与控制》 (Computer Measurement & Control) 2010,18(4)
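This paper analyzes the main problems of current information systems and introduces the information exchange platform, a peer-to-peer information publishing system. In this system, the production, collection, processing, storage, publication, consumption, and supervision of information form a complete information life cycle. Based on XML technology, and after in-depth study of how information is described on the information exchange platform, an Open Information Model (OIM) is proposed that describes information uniformly so that it can be published across platforms. The OIM research has two goals: the design of the information model, and its implementation in the information exchange platform. The paper focuses on the system architecture and implementation methods of information system integration based on the open information model, and presents an application case and a performance analysis of information system integration. Practice shows that the architecture achieves application integration of information systems well and performs well.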
16.
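Automatic text classification is an effective way to organize and manage medical information. This paper studies an automatic text classification system for medical information, focusing on how to perform effective feature selection according to the characteristics of the medical domain, and proposes a feature selection method that combines document frequency (DF) and mutual information (MI). In addition, a medical information corpus was built to serve as the training and test sets of the classification system; the corpus contains five categories and 600 texts. Experiments show that the method effectively improves both the classification speed and the accuracy of the medical text classification system.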
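One plausible way to combine the two criteria named in the abstract above is sketched here. The abstract does not state how DF and MI are combined, so this is an assumption: terms are first filtered by a document-frequency threshold, then ranked by their maximum pointwise mutual information over classes, log(A·N / ((A+B)(A+C))), where A is the number of documents of the class containing the term, A+B the term's document frequency, and A+C the class size. The function name, parameters, and toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict

def select_features(docs, min_df=2, top_k=2):
    """Filter terms by document frequency, then rank them by maximum MI.

    docs is a list of (tokens, label) pairs. Returns (selected_terms, scores),
    where scores holds the maximum MI of every term that passed the DF filter.
    """
    n_docs = len(docs)
    df = Counter()                 # df[term] = documents containing term
    class_count = Counter()        # class_count[label] = documents in label
    joint = defaultdict(Counter)   # joint[term][label] = co-occurrence count
    for tokens, label in docs:
        class_count[label] += 1
        for term in set(tokens):
            df[term] += 1
            joint[term][label] += 1

    scores = {}
    for term, count in df.items():
        if count < min_df:
            continue  # DF filter: drop rare terms, whose MI estimates are noisy
        best = -math.inf
        for label, n_label in class_count.items():
            a = joint[term][label]
            if a > 0:
                best = max(best, math.log(a * n_docs / (count * n_label)))
        scores[term] = best

    # Highest MI first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k], scores

# Toy corpus (hypothetical).
docs = [
    (["aspirin", "dose", "patient"], "drug"),
    (["aspirin", "tablet", "patient"], "drug"),
    (["surgery", "patient", "scalpel"], "procedure"),
    (["surgery", "patient", "suture"], "procedure"),
]
selected, scores = select_features(docs)
```

In this toy corpus the term that appears in every document gets an MI of zero and is ranked last, while the rare terms are removed by the DF filter before MI is computed.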
17.
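This paper analyzes the main problems of current information systems and introduces the information exchange platform, a peer-to-peer information publishing system. In this system, the production, collection, processing, storage, publication, consumption, and supervision of information form a complete information life cycle. Based on XML technology, and after in-depth study of how information is described on the information exchange platform, an Open Information Model (OIM) is proposed that describes information uniformly so that it can be published across platforms. The paper focuses on discussing, based on...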
18.
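The esports industry has developed rapidly in recent years, and the analysis and application of machine learning in esports has been an essential part of this growth. Professional players often need considerable time and experience to become familiar with esports characters. Analyzing esports characters with machine learning helps players weigh their character choices, provides data support for the training and competition of professional players, and shows the analytical value of predictive models. Because esports outcomes are usually binary, and in order to make better use of prior information, the differing ... of the independent variables for the occurrence of the dependent variable...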
19.
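Classical message propagation models do not fully account for the complexity of online social networks or for the topological differences among network nodes. To address this, a PageRank-based message propagation model for online social networks, P-SIR, is proposed. The model uses the PageRank value of each node as its authority and takes the propagation mechanism of online social networks into account; it characterizes how the states of different types of nodes evolve over time and reflects how the propagation process is affected by network topology and the propagation mechanism. The model also considers several practical factors that affect message propagation in online social networks: it assigns node authority dynamically to adapt to heterogeneous networks, and it accounts for the external social reinforcement effect. Message propagation is simulated on three different types of networks, and the simulation experiments verify that the P-SIR model effectively reflects the message propagation process in online social networks.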
20.
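This paper addresses information diffusion in online social networks and proposes a predictive model of information diffusion. It first gives a method that uses friend relationships as the measure of distance between users. It then treats information diffusion as the result of two mechanisms acting simultaneously, "social diffusion" and "internal diffusion", describes them with Fick's diffusion theory and the Logistic growth model respectively, and designs the Fick-Logistic diffusion prediction model. Finally, the model is used to make predictions for the most representative news items in the Digg dataset. The high prediction accuracy shows that the proposed Fick-Logistic diffusion prediction model describes the information diffusion process in the Digg online social network well and has good predictive performance.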
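The Logistic growth component mentioned in the abstract above can be illustrated on its own. The sketch below only evaluates the standard closed-form solution of dN/dt = rN(1 - N/K); it does not reproduce the Fick diffusion term or the way the paper couples the two, and the parameter values are hypothetical.

```python
import math

def logistic_growth(t, n0, rate, capacity):
    """Closed-form solution of dN/dt = rate * N * (1 - N / capacity), N(0) = n0."""
    return capacity / (1.0 + (capacity / n0 - 1.0) * math.exp(-rate * t))

# Hypothetical parameters: 10 initial adopters, growth rate 0.8 per time step,
# and a saturation level of 1000 users.
curve = [logistic_growth(t, n0=10.0, rate=0.8, capacity=1000.0)
         for t in range(21)]
```

The resulting curve starts at the initial count, rises fastest around half the capacity, and then flattens as it approaches saturation, which is the S-shape typically observed for the spread of a single news item.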