Full text (subscription): 13
Free access: 0
By subject: Radio & Electronics (1); Metallurgical Industry (2); Automation & Computer Technology (10)
By year: 2010 (2); 2006 (1); 2002 (1); 2001 (2); 1998 (1); 1997 (1); 1996 (2); 1994 (1); 1992 (1); 1990 (1)
13 results found (search time: 15 ms)
1.
Predictability, complexity, and learning.   Cited by: 7 (self-citations: 0, by others: 7)
We define the predictive information I_pred(T) as the mutual information between the past and the future of a time series. Three qualitatively different behaviors are found in the limit of large observation times T: I_pred(T) can remain finite, grow logarithmically, or grow as a fractional power law. If the time series allows us to learn a model with a finite number of parameters, then I_pred(T) grows logarithmically with a coefficient that counts the dimensionality of the model space. In contrast, power-law growth is associated, for example, with the learning of infinite-parameter (or nonparametric) models such as continuous functions with smoothness constraints. There are connections between the predictive information and measures of complexity that have been defined both in learning theory and in the analysis of physical systems through statistical mechanics and dynamical systems theory. Furthermore, in the same way that entropy provides the unique measure of available information consistent with some simple and plausible conditions, we argue that the divergent part of I_pred(T) provides the unique measure for the complexity of dynamics underlying a time series. Finally, we discuss how these ideas may be useful in problems in physics, statistics, and biology.
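To make the central quantity concrete: the predictive information is the mutual information between a past window of duration T and the future. The display below is a sketch in the abstract's spirit; the exact conditioning conventions, and the factor 1/2 in the logarithmic coefficient, are standard asymptotics assumed here rather than quoted from the abstract.

```latex
% Predictive information and its three asymptotic regimes (sketch).
I_{\mathrm{pred}}(T) = I(\mathrm{past};\,\mathrm{future})
  = \left\langle \log_2
      \frac{P(x_{\mathrm{past}},\,x_{\mathrm{future}})}
           {P(x_{\mathrm{past}})\,P(x_{\mathrm{future}})}
    \right\rangle,
\qquad
I_{\mathrm{pred}}(T) \sim
\begin{cases}
  \mathrm{const}, & \text{finitely predictable series},\\
  \frac{K}{2}\log_2 T, & \text{learnable model with $K$ parameters},\\
  T^{\alpha},\ 0<\alpha<1, & \text{nonparametric classes (e.g.\ smooth functions)}.
\end{cases}
```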
2.
Selective Sampling Using the Query by Committee Algorithm   Cited by: 23 (self-citations: 0, by others: 23)
Freund, Yoav; Seung, H. Sebastian; Shamir, Eli; Tishby, Naftali. Machine Learning, 1997, 28(2-3): 133-168
We analyze the query by committee algorithm, a method for filtering informative queries from a random stream of inputs. We show that if the two-member committee algorithm achieves an information gain with a positive lower bound, then the prediction error decreases exponentially with the number of queries. We show, in particular, that this exponential decrease holds for query learning of perceptrons.
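A minimal sketch of this filtering loop, assuming a two-member committee of perceptrons. The names qbc_filter and oracle are illustrative, and training with different random initializations is only a crude stand-in for the Gibbs-style sampling of version-space hypotheses that the formal analysis assumes:

```python
import numpy as np

def train_perceptron(X, y, seed, epochs=50):
    # Perceptron fit to the labeled pool. The random initialization is a
    # crude proxy for drawing a consistent hypothesis from the version space.
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:
                w += yi * xi
    return w

def qbc_filter(stream, oracle, budget=20):
    # Filter an unlabeled stream: pay for a label only when the two
    # committee members disagree, i.e. when the query is informative.
    x0 = next(stream)
    X, y = [x0], [oracle(x0)]            # bootstrap with one labeled point
    for x in stream:
        if len(X) >= budget:
            break
        committee = [train_perceptron(np.array(X), np.array(y), seed=s)
                     for s in (0, 1)]
        if np.sign(committee[0] @ x) != np.sign(committee[1] @ x):
            X.append(x)
            y.append(oracle(x))          # disagreement -> ask the oracle
    return train_perceptron(np.array(X), np.array(y), seed=2)
```

Here stream is any iterator over feature vectors and oracle maps a vector to its ±1 label; the committee is retrained for every candidate purely for clarity.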
3.
Ron, Dana; Singer, Yoram; Tishby, Naftali. Machine Learning, 1996, 25(2-3): 117-149
We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KL-divergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs can be made small with high confidence in polynomial time and sample complexity. The learning algorithm is motivated by applications in human-machine interaction. Here we present two applications of the algorithm. In the first, we apply the algorithm to construct a model of the English language and use this model to correct corrupted text. In the second application we construct a simple stochastic model for E. coli DNA.
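As a toy illustration of variable-memory prediction (not the paper's PSA learning algorithm, which retains a longer suffix only when it significantly changes the conditional distribution), one can keep counts for all contexts up to a fixed order and predict with the longest observed suffix:

```python
from collections import defaultdict

def learn_contexts(seq, max_order=3):
    # Count next-symbol frequencies for every context of length <= max_order.
    # The real PSA/suffix-tree learner prunes contexts that do not change
    # the prediction, which is what keeps the model compact.
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq)):
        for k in range(max_order + 1):
            if i - k < 0:
                break
            counts[seq[i - k:i]][seq[i]] += 1
    return counts

def next_symbol_dist(counts, history, max_order=3):
    # Predict with the longest suffix of the history that has been observed,
    # backing off to shorter contexts (down to the empty context).
    for k in range(min(max_order, len(history)), -1, -1):
        ctx = history[len(history) - k:]
        if ctx in counts:
            dist = counts[ctx]
            total = sum(dist.values())
            return {s: c / total for s, c in dist.items()}
    return {}
```

For example, counts = learn_contexts("abracadabra") followed by next_symbol_dist(counts, "abra") predicts from the context "bra", backing off to shorter suffixes whenever a context was never seen.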
4.
The Hierarchical Hidden Markov Model: Analysis and Applications   Cited by: 20 (self-citations: 0, by others: 20)
Fine, Shai; Singer, Yoram; Tishby, Naftali. Machine Learning, 1998, 32(1): 41-62
We introduce, analyze and demonstrate a recursive hierarchical generalization of the widely used hidden Markov models, which we name Hierarchical Hidden Markov Models (HHMM). Our model is motivated by the complex multi-scale structure which appears in many natural sequences, particularly in language, handwriting and speech. We seek a systematic unsupervised approach to the modeling of such structures. By extending the standard Baum-Welch (forward-backward) algorithm, we derive an efficient procedure for estimating the model parameters from unlabeled data. We then use the trained model for automatic hierarchical parsing of observation sequences. We describe two applications of our model and its parameter estimation procedure. In the first application we show how to construct hierarchical models of natural English text. In these models, different levels of the hierarchy correspond to structures on different length scales in the text. In the second application we demonstrate how HHMMs can be used to automatically identify repeated strokes that represent combinations of letters in cursive handwriting.
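One convenient in-memory encoding of this recursive structure (an assumption for illustration, not the paper's formalism) is a state object that either emits symbols or owns a child sub-HMM, with a distinguished 'end' state returning control to the parent; generation then proceeds by recursive descent:

```python
import random
from dataclasses import dataclass, field

@dataclass
class HHMMState:
    # Schematic HHMM node. An internal state owns a child sub-HMM and control
    # returns to its parent when the child reaches 'end'; a production state
    # has no children and emits an observation symbol directly.
    name: str
    emissions: dict = field(default_factory=dict)    # symbol -> prob (production states)
    children: dict = field(default_factory=dict)     # name -> HHMMState (internal states)
    transitions: dict = field(default_factory=dict)  # within-level: name -> prob ('end' exits)
    entry: str = ""                                  # child activated on entry

def sample(state, out):
    # Recursive generation: descend into internal children until a production
    # state emits, then follow horizontal transitions until 'end' pops up.
    cur = state.entry
    while cur != "end":
        child = state.children[cur]
        if child.children:
            sample(child, out)                       # recurse into the sub-HMM
        else:
            symbols, probs = zip(*child.emissions.items())
            out.append(random.choices(symbols, probs)[0])
        cur = random.choices(list(child.transitions),
                             list(child.transitions.values()))[0]
```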
5.
6.
Clustering stability methods are a family of widely used model-selection techniques for data clustering. Their unifying theme is that an appropriate model should result in a clustering that is robust with respect to various kinds of perturbations. Despite their relative success, not much is known theoretically about why or when they work, or even what kind of assumptions they make in choosing an 'appropriate' model. Moreover, recent theoretical work has shown that they might 'break down' for large enough samples. In this paper, we focus on the behavior of clustering stability using k-means clustering. Our main technical result is an exact characterization of the distribution to which suitably scaled measures of instability converge, based on a sample drawn from any distribution in ℝ^n satisfying mild regularity conditions. From this, we can show that clustering stability does not 'break down' even for arbitrarily large samples, at least for the k-means framework. Moreover, it allows us to identify the factors which eventually determine the behavior of clustering stability. This leads to some basic observations about what kind of assumptions are made when using these methods. While often reasonable, these assumptions might also lead to unexpected consequences.
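A Monte-Carlo version of the kind of instability measure analyzed here, sketched for k-means; the function name and the particular disagreement statistic are illustrative choices, not the paper's exact scaled statistic:

```python
import numpy as np
from sklearn.cluster import KMeans

def instability(X, k, n_pairs=10, m=200, seed=0):
    # Cluster two random subsamples (m <= len(X)), extend each clustering to
    # a fresh evaluation sample via nearest centroid, and measure the
    # fraction of point pairs on which the two clusterings disagree about
    # "same cluster / different cluster".
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_pairs):
        idx1 = rng.choice(len(X), size=m, replace=False)
        idx2 = rng.choice(len(X), size=m, replace=False)
        km1 = KMeans(n_clusters=k, n_init=5).fit(X[idx1])
        km2 = KMeans(n_clusters=k, n_init=5).fit(X[idx2])
        S = X[rng.choice(len(X), size=m, replace=False)]   # evaluation sample
        a, b = km1.predict(S), km2.predict(S)
        same_a = a[:, None] == a[None, :]
        same_b = b[:, None] == b[None, :]
        scores.append(np.mean(same_a != same_b))           # pair disagreement
    return float(np.mean(scores))
```

Computing instability(X, k) across candidate values of k and picking the most stable one is the model-selection recipe whose large-sample behavior the paper characterizes.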
7.
8.
Dynamical encoding of cursive handwriting   Cited by: 1 (self-citations: 0, by others: 1)
9.
Dubnov, Shlomo; El-Yaniv, Ran; Gdalyahu, Yoram; Schneidman, Elad; Tishby, Naftali; Yona, Golan. Machine Learning, 2002, 47(1): 35-61
We present a novel pairwise clustering method. Given a proximity matrix of pairwise relations (i.e. pairwise similarity or dissimilarity estimates) between data points, our algorithm extracts the two most prominent clusters in the data set. The algorithm, which is completely nonparametric, iteratively employs a two-step transformation on the proximity matrix. The first step of the transformation represents each point by its relation to all other data points, and the second step re-estimates the pairwise distances using a statistically motivated proximity measure on these representations. Using this transformation, the algorithm iteratively partitions the data points, until it finally converges to two clusters. Although the algorithm is simple and intuitive, it generates a complex dynamics of the proximity matrices. Based on this bipartition procedure we devise a hierarchical clustering algorithm, which employs the basic bipartition algorithm in a straightforward divisive manner. The hierarchical clustering algorithm copes with the model validation problem using a general cross-validation approach, which may be combined with various hierarchical clustering methods. We further present an experimental study of this algorithm. We examine some of the algorithm's properties and performance on some synthetic and standard data sets. The experiments demonstrate the robustness of the algorithm and indicate that it generates a good clustering partition even when the data is noisy or corrupted.
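A compact sketch of the two-step transformation, with an L1-based profile proximity standing in for the paper's statistically motivated measure (so this is illustrative, not the published algorithm):

```python
import numpy as np
from scipy.spatial.distance import cdist

def bipartition(S, n_iters=20):
    # S is an n x n matrix of nonnegative pairwise similarities. Step 1 turns
    # each row into a normalized "profile" of that point's relations to all
    # others; step 2 re-estimates similarity from profile distances.
    # Iterating sharpens S toward a two-block structure.
    S = np.asarray(S, dtype=float)
    for _ in range(n_iters):
        P = S / S.sum(axis=1, keepdims=True)                   # step 1: profiles
        S = np.exp(-cdist(P, P, metric="cityblock"))           # step 2: proximity
    # Crude readout: once converged, row 0's similarities separate the two
    # blocks, so threshold them at their median.
    return (S[0] > np.median(S[0])).astype(int)
```

The divisive hierarchical variant described in the abstract would simply reapply bipartition to each of the two resulting groups.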
10.
The Information Bottleneck is an information theoretic framework that finds concise representations for an ‘input’ random variable that are as relevant as possible for an ‘output’ random variable. This framework has been used successfully in various supervised and unsupervised applications. However, its learning theoretic properties and justification remained unclear as it differs from standard learning models in several crucial aspects, primarily its explicit reliance on the joint input–output distribution. In practice, an empirical plug-in estimate of the underlying distribution has been used, so far without any finite sample performance guarantees. In this paper we present several formal results that address these difficulties. We prove several finite sample bounds, which show that the information bottleneck can provide concise representations with good generalization, based on smaller sample sizes than needed to estimate the underlying distribution. The bounds are non-uniform and adaptive to the complexity of the specific model chosen. Based on these results, we also present a preliminary analysis on the possibility of analyzing the information bottleneck method as a learning algorithm in the familiar performance-complexity tradeoff framework. In addition, we formally describe the connection between the information bottleneck and minimal sufficient statistics.
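For reference, a sketch of the standard self-consistent information bottleneck iteration on a plug-in joint distribution; the paper's contribution is the finite-sample analysis of using an empirical p_xy here, not this iteration itself (variable names are illustrative):

```python
import numpy as np

def information_bottleneck(p_xy, n_clusters, beta, n_iters=200, seed=0):
    # Self-consistent IB equations (Blahut-Arimoto style) on a given joint
    # distribution p(x, y), supplied as an (nx, ny) array summing to 1.
    rng = np.random.default_rng(seed)
    p_x = p_xy.sum(axis=1)                                     # p(x)
    p_y_x = p_xy / p_x[:, None]                                # p(y | x)
    q_t_x = rng.dirichlet(np.ones(n_clusters), size=len(p_x))  # q(t | x), random init
    for _ in range(n_iters):
        q_t = q_t_x.T @ p_x                                    # q(t)
        q_y_t = (q_t_x * p_x[:, None]).T @ p_y_x / q_t[:, None]  # q(y | t)
        kl = np.array([[np.sum(p * np.log((p + 1e-12) / (q + 1e-12)))
                        for q in q_y_t] for p in p_y_x])       # KL(p(y|x) || q(y|t))
        q_t_x = q_t[None, :] * np.exp(-beta * kl)              # IB update
        q_t_x /= q_t_x.sum(axis=1, keepdims=True)
    return q_t_x, q_y_t
```

After convergence, q_t_x[i] is the soft assignment of the i-th input value to the compressed clusters at trade-off parameter beta, and q_y_t holds each cluster's predictive distribution over the output.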