Similar documents
Found 20 similar documents (search time: 46 ms)
1.
Predicting the fold, or approximate 3D structure, of a protein from its amino acid sequence is an important problem in biology. The homology modeling approach uses a protein database to identify fold-class relationships by sequence similarity. The main limitation of this method is that some proteins with similar structures appear to have very different sequences, which we call the hidden-homology problem. As in other real-world domains for machine learning, this difficulty may be caused by a low-level representation. Learning in such domains can be improved by using domain knowledge to search for representations that better match the inductive bias of a preferred algorithm. In this domain, knowledge of amino acid properties can be used to construct higher-level representations of protein sequences. In one experiment using a 179-protein data set, the accuracy of fold-class prediction was increased from 77.7% to 81.0%. The search results are analyzed to refine the grouping of small residues suggested by Dayhoff. Finally, an extension to the representation incorporates sequential context directly into the representation, which can express finer relationships among the amino acids. The methods developed in this domain are generalized into a framework that suggests several systematic roles for domain knowledge in machine learning. Knowledge may define both a space of alternative representations and a strategy for searching this space. The search results may be summarized to extract feedback for revising the domain knowledge.
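The idea of a higher-level sequence representation built from amino acid properties can be illustrated with a small sketch. It assumes the six-class grouping commonly attributed to Dayhoff; the abstract does not list the groupings it actually searched over, and the names `DAYHOFF_GROUPS`, `recode`, and `identity` are hypothetical.

```python
# Six-class residue grouping commonly attributed to Dayhoff.
# Assumption: the abstract does not say which groupings were searched.
DAYHOFF_GROUPS = {
    "a": "C",      # cysteine
    "b": "AGPST",  # small residues
    "c": "DENQ",   # acidic residues and their amides
    "d": "HKR",    # basic residues
    "e": "ILMV",   # aliphatic hydrophobic residues
    "f": "FWY",    # aromatic residues
}

# Invert the table: one group label per amino acid letter.
RESIDUE_TO_GROUP = {
    residue: label
    for label, residues in DAYHOFF_GROUPS.items()
    for residue in residues
}


def recode(sequence: str) -> str:
    """Rewrite a protein sequence over the reduced group alphabet."""
    return "".join(RESIDUE_TO_GROUP[r] for r in sequence.upper())


def identity(a: str, b: str) -> float:
    """Fraction of positions that match in two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

Under this grouping, the fragments `AGST` and `SPTA` share no residues position by position, yet both recode to `bbbb`. That is the kind of similarity a low-level representation hides.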

2.
Prequential model selection and delete-one cross-validation are data-driven methodologies for choosing between rival models on the basis of their predictive abilities. For a given set of observations, the predictive ability of a model is measured by the model's accumulated prediction error and by the model's average out-of-sample prediction error, respectively, for prequential model selection and for cross-validation. In this paper, given i.i.d. observations, we propose nonparametric regression estimators, based on neural networks, that select the number of hidden units (or neurons) using either prequential model selection or delete-one cross-validation. As our main contributions: (i) we establish rates of convergence for the integrated mean-squared errors in estimating the regression function using off-line or batch versions of the proposed estimators and (ii) we establish rates of convergence for the time-averaged expected prediction errors in using on-line versions of the proposed estimators. We also present computer simulations (i) empirically validating the proposed estimators and (ii) empirically comparing the proposed estimators with certain novel prequential and cross-validated mixture regression estimators.
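Delete-one cross-validation as a model-selection rule can be sketched as follows. To stay self-contained, the sketch swaps the neural network for a k-nearest-neighbour regressor, so the complexity parameter being selected is `k` rather than the number of hidden units. All function names are hypothetical.

```python
def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def loo_error(data, k):
    """Average out-of-sample squared error under delete-one cross-validation."""
    total = 0.0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out observation i
        total += (knn_predict(rest, x, k) - y) ** 2
    return total / len(data)


def select_k(data, candidates):
    """Pick the candidate complexity with the smallest delete-one error."""
    return min(candidates, key=lambda k: loo_error(data, k))
```

For example, `select_k([(x, 2 * x) for x in range(10)], [1, 2, 3])` returns `2`: on noise-free linear data, averaging the two neighbours on either side reproduces every interior point exactly.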

3.
One of the main limitations when accessing the web is the lack of explicit structure, whose presence may help in understanding data semantics. Schemas for web data can be constructed at different levels, structuring a single page, a whole site, or a group of sites. Here we present an approach to give a logical schema to a web site, first defining a model for a single page, where its content is divided into logical sections, i.e. parts of a page each collecting related information. Then, we introduce a site model in which both physical and logical links among different page sections are represented: physical links are existing hyperlinks, while logical links are links between sections containing semantically related information. We show how such links can be found and classified according to their relevance, and how the schema is used in a structure-aware browser to improve both browsing and searching.

4.
We consider the difficulty in deriving and validating new scales of measurement for modular cohesion. We show that currently derived objective measures cannot predict, or measure, a scale of cohesion that has an empirical relation system, for which a high degree of interpersonal agreement exists. However, we demonstrate empirically that it is feasible to predict low levels of a cohesion scale with an observed empirical relation. For this scale there exists agreement to make the observational distinctions that form the empirical relation system. Our statistically derived prediction systems use information flow measures and are available at architectural and detailed design. These prediction systems have been validated and we have determined their predictive capability using cross-validation. Within the limits of their external validity, we discuss how these and future prediction systems can be used to improve modular cohesion. For example, improvements may be achieved by using a simple cut-off value for fanout to predict modules that lack cohesion.

5.
Abe Naoki, Mamitsuka Hiroshi. Machine Learning, 1997, 29(2-3): 275-301
We propose a new method for predicting protein secondary structure of a given amino acid sequence, based on a training algorithm for the probability parameters of a stochastic tree grammar. In particular, we concentrate on the problem of predicting β-sheet regions, which has previously been considered difficult because of the unbounded dependencies exhibited by sequences corresponding to β-sheets. To cope with this difficulty, we use a new family of stochastic tree grammars, which we call Stochastic Ranked Node Rewriting Grammars, which are powerful enough to capture the type of dependencies exhibited by the sequences of β-sheet regions, such as the parallel and anti-parallel dependencies and their combinations. The training algorithm we use is an extension of the inside-outside algorithm for stochastic context-free grammars, but with a number of significant modifications. We applied our method to real data obtained from the HSSP database (Homology-derived Secondary Structure of Proteins Ver 1.0) and the results were encouraging: Our method was able to predict roughly 75 percent of the β-strands correctly in a systematic evaluation experiment, in which the test sequences not only have less than 25 percent identity to the training sequences, but are totally unrelated to them. This figure compares favorably to the predictive accuracy of the state-of-the-art prediction methods in the field, even though our experiment was on a restricted type of β-sheet structures and the test was done on a relatively small data size. We also stress that our method can predict the structure as well as the location of β-sheet regions, which was not possible with conventional methods for secondary structure prediction. Extended abstracts of parts of the work presented in this paper have appeared in (Abe & Mamitsuka, 1994) and (Mamitsuka & Abe, 1994).

6.
Summary. Geffert has shown that each recursively enumerable language $L$ over $\Sigma$ can be expressed in the form $L = \{h(x)^{-1} g(x) \mid x \in \Delta^{+}\} \cap \Sigma^{*}$, where $\Delta$ is an alphabet and $g, h$ is a pair of morphisms. Our purpose is to give a simple proof for Geffert's result and then sharpen it into the form where both of the morphisms are nonerasing. In our method we modify constructions used in a representation of recursively enumerable languages in terms of equality sets and in a characterization of simple transducers in terms of morphisms. As direct consequences, we get the undecidability of the Post correspondence problem and various representations of $L$. For instance, $L = \rho(L_0) \cap \Sigma^{*}$, where $L_0$ is a minimal linear language and $\rho$ is the Dyck reduction $a\bar{a} \to \varepsilon$, $A\bar{A} \to \varepsilon$.

7.
This study deals with the integration of crisp and granular information for predicting the performance of a manufacturing process. Supporting and computing a set of two If-Then rules is considered the central idea for this integration. In these rules, the antecedent part deals with the recommended ranges of the control variables of the process, while the consequent part deals with the acceptable ranges of the performance measures of the process. The rules specify that if the control variables are kept within their recommended ranges, then it is likely or unlikely to get the performance measures within their acceptable ranges. The rules are supported by using the following conditional probabilities: the probability of getting the performance measures acceptable given that the control variables are within their recommended ranges (which should be likely), and the probability of getting performance measures acceptable given that the control variables are not within their recommended ranges (which should be unlikely). The remarkable thing is that both acceptable ranges and recommended ranges are subjectively defined concepts. So are likelihood perceptions such as likely and unlikely. Therefore, all of them can be defined by using some kind of fuzzy-granular information. The usefulness of this new approach is demonstrated by solving a machining decision-making problem (selecting cutting conditions and inserts that satisfy a subjectively defined surface finish requirement in terms of the roughness and fractal dimension of the machined surface). Further study should be directed toward understanding these rules in the context of predictive process planning. This revised version was published in June 2005 with corrected page numbers.
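The two conditional probabilities that support the rules can be estimated from process records, as in the sketch below. It uses crisp yes/no predicates for "within recommended range" and "acceptable", whereas the abstract defines these concepts with fuzzy-granular information, so this is a simplification. The function name, the record layout (cutting speed, roughness), and the thresholds in the usage example are hypothetical.

```python
def rule_support(records, within_recommended, acceptable):
    """Estimate P(acceptable | within range) and P(acceptable | outside range).

    `within_recommended` and `acceptable` are crisp predicates on a record.
    Returns None for a probability whose conditioning set is empty.
    """
    inside = [r for r in records if within_recommended(r)]
    outside = [r for r in records if not within_recommended(r)]

    def fraction_acceptable(group):
        if not group:
            return None
        return sum(1 for r in group if acceptable(r)) / len(group)

    return fraction_acceptable(inside), fraction_acceptable(outside)


# Hypothetical machining records: (cutting speed, surface roughness).
records = [(100, 1.0), (120, 1.2), (110, 2.5), (200, 3.0), (50, 2.8), (210, 1.1)]
p_inside, p_outside = rule_support(
    records,
    within_recommended=lambda r: 90 <= r[0] <= 150,
    acceptable=lambda r: r[1] <= 1.6,
)
```

With these made-up records, two of the three in-range runs are acceptable and one of the three out-of-range runs is, so the first rule is better supported than the second.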

8.
Unification algorithms have been constructed for semigroups and commutative semigroups. This paper considers the intermediate case of partially commutative semigroups. We introduce classes N and of such semigroups and justify their use. We present an equation-solving algorithm for any member of the class N. This algorithm is relative to having an algorithm to determine all non-negative solutions of a certain class of diophantine equations of degree 2 which we call -equations. The difficulties arising when attempting to solve equations in members of the class are discussed, and we present arguments that strongly suggest that unification in these semigroups is undecidable.

9.
This paper shows how an affine representation of spatial configuration is obtained from a pair of projection views. Calibration of cameras and knowledge of the camera's motion are not necessary; however, some preselected reference points and their correspondences are needed. Projective and affine geometry invariants are trickily manipulated to do the affine reconstruction. The method is thus geometrically constructive. When it is compared with the solution proposed in 1989 by J.J. Koenderink and A.J. Van Doorn (Affine Structure from Motion, Technical Report, Utrecht University), the method provides a viewpoint-independent affine representation under parallel projections. Further, we investigate the central-projection case in which, with three additional special reference points, the same affine reconstruction can be done. We also discuss some important applications of this viewpoint independence of shape representation.

10.
Predictive Prefetching on the Web and Its Potential Impact in the Wide Area (total citations: 2; self-citations: 0; citations by others: 2)
The rapid increase of World Wide Web users and the development of services with high bandwidth requirements have caused a substantial increase in response times for users on the Internet. Web latency would be significantly reduced if browser, proxy or Web server software could make predictions about the pages that a user is most likely to request next, while the user is viewing the current page, and prefetch their content. In this paper we study Predictive Prefetching on a totally new Web system architecture. This is a system that provides two levels of caching before information reaches the clients. This work analyses prefetching on a Wide Area Network with the above-mentioned characteristics. We first provide a structured overview of predictive prefetching and show its wide applicability to various computer systems. The WAN that we refer to is the GRNET academic network in Greece. We rely on log files collected at the network's Transparent cache (primary caching point), located at GRNET's edge connection to the Internet. We present the parameters that are most important for prefetching on GRNET's architecture and provide preliminary results of an experimental study, quantifying the benefits of prefetching on the WAN. Our experimental study includes the evaluation of two prediction algorithms: an n most popular document algorithm and a variation of the PPM (Prediction by Partial Matching) prediction algorithm. Our analysis clearly shows that Predictive prefetching can improve Web response times inside the GRNET WAN without substantial increase in network traffic due to prefetching.
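The two kinds of predictor named in the abstract can be sketched roughly as below. The popularity predictor follows the description directly. The context predictor is only a first-order model (the page most often seen after the current one), which is a simplification and not the PPM variation evaluated in the paper. Function names and the session format are assumptions.

```python
from collections import Counter, defaultdict


def most_popular(requests, n):
    """Return the n most frequently requested pages in a request log."""
    return [page for page, _ in Counter(requests).most_common(n)]


def build_order1_model(sessions):
    """Count, for each page, which page followed it within a session."""
    model = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            model[current][following] += 1
    return model


def predict_next(model, current, k=1):
    """Return up to k candidate pages to prefetch after `current`."""
    followers = model.get(current, Counter())
    return [page for page, _ in followers.most_common(k)]
```

A prefetcher built on this would call `predict_next` while the user views the current page and fetch the returned pages into the cache. Raising `k` trades extra network traffic for a higher hit rate.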

11.
12.
In this paper we concentrate on spatial prepositions; more specifically, we are interested in projective prepositions (e.g. in front of, to the left of), which have in the past been treated as semantically uninteresting. We demonstrate that projective prepositions are in fact problematic and demand more attention than they have so far been afforded; after summarising the important components of their meaning, we review the deficiencies of past and current approaches to the decoding problem; that is, predicting what a locative expression used in a particular situation conveys. Finally we present our own approach. Motivated by the shortcomings of contemporary work, we integrate elements of Lang's conceptual representation of objects' perceptual and dimensional characteristics, and the potential field model of object proximity that originated in manipulator and mobile robot path-finding.

13.
Dynamic Textures (total citations: 7; self-citations: 0; citations by others: 7)
Dynamic textures are sequences of images of moving scenes that exhibit certain stationarity properties in time; these include sea-waves, smoke, foliage, whirlwind etc. We present a characterization of dynamic textures that poses the problems of modeling, learning, recognizing and synthesizing dynamic textures on a firm analytical footing. We borrow tools from system identification to capture the essence of dynamic textures; we do so by learning (i.e. identifying) models that are optimal in the sense of maximum likelihood or minimum prediction error variance. For the special case of second-order stationary processes, we identify the model sub-optimally in closed-form. Once learned, a model has predictive power and can be used for extrapolating synthetic sequences to infinite length with negligible computational cost. We present experimental evidence that, within our framework, even low-dimensional models can capture very complex visual phenomena.

14.
This paper describes an approach for tracking rigid and articulated objects using a view-based representation. The approach builds on and extends work on eigenspace representations, robust estimation techniques, and parameterized optical flow estimation. First, we note that the least-squares image reconstruction of standard eigenspace techniques has a number of problems and we reformulate the reconstruction problem as one of robust estimation. Second, we define a subspace constancy assumption that allows us to exploit techniques for parameterized optical flow estimation to simultaneously solve for the view of an object and the affine transformation between the eigenspace and the image. To account for large affine transformations between the eigenspace and the image we define a multi-scale eigenspace representation and a coarse-to-fine matching strategy. Finally, we use these techniques to track objects over long image sequences in which the objects simultaneously undergo both affine image motions and changes of view. In particular we use this EigenTracking technique to track and recognize the gestures of a moving hand.

15.
Fern Alan, Givan Robert. Machine Learning, 2003, 53(1-2): 71-109
We study resource-limited online learning, motivated by the problem of conditional-branch outcome prediction in computer architecture. In particular, we consider (parallel) time and space-efficient ensemble learners for online settings, empirically demonstrating benefits similar to those shown previously for offline ensembles. Our learning algorithms are inspired by the previously published boosting by filtering framework as well as the offline Arc-x4 boosting-style algorithm. We train ensembles of online decision trees using a novel variant of the ID4 online decision-tree algorithm as the base learner, and show empirical results for both boosting and bagging-style online ensemble methods. Our results evaluate these methods on both our branch prediction domain and online variants of three familiar machine-learning benchmarks. Our data justifies three key claims. First, we show empirically that our extensions to ID4 significantly improve performance for single trees and additionally are critical to achieving performance gains in tree ensembles. Second, our results indicate significant improvements in predictive accuracy with ensemble size for the boosting-style algorithm. The bagging algorithms we tried showed poor performance relative to the boosting-style algorithm (but still improve upon individual base learners). Third, we show that ensembles of small trees are often able to outperform large single trees with the same number of nodes (and similarly outperform smaller ensembles of larger trees that use the same total number of nodes). This makes online boosting particularly useful in domains such as branch prediction with tight space restrictions (i.e., the available real-estate on a microprocessor chip).

16.
Maximal word functions occur in data retrieval applications and have connections with ranking problems, which in turn were first investigated in relation to data compression [21]. By the maximal word function of a language $L \subseteq \Sigma^{*}$, we mean the problem of finding, on input $x$, the lexicographically largest word belonging to $L$ that is smaller than or equal to $x$. In this paper we present a parallel algorithm for computing maximal word functions for languages recognized by one-way nondeterministic auxiliary pushdown automata (and hence for the class of context-free languages). This paper is a continuation of a stream of research focusing on the problem of identifying properties other than membership which are easily computable for certain classes of languages. For a survey, see [24].
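The definition of the maximal word function can be made concrete for the simplest possible case, a finite language given as an explicit list of words. This sketch only illustrates the definition. It is not the parallel algorithm for pushdown automata that the paper presents, and it assumes Python's built-in string ordering as the lexicographic order. The function name is hypothetical.

```python
from bisect import bisect_right


def maximal_word(language, x):
    """Return the largest word of a finite language that is <= x, or None.

    Words are compared with Python's string ordering, which is
    lexicographic by code point.
    """
    words = sorted(set(language))
    position = bisect_right(words, x)  # number of words that are <= x
    return words[position - 1] if position else None
```

For the language `["ab", "abb", "b", "ba"]`, the input `"abc"` yields `"abb"`, while the input `"a"` yields `None` because every word in the language is larger than `"a"`.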

17.
This paper introduces alloyed prediction, a new hardware-based two-level branch predictor organization that combines global and local history in the same structure, combining the advantages of current two-level predictors with those of hybrid predictors. The alloyed organization is motivated by measurements showing that wrong-history mispredictions are even more important than conflict-induced mispredictions. Wrong-history mispredictions arise because current two-level, history-based predictors provide only global or only local history. The contribution of wrong history to the overall misprediction rate is substantial because most programs have some branches that require global history and others that require local history. This paper explores several ways to implement alloyed prediction, including the previously proposed bi-mode organization. Simulations show that mshare is the best alloyed organization among those we examine, and that mshare gives reliably good prediction compared to bimodal (two-bit), two-level, and hybrid predictors. The robust performance of alloying across a range of predictor sizes stems from its ability to attack wrong-history mispredictions at even very small sizes without subdividing the branch prediction hardware into smaller and less effective components.
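The idea of indexing one table of two-bit counters with both global and local history can be illustrated with a toy software model. The index layout (global bits, then local bits, then low address bits), the table sizes, and the class and method names are assumptions made for illustration. This is not the paper's mshare or bi-mode organization.

```python
class AlloyedPredictor:
    """Toy predictor: one two-bit counter table indexed by global history,
    per-branch local history, and low address bits (layout is an assumption)."""

    def __init__(self, global_bits=2, local_bits=2, pc_bits=2):
        self.global_bits = global_bits
        self.local_bits = local_bits
        self.pc_bits = pc_bits
        self.global_history = 0
        # A dict keyed by address stands in for a fixed-size hardware table.
        self.local_history = {}
        size = 1 << (global_bits + local_bits + pc_bits)
        self.counters = [1] * size  # every counter starts weakly not-taken

    def _index(self, pc):
        local = self.local_history.get(pc, 0)
        pc_part = pc & ((1 << self.pc_bits) - 1)
        shift = self.local_bits + self.pc_bits
        return (self.global_history << shift) | (local << self.pc_bits) | pc_part

    def predict(self, pc):
        """Predict taken when the selected counter is in its upper half."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train the selected counter, then shift the outcome into both histories."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        bit = 1 if taken else 0
        global_mask = (1 << self.global_bits) - 1
        local_mask = (1 << self.local_bits) - 1
        self.global_history = ((self.global_history << 1) | bit) & global_mask
        local = self.local_history.get(pc, 0)
        self.local_history[pc] = ((local << 1) | bit) & local_mask
```

Because the history bits take part in the index, a branch that strictly alternates between taken and not-taken maps its two phases to different counters. After a short warm-up it is predicted correctly, which a single two-bit counter per branch cannot do.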

18.
We present an extension to binary decision diagrams (BDDs) that exploits the information contained in the structure of a given circuit to produce a compact, semicanonical representation. The resulting XBDDs (extended BDDs) retain many of the advantages of BDDs, while at the same time allowing one to deal with larger circuits. We propose algorithms for verification of combinational circuits based on XBDDs that overcome the exponential growth in the number of nodes in the BDDs for some specific circuits such as the multipliers. While the approach remains cpu-time intensive, we believe it is the first to exactly verify the most difficult (median) output of a 16-bit multiplier. Experimental results are presented to support our claim that the XBDD approach is the best for multiplier verification.

19.
This paper describes techniques for automatic construction of dictionaries for use in large-scale foreign language tutoring (FLT) and interlingual machine translation (MT) systems. The dictionaries are based on a language-independent representation called lexical conceptual structure (LCS). A primary goal of the LCS research is to demonstrate that synonymous verb senses share distributional patterns. We show how the syntax–semantics relation can be used to develop a lexical acquisition approach that contributes both toward the enrichment of existing online resources and toward the development of lexicons containing more complete information than is provided in any of these resources alone. We start by describing the structure of the LCS and showing how this representation is used in FLT and MT. We then focus on the problem of building LCS dictionaries for large-scale FLT and MT. First, we describe authoring tools for manual and semi-automatic construction of LCS dictionaries; we then present a more sophisticated approach that uses linguistic techniques for building word definitions automatically. These techniques have been implemented as part of a set of lexicon-development tools used in the MILT FLT project.

20.
In this paper we present a method to translate VHDL into symbolic finite-state models. Our method can handle those aspects of VHDL which have a finite representation, obtaining the semantics defined in the IEEE standard. We describe an intermediate representation based on finite automata and its translation into a BDD-based representation. Our model interfaces VHDL with a BDD-based functional symbolic model checker. The work of these authors is supported by ESPRIT project 6128 FORMAT. The work of this author is supported by the Volkswagenstiftung project Informatiksysteme.
