首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 653 毫秒
1.
Recently, a great deal of research work has been devoted to the development of algorithms to estimate the intrinsic dimensionality (id) of a given dataset, that is the minimum number of parameters needed to represent the data without information loss. id estimation is important for the following reasons: the capacity and the generalization capability of discriminant methods depend on it; id is a necessary information for any dimensionality reduction technique; in neural network design the number of hidden units in the encoding middle layer should be chosen according to the id of data; the id value is strongly related to the model order in a time series, that is crucial to obtain reliable time series predictions. Although many estimation techniques have been proposed in the literature, most of them fail on noisy data, or compute underestimated values when the id is sufficiently high. In this paper, after reviewing some of the most important id estimators related to our work, we provide a theoretical motivation of the bias that causes the underestimation effect, and we present two id estimators based on the statistical properties of manifold neighborhoods, which have been developed in order to reduce this effect. We exhaustively evaluate the proposed techniques on synthetic and real datasets, by employing an objective evaluation measure to compare their performance with those achieved by state of the art algorithms; the results show that the proposed methods are promising, and produce reliable estimates also in the difficult case of datasets drawn from non-linearly embedded manifolds, characterized by high?id.  相似文献   

2.
We study a simple technique, originally presented by Herlihy (ACM Trans. Program. Lang. Syst. 15(5):745–770, 1993), for executing concurrently, in a wait-free manner, blocks of code that have been programmed for sequential execution and require significant synchronization in order to be performed in parallel. We first present an implementation of this technique, called Sim, which employs a collect object. We describe a simple implementation of a collect object from a single shared object that supports atomic Add (or XOR) in addition to read; this implementation has step complexity O(1). By plugging in to Sim this implementation, Sim exhibits constant step complexity as well. This allows us to derive lower bounds on the step complexity of implementations of several shared objects, like Add, XOR, collect, and snapshot objects, from LL/SC objects. We then present a practical version of Sim, called PSim, which is implemented in a real shared-memory machine. From a theoretical perspective, PSim has worse step complexity than Sim, its theoretical analog; in practice though, we experimentally show that PSim is highly-efficient: it outperforms several state-of-the-art lock-based and lock-free synchronization techniques, and this given that it is wait-free, i.e. that it satisfies a stronger progress condition than all the algorithms that it outperforms. We have used PSim to get highly-efficient wait-free implementations of stacks and queues.  相似文献   

3.
Due to significant advances in SAT technology in the last years, its use for solving constraint satisfaction problems has been gaining wide acceptance. Solvers for satisfiability modulo theories (SMT) generalize SAT solving by adding the ability to handle arithmetic and other theories. Although there are results pointing out the adequacy of SMT solvers for solving CSPs, there are no available tools to extensively explore such adequacy. For this reason, in this paper we introduce a tool for translating FLATZINC (MINIZINC intermediate code) instances of CSPs to the standard SMT-LIB language. We provide extensive performance comparisons between state-of-the-art SMT solvers and most of the available FLATZINC solvers on standard FLATZINC problems. The obtained results suggest that state-of-the-art SMT solvers can be effectively used to solve CSPs.  相似文献   

4.
In this paper we introduce the Manufactured Analytical Solution Abstraction ( MASA ) library for applying the method of manufactured solutions to the verification of software used for solving a large class of problems stemming from numerical methods in mathematical physics including nonlinear equations, systems of algebraic equations, and ordinary and partial differential equations. We discuss the process of scientific software verification, manufactured solution generation using symbolic manipulation with computer algebra systems such as Maple? or SymPy, and automatic differentiation for forcing function evaluation. We discuss a hierarchic methodology that can be used to alleviate the combinatorial complexity in generating symbolic manufactured solutions for systems of equations based on complex physics. Finally, we detail the essential features and examples of the Application Programming Interface behind MASA , an open source library designed to act as a central repository for manufactured and analytical solutions over a diverse range of problems.  相似文献   

5.
In this paper we propose mathematical optimizations to select the optimal regularization parameter for ridge regression using cross-validation. The resulting algorithm is suited for large datasets and the computational cost does not depend on the size of the training set. We extend this algorithm to forward or backward feature selection in which the optimal regularization parameter is selected for each possible feature set. These feature selection algorithms yield solutions with a sparse weight matrix using a quadratic cost on the norm of the weights. A naive approach to optimizing the ridge regression parameter has a computational complexity of the order $O(R K N^{2} M)$ with $R$ the number of applied regularization parameters, $K$ the number of folds in the validation set, $N$ the number of input features and $M$ the number of data samples in the training set. Our implementation has a computational complexity of the order $O(KN^3)$ . This computational cost is smaller than that of regression without regularization $O(N^2M)$ for large datasets and is independent of the number of applied regularization parameters and the size of the training set. Combined with a feature selection algorithm the algorithm is of complexity $O(RKNN_s^3)$ and $O(RKN^3N_r)$ for forward and backward feature selection respectively, with $N_s$ the number of selected features and $N_r$ the number of removed features. This is an order $M$ faster than $O(RKNN_s^3M)$ and $O(RKN^3N_rM)$ for the naive implementation, with $N \ll M$ for large datasets. To show the performance and reduction in computational cost, we apply this technique to train recurrent neural networks using the reservoir computing approach, windowed ridge regression, least-squares support vector machines (LS-SVMs) in primal space using the fixed-size LS-SVM approximation and extreme learning machines.  相似文献   

6.
7.
The Meteor metric for automatic evaluation of machine translation   总被引:1,自引:1,他引:0  
The Meteor Automatic Metric for Machine Translation evaluation, originally developed and released in 2004, was designed with the explicit goal of producing sentence-level scores which correlate well with human judgments of translation quality. Several key design decisions were incorporated into Meteor in support of this goal. In contrast with IBM’s Bleu, which uses only precision-based features, Meteor uses and emphasizes recall in addition to precision, a property that has been confirmed by several metrics as being critical for high correlation with human judgments. Meteor also addresses the problem of reference translation variability by utilizing flexible word matching, allowing for morphological variants and synonyms to be taken into account as legitimate correspondences. Furthermore, the feature ingredients within Meteor are parameterized, allowing for the tuning of the metric’s free parameters in search of values that result in optimal correlation with human judgments. Optimal parameters can be separately tuned for different types of human judgments and for different languages. We discuss the initial design of the Meteor metric, subsequent improvements, and performance in several independent evaluations in recent years.  相似文献   

8.
9.
For a finite alphabet ∑ we define a binary relation on \(2^{\Sigma *} \times 2^{2^{\Sigma ^* } } \) , called balanced immunity. A setB ? ∑* is said to be balancedC-immune (with respect to a classC ? 2Σ* of sets) iff, for all infiniteL εC, $$\mathop {\lim }\limits_{n \to \infty } \left| {L^{ \leqslant n} \cap B} \right|/\left| {L^{ \leqslant n} } \right| = \tfrac{1}{2}$$ Balanced immunity implies bi-immunity and in natural cases randomness. We give a general method to find a balanced immune set'B for any countable classC and prove that, fors(n) =o(t(n)) andt(n) >n, there is aB εSPACE(t(n)), which is balanced immune forSPACE(s(n)), both in the deterministic and nondeterministic case.  相似文献   

10.
Producing and checking proofs from SMT solvers is currently the most feasible method for achieving high confidence in the correctness of solver results. The diversity of solvers and relative complexity of SMT over, say, SAT means that flexibility, as well as performance, is a critical characteristic of a proof-checking solution for SMT. This paper describes such a solution, based on a Logical Framework with Side Conditions (LFSC). We describe the framework and show how it can be applied for flexible proof production and checking for two different SMT solvers, clsat and cvc3. We also report empirical results showing good performance relative to solver execution time.  相似文献   

11.
Uruguay is currently undergoing a gradual process of inclusion of wind energy in its matrix of electric power generation. In this context, a computational tool has been developed to predict the electrical power that will be injected into the grid. The tool is based on the Weather Research and Forecasting (WRF) numerical model, which is the performance bottleneck of the application. For this reason, and in line with several successful efforts of other researchers, this article presents advances in porting the WRF to GPU. In particular, we present the implementation of sintb and bdy_interp1 routines on GPU and the integration of these routines with previous efforts from other authors. The speedup values obtained for the newly ported routines on a Nvidia GeForce GTX 480 GPU are up to \(33.9\times \) when compared with the sequential WRF and \(9.2\times \) when compared with the four-threaded WRF. The integration of the newly ported routines along with previous works produces a reduction of more than a 30 % in the total runtime of the multi-core four-threaded WRF and of more than a 50 % in the single-threaded version.  相似文献   

12.
A caller must satisfy the callee??s precondition??that is, reach a state in which the callee may be called. Preconditions describe the state that needs to be reached, but not how to reach it. We combine static analysis with model checking to mine Fair Computation Tree Logic (CTL F ) formulas that describe the operations a parameter goes through: ??In parseProperties(String xml), the parameter xml normally stems from getProperties().?? Such operational preconditions can be learned from program code, and the code can be checked for their violations. Applied to AspectJ, our Tikanga prototype found 169 violations of operational preconditions, uncovering 7 unique defects and 27 unique code smells??with 52% true positives in the 25% top-ranked violations.  相似文献   

13.
Efficient processing of high-dimensional similarity joins plays an important role for a wide variety of data-driven applications. In this paper, we consider $\varepsilon $ -join variant of the problem. Given two $d$ -dimensional datasets and parameter $\varepsilon $ , the task is to find all pairs of points, one from each dataset that are within $\varepsilon $ distance from each other. We propose a new $\varepsilon $ -join algorithm, called Super-EGO, which belongs the EGO family of join algorithms. The new algorithm gains its advantage by using novel data-driven dimensionality re-ordering technique, developing a new EGO-strategy that more aggressively avoids unnecessary computation, as well as by developing a parallel version of the algorithm. We study the newly proposed Super-EGO algorithm on large real and synthetic datasets. The empirical study demonstrates significant advantage of the proposed solution over the existing state of the art techniques.  相似文献   

14.
15.
We present a novel approach that generalizes the well- known quantum SWAP gate to higher dimensions and construct a regular quantum gate composed entirely in terms of the generalized CNOT gate that cyclically permutes the states of $d$ qudits for $d$ prime. We also investigate the case for $d$ other than prime. A key feature of the construction design relates to the periodicity evaluation for a family of linear recurrences which we achieve by exploiting generating functions and their factorization over the complex reals.  相似文献   

16.
The presentation of constraints in a usable form is an essential aspect of Constraint Logic Programming (CLP) systems. It is needed both in the output of constraints, as well as in the production of an internal representation of constraints for meta-level manipulation. Typically, only a small subset \(\tilde x\) of the variables in constraints is of interest, and so an informal statement of the problem at hand is: given a conjunction \(c(\tilde x,\tilde y)\) of constraints, express the projection \(\exists \tilde y c(\tilde x,\tilde y)\) ofc onto \(\tilde x\) in the simplest form. In this paper, we consider the constraints of the CLP(R) system and describe the essential features of its projection module. One main part focuses on the well-known problem of projection inlinear arithmetic constraints. We start with a classical algorithm and augment it with a procedure for eliminating redundant constraints generated by the algorithm. A second part discusses projection of the other object-level constraints: equations over trees and nonlinear equations. The final part deals with producing a manipulable form of the constraints, which complicates the projection problem.  相似文献   

17.
Linear kernel support vector machines (SVMs) using either $L_{1}$ -norm or $L_{2}$ -norm have emerged as an important and wildly used classification algorithm for many applications such as text chunking, part-of-speech tagging, information retrieval, and dependency parsing. $L_{2}$ -norm SVMs usually provide slightly better accuracy than $L_{1}$ -SVMs in most tasks. However, $L_{2}$ -norm SVMs produce too many near-but-nonzero feature weights that are highly time-consuming when computing nonsignificant weights. In this paper, we present a cutting-weight algorithm to guide the optimization process of the $L_{2}$ -SVMs toward a sparse solution. Before checking the optimality, our method automatically discards a set of near-but-nonzero feature weight. The final objects can then be achieved when the objective function is met by the remaining features and hypothesis. One characteristic of our cutting-weight algorithm is that it requires no changes in the original learning objects. To verify this concept, we conduct the experiments using three well-known benchmarks, i.e., CoNLL-2000 text chunking, SIGHAN-3 Chinese word segmentation, and Chinese word dependency parsing. Our method achieves 1–10 times feature parameter reduction rates in comparison with the original $L_{2}$ -SVMs, slightly better accuracy with a lower training time cost. In terms of run-time efficiency, our method is reasonably faster than the original $L_{2}$ -regularized SVMs. For example, our sparse $L_{2}$ -SVMs is 2.55 times faster than the original $L_{2}$ -SVMs with the same accuracy.  相似文献   

18.
This paper focuses on the iterative design of SofEA , an architecture for distributing evolutionary algorithms (EAs) across computer networks in an asynchronous and decentralized way. SofEA is based on a pool architecture implemented on an object store, allowing the asynchronous interaction with which several clients. The fact that each client is autonomous leads to complex behavior, which will be examined in the work, so that the design can be validated, rules of thumb can be extracted, and the limits of scalability can be found. In this paper we advance the design of an asynchronous, fault-tolerant, and hopefully scalable distributed EA based on the object store CouchDB. We do so by iteratively analyzing running time and average evaluations to solutions on increasingly better versions of the algorithm, looking for the best results, at least from the point of view of running time. By doing so, we increase speed almost fourfold, and also decrement the average number of evaluations to solution in some cases. Experiments have shown also which critical parameters have the bigger influence on the performance in this kind of systems: live population size and number of conflicts, with both being influenced by the number of clients and the size of the population block each client handles at a time. These experiments also show that there is a balance between scalability and fault tolerance, with scalability dropping when a certain number of clients is reached; further clients only increase fault tolerance, at least in the configurations we are using in this paper. The paper also shows that experimentation and measurement conform a good methodology for the design of this kind of asynchronous, heterogeneous and distributed systems, where analytic performance prediction is almost impossible.  相似文献   

19.
Despite a large body of work on XPath query processing in relational environment, systematic study of queries containing not-predicates have received little attention in the literature. Particularly, several xml supports of industrial-strength commercial rdbms fail to efficiently evaluate such queries. In this paper, we present an efficient and novel strategy to evaluate not -twig queries in a tree-unaware relational environment. not -twig queries are XPath queries with ancestor–descendant and parent–child axis and contain one or more not-predicates. We propose a novel Dewey-based encoding scheme called Andes (ANcestor Dewey-based Encoding Scheme), which enables us to efficiently filter out elements satisfying a not-predicate by comparing their ancestor group identifiers. In this approach, a set of elements under the same common ancestor at a specific level in the xml tree is assigned same ancestor group identifier. Based on this scheme, we propose a novel sql translation algorithm for not-twig query evaluation. Experiments carried out confirm that our proposed approach built on top of an off-the-shelf commercial rdbms significantly outperforms state-of-the-art relational and native approaches. We also explore the query plans selected by a commercial relational optimizer to evaluate our translated queries in different input cardinality. Such exploration further validates the performance benefits of Andes.  相似文献   

20.
L. Lopez 《Calcolo》1986,23(3):249-263
In this paper we propose some implicit methods for stiff Volterra integral equations of second kind. The methods are constructed on the integro-differential equation obtained by differentiation of the Volterra equation. The numerical schemes are derived using a class of A-stable and L-stable methods for ordinary differential equations (proposed by Liniger and Willoughby in [3]) associated with the Gregory quadrature formula. Related to the test equation: $$y(t) = 1 + \int_0^t {[\lambda + \mu (t - s)] y(s)ds \lambda , \mu< 0} $$ we give the definition ofA-stability and ?-stability for the proposed numerical methods how natural extension of the A-stability and L-stability for the schemes for solving ordinary differential equations. We show how we have to choose the parameters of the methods in order to obtainA-stability and ?-stability schemes. Because of these properties the proposed schemes are particularly advantageous in the case of stiff Volterra integral equations.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号