期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

On the optimal number of features in the classification of multivariate Gaussian data

A.K. Jain W.G. Waller 《Pattern recognition》1978,10(5-6)

We consider the classification problem in the context of two equiprobable multivariate Gaussian densities with a common covariance matrix. The use of Anderson's W statistic for classification results in the existence of an optimum number of features, p_opt such that, for a given sample size, the average probability of misclassification decreases at first as the number of features are increased, attains a minimum at p_opt and then starts increasing. We have examined this peaking phenomenon for several cases and provide expressions which relate p_opt to the number of available training samples and the Mahalanobis distance between the two populations. We also show that to prevent peaking, each additional feature's contribution to the Mahalanobis distance must be a certain proportion of the accumulated Mahalanobis distance. 相似文献

2.

Analysis of a simple factorization algorithm

Donald E. Knuth Luis Trabb Pardo 《Theoretical computer science》1976,3(3):321-348

The probability that the k^th largest prime factor of a number n is at most n^x is shown to approach a limit F_k(x) as n → ∞. Several interesting properties of F_k(x) are explored, and numerical tables are given. These results are applied to the analysis of an algorithm commonly used to find all prime factors of a given number. The average number of digits in the k^th largest prime factor of a random m-digit number is shown to be asymptotically equivalent to the average length of the k^th longest cycle in a permutation on m objects. 相似文献

3.

Coupling from the past with randomized quasi-Monte Carlo

P. L’Ecuyer C. Sanvido 《Mathematics and computers in simulation》2010,81(3):476-489

The coupling-from-the-past (CFTP) algorithm of Propp and Wilson permits one to sample exactly from the stationary distribution of an ergodic Markov chain. By using it n times independently, we obtain an independent sample from that distribution. A more representative sample can be obtained by creating negative dependence between these n replicates; other authors have already proposed to do this via antithetic variates, Latin hypercube sampling, and randomized quasi-Monte Carlo (RQMC). We study a new, often more effective, way of combining CFTP with RQMC, based on the array-RQMC algorithm. We provide numerical illustrations for Markov chains with both finite and continuous state spaces, and compare with the RQMC combinations proposed earlier. 相似文献

4.

The zero-crossing interval statistics of the smoothed random telegraph signal

John T. Rickard 《Information Sciences》1977,13(3):253-268

This paper presents explicit expressions for the zero-crossing interval density P_o(τ) of the smoothed random telegraph signal (SRTS). p_o(τ) is given by an infinite series involving the two parameters of the process. When one of these parameters assumes integer values, the series reduces to a finite summation. From this result, the probability density P_n(τ) of the sum of n + 1 successive zero-crossing intervals and the discrete probability p(n,τ) of precisely n zero crossings in an arbitrary interval of length τ may be computed. Examples are given for p₁(τ) and p(0,τ). To our knowledge, these results represent the most complete solution to date for the zero crossing statistics of any nontrivial random process. 相似文献

5.

Prediction intervals for regression models

David J. Olive 《Computational statistics & data analysis》2007,51(6):3115-3122

This paper presents simple large sample prediction intervals for a future response Y_f given a vector x_f of predictors when the regression model has the form Y_i=m(x_i)+e_i where m is a function of x_i and the errors e_i are iid. Intervals with correct asymptotic coverage and shortest asymptotic length can be made by applying the shorth estimator to the residuals. Since residuals underestimate the errors, finite sample correction factors are needed.As an application, three prediction intervals are given for the least squares multiple linear regression model. The asymptotic coverage and length of these intervals and the classical estimator are derived. The new intervals are useful since the distribution of the errors does not need to be known, and simulations suggest that the large sample theory often provides good approximations for moderate sample sizes. 相似文献

6.

A random riemann zeta function and intersymbol interference

Richard Barakat 《Information Sciences》1978,14(3):189-197

The probability density function (PDF) of z = Σa_nn^?α with n = 1,2,…∞ is considered where {a_n} are independent, identically distributed, zero mean random variables with equally likely values of ± 1. For

12 < α ? 1

the series diverges; for 1 < α∞ the series converges. For the second situation, the associated characteristic function is bandlimited and we employ a sampling expansion to evaluate the PDF. Both cumulants and moments of z are evaluated. The error probability associated with the sequence is glen considered. 相似文献

7.

An approach to the evaluation of the performance of a discrete classifier

《Pattern recognition letters》2002,23(1-3):227-233

The problem studied is the behavior of a discrete classifier on a finite learning sample. With naive Bayes approach, the value of misclassification probability is represented as a random function, for which the first two moments are analytically derived. For arbitrary distributions, this allows evaluating learning sample size sufficient for the classification with given admissible misclassification probability and confidence level. The comparison with statistical learning theory shows that the suggested approach frequently recommends significantly smaller learning sample size. 相似文献

8.

Data-unit-size distribution model when message segmentations occur

Takashi IkegawaAuthor Vitae Yasuhito KishiAuthor Vitae Yukio TakahashiAuthor Vitae 《Performance Evaluation》2012,69(1):1-16

This paper proposes a data-unit-size distribution model to represent the message segmentation function implemented in many protocols, such as TCP and RLC, that allows a sender to divide a message larger than the payload size ?_d into multiple packets. To develop a Markov chain for a segmented packet size sequence, we introduce an auxiliary random variable representing two packet types: body and edge packets. The body packet is defined as a segmented packet appearing between the head and penultimate packets in the original message. If a message is segmented, the edge packet is defined as the final segmented packet. If not, it is identified with the original message. The sizes of body packets are equal to ?_d, whereas those of edge packets are variable, not to exceed ?_d. Using the Markov chain, we derive analytical forms of the occurrence probability of edge packets, as well as the distribution, mean and variance of packet sizes in the steady state. The key findings from the numerical results based on traffic measurement examples include the following. (1) When Web objects embedded in static Web pages that have a long-tailed size property are transferred using TCP, the occurrence probability of edge packets is not negligible in the case of commonly used values of ?_d, such as 1460 and 2272 bytes. (2) When IP messages are transferred using RLC protocol, the occurrence probability of edge packets is small because the payload size ?_d is very small. 相似文献

9.

Universal recursively enumerable sets of strings

Cristian S. Calude Frank Stephan 《Theoretical computer science》2011,412(22):2253-2261

相似文献

10.

Randomized Priority Queues for Fast Parallel Access

《Journal of Parallel and Distributed Computing》1998,49(1):86-97

We present simple randomized algorithms for parallel priority queues on distributed memory machines. Inserting O(n) elements or deleting the O(n) out ofmsmallest elements usingnprocessors requires O(T_coll+ log(m/n)) amortized time with high probability whereT_collbounds the time for performing prefix sums and randomized routing. The memory requirement is bounded by (m/n)(1 +o(1)) + O(logn) whp. These bounds are an improvement over the best previously known algorithms for many interconnection networks and even matches the speed of the best known PRAM algorithms. Generalizations for accessing thek⪢nsmallest elements are even more efficient. A portable implementation using MPI demonstrates that our approach is already useful for medium scale parallelism. Two parallel selection algorithms for randomly placed data are a spin-off. One runs in time O(T_coll) with high probability, beating a lower bound for the worst case. The other requires only a single reduction operation. 相似文献

11.

Almost sure convergence of classification procedures using Hermite series density estimates

Włodzimierz Greblicki Mirosław Pawlak 《Pattern recognition letters》1983,2(1):13-17

A multidimensional classification procedure is examined derived from the multiple Hermite series estimate of probability density functions. Conditions for the almost sure convergence of the integrated square error for the estimate are presented and the rate of the convergence is studied. The probability of misclassification, conditioned on a learning sequence of length n, is shown to converge to the Bayes risk almost surely as rapidly as

O (n^{?12+δ})

, δ positive. 相似文献

12.

On Variants of the Matroid Secretary Problem

Shayan Oveis Gharan Jan Vondrák 《Algorithmica》2013,67(4):472-497

We present a number of positive and negative results for variants of the matroid secretary problem. Most notably, we design a constant-factor competitive algorithm for the “random assignment” model where the weights are assigned randomly to the elements of a matroid, and then the elements arrive on-line in an adversarial order (extending a result of Soto, SODA 2011, pp. 1275–1284, 2011). This is under the assumption that the matroid is known in advance. If the matroid is unknown in advance, we present an O(logrlogn)-approximation, and prove that a better than O(logn/loglogn) approximation is impossible. This resolves an open question posed by Babaioff et al. (SODA 2007, pp. 434–443, 2007). As a natural special case, we also consider the classical secretary problem where the number of candidates n is unknown in advance. If n is chosen by an adversary from {1,…,N}, we provide a nearly tight answer, by providing an algorithm that chooses the best candidate with probability at least 1/(H _N?1+1) and prove that a probability better than 1/H _N cannot be achieved (where H _N is the N-th harmonic number). 相似文献

13.

Randomized parallel list ranking for distributed memory multiprocessors

Frank Dehne Siang W. Song 《International journal of parallel programming》1997,25(1):1-16

We present a randomized parallel list ranking algorithm for distributed memory multiprocessors, using a BSP type model. We first describe a simple version which requires, with high probability, log(3p)+log ln(n)=Õ(logp+log logn) communication rounds (h-relations withh=Õ(n/p)) andÕ(n/p)) local computation. We then outline an improved version that requires high probability, onlyr?(4k+6) log(2/3p)+8=Õ(k logp) communication rounds wherek=min{i?0 |ln(i+1)n?(2/3p)²ⁱ⁺¹}. Notekn) is an extremely small number. Forn andp?4, the value ofk is at most 2. Hence, for a given number of processors,p, the number of communication rounds required is, for all practical purposes, independent ofn. Forn?1, 500,000 and 4?p?2048, the number of communication rounds in our algorithm is bounded, with high probability, by 78, but the actual number of communication rounds observed so far is 25 in the worst case. Forn?10010100 and 4?p?2048, the number of communication rounds in our algorithm is bounded, with high probability, by 118; and we conjecture that the actual number of communication rounds required will not exceed 50. Our algorithm has a considerably smaller member of communication rounds than the list ranking algorithm used in Reid-Miller’s empirical study of parallel list ranking on the Cray C-90.⁽¹⁾ To our knowledge, Reid-Miller’s algorithm⁽¹⁾ was the fastest list ranking implementation so far. Therefore, we expect that our result will have considerable practical relevance. 相似文献

14.

The number of registers required for evaluating arithmetic expressions

P. Flajolet J.C. Raoult J. Vuillemin 《Theoretical computer science》1979,9(1):99-125

We study the number of registers required for evaluating arithmetic expressions. This parameter of binary trees appears in various computer science problems as well as in numerous natural sciences applications where it is known as the Strahler number.We give several enumeration results describing the distribution of the number of registers for trees of size n. The average number of registers has the asymptotic expansion log₄n + D(log₄n) + 0(1); here, function D is periodic of period one, and its Fourier expansion can be explicitly determined in terms of Riemann's zeta function and Euler's gamma function. 相似文献

15.

Relations for moments of progressively Type-II censored order statistics from half-logistic distribution with applications to inference

N. Balakrishnan H.M. Saleh 《Computational statistics & data analysis》2011,55(10):2775-2792

In this paper, we establish several recurrence relations for the single and product moments of progressively Type-II right censored order statistics from a half-logistic distribution. The use of these relations in a systematic recursive manner would enable one to compute all the means, variances and covariances of progressively Type-II right censored order statistics from the half-logistic distribution for all sample sizes n, effective sample sizes m, and all progressive censoring schemes (R₁,…,R_m). The results established here generalize the corresponding results for the usual order statistics due to Balakrishnan (1985). These moments are then utilized to derive best linear unbiased estimators of the scale and location-scale parameters of the half-logistic distribution. A comparison of these estimators with the maximum likelihood estimates is then made. The best linear unbiased predictors of censored failure times is then discussed briefly. Finally, two numerical examples are presented to illustrate all the inferential methods developed here. 相似文献

16.

Adaptive Drift Analysis

Benjamin Doerr Leslie Ann Goldberg 《Algorithmica》2013,65(1):224-250

We show that, for any c>0, the (1+1) evolutionary algorithm using an arbitrary mutation rate p _n=c/n finds the optimum of a linear objective function over bit strings of length n in expected time Θ(nlogn). Previously, this was only known for c≤1. Since previous work also shows that universal drift functions cannot exist for c larger than a certain constant, we instead define drift functions which depend crucially on the relevant objective functions (and also on c itself). Using these carefully-constructed drift functions, we prove that the expected optimisation time is Θ(nlogn). By giving an alternative proof of the multiplicative drift theorem, we also show that our optimisation-time bound holds with high probability. 相似文献

17.

Substar Reliability Analysis in Star Networks

Xiaolong Wu 《Information Sciences》2008,178(10):2337-2348

In this paper, we derive an upper bound on the (n − 1)-star reliability in an S_n using the probability fault model. Approximate (n − 1)-star reliability results are also obtained using the fixed partitioning. The numerical results show that the (n − 1)-star reliabilities under the probability fault model and the fixed partitioning are in good agreement especially for the low value of the node reliability. The numerical results are also shown to be consistent with and close to the simulation results. Conservative comparisons are made where possible between the reliability of similar size star graphs and hypercubes. 相似文献

18.

Stochastic complement analysis of multi-server threshold queues with hysteresis

《Performance Evaluation》1999,35(1-2):19-48

We consider a K-server threshold-based queueing system with hysteresis, for which a set of forward thresholds (F₁,F₂,…,F_K−1) and a set of reverse thresholds (R₁,R₂,…,R_K−1) are defined. A simple version of this multi-server queueing system behaves as follows. When a customer arrives to an empty system, it is serviced by a single server. Whenever the number of customers exceeds a forward threshold F_i, a server is added to the system and server activation is instantaneous. Whenever the number of customer falls below a reverse threshold R_i, a server is removed from the system. We consider and solve several variations of this problem, namely: (1) homogeneous servers with Poisson arrivals, (2) homogeneous servers with bulk (Poisson) arrivals, and (3) heterogeneous servers with Poisson arrivals. We place no restrictions on the number of servers or the bulk sizes or the size of the waiting room. In [O.C. Ibe, J. Keilson, Multi-server threshold queues with hysteresis, Performance Evaluation 21 (1995) 185–212], the authors solve a limited form of this problem using Green’s function method. More specifically, they give a closed-form solution for a K-server system, when the servers are homogeneous, and for a two-server system, when the servers are heterogeneous; the authors experienced difficulties in extending Green’s function method beyond the case of two heterogeneous servers. Rather than using Green’s function, we solve this problem using the concept of stochastic complementation, which is a more intuitive and more easily extensible method. For the case of a homogeneous multi-server system we are able to derive a closed-form solution for the steady state probability vector; for the remaining cases we give an algorithmic solution. Note, however, that we can use stochastic complementation to derive closed-form solutions for some limited forms of cases (2) and (3), such as heterogeneous servers with K=2 and bulk arrivals with a limited bulk size. Finally, our technique works both for systems with finite and infinite waiting rooms. 相似文献

19.

Exact Sampling from Perfect Matchings of Dense Regular Bipartite Graphs

Mark Huber 《Algorithmica》2006,44(3):183-193

We present the first algorithm for generating random variates exactly uniformly from the set of perfect matchings of a bipartite graph with a polynomial expected running time over a nontrivial set of graphs. Previous Markov chain results obtain approximately uniform variates for arbitrary graphs in polynomial time, but their general running time is Θ(n¹⁰ (ln n)²). Other algorithms (such as Kasteleyn's O(n³) algorithm for planar graphs) concentrated on restricted versions of the problem. Our algorithm employs acceptance/rejection together with a new upper limit on the permanent of a form similar to Bregman's theorem. For graphs with 2n nodes, where the degree of every node is γn for a constant γ, the expected running time is O(n^{1.5 + .5/γ}). Under these conditions, Jerrum and Sinclair showed that a Markov chain of Broder can generate approximately uniform variates in Θ(n^{4.5 + .5/γ} ln n) time, making our algorithm significantly faster on this class of graphs. The problem of counting the number of perfect matchings in these types of graphs is # P complete. In addition, we give a 1 + σ approximation algorithm for finding the permanent of 0–1 matrices with identical row and column sums that runs in O(n^{1.5 + .5/γ} (1/σ²) log (1/δ))$, where the probability that the output is within 1 + \sigma$ of the permanent is at least 1 – δ. 相似文献

20.

Effect of imbalance and intracluster correlation coefficient in cluster randomized trials with binary outcomes

Chul Ahn Fan Hu 《Computational statistics & data analysis》2009,53(3):596-602

Cluster randomization trials are increasingly popular among healthcare researchers. Intact groups (called ‘clusters’) of subjects are randomized to receive different interventions, and all subjects within a cluster receive the same intervention. In cluster randomized trials, a cluster is the unit of randomization, and a subject is the unit of analysis. Variation in cluster sizes can affect the sample size estimate or the power of the study. [Guittet, L., Ravaud, P., Giraudeau, B., 2006. Planning a cluster randomized trial with unequal cluster sizes: Practical issues involving continuous outcomes. BMC Medical Research Methodology 6 (17), 1-15] investigated the impact of an imbalance in cluster size on the power of trials with continuous outcomes through simulations. In this paper, we examine the impact of cluster size variation and intracluster correlation on the power of the study for binary outcomes through simulations. Because the sample size formula for cluster randomization trials is based on a large sample approximation, we evaluate the performance of the sample size formula with small sample sizes through simulation. Simulation study findings show that the sample size formula (m_p) accounting for unequal cluster sizes yields empirical powers closer to the nominal power than the sample size formula (m_a) for the average cluster size method. The differences in sample size estimates and empirical powers between m_a and m_p get smaller as the imbalance in cluster sizes gets smaller. 相似文献