期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Exploration–exploitation tradeoff using variance estimates in multi-armed bandits

Jean-Yves Audibert Rémi Munos Csaba Szepesvári 《Theoretical computer science》2009

Algorithms based on upper confidence bounds for balancing exploration and exploitation are gaining popularity since they are easy to implement, efficient and effective. This paper considers a variant of the basic algorithm for the stochastic, multi-armed bandit problem that takes into account the empirical variance of the different arms. In earlier experimental works, such algorithms were found to outperform the competing algorithms. We provide the first analysis of the expected regret for such algorithms. As expected, our results show that the algorithm that uses the variance estimates has a major advantage over its alternatives that do not use such estimates provided that the variances of the payoffs of the suboptimal arms are low. We also prove that the regret concentrates only at a polynomial rate. This holds for all the upper confidence bound based algorithms and for all bandit problems except those special ones where with probability one the payoff obtained by pulling the optimal arm is larger than the expected payoff for the second best arm. Hence, although upper confidence bound bandit algorithms achieve logarithmic expected regret rates, they might not be suitable for a risk-averse decision maker. We illustrate some of the results by computer simulations. 相似文献

2.

一种满足差分隐私的图赌博机算法

卢世银王广辉邱梓豪张利军《软件学报》2022,33(9):3223-3235

图赌博机是一种重要的不确定性环境下的序列决策模型, 在社交网络、电子商务和推荐系统等领域都得到了广泛的应用. 目前, 针对图赌博机的工作都只关注如何快速识别最优摇臂从而最小化累积遗憾, 而忽略了在很多应用场景中存在的隐私保护问题. 为了克服现有图赌博机算法的缺陷, 提出了一种满足差分隐私的图赌博机算法GAP (图反馈下的差分隐私摇臂消除策略). 一方面, GAP算法阶段性地根据摇臂的经验平均奖赏更新摇臂选取策略, 并在计算摇臂的经验平均奖赏时引入拉普拉斯噪声, 从而确保恶意攻击者难以根据算法输出推算摇臂奖赏数据, 保护了隐私. 另一方面, GAP算法在每个阶段根据精心构造的反馈图的独立集探索摇臂集合, 有效地利用了图形式的反馈信息. 证明了GAP算法满足差分隐私性质, 具有与理论下界相匹配的遗憾界. 在仿真数据集上的实验结果表明: GAP算法在有效保护隐私的同时取得了与现有无隐私保护的图赌博机算法相当的累积遗憾. 相似文献

3.

On the convergence condition and convergence time of BGP

Huaming Guo 《Computer Communications》2011,34(2):192-199

The Internet composes of thousands of Autonomous System (ASes). The Border Gateway Protocol (BGP) is the standard protocol for sharing inter-domain routing information. Unlike OSPF and IS-IS, BGP allows an AS to use a lot of attributes to express semantic rich routing policies that are consistent with its desired economic, business, performance, and security goals. However, the expressiveness could cause to delay convergence or even divergence in BGP. Recent work do not rigorously analyze the impact of the general routing policies on the convergence condition and convergence time of BGP, especially considering the widely used Multi-Exit Discriminator (MED) attribute. In this paper, we will fill this gap and give the rigorous analysis on the impact of the general routing policies on the convergence condition and convergence time of BGP, including MED attribute. We first introduce a timeless model to represent BGP with the general routing policies including the MED attribute. By incorporating the timeless model we derive a sufficient condition on these general routing policies for robust convergence of BGP. We then extend the timeless model to the real-time model by adding the edge delay. Finally, we find an upper bound on convergence time of BGP by incorporating the real-time model. 相似文献

4.

Pure exploration in finitely-armed and continuous-armed bandits

Sébastien Bubeck Rémi Munos 《Theoretical computer science》2011,412(19):1832-1852

We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of forecasters that perform an on-line exploration of the arms. These forecasters are assessed in terms of their simple regret, a regret notion that captures the fact that exploration is only constrained by the number of available rounds (not necessarily known in advance), in contrast to the case when the cumulative regret is considered and when exploitation needs to be performed at the same time. We believe that this performance criterion is suited to situations when the cost of pulling an arm is expressed in terms of resources rather than rewards. We discuss the links between the simple and the cumulative regret. One of the main results in the case of a finite number of arms is a general lower bound on the simple regret of a forecaster in terms of its cumulative regret: the smaller the latter, the larger the former. Keeping this result in mind, we then exhibit upper bounds on the simple regret of some forecasters. The paper ends with a study devoted to continuous-armed bandit problems; we show that the simple regret can be minimized with respect to a family of probability distributions if and only if the cumulative regret can be minimized for it. Based on this equivalence, we are able to prove that the separable metric spaces are exactly the metric spaces on which these regrets can be minimized with respect to the family of all probability distributions with continuous mean-payoff functions. 相似文献

5.

On Delivery Times in Packet Networks under Adversarial Traffic

Adi Rosen Michael S. Tsirkin 《Theory of Computing Systems》2006,39(6):805-827

We consider packet networks and make use of the "adversarial queuing theory" model [10]. We are interested in the question of guaranteeing that all packets are actually delivered to destination, and of having an upper bound on the delivery times of all packets. Whether this is possible against all adversarial queuing theory rate-1 adversaries was previously posed as an open question [13],[10]. Among other things, we give a queuing policy that guarantees bounded delivery time whenever the rate-1 adversary injects a sequence of packets for which there exists a schedule with a finite upper bound on the delivery times of all packets, and adheres to certain additional conditions. On the negative side we show that there exist rate-1 sequences of packets for which there is no schedule with a finite upper bound on the delivery times of all packets. We thus answer an open question posed by Gamarnik [13]. We further show that delivering all packets while maintaining stability (we coin the term "reliability" for this property) can be done by an offline scheduler whenever the injection of packets is done at rate of at most 1. However, on the other hand, we also show that there is no online protocol (even centralized) that can achieve that property against all rate-1 adversaries. We thus answer an open question of Borodin et al. [10]. 相似文献

6.

Bounds on the optimal quantization performance of dynamically quantized linear systems with bounded noise

Qiang Ling Huihui Gu Hai Lin Yu Kang 《Asian journal of control》2012,14(2):538-547

This paper studies a class of quantized linear control systems with diagonalizable system matrices and perturbed by bounded noise. The quantization performance of such systems is measured with the supremum of the quantization error sequence. Our goal is to improve this performance (reduce the quantization error) through designing appropriate quantization policies. Due to their efficiency, the dynamic quantization policies are considered in this paper. A lower bound on the optimal (minimum) quantization error is provided. We also propose a new quantization policy, whose quantization error is an upper bound on the optimal one. A more tractable upper quantization error bound is derived from the new policy. It is shown through simulations that the new policy's quantization error is very close to the lower bound, which confirms both the tightness of the lower bound and the efficiency of the new policy. The achieved lower and upper quantization error bounds, together with the quantization error of the new quantization policy, may provide a good indication on the optimal quantization performance. Copyright © 2011 John Wiley and Sons Asia Pte Ltd and Chinese Automatic Control Society 相似文献

7.

Implementations of a New Theorem for Computing Bounds for Positive Roots of Polynomials

A. Akritas A. Strzeboński P. Vigklas 《Computing》2006,78(4):355-367

Finding an upper bound for the positive roots of univariate polynomials is an important step of the continued fractions real root isolation algorithm. The revived interest in this algorithm has highlighted the need for better estimations of upper bounds of positive roots. In this paper we present a new theorem, based on a generalization of a theorem by D. Stefanescu, and describe several implementations of it – including Cauchy's and Kioustelidis' rules as well as two new rules recently developed by us. From the empirical results presented here we see that applying various implementations of our theorem – and taking the minimum of the computed values – greatly improves the estimation of the upper bound and hopefully that will affect the performance of the continued fractions real root isolation method. 相似文献

8.

时变系统有限数据窗最小二乘辨识的有界收敛性 总被引：8，自引：0，他引：8

丁锋丁韬萧德云杨家本《自动化学报》2002,28(5):754-761

利用随机过程理论证明了有限数据窗最小二乘法的有界收敛性,给出了参数估计误差上界的计算公式,阐述了获得最小均方参数估计误差上界时数据窗长度的选择方法.分析表明, 对于时不变随机系统,数据窗长度越大,均方参数估计误差上界越小;对于确定性时变系统,数据窗长度越小,均方参数估计误差上界越小.因此,对于时变随机系统,一个折中方案是寻求一个最佳数据窗长度,以使均方参数估计误差最小.该文的研究成果对于提高辨识算法的实际应用效果有重要意义. 相似文献

9.

Should the two-headed disk be greedy? - Yes,it should

Micha Hofri 《Information Processing Letters》1983,16(2):83-85

We consider a disk with two movable arms, each of which can access all of its surface. Read/write requests arrive singly and are treated as they arrive (no queue is formed). The arms can move concurrently, but not transmit simultaneously. We show that the policy that minimizes the mean seek time is to move to the required address the arm that is closest to it, while the other arm jockeys for optimal anticipatory position. This remains true when successive addresses are dependent and/or have different distributions. 相似文献

10.

A theory of learning from different domains 总被引：1，自引：0，他引：1

Shai Ben-David John Blitzer Koby Crammer Alex Kulesza Fernando Pereira Jennifer Wortman Vaughan 《Machine Learning》2010,79(1-2):151-175

Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. Often, however, we have plentiful labeled training data from a source domain but wish to learn a classifier which performs well on a target domain with a different distribution and little or no labeled training data. In this work we investigate two questions. First, under what conditions can a classifier trained from source data be expected to perform well on target data? Second, given a small amount of labeled target data, how should we combine it during training with the large amount of labeled source data to achieve the lowest target error at test time? We address the first question by bounding a classifier’s target error in terms of its source error and the divergence between the two domains. We give a classifier-induced divergence measure that can be estimated from finite, unlabeled samples from the domains. Under the assumption that there exists some hypothesis that performs well in both domains, we show that this quantity together with the empirical source error characterize the target error of a source-trained classifier. We answer the second question by bounding the target error of a model which minimizes a convex combination of the empirical source and target errors. Previous theoretical work has considered minimizing just the source error, just the target error, or weighting instances from the two domains equally. We show how to choose the optimal combination of source and target error as a function of the divergence, the sample sizes of both domains, and the complexity of the hypothesis class. The resulting bound generalizes the previously studied cases and is always at least as tight as a bound which considers minimizing only the target error or an equal weighting of source and target errors. 相似文献

11.

Alternative interest measures for mining associations in databases 总被引：9，自引：0，他引：9

Omiecinski E.R. 《Knowledge and Data Engineering, IEEE Transactions on》2003,15(1):57-69

Data mining is defined as the process of discovering significant and potentially useful patterns in large volumes of data. Discovering associations between items in a large database is one such data mining activity. In finding associations, support is used as an indicator as to whether an association is interesting. In this paper, we discuss three alternative interest measures for associations: any-confidence, all-confidence, and bond. We prove that the important downward closure property applies to both all-confidence and bond. We show that downward closure does not hold for any-confidence. We also prove that, if associations have a minimum all-confidence or minimum bond, then those associations will have a given lower bound on their minimum support and the rules produced from those associations will have a given lower bound on their minimum confidence as well. However, associations that have that minimum support (and likewise their rules that have minimum confidence) may not satisfy the minimum all-confidence or minimum bond constraint. We describe the algorithms that efficiently find all associations with a minimum all-confidence or minimum bond and present some experimental results. 相似文献

12.

双裁切近端策略优化算法

下载免费PDF全文

张骏王红成《计算机系统应用》2023,32(4):177-186

近端策略优化(proximal policy optimization, PPO)是一种稳定的深度强化学习算法,该算法的关键点之一是使用裁切后的代理目标限制更新步长.实验发现当使用经验最优的裁切系数时, KL散度(Kullback-Leibler divergence)无法被确立上界,这有悖于置信域优化理论.本文提出一种改进的双裁切近端策略优化算法(proximal policy optimization with double clipping boundaries, PPO-DC).该算法通过基于概率的两段裁切边界调整KL散度,将参数限制在置信域内,以保证样本数据得到充分利用.在多个连续控制任务中, PPO-DC算法取得了好于其他算法的性能. 相似文献

13.

基于分级规划策略的拟人机械臂仿人运动规划算法研究

下载免费PDF全文

王春荣夏尔冬赵京熊昌炯刘建军卫沅《图学学报》2018,39(3):553

为使拟人机械臂具有高精度的仿人运动,提出一种通过触发条件和分级规划策略的仿人运动新方法。将人臂运动过程离散为不同运动阶段,在每一个运动阶段都有与之对应的规划层,在不同的规划层中,拟人机械臂的运动特点不同。利用各自的特点建立不同规划层下的运动模型及臂姿预测指标,对拟人机械臂臂姿进行预测。最后,以NAO 机器人为实验平台, 比较所提方法与最小势能法(MTPE)的静态臂姿与动态臂姿预测,并与运动捕捉系统(OptiTrack) 采集的真实人臂运动数据进行比较。实验表明,该方法具有较小的静态臂姿和动态臂姿预测误差,能使拟人机械臂产生高度逼真的仿人运动。相似文献

14.

A golden ratio control policy for a multiple-access channel

Itai A. Rosberg Z. 《Automatic Control, IEEE Transactions on》1984,29(8):712-718

Considernstations sharing a single communications channel. Each station has a buffer of length one. If the arrival rate of stationiis ri, then1-Pi_{i}(1- r_{i})is shown to be an upper bound (over all policies) on the throughput of the channel. Moreover, an optimal policy always exists and is stationary and periodic. The throughput of two policies, the random-policy, and the golden-ratio policy, are analyzed for a finite and infinite number of stations. The latter is shown to approach a limit which is within at least 98.4 percent of the upper bound. 相似文献

15.

A generalisation of the Cantor-Zassenhaus algorithm

C. Hidber 《Computing》1997,59(4):325-330

We generalise the Cantor-Zassenhaus algorithm for factoring polynomials over finite fields. The generalisation yields a class of factorisation algorithms. We compute their factorisation probability and their least upper bound. We then give a simple characterisation of the algorithms reaching the least upper bound. As an example we show the Cantor-Zassenhaus and the Ben-Or algorithms have a factorisation probability equal to the least upper bound. 相似文献

16.

Routing and rate allocation in rate-based multi-class networks

Aniruddha Joy 《Performance Evaluation》2005,59(4):337-366

In this paper, we investigate the performance of routing and rate allocation (RRA) algorithms in rate-based multi-class networks. On the arrival of a connection request, an RRA algorithm selects a route for the connection and allocates an appropriate rate on the route; failing this, it blocks the connection request. We measure the performance of an RRA algorithm in terms of its minimum weighted carried traffic. This performance criterion encompasses two widely used performance criteria, namely, weighted carried traffic and minimum carried traffic. We derive an upper bound on the minimum weighted carried traffic of any RRA algorithm. The bound can be computed by solving a linear program. Moreover, we show that the bound is achieved asymptotically, when the offered load and the link capacities are large, by a Partitioning RRA algorithm. Therefore the bound can be used as a performance benchmark for any RRA algorithm. We observe that the proposed Partitioning RRA algorithm, though asymptotically optimal, performs poorly at very low loads. We investigate the cause of this undesirable behaviour and obtain two improved asymptotically optimal RRA algorithms. 相似文献

17.

Evaluation of a dynamic arm support for seated and standing tasks: a laboratory study of electromyography and subjective feedback

Odell D Barr A Goldberg R Chung J Rempel D 《Ergonomics》2007,50(4):520-535

The goal of this study was to determine whether a new dynamic arm support system reduced shoulder and arm muscle load for seated and standing hand/ arm tasks. The new system provides support for both horizontal and vertical arm motion. A total of 11 participants performed ten tasks (five seated and five standing) both with and without the arm support. Outcomes were assessed with electromyography and subjective feedback. Muscle activity was measured over the dominant side supraspinatus, triceps and forearm extensor muscles. Significant (p < 0.01) reductions in static muscle activity were observed in one of ten tasks performed with the support device for the supraspinatus muscle, in five tasks for the triceps and in one task for forearm extensor muscles. Likewise, a significant improvement in subjective measures was reported with the support device for 'ease of task' for two of ten tasks, for 'forearm comfort' for three of ten tasks and for 'shoulder effort' for six of ten tasks. The results suggest that a dynamic forearm support may improve subjective comfort and reduce static muscle loads in the upper extremity for tasks that involve horizontal movement of the arms. For rapid motions, the value of the support is limited due to internal inertia and friction. 相似文献

18.

Hairdressers’ shoulder load when blow-drying – Studying the effect of a new blow dryer design on arm inclination angle and muscle pain

《International Journal of Industrial Ergonomics》2019

During nine months, nineteen hairdressers every second/third month switched the use between a blow dryer with traditional design and one with a new design. The new blow dryer had the possibility to change between two opposite directed air flows. Every second/third month arm inclination angle and upper trapezius muscle activity were measured a whole workday, and during blow-drying in the laboratory. Pronounced upper arm elevation was reduced with the new blow dryer. The muscle activity of the upper trapezius was only reduced in the laboratory, and daily pain reports were not significantly influenced at all. The subjective rating of time use, functionality and heaviness was less favourable for the new blow dryer, with only three out of nineteen preferring the new dryer at the end of the study period. However, the design of the new dryer demanded a change of work technique that might have been conceived as problematic by the experienced hairdressers.Relevance to industryWe studied a new professional handheld blow dryer designed to allow less postures with elevated arms, addressing an import risk factor for work-related musculoskeletal problems. Hairdressers using this new dryer had less time with upper arm in pronounced elevation during blow-drying. 相似文献

19.

Asymptotic buffer overflow probabilities in multiclassmultiplexers: an optimal control approach

Bertsimas D. Paschalidis I.C. Tsitsiklis J.N. 《Automatic Control, IEEE Transactions on》1998,43(3):315-335

We consider a multiclass multiplexer with support for multiple service classes and dedicated buffers for each service class. Under specific scheduling policies for sharing bandwidth among these classes, we seek the asymptotic (as the buffer size goes to infinity) tail of the buffer overflow probability for each dedicated buffer. We assume dependent arrival and service processes as is usually the case in models of bursty traffic. In the standard large deviations methodology, we provide a lower and a matching (up to first degree in the exponent) upper bound on the buffer overflow probabilities. We introduce a novel optimal control approach to address these problems. In particular, we relate the lower bound derivation to a deterministic optimal control problem, which we explicitly solve. Optimal state trajectories of the control problem correspond to typical congestion scenarios. We explicitly and in detail characterize the most likely modes of overflow. We specialize our results to the generalized processor sharing policy (GPS) and the generalized longest queue first policy (GLQF). The performance of strict priority policies is obtained as a corollary. We compare the GPS and GLQF policies and conclude that GLQF achieves smaller overflow probabilities than GPS for all arrival and service processes for which our analysis holds. Our results have important implications for traffic management of high-speed networks and can be used as a basis for an admission control mechanism which guarantees a different loss probability for each class 相似文献

20.

Minimization of rounding errors algebraically by use of coding theory

《国际计算机数学杂志》2012,89(2):135-138

In this paper we consider the floating-point sum of positive terms. Among the algebraically equivalent schemes we determine the ones which lead to the minimum of the upper bound of the rounding errors. We also provide lower and upper bounds for this minimum by relating this problem to the coding theory. 相似文献