Similar Documents (20 results found)
1.
Since the start of reform and opening-up, China's rapid economic growth has drawn close attention worldwide, accompanied all along by steady pressure from developed Western countries over trade deficits, exchange rates, product quality, environmental protection, and human rights. More recently, as some people with ulterior motives have brazenly politicized the Beijing Olympics through the torch relay, tensions between the two sides have reached a new high.

2.
Traditional marketing methods no longer hold an advantage in the Internet era, and marketing in this era is not merely about building a website to display one's goods; rather, it promotes products further through personalized, human-centered service. Marketing in the traditional sense refers to a social and managerial process in which individuals and groups satisfy needs and wants by creating products and value and exchanging them with others.

3.
At the Shanghai stop of the roadshow, the author had the opportunity for in-depth exchanges with Olivier DATRY, Marketing and Sales Director at IMAGINE; Cedric ROMAN, Manager of the Special Thermal-Power Applications Division; David MARAND, Regional Technical Manager; and Ms. Li Jingyan, General Manager of 世冠公司.

4.
As the value-added reseller of SolidWorks software in South China, 智诚科技 (ICT) has always taken as its mission promoting the development of manufacturing technology as a whole, with the goal of using advanced design concepts to guide China's manufacturing industry in keeping pace with the latest global trends.

5.
This paper presents an Ethernet-based hybrid method for predicting random time delays in networked control systems. First, a db3 wavelet is used to decompose and reconstruct the time-delay sequence, yielding its approximation component and detail components. Next, one-step predictions are obtained from an echo state network (ESN) model and an autoregressive integrated moving average (ARIMA) model, matched to the different characteristics of the approximation and detail components. The final time-delay prediction is then obtained by summation. Meanwhile, the parameters of the echo state network are optimized by a genetic algorithm. Simulation results indicate that this prediction method achieves higher accuracy.
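A minimal sketch of this hybrid pipeline, assuming PyWavelets and statsmodels are available; the synthetic delay trace, reservoir size, and ARIMA order are illustrative choices, and the genetic-algorithm tuning of the ESN parameters is omitted:

```python
import numpy as np
import pywt
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
delays = 10 + np.cumsum(rng.normal(0, 0.1, 512))   # synthetic time-delay trace

# 1. db3 decomposition; reconstruct each component back to full length.
coeffs = pywt.wavedec(delays, 'db3', level=2)
approx = pywt.waverec([coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]],
                      'db3')[:len(delays)]
details = [
    pywt.waverec([c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)],
                 'db3')[:len(delays)]
    for i in range(1, len(coeffs))
]

# 2a. One-step ESN prediction for the smooth approximation component.
def esn_one_step(x, n_res=50, spectral_radius=0.9, ridge=1e-6):
    w_in = rng.uniform(-0.5, 0.5, (n_res, 1))
    w = rng.uniform(-0.5, 0.5, (n_res, n_res))
    w *= spectral_radius / np.abs(np.linalg.eigvals(w)).max()  # scale reservoir
    states = np.zeros((len(x) - 1, n_res))
    s = np.zeros(n_res)
    for t in range(len(x) - 1):
        s = np.tanh(w_in[:, 0] * x[t] + w @ s)
        states[t] = s
    # Ridge-regression readout trained to predict x[t+1] from the state at t.
    w_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_res),
                            states.T @ x[1:])
    return states[-1] @ w_out                # prediction for the next step

# 2b. One-step ARIMA forecasts for the oscillatory detail components.
detail_preds = [ARIMA(d, order=(2, 0, 2)).fit().forecast(1)[0] for d in details]

# 3. Final prediction is the sum of the component forecasts.
pred = esn_one_step(approx) + sum(detail_preds)
print(f"predicted next delay: {pred:.3f}")
```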

6.
Malicious-code incident handling: the "Nimaya" (熊猫烧香, "Panda Burning Incense") virus. The Nimaya virus began to spread widely in early 2007. It could infect files, propagate itself, update over the network, and launch distributed denial-of-service (DDoS) attacks.

7.
For the leaders of today's developed economies, is China a threat or an opportunity? The debate continues among business, industry, government, and academic observers worldwide. China's prominent role in global industry is beyond dispute. Today China manufactures 50% of the world's cameras, 30% of its air conditioners and televisions, 25% of its washing machines, and nearly 23% of its refrigerators. Manufacturing accounts for 53% of China's GDP, 90% of its exports, 85% of its imports, and 70% of the investment it attracts. China is the world's fourth-largest producer (after the United States, Japan, and Germany) and is growing at a remarkable pace.

8.
AMD 780G. As the flagship of AMD's 7-series integrated chipsets, the AMD 780G can be counted a hero of its time: it was the first integrated chipset with DirectX 10 support to reach the market officially, leading integrated graphics into a new "visual" era. The chipset fully supports the new-generation K10 processors, the HyperTransport 3.0 bus, and PCI-E 2.0 graphics slots, and it integrates a graphics core whose performance is roughly equivalent to a Radeon HD 2400 PRO, with full support for the UVD high-definition decoding engine. It also provides special support for hybrid CrossFire, which soon won it broad backing across the industry and a steadily rising market position.

9.
1. Fault symptoms: At the Shanghai Fire Research Institute of the Ministry of Public Security, eleven access-layer switches, ZXR10 2826 (hereafter "2826"), were cascaded through one switch and a router. The equipment had run normally for the two years since commissioning, with normal web browsing, downloads, and uploads. At a certain point, however, 40% of the users on the eleven switches could no longer get online: opening a web page produced "Internet Explorer cannot link to the web page you requested. This page may be temporarily unavailable." The remaining users could still get online, but at sharply reduced speeds.

10.
According to data from CCID Consulting's (赛迪顾问) 2008 First-Quarter China Linux Software Market Analysis Report, China's Linux software market reached sales of 38 million yuan in the first quarter of 2008, up 22.6% year on year, continuing its rapid growth within the overall operating-system market. While Unix's share of the non-embedded operating-system market declined, Linux edged up 0.2 percentage points, a sign of the vitality of the Linux industry. This owes both to the healthy momentum of the national macroeconomy and to the improvement, in recent years, of the ecosystem along the Linux market's industrial chain.

11.
A reinforcement learning system is one that interacts with an unconstrained, unknown environment. The goal of the learning system is to obtain as much cumulative reward signal as possible, where the reward is received from the environment over a finite, unknown lifetime. One difficulty for a reinforcement learning system is that reward signals are very sparse, especially in systems where only delayed signals are available. Existing reinforcement learning methods store reward signals in the form of a value function, as in the well-known Q-learning. This paper proposes a method based on a state estimation model; the algorithm makes use of the reward signals stored in the value function.
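For context, a minimal tabular Q-learning sketch on a sparse-reward chain task (the environment and hyperparameters are illustrative, not from the paper), showing how a delayed terminal reward is stored in, and propagated through, the value function:

```python
import numpy as np

n_states, n_actions = 10, 2              # hypothetical chain: actions = left/right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.2
rng = np.random.default_rng(1)

def greedy(q_row):
    best = np.flatnonzero(q_row == q_row.max())   # break ties randomly
    return int(rng.choice(best))

for episode in range(300):
    s = 0
    for _ in range(200):                          # step cap per episode
        a = int(rng.integers(n_actions)) if rng.random() < eps else greedy(Q[s])
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == n_states - 1 else 0.0    # sparse reward: goal state only
        # TD update: the delayed reward is written back into the value function.
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
        if s == n_states - 1:
            break

print(Q.argmax(axis=1))   # learned policy: move right toward the goal
```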

12.
Dyna-Q, a well-known model-based reinforcement learning (RL) method, interleaves offline simulations and action executions to update Q functions. It creates a world model that predicts the feature values of the next state and the reward function of the domain directly from the data, and uses the model to train Q functions to accelerate policy learning. Tabular methods are generally used in Dyna-Q to establish the model, but a tabular model needs many more samples of experience to approximate the environment accurately. In this article, an adaptive model-learning method based on tree structures is presented to enhance sampling efficiency when building the world model. The proposed method produces simulated experiences for indirect learning, so the agent has additional experience for updating the policy. The agent works backwards from collections of state transitions and associated rewards, using coarse coding to learn their definitions for the region of state space that tracks back to the precedent states. The method estimates the rewards and transition probabilities between states from past experience. Because the resulting tree is always concise and small, the agent can use value iteration to quickly estimate the Q-values of each action in the induced states and determine a policy. The effectiveness and generality of the method are demonstrated in two numerical simulations: a mountain car and a mobile robot in a maze. The results show that the method clearly improves the training rate.
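A hedged sketch of the basic Dyna-Q loop with a tabular world model (the article's contribution replaces this table with an adaptive tree model, which is not reproduced here); the corridor world and the number of planning steps are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, goal = 12, 2, 11       # toy deterministic corridor world
Q = np.zeros((n_states, n_actions))
model = {}                                  # learned world model: (s, a) -> (r, s2)
alpha, gamma, eps, planning_steps = 0.1, 0.95, 0.2, 20

def greedy(q_row):
    best = np.flatnonzero(q_row == q_row.max())     # break ties randomly
    return int(rng.choice(best))

def step(s, a):                             # the real environment
    s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
    return (1.0 if s2 == goal else 0.0), s2

for episode in range(50):
    s = 0
    while s != goal:
        a = int(rng.integers(n_actions)) if rng.random() < eps else greedy(Q[s])
        r, s2 = step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])   # direct RL
        model[(s, a)] = (r, s2)                                  # model learning
        for _ in range(planning_steps):                          # planning: replay
            ps, pa = list(model)[rng.integers(len(model))]       # a seen pair
            pr, ps2 = model[(ps, pa)]
            Q[ps, pa] += alpha * (pr + gamma * Q[ps2].max() - Q[ps, pa])
        s = s2

print(Q.argmax(axis=1))    # learned policy after few real episodes
```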

13.
In this paper, we address the problem of suboptimal behavior during online partially observable Markov decision process (POMDP) planning caused by time constraints on planning. Taking inspiration from the related field of reinforcement learning (RL), our solution is to shape the agent's reward function in order to lead the agent to large future rewards without having to spend as much time explicitly estimating cumulative future rewards, enabling the agent to save time, improve the breadth of planning, and build higher-quality plans. Specifically, we extend potential-based reward shaping (PBRS) from RL to online POMDP planning. In our extension, information about belief states is added to the function optimized by the agent during planning. This information provides hints of where the agent might find high future rewards beyond its planning horizon, and thus achieve greater cumulative rewards. We develop novel potential functions measuring information useful to agent metareasoning in POMDPs (reflecting on agent knowledge and/or histories of experience with the environment), theoretically prove several important properties and benefits of using PBRS for online POMDP planning, and empirically demonstrate these results in a range of classic benchmark POMDP planning problems.
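The core of PBRS is adding F(s, s') = γΦ(s') − Φ(s) to the environment reward, a form that leaves the optimal policy unchanged. A minimal sketch; the belief-entropy potential below is a hypothetical illustration, not one of the paper's potential functions:

```python
import numpy as np

def shaped_reward(r, phi_s, phi_s2, gamma=0.95):
    # PBRS adds F = gamma * Phi(s') - Phi(s); this difference form is what
    # preserves the optimal policy (the classic invariance result).
    return r + gamma * phi_s2 - phi_s

def belief_potential(belief):
    # Hypothetical potential over a belief state: negated entropy, hinting
    # the agent toward beliefs that pin down the hidden state.
    b = np.asarray(belief, dtype=float)
    b = b[b > 0]
    return float(np.sum(b * np.log(b)))

# Usage: moving from a vague belief to a sharper one earns a positive bonus
# even when the environment reward is zero.
print(shaped_reward(0.0, belief_potential([0.5, 0.5]),
                    belief_potential([0.9, 0.1])))
```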

14.
Because learning an action policy with reinforcement learning algorithms is time-consuming, a heuristic reinforcement learning method based on state backtracking is proposed. Repeated states arising during reinforcement learning are analyzed, and by comparing the selection strategies for repeated actions during state backtracking, a cost function is introduced to describe the importance of repeated actions. A new heuristic function is then defined by combining action rewards with action costs. While emphasizing action importance to speed up learning, this heuristic function uses the cost function to compute the cost of each action selection, reducing unnecessary exploration and thus steadily improving learning efficiency. The cost-function-based action selection strategy is proved correct. Two simulation scenarios are built and the algorithm is applied to robot path-planning experiments. The results show that the heuristic method based on state backtracking balances the rewards obtained against the costs incurred and effectively speeds up the convergence of Q-learning.
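A hedged sketch of an action-selection rule in this spirit, assuming tabular estimates; the weighting coefficients and the repeat-count cost table are illustrative stand-ins for the paper's cost function:

```python
import numpy as np

def select_action(Q, H, C, s, xi=0.5, lam=0.3):
    """Pick the action maximizing value plus heuristic minus cost.

    Q: (S, A) action values; H: (S, A) reward-based heuristic emphasizing
    important actions; C: (S, A) cost of repeating actions in revisited
    states (e.g., accumulated during state backtracking).
    """
    scores = Q[s] + xi * H[s] - lam * C[s]
    return int(np.argmax(scores))

# Hypothetical usage on a 4-state, 2-action problem:
S, A = 4, 2
Q, H, C = np.zeros((S, A)), np.ones((S, A)), np.zeros((S, A))
C[0, 1] = 5.0                      # action 1 in state 0 was repeated fruitlessly
print(select_action(Q, H, C, 0))   # -> 0: the costly repeat is avoided
```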

15.
A dynamic 3D maze is a difficult reinforcement learning environment with uncertainty and incomplete information; training a task there with a conventional reward function is slow and may fail altogether. To solve the problem of finding multiple goals in a dynamic maze with reinforcement learning, an event-triggered composite reward scheme is proposed: the various behaviors and states in the 3D maze are expressed as events, and the events then drive the rewards. Rewards are divided into environment rewards and internal rewards. Environment rewards relate directly to the 3D maze task and comprise node rewards, which reflect the task goals, and constraint rewards, which reflect the task constraints. Internal rewards relate to the agent's perceived state during learning and comprise judgment rewards and mood rewards. In the experiments, the mean performance of the composite reward was 54.66% higher than that of an improved reward, showing that the composite reward scheme improves task-completion satisfaction, strengthens exploration, and raises training efficiency.
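A minimal sketch of the event-to-reward mapping; the event names and magnitudes are hypothetical, and the four channels follow the split described above (node + constraint = environment, judgment + mood = internal):

```python
# Each channel maps events to rewards; a step's reward sums the channels.
ENVIRONMENT = {
    "reach_subgoal": 5.0, "reach_final_goal": 20.0,   # node rewards (task goals)
    "hit_wall": -1.0, "timeout_tick": -0.1,           # constraint rewards
}
INTERNAL = {
    "new_area_seen": 0.5,      # judgment reward: exploration judged useful
    "progress_felt": 0.2,      # mood reward: agent's internal state improves
    "stuck": -0.3,
}

def composite_reward(events):
    env = sum(ENVIRONMENT.get(e, 0.0) for e in events)
    internal = sum(INTERNAL.get(e, 0.0) for e in events)
    return env + internal

print(composite_reward(["new_area_seen", "hit_wall"]))   # -> -0.5
```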

16.
Provided the state space satisfies certain structural conditions, the complex original MDP is hierarchically decomposed, via dimension partitioning of the state space, directly into a set of simple MDP or SMDP subproblems, and the hierarchy is refined online. Embedding different reinforcement learning methods into the hierarchy yields different hierarchical learners. The proposed approach retains the advantages of hierarchical reinforcement learning, namely fast learning and easy sharing, while reducing the dependence on prior knowledge and alleviating the scarcity of reward values early in learning.

17.
Penetration testing, an important means of assessing the security of network systems, simulates real network attacks from an attacker's perspective to find the weak points of a network system. Automated penetration testing applies intelligent methods to automate the testing process, greatly reducing its cost. Attack-path discovery is a key technique in automated penetration testing, and how to discover attack paths in network systems quickly and effectively has long drawn wide academic attention. Existing automated penetration-testing methods mostly implement intelligent attack-path discovery within a reinforcement learning framework, but they still suffer from sparse rewards and low learning efficiency, so the algorithms converge slowly and attack-path discovery cannot meet the strict timeliness requirements of penetration testing. This paper proposes a hierarchical reinforcement learning algorithm with a potential-based heuristic reward-shaping function (HRL-HRSF). Exploiting the characteristics of penetration testing, the algorithm first derives a heuristic of deep lateral movement from prior knowledge of network attacks and uses it to design a potential-based heuristic reward-shaping function, giving the agent positive feedback during early exploration and effectively alleviating reward sparsity. Combining this shaping function with hierarchical reinforcement learning not only reduces the sizes of the state and action spaces but also substantially increases the reward feedback during attack-path discovery, accelerating the agent's learning. Experiments show that HRL-HRSF is faster and more effective than hierarchical reinforcement learning without reward shaping, DQN, and DQN variants, and that as the network size and the number of host vulnerabilities grow, HRL-HRSF maintains better learning efficiency, with good robustness and generalization.
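A hedged sketch of a potential function built from the "deep lateral movement" prior, using the same γΦ(s') − Φ(s) shaping form shown under item 13; the state encoding and depth weight are hypothetical:

```python
def attack_depth(state):
    # Hypothetical state: a dict mapping each compromised host to its hop
    # distance from the entry point; an empty dict means no foothold yet.
    return max(state.values(), default=0)

def phi(state, w=2.0):
    # Potential grows with the deepest foothold, so exploration that moves
    # laterally and deeper earns positive shaping before any flag reward.
    return w * attack_depth(state)

def shaped_reward(r, s, s2, gamma=0.9):
    return r + gamma * phi(s2) - phi(s)

before = {"dmz-web": 1}
after = {"dmz-web": 1, "db-server": 2}     # lateral move deeper into the net
print(shaped_reward(0.0, before, after))   # positive feedback despite r == 0
```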

18.
Many image segmentation solutions are problem-specific. In medical images, the objects of interest have very similar grey levels and textures, so medical image segmentation still requires improvement despite decades of research. We design a self-learning framework to extract several objects of interest simultaneously from Computed Tomography (CT) images. Our segmentation method has a learning phase based on a reinforcement learning (RL) system, defined by states, actions, and rewards. Each RL agent works on a particular sub-image of an input image to find a suitable value for each object in it. We define a set of actions for each state in the sub-image, and a reward function computes the reward for each action of the RL agent. Finally, the valuable information gained from discovering all states of the objects of interest is stored in a Q-matrix, and the result can be applied to segment similar images. Experimental results on cranial CT images demonstrated segmentation accuracy above 95%.
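A minimal sketch of one sub-image agent, assuming the actions adjust a local intensity threshold and the reward is the change in overlap with a reference mask; these specifics are illustrative stand-ins for the paper's per-object states, actions, and reward:

```python
import numpy as np

rng = np.random.default_rng(0)
sub_img = rng.normal(100, 10, (32, 32))
sub_img[8:24, 8:24] += 60                   # synthetic bright "organ"
reference = np.zeros((32, 32), bool)
reference[8:24, 8:24] = True                # ground truth for the learning phase

thresholds = np.arange(100, 181, 10)        # states: index of current threshold
Q = np.zeros((len(thresholds), 2))          # actions: lower / raise threshold
alpha, gamma, eps = 0.2, 0.9, 0.2

def dice(t_idx):
    mask = sub_img > thresholds[t_idx]
    inter = (mask & reference).sum()
    return 2 * inter / (mask.sum() + reference.sum() + 1e-9)

for episode in range(200):
    s = int(rng.integers(len(thresholds)))
    for _ in range(20):
        a = int(rng.integers(2)) if rng.random() < eps else int(Q[s].argmax())
        s2 = max(s - 1, 0) if a == 0 else min(s + 1, len(thresholds) - 1)
        r = dice(s2) - dice(s)              # reward: did the overlap improve?
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

# The filled Q-matrix encodes which threshold moves improve the overlap and
# can be reused on similar sub-images, as in the paper's final phase.
print("best threshold:",
      thresholds[int(np.argmax([dice(i) for i in range(len(thresholds))]))])
```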

19.
钱煜, 俞扬, 周志华. 《软件学报》, 2013, 24(11): 2667-2675
Reinforcement learning enables an agent to make correct short-term decisions by learning from feedback on past decisions, so as to maximize its cumulative reward. Previous work has found that reward shaping, which replaces the true environment reward with a simple, easy-to-learn surrogate reward function (the reward-shaping function), can effectively improve reinforcement learning performance. However, reward-shaping functions are usually built from domain knowledge or demonstrations of optimal policies, both of which require costly expert involvement. This paper studies whether an effective reward-shaping function can be learned automatically during reinforcement learning. Reinforcement learning algorithms typically collect many samples during learning; although many of these are failed attempts, they may provide useful information for constructing a reward-shaping function. A new optimal-policy-invariance condition for reward shaping is proposed and, on this basis, the RFPotential method, which learns reward shaping from self-generated samples. Experiments on several reinforcement learning algorithms and problems show that the method can accelerate the reinforcement learning process.
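One plausible way to turn self-generated samples into a shaping potential (a hypothetical reading of the idea, not RFPotential itself): estimate Φ(s) as the Monte-Carlo average of discounted returns observed from s, so even failed episodes contribute signal:

```python
from collections import defaultdict

def learn_potential(trajectories, gamma=0.95):
    """trajectories: list of episodes, each a list of (state, reward) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in trajectories:
        g = 0.0
        for state, reward in reversed(episode):  # backward pass builds the return
            g = reward + gamma * g
            totals[state] += g
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}

# A failed episode (no terminal reward) still shapes Phi for the states it
# visited, which is the kind of information the paper mines from samples.
phi = learn_potential([[("s0", 0.0), ("s1", 0.0), ("goal", 1.0)],
                       [("s0", 0.0), ("s1", 0.0), ("s0", 0.0)]])
print(phi)
```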

20.
The adaptive critic heuristic has been a popular algorithm in reinforcement learning (RL) and approximate dynamic programming (ADP) alike; it is one of the first RL and ADP algorithms. RL and ADP algorithms are particularly useful for solving Markov decision processes (MDPs) that suffer from the curses of dimensionality and modeling. Many real-world problems, however, tend to be semi-Markov decision processes (SMDPs), in which the time spent in each transition of the underlying Markov chain is itself a random variable. Unfortunately, for the average-reward case, unlike the discounted-reward case, the MDP formulation does not extend easily to the SMDP. Examples of SMDPs can be found in supply chain management, maintenance management, and airline revenue management. In this paper, we propose an adaptive critic heuristic for the SMDP under the long-run average-reward criterion. We present a convergence analysis showing that, under certain mild conditions that can be ensured within a simulator, the algorithm converges to an optimal solution with probability 1. We test the algorithm extensively on a problem of airline revenue management in which the manager has to set prices for airline tickets over the booking horizon. The problem is large-scale, suffering from the curse of dimensionality, and is therefore difficult to solve via classical methods of dynamic programming. Our numerical results are encouraging and show that the algorithm outperforms an existing heuristic widely used in the airline industry.
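A hedged sketch of a single average-reward actor-critic update for one SMDP transition, where τ is the random sojourn time; the step sizes and the running gain estimate ρ are illustrative choices, not the paper's exact scheme:

```python
import numpy as np

def smdp_actor_critic_step(V, pref, rho, s, a, r, tau, s2,
                           alpha=0.1, beta=0.05, kappa=0.01):
    """One update for transition (s, a) -> s2 with reward r and sojourn tau.

    V: state values (critic); pref: (S, A) action preferences (actor);
    rho: current estimate of average reward per unit time.
    """
    delta = r - rho * tau + V[s2] - V[s]   # TD error, time-weighted by tau
    V[s] += alpha * delta                  # critic update
    pref[s, a] += beta * delta             # actor: reinforce above-average moves
    rho += kappa * delta                   # nudge the gain estimate
    return rho

def softmax_policy(pref_row, rng):
    p = np.exp(pref_row - pref_row.max())
    return int(rng.choice(len(p), p=p / p.sum()))

# Hypothetical usage for one transition of a 3-state, 2-action SMDP:
rng = np.random.default_rng(0)
V, pref, rho = np.zeros(3), np.zeros((3, 2)), 0.0
a = softmax_policy(pref[0], rng)
rho = smdp_actor_critic_step(V, pref, rho, s=0, a=a, r=4.0, tau=2.0, s2=1)
print(V[0], rho)
```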

