
1.

Sugimoto N Haruno M Doya K Kawato M 《Neural computation》2012,24(3):577-606

Reinforcement learning (RL) can provide a basic framework for autonomous robots to learn to control and maximize future cumulative rewards in complex environments. To achieve high performance, RL controllers must consider the complex external dynamics for movements and task (reward function) and optimize control commands. For example, a robot playing tennis and squash needs to cope with the different dynamics of a tennis or squash racket and such dynamic environmental factors as the wind. In addition, this robot has to tailor its tactics simultaneously under the rules of either game. This double complexity of the external dynamics and reward function sometimes becomes more complex when both the multiple dynamics and multiple reward functions switch implicitly, as in the situation of a real (multi-agent) game of tennis where one player cannot observe the intention of her opponents or her partner. The robot must consider its opponent's and its partner's unobservable behavioral goals (reward function). In this article, we address how an RL agent should be designed to handle such double complexity of dynamics and reward. We have previously proposed modular selection and identification for control (MOSAIC) to cope with nonstationary dynamics where appropriate controllers are selected and learned among many candidates based on the error of their paired dynamics predictor: the forward model. Here we extend this framework for RL and propose the MOSAIC-MR architecture. It resembles MOSAIC in spirit and selects and learns an appropriate RL controller based on the RL controller's TD error using the errors of the dynamics (the forward model) and the reward predictors. Furthermore, unlike other MOSAIC variants for RL, RL controllers are not a priori paired with the fixed predictors of dynamics and rewards. The simulation results demonstrate that MOSAIC-MR outperforms other counterparts because of this flexible association ability among RL controllers, forward models, and reward predictors.

2.

Mosaic model for sensorimotor learning and control (total citations: 1; self-citations: 0; citations by others: 1)

Haruno M Wolpert DM Kawato M 《Neural computation》2001,13(10):2201-2220

Humans demonstrate a remarkable ability to generate accurate and appropriate motor behavior under many different and often uncertain environmental conditions. We previously proposed a new modular architecture, the modular selection and identification for control (MOSAIC) model, for motor learning and control based on multiple pairs of forward (predictor) and inverse (controller) models. The architecture simultaneously learns the multiple inverse models necessary for control as well as how to select the set of inverse models appropriate for a given environment. It combines both feedforward and feedback sensorimotor information so that the controllers can be selected both prior to movement and subsequently during movement. This article extends and evaluates the MOSAIC architecture in the following respects. First, the learning in the architecture was implemented by both the original gradient-descent method and the expectation-maximization (EM) algorithm. Unlike gradient descent, the newly derived EM algorithm is robust to the initial starting conditions and learning parameters. Second, simulations of an object manipulation task show that the architecture can learn to manipulate multiple objects and switch between them appropriately. Moreover, after learning, the model shows generalization to novel objects whose dynamics lie within the polyhedra of already learned dynamics. Finally, when each of the dynamics is associated with a particular object shape, the model is able to select the appropriate controller before movement execution. When presented with a novel shape-dynamic pairing, inappropriate activation of modules is observed, followed by on-line correction.
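The selection mechanism described in this abstract, where controllers are weighted according to how well their paired forward models predict the observed outcome, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the scalar state, the Gaussian error model, and the `sigma` parameter are all assumptions made for illustration. Each module's weight (often called its responsibility) is a normalized Gaussian likelihood of its prediction error, and the final command is the weighted sum of the module commands.

```python
import math

def responsibilities(observed, predictions, sigma=1.0):
    """Soft-max weighting of modules by forward-model prediction error.

    observed    -- the state actually observed (a scalar here, by assumption)
    predictions -- one predicted state per module (hypothetical forward models)
    sigma       -- assumed width of the Gaussian error model
    """
    sq_errors = [(observed - p) ** 2 for p in predictions]
    # Shift by the smallest error so the exponentials cannot all underflow.
    smallest = min(sq_errors)
    weights = [math.exp(-(e - smallest) / (2.0 * sigma ** 2)) for e in sq_errors]
    total = sum(weights)
    return [w / total for w in weights]

def blend_commands(resp, commands):
    """Weighted sum of the commands proposed by each module's controller."""
    return sum(r * u for r, u in zip(resp, commands))

# Two hypothetical modules: the first predicts the observation exactly.
resp = responsibilities(observed=1.0, predictions=[1.0, 3.0], sigma=1.0)
command = blend_commands(resp, commands=[0.5, -0.5])
```

In this toy case the module with the smaller prediction error receives the larger weight, so the blended command lies closer to that module's output.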

3.

Structural and electronic properties of extremely long perylene bisimide nanofibers formed through a stoichiometrically mismatched, hydrogen-bonded complexation

Yagai S Seki T Murayama H Wakikawa Y Ikoma T Kikkawa Y Karatsu T Kitamura A Honsho Y Seki S 《Small (Weinheim an der Bergstrasse, Germany)》2010,6(23):2731-2740

Extremely long nanofibers, whose lengths reach the millimeter regime, are generated via co-aggregation of a melamine-appended perylene bisimide semiconductor and a substituted cyanurate, both of which are ditopic triple-hydrogen-bonding building blocks; they co-aggregate in an unexpected stoichiometrically mismatched 1:2 ratio. Various microscopic and X-ray diffraction studies suggest that hydrogen-bonded polymeric chains are formed along the long axis of the nanofibers by the 1:2 complexation of the two components, which further stack along the short axis of the nanofibers. The photocarrier generation mechanism in the nanofibers is investigated by time-of-flight (TOF) experiments under electric and magnetic fields, revealing the birth and efficient recombination of singlet geminate electron-hole pairs. Flash-photolysis time-resolved microwave conductivity (FP-TRMC) measurements revealed intrinsic 1D electron mobilities up to 0.6 cm² V⁻¹ s⁻¹ within nanofibers.

4.

Ultra-high-precision time control system over any long time delay for laser pump and synchrotron x-ray probe experiment

Fukuyama Y Yasuda N Kim J Murayama H Ohshima T Tanaka Y Kimura S Kamioka H Moritomo Y Toriumi K Tanaka H Kato K Ishikawa T Takata M 《The Review of scientific instruments》2008,79(4):045107

An ultra-high-precision clock system for long time delay has been developed for picosecond time-resolved x-ray diffraction measurements using synchrotron radiation (SR) pulses and synchronized femtosecond laser pulses. The time delay control between the pump laser pulse and the probe SR pulse was achieved by combining an in-phase quadrature modulator and a synchronous counter. This method allowed us to change the delay time by a nearly infinite amount while maintaining the precision of ±8.40 ps. Time-resolved diffraction measurements using the delay control system were demonstrated for precise measurement of an acoustic velocity in a single crystal of gallium arsenide.
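One common way to combine a counter with an in-phase/quadrature (IQ) modulator is to split the requested delay into a coarse part, counted in whole clock periods, and a fine part, applied as a phase shift of the reference signal. The sketch below illustrates that general decomposition only. The abstract does not give the instrument's design, so the function name, the clock period used in the example, and the mapping of the fine delay onto I and Q amplitudes are all assumptions.

```python
import math

def split_delay(delay_ps, period_ps):
    """Split a delay into whole clock periods plus a sub-period phase shift.

    delay_ps  -- requested delay in picoseconds (non-negative)
    period_ps -- reference clock period in picoseconds (hypothetical value)
    Returns (coarse_counts, fine_ps, (i_amp, q_amp)).
    """
    # Coarse part: number of whole clock periods for the counter to wait.
    coarse_counts = int(delay_ps // period_ps)
    # Fine part: the remainder, always shorter than one period.
    fine_ps = delay_ps - coarse_counts * period_ps
    # The fine delay corresponds to a phase shift of the reference signal,
    # which an IQ modulator realizes as cosine and sine amplitudes.
    phase = 2.0 * math.pi * fine_ps / period_ps
    return coarse_counts, fine_ps, (math.cos(phase), math.sin(phase))

# Example with an assumed 2000 ps clock period and a 2500 ps target delay.
counts, fine, (i_amp, q_amp) = split_delay(2500.0, 2000.0)
```

Because the counter can in principle count arbitrarily many periods while the phase shift covers the remainder, the total delay range is decoupled from the fine resolution. This is consistent with the abstract's claim of nearly unlimited delay at fixed precision.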

5.

Local expression of C-type natriuretic peptide markedly suppresses neointimal formation in rat injured arteries through an autocrine/paracrine loop

H Ueno A Haruno N Morisaki M Furuya K Kangawa A Takeshita Y Saito 《Circulation》1997,96(7):2272-2279

BACKGROUND: In vivo gene transfer into injured arteries may provide a new means to facilitate molecular understanding of and to treat the intractable fibroproliferative arterial diseases. Selection of an optimal molecule to be transferred will be a key to successful gene therapy in the future. We tested the hypothesis that a secreted multifactorial molecule should act more efficiently through an autocrine/paracrine loop to suppress neointimal formation elicited in injured arteries than a simple growth-inhibiting molecule that might be expressed inside cells. METHODS AND RESULTS: We constructed an adenoviral vector (AdCACNP) expressing C-type natriuretic peptide (CNP), a secreted stimulator of membrane-bound guanyl cyclase. AdCACNP directs cells to secrete large quantities of biologically active CNP. Serum-stimulated DNA synthesis and cell proliferation were only moderately suppressed in arterial smooth muscle cells infected with AdCACNP in vitro. However, when AdCACNP was applied to balloon-injured rat carotid arteries in vivo, neointimal formation was markedly reduced (90% reduction) in an infection-site-specific manner without an increase in plasma CNP level. CONCLUSIONS: Our results showed that CNP, a secreted multifactorial molecule, was indeed effective in suppressing fibroproliferative response in injured arteries and suggest that the potent antiproliferation effect may not be the most critical factor for the effective suppression of neointimal formation. An adenovirus-mediated expression of CNP could be an effective and site-specific form of molecular intervention in proliferative arterial diseases.

6.

Using Decision Trees to Construct a Practical Parser

Haruno Masahiko Shirai Satoshi Ooyama Yoshifumi 《Machine Learning》1999,34(1-3):131-149

This paper describes a novel and practical Japanese parser that uses decision trees. First, we construct a single decision tree to estimate modification probabilities, that is, how one phrase tends to modify another. Next, we introduce a boosting algorithm in which several decision trees are constructed and then combined for probability estimation. The constructed parsers are evaluated using the EDR Japanese annotated corpus. The single-tree method significantly outperforms the conventional Japanese stochastic methods. Moreover, the boosted version of the parser is shown to have great advantages: (1) a better parsing accuracy than its single-tree counterpart for any amount of training data and (2) no over-fitting to data for various iterations. The presented parser, the first non-English stochastic parser with practical performance, should tighten the coupling between natural language processing and machine learning.