Cosyne 2008 Workshops
March 3-4, 2008
Snowbird, Utah
Speaker Name
Razvan Florian, Center for Cognitive and Neural Studies, Cluj, Romania
Talk Title
Relating reinforcement learning and STDP
Talk Abstract
It has been shown analytically that reinforcement learning (RL) can be implemented in stochastic spiking neural networks by a plasticity mechanism similar to spike-timing-dependent plasticity (STDP), modulated by a global reward signal (Florian, 2005, 2007; de Queiroz et al., 2006; Baras and Meir, 2007). Under this learning rule, plasticity results associatively from pre-post pairs of spikes, but there is no counterpart to the post-pre associative plasticity found in typical forms of STDP, unless homeostasis is explicitly considered (Pfister et al., 2006; Florian, 2007). Modulating typical STDP with the reward signal has also been shown to lead to RL, both in simulations (Soula et al., 2004, 2005; Florian, 2005, 2007; Izhikevich, 2007; Henry et al., 2007; Farries and Fairhall, 2007) and analytically, under certain conditions (Legenstein et al., 2008). STDP has also been found to be neuromodulated in the brain, but the modulation of the typical form of STDP relies on different mechanisms for depression and for potentiation (Seol et al., 2007). From all this we can conclude that post-pre associative plasticity is sufficient but not necessary for implementing RL in spiking neural networks. We present alternatives to this type of plasticity for RL in spiking neural networks.
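As a rough illustration of the distinction drawn above, the following minimal Python sketch contrasts a reward-modulated rule that uses only pre-post pairs with one that also includes the post-pre branch of a typical STDP window. It is not the rule derived in any of the cited papers: the exponential window, the parameter values, the function names and the flag include_post_pre are all assumptions made for this example, and the eligibility-trace dynamics and reward timing are collapsed into a single episode.

```python
import math

def stdp_window(dt, a_plus=1.0, a_minus=1.0, tau_plus=20.0, tau_minus=20.0,
                include_post_pre=True):
    """Contribution of one spike pair, with dt = t_post - t_pre in ms.

    Assumed exponential window; all parameter values are illustrative.
    """
    if dt > 0:
        # Pre-post pair: potentiating contribution.
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0 and include_post_pre:
        # Post-pre pair: depressing contribution of typical STDP.
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0

def reward_modulated_update(pre_spikes, post_spikes, reward, lr=0.01,
                            include_post_pre=True):
    """Weight change for one episode: reward times summed pair contributions."""
    eligibility = sum(
        stdp_window(t_post - t_pre, include_post_pre=include_post_pre)
        for t_pre in pre_spikes
        for t_post in post_spikes
    )
    # The global reward signal scales (and can flip the sign of) the change.
    return lr * reward * eligibility

if __name__ == "__main__":
    pre, post = [10.0, 40.0], [15.0, 35.0]
    print(reward_modulated_update(pre, post, reward=1.0))
    print(reward_modulated_update(pre, post, reward=1.0, include_post_pre=False))
```

With include_post_pre=False, a post-pre pair leaves the weight unchanged regardless of reward, which corresponds to the pre-post-only variant described in the abstract.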
References
Baras, D. and Meir, R. (2007), Reinforcement learning, spike-time-dependent plasticity and the BCM rule, Neural Computation 19, 2245-2279. http://eprints.pascal-network.org/archive/00002561/01/RL-STDP_Final.pdf
de Queiroz, M. S., de Berredo, R. C. and de Pádua Braga, A. (2006), Reinforcement learning of a simple control task using the spike response model, Neurocomputing 70(1-3), 14-20. http://www.cpdee.ufmg.br/~apbraga/journals/spiking-neuroc.pdf
Farries, M. A. and Fairhall, A. L. (2007), Reinforcement learning with modulated spike timing-dependent synaptic plasticity, Journal of Neurophysiology 98, 3648-3665. http://dx.doi.org/10.1152/jn.00364.2007
Florian, R. V. (2005), A reinforcement learning algorithm for spiking neural networks, in D. Zaharie, D. Petcu, V. Negru, T. Jebelean, G. Ciobanu, A. Cicortas, A. Abraham and M. Paprzycki, eds, Proceedings of the Seventh International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2005), IEEE Computer Society, Los Alamitos, CA, pp. 299-306. http://www.coneural.org/florian/papers/05_RL_for_spiking_NNs.php
Florian, R. V. (2007), Reinforcement learning through modulation of spike-timing dependent plasticity, Neural Computation 19(6), 1468-1502. http://www.coneural.org/florian/papers/2007_florian_modulated_STDP.php
Henry, F., Daucé, E. and Soula, H. (2007), Temporal pattern identification using spike-timing dependent plasticity, Neurocomputing 70, 2009-2016. http://dx.doi.org/10.1016/j.neucom.2006.10.082
Izhikevich, E. M. (2007), Solving the distal reward problem through linkage of STDP and dopamine signaling, Cerebral Cortex 17(10), 2443-2452. http://vesicle.nsi.edu/users/izhikevich/publications/dastdp.pdf
Legenstein, R., Pecevski, D. and Maass, W. (2008), Theoretical analysis of learning with reward-modulated spike-timing-dependent plasticity, in J. Platt, D. Koller, Y. Singer and S. Roweis, eds, Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA. http://books.nips.cc/papers/files/nips20/NIPS2007_0643.pdf
Pfister, J.-P., Toyoizumi, T., Barber, D. and Gerstner, W. (2006), Optimal spike-timing-dependent plasticity for precise action potential firing in supervised learning, Neural Computation 18(6), 1318-1348. http://diwww.epfl.ch/~gerstner/PUBLICATIONS/Pfister06.pdf
Seol, G. H., Ziburkus, J., Huang, S., Song, L., Kim, I. T., Takamiya, K., Huganir, R. L., Lee, H.-K. and Kirkwood, A. (2007), Neuromodulators control the polarity of spike-timing-dependent synaptic plasticity, Neuron 55, 919-929. http://dx.doi.org/10.1016/j.neuron.2007.08.013
Soula, H., Alwan, A. and Beslon, G. (2004), Obstacle avoidance learning in a spiking neural network, in Last Minute Results of Simulation of Adaptive Behavior, Los Angeles, CA. http://www.koredump.org/hed/abstract_sab2004.pdf
Soula, H., Alwan, A. and Beslon, G. (2005), Learning at the edge of chaos: Temporal coupling of spiking neuron controller of autonomous robotic, in Proceedings of AAAI Spring Symposia on Developmental Robotics, AAAI Press, Menlo Park, CA, USA. http://koredump.org/hed/soula_aaai05.pdf