Temporal sequence learning facilitates the near-term prediction of future events, a skill that is essential for intelligent agents to function in a dynamic world. Agents acting in real-time dynamic environments must anticipate the movement of other agents and objects to avoid collisions and capture prey, anticipate the consequences of certain sequences of actions, and so on. Classical (Pavlovian) conditioning clearly demonstrates this basic capability; however, the neural mechanisms that mediate it are not currently understood. Further, contemporary methods in machine learning and neural modeling either reduce the problem to a simple Markov Decision Problem (in which state augmentation is avoided altogether or becomes problematic in terms of combinatorial state explosion), or employ learning mechanisms that do not reflect simple human abilities in terms of fast, adaptive learning.
Short-term memory is particularly adept at the rapid acquisition of knowledge, at the cost of some level of error and the likelihood that the learned knowledge will remain valid only briefly. Humans use short-term memory to rapidly adapt to changing environments, retaining the ability to adapt to completely new environmental conditions. Machines operating autonomously in dynamic or hostile environments must be designed with similar capabilities.
This work presents a novel short-term-memory-based machine learning model that exhibits the biologically relevant characteristics of on-line, rapid, adaptable learning while avoiding the combinatorial state explosion of traditional Markov-based approaches. The model is based on associative hindsight learning of salient event outcomes together with the appropriate selection of context that predicts those events. The model employs an entropy measure of predictive utility to both isolate highly predictive rules governing the environment and prune away those rules that are not useful.
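The idea of using entropy to keep predictive rules and prune unhelpful ones can be illustrated with a minimal sketch. This is not the dissertation's algorithm: the function names `outcome_entropy` and `learn_rules`, the thresholds `max_entropy` and `min_support`, and the representation of experience as (context, outcome) pairs are all assumptions made for illustration. The sketch simply computes the Shannon entropy of the outcomes observed after each context and keeps a context as a rule only when that entropy is low.

```python
import math
from collections import Counter, defaultdict


def outcome_entropy(counts):
    """Shannon entropy, in bits, of a table of outcome counts."""
    total = sum(counts.values())
    return sum(
        -(c / total) * math.log2(c / total)
        for c in counts.values()
        if c > 0
    )


def learn_rules(events, max_entropy=0.5, min_support=3):
    """Keep contexts whose observed outcomes are nearly deterministic.

    events: iterable of (context, outcome) pairs (hypothetical format).
    max_entropy, min_support: illustrative thresholds, not from the source.
    Returns {context: (most_frequent_outcome, entropy_in_bits)}.
    """
    table = defaultdict(Counter)
    for context, outcome in events:
        table[context][outcome] += 1

    rules = {}
    for context, counts in table.items():
        if sum(counts.values()) < min_support:
            continue  # too few observations to judge this context
        h = outcome_entropy(counts)
        if h <= max_entropy:  # low entropy: the context predicts well
            rules[context] = (counts.most_common(1)[0][0], h)
        # contexts above the threshold are pruned (not stored)
    return rules


# Toy data: "bell" always precedes food; "light" is uninformative.
events = [("bell", "food")] * 4 + [("light", "food"), ("light", "none")] * 2
rules = learn_rules(events)
```

In this toy data the "bell" context has zero outcome entropy and is retained, while the "light" context has one bit of entropy and is pruned. A real system would also need to decide which contexts to consider in the first place, which this sketch does not address.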
Based on the success of this model in both synthetic and game-playing environments, extensions to basic reinforcement learning models are suggested that provide closer alignment with established experimental findings in the field of animal learning behavior.
University of Minnesota Ph.D. dissertation. May 2010. Major: Computer Science. Advisors: Maria Gini, Paul Schrater. 1 computer file (PDF); xvi, 123 pages. Ill. (some col.)
Jensen, Steven L.
Learning in dynamic temporal domains using contextual prediction entropy as a guiding principle.
Retrieved from the University of Minnesota Digital Conservancy,