Publications
About Us | Profiles | Publications | Resources


On-line Appendices

Solving Large MDPs Quickly with Partitioned Value Iteration

Particle Swarm Optimization

Christopher K. Monson and Kevin D. Seppi
The Kalman Swarm: A New Approach to Particle Motion in Swarm Optimization
GECCO 2004
PDF PS
Particle Swarm Optimization is gaining momentum as a simple and effective optimization technique. We present a new approach to PSO that significantly reduces the number of iterations required to reach good solutions. In contrast with much recent research, the focus of this work is on fundamental particle motion, making use of the Kalman Filter to update particle positions. This enhances exploration without hurting the ability to converge rapidly to good solutions.
Christopher K. Monson and Kevin D. Seppi
Improving on the Kalman Swarm: Extacting its Essential Characteristics
GECCO 2004 Late-Breaking Paper
PDF PS
The Kalman Swarm (KSwarm) is a new approach to particle motion in PSO that reduces the number of iterations required to reach good solutions. Unfortunately, it has much higher computational complexity than basic PSO. This paper addresses the runtime of KSwarm in a new algorithm called ``Linear Kalman Swarm'' (LinkSwarm) which has linear complexity and performs even better than KSwarm. Some possible reasons for the success of KSwarm are also explored.

Variable Resolution Discretization

Christopher K. Monson, David Wingate, Kevin D. Seppi, Todd S. Peterson
Variable Resolution Discretization in the Joint Space
ICMLA 2004
PDF PS
We present JoSTLe, an algorithm that performs value iteration on control problems with continuous actions, allowing this useful reinforcement learning technique to be applied to problems where a priori action discretization is inadequate. The algorithm is an extension of a variable resolution technique that works for problems with continuous states and discrete actions. Results are given that indicate that JoSTLe is a promising step toward reinforcement learning in a fully continuous domain.
Christopher K. Monson
Reinforcement Learning in the Joint Space: Value Iteration in Worlds with Continuous States and Actions
Master's Thesis
PDF PS
Continuous space reinforcement learning algorithms frequently fail to address the possibility of a continuous action space, presumably because of the difficulty of discovering the best action for a particular state. This can, in some cases, severely limit the ability of a learning algorithm to tackle some common problems where different portions of the state space require distinct action granularity. Naive action discretization does not suffice for problems of this nature, so traditional reinforcement approaches that consider only the continuous state space fail to solve these kinds of problems.

JoSTLe (Joint Space Triangulation Learner) addresses the need for a reinforcement learning approach that can handle a continuous action space by means of intelligent discretization. It employs the variable resolution discretization techniques of Munos and Moore, but in an augmented ``joint'' space, one that includes actions as well as states.

The algorithm is shown to work on a problem that requires the treatment of a continuous action space, as well as one that does not. The efficacy of the algorithm as well as its sensitivity to parameter tuning are shown through mathematical arguments and experimental data.

Value Iteration

David Wingate and Kevin D. Seppi
Efficient Value Iteration Using Partitioned Models
ICMLA 2003
PDF PS
In order to solve large-scale value iteration problems, more intelligent allocation of computing time is needed. We introduce the idea of an information frontier, which allows us to identify maximally productive regions of the problem space. We present a potential information flow metric which allows us to quantify the frontier precisely. We also introduce a partitioning scheme, which effectively combines with the flow metric to reduce the complexity of problematic operations. The framework is powerful, and can be used to parallelize value-iteration, effectively manage memory in large-scale problems, or further multi-agent cooperative solution methodologies. A complete algorithm is developed and successfully tested on several problems. Experimental evidence is presented which demonstrates the efficacy of the approach.

Task Transfer Related Papers

James L. Carroll, Todd Peterson, Kevin Seppi
Reinforcement Learning Task Clustering (RLTC)
ICMLA 2003
PDF PS
This work represents the first step towards a task library system in the reinforcement learning domain. Task libraries could be useful in speeding up the learning of new tasks through task transfer. Related transfer can increase learning rate and can help prevent convergence to sub-optimal policies in reinforcement learning. Unrelated transfer can be extremely detrimental to the learning rate. Thus task transfer is useful in reinforcement learning if the source task and the target task are sufficiently related. Task similarity in reinforcement learning can be determined using many different similarity metrics, and simple clustering mechanisms can be applied to determine a set of related tasks. Invariants can be determined among the set of related tasks and then used in transfer. This paper uses information gathered from a set of simple grid world tasks to show that clustering of tasks based upon a similarity metric can be helpful in determining the set of source tasks which should be utilized in transfer.
Todd S. Peterson, Nancy E. Owens and James L. Carroll
Towards Automatic Shaping in Robot Navigation
ICRA 2001
PDF PS
Shaping is a potentially powerful tool in reinforcement learning applications. Shaping often fails to function effectively because of a lack of understanding about its effects when applied in reinforcement learning settings and the use of inadequate algorithms in its implementation. Because of these difficulties current shaping techniques require some form of manual intervention. We examine some of the principles involved in shaping and present a new algorithm for automatic transferral of knowledge which uses Q-values established in a previous task to guide exploration in the learning of a new task. This algorithm is applied to two different but related robot navigation tasks.

James L. Carroll, Todd S. Peterson, Nancy E. Owens
Memory-guided Exploration in Reinforcement Learning
IJCNN 2001
PDF PS

The life-long learning architecture attempts to create an adaptive agent through the incorporation of prior knowledge over the lifetime of a learning agent. Our paper focuses on task transfer in reinforcement learning and specifically in Q-learning. There are three main model free methods for performing task transfer in Q-learning: direct transfer, soft transfer and memory-guided exploration. In direct transfer Q-values from a previous task are used to initialize the Q-values of the next task. Soft transfer initializes the Q-values of the new task with a weighted average of the standard initialization value and the Q-values of the previous task. In memory-guided exploration the Q-values of previous tasks are used as a guide in the initial exploration of the agent. The weight that the agent gives to its past experience decreases over time. We explore stability issues related to the off-policy nature of memory-guided exploration and compare memory-guided exploration to soft transfer and direct transfer in three different environments.

Nancy Owens and Todd Peterson
Using a Reinforcement Learning Controller to Overcome Simulator/Environment Discrepancies
Proceedings of the 2001 IEEE Systems, Man, and Cybernetics Conference
Microsoft Word

James L. Carroll, Todd S. Peterson
Fixed vs. Dynamic Sub-transfer in Reinforcement Learning. April 30, 2002
ICMLA 2002. M. Arif Wani. Las Vegas Nevada, USA June 24-27, 2002 CSREA Press
PS

We survey various task transfer methods in Q-learning and present a variation on fixed sub-transfer which we call dynamic sub-transfer. We discuss the benefits and drawbacks of dynamic sub-transfer as compared with the other transfer methods, and we describe qualitatively the situations where this method would be prefered over the fixed version of sub-transfer. We test this method against several other transfer methods in a simple three room grid world where portions of the source's policy are relevant to the target task and other portions are not. In this situation we found that dynamic sub-transfer converged to the optimal solution, avoiding the sub-optimality inherent in fixed sub-transfer, while also avoiding some of the convergence problems often experienced by fixed sub-transfer.



Copyright Notice | Advanced Search | Return to Top