|
On-line Appendices
Solving Large MDPs Quickly with Partitioned Value Iteration
Particle Swarm Optimization
Christopher K. Monson and Kevin D. Seppi
The Kalman Swarm: A New Approach to Particle Motion in Swarm Optimization
GECCO 2004
PDF
PS
Particle Swarm Optimization is gaining momentum as a simple and
effective optimization technique. We present a new approach to PSO that
significantly reduces the number of iterations required to reach good
solutions. In contrast with much recent research, the focus of this
work is on fundamental particle motion, making use of the Kalman
Filter to update particle positions. This enhances exploration without
hurting the ability to converge rapidly to good solutions.
Christopher K. Monson and Kevin D. Seppi
Improving on the Kalman Swarm: Extacting its Essential Characteristics
GECCO 2004 Late-Breaking Paper
PDF
PS
The Kalman Swarm (KSwarm) is a new approach to particle motion in PSO that
reduces the number of iterations required to reach good solutions.
Unfortunately, it has much higher computational complexity than basic PSO.
This paper addresses the runtime of KSwarm in a new algorithm called ``Linear
Kalman Swarm'' (LinkSwarm) which has linear complexity and performs even better
than KSwarm. Some possible reasons for the success of KSwarm are also
explored.
Variable Resolution Discretization
Christopher K. Monson, David Wingate, Kevin D. Seppi, Todd S. Peterson
Variable Resolution Discretization in the Joint Space
ICMLA 2004
PDF
PS
We present JoSTLe, an algorithm that performs value iteration on control
problems with continuous actions, allowing this useful reinforcement learning
technique to be applied to problems where a priori action discretization is
inadequate. The algorithm is an extension of a variable resolution technique
that works for problems with continuous states and discrete actions. Results
are given that indicate that JoSTLe is a promising step toward reinforcement
learning in a fully continuous domain.
Christopher K. Monson
Reinforcement Learning in the Joint Space: Value Iteration in Worlds
with Continuous States and Actions
Master's Thesis
PDF
PS
Continuous space reinforcement learning algorithms frequently fail to
address the possibility of a continuous action space, presumably
because of the difficulty of discovering the best action for a
particular state. This can, in some cases, severely limit the ability
of a learning algorithm to tackle some common problems where different
portions of the state space require distinct action granularity.
Naive action discretization does not suffice for problems of this
nature, so traditional reinforcement approaches that consider only the
continuous state space fail to solve these kinds of problems.
JoSTLe (Joint Space Triangulation Learner) addresses the need for a
reinforcement learning approach that can handle a continuous action
space by means of intelligent discretization. It employs the variable
resolution discretization techniques of Munos and Moore, but in an
augmented ``joint'' space, one that includes actions as well as
states.
The algorithm is shown to work on a problem that requires the treatment of a
continuous action space, as well as one that does not. The efficacy of the
algorithm as well as its sensitivity to parameter tuning are shown through
mathematical arguments and experimental data.
Value Iteration
David Wingate and Kevin D. Seppi
Efficient Value Iteration Using Partitioned Models
ICMLA 2003
PDF
PS
In order to solve large-scale value iteration problems, more
intelligent allocation of computing time is needed. We introduce the
idea of an information frontier, which allows us to identify maximally
productive regions of the problem space. We present a potential
information flow metric which allows us to quantify the frontier
precisely. We also introduce a partitioning scheme, which effectively
combines with the flow metric to reduce the complexity of problematic
operations. The framework is powerful, and can be used to parallelize
value-iteration, effectively manage memory in large-scale problems, or
further multi-agent cooperative solution methodologies. A complete
algorithm is developed and successfully tested on several problems.
Experimental evidence is presented which demonstrates the efficacy of
the approach.
Task Transfer Related Papers
James L. Carroll, Todd Peterson, Kevin Seppi
Reinforcement Learning Task Clustering (RLTC)
ICMLA 2003
PDF
PS
This work represents the first step towards a task library system in
the reinforcement learning domain. Task libraries could be useful in
speeding up the learning of new tasks through task transfer. Related
transfer can increase learning rate and can help prevent convergence
to sub-optimal policies in reinforcement learning. Unrelated transfer
can be extremely detrimental to the learning rate. Thus task transfer
is useful in reinforcement learning if the source task and the target
task are sufficiently related. Task similarity in reinforcement
learning can be determined using many different similarity metrics,
and simple clustering mechanisms can be applied to determine a set of
related tasks. Invariants can be determined among the set of related
tasks and then used in transfer. This paper uses information gathered
from a set of simple grid world tasks to show that clustering of tasks
based upon a similarity metric can be helpful in determining the set
of source tasks which should be utilized in transfer.
Todd S. Peterson, Nancy E. Owens and James L. Carroll
Towards Automatic Shaping in Robot Navigation
ICRA 2001
PDF
PS
Shaping is a potentially powerful tool in reinforcement learning
applications. Shaping often fails to function effectively because
of a lack of understanding about its effects when applied in
reinforcement learning settings and the use of inadequate
algorithms in its implementation. Because of these difficulties
current shaping techniques require some form of manual
intervention. We examine some of the principles involved in
shaping and present a new algorithm for automatic transferral of
knowledge which uses Q-values established in a previous task to
guide exploration in the learning of a new task. This algorithm
is applied to two different but related robot navigation tasks.
James L. Carroll, Todd S. Peterson, Nancy E. Owens
Memory-guided Exploration in Reinforcement Learning
IJCNN 2001
PDF
PS
The life-long learning architecture attempts to create an adaptive
agent through the incorporation of prior knowledge over the lifetime
of a learning agent. Our paper focuses on task transfer in
reinforcement learning and specifically in Q-learning. There are
three main model free methods for performing task transfer in
Q-learning: direct transfer, soft transfer and memory-guided
exploration. In direct transfer Q-values from a previous task are
used to initialize the Q-values of the next task. Soft transfer
initializes the Q-values of the new task with a weighted average of
the standard initialization value and the Q-values of the previous
task. In memory-guided exploration the Q-values of previous tasks are
used as a guide in the initial exploration of the agent. The weight
that the agent gives to its past experience decreases over time. We
explore stability issues related to the off-policy nature of
memory-guided exploration and compare memory-guided exploration to
soft transfer and direct transfer in three different environments.
Nancy Owens and Todd Peterson
Using a Reinforcement Learning Controller to Overcome
Simulator/Environment Discrepancies
Proceedings of the 2001 IEEE Systems, Man, and Cybernetics Conference
Microsoft Word
James L. Carroll, Todd S. Peterson
Fixed vs. Dynamic Sub-transfer in Reinforcement Learning. April 30, 2002
ICMLA 2002. M. Arif Wani. Las Vegas Nevada, USA June 24-27, 2002 CSREA Press
PS
We survey various task transfer methods in Q-learning and present a
variation on fixed sub-transfer which we call dynamic sub-transfer.
We discuss the benefits and drawbacks of dynamic sub-transfer as
compared with the other transfer methods, and we describe
qualitatively the situations where this method would be prefered over
the fixed version of sub-transfer. We test this method against
several other transfer methods in a simple three room grid world where
portions of the source's policy are relevant to the target task and
other portions are not. In this situation we found that dynamic
sub-transfer converged to the optimal solution, avoiding the
sub-optimality inherent in fixed sub-transfer, while also avoiding
some of the convergence problems often experienced by fixed
sub-transfer.
|