DEPARTMENT OF COMPUTING

CS 4320: Machine Learning

Assignment: Temporal Difference Q-Learning (Reinforcement Learning)

Train a reinforcement learning agent to perform well in the Taxi-v3 environment.

You are expected to use the FrozenLake example code as the starting point for your code.

Search the hyperparameter space to identify the best combination of values for alpha, gamma, and epsilon-chance-factor. "Best" is judged by two measures: the average reward during training epochs, and the average reward during scoring epochs. This means you’ll be reporting 2 combinations, one for each measure.
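
One way to organize the search is a plain grid over candidate values that tracks two winners, one ranked by average training reward and one by average scoring reward. The sketch below is only an illustration. The `grid_search` helper and the `evaluate` callable (which would train and score one agent for a given combination and return both averages) are hypothetical names, not part of the FrozenLake example code, and the candidate values are yours to choose.

```python
import itertools


def grid_search(evaluate, alphas, gammas, epsilons):
    """Try every (alpha, gamma, epsilon) combination and keep two winners.

    `evaluate` is a hypothetical callable that you supply. It should train
    and score one agent and return (avg_training_reward, avg_scoring_reward).
    """
    best_train = None  # (avg_training_reward, combination)
    best_score = None  # (avg_scoring_reward, combination)
    for combo in itertools.product(alphas, gammas, epsilons):
        train_avg, score_avg = evaluate(*combo)
        # Strict comparison: on a tie, the first combination tried is kept.
        if best_train is None or train_avg > best_train[0]:
            best_train = (train_avg, combo)
        if best_score is None or score_avg > best_score[0]:
            best_score = (score_avg, combo)
    return best_train, best_score
```

The two results may or may not name the same combination. Keeping them separate matches the requirement to report one combination per measure.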

Use 5000 learning epochs and 100 scoring epochs.
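
The learning and scoring phases can be sketched as a single loop that either applies the temporal difference Q-learning update or only acts on the learned table. This is a minimal sketch under several assumptions, not the FrozenLake example code. It assumes a Gymnasium-style environment, where `reset()` returns `(state, info)` and `step(action)` returns `(state, reward, terminated, truncated, info)`. It treats epsilon as a fixed exploration probability. It scores with a purely greedy policy. The function names are hypothetical.

```python
import random


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


def run_epochs(env, q, n_epochs, alpha, gamma, epsilon, learn, rng):
    """Run n_epochs episodes and return the average total reward per episode."""
    total = 0.0
    for _ in range(n_epochs):
        state, _ = env.reset()
        done = False
        while not done:
            action = choose_action(q[state], epsilon, rng)
            next_state, reward, terminated, truncated, _ = env.step(action)
            if learn:
                # TD target: do not bootstrap from a terminal state.
                target = reward
                if not terminated:
                    target += gamma * max(q[next_state])
                q[state][action] += alpha * (target - q[state][action])
            total += reward
            state = next_state
            done = terminated or truncated
    return total / n_epochs


def train_and_score(env, n_states, n_actions, alpha, gamma, epsilon,
                    learn_epochs=5000, score_epochs=100, seed=0):
    """Train a fresh Q-table, then score it greedily (assumption: epsilon = 0)."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    train_avg = run_epochs(env, q, learn_epochs, alpha, gamma, epsilon, True, rng)
    score_avg = run_epochs(env, q, score_epochs, alpha, gamma, 0.0, False, rng)
    return train_avg, score_avg
```

With Gymnasium installed, `gymnasium.make("Taxi-v3")` provides such an environment, with 500 discrete states and 6 actions. Older `gym` releases return different tuples from `reset()` and `step()`, so the unpacking would need adjusting to match whatever the FrozenLake example code uses.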

Create a report that includes:

Required Steps

Last Updated 03/20/2023