TensorFlow for Reinforcement Learning

Slides rl-3.pdf

Task 1 - Tensorflow/Keras

We will start today by getting used to TensorFlow/Keras. You can also adapt the exercises to PyTorch or similar if preferred (but code examples will be given in TensorFlow).

First, install TensorFlow:
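With pip, this is typically:

```
pip install tensorflow
```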

We will only be using the Keras API; you can find the documentation here.

Verify the installation in Python:
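```python
import tensorflow as tf

# Should print the installed TensorFlow version without errors.
print(tf.__version__)
```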

Part A - Perceptron

We can make a single perceptron with Keras like this:
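One possible sketch (a single Dense unit; with no activation specified, the output is linear):

```python
from tensorflow import keras

# A single perceptron: one input value, one output unit.
model = keras.Sequential([
    keras.layers.Dense(1, input_shape=(1,)),
])
```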

and do a forward propagation with:
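For example (note the 2D input shape: a batch with one sample):

```python
import numpy as np

# Forward-propagate a single value through the network.
output = model.predict(np.array([[1.0]]))
print(output)  # shape (1, 1): one sample, one output
```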

Furthermore, we can get and set the current weights with:
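For the perceptron above, get_weights() returns a list [kernel, bias], and set_weights() expects arrays of the same shapes:

```python
# Read the current weights (a list of numpy arrays).
weights = model.get_weights()
print(weights)

# Set the weight to 2.0 and the bias to 0.5.
model.set_weights([np.array([[2.0]]), np.array([0.5])])
```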

Tasks/Questions:

- Test out different values for the weight and bias.
- How do you forward-propagate multiple values at once?
- Can you plot the graph for some range of inputs?

Task 2 - Q-Values from an ANN

We still want to work with Q-values, meaning that we would like our neural network to output a value for each possible action. Our FrozenLake environment has 4 possible actions, and we already know the Q-values for all possible states, making it easy to fit a neural network.

Part A - Creating a network

The following code will create a neural network that takes a state (one value) as input and outputs 4 values (one for each action). It also assumes 16 possible states (0-15):
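A sketch of such a network, assuming a recent TensorFlow 2.x; the hidden-layer size, optimizer, and loss are choices here, not requirements:

```python
import numpy as np
from tensorflow import keras

# Normalization layer adapted to the 16 possible states (0-15).
normalizer = keras.layers.Normalization(input_shape=(1,))
normalizer.adapt(np.arange(16, dtype=np.float32).reshape(-1, 1))

model = keras.Sequential([
    normalizer,
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(4),  # one Q-value per action
])

# compile() sets up the optimizer and loss later used by fit().
model.compile(optimizer="adam", loss="mse")
```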

Some of this code can be safely ignored (normalization and the compile method).

Tasks/Questions:

- What is the design of this neural network?

Part B - Training

As we already have Q-values, let us train the network on the data:
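A minimal sketch, assuming q_table is a (16, 4) numpy array of Q-values from last week's exercise (the name and the epoch count are assumptions):

```python
import numpy as np

states = np.arange(16, dtype=np.float32).reshape(-1, 1)  # all 16 states
model.fit(states, q_table, epochs=500, verbose=0)
```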

Tasks/Questions:

- Test out the forward propagation. Are the values similar to what you expect from a Q-table?
- Plot the utility given optimal play.

Part C - FrozenLake

Given the model trained above and an optimal policy (argmax of output), can you move around the environment/solve the problem?
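A sketch of a greedy rollout, assuming the gymnasium API (older gym versions return fewer values from reset and step) and a non-slippery lake:

```python
import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1", is_slippery=False, render_mode="human")
state, _ = env.reset()
done = False
while not done:
    # Greedy policy: pick the action with the highest predicted Q-value.
    q_values = model.predict(np.array([[state]]), verbose=0)[0]
    action = int(np.argmax(q_values))
    state, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
```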

Task 3 - DQN

Given the exercises from last week, we now only need an implementation of a replay buffer to implement a DQN (Deep Q-Network) agent. The replay buffer needs two methods: one to store experiences (state, action, reward, next_state) and one to sample from the buffer.

Implement these two methods.
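A minimal sketch; the class and method names (ReplayBuffer, store, sample) and the capacity are assumptions:

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10_000):
        # A deque with maxlen discards the oldest experiences once full.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random sample of stored experiences.
        return random.sample(self.buffer, batch_size)
```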

Task 4 - MountainCar

Until now we have been working on the FrozenLake environment. Try to solve the MountainCar environment using the techniques we have learned in this course. I will update this page with a DQN solution later (hopefully before the end of the day).
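To get started, the environment can be created and inspected like this (again assuming the gymnasium API):

```python
import gymnasium as gym

env = gym.make("MountainCar-v0")
print(env.observation_space)  # Box(2,): car position and velocity
print(env.action_space)       # Discrete(3): push left, no push, push right
```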