# TensorFlow for Reinforcement Learning

Slides rl-3.pdf

We will start today by getting used to TensorFlow/Keras. You can also adapt the exercises to PyTorch or similar if preferred (but code examples will be given in TensorFlow).

First, install tensorflow:

```shell
pip install tensorflow
```

We will only be using the Keras API; you can find the documentation here.

Verify in python with:

```python
import tensorflow as tf
print(tf.__version__)
```

### Part A - Perceptron

We can make a single perceptron with keras like this:

```python
import tensorflow as tf
from tensorflow.keras.layers import Dense

model = tf.keras.Sequential([
    Dense(units=1, input_dim=1)
])
```

and do a forward propagation with:

```python
import numpy as np

model(np.array([[5.0]]))  # For x = 5
```

Furthermore, we can get and set the current weights with:

```python
# Get weights of the first (and only) layer
model.layers[0].get_weights()

# Set weights (TODO: replace w1 and b1)
model.layers[0].set_weights([np.array([[w1]]), np.array([b1])])
```

Tasks/Questions:

- Test out different values for the weight and bias
- How do you forward-propagate multiple values at once?
- Can you plot the graph for some range of inputs?
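As a hint for the batch question: a Keras model accepts a whole batch at once as a 2-D array of shape `(n_samples, n_features)`. A minimal sketch, where the explicit `Input` layer and the weight/bias values 2 and 1 are arbitrary additions of ours, not part of the exercise:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense

# Same single perceptron as above, built with an explicit Input layer
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    Dense(units=1),
])

# Example weight and bias (arbitrary values, giving y = 2x + 1)
model.layers[0].set_weights([np.array([[2.0]]), np.array([1.0])])

# A batch is a 2-D array of shape (n_samples, n_features)
xs = np.array([[0.0], [1.0], [2.0], [5.0]])
ys = model(xs).numpy()
print(ys)  # [[1.], [3.], [5.], [11.]]
```

For the plotting question, the same batched call over e.g. `np.linspace(-5, 5, 100).reshape(-1, 1)` gives the y-values to pass to `matplotlib.pyplot.plot`.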

## Task 2 - Q-Values from an ANN

We still want to work with Q-values, meaning that we would like some value for all possible actions as output from our neural network. Our FrozenLake environment has 4 possible actions, and we already know the q-values for all possible states, making it easy to fit a neural network.

### Part A - Creating a network

The following code will create a neural network that takes a state (one value) as input and outputs 4 values (one for each action). It assumes 16 possible states (0-15):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense

x_data = np.linspace(0, 15, 16)
normalizer = keras.layers.Normalization(input_shape=[1,], axis=None)
normalizer.adapt(x_data)  # the layer must see the data before it can normalize

model = keras.Sequential([
    normalizer,
    Dense(64, activation='relu'),
    Dense(64, activation='relu'),
    Dense(4)
])

model.compile(
    loss='mse'
)
```

Some of this code can be safely ignored (normalization and the compile method).

Tasks/Questions:

- What is the design of this neural network?
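As a starting point for the design question, Keras can describe the architecture itself via `model.summary()`. A sketch, where the explicit `keras.Input` line is our addition for compatibility, and the parameter counts in the comment follow from `inputs * units + biases`:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense

x_data = np.linspace(0, 15, 16)
normalizer = keras.layers.Normalization(axis=None)
normalizer.adapt(x_data)

model = keras.Sequential([
    keras.Input(shape=(1,)),
    normalizer,
    Dense(64, activation='relu'),
    Dense(64, activation='relu'),
    Dense(4),
])

# Prints each layer with its output shape and parameter count.
# The Dense layers have 1*64+64 = 128, 64*64+64 = 4160 and 64*4+4 = 260
# trainable parameters respectively (4548 in total).
model.summary()
```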

### Part B - Training

As we already have Q-Values, let us train the network on the data:

```python
y_data = np.array([
    [0.54, 0.53, 0.53, 0.52],
    [0.34, 0.33, 0.32, 0.50],
    [0.44, 0.43, 0.42, 0.47],
    [0.31, 0.31, 0.30, 0.46],
    [0.56, 0.38, 0.37, 0.36],
    [0., 0., 0., 0.],
    [0.36, 0.2, 0.36, 0.16],
    [0., 0., 0., 0.],
    [0.38, 0.41, 0.40, 0.59],
    [0.44, 0.64, 0.45, 0.40],
    [0.62, 0.50, 0.40, 0.33],
    [0., 0., 0., 0.],
    [0., 0., 0., 0.],
    [0.46, 0.53, 0.74, 0.50],
    [0.73, 0.86, 0.82, 0.78],
    [1, 1, 1, 1]
])
```

```python
model.fit(
    x_data,
    y_data,
    epochs=50000,
    verbose=0,
)
```

Tasks/Questions:

- Test out the forward propagation; are the values similar to what you expect from a Q-table?
- Plot the utility given optimal play.
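One way to check the first question is to forward-propagate all 16 states at once and compare against the table. A self-contained sketch of this, where `optimizer='adam'` and the reduced epoch count are our additions to keep it quick (use the full training run above for real results):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.layers import Dense

x_data = np.linspace(0, 15, 16).reshape(-1, 1)
y_data = np.array([
    [0.54, 0.53, 0.53, 0.52], [0.34, 0.33, 0.32, 0.50],
    [0.44, 0.43, 0.42, 0.47], [0.31, 0.31, 0.30, 0.46],
    [0.56, 0.38, 0.37, 0.36], [0., 0., 0., 0.],
    [0.36, 0.2, 0.36, 0.16], [0., 0., 0., 0.],
    [0.38, 0.41, 0.40, 0.59], [0.44, 0.64, 0.45, 0.40],
    [0.62, 0.50, 0.40, 0.33], [0., 0., 0., 0.],
    [0., 0., 0., 0.], [0.46, 0.53, 0.74, 0.50],
    [0.73, 0.86, 0.82, 0.78], [1, 1, 1, 1],
])

normalizer = keras.layers.Normalization(axis=None)
normalizer.adapt(x_data)

model = keras.Sequential([
    keras.Input(shape=(1,)),
    normalizer,
    Dense(64, activation='relu'),
    Dense(64, activation='relu'),
    Dense(4),
])
model.compile(loss='mse', optimizer='adam')
model.fit(x_data, y_data, epochs=500, verbose=0)

# One row of 4 predicted Q-values per state; compare with y_data
pred = model.predict(x_data, verbose=0)  # shape (16, 4)
print(np.round(pred, 2))

# Utility under optimal play is max_a Q(s, a) per state; to plot it:
#   import matplotlib.pyplot as plt
#   plt.plot(range(16), pred.max(axis=1))
```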

### Part C - FrozenLake

Given the model trained above and an optimal policy (argmax of output), can you move around the environment/solve the problem?