---
title: Tensorflow for Reinforcement Learning
categories: session
---

**Slides** [rl-3.pdf]()

## Task 1 - Tensorflow/Keras

We will start today by getting used to TensorFlow/Keras. You can also adapt the exercises to PyTorch or similar if preferred (but code examples will be given in TensorFlow).

First, install tensorflow:

```bash
pip install tensorflow
```

We will only be using the Keras API; you can find the documentation [here](https://www.tensorflow.org/guide/keras/sequential_model).

Verify in python with:

```python
import tensorflow as tf
print(tf.__version__)
```

### Part A - Perceptron

We can make a single perceptron with keras like this:

```python
from tensorflow import keras
from tensorflow.keras.layers import Dense

model = keras.Sequential([
    Dense(units=1, input_dim=1)
])
```

and do a forward propagation with:

```python
import numpy as np

model(np.array([[5]]))  # For x = 5
```

Furthermore, we can get and set the current weights with:

```python
# Get weights
model.layers[0].get_weights()

# Set weights (TODO: replace w1 and b1)
model.layers[0].set_weights([np.array([[w1]]), np.array([b1])])
```

**Tasks/Questions**

- Test out different values for the weight and bias
- How do you forward-propagate multiple values at once?
- Can you plot the graph for some range of inputs?

## Task 2 - Q-Values from an ANN

We still want to work with Q-values, meaning that we would like a value for every possible action as output from our neural network. Our FrozenLake environment has 4 possible actions, and we already know the Q-values for all possible states, making it easy to fit a neural network.

### Part A - Creating a network

The following code creates a neural network that takes a state (one value) as input and outputs 4 values (one for each action). It also assumes 16 possible states (0-15):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense

x_data = np.linspace(0, 15, 16)

normalizer = keras.layers.Normalization(input_shape=[1,], axis=None)
normalizer.adapt(np.array(x_data))

model = keras.Sequential([
    normalizer,
    Dense(64, activation='relu'),
    Dense(64, activation='relu'),
    Dense(4)
])

model.compile(
    optimizer=tf.optimizers.Adam(learning_rate=0.001),
    loss='mse'
)
```

Some of this code can be safely ignored (the normalization and the compile method).

**Tasks/Questions**

- What is the design of this neural network?

### Part B - Training

As we already have Q-values, let us train the network on the data:

```python
y_data = np.array([
    [0.54, 0.53, 0.53, 0.52],
    [0.34, 0.33, 0.32, 0.50],
    [0.44, 0.43, 0.42, 0.47],
    [0.31, 0.31, 0.30, 0.46],
    [0.56, 0.38, 0.37, 0.36],
    [0., 0., 0., 0.],
    [0.36, 0.2, 0.36, 0.16],
    [0., 0., 0., 0.],
    [0.38, 0.41, 0.40, 0.59],
    [0.44, 0.64, 0.45, 0.40],
    [0.62, 0.50, 0.40, 0.33],
    [0., 0., 0., 0.],
    [0., 0., 0., 0.],
    [0.46, 0.53, 0.74, 0.50],
    [0.73, 0.86, 0.82, 0.78],
    [1, 1, 1, 1]
])

model.fit(
    x_data,
    y_data,
    epochs=50000,
    verbose=0
)
```

**Tasks/Questions**

- Test out the forward propagation. Are the values similar to what you expect from a Q-table?
- Plot the utility given optimal play.

### Part C - FrozenLake

Given the model trained above and an optimal policy (argmax of the output), can you move around the environment/solve the problem?
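As a starting point for Part C, here is a minimal sketch of using the trained network as a greedy policy. It assumes the `model` from Part B, gym's `FrozenLake-v1` environment name, and the older gym API (where `reset()` returns the state and `step()` returns 4 values); adapt it to your gym/gymnasium version.

```python
# A minimal sketch, not a reference solution: greedy play in FrozenLake
# using the network from Task 2. Assumes gym's "FrozenLake-v1" and the
# older gym API (reset() -> state, step() -> 4-tuple).
import gym
import numpy as np

env = gym.make("FrozenLake-v1")
state = env.reset()

for _ in range(100):
    q_values = model(np.array([[state]])).numpy()[0]  # forward-propagate the current state
    action = int(np.argmax(q_values))                 # greedy policy: best action according to the network
    state, reward, done, info = env.step(action)
    env.render()
    if done:
        break
```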
## Task 3 - DQN

Given the exercises from last week, we now only need an implementation of a replay buffer to implement a DQN (Deep Q-Network) agent.

The replay buffer needs two methods: one to store experiences (state, action, reward, next_state), and one to sample from the replay buffer. **Implement these two methods** (a sketch of one possible implementation is given at the bottom of this page).

## Task 4 - MountainCar

Until now we have been working on the FrozenLake environment. Try to solve the [MountainCar](https://www.gymlibrary.ml/environments/classic_control/mountain_car/) environment using the techniques we have learned in this course.

I will update this page with a DQN solution later (hopefully before the end of the day).
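For Task 3, the two replay-buffer methods are meant as an exercise; the snippet below is only a minimal sketch of one possible implementation (the class name, method names, and default sizes are illustrative, not a required interface).

```python
# A minimal sketch of a replay buffer, one possible implementation only.
# It stores (state, action, reward, next_state) tuples and samples a random
# mini-batch for training. A `done` flag is often stored as well.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=10000):
        # A deque drops the oldest experiences once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        # Store one experience tuple
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=32):
        # Sample a random mini-batch of stored experiences
        return random.sample(self.buffer, batch_size)
```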