I use the tee command to display the program's output while also redirecting it to a log file, which makes later analysis easy.
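As a minimal sketch of this setup: tee copies its stdin both to the terminal and to the named file, so the training output stays visible while being logged. The command and file names here are placeholders, not the actual ones from my run.

```shell
# tee duplicates stdin to stdout AND to the file, so you can watch
# the training output live while it is saved for later analysis.
# Placeholder output line standing in for the real training command:
printf 'step 250000 avg_reward 1.50\n' | tee train.log

# In practice it would look something like (hypothetical script name):
#   python train.py 2>&1 | tee train.log
# The 2>&1 also captures stderr, where some frameworks print progress.
```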
First, let's look at the complete output when the program tests the network.
We can see that every 250,000 steps, the training process runs a test and outputs the results, including the average reward the agent earns while playing the game.
We can make this easier to read by plotting the results.
I extracted the steps vs. average reward pairs from the output above using a simple Linux shell command.
Then we can copy the data into Excel and graph it.
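The extraction step can be sketched with awk. The log line format below is an assumption (I'm inventing a plausible shape for the test output); adjust the matched pattern and field numbers to your real log. The comma-separated output pastes directly into Excel.

```shell
# Fake log standing in for the real training output (assumed format).
cat > train.log <<'EOF'
Testing network at step 250000 ... average reward: 1.2
Testing network at step 500000 ... average reward: 3.4
EOF

# Keep only the test-result lines; print the step count (field 5)
# and the average reward (last field) as CSV for easy graphing.
awk '/average reward/ {print $5 "," $NF}' train.log > reward.csv
cat reward.csv
```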
Here is the result.
You can see it learns quickly at the beginning, but learning slows considerably as it approaches 5 million steps.
I will keep the training running, hoping it learns much more in the coming days.
