1. What happens if the temporal difference algorithm
of Problem 13 plays tic-tac-toe against itself?

2. Analyze Samuel's checker-playing program from a
reinforcement learning perspective. Sutton and Barto (1998, Section 11.2) offer
suggestions for this analysis.

3. Can you analyze the inverted pendulum problem, presented
in Section 9.2.2, from a reinforcement learning perspective? Build some simple
reward measures and use the temporal difference algorithm in your analysis.

4. Another problem type well suited to reinforcement
learning is the so-called gridworld. We present a simple 4 × 4
gridworld in Figure 10.26. The two greyed corners are the desired terminal
states for the agent. From all other states, agent movement is either up, down,
left, or right. The agent cannot move off the grid: attempting to do so leaves the
state unchanged. The reward for all transitions, except to the terminal states,
is −1.
Work through a sequence of grids that produce a solution based on the temporal
difference algorithm presented in Section 10.7.2.
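
As a starting point for Problem 4, the following Python fragment is a minimal sketch of one way to set up the gridworld and apply a temporal difference update. It rests on several assumptions that are not given in the problem statement: the two terminal states are taken to be the top-left and bottom-right corners (Figure 10.26 is not reproduced here), the agent chooses among the four moves uniformly at random, a transition into a terminal state earns a reward of 0, and the update is the standard one-step form V(s) ← V(s) + α(r + γV(s′) − V(s)), which may differ in detail from the version in Section 10.7.2. The names `step`, `td0`, `alpha`, and `gamma` are illustrative only.

```python
import random

N = 4  # the grid is N x N
# Assumption: the two greyed corners are the top-left and bottom-right cells.
TERMINALS = {(0, 0), (N - 1, N - 1)}
# Moves as (row change, column change): up, down, left, right.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def step(state, action):
    """Apply one move; a move off the grid leaves the state unchanged."""
    r, c = state
    nr, nc = r + action[0], c + action[1]
    if not (0 <= nr < N and 0 <= nc < N):
        nr, nc = r, c
    next_state = (nr, nc)
    # Reward is -1 for every transition except one into a terminal state
    # (assumed here to earn 0).
    reward = 0.0 if next_state in TERMINALS else -1.0
    return next_state, reward


def td0(episodes=5000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values under a uniformly random policy with one-step TD."""
    rng = random.Random(seed)
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}
    starts = [s for s in V if s not in TERMINALS]
    for _ in range(episodes):
        s = rng.choice(starts)
        while s not in TERMINALS:
            a = rng.choice(ACTIONS)
            s_next, reward = step(s, a)
            # Temporal difference update toward the one-step target.
            V[s] += alpha * (reward + gamma * V[s_next] - V[s])
            s = s_next
    return V


if __name__ == "__main__":
    values = td0()
    # Print the estimated values as a grid, one row per line.
    for r in range(N):
        print(" ".join(f"{values[(r, c)]:7.2f}" for c in range(N)))
```

Printing the grid after a small number of episodes, and again after progressively larger numbers, gives one possible "sequence of grids" for the exercise. Cells near a terminal corner should settle at values closer to zero than cells far from both corners.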

