Experiment by Antonino Perricone, more information here.
This is the "biped version".
Speed multiplier:
You can add options to the URL of this page, using a # or a ? to separate each "NAME=value" pair.
For example: (..)index.html#leg_1_len=70#straight_factor=1
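The option syntax above can be illustrated with a small sketch. This is not the page's own parsing code, which is not shown here; the function name parse_options is hypothetical, and the sketch assumes every chunk after a # or ? is a plain "NAME=value" pair with no URL encoding.

```python
import re


def parse_options(url: str) -> dict[str, str]:
    """Collect NAME=value pairs that follow any '#' or '?' in the URL.

    Hypothetical helper: values are returned as raw strings, and
    chunks without an '=' or without a name are ignored.
    """
    options: dict[str, str] = {}
    # The text before the first separator is the page address itself.
    for chunk in re.split(r"[#?]", url)[1:]:
        name, sep, value = chunk.partition("=")
        if sep and name:
            options[name] = value
    return options


print(parse_options("index.html#leg_1_len=70#straight_factor=1"))
# prints {'leg_1_len': '70', 'straight_factor': '1'}
```

Values come back as strings, so numeric options such as leg_1_len would still need converting before use.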
Possible options are:
leg_1_len: length of first part of the leg (number, default: 50)
leg_2_len: length of second part of the leg (number, default: 50)
use_one_dof: if true, each action changes only one angle. (true/false, default: false)
movement_factor: reward component; the forward movement is multiplied by this factor and added to the reward. (number, default: 1)
straight_factor: reward component; the vertical position of the "hip" is multiplied by this factor and added to the reward. (number, default: 0)
oscillate_factor: reward component; the vertical movement of the "hip" is multiplied by this factor and subtracted from the reward. (number, default: 0)
gamma: γ discount-rate factor, see below (number, default: 0.999)
minError: error threshold for stopping the Q evaluation, see below (number, default: 0.01)
state_step: size of a state in degrees; it is rounded to the nearest value that divides 90 evenly (number, default: 22.5)
nRewardSave: statistics option; number of states averaged to calculate the current speed. (number, default: 100)
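As a rough illustration of how the three reward factors in the list above might combine, here is a minimal sketch. The function and argument names are hypothetical, and taking the absolute value of the hip's vertical movement is an assumption; the page only says that the movement is subtracted.

```python
def reward(forward_movement: float,
           hip_height: float,
           hip_vertical_movement: float,
           movement_factor: float = 1.0,
           straight_factor: float = 0.0,
           oscillate_factor: float = 0.0) -> float:
    """Combine the three reward components using the documented defaults."""
    return (movement_factor * forward_movement
            + straight_factor * hip_height
            # Assumption: up and down movement are penalised alike.
            - oscillate_factor * abs(hip_vertical_movement))


# With the defaults, only the forward movement counts.
print(reward(2.0, 5.0, -1.0))
# Rewarding a high hip and penalising oscillation changes the balance.
print(reward(2.0, 5.0, -1.0, straight_factor=1.0, oscillate_factor=0.5))
```

With the default factors the reward reduces to the forward movement alone, which matches the defaults of 1, 0 and 0 listed above.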
The value of taking action a in state s is Q(s,a) = r(s,a) + γ max_a' Q(s',a'), where s' is the state reached from s by action a.
Its calculation is iterative: it stops when the difference between the previously calculated value and the new one is below minError.
Therefore γ is usually near 1 but must be strictly less than 1, and minError is near 0 but must be strictly positive.
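The iteration described above can be sketched on a toy problem. This is not the experiment's own code: the function name evaluate_q, the dictionary layout, and the two-state example are hypothetical, and the sketch assumes deterministic transitions and at least one action in every state.

```python
def evaluate_q(transitions, rewards, gamma=0.999, min_error=0.01):
    """Repeat Q(s,a) = r(s,a) + gamma * max_a' Q(s',a') until it settles.

    transitions[s][a] is the next state s'; rewards[s][a] is r(s,a).
    Assumes deterministic transitions and no state without actions.
    """
    q = {s: {a: 0.0 for a in actions} for s, actions in transitions.items()}
    while True:
        largest_change = 0.0
        for s, actions in transitions.items():
            for a, s_next in actions.items():
                new_value = rewards[s][a] + gamma * max(q[s_next].values())
                largest_change = max(largest_change, abs(new_value - q[s][a]))
                q[s][a] = new_value
        # Stop once no value moved by min_error or more in a full sweep.
        if largest_change < min_error:
            return q


# Hypothetical two-state example: only going from A to B is rewarded.
transitions = {"A": {"stay": "A", "go": "B"}, "B": {"stay": "B", "go": "A"}}
rewards = {"A": {"stay": 0.0, "go": 1.0}, "B": {"stay": 0.0, "go": 0.0}}
q = evaluate_q(transitions, rewards, gamma=0.5, min_error=0.001)
print(q["A"]["go"], q["B"]["go"])
```

In this toy example the exact fixed point is Q(A,go) = 4/3 and Q(B,go) = 2/3, so the printed values should land close to those. With γ equal to 1 the values in a rewarding loop would grow without bound, and with minError equal to 0 the loop might never stop, which is why both limits are excluded.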