Most fundamentally, it comes down to the fact that neurones' connections and behaviour can be modelled as minimizing the error in predicting their stimulus.
Dopamine encoding an outcome compared to a baseline expectation is very close to temporal difference learning.
This is basically what slot machines exploit with their variable reward schemes.
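To make the comparison concrete, here is a minimal sketch of a TD(0)-style prediction error for a single cue followed by a reward. The function names (`td_error`, `run_trials`), the learning rate, the discount factor and the payout numbers are all made up for illustration; this is a toy under those assumptions, not a model of actual dopamine neurons. It compares a fixed reward with a variable one that has the same average payout.

```python
import random

def td_error(reward, value_now, value_next, gamma=0.9):
    """Prediction error: outcome plus discounted next estimate, minus current expectation."""
    return reward + gamma * value_next - value_now

def run_trials(reward_fn, n_trials=2000, alpha=0.1, seed=0):
    """Learn the expected value of one cue; return (value, mean |error| over the last 200 trials).

    alpha (learning rate) and n_trials are illustrative assumptions.
    """
    rng = random.Random(seed)
    value = 0.0  # baseline expectation for the cue
    errors = []
    for _ in range(n_trials):
        reward = reward_fn(rng)
        # The trial ends after the reward, so the next-state value is 0.
        delta = td_error(reward, value, 0.0)
        value += alpha * delta  # nudge the expectation toward the outcome
        errors.append(abs(delta))
    tail = errors[-200:]
    return value, sum(tail) / len(tail)

# Fixed schedule: always pays 1.
fixed_value, fixed_err = run_trials(lambda rng: 1.0)

# Variable schedule (hypothetical numbers): pays 4 a quarter of the time, else 0.
variable_value, variable_err = run_trials(
    lambda rng: 4.0 if rng.random() < 0.25 else 0.0
)

print(f"fixed:    value={fixed_value:.3f}  mean |error|={fixed_err:.3f}")
print(f"variable: value={variable_value:.3f}  mean |error|={variable_err:.3f}")
```

With the fixed reward the expectation catches up and the error shrinks toward zero. With the variable reward the expectation can never match any single outcome, so the error signal stays large on every trial even though the average payout is the same.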