A learning paradigm where models learn by interacting with the environment and receiving feedback.

Used to build game playing AI’s where models are given a reward for choosing utility maximizing actions.

providing an algorithm with a set of rules and constraints and letting it learn how to achieve its goals using a reward function to penalize bad actions or reward good actions