Q-Learning has several distinctive features:
- Model-Free: Doesn't need a model of the environment (no need to know transition probabilities)
- Off-Policy: Learns the value of the optimal policy independently of the behavior policy the agent actually follows while exploring
- Tabular Method: In its basic form, stores a Q-value for every state-action pair in a table (function approximation, as in Deep Q-Networks, replaces the table for large state spaces)
- Temporal Difference: Updates estimates based on other learned estimates (bootstrapping)
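The features above come together in a single update rule: the TD target bootstraps on the current estimate of the best next action, and no transition model is needed. A minimal sketch of one tabular update (state/action indices and the 3x2 Q-table are hypothetical, chosen only for illustration):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-Learning step: model-free, off-policy, bootstrapped."""
    # Off-policy TD target: bootstrap on the best next action,
    # regardless of which action the behavior policy will take next.
    td_target = r + gamma * np.max(Q[s_next])
    # Move the current estimate a small step toward the target.
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Hypothetical 3-state, 2-action Q-table, initialized to zeros.
Q = np.zeros((3, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
# Q[0, 1] is now 0.1 * (1.0 + 0.99 * 0.0 - 0.0) = 0.1
```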
Compared to SARSA (another popular TD algorithm), Q-Learning is more aggressive: it always updates towards the best possible next action (a greedy target), while SARSA updates towards the action its current policy actually takes, which makes SARSA on-policy.
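The difference between the two algorithms is entirely in the TD target. A small illustration with made-up numbers (the next-state Q-values and the chosen action are hypothetical):

```python
import numpy as np

gamma = 0.99
r = 1.0
Q_next = np.array([0.0, 5.0])  # hypothetical Q-values for the next state
a_next = 0                     # action the behavior policy actually chose

# Q-Learning: greedy over next actions, independent of the policy followed.
q_learning_target = r + gamma * np.max(Q_next)   # 1.0 + 0.99 * 5.0 = 5.95

# SARSA: uses the action actually taken by the current policy.
sarsa_target = r + gamma * Q_next[a_next]        # 1.0 + 0.99 * 0.0 = 1.0
```

Because SARSA's target reflects exploratory (possibly suboptimal) actions, it tends to learn more conservative values under an exploring policy, while Q-Learning's greedy target learns the optimal values directly.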