The exploration-exploitation trade-off is a well-known problem that occurs in scenarios where a learning system has to repeatedly make a choice with uncertain pay-offs. In essence, the dilemma for a decision-making system that only has incomplete knowledge of the world is whether to repeat decisions that have worked well so far (exploit) or to make novel decisions, hoping to gain even greater rewards (explore). TowardsDataScience