#158 · Reinforcement Learning · Medium
Implement epsilon-greedy action selection for the n-armed bandit problem. With probability epsilon, select a random action (exploration); otherwise, select the action with the highest estimated value (exploitation).
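import numpy as np

def epsilon_greedy(q_values: np.ndarray, epsilon: float) -> int:
    # Explore: with probability epsilon, pick an action uniformly at random.
    if np.random.rand() < epsilon:
        return int(np.random.randint(len(q_values)))
    else:
        # Exploit: pick the action with the highest estimated value
        # (np.argmax returns the first index when values tie).
        return int(np.argmax(q_values))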
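As a quick sanity check of the function above, the sketch below repeats the same logic in self-contained form and estimates how often the greedy arm is picked. With n arms and a unique maximum, the greedy arm should be chosen with probability 1 - epsilon + epsilon/n, because the random branch can also land on it. The helper `greedy_frequency`, its `trials` and `seed` parameters, and the sample Q-values are illustrative assumptions, not part of the original problem.

```python
import numpy as np


def epsilon_greedy(q_values: np.ndarray, epsilon: float) -> int:
    # Same logic as the solution above, repeated so this block runs on its own.
    if np.random.rand() < epsilon:
        return int(np.random.randint(len(q_values)))
    return int(np.argmax(q_values))


def greedy_frequency(q_values: np.ndarray, epsilon: float,
                     trials: int = 20_000, seed: int = 0) -> float:
    # Hypothetical helper: fraction of trials in which the greedy arm is chosen.
    np.random.seed(seed)  # fixed seed so the estimate is repeatable
    greedy_arm = int(np.argmax(q_values))
    hits = sum(epsilon_greedy(q_values, epsilon) == greedy_arm
               for _ in range(trials))
    return hits / trials


q = np.array([0.5, 2.3, 1.7])  # made-up value estimates; arm 1 is greedy
eps = 0.3
expected = 1 - eps + eps / len(q)  # analytic probability of the greedy arm
observed = greedy_frequency(q, eps)
print(f"expected {expected:.3f}, observed {observed:.3f}")
```

The observed frequency should land close to the analytic value, with a small sampling error that shrinks as `trials` grows. Setting epsilon to 0 makes the selection purely greedy, and setting it to 1 makes it uniformly random.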