Numerous mobile robotic applications require agents to persistently explore and exploit spatiotemporally varying, partially observable environments. The mathematical notion of regret, defined as the instantaneous or time-averaged difference between the optimal reward and the realized reward, serves as a meaningful measure of how well the agents have exploited the environment. However, while numerous theoretical regret bounds have been derived within the machine learning community, restrictions on the manner in which the environment evolves preclude their...