Abstract:
We study preference learning through recommendations in multi-agent game settings, where a moderator repeatedly interacts with agents whose utility functions are unknown. In each round, the moderator issues action recommendations and observes whether agents follow or deviate from them. We consider two canonical behavioral feedback models, best response and quantal response, and study how the information revealed by each model affects the learnability of agents' utilities. We show that under quantal-response feedback the game is learnable, up to a positive affine equivalence class, with sample complexity logarithmic in the desired precision, whereas best-response feedback can identify agents' utilities only up to a larger set. We give a complete geometric characterization of this set. Moreover, we introduce a regret notion based on agents' incentives to deviate from recommendations and design an online algorithm with low regret under both feedback models, with bounds scaling linearly in the game dimension and logarithmically in time. Our results lay a theoretical foundation for AI recommendation systems in strategic multi-agent environments, where compliance with recommendations is shaped by strategic interaction.
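The two feedback models contrasted in the abstract can be illustrated with their standard textbook forms (a minimal sketch under assumed conventions, not the paper's exact setup): best response deterministically plays an argmax action, while logit quantal response plays each action with probability proportional to exp(lam * u(a)), where `lam` is a hypothetical rationality parameter.

```python
import numpy as np

def best_response(utilities):
    """Best-response feedback: the agent deterministically plays an argmax action."""
    return int(np.argmax(utilities))

def quantal_response_probs(utilities, lam=1.0):
    """Logit quantal response: P(a) proportional to exp(lam * u(a)).

    Larger `lam` concentrates mass on the best response; shifting all
    utilities by a constant leaves the distribution unchanged.
    """
    u = np.asarray(utilities, dtype=float)
    z = lam * (u - u.max())  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()
```

As `lam` grows, the quantal-response distribution approaches the best response; intuitively, the smooth dependence of these probabilities on the utilities is what makes richer identification possible from observed compliance frequencies.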
Publication date:
February 19, 2026
Publication type:
Preprint
Citation:
Alanqary, A., Baba, Z., Wu, M., & Bayen, A. M. (2026). Learning to Recommend in Unknown Games (arXiv:2602.16998). arXiv. https://doi.org/10.48550/arXiv.2602.16998