摘要:We consider a model where an agent has a repeated decision to make and wishes to maximize their total payoff. Payoffs are influenced by an action taken by the agent, but also an unknown state of the world that evolves over time. Before choosing an action each round, the agent can purchase noisy samples about the state of the world. The agent has a budget to spend on these samples, and has flexibility in deciding how to spread that budget across rounds. We investigate the problem of choosing a sampling algorithm that optimizes total expected payoff. For example: is it better to buy samples steadily over time, or to buy samples in batches? We solve for the optimal policy, and show that it is a natural instantiation of the latter. Under a more general model that includes per-round fixed costs, we prove that a variation on this batching policy is a 2-approximation.
关键词:Online Algorithms; Value of Data; Markov Processes