In this paper we study learning procedures when counterfactuals (payoffs of not-chosen actions) are not observed. The decision maker reasons in two steps. First, she updates her propensities for each action after every payoff experience, where an action's propensity is defined as how much she prefers that action. Then, she transforms these propensities into choice probabilities. We introduce natural axioms on the way propensities are updated and the way propensities are translated into choice, and study the decision maker's behavior when such axioms are in place.
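The two-step procedure can be illustrated with a minimal sketch. This is not the paper's model. The specific rules are assumptions chosen for illustration: a Roth-Erev style cumulative update (only the chosen action's propensity changes, since the payoffs of unchosen actions are never seen) and a proportional, Luce-type choice rule p_a = q_a / sum_b q_b. The function names are hypothetical.

```python
import random


def update_propensities(propensities, chosen, payoff):
    """Step 1 (assumed rule): add the observed payoff to the chosen action only.

    Unchosen actions keep their propensities because their counterfactual
    payoffs are not observed. Returns a new dict; the input is not mutated.
    """
    updated = dict(propensities)
    updated[chosen] += payoff
    return updated


def choice_probabilities(propensities):
    """Step 2 (assumed rule): proportional choice, p_a = q_a / sum_b q_b.

    Assumes all propensities are positive, which holds if they start
    positive and payoffs are non-negative.
    """
    total = sum(propensities.values())
    return {action: q / total for action, q in propensities.items()}


def choose(propensities, rng):
    """Draw one action according to the choice probabilities."""
    probs = choice_probabilities(propensities)
    actions = list(probs)
    return rng.choices(actions, weights=[probs[a] for a in actions], k=1)[0]
```

For example, starting from equal propensities {"A": 1.0, "B": 1.0}, a payoff of 2.0 after choosing A gives {"A": 3.0, "B": 1.0}, and the proportional rule then selects A with probability 0.75 and B with probability 0.25. Other update and choice rules (for instance, a logit choice rule) fit the same two-step structure.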