If a concept class can be represented with a certain amount of memory, can it be efficiently learned with the same amount of memory? What concepts can be efficiently learned by algorithms that extract only a few bits of information from each example? We introduce a formal framework for studying these questions, and investigate the relationship between the fundamental resources of memory or communication and the sample complexity of the learning task. We relate our memory-bounded and communication-bounded learning models to the well-studied statistical query model. This connection can be leveraged to obtain both upper and lower bounds: we show several strong lower bounds on learning parity functions with bounded communication (for example, that any multi-round multiparty protocol for learning parity functions over length n inputs in which each party receives a list of n 4 examples but is limited to at most n 16 bits of communication, requires an exponential number of parties), as well as the first upper bounds on solving generic sparse linear regression problems with limited memory.