Virtual Reality (VR) is a multi-sensory technology that can stimulate learning and holds potential for pedagogical applications. While VR researchers have demonstrated several applications that support understanding and learning in STEM education, research on which features of VR support learning is still in its infancy. Existing studies of how learners interact with VR rely on human observation or on learners’ self-reported perceptions. This paper describes a novel mechanism for capturing learners’ interaction behavior in a mobile-based static VR environment for learning about the human circulatory system. The mechanism is based on screen recordings of VR interaction, which are then manually annotated to form a time-sequenced series of actions. In a preliminary test with three learners, the interaction data were analyzed in terms of the time spent on each action in the VR environment, frequently co-occurring actions, and action sequences. We describe the test results and discuss the implications of using such a mechanism to capture learners’ interaction behavior. We conclude that capturing data in this manner yields a rich, detailed profile of each learner and enables the use of various analytics methods to provide personalized and adaptive support.
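As an illustration of how a manually annotated, time-sequenced action series might be analyzed, the following is a minimal sketch, not the paper's implementation. It assumes each annotation is a hypothetical `(start_s, end_s, action)` tuple; the action labels and the helper names `time_per_action` and `transitions` are invented for illustration. The sketch computes total time per action and counts of consecutive action pairs, one simple way to examine action sequences and adjacent co-occurrence.

```python
from collections import Counter, defaultdict

# Hypothetical annotated log for one learner: (start_s, end_s, action).
# Labels are illustrative assumptions, not the paper's coding scheme.
log = [
    (0.0, 12.5, "look_around"),
    (12.5, 30.0, "gaze_heart"),
    (30.0, 41.0, "read_label"),
    (41.0, 55.0, "gaze_heart"),
    (55.0, 60.0, "read_label"),
]

def time_per_action(events):
    """Sum the seconds spent in each action across the whole session."""
    totals = defaultdict(float)
    for start, end, action in events:
        totals[action] += end - start
    return dict(totals)

def transitions(events):
    """Count consecutive action pairs (a -> b) in the time-ordered series."""
    actions = [action for _, _, action in sorted(events)]
    return Counter(zip(actions, actions[1:]))

if __name__ == "__main__":
    print(time_per_action(log))
    # Most frequent consecutive pairs first.
    print(transitions(log).most_common())
```

Counting consecutive pairs captures only adjacent co-occurrence; a study could instead use windowed co-occurrence or sequential pattern mining, depending on the research question.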