The present study argues for the communicational unity of spoken language and gesture. In particular, it argues that textual structure, in addition to grammar, is the plane on which language and gesture together reveal communicational intent, through the indexical relationships that emerge between and within the two kinds of signs. The study first discusses how discursive textual structure is built up through various indexical relationships between and within the textual and gestural elements. Two analytically distinct arrangements of the linguistic and gestural signs are examined semiotically. The first is the indexical unity between speech and gesture, realized by the performativity of speech and by the tendency toward tight synchronization between the two signs. The second is poetic structure, realized by the iconic-indexical relationships within each type of sign. It is emphasized that these two constitute, respectively, the vertical and horizontal threads that are interwoven into a single piece of fabric, which we call a discursive textual structure. An example of such a structure is shown in an excerpt from a cartoon story narration by a Japanese female speaker and her interlocutor. The example is meant to demonstrate the significance and utility of the analytic framework promoted in the present study. It shows that what McNeill calls catchments, i.e., gesturally achieved poetic structure, help disambiguate the referent of the actor/subject NP of the verb phrase ‘soko made kuru’ (come up to there). The referent is ambiguous because the speaker, in apparent violation of Hinds' ellipsis chain hypothesis, continues to elide the subject NP where it is expected to be explicitly marked as a topic. Finally, it is argued that communicational intent emerges, at least in part, through the indexical relationships between and within the linguistic and gestural signs.