Abstract: Machine learning and text mining offer new models for text analysis in the humanities by searching for meaningful patterns across hundreds or thousands of documents. In this study, we apply comparative text mining to a large database of 20th-century Black Drama in an effort to examine the linguistic distinctiveness of gender, race, and nationality. We first classify the plays of American versus non-American playwrights using a variety of learning techniques, identifying both the works that are incorrectly classified and the features that distinguish the two groups of plays. We achieve strong performance on this cross-classification task and find features that may provide interpretive insights. Turning to the question of gendered writing, we classify plays by male and female authors, as well as the male and female characters depicted in these works. We again achieve strong results, which yield a variety of feature lists clearly distinguishing the lexical choices made by male and female playwrights. While classification tasks such as these are successful and may be illuminating, they also raise several critical issues. The most successful classifications of author and character gender were achieved by normalizing the data in various ways. Doing so creates a kind of distance from the text as originally composed, which may limit the interpretive utility of classification tools. Framing the classification tasks as binary oppositions (male/female, etc.) raises the possibility of stereotypical or lowest-common-denominator results, which may gloss over important critical elements and may also reflect the experimental design. Text mining opens new avenues of textual and literary research by looking for patterns in large collections of documents, but it should be employed with close attention to its methodological and critical limitations.