Abstract: We study two aspects of information semantics: (i) the collection of all relationships, and (ii) tracking and spotting anomalies and change. The first is implemented by endowing all relevant information spaces with a Euclidean metric in a common projected space. The second is modeled by an induced ultrametric. Correspondence Analysis provides a very general way to achieve a Euclidean embedding of different information spaces, based on cross-tabulation counts (and on other input data formats). From there, the induced ultrametric that we are particularly interested in takes a sequential (e.g. temporal) ordering of the data into account. Following a review of approaches adopted in the analysis of film scripts, we look at how similar approaches can be applied to the scholarly literature.
Keywords: Correspondence Analysis, hierarchical clustering, contiguity constrained clustering, text analysis
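The two steps summarized above can be illustrated with a minimal NumPy sketch. It is written under our own assumptions and is not necessarily the exact procedure used in the paper. Correspondence Analysis is computed from the SVD of the standardized residuals of a count table. This yields row coordinates whose Euclidean distances equal the chi-squared distances between row profiles. The rows, kept in their sequence order, are then clustered agglomeratively under a contiguity constraint, so that only neighbouring segments may merge. We choose the complete link criterion here as one possible option. With it, merging two adjacent segments can only increase their distance to the next segment, so merge heights never decrease and define the induced ultrametric. The function names (`correspondence_analysis`, `contiguity_complete_link`, `induced_ultrametric`) and the toy counts are hypothetical. The sketch also assumes the table has no all-zero rows or columns.

```python
import numpy as np


def correspondence_analysis(N):
    """Row principal coordinates, row masses and singular values of a count table.

    Assumes no all-zero rows or columns. Euclidean distances between rows of the
    returned coordinates equal chi-squared distances between row profiles.
    """
    P = np.asarray(N, dtype=float)
    P = P / P.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    F = (U * sv) / np.sqrt(r)[:, None]       # D_r^{-1/2} U diag(sv)
    return F, r, sv


def contiguity_complete_link(X):
    """Agglomerate rows of X kept in sequence order; only neighbouring segments
    may merge. Returns a list of ((start, end), height) merges, in merge order."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    segments = [(i, i) for i in range(len(X))]   # inclusive index ranges
    merges = []
    while len(segments) > 1:
        # complete link: largest pairwise distance between two neighbouring segments
        costs = [D[a0:a1 + 1, b0:b1 + 1].max()
                 for (a0, a1), (b0, b1) in zip(segments, segments[1:])]
        k = int(np.argmin(costs))
        start, end = segments[k][0], segments[k + 1][1]
        merges.append(((start, end), float(costs[k])))
        segments[k:k + 2] = [(start, end)]
    return merges


def induced_ultrametric(n, merges):
    """u(i, j) = height of the first merge that puts i and j in one segment."""
    U = np.zeros((n, n))
    for (s, e), h in reversed(merges):   # outer segments first, inner ones overwrite
        U[s:e + 1, s:e + 1] = h
    np.fill_diagonal(U, 0.0)
    return U


# Toy example: 6 hypothetical "scenes" (rows, in time order) x 3 "terms" (columns).
counts = [[10, 1, 1], [9, 2, 1], [8, 2, 2],
          [1, 9, 3], [1, 10, 2], [2, 8, 3]]
F, masses, sv = correspondence_analysis(counts)
merges = contiguity_complete_link(F)
U = induced_ultrametric(len(counts), merges)
print(merges[-1])
```

In the toy table, the first three rows share one dominant term and the last three share another. The final, highest merge therefore joins the segments 0..2 and 3..5, and the ultrametric value between any row of one block and any row of the other is that top height. Swapping in another agglomerative criterion would change the hierarchy. It could also make merge heights non-monotone.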