Abstract: The effect of term weighting on selecting the intrinsic dimensionality of data is discussed. Experiments are conducted, using different term weighting and dimensionality selection methods, on four test document collections (Medline, Cranfield, CACM, and CISI). The results indicate that transforming the data matrix with a term weighting scheme plays a vital role in identifying the intrinsic dimensionality.
Keywords: Dimensionality selection; Latent semantic indexing; Singular value decomposition; Term weighting.