Article Information

  • Title: EFFECTIVE SEMANTIC TEXT SIMILARITY METRIC USING NORMALIZED ROOT MEAN SCALED SQUARE ERROR
  • Authors: ISSA ATOUM ; MARUTHI ROHIT AYYAGARI
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Publication year: 2019
  • Volume: 97
  • Issue: 12
  • Pages: 3436-3447
  • Publisher: Journal of Theoretical and Applied
  • Abstract: The Pearson correlation is a performance measure that indicates the extent to which two variables are linearly related. When Pearson is applied to the semantic similarity domain, it shows the degree of correlation between the two sets of scores for the dataset's test pairs: the human similarity scores and the observed similarity scores. However, the Pearson correlation is sensitive to outliers in benchmark datasets. Although many works have tackled the outlier problem, little research has focused on the internal distribution of the benchmark dataset's bins. A representative and well-distributed text benchmark dataset embodies a wide range of similarity score values; therefore, the benchmark dataset can be considered a cross-sectional dataset. Although a perfect text similarity method could report a high Pearson correlation, the standard Pearson correlation is unaware of correlated individual text pairs in a single dataset's cross-section due to outliers. Therefore, this paper proposes the normalized mean scaled square error method, inferred from the standard scaled error, to eliminate the outliers. The newly proposed metric was applied to five benchmark datasets. Results showed that the metric is interpretable, robust to outliers, and competitive with other related metrics.
  • Keywords: Pearson; Absolute Error; Text Similarity; Correlation; Scaled Square Error; Outliers
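
The abstract's claim that the Pearson correlation is sensitive to outliers can be illustrated with a minimal sketch. The scores below are hypothetical values on a 0 to 5 scale, invented for illustration and not taken from the paper or its benchmark datasets, and the `pearson` helper is a plain textbook implementation. The sketch does not implement the paper's proposed metric, whose formula this record does not give.

```python
import math


def pearson(xs, ys):
    """Textbook Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical human and system similarity scores (assumed, not from the paper).
human = [0.5, 1.0, 2.0, 3.0, 4.0, 4.5]
system = [0.6, 1.1, 1.9, 3.2, 3.9, 4.4]

# Same system scores, except the last test pair is badly mis-scored (an outlier).
system_outlier = system[:-1] + [0.2]

r_clean = pearson(human, system)
r_outlier = pearson(human, system_outlier)

print(f"Pearson without outlier: {r_clean:.3f}")
print(f"Pearson with one outlier: {r_outlier:.3f}")
```

Five of the six pairs are scored identically in both runs, yet the single mis-scored pair pulls the coefficient down sharply, which is the kind of behavior the abstract describes as motivation for an outlier-robust metric.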