Abstract: Natural language processing algorithms are resource-demanding, especially when they must be tuned to an inflected language such as Polish. This paper presents the time and memory requirements of part-of-speech tagging and clustering algorithms applied to two corpora of the Polish language. The algorithms are benchmarked on three high-performance platforms with different architectures. Additionally, sequential and OpenMP implementations of the clustering algorithms are compared.
Keywords: benchmarking; part-of-speech tagging; document clustering; natural language processing; high-performance architectures