标题:The Effect of Article Characteristics on Citation Number in a Diachronic Dataset of the Biomedical Literature on Chronic Inflammation: An Analysis by Ensemble Machines
摘要:Citations are core metrics to gauge the relevance of scientific literature. Identifying features that can predict a high citation count is therefore of primary importance. For the present study, we generated a dataset of 121,640 publications on chronic inflammation from the Scopus database, containing data such as titles, authors, journal, publication date, type of document, type of access and citation count, ranging from 1951 to 2021. Hence we further computed title length, author count, title sentiment score, number of colons, semicolons and question marks in the title and we used these data as predictors in Gradient boosting, Bagging and Random Forest regressors and classifiers. Based on these data, we were able to train these machines, and Gradient Boosting achieved an F1 score of 0.552 on classification. These models agreed that document type, access type and number of authors were the best predicting factors, followed by title length.
关键词:citations; title; machine learning citations ; title ; machine learning