
Article Information

  • Title: Web-Scale Language-Independent Cataloging of Noisy Product Listings for E-Commerce
  • Authors: Pradipto Das; Yandi Xia; Aaron Levine
  • Venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
  • Year: 2017
  • Volume: 2017
  • Pages: 969-979
  • Language: English
  • Publisher: ACL Anthology
  • Abstract: The cataloging of product listings through taxonomy categorization is a fundamental problem for any e-commerce marketplace, with applications ranging from personalized search recommendations to query understanding. However, manual and rule-based approaches to categorization are not scalable. In this paper, we compare several classifiers for categorizing listings in both English and Japanese product catalogs. We show empirically that a combination of words from product titles, navigational breadcrumbs, and list prices, when available, improves results significantly. We outline a novel method using correspondence topic models and a lightweight manual process to reduce noise from mislabeled data in the training set. We contrast linear models, gradient boosted trees (GBTs), and convolutional neural networks (CNNs), and show that GBTs and CNNs yield the highest gains in error reduction. Finally, we show that GBTs applied in a language-agnostic way on a large-scale Japanese e-commerce dataset improve taxonomy categorization performance over the current state of the art, which is based on deep belief network models.
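The abstract reports that combining words from product titles, navigational breadcrumbs, and list prices improves categorization. Below is a minimal sketch of one way such combined input could be assembled for a classifier. The whitespace tokenizer, the `title:`/`crumb:` field prefixes, and the log10 price bucket are all hypothetical choices made for illustration. They are not the authors' feature pipeline, and the helper names (`tokenize`, `price_bucket`, `listing_features`) are invented here.

```python
import math
from collections import Counter
from typing import Optional


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace.

    Assumption: this suits English only; Japanese text would need a
    word segmenter or character n-grams instead.
    """
    return text.lower().split()


def price_bucket(price: Optional[float]) -> str:
    """Map a list price to a coarse order-of-magnitude bucket (hypothetical scheme)."""
    if price is None or price <= 0:
        return "price=missing"  # prices are only sometimes available
    return f"price=1e{math.floor(math.log10(price))}"


def listing_features(title: str,
                     breadcrumbs: Optional[list[str]] = None,
                     price: Optional[float] = None) -> Counter:
    """Combine title words, breadcrumb words, and a price bucket into one
    sparse bag of features.

    Field prefixes keep a word seen in the title distinct from the same
    word seen in a breadcrumb.
    """
    feats: Counter = Counter()
    for tok in tokenize(title):
        feats[f"title:{tok}"] += 1
    for crumb in breadcrumbs or []:
        for tok in tokenize(crumb):
            feats[f"crumb:{tok}"] += 1
    feats[price_bucket(price)] += 1
    return feats


if __name__ == "__main__":
    f = listing_features("Nike Running Shoes", ["Sports", "Running Shoes"], 89.99)
    print(sorted(f.items()))
```

Each returned `Counter` is a plain mapping. A list of them could therefore be vectorized with scikit-learn's `DictVectorizer` and passed to a linear model or to gradient boosted trees. The field prefixes let a classifier weight a word differently depending on whether it came from the title or the breadcrumb.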