
Basic article information

  • Title: A merged lung cancer transcriptome dataset for clinical predictive modeling
  • Authors: Su Bin Lim; Swee Jin Tan; Wan-Teck Lim
  • Journal: Scientific Data
  • Electronic ISSN: 2052-4463
  • Publication year: 2018
  • Volume: 5
  • DOI: 10.1038/sdata.2018.136
  • Language: English
  • Publisher: Nature Publishing Group
  • Abstract: The Gene Expression Omnibus (GEO) database is an excellent public source of whole transcriptomic profiles of multiple cancers. The main challenge is the limited accessibility of such large-scale genomic data to people without a background in bioinformatics or computer science. This presents difficulties in data analysis, sharing and visualization. Here, we present an integrated bioinformatics pipeline and a normalized dataset that has been preprocessed using a robust statistical methodology, allowing others to perform large-scale meta-analysis without having to conduct time-consuming data mining and statistical correction. Comprising 1,118 patient-derived samples, the normalized dataset includes primary non-small cell lung cancer (NSCLC) tumors and paired normal lung tissues from ten independent GEO datasets, facilitating differential expression analysis. The data have been merged, normalized, batch effect-corrected and filtered for genes with low variance via multiple open source R packages integrated into our workflow. Overall, this dataset (with associated clinical metadata) better represents the diseased population and serves as a powerful tool for early predictive biomarker discovery.