Pan-Cancer analysis of thousands of tumor genomes to identify cancer drivers
Identifying the complete list of genes involved in cancer development has been a major objective of cancer researchers for more than 30 years, as this is a first step towards the development of therapies that effectively and selectively target cancer genes to specifically kill tumour cells. In recent years, systematic approaches to the quest for cancer genes have been undertaken. These involve sampling cancer genomes and sequencing most coding exons or the whole genome. Thousands of tumour genomes are being sequenced worldwide. Most have been generated as part of large projects and consortia, such as the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). When a tumour genome is sequenced, hundreds or even thousands of somatic mutations are detected; therefore, identifying which of those are involved in driving the tumourigenic process is a major challenge. With hundreds of sequenced tumour genomes available for each cancer type, this problem can be approached by identifying signals of positive selection in the pattern of mutations observed per gene across tumours. We have developed a computational approach to detect driver genes by combining multiple signals of positive selection and have applied it to 3,205 tumours from 12 different cancer types from the TCGA Pan-Cancer project (Weinstein et al., Nature Genetics 2013). We have identified 291 high-confidence cancer driver genes. Among those genes, some have not been previously identified as cancer drivers and 16 show a clear preference for sustaining mutations in one specific tumour type. The novel driver candidates complement our current picture of the emergence of these diseases (Tamborero et al., Scientific Reports 2013). In addition, we have described IntOGen-mutations, a novel web platform for the interpretation of cancer genomes, which analyses not only TCGA Pan-Cancer data but also additional datasets generated by other initiatives, such as those included within the ICGC.
The resource allows users to identify driver mutations, genes and pathways acting on thousands of tumours from different cancer sites, and to analyse newly sequenced tumour genomes and identify relevant mutations by placing them in the context of this accumulated knowledge (Gonzalez-Perez et al., Nature Methods 2013).
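
To make the idea of a per-gene signal of positive selection more concrete, the following minimal Python sketch tests one such signal, mutation recurrence across tumours, against a uniform background. It is an illustration only and not the method used in the work summarised above, which combines multiple signals. The gene names, counts, the single shared background rate, and the function names are all hypothetical assumptions.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def recurrence_pvalues(mutated_counts, n_tumours, background_rate):
    """Map each gene to the probability of seeing at least its observed
    number of mutated tumours if mutations were neutral.

    Assumption: every gene shares one background probability of being
    mutated in a tumour. Real analyses would model gene length, sequence
    context, and other covariates instead.
    """
    return {
        gene: binomial_tail(k, n_tumours, background_rate)
        for gene, k in mutated_counts.items()
    }


# Toy, made-up counts: number of tumours (out of 100) carrying a mutation.
counts = {"GENE_A": 30, "GENE_B": 2}
pvals = recurrence_pvalues(counts, n_tumours=100, background_rate=0.02)

for gene, p in sorted(pvals.items(), key=lambda item: item[1]):
    print(f"{gene}\t{p:.3g}")
```

Under these toy assumptions, a gene mutated in far more tumours than the background rate predicts receives a very small p-value, while a gene mutated about as often as expected does not. In a study covering many genes, such p-values would also need correction for multiple testing.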