Getting Started
The Common Metabolic Diseases Genome Atlas (CMDGA) provides epigenomic and other functional genomic data to promote understanding of the genetic basis of common metabolic diseases. CMDGA is developed at the University of California San Diego as part of the AMP®-CMD consortium of academic, industry, and non-profit institutions worldwide.
This document describes the data available in the portal and how to access and download them.
For questions about Common Metabolic Diseases Genome Atlas data, please email the team.
Information available on the portal
The portal contains functional genomics data from tissues and cells relevant to common metabolic diseases, including:
- Metadata, raw and processed data files for experimental assays
- Metadata and data files for annotations
- Metadata and data files for single cell embeddings
- Metadata and data files for gene perturbation datasets
- Metadata and data files for statistical models
- Detailed pipelines and software used to process datasets
The information and data in the portal can be accessed through:
- The Search Tools on the homepage and the portal menus, for browsing experimental assays, annotations, and raw and processed data files
- The Single Cell Browser, which uses the cellxgene software to display cell embeddings in two dimensions based on their gene expression or chromatin accessibility profiles
Explore the Portal and Tools
Datasets can be explored with the data-specific Search Tools on the homepage or viewed as a user-friendly matrix accessible through the Data menu.
Data types:
Experiments: Molecular assays performed by a research group. Experiments can be filtered by the following categories in the left sidebar of the experiments page: assay category, assay, experiment status, target of assay, lab, and replication type.
Annotations: Analytical distillations of experimental data generated by software tools. Annotations can be filtered by annotation type, underlying assay, biosample, and lab.
Single cell embeddings: Genomic profiles of individual cells, derived from multiple experiments, used to visualize single cells in a reduced-dimension space and to facilitate analysis of single cell profiles in common software packages. The h5ad files contain a cell-by-feature (e.g., gene, peak) matrix, metadata such as cluster labels for each cell barcode, and the embeddings (e.g., t-SNE, UMAP, PCA). Embeddings can be filtered by embedding type, biosample, lab, and underlying assay.
Gene perturbations: Manipulations of genes in cell or animal models, for example using siRNA/shRNA, CRISPR editing, or expression vectors. Gene perturbation studies in CMDGA can be filtered by type, assay throughput, target of assay, and biosample.
Statistical models: Statistical and machine learning models (for example, models built with PyTorch) that can be applied to functional genomics data.
Single Cell Browser
The primary visualization is a scatterplot of cells embedded in two dimensions based on their gene expression values or spatial coordinates. The visualization can be filtered by metadata, expression, and embedding. Features such as cross-filtering, subsetting, coloring, computing differential expression (e.g., t-tests), and computationally re-embedding subpopulations of the data enable identification of relationships between subpopulations of cells, gene expression, and cell metadata.
Coloring the scatterplot by a variable automatically renders its conditional distribution across each other variable, such as the contribution of each donor to different cell types or case vs. control status. The plots in the right panel are histograms of the number of cells with a given log fold change for the selected or searched gene.
The UMAP can be colored by gene expression values to identify cell subpopulations based on “marker genes”. When marker genes are not known a priori, cellxgene’s interactive computation capabilities can be used to calculate differential expression (e.g., t-tests) between any arbitrary sets of cells (e.g., “how do donors differ?” or “how is cluster 1 different from another subset of the data?”). This process of data partitioning, cross-characterization, and inference is often repeated many times over different facets of the data to gain insights into single cell biology.
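The differential expression comparison described for the Single Cell Browser can be illustrated with a generic Welch's (unequal-variance) t-test between two arbitrary groups of cells for one gene. This is a minimal standard-library sketch, not cellxgene's own implementation; the expression values are hypothetical, and the log fold change assumes values that are not already log-transformed, with a pseudocount of 1 (both are assumptions).

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    group_a, group_b: expression values of one gene in two sets of cells.
    Each set needs at least two cells, and the variances must not both be zero.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean (sample variance uses n - 1).
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))
    return t, df


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean expression; assumes values are NOT already log-transformed."""
    return math.log2((mean(group_a) + pseudocount) / (mean(group_b) + pseudocount))


# Hypothetical expression of one gene in two user-selected groups of cells.
cluster_1 = [4.0, 5.0, 6.0, 5.0]
other_cells = [1.0, 2.0, 1.0, 2.0]

t, df = welch_t(cluster_1, other_cells)
print(round(t, 3), round(df, 3))                           # 7.0 5.4
print(round(log2_fold_change(cluster_1, other_cells), 3))  # 1.263
```

A p-value follows from comparing `t` with a t distribution with `df` degrees of freedom; `scipy.stats.ttest_ind(group_a, group_b, equal_var=False)` performs this Welch test directly.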
Other resources
Diabetes Epigenome Atlas (DGA) and T2D Knowledge Portal
View the video of the September 26, 2019 webinar, Diabetes Epigenome Atlas (DGA) and T2D Knowledge Portal: connected and complementary resources for research on the genomic and genetic basis of T2D.
Ontologies
The following ontologies are used for metadata: