The Common Metabolic Diseases Genome Atlas (CMDGA) provides epigenomic and other functional genomic data to promote understanding of the underlying genetic basis of common metabolic diseases. CMDGA is developed at the University of California San Diego as part of the AMP-CMD consortium of academic, industry and non-profit institutions worldwide.
This document describes the data available in the portal and how to access and download them.
For questions about Common Metabolic Diseases Genome Atlas data, please email the team.
Information available on the portal
The portal consists of functional genomics data from tissues and cells relevant to common metabolic diseases, including:
- Metadata, raw and processed data files for experimental assays
- Metadata and data files for annotations
- Metadata and data files for single cell embeddings
- Metadata and data files for gene perturbation datasets
- Metadata and data files for statistical models
- Detailed pipelines and software used to process datasets
The information and data in the portal can be accessed by:
- The Search Tools on the Homepage and menus of the portal for browsing experimental assays, annotations, and raw and processed data files
- The Variant Search tool allows querying of genetic variants, genomic regions, and genes for overlap with experimental files
- The Single cell browser displays cell embeddings in two dimensions based on their gene expression or accessible chromatin profiles, using the cellxgene software
- The Gene expression profiling application displays expression profiles across cell types or other dimensions derived from single cell expression assays
Explore the Portal and Tools
Datasets can be explored with data-specific Search Tools on the homepage or viewed as a user-friendly matrix that is accessible through the Data menu.
Experiments: molecular assays generated by a research group. Experiments can be filtered by the following categories on the left sidebar of the experiments page: assay category, assay, experiment status, target of assay, lab, and replication type.
Annotations: analytical distillations of experimental data generated by software tools. Annotations can be filtered by annotation type, underlying assay, biosample and lab.
Single cell embeddings: genomic profiles of individual cells derived from multiple experiments used to visualize single cells in reduced dimension space and facilitate analyses of single cell profiles in common software packages. The h5ad files are composed of a cell by feature (e.g. gene, peak) matrix, metadata such as cluster labels for barcodes, and the embeddings (e.g., tSNE, UMAP, PCA). Embeddings can be filtered by embedding type, biosample, lab, and underlying assay.
Gene perturbations: manipulations of genes in cell or animal models, for example using siRNA/shRNA, CRISPR editing, or expression vectors. Gene perturbation studies in CMDGA can be filtered by type, assay throughput, target of assay, and biosample.
Statistical models: models, for example machine learning models built with PyTorch, that can be used to model functional genomics data.
Annotate Genetic Variants
The Variant Search takes an rsID or a single-base-pair region as input.
The variant search returns:
- Peaks that intersect the query, with their coordinates, state, and value
- Files (BED) that created those peaks
- Annotations, with facets, that those files belong to
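The overlap test behind such a query can be sketched in a few lines. This is a minimal illustration and not the portal's implementation: the `Peak` type, the `peaks_overlapping` helper, and the example peaks are all hypothetical, and the sketch assumes peaks follow the BED convention (0-based start, exclusive end) while the queried position is 1-based.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    """One peak row: BED-style coordinates plus a state label and a value."""
    chrom: str
    start: int   # 0-based, inclusive (BED convention)
    end: int     # 0-based, exclusive (BED convention)
    state: str
    value: float


def peaks_overlapping(peaks, chrom, pos):
    """Return the peaks that contain the 1-based position `pos` on `chrom`."""
    zero_based = pos - 1  # convert the 1-based query to BED coordinates
    return [p for p in peaks
            if p.chrom == chrom and p.start <= zero_based < p.end]


# Hypothetical peaks for illustration only, not real portal data.
peaks = [
    Peak("chr1", 100, 200, "active_enhancer", 5.2),
    Peak("chr1", 150, 300, "accessible_chromatin", 3.1),
    Peak("chr2", 100, 200, "active_promoter", 7.8),
]

hits = peaks_overlapping(peaks, "chr1", 180)
for p in hits:
    print(p.chrom, p.start, p.end, p.state, p.value)
```

A linear scan is enough for a handful of peaks; a query over many files would more plausibly rely on an indexed structure such as an interval tree.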
Variant search results can be filtered based on annotation type, biosample or file type and are downloadable as a TSV file via the "Download Element" button.
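A downloaded TSV file can also be filtered offline with the Python standard library. The sketch below is illustrative only: the column names (`chrom`, `start`, `end`, `annotation_type`, `biosample`) and the sample rows are assumptions, since the actual layout of the downloaded file is not described here, and `filter_rows` is a hypothetical helper.

```python
import csv
import io


def filter_rows(tsv_text, column, wanted):
    """Return the rows of a tab-separated table whose `column` equals `wanted`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row[column] == wanted]


# Hypothetical file content; the real download may use different columns.
sample_tsv = (
    "chrom\tstart\tend\tannotation_type\tbiosample\n"
    "chr1\t100\t200\tchromatin state\tpancreatic islet\n"
    "chr1\t150\t300\taccessible chromatin\tliver\n"
)

liver_rows = filter_rows(sample_tsv, "biosample", "liver")
for row in liver_rows:
    print(row["chrom"], row["start"], row["end"], row["annotation_type"])
```

For a real file, the text would come from `open(path, newline="")` instead of the inline string.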
Single Cell Browser
The primary visualization is a scatterplot of cells embedded in two dimensions based on their gene expression values or spatial coordinates. The visualization can be filtered by metadata, expression, and embedding. Features such as cross-filtering, subsetting, coloring, computing differential expression (e.g., t-tests), and computationally re-embedding subpopulations of the data enable identification of relationships between subpopulations of cells, gene expression, and cell metadata.
Painting the scatterplot by a variable automatically renders its conditional distribution across every other variable, such as donor contribution to different cell types or case versus control. The plots in the right panel are histograms of the number of cells that have a specific log fold change for the selected or searched gene.
The UMAP can be colored by gene expression values to identify cell subpopulations based on “marker genes”. When marker genes are not known a priori, cellxgene’s interactive computation capabilities can be used to calculate differential expression (e.g., t-tests) between any arbitrary sets of cells (e.g., “how do donors differ?” or “how is cluster 1 different from another subset of the data?”). This process of data partitioning, cross-characterization, and inference is often repeated many times over different facets of the data to gain insights into single cell biology.
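To make the differential expression step concrete, the sketch below computes a t statistic for one gene between two sets of cells. It uses Welch's form of the t-test as one common choice; the text above only says "t-tests", so the exact variant is an assumption. The `welch_t` helper and the expression values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    # Standard error of the difference in means, using sample variances.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical log-normalized expression of one gene in two sets of cells.
cluster_1 = [2.1, 2.5, 1.9, 2.8, 2.4]
other_cells = [0.4, 0.9, 0.2, 0.7, 0.5]

t_stat = welch_t(cluster_1, other_cells)
mean_diff = mean(cluster_1) - mean(other_cells)
print(f"t = {t_stat:.2f}, difference in means = {mean_diff:.2f}")
```

A large positive statistic suggests the gene is more highly expressed in the first set. Repeating the test across all genes and ranking by the statistic is one way to find candidate marker genes.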
Gene Expression Browser
The dot plot is an intuitive way of visualizing how feature expression changes across different identity classes (cell or tissue clusters). The size of the dot encodes the percentage of cells within a class that express the feature, while the color encodes the average expression level across all cells within a class. When TPM values are available, both the size and the color of the dot encode the TPM value. The user can select multiple marker genes and clusters (cell or tissue) from the left-hand panel menu. The color scheme can be changed to a preferred one for publication-quality images, which can be downloaded as SVG files. In addition, the corresponding data can be viewed in tabular format and downloaded as a CSV file.
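The two quantities a dot encodes can be computed per class as follows. This is a minimal sketch and not the application's code: it assumes that the percentage refers to cells with expression above zero, and the `dotplot_stats` helper, the class labels, and the values are hypothetical.

```python
from collections import defaultdict


def dotplot_stats(expression, labels):
    """Per class, return (percent of cells with expression > 0, mean expression)."""
    by_class = defaultdict(list)
    for value, label in zip(expression, labels):
        by_class[label].append(value)

    stats = {}
    for label, values in by_class.items():
        pct_expressing = 100.0 * sum(v > 0 for v in values) / len(values)  # dot size
        avg_expression = sum(values) / len(values)                         # dot color
        stats[label] = (pct_expressing, avg_expression)
    return stats


# Hypothetical expression values of one gene in eight cells.
expression = [0.0, 2.0, 4.0, 0.0, 0.0, 0.0, 0.0, 4.0]
labels = ["beta", "beta", "beta", "beta", "alpha", "alpha", "alpha", "alpha"]

stats = dotplot_stats(expression, labels)
for label, (pct, avg) in stats.items():
    print(f"{label}: {pct:.0f}% of cells expressing, mean expression {avg:.2f}")
```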
In the heatmap, marker genes are displayed as rows and cells or tissues as columns. By default, the heatmap displays the most highly expressed genes for a particular cell type and the corresponding expression of those genes in the remaining cell types. The left-hand panel has a menu to select cell types and a slider bar to set the number of top expressed genes to display. The heatmap can be scaled and clustered by either genes or cell types. In addition, the corresponding data can be viewed in tabular format and downloaded as a CSV file.
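The default selection described above (the top genes for one cell type, shown across all cell types) can be sketched as follows. The helpers `top_genes` and `heatmap_rows`, the nested-dictionary layout, and the expression values are all hypothetical and serve only to illustrate the idea.

```python
def top_genes(matrix, cell_type, n):
    """Return the names of the n genes with the highest expression in `cell_type`."""
    ranked = sorted(matrix, key=lambda gene: matrix[gene][cell_type], reverse=True)
    return ranked[:n]


def heatmap_rows(matrix, cell_type, n):
    """Return (gene, expression across all cell types) for the top-n genes."""
    return [(gene, matrix[gene]) for gene in top_genes(matrix, cell_type, n)]


# Hypothetical mean expression per gene (rows) and cell type (columns).
matrix = {
    "INS": {"beta": 9.5, "alpha": 0.3, "delta": 0.2},
    "GCG": {"beta": 0.4, "alpha": 9.1, "delta": 0.5},
    "SST": {"beta": 0.2, "alpha": 0.3, "delta": 8.7},
    "MAFA": {"beta": 4.2, "alpha": 0.1, "delta": 0.3},
}

rows = heatmap_rows(matrix, "beta", 2)
for gene, values in rows:
    print(gene, values)
```

Here the value of `n` plays the role of the slider bar: raising it adds more rows to the heatmap.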
Diabetes Epigenome Atlas (DGA) and T2D Knowledge Portal
View the video from the September 26, 2019 webinar: Diabetes Epigenome Atlas (DGA) and T2D Knowledge Portal: connected and complementary resources for research on the genomic and genetic basis of T2D.
The following ontologies are used for metadata: