Submitting Data

Introduction

The Common Metabolic Diseases Genome Atlas (CMDGA) team is here to help you submit your data to the CMDGA Portal. If you have data ready for submission, please reach out to our data manager at ysun@health.ucsd.edu to start the process. Once notified, the CMDGA team will provide you with an API access key, metadata collection instructions, and tools to facilitate your data submission.
 

Submission Process

1. API Access Key Pairs

When you're ready to submit data to CMDGA, our team will create a CMDGA user account for you. API access key pairs are required to authenticate users for data submission. Please provide an email address associated with a Gmail or GitHub account, and we will ensure you have the appropriate permissions for data submission.

To request the key pairs, log in using the link at the bottom right of the landing page. Once logged in, click on "Profile." On your "User Profile" page, select "Create Access Key." Your Access Key ID and Access Key Secret will appear in a pop-up window. Please note them down, as they will only be displayed once. If lost, new key pairs can be requested.

2. Collecting Metadata for Submission

Providing comprehensive and accurate metadata is crucial for upholding the rigorous standards set by the CMDGA and for enhancing the Portal's value to the scientific community. The current data model encompasses various object types, such as human_donor, biosample, assay, and annotation, each with specific metadata properties that capture the relationships between objects. The CMDGA team will thoroughly review all prepared metadata. Submitted data will be accessible to AMP consortium members or the submitting lab (if requested), but it will not be publicly released until the CMDGA team completes its review and receives approval from the submitting laboratory.

The data model accommodates the submission of objects categorized under the following general groups: samples, donors, file sets, files, ontology terms, and additional categories.

 
CMDGA's data model can be accessed at https://cmdga.org/profiles

Data Submission Tools

Google Sheets (Apps Script)

This tool is a web-based Google spreadsheet with an embedded script to facilitate data submission.

Each object type, also known as a profile, requires its own spreadsheet because it has its own set of metadata properties.

Submission Examples

Let's examine the annotation datatype, identify the necessary properties, determine the property types, and assign example values.

Annotation Example

Required properties must be provided to successfully submit an object record. Optional properties are recommended if available and applicable.
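
As a rough illustration of the required/optional distinction, the sketch below checks a record before submission. The property names used here (lab, award, annotation_type, description, aliases) are hypothetical placeholders and are not taken from the actual annotation profile; the real lists are defined in the CMDGA data model.

```python
# Hypothetical property names for illustration only; the real required and
# optional properties are defined in the CMDGA profiles.
REQUIRED = {"lab", "award", "annotation_type"}
OPTIONAL = {"description", "aliases"}


def check_record(record):
    """Return (missing_required, unknown) property names for one record."""
    # Treat empty strings and None as "not provided".
    provided = {k for k, v in record.items() if v not in ("", None)}
    missing = sorted(REQUIRED - provided)
    unknown = sorted(provided - REQUIRED - OPTIONAL)
    return missing, unknown


record = {"lab": "john-doe", "annotation_type": "", "description": "test"}
missing, unknown = check_record(record)
print("missing required:", missing)
print("unknown properties:", unknown)
```

Running a check like this on each spreadsheet row before submitting may catch empty required cells early, although the Portal's own validation remains authoritative.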

Example of required and optional properties for annotation can be found here:

<embed example sheet here>

Understanding Identifiers and the Importance of the Alias Identifier

For every object submitted to the CMDGA portal, the system automatically generates a unique identifier (UUID). For certain objects, an additional accession is generated, following the format D(SR|FF|DO)[0-9]{4}[A-Z]{4}, where the two-letter code (SR, FF, or DO) refers to the object type. For example, annotations and files will have accessions automatically generated as DSR[0-9]{4}[A-Z]{4} and DFF[0-9]{4}[A-Z]{4}, respectively.
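
The accession format described above can be checked with a short regular expression. This is a minimal sketch, not the Portal's own validation code: the alternation is written as a group, (SR|FF|DO), because square brackets in regular-expression syntax denote a single-character class, and the sample accessions are made up for illustration.

```python
import re

# "D", a two-letter type code, four digits, then four uppercase letters.
ACCESSION_RE = re.compile(r"D(SR|FF|DO)[0-9]{4}[A-Z]{4}")


def accession_type(accession):
    """Return the two-letter type code, or None if the format does not match."""
    match = ACCESSION_RE.fullmatch(accession)
    return match.group(1) if match else None


# Made-up accessions, used only to exercise the pattern.
print(accession_type("DSR0001ABCD"))
print(accession_type("DFF12ABCD"))
```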

IMPORTANT: While accessions and unique identifiers (UUIDs) are automatically generated and can be used to find your object of interest, we strongly encourage the use of the alias property, another form of unique identifier. Aliases are not assigned by the system; they give submitters the opportunity to assign an identifier that makes sense for internal records, such as an identifier from the lab's LIMS.

Aliases should be formatted as follows: ‘[lab name]:[chosen identifier]’ (e.g., with john-doe as the lab name).
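
A small helper can keep aliases consistent with the '[lab name]:[chosen identifier]' convention. The sketch below is only an assumption about how one might build and parse them; the function names and the identifier LIMS-0042 are hypothetical, and the Portal may apply stricter rules on allowed characters.

```python
def make_alias(lab_name, identifier):
    """Join a lab name and a lab-chosen identifier as 'lab:identifier'."""
    lab_name, identifier = lab_name.strip(), identifier.strip()
    if not lab_name or not identifier:
        raise ValueError("both lab name and identifier are required")
    if ":" in lab_name:
        raise ValueError("lab name must not contain ':'")
    return f"{lab_name}:{identifier}"


def split_alias(alias):
    """Split an alias at the first colon into (lab name, identifier)."""
    lab_name, sep, identifier = alias.partition(":")
    if not sep or not lab_name or not identifier:
        raise ValueError(f"not in 'lab:identifier' form: {alias!r}")
    return lab_name, identifier


# Hypothetical identifier, e.g. one taken from a lab's LIMS.
print(make_alias("john-doe", "LIMS-0042"))
```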

Note: These three types of IDs (UUID, accession, and alias) can be used interchangeably to refer to an object in the spreadsheets used for object submission or modification.

Reviewing Submissions

After a successful submission, you can view your object by appending the object type, followed by an identifier of the object (UUID, accession, or alias), to the URL of the server.

Examples: appending an identifier to the URL
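
The URL construction can be sketched as follows. The server address, the "annotations" path segment, and the identifiers are assumptions made for illustration and are not confirmed CMDGA routes; whether the Portal expects the colon in an alias to be percent-encoded is also an assumption.

```python
from urllib.parse import quote

# Assumed server base URL, for illustration only.
SERVER = "https://cmdga.org"


def object_url(object_type, identifier, server=SERVER):
    """Append the object type and an identifier (UUID, accession, or alias)."""
    # Percent-encode the identifier so the colon in an alias is URL-safe.
    return f"{server.rstrip('/')}/{quote(object_type)}/{quote(identifier, safe='')}"


# Hypothetical object type segment, accession, and alias.
print(object_url("annotations", "DSR0001ABCD"))
print(object_url("annotations", "john-doe:LIMS-0042"))
```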

Updating Submitted Objects

If your objects contain metadata errors that need correction, you can easily patch your object property values. The first column header in your spreadsheet should be either accession (for Google Sheets Submitter) or record_id (for cmdga_utils). The properties to be updated should be specified in the subsequent columns.
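
The column layout for a patch can be sketched as below. This only illustrates the layout described above (an identifier column first, then the properties to update); it writes CSV text as a stand-in for a spreadsheet, and the accession, property name, and the `patch_table` helper itself are hypothetical. How either tool treats blank cells is not specified here, so the sketch simply leaves them empty.

```python
import csv
import io


def patch_table(rows, tool="google_sheets"):
    """Build CSV text whose first column identifies the object to patch."""
    # First-column header: "accession" for the Google Sheets submitter,
    # "record_id" for cmdga_utils.
    id_header = {"google_sheets": "accession", "cmdga_utils": "record_id"}[tool]
    # Remaining columns: every property being updated, in first-seen order.
    props = []
    for _, updates in rows:
        for name in updates:
            if name not in props:
                props.append(name)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([id_header] + props)
    for identifier, updates in rows:
        writer.writerow([identifier] + [updates.get(p, "") for p in props])
    return out.getvalue()


# Hypothetical accession and property name.
print(patch_table([("DSR0001ABCD", {"description": "corrected text"})]))
```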

Order of Submission Matters

IMPORTANT: The order of submission by object type matters! Objects can be related or linked to each other, and creating these relationships depends on submitting them in the proper order. For example, a biosample object relates to a specific donor object, which must be specified by a unique identifier. The donor must therefore be submitted first; otherwise, you will not be able to reference it when submitting the biosample, which causes an error if the donor property is required.

Current Order of submission <link>
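
One way to reason about submission order is as a topological sort over the links between object types. The sketch below uses Python's standard graphlib module (Python 3.9+). Only the biosample-to-donor dependency comes from the text above; the other edges are illustrative guesses, and the authoritative order is the one published by the CMDGA team.

```python
from graphlib import TopologicalSorter

# Map each object type to the types it links to (its prerequisites).
DEPENDS_ON = {
    "human_donor": set(),
    "biosample": {"human_donor"},   # stated in the text above
    "assay": {"biosample"},         # assumption, for illustration
    "annotation": {"assay"},        # assumption, for illustration
}


def submission_order(depends_on):
    """Return object types ordered so prerequisites always come first."""
    # static_order() raises graphlib.CycleError if the links form a cycle.
    return list(TopologicalSorter(depends_on).static_order())


print(submission_order(DEPENDS_ON))
```

Submitting object types in the returned order ensures that every referenced object already exists by the time another object links to it.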