SRA uploads to NCBI
This post details the NCBI Sequence Read Archive (SRA) upload for J. Ashey’s sediment stress project. This protocol is based on A. Huffmyer’s SRA upload post and the Putnam Lab SRA upload protocol.
Overview
These sequences are from a project assessing sedimentation stress in Caribbean and Pacific corals. The project’s GitHub repository can be found here.
Pacific corals (Montipora capitata, Pocillopora acuta, and Porites lobata) from Kāneʻohe Bay, Oʻahu, Hawaiʻi, were exposed to unsterilized terrigenous red soil for up to 7 days. Caribbean corals (Acropora cervicornis, Montastraea cavernosa, and Orbicella faveolata) from Key West, Florida, were exposed to sterilized white carbonate sediment for 18 days. After each experiment, fragments from each species were frozen and stored at -80°C until extraction.
RNA extractions were done with the Direct-zol™ RNA MiniPrep (Zymo Research; Cat# R2070) kit. For library prep, samples were diluted and processed following the TruSeq stranded mRNA Library Prep for NeoPrep kit (Document # 15049725 v03, Illumina) protocol. Quality controlled libraries were sequenced on HiSeq 50 cycle single read sequencing v4 by the High Throughput Genomics Core Facility at the University of Utah.
1. BioProject
I created a new BioProject submission on the NCBI Submission Portal.
- Provided all my info as the submitter
- Project type
    - Project data type is raw sequence reads
    - Sample scope is multispecies
- Target
    - Not putting organism name because I am submitting reads for multiple species
    - Under multispecies description, add species names from the study (Montipora capitata, Pocillopora acuta, Porites lobata, Acropora cervicornis, Montestraea cavernosa, and Orbicella faveolata)
- General info
    - Release date set as Dec. 20, 2022 to allow time for edits
    - Project title is “Characterizing transcriptomic responses to sedimentation across location and morphology in reef-building corals”
    - Project description is “Gene expression in response to sedimentation across location (Florida, Hawai’i) and morphology (branching, intermediate, massive). Data includes RNAseq (gene expression) sequences from 6 reef-building coral species (Montipora capitata, Pocillopora acuta, Porites lobata, Acropora cervicornis, Montestraea cavernosa, and Orbicella faveolata)”
    - Project is not part of a larger initiative already registered with NCBI
    - Grants associated with this project:
        - #1939795 and #1939263 Harnessing the Data Revolution; National Science Foundation
Submitted to NCBI at 2:18 PM on 20221213. Submission ID is SUB12414011; BioProject ID is PRJNA911752.
2. BioSamples
I used the Invertebrate attribute table because I’m uploading adult coral samples. I deleted some of the columns that I wasn’t using.
When I tried to submit the Invertebrate attribute file, I kept getting errors. This is an example of the rows in my attribute table:
*sample_name | sample_title | bioproject_accession | *organism | isolate | breed | host | isolation_source | *collection_date | *geo_loc_name | *tissue | collected_by | dev_stage | env_broad_scale | host_tissue_sampled | identified_by | lat_lon | description | Treatment | SequencingID | SedimentType | AnalyzedBy | Permit No. | Grants |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
17_ctl2_Of_ZTH | JA1 | PRJNA911752 | Obicella faveolata | Coral host | Not applicable | Not applicable | Not applicable | 9-Jun-16 | Florida Keys; USA: Florida: Fort Pierce | Whole organism | Francois Seneca | Adult | coral reef [ENVO:00000150] | Whole host organism | Ellis and Solander | 27.460350N, -80.311073 | Adult life stage sample of Orbicella faveolata collected from Key West National Marine Sanctuary, Florida coral reef environment. Individuals were transported to Fort Pierce, Florida, for exposure to sterilized white carbonate sediment for 18 days. Samples were preserved in liquid nitrogen and stored at -80°C until processing. | Control | 14004RX1 | Sterilized coral rubble sediment | Jill Ashey | #FKNMS-2016-017 | NSF HDR Awards #1939795 and #1939263 |
18_T3.3_Of_VLL | JA2 | PRJNA911752 | Obicella faveolata | Coral host | Not applicable | Not applicable | Not applicable | 9-Jun-16 | Florida Keys; USA: Florida: Fort Pierce | Whole organism | Francois Seneca | Adult | coral reef [ENVO:00000150] | Whole host organism | Ellis and Solander | 27.460350N, -80.311073 | Adult life stage sample of Orbicella faveolata collected from Key West National Marine Sanctuary, Florida coral reef environment. Individuals were transported to Fort Pierce, Florida, for exposure to sterilized white carbonate sediment for 18 days. Samples were preserved in liquid nitrogen and stored at -80°C until processing. | Treatment 3 (300 mg/L) | 14005RX1 | Sterilized coral rubble sediment | Jill Ashey | #FKNMS-2016-017 | NSF HDR Awards #1939795 and #1939263 |
I kept the green fields (mandatory), filled out some blue fields or put “Not applicable”, and filled in some of the yellow (optional) fields. I deleted the yellow columns that I wasn’t using. I also added some of my own attribute columns that are specific to my experiment: Treatment, SequencingID, SedimentType, AnalyzedBy, Permit No., and Grants.
My attribute file is here. When I try to upload it to the Attributes page on the BioSample submission page, it gives me this error:
“Your table upload failed because multiple BioSamples cannot have identical attributes. You should have one BioSample for each specimen, and each of your BioSamples must have differentiating information (excluding sample name, title, bioproject accession and description). This check was implemented to encourage submitters to include distinguishing information in their samples. If the distinguishing information is in the sample name, title or description, please recode it into an appropriate attribute, either one of the predefined attributes or a custom attribute you define. If it is necessary to represent true biological replicates as separate BioSamples, you might add an ‘aliquot’ or ‘replicate’ attribute, e.g., ‘replicate = biological replicate 1’, as appropriate. Note that multiple assay types, e.g., RNA-seq and ChIP-seq data may reference the same BioSample if appropriate.” And then it gives me a list of sample names.
I wasn’t sure what this meant. I deleted the unused columns, tried relabeling the sample names, added unique letters to the end of each sample name, etc. This was the confusing part: “You should have one BioSample for each specimen, and each of your BioSamples must have differentiating information (excluding sample name, title, bioproject accession and description)”.
20230106 I seem to have fixed this problem while on Zoom with Ariana in December. The issue was mostly formatting, as Excel sometimes applies its own formatting to the data. I had to redo some of the formatting (dates, locations) so that the file could be uploaded to NCBI properly. This link is a good overview of the possible BioSample Attributes and how to format them in the Excel sheet.
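The duplicate-attribute rule quoted in the error message can be approximated locally before uploading. The sketch below is only a rough stand-in for that rule, not NCBI's validator. It assumes the attribute sheet has been exported as a tab-delimited text file with a single header row (the real template may have comment lines above the header that would need removing first), and the function name `find_duplicate_biosamples` is hypothetical. It ignores the four columns the message says are excluded (sample name, title, BioProject accession, description) and prints the sample names of rows whose remaining attributes are identical.

```shell
# find_duplicate_biosamples FILE
# Assumption: FILE is a tab-delimited export with one header row.
# Prints sample names of rows that cannot be told apart once the
# excluded columns are ignored. Prints nothing if all rows differ.
find_duplicate_biosamples() {
    awk -F '\t' '
        { sub(/\r$/, "") }                 # tolerate Windows line endings
        NF == 0 { next }                   # skip blank lines
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                name = $i
                sub(/^\*/, "", name)       # drop the "*" on mandatory fields
                if (name == "sample_name") name_col = i
                if (name == "sample_name" || name == "sample_title" ||
                    name == "bioproject_accession" || name == "description")
                    skip[i] = 1
            }
            next
        }
        {
            # Build a key from every column that is not excluded.
            key = ""
            for (i = 1; i <= NF; i++)
                if (!(i in skip)) key = key "\t" $i
            if (key in first) {
                # Report the first row of the group once, then this row.
                if (!(key in reported)) { print first[key]; reported[key] = 1 }
                print $name_col
            } else {
                first[key] = $name_col
            }
        }
    ' "$1"
}
```

Running `find_duplicate_biosamples attributes.tsv` (file name hypothetical) on the export lists any samples that would need a differentiating attribute, such as a replicate or identifier column.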
This is an updated example of the rows in my attribute table:
*sample_name | sample_title | bioproject_accession | *organism | isolate | breed | host | isolation_source | *collection_date | *geo_loc_name | *tissue | collected_by | dev_stage | env_broad_scale | host_tissue_sampled | identified_by | lat_lon | description | Treatment | SequencingID | medium | AnalyzedBy | Permit No. | Grants | Identifier |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
17_ctl2_Of_ZTH | JA1 | PRJNA911752 | Obicella faveolata | Coral host | Not applicable | Not applicable | Not applicable | 9-Jun-16 | USA: Florida Keys Fort Pierce Florida | Whole organism | Francois Seneca | Adult | coral reef [ENVO:00000150] | Whole host organism | Ellis and Solander | 27.460350 N, -80.311073 W | Adult life stage sample of Orbicella faveolata collected from Key West National Marine Sanctuary, Florida coral reef environment. Individuals were transported to Fort Pierce, Florida, for exposure to sterilized white carbonate sediment for 18 days. Samples were preserved in liquid nitrogen and stored at -80°C until processing. | Control | 14004RX1 | Sterilized coral rubble sediment | Jill Ashey | #FKNMS-2016-017 | NSF HDR Awards #1939795 and #1939263 | 17_ctl2_Of_ZTH |
18_T3.3_Of_VLL | JA2 | PRJNA911752 | Obicella faveolata | Coral host | Not applicable | Not applicable | Not applicable | 9-Jun-16 | USA: Florida Keys Fort Pierce Florida | Whole organism | Francois Seneca | Adult | coral reef [ENVO:00000150] | Whole host organism | Ellis and Solander | 27.460350 N, -80.311073 W | Adult life stage sample of Orbicella faveolata collected from Key West National Marine Sanctuary, Florida coral reef environment. Individuals were transported to Fort Pierce, Florida, for exposure to sterilized white carbonate sediment for 18 days. Samples were preserved in liquid nitrogen and stored at -80°C until processing. | Treatment 3 (300 mg/L) | 14005RX1 | Sterilized coral rubble sediment | Jill Ashey | #FKNMS-2016-017 | NSF HDR Awards #1939795 and #1939263 | 18_T3.3_Of_VLL |
Submitted BioSamples at 12:05 on 1/6/23 under submission number SUB12414115.
The BioSamples were approved under the following numbers:
3. SRA - RNASeq gene expression
First, set up a folder on Andromeda that contains symlinks to only the raw sequence files that we want to upload to NCBI.
cd /data/putnamlab/jillashey/Francois_data
mkdir raw_file_rnaseq_sra
cd raw_file_rnaseq_sra
# Symlink FL samples
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/17_ctl2_Of_ZTH_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/18_T33_Of_VLL.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/19_T33_Ac_WK.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/20_T12_Mc_PWC.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/21_T33_Mc_EOU.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/22_ctl2_Mc_TWF_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/23_ctl1_Of_CTX_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/24_T12_Ac_FM.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/25_ctl1_Ac_GF_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/26_T12_Of_WCL.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/27_ctl2_Ac_YG_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/28_ctl1_Mc_GBM_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/29_T23_Mc_PND.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/30_T23_Of_RPG.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/31_T22_Ac_UV.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/32_T22_Of_EVR.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/33_T43_Mc_RFV.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/34_T22_Mc_SVS.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/35_T43_Ac_MT.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/36_T43_Of_JJN.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/37_T13_Ac_ML.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/38_T23_Ac_IN.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/39_T13_Mc_FJE.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/40_T13_Of_GWS.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/41_ctl3_Ac_RN_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/42_ctl3_Mc_MGR_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/43_ctl3_Of_JVP_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/44_T41_Of_PVT_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/45_T41_Ac_SC_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/46_T41_Mc_QYH_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/47_T31_Ac_JB.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/48_T31_Of_JNO.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/49_T31_Mc_SWQ.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/50_T21_Of_YZB.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/51_T42_Of_UOF.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/52_T11_Ac_II.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/53_T21_Ac_NH.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/54_T42_Ac_JQ.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/55_T32_Mc_TWP.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/56_T42_Mc_JAW.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/57_T32_Ac_NM.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/58_T21_Mc_EAH.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/59_T11_Of_TQP.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/60_T32_Of_WXY.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Florida/data/raw/61_T11_Mc_RAP.fastq .
# Symlink HI samples
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/1_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/2_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/4_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/11_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/28_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/35_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/36_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/38_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/39_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/41_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/42_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/47_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/6_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/7_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/8_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/9_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/21_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/22_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/23_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/25_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/26_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/27_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/29_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/34_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/5_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/10_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/13_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/14_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/15_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/16_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/19_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/20_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/40_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/45_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/48_2.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/3_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/12_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/17_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/18_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/24_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/30_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/32_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/33_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/37_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/43_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/44_1.fastq .
ln -s /data/putnamlab/jillashey/Francois_data/Hawaii/data/raw/46_1.fastq .
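With this many hand-written `ln -s` lines, a mistyped path produces a dangling link that only shows up when the upload fails. As a quick sanity check, the sketch below lists any symlinks whose target does not exist and counts the `.fastq` links in a folder. The function names are hypothetical, and `-maxdepth` is assumed to be available (it is in GNU and BSD `find`, but is not part of POSIX).

```shell
# list_broken_links DIR
# Print every symlink directly inside DIR whose target is missing.
list_broken_links() {
    # "test -e" follows the link, so it fails for dangling links.
    find "$1" -maxdepth 1 -type l ! -exec test -e {} \; -print
}

# count_fastq_links DIR
# Print the number of symlinks directly inside DIR that end in .fastq.
count_fastq_links() {
    find "$1" -maxdepth 1 -type l -name '*.fastq' | wc -l | tr -d ' '
}
```

From inside `raw_file_rnaseq_sra`, `list_broken_links .` should print nothing, and `count_fastq_links .` should match the number of `ln -s` commands above (92 by my count: 45 Florida and 47 Hawaiʻi files).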
The metadata information for the RNASeq sequences can be found here.
The path for downloading is /data/putnamlab/jillashey/Francois_data/raw_file_rnaseq_sra.
To upload files, log on to Andromeda and enter the following:
cd /data/putnamlab/jillashey/Francois_data/raw_file_rnaseq_sra/
ftp -i
open ftp-private.ncbi.nlm.nih.gov
# enter name and password given on SRA webpage
cd uploads/jillashey_uri.edu_tLoKCBDA
mkdir sedstress_upload_rnaseq
cd sedstress_upload_rnaseq
mput *
The upload to SRA proceeds one file at a time, printing a “transfer complete” message as each file finishes. Keep the computer active until all uploads are finished.
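Because the transfer takes a while and can be interrupted, it may help to record the expected file names and sizes before starting and compare them with whatever listing the FTP server shows afterwards (for example, `ls` inside the ftp session). This is a minimal sketch rather than part of the original protocol, and the function name `write_manifest` is hypothetical. It follows symlinks, so the sizes are those of the real FASTQ files.

```shell
# write_manifest DIR
# Print "size_in_bytes<TAB>file_name" for each .fastq file in DIR.
# Symlinks are followed; dangling links are skipped.
write_manifest() {
    for f in "$1"/*.fastq; do
        [ -e "$f" ] || continue      # skip broken links or an unmatched glob
        printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$(basename "$f")"
    done
}
```

For example, `write_manifest . > upload_manifest.tsv` (output name hypothetical), run in the upload folder before `mput *`, gives a list to check off once every file reports “transfer complete”.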
Continue with the submission by selecting the preload folder on SRA.
RNAseq sequence files were submitted under SUB12548346.