OmicSoft Server on the Cloud Tutorial
Quick tutorial on analyzing NGS data with OmicSoft Server on the Cloud
With OmicSoft Server plus the Server on the Cloud add-on, you can execute parallel NGS analyses in your AWS Cloud environment. Data stored in your S3 buckets are available for analysis, just like any other data on your server.
This add-on feature requires only a one-time configuration by your OmicSoft Server administrator, described in OmicSoft Server Configuration with Cloud. The feature works with OmicSoft Server installations on physical machines or Cloud-based virtual machines, as long as the machines can reach AWS services.
OmicSoft Server on the Cloud basics
The basic logic for OmicSoft Server on the Cloud to trigger cloud-based NGS analysis is:
- Input is S3-based data within a mapped S3 folder
- Output is a mapped S3 folder
- OmicSoft Server creates temporary cloud compute “EC2” instances in your AWS environment, transfers input data from S3 to EC2, performs analysis, and transfers output data to S3
- Summarizations and analysis tables will be automatically downloaded to your OmicSoft Server project
- Each sample will be analyzed on a separate EC2 instance; there is rarely a reason to limit the number of parallel jobs
Example Server on the Cloud workflow
Create or Open a Server-based project
- Connect to your OmicSoft Server
- Open or create a new Server-based project (in the Analysis tab)
- Upload or locate your data on your mapped S3 bucket (in the Server tab, with Server Files)
Transfer Files to Server cloud
Before running server jobs on the cloud, upload your data files to a Cloud folder, or locate data that have already been transferred. Go to the Server File | Browse Files window. It is best practice for OmicSoft administrators to include “Cloud” in the name of mapped S3 folders, so that you can distinguish Cloud S3 folders from network-mapped folders. In this example, the folder is named “SGECloudFolder”, and Vivian has created a subfolder to store input data.
Uploading files from your computer
If data are not already in the Cloud S3 folder, click Upload to transfer data from your local computer.
Downloading files from NCBI SRA
If you want to analyze data submitted to NCBI SRA, ERA, etc., you can quickly download them to your Cloud bucket with Download FASTQ files from NCBI SRA.
Direct transfer to S3 bucket (Advanced Users)
Mapped cloud folders are your own S3 buckets; experienced AWS users can directly transfer data into the S3 location, and data will be immediately available for analysis in OmicSoft Server on the Cloud.
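For advanced users, a direct transfer could be scripted along the following lines. This is only a sketch: it assumes the boto3 library is installed and AWS credentials are configured, and the bucket name, prefix, and file names are hypothetical placeholders. Confirm the actual S3 location behind your mapped Cloud folder with your OmicSoft Server administrator. The helper receives the S3 client as an argument and relies only on the standard `upload_file(Filename, Bucket, Key)` call.

```python
from pathlib import Path


def build_s3_key(prefix: str, local_path: str) -> str:
    """Join an S3 key prefix with the file name of a local path."""
    file_name = Path(local_path).name
    cleaned = prefix.strip("/")
    return f"{cleaned}/{file_name}" if cleaned else file_name


def upload_fastq_files(s3_client, bucket: str, prefix: str, local_paths):
    """Upload each local file to s3://<bucket>/<prefix>/<file name>.

    `s3_client` is expected to behave like boto3.client("s3"), i.e. to
    provide upload_file(Filename, Bucket, Key).
    """
    uploaded = []
    for local_path in local_paths:
        key = build_s3_key(prefix, local_path)
        s3_client.upload_file(str(local_path), bucket, key)
        uploaded.append(f"s3://{bucket}/{key}")
    return uploaded


# Example usage (hypothetical bucket and paths; needs boto3 and credentials):
#   import boto3
#   upload_fastq_files(
#       boto3.client("s3"),
#       bucket="my-omicsoft-cloud-bucket",
#       prefix="Vivian/input",
#       local_paths=["sample1_1.fastq.gz", "sample1_2.fastq.gz"],
#   )
```

Once the upload finishes, the files should appear in the mapped Cloud folder in the Browse Files window.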
Run your Cloud analysis
To run a cloud-based analysis, choose your NGS analysis module (such as Download NCBI SRA FASTQ, the OmicSoft RNAseq pipeline function, Report Gene/Transcript Counts, etc.), and be sure to specify S3 locations for both Input and Output.
OmicSoft Server will automatically transfer your S3 input data to compute instances, perform the analysis, and return output data to your S3 output location.
Summary files will be automatically loaded into your OmicSoft Suite project.
After sending the job to the queue, you can monitor its progress the same way as for any server project.
Run Multiple Jobs on Cloud
With Cloud-based analyses, you can specify as many parallel jobs as there are samples, and each sample will be analyzed on a separate instance. Because AWS charges by the minute of instance time, not by the number of instances, this should cost no more than running the samples one at a time, and the analysis finishes much faster.
To inspect a job, right-click it and select View Full Log.
The Log window shows the jobs being submitted to cloud NGS instances. In this example, two cloud instances are started because there are two samples to align.
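The cost argument can be checked with simple arithmetic. The sketch below assumes a flat, hypothetical per-minute instance price and a fixed compute time per sample, and it ignores instance startup, data transfer, and storage charges. Real AWS pricing depends on instance type and region.

```python
import math


def estimate_run(num_samples: int, minutes_per_sample: float,
                 parallel_jobs: int, price_per_instance_minute: float):
    """Return (wall_clock_minutes, total_cost) for a batch of samples.

    Assumes each sample occupies one instance for `minutes_per_sample`
    and that an instance is billed only while it processes a sample.
    """
    # Samples are processed in "waves" of up to `parallel_jobs` at a time.
    waves = math.ceil(num_samples / parallel_jobs)
    wall_clock_minutes = waves * minutes_per_sample
    # Billed instance-minutes depend on the number of samples, not on
    # how many of them run at the same time.
    total_cost = num_samples * minutes_per_sample * price_per_instance_minute
    return wall_clock_minutes, total_cost


# Hypothetical numbers: 109 samples, 60 minutes each, 0.01 per instance-minute.
serial = estimate_run(109, 60, parallel_jobs=1, price_per_instance_minute=0.01)
parallel = estimate_run(109, 60, parallel_jobs=109, price_per_instance_minute=0.01)
print("one job at a time:", serial)
print("one job per sample:", parallel)
```

Under these assumptions the total cost is identical in both cases, while the wall-clock time drops from 109 hours to 1 hour.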
You can monitor the Server Jobs tab for progress of your cloud job.
Continue your analysis
After your job is complete, your OmicSoft Suite project will say “Update project”. Click the Update project button to synchronize your local OmicSoft Studio with the latest output data.
Congratulations! Now you can successfully run server projects on cloud!
Example Cloud Workflow: GSE91061 re-analysis
Step 1: Create a Server project in OmicSoft Server
Step 2: Download FASTQ files from NCBI SRA
SRP094781 is the SRA project containing all raw FASTQ files for GSE91061. SRP094781 contains 109 samples (208 files, paired-end data) with RNA-seq data, and can be downloaded as 109 parallel EC2 jobs.
Step 3: OmicSoft RNA-Seq analysis
Find the downloaded data in your Cloud-mapped folder. You can specify 109 parallel jobs, which will launch 109 parallel alignments, then quantify the data and return all tabular results to your project for visualization and analysis.
The OmicSoft RNA-seq pipeline will align your data to the genome of your choice, quantify expression against your gene model, and identify exons, exon junctions, fusions, and mutations.
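Before choosing the number of parallel jobs, it can help to count how many samples the downloaded files represent. The sketch below groups paired-end FASTQ file names by sample. It assumes the common `<run>_1.fastq.gz` / `<run>_2.fastq.gz` naming convention, which may not match the names in your Cloud folder, and the run identifiers shown are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_1.fastq.gz and <sample>_2.fastq.gz
PAIR_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.fastq(\.gz)?$")


def group_fastq_by_sample(file_names):
    """Map each sample ID to the sorted list of its FASTQ file names."""
    samples = defaultdict(list)
    for name in sorted(file_names):
        match = PAIR_PATTERN.match(name)
        # Files that do not match the paired pattern are keyed by full name.
        sample_id = match.group("sample") if match else name
        samples[sample_id].append(name)
    return dict(samples)


files = [
    "RUN0002_2.fastq.gz",
    "RUN0001_1.fastq.gz",
    "RUN0001_2.fastq.gz",
    "RUN0002_1.fastq.gz",
]
groups = group_fastq_by_sample(files)
# The number of samples is the largest useful number of parallel jobs.
print(len(groups))
```

Because each sample runs on its own instance, the sample count (not the file count) is the number to enter as the parallel job number.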