1. Overview

1.1 nASAP Overview

nASAP is a comprehensive, efficient, and user-friendly platform for analyzing nascent RNA sequencing data. It provides quality control, signal extraction, pause site calculation, and regulatory network analysis. The platform is implemented primarily in Python and uses several shell tools to build its data analysis modules, which users can run according to their needs. nASAP is compatible with Linux, macOS, WSL, and Docker. The documentation is continually updated and also covers further analyses based on nASAP output.

1.2 Nascent RNA

1.2.1 Definitions:

The nascent, or newly formed, RNA synthesized by RNA polymerase II is called the primary transcript. It contains both intervening sequences that are removed during processing (introns) and sequences that are retained in the mature transcript, including those that encode protein (exons). Primary transcripts are converted to functionally mature mRNAs by post-transcriptional processing (capping, polyadenylation, and splicing).

1.2.2 Common nascent RNA sequencing techniques:

| Technique | Description | Advantage | Disadvantage |
| --- | --- | --- | --- |
| GRO-seq | Nuclei are extracted, Br-UTP is added in vitro, and Br-labeled nascent RNA is purified and sequenced | Classical; the first nascent RNA sequencing technique | Resolution is only 30-50 bp (because Br-UTP is incorporated only opposite A) |
| PRO-seq | Nuclei are extracted, biotin-NTP is added in vitro, and biotin-labeled nascent RNA is purified and sequenced | Single-base resolution | Only active Pol II is detected; paused Pol II is not transcribed |
| NET-seq | Pol II is immunoprecipitated, and the RNA bound to Pol II is recovered and sequenced | RNA is transcribed in vivo | Performance depends on the quality of the Pol II antibody, Pol II-bound RNA is poorly soluble, and more cells are required |
| TT-seq | RNA is labeled with 4sU during transcription and fragmented; the small 4sU-containing fragments are purified and sequenced | Fragmentation avoids detecting previously transcribed (non-nascent) RNA | The protocol has many steps, so human error is easily introduced |

1.2.3 Application:

a. Nascent RNA sequencing data is utilized to uncover the transcriptional signatures of genes.
b. This technique accurately detects gene expression and enables real-time monitoring of transcriptional status.
c. Additionally, this method allows for the precise localization of transcription start sites and the identification of transcriptional direction of the gene.
d. By leveraging nascent RNA sequencing, researchers can identify new transcripts, including non-coding RNAs, that may have been overlooked using other methods.
e. Furthermore, this approach can identify active enhancer RNAs (eRNAs), which are regulatory transcripts involved in the control of gene expression.

1.2.4 Experimental design:

a. Use two or more biological replicates.
b. Each replicate should have 25 million non-duplicate, non-mitochondrial aligned reads for single-end sequencing, or 50 million for paired-end sequencing.
c. Use as few PCR cycles as possible when constructing the library.
d. Paired-end sequencing is preferred.
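
The replicate and read-depth guidelines above can be checked mechanically before starting an analysis. The following is a minimal sketch, not part of nASAP: the `Replicate` class, the `check_design` function, and the sample names are hypothetical, and the usable read counts are assumed to have been obtained elsewhere (for example from an alignment report).

```python
from dataclasses import dataclass

# Thresholds taken from the guidelines above: non-duplicate,
# non-mitochondrial aligned reads required per replicate.
MIN_READS = {"single": 25_000_000, "paired": 50_000_000}
MIN_REPLICATES = 2


@dataclass
class Replicate:
    name: str          # hypothetical sample label
    usable_reads: int  # non-duplicate, non-mitochondrial aligned reads


def check_design(replicates, layout):
    """Return a list of problems; an empty list means the design passes."""
    if layout not in MIN_READS:
        raise ValueError(f"layout must be one of {sorted(MIN_READS)}")
    problems = []
    if len(replicates) < MIN_REPLICATES:
        problems.append(
            f"only {len(replicates)} replicate(s); "
            f"at least {MIN_REPLICATES} are recommended"
        )
    for rep in replicates:
        if rep.usable_reads < MIN_READS[layout]:
            problems.append(
                f"{rep.name}: {rep.usable_reads:,} usable reads, "
                f"below the {MIN_READS[layout]:,} recommended for {layout}-end data"
            )
    return problems


if __name__ == "__main__":
    design = [Replicate("rep1", 31_000_000), Replicate("rep2", 22_500_000)]
    for problem in check_design(design, "single"):
        print(problem)
```

Returning a list of messages rather than raising on the first failure lets every issue in a design be reported at once.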

1.3 Other analysis tools

nf-core/nascent
PEPPRO

1.4 nASAP features

1.4.1 Robustness

A regular laptop, for example one with six cores and 16 GB of memory, is sufficient for nascent RNA data analysis. We have tested over a dozen commonly used datasets and confirmed that they can be analyzed with this tool. We are also using the same tool to analyze thousands of public nascent RNA datasets in our Transpause database project.

1.4.2 User-friendly

a. We offer a user submission server (https://grobase.top/nasap/) to provide a more accessible tool for biologists.
b. The tool keeps its dependencies to a minimum, which reduces errors during installation. We provide pip and Docker installation methods, two widely used ways of installing software. nASAP is a simple command-line tool that can be used without additional programming knowledge. The output is presented as a clear report that includes a description of each indicator, as well as tips for addressing potential issues.
c. The tool follows a modular design, and we have thoroughly documented its analytical logic to make it more accessible to biologists. In addition, we provide documentation for extended analyses that build on nASAP output.

1.4.3 Rich features and better performance

We offer several analytical modules for nascent RNA analysis that provide more features and faster performance than PEPPRO. Because of the complex meta information involved, we recommend analyzing the bigWig (bw) file online: the user can upload it to the server, where it is processed as a queued task.

1.5 nASAP module design

1.5.1 Systematic Quality Assessment

This module employs nascent RNA sequencing data as input and generates multiple levels of quality control results, mapping results, bigWig files, and visual analysis results. These results aid users in comprehending the quality of the sequencing data and utilizing bigWig files for other module analyses. Certain intermediate results, such as mapping files, can be utilized for various analyses, including identifying new transcripts and detecting potential sources of contamination.
assess

1.5.2 Transcription Level Quantification

This module takes bigWig (bw) files as input and computes metrics such as gene expression, RPKM, and exon and intron density of protein-coding genes, along with other related results. These results can be used for different purposes, such as differential analysis or biological quality control.
assess
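
As an illustration of the RPKM metric mentioned above, the sketch below uses the standard definition: reads per kilobase of feature length per million mapped reads. The function name and arguments are hypothetical and do not reflect how nASAP computes or exposes this value internally.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads Per Kilobase of feature per Million mapped reads.

    RPKM = read_count * 1e9 / (feature_length_bp * total_mapped_reads)
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # 500 reads on a 2 kb gene in a library of 10 million mapped reads.
    print(rpkm(500, 2_000, 10_000_000))
```

The factor of 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings, so values are comparable across genes of different lengths and libraries of different depths.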

1.5.3 Pausing Analysis

This module takes bigWig files as input, determines pause sites across the entire genome, and calculates the pausing index and elongation index.
pause_sites
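
A pausing index is commonly defined as the ratio of read density in the promoter-proximal region to read density in the gene body. The sketch below assumes that common definition; the function name, the argument names, and the choice of windows are assumptions for illustration, and the exact regions and formula used by nASAP may differ.

```python
def pausing_index(proximal_reads, proximal_length_bp, body_reads, body_length_bp):
    """Ratio of promoter-proximal read density to gene-body read density.

    Returns None when the gene body has no signal, because the ratio
    is undefined in that case.
    """
    if proximal_length_bp <= 0 or body_length_bp <= 0:
        raise ValueError("region lengths must be positive")
    proximal_density = proximal_reads / proximal_length_bp
    body_density = body_reads / body_length_bp
    if body_density == 0:
        return None
    return proximal_density / body_density


if __name__ == "__main__":
    # Hypothetical counts: 200 reads in a 100 bp promoter-proximal window,
    # 1000 reads across a 5000 bp gene body.
    print(pausing_index(200, 100, 1000, 5000))
```

Normalizing each count by its region length before taking the ratio keeps a long gene body from masking a strong promoter-proximal signal.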

1.5.4 Regulatory Network Construction

This module combines existing regulatory network data with nascent RNA signals to derive regulatory networks specific to the nascent RNA data. This approach allows regulatory relationships to be identified at a genome-wide scale.
network