Human RNA MaP

The data presented here were generated by DMS Transcriptome-wide RNA accessibility mapping by sequencing (DMS-TRAM-seq) performed in human U2OS cells (female). DMS methylation of solvent-accessible adenine and cytosine RNA bases, followed by the conversion of these modified bases to mismatches during reverse transcription, results in an RNAseq library containing a mutational profile (MaP) reflective of nucleotide-resolution solvent accessibility data. This MaP can be used to constrain base-pairing probabilities during secondary structure prediction. See About for more information on the experimental and analysis protocols, as well as best practices for secondary structure prediction.

Here, users can draw on this comprehensive dataset to predict the secondary structure of an RNA region of interest, using constraints that reflect the RNA under cellular conditions. Several control examples are provided, and the raw data and the code used to process them are available under Download.

Browse structured elements

Browse example structures

More Information

View preprint

Download

How to search by gene

Search by gene name (Gencode) to find the gene of interest. In rare instances, no standard symbol exists for an annotated gene; in such cases, search by gene ID instead (Gencode gene IDs, of format ENSGXXXXXXXXXXX). If you have trouble finding a gene of interest, check the annotation (GRCh38.106, canonical annotations only, via UCSC Genome Browser) to confirm the correct gene ID and symbol.
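
As a rough illustration of the two query types described above, the sketch below decides whether a query looks like a gene ID or a gene symbol. The function name `classify_query` is hypothetical and is not part of this site; the pattern assumes the ID format given above (ENSG followed by 11 digits), with an optional version suffix such as ".12".

```python
import re

# Assumption: a gene ID is "ENSG" plus 11 digits, optionally followed by a
# version suffix (e.g. ".12"). Anything else is treated as a gene symbol.
GENE_ID_PATTERN = re.compile(r"^ENSG\d{11}(\.\d+)?$")


def classify_query(query: str) -> str:
    """Return 'gene_id' if the query looks like a gene ID, else 'symbol'."""
    query = query.strip()
    return "gene_id" if GENE_ID_PATTERN.match(query) else "symbol"
```

A query that fails the ID pattern is simply treated as a symbol here; this sketch does not check that the symbol exists in the annotation.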

The results page will display a plot of windows tiled across the transcript, each containing 50 datapoints (coverage-filtered A/C bases). Each window is characterized by both the Gini index and Pearson’s R, where high values in both metrics correspond to highly structured regions. For reference, the very stringent thresholds used in our publication for genome-wide identification of structured regions are R ≥ 0.8 and Gini index ≥ 0.5.
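
To make the two window metrics concrete, here is a minimal sketch of how they could be computed for one window. It is not the code used by this site. It assumes that Pearson’s R is calculated between the reactivity profiles of two replicates and that the Gini index is calculated on their average; the text above does not specify either choice, and all function names are hypothetical.

```python
import math
from statistics import mean


def gini_index(values):
    """Gini index of non-negative values (0 = perfectly even signal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Formula on sorted data: G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def pearson_r(a, b):
    """Pearson correlation of two equal-length profiles (NaN if one is constant)."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


def is_highly_structured(rep1, rep2, r_min=0.8, gini_min=0.5):
    """Apply the stringent thresholds quoted above to one window.

    Assumption: rep1 and rep2 are per-base reactivities of the same window
    from two replicates; the Gini index is taken on their average.
    """
    combined = [(x + y) / 2 for x, y in zip(rep1, rep2)]
    return pearson_r(rep1, rep2) >= r_min and gini_index(combined) >= gini_min
```

A window in which a few bases carry most of the signal gives a high Gini index, while an evenly reactive window gives a value near 0.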

Select window(s) of interest, either by clicking the point in the plot or selecting from the table below, to predict the secondary structure for that region. Creating a custom window is also possible, either through the slider below the plot or through the Predict by Coordinate option.

We also recommend running additional predictions with the region expanded by ~50 nt on each end, to check whether structures are being arbitrarily interrupted by the region borders. Please note that the maximum length predictable on this site is 500 nt.

How to predict by coordinate

Input the genomic coordinates of your region of interest, using hg38 as the reference genome. This input is the most flexible, and is unconstrained by sequencing coverage or any annotation. Be sure to double-check that the output sequence matches your expected region, and that the output coverage of A/C bases is above the recommended 70%. This coverage reflects the DMS-modifiable bases that meet all coverage and quality filters.
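
The coverage check can be pictured with the short sketch below. It assumes that coverage means the fraction of A/C bases in the region that have a usable reactivity value after filtering, and that filtered-out bases are represented as `None`; both the representation and the function names are hypothetical, not this site's implementation.

```python
def ac_coverage(sequence, reactivities):
    """Fraction of A/C bases with a usable (non-None) reactivity value."""
    if len(sequence) != len(reactivities):
        raise ValueError("sequence and reactivities must have the same length")
    ac_positions = [i for i, base in enumerate(sequence.upper()) if base in "AC"]
    if not ac_positions:
        return 0.0
    covered = sum(1 for i in ac_positions if reactivities[i] is not None)
    return covered / len(ac_positions)


def meets_recommended_coverage(sequence, reactivities, minimum=0.70):
    """Compare A/C coverage with the recommended 70% stated above."""
    return ac_coverage(sequence, reactivities) >= minimum
```

G and U (or T) bases are ignored because DMS reports on A and C only, so they do not count for or against the region.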

For most users, only one set of coordinates will be used to define their region of interest. However, two sets of coordinates may be needed when joining two separate regions together, such as when crossing a splice junction.
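
When two coordinate sets are joined, the combined length is what counts toward the 500 nt limit. The sketch below illustrates that bookkeeping under stated assumptions: coordinates are treated as 1-based and inclusive (consistent with the example region 35657750 - 35658019 being 270 bp), and the `Segment` class and `joined_length` function are hypothetical names, not part of this site.

```python
from dataclasses import dataclass

MAX_LENGTH_NT = 500  # site limit on input length


@dataclass(frozen=True)
class Segment:
    chrom: str
    start: int  # assumption: 1-based, inclusive
    end: int    # assumption: 1-based, inclusive

    def __post_init__(self):
        if self.start < 1 or self.end < self.start:
            raise ValueError("start must be >= 1 and end must be >= start")

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def joined_length(first: Segment, second: Segment) -> int:
    """Total length of two segments joined in genomic order."""
    if first.chrom != second.chrom:
        raise ValueError("segments must be on the same chromosome")
    if second.start <= first.end:
        raise ValueError("segments must be ordered and non-overlapping")
    total = first.length + second.length
    if total > MAX_LENGTH_NT:
        raise ValueError(f"joined region is {total} nt; maximum is {MAX_LENGTH_NT}")
    return total
```

For example, two hypothetical 100 nt exon ends on either side of a junction give a 200 nt joined region, well within the limit.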

When analyzing a region within a larger transcript, we generally recommend testing “buffer” regions: extend the region of interest by 20-50 nt on each end to reduce the likelihood of structures being arbitrarily interrupted by the region borders.
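
One way to generate buffered test regions is sketched below. The buffer sizes tried (20, 35, and 50 nt) are an illustrative choice within the recommended range, the coordinates are assumed to be 1-based and inclusive, and the function name is hypothetical. Windows that would exceed the 500 nt site limit are skipped.

```python
def buffered_windows(start, end, buffers=(20, 35, 50), max_length=500):
    """Return (start, end) pairs extended by each buffer on both ends.

    Assumption: coordinates are 1-based and inclusive. Extended windows
    longer than max_length are left out.
    """
    windows = []
    for buffer_nt in buffers:
        new_start = max(1, start - buffer_nt)
        new_end = end + buffer_nt
        if new_end - new_start + 1 <= max_length:
            windows.append((new_start, new_end))
    return windows
```

Comparing the predicted structures across these windows shows whether a helix near the edge of the original region is stable or an artifact of where the region was cut.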

Due to computational and server constraints, input is limited to a maximum length of 500 nucleotides and output to a maximum of 5 predicted structures, though some regions may yield fewer. If this does not suit your needs, consider downloading the data and code and running them locally (see Download).

Search by Gene & Identify Structured Regions

Example

Chr	Coords	Strand	Length
9	35657750 - 35658019		270 bp

Predict by Coordinates
