clarifies the unbiased expression of the complete set of transcripts at a
single base resolution, including splice junctions and fusion genes, by
providing digital gene expression levels with high reproducibility. RNA-SEQ enables
us to characterize the complex transcriptome, composed of mRNAs, noncoding RNA,
and small RNAs, theoretically at a single cell level, by aligning sequencing
reads on reference genomes. An RNA-SEQ experiment captures the total mRNA
present in the cells and then sequences that RNA in order to determine which
genes were active or expressed in those cells. Because RNA is the key
intermediate between the genome and the proteome, transcript identification and
the quantification of gene expression have been distinct core activities in
molecular biology.
1.4.1 Comparison of microarray and RNA-SEQ
RNA-SEQ is the whole transcriptome shotgun sequencing method that uses NGS to
reveal the presence and quantity of RNA in a biological sample. It is a deep
sequencing technology with read lengths between 30 and 400 bp, and it requires
only a small amount of RNA. The workflow involves replicates, multiplexing,
sample preparation and library preparation. Finally, the read counts are
normalized for differential expression comparison. RNA-SEQ also helps to detect
novel transcripts.
A microarray experiment requires an experimental design that covers replicates,
pooling and the choice of platform or array. Data acquisition involves sample
preparation, hybridization and imaging, followed by image analysis and data
processing with statistical analysis. The differences between the two methods
are shown in Figure 1.6.
Figure 1.6 Comparisons of Microarray
FASTQ-formatted files of RNA-SEQ datasets for cystic fibrosis were retrieved
from EMBL-EBI (European Molecular Biology Laboratory – European Bioinformatics
Institute) under the study accession number PRJNA322763. EMBL-EBI applies
bioinformatics, the science of storing, sharing and analysing biological data.
The whole human reference genome was downloaded in FASTA format from the UCSC
Mihaela et al. (2016) describe how RNA-SEQ experiments generate very large,
complex datasets that demand fast, accurate and flexible software to convert
raw read data into comprehensible results. HISAT2, StringTie and Ballgown are
free, open-source software tools for the comprehensive analysis of RNA-SEQ
experiments. Together they allow us to align reads to a genome, assemble
transcripts including novel splice variants, estimate the abundance of these
transcripts and compare samples to identify differentially expressed genes and
transcripts between males and females. The assembly of RNA-SEQ reads
reconstructs the exon-intron structure of genes and their isoforms. Chromosome
X is taken as the reference sequence.
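The align, assemble and quantify steps described above can be sketched as a
set of command lines. The following is a minimal illustration, not the authors'
exact protocol: the sample name, index prefix, annotation file and directory
layout are hypothetical placeholders, and the functions only build the argument
lists without running anything. The options used (hisat2 --dta, samtools sort
-o, stringtie -G, --merge, -e and -B) are standard options of those tools.

```python
def build_pipeline_commands(sample, index, annotation, reads_dir, threads=8):
    """Build per-sample align/sort/assemble commands (nothing is executed).

    All file names below are hypothetical placeholders for illustration.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.bam"
    return [
        # 1. Align paired-end reads with HISAT2; --dta reports alignments
        #    in a form suited to downstream transcript assemblers.
        ["hisat2", "-p", str(threads), "--dta", "-x", index,
         "-1", f"{reads_dir}/{sample}_1.fastq.gz",
         "-2", f"{reads_dir}/{sample}_2.fastq.gz",
         "-S", sam],
        # 2. Sort the alignments by coordinate and write them as BAM.
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # 3. Assemble transcripts with StringTie, guided by the annotation.
        ["stringtie", "-p", str(threads), "-G", annotation,
         "-o", f"{sample}.gtf", "-l", sample, bam],
    ]


def build_merge_command(gtf_list_file, annotation, threads=8):
    """Merge the per-sample assemblies into one transcript set."""
    return ["stringtie", "--merge", "-p", str(threads), "-G", annotation,
            "-o", "stringtie_merged.gtf", gtf_list_file]


def build_quantification_command(sample, merged_gtf, threads=8):
    """Re-estimate abundances against the merged transcripts.

    -e restricts estimation to the supplied transcripts and -B writes the
    table files that Ballgown reads.
    """
    return ["stringtie", "-e", "-B", "-p", str(threads), "-G", merged_gtf,
            "-o", f"ballgown/{sample}/{sample}.gtf", f"{sample}.bam"]


if __name__ == "__main__":
    # Print the commands for one hypothetical sample.
    for cmd in build_pipeline_commands("sample01", "indexes/chrX_tran",
                                       "genes/chrX.gtf", "samples"):
        print(" ".join(cmd))
    print(" ".join(build_merge_command("mergelist.txt", "genes/chrX.gtf")))
    print(" ".join(build_quantification_command("sample01",
                                                "stringtie_merged.gtf")))
```

Each list could be passed to subprocess.run to execute the step. The
differential expression comparison itself is then carried out in R with
Ballgown on the tables written under the ballgown directory.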
al. (2016) state that no single pipeline can be used for every RNA-SEQ
application and that the major steps include experimental design, quality
control, read alignment, quantification of gene and transcript levels,
visualization, differential gene expression, alternative splicing, functional
analysis and gene fusion detection. They also discuss the many challenges
associated with these steps. The results are affected by parameter settings,
especially for genes that are expressed at low levels. Current applications of
RNA-SEQ include the construction of transcriptomes from small amounts of
starting material and better transcript identification from longer
al. (2017) investigated 219 combinatorial implementations of the most commonly
used analysis tools for RNA-SEQ. A test dataset was generated using highly
purified human classical and non-classical monocyte subsets to evaluate
performance, including the effect of differences in expression units and of
gene- versus transcript-level estimation. RNA-SEQ results were also compared
with those from microarray and BeadChip analysis. Among the software selections
made at each step, the choice of differential expression analysis approach
exhibited the strongest impact on recall and precision, followed by the tools
chosen for read alignment and expression modelling. Considering the relative
cost of type I and type II errors in the downstream application can guide the
selection of an appropriate workflow. The data generated are useful for the
further development of differential expression analysis methods.
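To make the link between precision, recall and the two error types concrete,
the sketch below scores a list of genes called differentially expressed against
an independent benchmark list. It is only an illustration under stated
assumptions: the gene names and both lists are invented, and the function is
not taken from the study.

```python
def precision_recall(called, benchmark):
    """Score called DEGs against a benchmark set of true DEGs."""
    called, benchmark = set(called), set(benchmark)
    true_pos = len(called & benchmark)
    false_pos = len(called - benchmark)   # type I errors
    false_neg = len(benchmark - called)   # type II errors
    # Guard against empty sets to avoid division by zero.
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    recall = true_pos / (true_pos + false_neg) if benchmark else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical gene identifiers, for illustration only.
    called = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
    benchmark = ["GENE_A", "GENE_B", "GENE_E"]
    print(precision_recall(called, benchmark))
```

A workflow that calls more genes tends to raise recall at the cost of
precision, so an application in which false positives are expensive would
favour the workflow with the higher precision.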
Zong et al. (2014) state that software packages have been developed for the
identification of differentially expressed genes (DEGs) between treatment
groups based on RNA-SEQ data. They compared three of the most frequently used
software tools: Cufflinks-Cuffdiff2, DESeq and edgeR. The important parameters
considered include the number of replicates, sequencing depth, and balanced
versus unbalanced sequencing depth within and between groups. The edgeR package
is slightly preferable for differential expression analysis, at the expense of
potentially introducing more false
Krishna et al. (2014) state that, using qRT-PCR and microarrays as benchmarks,
edgeR performs better than DESeq and Cuffdiff2 in its ability to detect
differentially expressed genes with the default FDR setting. All three tools
perform much better when biological or technical replicates are available.
Biological replicates are a key factor for differential expression analysis in
RNA-SEQ datasets. Cuffdiff2 is the most sensitive and DESeq the least sensitive
to sequencing depth, but the overall impact of sequencing depth is not as
critical as the number of biological replicates. When resources are limited, so
that the total number of reads is fixed, an increased number of biological
replicates, each with reduced read depth, is recommended over fewer replicates
that are more deeply sequenced. In addition, edgeR has the best performance as
judged by two AUC statistics, without obvious negative effects under unbalanced
sequencing depth.
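The text does not say which two AUC statistics were used. As a generic
illustration only, the sketch below computes the area under the ROC curve for a
tool that assigns each gene a score (for example, one minus its adjusted
p-value), given benchmark labels marking the true DEGs. The scores and labels
are invented for the example.

```python
def auc_from_scores(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen true DEG receives a
    higher score than a randomly chosen non-DEG; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one true DEG and one non-DEG")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores for six genes; label 1 marks a benchmark DEG.
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
    labels = [1, 1, 0, 1, 0, 0]
    print(auc_from_scores(scores, labels))
```

An AUC of 1.0 means every true DEG is ranked above every non-DEG, while 0.5
corresponds to a random ranking. The pairwise loop is quadratic in the number
of genes, so a rank-based formulation would be preferable for genome-scale
lists.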