Research Article - Journal of Drug and Alcohol Research (2021) Volume 10, Issue 6

The Differentially Expressed Genes and Biomarker Identification for Dengue Disease Using Transcriptome Data Analysis

Sunil Krishnan G*
¹Department of Bioinformatics, Punjab, India
*Corresponding Author:
Sunil Krishnan G, Department of Bioinformatics, India, Email: [email protected]

Received: 24-May-2021; Accepted: 07-Jun-2021; Published: 14-Jul-2021

Abstract

This bioinformatics and biostatistics study was designed to identify and examine the differentially expressed genes (DEGs) associated with dengue virus (DENV) infection in Homo sapiens. Thirty-nine transcriptome profile datasets were screened, and the selected dataset was analyzed with limma (Linear Models for Microarray Data), an R/Bioconductor package, to identify genes significantly associated with the disease. P values were adjusted with the Benjamini and Hochberg (BH) procedure, and the DEGs with the lowest false discovery rate were chosen for further bioinformatics analysis. The resulting gene set was then examined systematically to extract the biological significance of the DEGs. Four clusters of DEGs were distinguished, and the calcium-sensing receptor (CaSR), encoded by the CASR gene, was the most highly connected human protein in the interaction network. We therefore propose CaSR as a potential biomarker for acute dengue fever.

Keywords

Dengue virus; Differentially expressed genes; Protein-protein interaction; Bioinformatics

Introduction

Female Aedes aegypti mosquitoes are the vector of dengue viral disease. Dengue virus (DENV) belongs to the family Flaviviridae, and its spread among humans worldwide has become a serious public health concern [1]. The four major DENV serotypes (DENV 1, 2, 3, and 4) share 65% genome similarity [2], and each has been recognized as a cause of dengue fever ranging from mild to fatal [3,4]. The RNA genome encodes a polyprotein that provides the proteins needed for virus structure and replication [5,6]. Real-time qPCR is the best choice for detecting and quantifying viremia, because direct culture methods are difficult: clinical virus samples multiply poorly in cell culture. Plaque assays are also inconsistent, which complicates quantification in viral diagnosis [7]. Diagnosis based on DENV NS1 antigen and IgM/IgG antibodies is used in the early stage of the disease, but the results may vary with the DENV serotype and with secondary infections [8]. A host-dependent biomarker may therefore allow more consistent detection of disease progression [9]. Several microarray studies have reported differentially expressed genes (DEGs) from multiple sample profiles [10,11], and DEGs consistently identified in previous studies [12-16] have been used to propose potential biomarkers for DENV disease. Meta-analysis is common practice for discovering novel DEG signatures that lead to better biomarkers and to synthetic or biological therapeutics [17,18]. In this study, gene expression patterns were explored from the Gene Expression Omnibus (GEO) database, and DEGs were compared and analyzed with the Bioconductor package limma in R. Functionally related genes were identified with an integrated bioinformatics tool, and a protein-protein interaction (PPI) network was constructed for the corresponding DEGs. Detailed analysis of the dataset identified candidate biomarkers for the diagnosis and therapy of dengue virus infection.

Materials and Methods

Retrieval of DENV microarray gene expression profile datasets

Microarray gene expression profile datasets were retrieved from the GEO database [19]. The selected dataset, GSE17924, contains 48 samples of genome-wide host expression profiling during dengue disease [20], and its sample transcriptome data were used for the gene expression analysis.

Differentially expressed genes comparison analysis

From this GEO series, 14 samples infected with DENV 1, 2, 3, or 4 were cross-compared, and genes significantly differentially expressed across these experimental conditions were identified with the Bioconductor package limma in R. The Benjamini and Hochberg (BH) procedure was applied to the P values to control the false discovery rate (FDR) [19]. Samples with median-centered value distributions were selected to ensure cross-comparability before differentially expressed genes were identified.
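The BH adjustment mentioned above can be sketched in Python. This is a minimal illustration of the standard step-up procedure (the same rule as R's `p.adjust(method = "BH")`), not the code run in this study, where limma and GEO2R applied the adjustment; the function name `benjamini_hochberg` and the example P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    For the gene with rank i (ascending P) among m genes, the adjusted value
    is the minimum over j >= i of p_(j) * m / j, capped at 1.
    """
    m = len(pvalues)
    # Gene indices ordered from smallest to largest raw P value
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also enforces the cap
    # Walk from the largest P value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four genes
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

In this study the DEGs were ranked by P value rather than filtered at a fixed FDR cutoff; applying a threshold such as 0.05 to the adjusted values would be an additional, assumed choice.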

Gene functional classification and identification of functionally related genes

The differentially expressed genes of DENV were classified into functionally related gene groups using the DAVID annotation and visualization tool, available at https://david.ncifcrf.gov/tools.jsp.

Protein interaction network analysis

Interaction networks of the proteins encoded by the selected DEGs were analyzed and visualized with STRING, which was also used to perform Gene Ontology (GO) and KEGG gene set enrichment analysis. The resource is available online at

Results

DENV microarray gene expression profile data

A GEO search with the keyword “Dengue” returned 15,830 GEO datasets. Filtering the results by study type ‘Expression profiling by array’ and top organism ‘Homo sapiens’ reduced this to 39 datasets, from which GEO accession GSE17924 was selected. This dataset contains 48 samples of genome-wide host expression profiling during dengue disease, of which 14 were selected for this study.

Identification of differentially expressed genes

GEO sample (GSM) data were accessed through GEO2R using the GEO series accession number. The GSM samples were divided into four groups according to DENV serotype: GSM447796, GSM447797, GSM447815, and GSM447822 (DENV 1); GSM447781, GSM447791, GSM447804, and GSM447819 (DENV 2); GSM447783, GSM447784, GSM447785, and GSM447786 (DENV 3); and GSM447807 and GSM447823 (DENV 4). The BH procedure limited false-positive results [12]. The value distributions of the selected samples, shown as boxplots in Figure 1A, indicated that the samples were suitable for comparative analysis. The four serotype groups were then compared, and the top 250 DEGs were identified on the basis of the lowest P values, since genes with the smallest P values are considered the most significant [19]. The DENV 2 vs. DENV 4 comparison yielded the largest number of significantly expressed genes. Volcano plots of all up- and down-regulated genes are shown in Figure 1B, the expression density of the samples in Figure 1C, a Venn diagram of significantly expressed genes across DENV serotypes in Figure 1D, and the mean-variance relationship of the expression data in Figure 1E. The top five up-regulated and top five down-regulated DEGs are detailed in Table 1.
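The grouping and ranking described above can be sketched in Python. The GSM-to-serotype mapping is taken from the text; the result tuples, the function names `sample_labels` and `top_degs`, and the example genes are hypothetical stand-ins for a limma/GEO2R top table, and the direction label simply follows the sign of log2 FC.

```python
# GSM sample IDs per serotype, as listed in the text (subset of GSE17924)
GROUPS = {
    "DENV1": ["GSM447796", "GSM447797", "GSM447815", "GSM447822"],
    "DENV2": ["GSM447781", "GSM447791", "GSM447804", "GSM447819"],
    "DENV3": ["GSM447783", "GSM447784", "GSM447785", "GSM447786"],
    "DENV4": ["GSM447807", "GSM447823"],
}


def sample_labels(groups):
    """Map each GSM ID to its serotype group label."""
    return {gsm: group for group, ids in groups.items() for gsm in ids}


def top_degs(results, n=250):
    """Keep the n genes with the smallest P values and label their direction.

    `results` is assumed to be a list of (gene, log2fc, p_value) tuples,
    e.g. rows exported from a top table (hypothetical structure).
    """
    ranked = sorted(results, key=lambda row: row[2])[:n]
    labelled = []
    for gene, log2fc, p in ranked:
        if log2fc > 0:
            direction = "up"
        elif log2fc < 0:
            direction = "down"
        else:
            direction = "unchanged"
        labelled.append((gene, direction, p))
    return labelled


labels = sample_labels(GROUPS)
print(len(labels))  # 14 samples across the four serotypes
```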


Figure 1: Visualization of differentially expressed genes and protein interactions. (A) Boxplot of the GSM samples' value distributions. (B) Volcano plot of up-regulated (red) and down-regulated (blue) genes in DENV samples. (C) Expression density of samples. (D) Venn diagram of significantly expressed genes across the DENV serotypes. (E) Mean-variance relationship of the expression data. (F) Protein-protein interaction network of the selected proteins.

Top 5 down-regulated DEGs
Gene symbol  HGNC approved gene name                                    log2 FC
USHBP1       USH1 protein network component harmonin binding protein 1  -6.009
WWTR1        WW domain containing transcription regulator 1             -5.831
ZBED9        zinc finger BED-type containing 9                          -5.792
WNT5A        Wnt family member 5A                                       -5.774
TRABD2B      TraB domain containing 2B                                  -5.739

Top 5 up-regulated DEGs
Gene symbol  HGNC approved gene name                                    log2 FC
SYCP2        synaptonemal complex protein 2                             2.893
PLS3         plastin 3                                                  2.903
PTGIS        prostaglandin I2 (prostacyclin) synthase                   2.913
HCAR1        hydroxycarboxylic acid receptor 1                          2.963
OCLN         occludin                                                   2.986

Table 1: Details of the top 5 up- and down-regulated differentially expressed genes.
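Because Table 1 reports log2 fold changes, converting them back to linear fold changes can help convey their magnitude: a log2 FC of -6.009 corresponds to roughly 64-fold lower expression, and 2.986 to roughly 8-fold higher expression. The minimal Python sketch below performs that conversion for the values in Table 1; the function names `fold_change` and `describe` are hypothetical.

```python
# log2 fold-change values copied from Table 1
LOG2FC = {
    "USHBP1": -6.009, "WWTR1": -5.831, "ZBED9": -5.792,
    "WNT5A": -5.774, "TRABD2B": -5.739,
    "SYCP2": 2.893, "PLS3": 2.903, "PTGIS": 2.913,
    "HCAR1": 2.963, "OCLN": 2.986,
}


def fold_change(log2fc):
    """Convert a log2 fold change to a linear fold change (2 ** log2fc)."""
    return 2 ** log2fc


def describe(gene):
    """Express the change as 'N-fold up' or 'N-fold down' for readability."""
    fc = fold_change(LOG2FC[gene])
    if fc >= 1:
        return f"{gene}: {fc:.1f}-fold up"
    return f"{gene}: {1 / fc:.1f}-fold down"


for gene in LOG2FC:
    print(describe(gene))
```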

Gene functional classification and identification results

Systematically organized genes help to interpret the biological importance of the expressed genes [12]. Genes in the expressed gene list were classified at medium classification stringency. The Functional Annotation Clustering tool uses kappa statistics to quantitatively measure how similar genes are in their annotation terms, and groups genes that share similar annotation terms and are therefore likely to be involved in similar biological mechanisms [13]. This reduced redundancy and identified similar annotations in the DEG dataset; appropriate kappa similarity and classification parameters yield a higher-quality functional classification [8]. From the DEG dataset, 34 genes were clustered into four major functional groups. The enrichment score of each group indicates its importance within the gene list, and the three groups with enrichment scores above 1 were selected for further analysis. The first group (CMTM1, IFI27L2, REEP5, C10orf76, TMEM199, ORMDL2) had an enrichment score of 1.44. The second group (SLC5A12, PCDH9, CaSR, PCDH7, GPR65, HCAR1, IGSF8, CA12, CPM, CCR1, OR5I1, CD52, PTGER4, OR10A5, PTGER2, ILDR1, TSPAN2) and the third group (TMPRSS3, TMPRSS13, PRTN3, ELANE) had enrichment scores of 1.30 and 1.26, respectively.
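The two quantities used in this step can be illustrated with a short Python sketch, which is not DAVID's source code. Cohen's kappa is computed between two genes' binary annotation vectors (1 if the gene carries a term, 0 otherwise). The group enrichment score is assumed here to be the geometric mean of the member terms' P values expressed in -log10 scale, which is how we understand DAVID's documentation; under that definition a score above 1 means a geometric-mean P value below 0.1. The function names, toy vectors, and P values are hypothetical.

```python
import math


def cohen_kappa(gene_a, gene_b):
    """Cohen's kappa agreement between two binary annotation vectors."""
    n = len(gene_a)
    both = sum(1 for x, y in zip(gene_a, gene_b) if x and y)
    only_a = sum(1 for x, y in zip(gene_a, gene_b) if x and not y)
    only_b = sum(1 for x, y in zip(gene_a, gene_b) if y and not x)
    neither = n - both - only_a - only_b
    observed = (both + neither) / n
    # Chance agreement: P(both annotated) + P(both unannotated)
    expected = ((both + only_a) * (both + only_b)
                + (only_b + neither) * (only_a + neither)) / n ** 2
    if expected == 1.0:
        # Only possible when both vectors are constant and identical
        return 1.0
    return (observed - expected) / (1 - expected)


def enrichment_score(term_pvalues):
    """Assumed definition: -log10 of the geometric mean of member P values."""
    return -sum(math.log10(p) for p in term_pvalues) / len(term_pvalues)


def select_groups(groups, threshold=1.0):
    """Keep gene groups whose enrichment score exceeds the threshold."""
    scores = {name: enrichment_score(p) for name, p in groups.items()}
    return {name: s for name, s in scores.items() if s > threshold}


# Hypothetical annotation vectors over five terms
print(cohen_kappa([1, 1, 0, 0, 1], [1, 1, 0, 0, 0]))
```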

Network status of protein interaction and functional enrichments

The PPI network contains 27 nodes (proteins) connected by 10 edges (interactions) that jointly contribute to shared functional associations. The average node degree was 0.741, the average local clustering coefficient was 0.494, and the PPI enrichment P value was 0.000685. The predicted PPI network is shown in Figure 1F, where edge colors correspond to the types of functional association between proteins. Protein interaction clusters are visualized in Figures 2 and 3. CaSR was the most highly connected protein, interacting with the psychosine receptor GPR65 (G protein-coupled receptor 65), hydroxycarboxylic acid receptor 1 (HCAR1), and C-C chemokine receptor type 1 (CCR1). CaSR plays a key role in regulating parathyroid hormone (PTH) secretion, GPR65 has a role in immune responses, HCAR1 mediates an antilipolytic effect, and CCR1 affects stem cell proliferation. The functional enrichment analysis found 10 biological process, 8 molecular function, and 4 cellular component GO terms significantly enriched in the predicted network. Downregulated CaSR has been described as a proliferation marker in colorectal cancer [21], prostate cancer [22], and breast cancer [23]. On the basis of this study, we hypothesize that CaSR is a potential biomarker of DENV disease.
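The reported average node degree follows directly from the node and edge counts: each undirected edge adds 1 to the degree of both endpoints, so the average degree is 2E/N = 2 × 10 / 27 ≈ 0.741. The Python sketch below reproduces that value and shows how a local clustering coefficient is computed on a small hypothetical graph. The toy edge list is not the study's network, the function names are hypothetical, and STRING's handling of low-degree nodes may differ from the convention used here (nodes with fewer than two neighbors score 0).

```python
from itertools import combinations


def average_node_degree(n_nodes, n_edges):
    """Mean degree of an undirected graph: every edge adds 1 to two nodes."""
    return 2 * n_edges / n_nodes


def local_clustering(edges, nodes=()):
    """Fraction of each node's neighbor pairs that are themselves connected."""
    neighbors = {node: set() for node in nodes}  # include isolated nodes
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    coeffs = {}
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            coeffs[node] = 0.0  # undefined for degree < 2; scored as 0 here
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        coeffs[node] = links / (k * (k - 1) / 2)
    return coeffs


print(round(average_node_degree(27, 10), 3))  # 0.741, as reported above

# Hypothetical toy graph, not the study's network
toy = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")]
cc = local_clustering(toy)
print(sum(cc.values()) / len(cc))
```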


Figure 2: Visualization of protein interaction clusters.


Figure 3: Graphical abstract


• Transcriptome profile data retrieval and analysis

• Identification of differentially expressed genes (DEGs) associated with dengue virus infection.

• Significant differentially expressed genes were statistically analyzed with the limma R package.

• Functionally related gene classification and identification.

• Gene Ontology (GO) gene set enrichment analysis of selected DEGs.

• Protein interaction network analysis and biomarker identification.

Discussion

Although dengue is a significant subtropical illness, its pathophysiology is poorly understood, owing to the complicated cellular activities that take place during infection. Transcriptional microarray data show that numerous transcripts and host genetic mechanisms are elevated during dengue fever. Further quantitative studies have distinguished the host gene networks involved in establishing an innate immune response (NF-κB-driven transcripts and the interferon network) from those engaged in viral replication (the ubiquitin-dependent proteasome) [24]. GEO2R has been used effectively to analyze microarray datasets in many recent dengue studies [25]. Log2 fold-change (log2FC) values help determine how host gene expression levels change in response to dengue infection. In human and other vertebrate tissues, dengue infection triggers p53- and mitochondria-driven apoptotic pathways, and microarray-based DEG analysis helps reveal such mechanisms [26]. We identified the top 250 differentially expressed genes based on significant P values, and the DENV 2 vs. DENV 4 comparison in GEO2R yielded the largest number of significantly expressed genes. The top five down-regulated genes identified in this study were USHBP1, WWTR1, ZBED9, WNT5A, and TRABD2B, and the top five up-regulated genes were SYCP2, PLS3, PTGIS, HCAR1, and OCLN. CaSR, a G protein-coupled receptor, was the most highly connected protein in the interaction network, and the enrichment analysis results support this finding. CaSR could therefore be a candidate biomarker for the diagnosis and therapy of dengue viral disease.

Conclusion

We identified the top 250 differentially expressed genes based on significant P values, and the DENV 2 vs. DENV 4 comparison in GEO2R yielded the largest number of significantly expressed genes. The significant DEGs (n=34) were clustered into four major functional groups with the DAVID bioinformatics tool, and three groups containing 27 genes were selected for further analysis based on enrichment scores above 1. PPI network analysis showed 10 protein interactions among the nodes, and one protein, CaSR, interacts with three other proteins in the network. CaSR, a G protein-coupled receptor, was the most highly connected protein, and the enrichment analysis results also support this finding. CaSR could be a candidate biomarker for the diagnosis and therapy of dengue viral disease.

Acknowledgements

The authors SKG, AJ, and VK are grateful to the Bioinformatics Division of Lovely Professional University, Jalandhar, Punjab, India, for providing a computational and bioinformatics environment for this research.