Microarray-based identification of genes associated with cancer progression and prognosis in hepatocellular carcinoma

Background Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related deaths. The average survival and 5-year survival rates of HCC patients remain poor. Thus, there is an urgent need to better understand the mechanisms of cancer progression in HCC and to identify useful biomarkers to predict prognosis. Methods Public data portals including Oncomine, The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) profiles were used to retrieve HCC-related microarrays and to identify genes potentially contributing to cancer progression. Bioinformatics analyses including pathway enrichment, protein/gene interaction and text mining were used to explore the potential roles of the identified genes in HCC. Quantitative real-time polymerase chain reaction (RT-qPCR) and western blotting were used to measure the expression of the target genes. The data were analysed by SPSS 20.0 software. Results We identified 80 genes that were significantly dysregulated in HCC according to four independent microarrays covering 386 cases of HCC and 327 normal liver tissues. Twenty genes whose roles in HCC had been poorly characterised were consistently and stably dysregulated in the four microarrays by at least 2-fold, and RT-qPCR and western blotting showed consistent expression profiles in 11 HCC tissues compared with corresponding paracancerous tissues. Eleven of these 20 genes were associated with disease-free survival (DFS) or overall survival (OS) in a cohort of 157 HCC patients, and eight genes were associated with tumour pathologic PT, tumour stage or vital status. The potential roles of these 20 genes in regulating HCC progression were predicted, primarily in association with metastasis. INTS8 in particular was correlated with most clinical characteristics, including DFS, OS, stage, metastasis, invasiveness, diagnosis, and age.
Conclusion The significantly dysregulated genes identified in this study were associated with cancer progression and prognosis in HCC, and might be potential therapeutic targets for HCC treatment or potential biomarkers for diagnosis and prognosis.

Electronic supplementary material: The online version of this article (doi:10.1186/s13046-016-0403-2) contains supplementary material, which is available to authorized users.


Background
Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related deaths [1]. There are 750,000 new cases of HCC and nearly 700,000 deaths each year, making this a particularly lethal form of cancer [2]. Over the past decade major progress has been made in our understanding of the risk factors and molecular pathways driving liver carcinogenesis, and these advances have led to substantial opportunities for HCC prevention, surveillance, early diagnosis, prediction of prognosis, and therapy [1]. However, the average survival of HCC patients is normally between 6 and 20 months [3], and long-term prognosis is poor with reported 5-year survival rates ranging from 17 to 53 % [4]. Thus, there is an urgent need to better understand the mechanism of cancer progression and development in HCC and to identify useful biomarkers for diagnosis and prognosis.
High-throughput profiling technologies such as microarrays and, more recently, next-generation sequencing have become invaluable tools for biomedical research. The large amounts of data they generate, including mRNA expression, DNA methylation, and microRNA expression, are collected in public archives: major public projects such as The Cancer Genome Atlas (TCGA) [5] and the International Cancer Genome Consortium [6], and prominent primary data archives such as ArrayExpress [7], Gene Expression Omnibus (GEO) [8], Oncomine [9] and the databases of the International Nucleotide Sequence Database Collaboration [10]. Given the breadth of these databases and the variety of ways in which publicly archived gene expression data can support new studies, reuse of these public data can be very powerful [11]. In particular, data reuse has the potential to predict treatment response and disease progression and can aid the development of precision therapies [12]. For example, based on data retrieved from Oncomine, TCGA, and GEO, Liu et al. identified several genes associated with ovarian cancer progression [13] and drug resistance [14]. In a similar manner, we previously showed that upregulation of E2F transcription factor 3 is associated with poor prognosis in HCC [15]. In the present study, using mRNA expression, DNA methylation, and clinical data retrieved from Oncomine, GEO, and a TCGA cohort, we identified a group of genes associated with cancer progression and prognosis in HCC.

Samples
All patients who underwent curative hepatectomy for primary HCC at the First Affiliated Hospital of Guangxi Medical University between March 2015 and September 2015 were eligible for inclusion in this study. A total of 11 HCC tissues and their matched paracancerous tissues were collected during surgery and stored in liquid nitrogen until mRNA isolation and protein extraction. The study was approved by the Ethics Committee of Guangxi Medical University and was performed according to the Declaration of Helsinki (2013 edition). All patients received an explanation of the aims of the study and signed informed consent.

Gene expression profiles
The genes significantly dysregulated in HCC were identified from four microarrays deposited in the Oncomine database (https://www.oncomine.org/resource/login.html) [9]: Chen Liver (104 HCCs vs. 76 liver tissues), Roessler Liver (22 HCCs vs. 21 liver tissues), Roessler Liver 2 (225 HCCs vs. 220 liver tissues) and Wurmbach Liver (35 HCCs vs. 10 liver tissues). Together, the four microarrays cover a total of 386 HCCs and 327 normal liver tissues. The rank for a gene is the median of its ranks across the individual analyses. DNA methylation, mRNA expression, and clinical data of 379 HCC patients in a TCGA cohort were retrieved from the cBioPortal for Cancer Genomics (http://cbioportal.org) [16,17]; of these, only the 157 samples with matched gene expression data, prognosis data and most of the other clinical data were used to analyse the clinical importance of the target genes. mRNA expression data associated with HCC metastasis were retrieved from the GEO datasets GDS3091 [18] and GDS274 [19], deposited in the GEO Profiles database (http://www.ncbi.nlm.nih.gov/geoprofiles/) [8].
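The multi-array ordering described above ranks each gene by the median of its per-analysis ranks. The following is a minimal Python sketch of that step only, assuming each analysis has already produced a gene-to-rank mapping (1 = most significant). The gene names and ranks are hypothetical placeholders, not data from this study, and skipping genes absent from an analysis is an assumption that may not match Oncomine's own handling.

```python
from statistics import median

def median_rank(ranks_per_analysis):
    """Median rank of each gene across several analyses.

    ranks_per_analysis: list of dicts mapping gene -> rank in one analysis.
    Genes missing from an analysis are skipped for that analysis (assumption).
    """
    genes = set().union(*ranks_per_analysis)
    return {
        gene: median(r[gene] for r in ranks_per_analysis if gene in r)
        for gene in genes
    }

# Hypothetical ranks of three genes in four microarray analyses
analyses = [
    {"GENE_A": 1, "GENE_B": 3, "GENE_C": 2},
    {"GENE_A": 2, "GENE_B": 1, "GENE_C": 5},
    {"GENE_A": 4, "GENE_B": 2, "GENE_C": 3},
    {"GENE_A": 1, "GENE_B": 6, "GENE_C": 4},
]
ranks = median_rank(analyses)
ordered = sorted(ranks, key=ranks.get)  # best (lowest) median rank first
```

With four analyses the median is the mean of the two middle ranks, so GENE_A (ranks 1, 1, 2, 4) scores 1.5 and is listed first.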

Data analysis
The data were analysed by SPSS 20.0 software. The mRNA expression of a gene is presented as the mean ± SD. Homogeneity of variance was analysed using the t-test. For analysis of clinical importance in the TCGA cohort, the expression values of each gene were dichotomised into high and low expression using the median as the cutoff, as described in a previous study [25]. Survival probabilities were estimated using the Kaplan-Meier method, and their significance was tested with the log-rank test. A Cox proportional hazards model was used for multivariate analysis of prognosis. The correlation between gene expression and clinicopathologic characteristics was evaluated by Pearson's χ² test (two-sided). The correlation between DNA methylation and gene expression was analysed using bivariate correlations. P values < 0.05 were considered statistically significant.
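To make the survival analysis concrete, here is a minimal pure-Python sketch of the Kaplan-Meier estimator and the two-group log-rank test. It is an illustration of the standard techniques, not SPSS's implementation; the survival times are hypothetical, and the p-value uses the fact that the upper tail of a χ² distribution with 1 degree of freedom equals erfc(√(χ²/2)).

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as (time, S(t)) pairs at each distinct event time.

    events: 1 = event (death or recurrence), 0 = censored.
    Subjects censored at an event time stay in that time's risk set.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t, d, c = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d:
            surv *= 1 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= d + c
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    obs_minus_exp = var = 0.0
    for t in sorted({tt for tt, e, _ in pooled if e}):
        n = sum(1 for tt, _, _ in pooled if tt >= t)               # at risk, both groups
        n_a = sum(1 for tt, _, g in pooled if tt >= t and g == 0)  # at risk, group A
        d = sum(1 for tt, e, _ in pooled if tt == t and e)         # events, both groups
        d_a = sum(1 for tt, e, g in pooled if tt == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of d_a
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 df

# Hypothetical months to death for median-defined high/low expression groups
high_t, high_e = [1, 2, 3, 4, 5], [1, 1, 1, 1, 1]
low_t, low_e = [6, 7, 8, 9, 10], [1, 1, 1, 1, 1]
curve_high = kaplan_meier(high_t, high_e)
chi2, p = logrank_test(high_t, high_e, low_t, low_e)
```

In this toy example every high-expression patient dies before any low-expression patient, so the log-rank statistic is large and the p-value small.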

Retrieval of significantly dysregulated genes in HCC
Four independent microarrays deposited in the Oncomine database were selected to identify genes associated with cancer development and progression in HCC: Chen Liver Statistics, covering 104 cases of HCC and 76 cases of liver tissue; Roessler Liver Statistics, covering 22 cases of HCC and 21 cases of liver tissue; Roessler Liver 2 Statistics, covering 225 cases of HCC and 220 cases of liver tissue; and Wurmbach Liver Statistics, covering 35 cases of HCC and 10 cases of liver tissue. Based on analysis of these four independent microarrays, 40 genes that were significantly upregulated (P < 1.36E-10) and 40 genes that were significantly downregulated (P < 1.31E-10) in HCC were retrieved (Fig. 1). Analysis of the 80 genes with the DAVID online tool indicated that cell cycle was the top biological process, covering 17 genes, and microtubule cytoskeleton was the top cellular component, covering 14 genes (Additional file 1: Table S1).
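DAVID scores term enrichment with its EASE score, a modified Fisher exact test. As a simplified stand-in, the sketch below computes an ordinary one-sided hypergeometric p-value, P(X ≥ k), for a term such as "cell cycle" (17 of the 80 genes). The background sizes `K` and `N` are hypothetical, so the resulting value is illustrative only and will not reproduce the DAVID output in Table S1.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    k: genes in the list carrying the annotation term
    n: size of the gene list
    K: annotated genes in the background
    N: background size
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)  # exact integer arithmetic until this division

# 17 of 80 listed genes carry the term; K and N are hypothetical background sizes
p = hypergeom_enrichment_p(k=17, n=80, K=600, N=20000)
```

Under these assumed background sizes only about 2.4 annotated genes would be expected among 80 by chance, so observing 17 yields a very small p-value.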
Among the 80 genes that were dysregulated in HCC according to the four independent microarrays, covering a total of 386 cases of HCC and 327 cases of normal liver tissue, nine genes (CAP2, PTTG1, TOP2A, GMNN, GPC3, UBE2C, UBAP2L, TBCE, and INTS8) were consistently and stably upregulated and 18 genes (CXCL14, VIPR1, CLEC4M, MARCO, CLEC1B, NAT2, FCN2, EGR1, DNASE1L3, MT1F, CRHBP, LCAT, PAMR1, ACSM3, MT1G, MT1X, SRPX, and MT1H) were consistently and stably downregulated in HCC by at least 2-fold (Fig. 1; Table 1). Among these 27 genes, seven (CAP2, GMNN, PTTG1, TBCE, TOP2A, UBE2C, and FCN2) encode proteins associated with the cell cycle and microtubule cytoskeleton (Additional file 1: Table S1). Protein/gene-protein/gene interaction analysis was performed to further explore the interrelationships of these genes in HCC. As shown in Additional file 2: Figure S2, the 27 proteins/genes interacted with each other directly or indirectly via co-localisation, genetic interactions, shared pathways, protein domains and, in particular, co-expression. Ten of them (VIPR1, DNASE1L3, SRPX, MT1H, CXCL14, CLEC4M, CRHBP, GPC3, NAT2, and MARCO) interacted with at least 14 other genes, more than half of all the genes in the interaction network (Additional file 2: Figure S2). Moreover, these ten genes were also dysregulated at least 4-fold in HCC (Table 1).

Fig. 1 The 80 genes that were significantly dysregulated in hepatocellular carcinomas according to four independent microarrays retrieved from the Oncomine database. a The top 40 genes that were significantly upregulated in the four microarrays. b The top 40 genes that were significantly downregulated in the four microarrays. The four microarrays cover a total of 386 cases of hepatocellular carcinoma and 327 cases of normal liver tissue: (1) Chen Liver Statistics, 104 cases of hepatocellular carcinoma and 76 cases of liver tissue; (2) Roessler Liver Statistics, 22 cases of hepatocellular carcinoma and 21 cases of liver tissue; (3) Roessler Liver 2 Statistics, 225 cases of hepatocellular carcinoma and 220 cases of liver tissue; (4) Wurmbach Liver Statistics, 35 cases of hepatocellular carcinoma and 10 cases of liver tissue. The rank for a gene is the median rank for that gene across each of the analyses. The P value given for a gene is for the median-ranked analysis. The genes labelled in red and in blue were significantly and consistently up- and downregulated in the four microarrays, respectively.
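The "consistently dysregulated by at least 2-fold" criterion amounts to a simple filter: a gene is kept only if every one of the four arrays shows at least a 2-fold change in the same direction. The sketch below assumes linear-scale HCC/normal fold changes, one per array; the gene names and values are hypothetical.

```python
def consistently_dysregulated(fold_changes, min_fold=2.0):
    """Return (upregulated, downregulated) gene lists.

    fold_changes maps gene -> HCC/normal fold changes (linear scale), one per
    microarray. A gene qualifies only if every array shows at least
    `min_fold` change in the same direction.
    """
    up, down = [], []
    for gene, fcs in fold_changes.items():
        if all(fc >= min_fold for fc in fcs):
            up.append(gene)
        elif all(fc <= 1 / min_fold for fc in fcs):
            down.append(gene)
    return sorted(up), sorted(down)

# Hypothetical fold changes in four arrays
up, down = consistently_dysregulated({
    "GENE_UP": [2.1, 3.0, 2.5, 4.2],
    "GENE_MIXED": [2.5, 1.4, 3.0, 2.2],  # fails in the second array
    "GENE_DOWN": [0.2, 0.4, 0.1, 0.45],
})
```

Requiring the threshold in all arrays, rather than on average, is what makes the retained set "consistent": one weak array is enough to exclude a gene.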

Measurement of gene expression at mRNA and protein level
Among the 27 genes, the associations of seven with HCC are relatively well studied and described in published papers. However, the relationship of the remaining 20 genes with HCC is poorly understood, and these genes were selected for further analyses (Table 1). The expression of eight genes randomly selected from these 20 genes was measured by RT-qPCR in 11 HCC tissues and matched paracancerous tissues (Fig. 2a). All RT-qPCR results were consistent with the expression identified by the four independent microarrays (Fig. 1; Table 1). Furthermore, INTS8 protein levels were significantly higher in HCC tissues than in corresponding paracancerous tissues (Fig. 2b), consistent with its expression at the mRNA level.
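The text does not state how relative mRNA levels were computed from the RT-qPCR data or which reference gene was used. A common choice is the 2^-ΔΔCt (Livak) method; the sketch below assumes that method, a single reference gene and roughly 100% amplification efficiency, with hypothetical Ct values for one tumour/paracancerous pair.

```python
def fold_change_ddct(ct_target_tumour, ct_ref_tumour, ct_target_normal, ct_ref_normal):
    """Tumour vs. matched paracancerous fold change by the 2^-ddCt method.

    Assumes ~100% amplification efficiency for target and reference genes.
    """
    d_ct_tumour = ct_target_tumour - ct_ref_tumour  # normalise to reference gene
    d_ct_normal = ct_target_normal - ct_ref_normal
    dd_ct = d_ct_tumour - d_ct_normal
    return 2 ** -dd_ct

# Hypothetical Ct values: the target crosses threshold one cycle earlier in tumour
fc = fold_change_ddct(24.0, 18.0, 25.0, 18.0)
```

A value above 1 indicates higher expression in the tumour than in its paired paracancerous tissue; here one cycle earlier corresponds to a 2-fold increase.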

Analysis of clinical importance
The clinical importance in HCC of the 20 selected genes (Table 1) was evaluated on the basis of TCGA clinical data. A total of 379 HCC patient samples with clinical data were retrieved from a TCGA cohort. Among these, 157 samples with mRNA expression values were selected for analysis of the relationship between gene expression and clinical characteristics. The expression values of each gene were categorised as high or low using the median value as the cutoff, in accordance with a previous study [25].
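The median split described above can be sketched in a few lines of Python. The tie rule (samples exactly at the median labelled "low") is an assumption, since the text does not say how such samples were assigned; the patient IDs and values are hypothetical.

```python
from statistics import median

def dichotomise_by_median(expression):
    """Label each sample 'high' or 'low' relative to the cohort median.

    Samples exactly at the median are labelled 'low'; this tie rule is an
    assumption, as the text does not say how such samples were assigned.
    """
    cutoff = median(expression.values())
    return {sample: ("high" if value > cutoff else "low")
            for sample, value in expression.items()}

# Hypothetical expression values for four patients (median cutoff = 4.0)
labels = dichotomise_by_median({"P1": 1.0, "P2": 5.0, "P3": 3.0, "P4": 7.0})
```

Splitting at the median yields two groups of roughly equal size, which keeps the downstream survival comparisons balanced without choosing a data-driven optimal cutoff.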
A total of 11 genes were associated with DFS and/or OS (Table 2). Among these, low expression of ACSM3 and CXCL14 was associated with poor DFS, and low expression of CRHBP, DNASE1L3, FCN2, MT1X, and VIPR1 was associated with poor OS (Fig. 3, Table 2). Four genes were associated with both DFS and OS: high expression of INTS8, and low expression of LCAT, MARCO, and PAMR1, was associated with poor DFS and OS (Fig. 4, Table 2). To determine whether any of these genes was an independent predictor of patient survival, we performed multivariate analyses of tumour stage, tumour pathologic PT, residual tumour, tumour status, vital status, age, gender, and the 11 genes using a Cox proportional hazards model (Table 3). Stage (P = 0.050), tumour status (P = 0.001), DNASE1L3 expression (P = 0.042), and INTS8 expression (P = 0.023) were independent prognostic risk factors for OS in HCC patients, whereas no gene was an independent prognostic factor for DFS (data not shown).
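The multivariate model above was fitted in SPSS. To show what a Cox proportional hazards fit does, the following is a minimal single-covariate sketch: Newton-Raphson on the Breslow partial likelihood, with a Wald test for the coefficient. It is a simplification (one covariate, no stratification), not the multivariable model of Table 3, and the cohort is hypothetical; a covariate coded 1 for high and 0 for low expression gives a hazard ratio of exp(beta).

```python
import math

def _score_info(beta, times, events, x):
    """Score U(beta) and information I(beta) of the Breslow partial likelihood."""
    score = info = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        risk = [xi for ti, xi in zip(times, x) if ti >= t]  # risk set at time t
        w = [math.exp(beta * xi) for xi in risk]
        s0 = sum(w)
        s1 = sum(wi * xi for wi, xi in zip(w, risk))
        s2 = sum(wi * xi * xi for wi, xi in zip(w, risk))
        failed = [xi for ti, ei, xi in zip(times, events, x) if ti == t and ei]
        score += sum(failed) - len(failed) * s1 / s0
        info += len(failed) * (s2 / s0 - (s1 / s0) ** 2)
    return score, info

def cox_univariate(times, events, x, max_iter=50, tol=1e-10):
    """Fit h(t | x) = h0(t) * exp(beta * x) by Newton-Raphson.

    Returns (beta, hazard ratio exp(beta), two-sided Wald p-value).
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_info(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_info(beta, times, events, x)
    z = beta * math.sqrt(info)  # beta / SE, where SE = 1 / sqrt(I(beta))
    return beta, math.exp(beta), math.erfc(abs(z) / math.sqrt(2))

# Hypothetical cohort: x = 1 for high expression, 0 for low; all deaths observed
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
x = [1, 1, 0, 1, 0, 1, 0, 0]
beta, hr, p = cox_univariate(times, events, x)
```

Because high-expression patients tend to die earlier in this toy cohort, the fitted hazard ratio is above 1; with only eight patients the Wald test is not significant.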
Six genes were associated with tumour pathologic PT and tumour stage (Table 4); among these, high expression of INTS8 and UBAP2L, and low expression of ACSM3, FCN2, LCAT, and MT1G, were significantly associated with metastatic tumour and late stage (P ≤ 0.05). In particular, UBAP2L was highly expressed in most T2 tumours (72.5 % vs. 27.5 %), whereas LCAT was expressed at low levels in most T2 tumours (30.0 % vs. 70.0 %) and at high levels in most T1 tumours (72.6 % vs. 27.4 %). In addition, LCAT was highly expressed in most stage I tumours (71.2 % vs. 28.8 %). Ten genes were associated with age and gender. As shown in Table 4, six genes (CXCL14, GMNN, INTS8, MT1F, MT1G, and SRPX) were expressed at low levels in HCC patients aged ≥ 65 years. Expression of five genes was related to the gender of HCC patients. Except for FCN2, which is expressed at low levels in male HCC patients, the other four genes,

The gene expression and survival data of 157 HCC patients in a TCGA cohort were used for the analysis. Expression values of a gene were dichotomised into high and low expression using the median as a cutoff. H, high expression; L, low expression.
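An association between a median-dichotomised gene and a two-level clinical variable (for example T1 vs. T2, or age < 65 vs. ≥ 65 years) reduces to a 2 × 2 contingency table. The sketch below computes Pearson's χ² statistic without continuity correction and its p-value with 1 degree of freedom. The counts are hypothetical, and variables with more than two levels (such as stage I to IV) need more degrees of freedom than this sketch handles.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test without continuity correction for a 2 x 2 table
    [[a, b], [c, d]]. Returns (statistic, p-value) with 1 degree of freedom."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square, 1 df

# Hypothetical counts: rows = high/low expression, columns = T1/T2 tumours
stat, p = chi_square_2x2([[30, 10], [10, 30]])
```

Each cell's expected count is its row total times its column total divided by n; here every expected count is 20, so each cell contributes 100/20 = 5 to the statistic.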

Potential roles of the genes in HCC progression
The potential roles of the 20 genes in HCC were predicted on the basis of Coremine Medical text mining. As shown in Fig. 5, the associations of the genes with diagnosis, prognosis, drug resistance, recurrence, metastasis, and invasiveness of HCC were comprehensively analysed. With the exception of PAMR1, all of the other 19 genes were associated with at least one factor contributing to cancer progression, and several genes, for example GMNN, CXCL14, MT1G, MT1X, SRPX, and VIPR1, were closely associated with almost all of the factors included in this analysis. Most of the genes were associated with several factors. For example, 15 genes (including INTS8, LCAT, MARCO, and DNASE1L3) were associated with diagnosis, 14 genes (including INTS8, MARCO, CRHBP, and VIPR1) with metastasis, and 13 genes (including LCAT, MARCO, FCN2, and CXCL14) with prognosis. Based on gene expression in two independent GEO microarrays related to HCC metastasis, the association of CLEC4M, CRHBP, MARCO, MT1X, SRPX, UBAP2L, and VIPR1 with metastasis was further analysed; unfortunately, data for the other genes were unavailable. The expression of CRHBP, LCAT, and SRPX was significantly dysregulated in nine HCCs with venous metastasis compared with 11 HCCs without (Fig. 6a). VIPR1, LCAT, UBAP2L, CLEC4M, CRHBP, and SRPX were significantly dysregulated in 32 HCCs with portal vein tumour thrombus metastasis and 33 HCCs with intrahepatic spread metastasis compared with 22 HCCs with no metastasis (Fig. 6b, c). In particular, LCAT was highly expressed in HCC patients with venous metastasis and in those with portal vein tumour thrombus metastasis, whereas SRPX was expressed at low levels in HCC patients with venous metastasis and in those with intrahepatic spread metastasis (Fig. 6).

Figure legend: Expression values of a gene were dichotomised into high expression (blue line) and low expression (green line) using the median as a cutoff.
Correlation of DNA methylation with mRNA expression of the target genes
DNA methylation and mRNA expression data from 379 HCC patients in a TCGA cohort were retrieved, and the correlations between them were analysed using bivariate correlations. Among the 20 genes that are poorly studied in HCC (Table 1), DNA methylation data were not available for CLEC1B and SRPX. DNA methylation was negatively correlated with mRNA expression for eight genes: ACSM3, INTS8, LCAT, MT1X, CRHBP, MARCO, PAMR1, and VIPR1. In particular, high methylation of the first four genes was significantly correlated with lower mRNA expression (Fig. 7), indicating that the expression of these genes in HCC might be regulated by DNA methylation.
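The methylation-expression analysis above uses bivariate (Pearson) correlation. A minimal sketch follows. Its p-value uses the large-sample Fisher z approximation rather than the exact t-based test that SPSS reports; that approximation is reasonable for cohorts in the hundreds, and the ten-sample example here is only a toy illustration. The methylation beta-values and expression levels are hypothetical, not TCGA data.

```python
import math

def pearson_with_p(x, y):
    """Pearson r and an approximate two-sided p-value (Fisher z transform)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return r, 0.0  # perfect correlation: Fisher z is infinite
    z = math.atanh(r) * math.sqrt(n - 3)
    return r, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical methylation beta-values (0-1) and expression levels for 10 tumours
methylation = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55]
expression = [9, 10, 7, 8, 5, 6, 3, 4, 1, 2]
r, p = pearson_with_p(methylation, expression)
```

A negative r, as here, corresponds to the pattern reported above: higher promoter methylation accompanying lower mRNA expression.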

Discussion
Cancer is frequently considered to be a disease of the cell cycle because alterations in different families of cell cycle regulators cooperate in tumour development. Molecular analysis of human tumours has shown that cell cycle regulators are frequently mutated in human neoplasms, underscoring the importance of maintaining cell cycle commitment in the prevention of human cancer [26]. Abnormal expression of cell cycle regulators, particularly those controlling the G1/S-phase transition, is implicated in the pathogenesis of most human cancers, including HCC. For example, vaccinia-related kinase 1 promotes HCC by controlling the levels of cell cycle regulators associated with the G1/S transition [27]. In this study, 80 genes that were significantly dysregulated in HCC were identified based on four independent microarrays covering a total of 386 cases of hepatocellular carcinoma and 327 cases of normal liver tissue (Fig. 1), and biological process annotation revealed that 17 of these genes were implicated in cell cycle functions (Additional file 1: Table S1). These results suggest that these genes might contribute to cancer progression and development in HCC, at least in part through regulation of the cell cycle.
Twenty-seven genes were further identified as consistently dysregulated in all four microarrays by at least 2-fold (Table 1). The expression of eight of these genes (TBCE, INTS8, VIPR1, CLEC4M, MARCO, DNASE1L3, CRHBP, and FCN2) was confirmed by RT-qPCR in 11 HCC tissues compared with matched paracancerous tissues (Fig. 2a). Seven of the 27 genes (UBE2C, PTTG1, CAP2, TOP2A, GPC3, EGR1, and NAT2) have been well studied in HCC (Table 1). For example, GPC3 plays critical roles in cell proliferation and invasion through the induction of apoptosis [28] and is a biomarker for diagnosis [29] and recurrence [30]. Protein/gene-protein/gene interaction analyses indicated that these 27 proteins/genes strongly interacted with each other, and 10 of them interacted with at least half of all the genes (Additional file 2: Figure S2). Moreover, six of these genes were related to the cell cycle in HCC (Additional file 1: Table S1). Together, these results indicate that the genes identified in this study might play crucial roles in HCC progression, probably functioning as a group.
Biomarkers not only have prognostic implications but are also helpful for measuring treatment responses, monitoring for tumour recurrence and guiding clinical decisions [31]. Thus, prognostic biomarkers for HCC patients are needed, and the search for predictive biomarkers is ongoing. In this study, a group of genes associated with DFS and/or OS (Table 2) was identified in 157 HCC patients. Among these genes, low expression of ACSM3 and CXCL14 was associated with poor DFS; low expression of CRHBP, DNASE1L3, FCN2, MT1X, and VIPR1 was associated with poor OS (Fig. 3, Table 2); high expression of INTS8 was associated with poor DFS and OS; and low expression of LCAT, MARCO, and PAMR1 was associated with poor DFS and OS (Fig. 4, Table 2). Furthermore, DNASE1L3 and INTS8 were identified as independent prognostic risk factors for OS (Table 3). There are few reports of the association of these genes with prognosis in HCC or in other cancers. Previous studies indicate that downregulation of CXCL14 is associated with prognosis in gastric cancer patients [32], that MT1X may aid in the prognostic discrimination of oral squamous cell carcinoma cases [33], and that MARCO expression is associated with breast cancer survival and risk of recurrence [34].
Twenty genes that have been less studied in HCC (Table 1) were further evaluated to predict their potential roles in HCC progression. Coremine medical text mining suggested that most of these genes were associated with diagnosis, prognosis, drug resistance, recurrence, metastasis, and invasiveness. In particular, 13, 14, and 15 genes were potentially associated with prognosis, metastasis, and diagnosis in HCC, respectively (Fig. 5). The association of these genes with prognosis appears to have clinical importance, as 11 genes were shown to be associated with DFS and/or OS (Table 2, Figs. 3 and 4). The role of these genes in metastasis was further supported by gene expression analysis, which showed that five genes were significantly dysregulated in HCC with venous metastasis, portal vein tumour thrombus metastasis, or intrahepatic spread metastasis, compared with the appropriate controls. Specifically, LCAT was highly expressed in HCC patients with venous metastasis and in those with portal vein tumour thrombus metastasis, and SRPX was expressed at low levels in HCC patients with venous metastasis and in those with intrahepatic spread metastasis (Fig. 6), suggesting that these two genes might be closely related to HCC metastasis. There are few studies on LCAT and SRPX in cancer metastasis; only one has reported that SRPX is upregulated in gastric cancer cells after depletion of TWIST, which promotes the epithelial-mesenchymal transition that occurs during the initial steps of tumour metastasis [35].

Fig. 6 mRNA expression of the genes in HCC patients with and without metastasis according to microarray data retrieved from the GEO online database. a Microarray data GDS3091 [18] cover nine HCCs with venous metastasis and 11 without as controls. b, c Microarray data GDS274 [19] cover 32 HCCs with portal vein tumour thrombus metastasis, 33 with intrahepatic spread metastasis, and 22 HCCs with no metastasis as controls. *, P < 0.05; **, P < 0.01
INTS8 encodes a subunit of the Integrator complex, which is involved in the cleavage of small nuclear RNAs, and its association with cancer is poorly understood. Limited studies indicate that INTS8 carries mutations in peripheral T-cell lymphoma relative to nonmalignant samples from 12 patients [36], and that a combination of INTS8 with SULF1, ATP6V1C1, and GPR172A can discriminate gastric carcinomas from adjacent noncancerous tissues [37]. In this study, we found that INTS8, whose expression may be regulated by DNA methylation (Fig. 7), was significantly and consistently upregulated by at least 2.115-fold in HCC according to four independent microarrays (Fig. 1; Table 1), and that INTS8 mRNA was upregulated 2.06-fold on average in 11 HCC tissues compared with corresponding paracancerous tissues, with a similar expression profile at the protein level (Fig. 2). In the clinical importance analysis of 157 HCC patients in a TCGA cohort, high expression of INTS8 was associated with poor DFS and OS (Fig. 4, Table 2) and was an independent prognostic risk factor for OS (Table 3). Moreover, high expression of INTS8 was associated with metastatic tumours, late stage, and younger age (< 65 years) (Table 4). In addition, text mining indicated that INTS8 was closely related to metastasis, invasiveness, and diagnosis (Fig. 5). These results strongly indicate that INTS8 is upregulated in HCC, where it might play crucial roles in cancer progression and development, and that it is a potential biomarker for diagnosis and, in particular, prognosis.

Conclusion
In summary, by means of data retrieved from six independent microarrays, RT-qPCR and western blotting detection in 11 pairs of tissues, clinical importance analyses in a cohort of 157 patients, and bioinformatics analyses including biological process annotation, protein interaction and text mining, we have identified a group of genes that are significantly dysregulated in HCC and might be associated with cancer progression, development, and, in particular, prognosis. These genes could be potential therapeutic targets for HCC treatment, and might be useful biomarkers for diagnosis and prognosis.

Additional files
Additional file 1: Table S1. Biological process and cellular component annotation of the 80 genes associated with HCC development and progression by the DAVID online tool. (PDF 167 kb)
Additional file 2: Figure S2. Protein/gene-protein/gene interaction network of the 27 genes that were stably and consistently dysregulated in 386 cases of hepatocellular carcinoma compared with 327 cases of normal liver tissue according to the four independent microarrays retrieved from the Oncomine database. (PDF 306 kb)