Candidate microRNA biomarkers of pancreatic ductal adenocarcinoma: meta-analysis, experimental validation and clinical significance

Background The diagnostic and prognostic value of microRNA (miRNA) expression aberrations in pancreatic ductal adenocarcinoma (PDAC) has been studied extensively in recent years. However, differences in measurement platforms and lab protocols as well as small sample sizes can render gene expression levels incomparable. Methods A comprehensive meta-review of published studies in PDAC that compared the miRNA expression profiles of PDAC tissues and paired neighbouring noncancerous pancreatic tissues was performed to determine candidate miRNA biomarkers for PDAC. Both a miRNA vote-counting strategy and a recently published Robust Rank Aggregation method were employed. In this review, a total of 538 tumour and 206 noncancerous control samples were included. Results We identified a statistically significant miRNA meta-signature of seven up- and three down-regulated miRNAs. The experimental validation results showed that the miRNA expression levels were in accordance with the meta-signature. The results from the vote-counting strategy were consistent with those from the Robust Rank Aggregation method. The experimental validation confirmed that the statistically unique profiles identified by the meta-review approach could discriminate PDAC tissues from paired nonmalignant pancreatic tissues. In a cohort of 70 patients, the high expression of miR-21 (p=0.018, HR=2.610; 95% CI=1.179-5.777) and miR-31 (p=0.039, HR=2.735; 95% CI=1.317-6.426), the low expression of miR-375 (p=0.022, HR=2.337; 95% CI=1.431-5.066) were associated with poor overall survival following resection, independent of clinical covariates. Conclusions The identified miRNAs may be used to develop a panel of diagnostic and prognostic biomarkers for PDAC with sufficient sensitivity and specificity for use in a clinical setting.


Introduction
Pancreatic ductal adenocarcinoma (PDAC) is one of the most aggressive of all malignancies [1]. Less than 20% of PDAC patients present with localised, potentially curable tumours. The overall 5-year survival rate is <5%. Because the chemotherapeutic options prolong life only minimally, the current PDAC mortality is nearly identical to its incidence [2]. Extensive studies have been performed to identify biomarkers for this disease. At the messenger RNA (mRNA) level, quite a few, including some very specific molecular variations have been found in cancerous tissues [3]. MicroRNAs (miRNAs), a class of short non-coding RNA molecules that range in size from 19 to 25 nucleotides, have been proposed as promising biomarkers of early cancer detection and accurate prognosis as well as targets for more efficient treatment [4,5]. MiRNAs play important roles in regulating the translation of many genes and the degradation of mRNAs through base pairing to partially complementary sites, predominately in the 3′ untranslated region [6,7]. Several studies have implicated miRNAs in the regulation of tumour biology [8][9][10]. Model biomarkers should be easily quantifiable and associate strongly with clinical outcome, and miRNAs may match these criteria.
High-throughput technologies have been employed to identify differences in miRNA expression levels between normal and cancerous tissues. These studies have the potential to identify dozens or hundreds of differentially expressed miRNAs, although only a small fraction of them may be of actual clinical utility as diagnostic/prognostic biomarkers. Finding a meaningful way in which to combine different data sources is often a non-trivial task. Differences in measurement platforms and lab protocols as well as small sample sizes can render gene expression levels incomparable. Hence, it may be better to analyse datasets separately and then aggregate the resulting gene lists. This strategy has been applied to identify gene coexpression networks [11] and to define more robust sets of cancer-related genes [12,13] and miRNAs [14,15].
In the meta-review approach, the results of several individual studies are combined to increase statistical power and subsequently resolve any inconsistencies or discrepancies among different profiling studies. In this study, we applied two meta-review approaches: the wellknown vote-counting strategy [12,13], which is based on the number of studies reporting a gene as being consistently expressed and then further ranking these genes with respect to total sample size and average fold-change, and the recently published Robust Rank Aggregation method [16,17]. Pathway analysis was then performed to identify the physiological impact of miRNA deregulation in PDAC progression. Moreover, we further validated the most up-regulated and down-regulated miRNAs from the meta-review in a clinical setting. The expression levels of a subset of candidate miRNAs were assessed by quantitative real-time polymerase chain reaction (qRT-PCR). With the validation of candidate miRNAs, we selected the most promising miRNAs based on factors such as fold-change to explore their potential effects on the survival of PDAC patients after surgical resection.

Selection of studies and datasets
The Scopus database (www.scopus.com) was searched for PADC miRNA expression profiling studies using search term TITLE-ABS-KEY [(mirna* OR microrna* OR mir-* OR mir) AND profil* AND (pancreas* cancer OR pancreatic carcinoma OR pancreas* neoplas* OR pancreatic tumo* OR carcinoma of pancreas* OR cancer of pancreas*)]. The same strategy was also applied to searches of the Gene Expression Omnibus (GEO; www.ncbi.nlm.nih.gov/geo/), ArrayExpress (http://www.ebi.ac.uk/arrayexpress/), and PubMed (www.ncbi.nlm.nih.gov/pubmed). The last search was performed on May 11, 2013. The titles and abstracts of the articles were screened, and the full text of the articles of interest was evaluated. We included only original experimental articles that were published in English and that compared the expression of miRNAs in PDAC tissue and noncancerous pancreatic tissue in humans. Articles were excluded based on the following criteria: (i) review articles, case reports or letters; (ii) non-English articles; (iii) studies of individual preselected candidate miRNAs; (iv) studies that used RT-PCR for initial selection (the reasons for this exclusion criterion are explained in the Discussion section); (v) studies using cell lines or serum from PDAC patients; (vi) studies that did not use a miRNA microarray platform; (vii) studies profiling different histological subtypes; (viii) studies that did not include noncancerous tissue.

Data extraction
Two investigators (MM and XK) independently evaluated and extracted the data using standard protocols, and all discrepancies were resolved by a third investigator (MW). From the full text and corresponding supplemental information, the following eligibility items were collected and recorded for each study: author, region, period, selection and characteristics of the recruited PDAC patients, platform of miRNA expression profiling, and the list of up-and down-regulated miRNAs and their corresponding fold-changes. When the gene list was not available, the authors were contacted directly. All miRNA names were standardised according to miRBase version 20.

Vote-counting strategy
The miRNAs were ranked according to their importance as follows: (i) number of comparisons in agreement (i.e., listing the same miRNAs as having a consistent direction of change and being differentially expressed, respectively); (ii) total number of samples for comparison in agreement; (iii) average fold-changes reported for comparisons in agreement. Total sample size was considered more important than average fold-change because many studies did not report a fold-change. Furthermore, the average fold-change was based solely on the subset of studies for which a fold change value was available.

Robust rank aggregation method
The list of extracted miRNAs was ranked based on their associated p-values (less than 0.05 was considered significant) when their fold-changes were not reported. All of the protocols for the Robust Rank Aggregation method are free to download at the comprehensive R Archive Network website (http://cran. r-project.org/). Details can be found in the package documentation (http://cran.r-project.org/web/packages/ RobustRankAggreg/RobustRankAggreg.pdf). This method assigns a p-value to each element in the aggregated list, which indicates how much better it is ranked compared with a null model, expecting random ordering. To assess the stability of the acquired p-values, leave-one-out crossvalidation was applied in the Robust Rank Aggregation algorithm. This analysis was repeated 10,000 times, and each time, one random gene list was left out of the analysis. The p-values acquired from each round for each miRNA were then averaged.

MiRNA target prediction and enrichment analysis
The mRNA targets of the miRNA genes were predicted using TargetScan (http://www.targetscan.org/), miRDB (http://mirdb.org/miRDB/), and miRANDA (http://www. microrna.org/microrna/getGeneForm.do), as each algorithm determines target binding differently. We selected targets from the miRANDA/miSVR search with scores less than −1.25 for further analysis. Enrichment analyses for KEGG and Panther pathways and Gene Ontology terms were performed with the GeneCodis tool (http:// genecodis.dacya.ucm.es/). The potential targets of each miRNA were used as input.

Ethics statement
Ethical approval for this study was obtained from the Department of General Surgery of Ruijin Hospital at Shanghai Jiaotong University (Shanghai, China). All patients provided informed written consent for their tissues to be used for scientific research and to publish their case details.

Sample collection
Seventy-eight PDAC tissue samples and neighbouring noncancerous pancreatic tissue samples (collected postoperatively from September 2010 to August 2011) used in this study were obtained from the Department of General Surgery of Ruijin Hospital at Shanghai Jiaotong University (Shanghai, China). The specimens were obtained from patients undergoing PDAC resection with curative intent. All diagnoses were based on pathological and/or cytological evidence. The histological features of the specimens were evaluated by a senior pathologist according to the WHO (World Health Organization) classification criteria. The tissues were obtained before chemotherapy and radiation therapy. Upon removal of the surgical specimen, research personnel immediately transported the tissue to the surgical pathology lab. Pathology faculty performed a gross analysis of the specimen and selected pancreatic tissues that appeared to be cancerous and pancreatic tissues that appeared to be normal for analysis. Each sample was immediately frozen in liquid nitrogen and stored at −80°C prior to RNA isolation and qRT-PCR analysis. A second level of quality control was performed on the adjacent benign tissues. Histological slides were prepared from the section of frozen tissue that was directly adjacent to the tissue from which the RNA was isolated. These slides were examined by experienced pathologists to determine if the benign tissues contained any pancreatic tumour cells. Benign tissues that contained residual tumour tissues were not included in the study. Complete clinicopathological follow-up data of the PDAC patients from which the specimens were collected were available.

Statistical analysis
Kaplan-Meier survival analysis was used to analyse the association between postoperative survival and the miRNA expression level measured by qRT-PCR, and the resulting curves were divided into two classes (high and low expression in comparison to the mean level of miRNA expression as the threshold). Survival analysis was performed for each clinical covariate to assess their influence on outcome using a log-rank test. A multivariate Cox regression

Results
Five hundred and ninety-eight relevant publications were indexed in the databases mentioned above (Scopus, GEO, PubMed and ArrayExpress). According to the inclusion criteria and the identification of duplicate   publications, only fourteen independent studies [18][19][20][21][22][23][24][25][26][27][28][29][30][31] were included. However, one article was excluded for the unavailability of a ranked gene list both publically and in response to a request from the corresponding author [18]. The selection process is shown in Figure 1.
Among the analysed studies, some of the studies employed patient samples as low as 5 [19] or 3 [20], which was too small to provide any reliable data. Not surprisingly, these two studies [19,20] were the basis for excluding numerous candidates that were consistently reported as either up-or down-regulated in other studies. The most glaring example of the strategic error of including these two studies in our meta-analysis is miRNA-100, which, despite being reported to be up-regulated in 7 studies [21,23,24,26,28,29,31], was considered to be down-regulated in one of the aforementioned studies [19], which only employed 5 tumour samples. Therefore, if Ref 19 was included, miR-100 would be listed as a miRNA with an inconsistent direction and would be subsequently excluded from the list of most consistently reported miRNAs. In addition, the fold-change in this study [19] was very low (less than 2) and may not have been significant if a large sample size was analysed. Other examples include miR-145, miR-141, miR-379, miR-200c, and miR-125b, which were reported in an opposite direction solely in these two studies. To avoid these deviations, these two smallsample-size studies were excluded from our metaanalysis. A brief description of the eleven included studies [21][22][23][24][25][26][27][28][29][30][31] and the acronyms by which the studies are referred to in the following text are provided in Table 1.
The number of patients with PDAC that were investigated in these eleven studies ranged from 8 to 160 (median 47). The studies employed a diversity of microarray platforms (either commercial or custom), and the average number of miRNAs assayed was 778 (ranging from 377 to 1200; data were missing in three papers [23,24,29]). Only five studies [21][22][23]26,27] provided the whole list of differentially expressed miRNAs, while the others presented only a portion of their data. Our pooled dataset included a total of 538 tumour samples and 206 noncancerous control samples (at least), as in some studies, the number of noncancerous control samples was not specified [22].
A total of 439 differentially expressed miRNAs were reported in the eleven miRNA expression profiling studies; 254 were up-regulated and 185 were down-regulated in at least one study. Among the 439 miRNAs, 98 were reported in at least two studies; 77 (78.57%) with a consistent direction (Tables 2 and 3) and 21 with an inconsistent direction (Table 4) among the studies. Among the 77 miRNAs with a consistent direction, 50 were reported to be up-regulated (Table 2) and 27 were reported to be down-regulated (Table 3). One miRNA (miR-155) was reported in eight studies, three miRNAs (miR-21, miR-100 and miR-221) were reported in seven studies and twelve miRNAs were reported in at least five studies, with a consistent direction in all reports ( Table 5). The miRNAs that were consistently reported in at least five studies are shown in Table 5. Although there were no strong disagreements between the individual miRNA profiling studies, the top lists varied considerably from study to study.
Using the Robust Rank Aggregation method, we identified a statistically significant meta-signature of 7 up-and 3 down-regulated miRNAs in PDAC samples compared to noncancerous pancreatic tissues ( Table 6). All meta-signature miRNAs that reached statistical significance after Bonferroni correction were reported  by at least 5 datasets. Majority of the meta-signature miRNAs belong to the broadly conserved seed family (conserved across most vertebrates and bony fish).
Our results from the vote-counting strategy were almost the same with those from the Robust Rank Aggregation method. The ten GO processes and pathways that were most strongly enriched with respect to the metasignature miRNA candidates (miR-155, miR-100, miR-21, miR-221, miR-31, miR-143, miR-23a, miR-217, miR-148a and miR-375) are shown in Table 7.

Experimental validation of the expression levels of the most deregulated miRNAs in patients with PDAC
To determine if the ten most deregulated miRNAs from the meta-analysis (miR-155, miR-100, miR-21, miR-221, miR-31, miR-143, miR-23a, miR-217, miR-148a and miR-375) could be used as diagnostic biomarkers of PDAC, the expression levels of these miRNAs were compared between PDAC tissues and neighbouring noncancerous tissues by qRT-PCR analysis. The results showed that the expression levels of miR-155, miR-100, miR-21, miR-221, miR-31, miR-143 and miR-23a were increased, whereas the levels of miR-217, miR-148a and miR-375 were decreased in the PDAC tissues (all p<0.05). Detailed data are available in Table 8.

Determination of prognostic significance of the candidate miRNAs in PDAC
The clinicopathological characteristics of 78 PDAC patients are shown in Table 9. The expression levels of individual miRNAs along with other well-known potential prognostic clinicopathological factors, such as histology, T category, lymph node metastasis, tumour size, perineural invasion, venous invasion and margin were included in a univariate analysis. With respect to the miRNA expression levels, for the up-regulated miRNAs, a fold-change of ≥2 was defined as high expression, and a fold-change of <2 was defined as low expression; for the down-regulated miRNAs, a fold-change of ≥2 was defined as low expression, and a fold-change of <2 was defined as high expression. Patients with advanced disease (UICC stage IV and concomitance of distant metastases) were excluded because we assumed that the prognosis of these patients (n=8) is determined by the occurrence of relapse or metastasis rather than other  The number of predicted target genes in the process or pathway is shown. Hyp*: corrected hypergeometric p-value.
biological characteristics, such as miRNA expression levels.
Kaplan-Meier survival analysis was used to analyse the association between postoperative survival and the miRNA expression level, and the resulting curves were divided into two classes (high and low expression in comparison with the mean level of miRNA expression as the threshold), as shown in Figure 2.

Discussion
The common drawback of miRNA expression profiling studies is the lack of agreement among several studies. Differences in measurement platforms and lab protocols as well as small sample sizes can render gene expression levels incomparable. Sato et al. [32] and Wang et al. [33] systematically analysed representative miRNA profiling platforms and revealed that each platform is relatively stable in terms of its own intra-reproducibility; however, the inter-platform reproducibility among different platforms is low. Although the ideal method involves the analysis the raw miRNA expression datasets that are pooled together, such a rigorous approach is often impossible due to the unavailability of raw data and the low inter-platform concordance of results among different studies would bring difficulties to the analysis. To overcome these limitations, it might be better to analyse datasets separately and then aggregate the resulting gene lists. In this study, we used a meta-analysis approach to analyse PDAC-specific miRNAs derived from independent  profiling experiments. The well-known vote-counting strategy [12,13] and the recently published Robust Rank Aggregation method [16,17] were employed. The core elements of the two methods were searches for the most recognised miRNAs among the included studies. Two principal methods are used to measure miRNA expression levels: qRT-PCR and microarray hybridisation. The technological merits and drawbacks of qRT-PCR and microarrays for miRNA analysis are similar to those for RNA or genomic DNA quantification [34]. RT-PCR, a semiquantitative method, is labour intensive and provides data for only one, or very few, miRNA(s) per assay. However, the rapid increase in the number of known miRNAs renders this method inefficient on a genomic scale, and it is most likely better used as a tool for validation rather than discovery. Microarrays are the best option for a standardised genome-wide assay that is amenable to high-throughput application [35]. As qRT-PCR detects only preselected miRNAs, mostly the miRNAs that were shown to be differentially expressed in PDAC from normal tissue in other studies, it hinders the discovery of new miRNAs. Most importantly, the results of studies using qRT-PCR analysis [36][37][38][39][40] were consistent with those of microarray-based studies. In addition to the intra-platform deviations between microarray and qRT-PCR analyses [35], we excluded qRT-PCR-based studies and focused on studies using miRNA microarray platforms.
We identified a meta-signature of seven up-and three down-regulated miRNAs. To our knowledge, no metaanalysis of miRNA profiling studies has specifically investigated PDAC. Furthermore, this is the first study that used a combination of the two most commonly used methods in the meta-analysis of miRNA and gene profiling. To determine if the identified miRNAs could be used as diagnostic biomarkers, we experimentally validated the expression of these miRNAs in a set of PDAC samples.
There are several factors that must be considered when choosing miRNAs as candidate diagnostic biomarkers for PDAC. First, the fold-change of the biomarker should be significant enough to discriminate cancerous tissue from benign tissue. As is shown in Tables 2 and 3, the average fold changes of the 10 miRNAs identified in the microarray-based studies were all >2. In addition, the candidate miRNAs should be expressed in a majority of tissues. As was validated by qRT-PCR, the upregulated miRNAs were all expressed in more than 85% of the samples tested (data not shown).
Second, the biological function of each individual miRNA should be thoroughly investigated. A single miRNA may have dozens of targets, and a specific mRNA may be regulated by multiple different miRNAs [7]. A better understanding of the targets of the miRNAs would advance their use in clinical settings. As shown in Table 7, the ten most strongly enriched GO processes and pathways with respect to the meta-signature miRNA candidates were identified. In the GO processes list, regulation of transcription, DNA-dependent; positive regulation of transcription from RNA polymerase II promoter; and positive regulation of transcription, DNA-dependent were ranked as the top three, which is in accordance with the known primary functions of miRNAs [6,7]. Pathways in cancer and Wnt signalling pathways were ranked first in the KEGG and Panther pathway lists, respectively, highlighting the essential roles of miRNAs in cancer development.
Third, there should be adequate information about the pattern of expression of the miRNAs in different types of specimens. It has been indicated that circulating miRNAs in plasma could be more tissue-specific than tumour-specific [41,42]. In the context of the vast inconsistency between tissue-based and plasma-based results [23], we focused on studies that analysed miRNA expression between PDAC tissues and noncancerous pancreatic tissues in humans.
Last but not least, rigorous validation and demonstration of reproducibility in an independent cohort of patients are necessary to confirm the diagnostic value of miRNAs. With this in mind, we experimentally validated 10 candidate miRNAs in PDAC samples and confirmed that these 10 miRNAs were differentially expressed between PDAC tissues and noncancerous pancreatic tissues.
Considering that miRNA expression is able to successfully discriminate normal from cancerous pancreatic tissues, it is tempting to speculate that miRNAs could also predict cancer prognosis. However, our results do not exclude the possibility that other miRNAs are associated with prognosis, as we only studied a meta-signature of 10 miRNAs in a limited number of PDAC samples (n=78). The main reason for the possible association between miRNAs not within this meta-signature and prognosis may centre on the relatively small sample size in our study and others [25,27]. It is quite unrealistic to include all the miRNAs in Kaplan-Meier survival analyses, as it would be very laborious and time-consuming. Thus, commonly, only the candidate miRNAs with the greatest fold changes are included. As mentioned above, although there were no strong disagreements between the individual miRNA profiling studies, the top lists varied considerably from study to study. To remedy this problem, it was critical to identify the most differentially expressed miRNAs. We used a meta-review approach, which combines the results of several individual studies to increase statistical power and to subsequently resolve the inconsistency among different profiling studies. A meta-signature of seven up-and three down-regulated miRNAs was identified. Then, in independent patient samples, miR-21, miR-31 and miR-375 were found to be associated with cancer prognosis. From our point of view, great caution should be taken in future research in this field. To start, sample sizes should be increased to minimise random sampling error. Next, as it is impossible for every researcher to use the same platform, reliable microarray platforms should be employed in all experiments. Finally, it is advisable to obtain an integrated view of the candidate miRNAs from many studies to avoid one-sided opinions, as great discrepancies exist among the studies.
Our study presents a method to resolve the differences that exist among studies and might have some clinical significance for research on miRNAs in PDAC. The 10 identified miRNAs may be used as diagnostic biomarkers or even therapeutic targets. In addition to our metaanalysis, we performed further studies examining the expression of the candidate miRNAs in PDAC samples and confirmed miR-21, miR-31 and miR-375 as potential prognostic biomarkers for PDAC.