An online survival analysis tool to rapidly assess the effect of 22,277 genes on breast cancer prognosis using microarray data of 1,809 patients.


PRECLINICAL STUDY
Balazs Györffy
Andras Lanczky
Aron C. Eklund
Carsten Denkert
Jan Budczies
Qiyuan Li
Zoltan Szallasi
Received: 3 November 2009 / Accepted: 3 December 2009 / Published online: 18 December 2009
© Springer Science+Business Media, LLC. 2009
Abstract
Validating prognostic or predictive candidate genes in appropriately powered breast cancer cohorts is of utmost interest. Our aim was to develop an online tool to draw survival plots, which can be used to assess the relevance of the expression levels of various genes on the clinical outcome both in untreated and treated breast cancer patients. A background database was established using gene expression data and survival information of 1,809 patients downloaded from GEO (Affymetrix HGU133A and HGU133+2 microarrays). The median relapse free survival is 6.43 years, 968/1,231 patients are estrogen-receptor (ER) positive, and 190/1,369 are lymph-node positive. After quality control and normalization, only probes present on both Affymetrix platforms were retained (n = 22,277). In order to analyze the prognostic value of a particular gene, the cohorts are divided into two groups according to the median (or upper/lower quartile) expression of the gene. The two groups can be compared in terms of relapse free survival, overall survival, and distant metastasis free survival. A survival curve is displayed, and the hazard ratio with 95% confidence intervals and logrank P value are calculated and displayed. Additionally, three subgroups of patients can be assessed: systemically untreated patients, endocrine-treated ER positive patients, and patients with a distribution of clinical characteristics representative of those seen in general clinical practice in the US. Web address: www.kmplot.com. We used this integrative data analysis tool to confirm the prognostic power of the proliferation-related genes TOP2A and TOP2B, MKI67, CCND2, CCND3, CCNE2, as well as CDKN1A, and TK2. We also validated the capability of microarrays to determine estrogen receptor status in 1,231 patients. The tool is highly valuable for the preliminary assessment of biomarkers, especially for research groups with limited bioinformatic resources.
Keywords: Survival analysis · Breast cancer · Prognosis
B. Györffy (✉) · A. Lanczky
Joint Research Laboratory of the Hungarian Academy of Sciences and the Semmelweis University, Semmelweis University 1st Department of Pediatrics, Bokay u. 53-54, 1083 Budapest, Hungary
e-mail: zsalab2@yahoo.com
A. Lanczky
Pazmany Peter University, Budapest, Hungary
A. C. Eklund · Q. Li · Z. Szallasi
Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
C. Denkert · J. Budczies
Charité Universitaetsmedizin, Berlin, Germany
Z. Szallasi
Children’s Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology (CHIP@HST), Harvard Medical School, Boston, MA, USA
Breast Cancer Res Treat (2010) 123:725–731
DOI 10.1007/s10549-009-0674-9

Background
Biomarkers are a readily measurable set of parameters with directly applicable information on the clinical course of cancer. The first biomarkers were established at the cellular, histological, or whole organism level. For example, tumor grade has traditionally been regarded as an important indicator of breast cancer prognosis [1]. Also, Adjuvant! Online, an algorithm based on SEER data (Surveillance, Epidemiology, and End Results, an authoritative source of information on cancer incidence and survival in the United States), integrates various clinical (age, nodal status) and histopathological parameters (estrogen receptor, size, grade) to predict the 10-year mortality rate in breast cancer [2, 3]. With the introduction of biomarkers such as estrogen receptor and HER2 in evaluating the clinical course of breast cancer, biomarker discovery has shifted toward a more molecular level, with a large number of individual gene or protein expression levels being tested. To date, numerous additional genes have been suggested as being capable of predicting prognosis in breast cancer [4]. This shift has been further driven by the fact that qualitative biomarkers are usually difficult to assess in a consistent fashion; e.g., the concordance of tumor grade assessments by three independent pathologists is less than 50% [5].
Following the identification of new gene expression-based biomarkers, various steps of independent validation must be completed. While direct measurement of gene expression levels, e.g., by QRT–PCR, is the most reliable method to do this, it is often desirable to test a few candidate genes without major further investment in order to choose the most promising candidates and eliminate those that are most likely to fail. Microarray cohorts combined with appropriate clinical data offer exactly such a cost-effective tool to prescreen potential new biomarkers.
The accuracy of microarray-based gene expression measurements has been evaluated by a wide array of diverse studies [6–8], leading to the general conclusion that the technology is a powerful surveyor of gene expression changes when its limitations are considered properly. While absolute gene expression levels are hard to estimate, relative gene expression levels can be measured in a consistent fashion; therefore, a preliminary test to evaluate prognostic biomarkers based on their relative gene expression levels is a prudent exploitation of already existing clinical microarray cohorts.
The Kaplan–Meier estimator (also known as the product limit estimator) estimates the survival function from lifetime data. An important benefit of the Kaplan–Meier curve is that the method takes into account ‘censored’ data, that is, losses from the cohort before the final outcome is observed (for instance, if a patient withdraws from a study). When no truncation or censoring occurs, the Kaplan–Meier curve is equivalent to the empirical distribution [9]. The association between a clinical parameter (or biomarker) and survival can be visualized by drawing a Kaplan–Meier plot in which patients are split into groups according to the parameter.
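As a rough illustration of the product-limit idea described above, the following Python sketch computes a Kaplan–Meier curve from follow-up times and event indicators. The function name and the toy data are hypothetical; the Kaplan–Meier plotter itself performs its calculations in R, so this is only a minimal sketch of the estimator, not the tool's code.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up time of each patient
    events : 1 if the event (e.g. relapse) was observed, 0 if censored
    """
    event_counts = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)      # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = event_counts.get(t, 0)
        if d:
            # product-limit step: multiply by the fraction surviving past t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]     # censored patients shrink the risk set without a step
    return curve


# Toy data (hypothetical): patients at t=2 and t=4 are censored.
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 1])
```

Without censoring, the same function reduces to the empirical survival fraction, which matches the equivalence stated in the text.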
Our aim was to use the data generated in gene expression studies to develop an online survival analysis tool that can be used to assess the effect of single genes on breast cancer prognosis. Since many of the current ASCO proposed proliferation-related genes [10] do not hold sufficient evidence to be introduced in clinical practice, we also aimed to assess the effect of their expression on survival. Finally, we evaluated the capability of microarray data to predict estrogen receptor (ER) status.
Methods
A database was established using gene expression data downloaded from GEO. For this, the keywords "breast", "cancer", "gpl96", and "gpl570" were used in GEO (http://www.ncbi.nlm.nih.gov/geo/). Only publications with available raw data, clinical survival information, and at least 30 patients were included. Only Affymetrix HG-U133A (GPL96) and HG-U133 Plus 2.0 (GPL570) microarrays were considered, because they are frequently used and because these two particular arrays have 22,277 probe sets in common. The use of nearly identical platforms is important since different platforms for gene-expression profiling measure expression of the same gene with varying precision, on different relative scales, and with different dynamic ranges [11]. An overview of the clinical data is presented in Table 1.
After an initial quality control, redundant samples (n = 384) were excluded [12]. The raw CEL files were MAS5 normalized in the R statistical environment (www.r-project.org) using the affy Bioconductor library [13]. MAS5 can be applied to individual chips, making future extensions of the database uncomplicated. Moreover, MAS5 ranked among the best normalization methods when compared to the results of RT-PCR measurements in our recent study [8]. Then, only probes measured on both GPL96 and GPL570 were retained (n = 22,277). At this stage, we performed a second scaling normalization to set the average expression on each chip to 1,000 to avoid batch effects [14].
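The second scaling step can be pictured with a short Python sketch. It rescales one chip so that its mean expression equals 1,000. The function name is hypothetical, and the use of a plain arithmetic mean is an assumption: the text only states that the "average expression" is set to 1,000.

```python
def scale_chip(values, target=1000.0):
    """Rescale one chip's expression values so their mean equals `target`.

    Assumption: 'average' is taken as the plain arithmetic mean over the
    retained probe sets; the source does not specify this further.
    """
    mean = sum(values) / len(values)
    if mean == 0:
        raise ValueError("cannot scale a chip with zero mean expression")
    factor = target / mean
    return [v * factor for v in values]


# Toy chip (hypothetical values) with mean 2,000 before scaling.
scaled = scale_chip([500.0, 1500.0, 4000.0])
```

Because each chip is scaled independently, adding new chips later does not require renormalizing the existing ones, which fits the stated motivation for choosing per-chip methods.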
The Kaplan–Meier plotter is set up using a central server which can be reached over the internet. The background database is handled by a MySQL server, which integrates gene expression and clinical data simultaneously. Data is loaded into the R statistical environment, where calculations are performed. The package ‘survival’ is used to calculate and plot Kaplan–Meier survival curves, and the number-at-risk is indicated below the main plot. Hazard ratio (and 95% confidence intervals) and logrank P are calculated and displayed. The results are returned to the user through the webpage. The system is summarized in Fig. 1.
In order to determine expression of the ER gene ESR1, we used the results from Gong et al. [15], who found that the probe set 205225_at had the highest mean and median expression values, the greatest range of expression values, and the strongest correlation with clinical ER status, and was therefore suggested for future ESR1 determinations. We also used their suggested threshold of 500 to determine ER status of the samples.
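The thresholding rule is simple enough to write down directly. The sketch below assumes each sample is a mapping from probe set ID to normalized expression; that data layout, the function name, and the choice of a strict ">" comparison at exactly 500 are assumptions not stated in the text.

```python
ESR1_PROBE = "205225_at"   # probe set suggested by Gong et al. [15]
ER_THRESHOLD = 500.0       # cutoff suggested by Gong et al. [15]


def er_positive(sample):
    """Call a sample ER positive if ESR1 expression exceeds the cutoff.

    Assumption: values exactly at the cutoff are called negative; the
    source does not say how ties are handled.
    """
    return sample[ESR1_PROBE] > ER_THRESHOLD


# Hypothetical samples for illustration only.
calls = [er_positive({ESR1_PROBE: 4200.0}), er_positive({ESR1_PROBE: 120.0})]
```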

When comparing data from Surveillance, Epidemiology, and End Results (SEER), the population-based tumor registry program of the National Cancer Institute [16], to the overall characteristics of the patients used in our analysis (only patients with all available clinical data), some differences were observed. These differences could influence the interpretation of the resulting Kaplan–Meier plot. Therefore, to provide an additional filter for the analysis, a randomly selected set of patients with similar, over-represented clinical characteristics was removed.
Results
We identified 1,809 unique patients meeting our criteria in GEO. The median relapse free survival is 6.43 years, 968/1,231 patients are estrogen-receptor positive by histological or radioimmunoassay based evaluation, and 190/1,369 are lymph-node positive. Furthermore, 1,593 patients have relapse free survival data, 594 have overall survival data, and 767 have distant metastasis free survival data.
In order to analyze the association between a queried gene and survival, the samples are grouped according to the median (or upper or lower quartile) expression of the selected gene, and then the two groups are compared by a Kaplan–Meier plot. Before running the analysis, the patients can be filtered using ER status, lymph node status, and/or grade. Additionally, as an alternative to relapse free survival, overall survival and distant metastasis free survival can be employed. The web address is www.kmplot.com.
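To make the grouping and comparison step concrete, the following Python sketch splits samples at the median expression and compares the two groups with a two-group logrank test. It is an illustration under stated assumptions, not the tool's R code: the function names are hypothetical, samples equal to the median are assigned to the low group by assumption, and the hazard ratio (which requires a Cox model) is not covered.

```python
import math
from statistics import median


def split_by_median(expression):
    """True = high-expression group, False = low-expression group."""
    cutoff = median(expression)
    return [x > cutoff for x in expression]


def logrank_p(times, events, high):
    """Two-group logrank P value (chi-square approximation, 1 degree of freedom)."""
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for ti in times if ti >= t)                        # at risk, both groups
        n1 = sum(1 for ti, h in zip(times, high) if ti >= t and h)   # at risk, high group
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)  # events at t
        d1 = sum(1 for ti, e, h in zip(times, events, high)
                 if ti == t and e and h)                             # events at t, high group
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


# Toy data (hypothetical): high expressers relapse earlier.
expr = [9, 8, 7, 6, 5.5, 4, 3, 2, 1, 0.5]
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1] * 10
p = logrank_p(times, events, split_by_median(expr))
```

A quartile split would only change the cutoff passed to the grouping step; the comparison itself stays the same.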
Many of the published microarray cohorts used patient selection criteria corresponding to the goals of the particular study. Therefore, the patients in our database may not be representative of breast cancer patients in general. Users of our service may be interested in how a given gene is associated with outcome in a general ‘all comer’ cohort, as might be seen in everyday clinical practice. For this, we established a patient cohort similar to SEER published prevalences. The eliminated samples were ER positive, node negative patients in all three grades from different datasets. The resulting reduced database includes 500 patients, and the prevalences of the individual breast cancer subtypes and clinical parameters are similar to the actual US prevalence numbers (Table 2).
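The source does not describe the randomization algorithm in detail, so the following Python sketch should be read only as one plausible way to thin out an over-represented stratum at random. The function name, the patient record layout, the fixed seed, and the number of patients dropped are all hypothetical.

```python
import random


def drop_from_stratum(patients, in_stratum, n_drop, seed=0):
    """Randomly remove `n_drop` patients that satisfy `in_stratum`.

    Patients outside the stratum are always kept. A fixed seed keeps the
    selection reproducible (an assumption; the source gives no details).
    """
    candidates = [i for i, p in enumerate(patients) if in_stratum(p)]
    if n_drop > len(candidates):
        raise ValueError("not enough patients in the stratum to remove")
    dropped = set(random.Random(seed).sample(candidates, n_drop))
    return [p for i, p in enumerate(patients) if i not in dropped]


# Hypothetical cohort: five ER positive, node negative patients and three others.
cohort = [{"er": True, "node": False} for _ in range(5)]
cohort += [{"er": False, "node": True} for _ in range(3)]
reduced = drop_from_stratum(
    cohort, lambda p: p["er"] and not p["node"], n_drop=2
)
```

In practice the number removed per stratum would be chosen so that the remaining proportions approach the SEER prevalences listed in Table 2.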
A clinician might be interested in a specific clinical question related to the treatment of the patients. Therefore, we established two options for additional filtering: the first cohort represents a truly prognostic setting (e.g., systemically untreated patients, n = 809) and the second cohort the endocrine-treated ER positive patients (n = 414).
Table 1 Clinical properties of the microarray datasets used in the analysis
GEO ID | Platform | ER+ | Lymph node + | Relapse event | Average relapse free survival | Grade: 1/2/3 | Age (years) | Size (cm) | # of CEL files after quality control | References
GSE12276 | GPL570 | NA | NA | 204 (100%) | 2.2 ± 1.8 | NA | NA | NA | 204 | [21]
GSE16391 | GPL570 | 55 (100%) | 33 (60%) | 55 (100%) | 3.0 ± 1.2 | 2/35/18 | 61 ± 9 | NA | 55 | [22]
GSE12093 | GPL96 | 136 (100%) | 0 (0%) | 20 (15%) | 7.7 ± 3.2 | NA | NA | NA | 136 | [23]
GSE11121 | GPL96 | NA | 0 (0%) | 46 (23%) | 7.8 ± 4.2 | 58/136/35 | NA | 2.1 ± 1 | 200 | [24]
GSE9195 | GPL570 | 77 (100%) | 36 (47%) | 13 (17%) | 7.8 ± 2.5 | 14/20/24 | 64 ± 9 | 2.4 ± 1 | 77 | [25]
GSE7390 | GPL96 | 134 (68%) | NA | 91 (46%) | 9.3 ± 5.6 | 30/83/83 | 46 ± 7 | 2.2 ± 0.8 | 198 | [26]
GSE6532 | GPL96 | 70 (86%) | 22 (27%) | 19 (23%) | 6.1 ± 3.1 | 0/54/1 | 64 ± 10 | 2.5 ± 1.2 | 82 | [27]
GSE5327 | GPL96 | 0 (0%) | NA | 11 (19%) | 6.8 ± 3.1 | NA | NA | NA | 58 | [28]
GSE4922 | GPL96 | 1 | 0 | 0 | 12.17 | 1 | 69 | 2.2 | 1 | [29]
GSE3494 | GPL96 | 213 (85%) | 84 (33%) | NA | NA | 67/128/54 | 62 ± 14 | 2.2 ± 1.3 | 251 | [30]
GSE2990 | GPL96 | 73 (72%) | 15 (15%) | 40 (39%) | 6.6 ± 3.9 | 27/20/36 | 58 ± 12 | 2.3 ± 1.1 | 102 | [31]
GSE2034 | GPL96 | 209 (73%) | 0 | 107 (37%) | 6.5 ± 3.5 | NA | NA | NA | 286 | [32]
GSE1456 | GPL96 | NA | NA | 40 (25%) | 6.2 ± 2.3 | 28/58/61 | NA | NA | 159 | [33]
Total | | 968 (78%) | 190 (15%) | 689 (43%) | 6.4 ± 4.1 | 198/534/312 | 57 ± 13 | 2.2 ± 1.1 | 1,809 |
Parentheses: percentage of patients within the dataset

The ER status as determined by IHC was available for 1,231 patients, which we used to assess the efficacy of ER determination on the microarray. The ER-positive samples (n = 968) had a markedly higher expression of the ESR1 gene than did the ER negative samples (n = 263). In Fig. 2, we illustrate the distribution of ER positive and ER negative samples as measured by microarray and IHC. 90.2% of the ER positive (945 out of 1,048), and 89.8% of ER negative (160 out of 183) predictions were correct.
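The counts quoted above can be arranged into a 2 × 2 table and the proportions recomputed directly. The short Python sketch below does only that arithmetic; the variable names are illustrative. Taking the printed counts at face value, 945/1,048 is about 90.2%, 160/183 is about 87.4%, and the combined proportion of correct calls, 1,105/1,231, is about 89.8%.

```python
# Counts as quoted in the text.
predicted_pos, correct_pos = 1048, 945   # microarray ER positive calls, confirmed by IHC
predicted_neg, correct_neg = 183, 160    # microarray ER negative calls, confirmed by IHC
ihc_pos, ihc_neg = 968, 263              # IHC reference totals

false_pos = predicted_pos - correct_pos  # called positive, IHC negative
false_neg = predicted_neg - correct_neg  # called negative, IHC positive

ppv = correct_pos / predicted_pos        # share of positive calls that were correct
npv = correct_neg / predicted_neg        # share of negative calls that were correct
accuracy = (correct_pos + correct_neg) / (predicted_pos + predicted_neg)
```

The row and column totals are mutually consistent: 945 + 23 = 968 IHC positive and 160 + 103 = 263 IHC negative samples.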
Markers of cell proliferation have been proposed and evaluated as prognostic factors in breast cancer. We computed Kaplan–Meier plots for the markers Ki67, cyclin D, cyclin E, the cyclin inhibitors p27 and p21, thymidine kinase, and topoisomerase II to assess their effect on prognosis (Table 3; Fig. 3).
Table 2 Overall clinical characteristics of the patients in our database, and the subset designed to match US prevalences, compared to SEER reported US prevalences
Characteristic | All (a): n | All (a): % | Prevalence-matched subset: n | Prevalence-matched subset: % | SEER: %
ER+ | 774 | 87.8 | 412 | 82.4 | 76.3
ER- | 108 | 12.2 | 88 | 17.6 | 23.7
Node+ | 176 | 20.0 | 168 | 33.6 | 36.5
Node- | 706 | 80.0 | 332 | 66.4 | 63.5
Grade 1 | 166 | 18.8 | 86 | 17.2 | 17.1
Grade 2 | 469 | 53.2 | 219 | 43.8 | 44.0
Grade 3 | 247 | 28.0 | 195 | 39.0 | 38.9
Total n | 882 | | 500 | |
(a) Only samples for which all clinical data was available simultaneously
Fig. 2 Box plot showing normalized expression of ESR1 (probe set 205225_at) in 1,231 tumors divided into two groups based on the IHC diagnosis of ER (1 = ER positive, n = 968; 0 = ER negative, n = 263). Vertical axis: normalized value of ESR1 expression (0 to 40,000); horizontal axis: IHC:1 and IHC:0
Fig. 1 Flowchart of the Kaplan–Meier plotter. Boxes shown: GEO; raw CEL files (n = 2,193); quality control and MAS5 normalization (remaining n = 1,809); combining platforms and second scaling normalization (average expression = 1,000); clinical annotation; SEER data; MySQL database; query (http://www.kmplot.com); filtering for gene expression and input parameters in R; plotting in R; graphical feedback of KM plot and p value

Discussion
The discovery of prognostic markers is a high priority task in breast cancer biomarker research. In our study, we combined raw data from several studies; this enabled us to treat the data as a single dataset, which makes existing algorithms directly applicable. By combining multiple datasets, the statistical power is dramatically increased. Prior to our study, no suitable tool was available which could help to estimate the prognostic value of any selected gene in a large cohort of clinical patients. In our service, after dividing the patients into two groups based on the expression of the selected gene, a Kaplan–Meier plot is generated. In total, 1,809 patients are used, of which 1,593 have relapse free survival data, 594 have overall survival data, and 767 have distant metastasis free survival data. As our service performs the requested analysis in real time on the original data, the extension of the analysis (e.g., the inclusion of additional samples or filtering for other clinical parameters) will be easily feasible in the future.
Since gene expression arrays might be used to confirm ER status, we implemented an estimation of ER status based on gene expression data. Previous studies have shown significant correlation between mRNA concentrations and routinely established (IHC based) clinical ER status [17–19]. In the study of Gong et al. [15], the same platform was used as in our study. They used immunohistochemistry to independently measure the ER status and to establish a statistical threshold for ESR1 mRNA level to assign ER status to tumor samples. They suggested using an ESR1 mRNA cutoff value of 500 to identify ER positive status with an overall accuracy of 90%. By using the above threshold in the 1,231 patients with available ER status data, we also achieved an overall accuracy of 90%. Thus, we confirmed the capability of microarrays to measure ER status. Because we performed a second scaling normalization, the original MAS5 expression values (as used in the study of Gong et al.) were slightly transformed. However, this transformation made it possible to compare gene expression measurements made on two different microarray platforms. On our webpage, the ER status for all
Table 3 The association between proliferation genes and relapse-free survival
Marker | Gene name | Affymetrix ID | HR (95% CI) | RFS p
MKI67 | Antigen identified by monoclonal antibody Ki-67 | 212020_s_at | 0.95 (0.82–1.1) | 1
 | | 212021_s_at | 1.13 (0.97–1.31) | 1
 | | 212022_s_at | 1.8 (1.5–2.1) | 1.14E-12
 | | 212023_s_at | 1.3 (1.1–1.5) | 0.0352
CCND1 | Cyclin D1 | 208711_s_at | 1.3 (1.1–1.5) | 0.0374
 | | 208712_at | 1.07 (0.93–1.25) | 1 (a)
CCND2 | Cyclin D2 | 200951_s_at | 1.2 (1.0–1.4) | 0.946
 | | 200952_s_at | 0.62 (0.53–0.72) | 1.23E-08
 | | 200953_s_at | 0.68 (0.58–0.79) | 9.02E-06
CCND3 | Cyclin D3 | 201700_at | 0.7 (0.6–0.82) | 0.000114
CCNE1 | Cyclin E1 | 213523_at | 1.2 (1.1–1.4) | 0.1518
CCNE2 | Cyclin E2 | 205034_at | 2.5 (2.1–2.9) | <1e-16
 | | 211814_s_at | 1.2 (1.0–1.3) | 1
CDKN1B | Cyclin-dependent kinase inhibitor 1B (p27, Kip1) | 209112_at | 1.3 (1.1–1.5) | 0.0132
CDKN1A | Cyclin-dependent kinase inhibitor 1A (p21, Cip1) | 202284_s_at | 0.68 (0.59–0.79) | 1.21E-05
TK1 | Thymidine kinase 1, soluble | 202338_at | 1.2 (1.0–1.4) | 0.506
TK2 | Thymidine kinase 2, mitochondrial | 204227_s_at | 0.53 (0.45–0.62) | 7.26E-15 (a)
 | | 204276_at | 0.67 (0.58–0.78) | 4.18E-06
 | | 204277_s_at | 0.81 (0.70–0.94) | 0.1496
TOP2A | Topoisomerase (DNA) II alpha 170 kDa | 201291_s_at | 2.3 (2.0–2.7) | <1e-16
 | | 201292_at | 1.8 (1.6–2.1) | 2.05E-13 (a)
TOP2B | Topoisomerase (DNA) II beta 180 kDa | 211987_at | 1.7 (1.5–2.0) | 4.4E-11
The patients were divided into two groups as having higher or lower expression as compared to the median. Bonferroni multiple testing correction was applied when generating the P value
RFS relapse free survival, HR hazard ratio
(a) See Kaplan–Meier plots in Fig. 3
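The Bonferroni correction mentioned in the table note multiplies each raw P value by the number of tests and caps the result at 1, which is why several entries in Table 3 are exactly 1. The Python sketch below illustrates this; the function name and the example values are hypothetical, and the number of tests the authors corrected for is not stated in this section.

```python
def bonferroni(p_values, n_tests=None):
    """Bonferroni-adjust raw P values, capping each result at 1.

    n_tests defaults to the number of P values supplied (an assumption;
    the source does not state how many tests were counted).
    """
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw P values, not taken from Table 3.
adjusted = bonferroni([0.01, 0.5, 0.2])
```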