What data are collected?

The Generations Study involves over 100,000 women aged 16 years or older and resident in the UK. Participants are asked to fill out follow-up questionnaires every few years and are occasionally invited to provide additional samples, as described below.


Questionnaires

All participants completed the recruitment questionnaire when they joined the study, mostly (97%) between 2004 and 2009. Follow-up questionnaires were sent around 2 ½, 6, 9 ½, and 13 years after recruitment, and further follow-up questionnaires are planned. The questionnaires ask participants about their exposure, and changes in exposure, to established and potential risk factors, such as:

  • Body mass index (BMI), now and earlier in life
  • Number of children
  • Alcohol consumption and smoking
  • History of breast disease or other medical conditions
  • How often they engage in physical activity
  • Use of the oral contraceptive pill
  • Use of hormone replacement therapy (HRT)

The first and second follow-up questionnaires were completed on paper, with 99% and 97% of questionnaires returned, respectively. The third questionnaire was offered both online and on paper and achieved a 96% response rate. The fourth questionnaire, sent 13 years after recruitment, was affected by the Covid-19 pandemic; nevertheless, 83.5% of the participants we contacted by post or email returned a completed questionnaire.

Blood Samples

92% of participants who completed the recruitment questionnaire have provided blood samples, which have been analysed to investigate potential genetic and hormonal risk factors.

Cancer incidence and mortality

Cancer diagnoses self-reported by participants, and deaths reported by family members, are confirmed through the national cancer registration and death registers (the National Health Service Central Registers).

Pathology Specimens

We collect paraffin-embedded tumour blocks from participants' breast cancers and are processing these blocks to extract DNA and RNA for molecular assays.


Mammograms

We initially obtained serial pre-diagnostic screening mammograms for participants in a nested case-control analysis, and have since expanded this collection to cover all members of the cohort who are of screening age.


Genotyping

We have created a nested case-control dataset of incident breast cancer cases and controls matched on age, ethnicity, blood sample availability, and time at risk. Participants' blood samples are sent to external laboratories for genotyping, and we have used the resulting data to calculate polygenic risk scores (PRS) for cases and controls.
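This page does not say how the polygenic risk scores were computed. A common construction is a weighted sum of each participant's effect-allele dosages, with per-allele weights (log odds ratios) taken from published association studies, often then expressed in standard deviations of the controls. The sketch below assumes that approach; the variant IDs, weights, and function names are hypothetical placeholders, not the study's model.

```python
from statistics import mean, stdev


def polygenic_risk_score(dosages: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages.

    dosages: effect-allele dosage per variant (0 to 2; fractional if imputed).
    weights: per-allele log odds ratio per variant (hypothetical values here).
    Variants with no dosage for this participant are skipped.
    """
    return sum(w * dosages[v] for v, w in weights.items() if v in dosages)


def standardise(scores: list[float], reference: list[float]) -> list[float]:
    """Express scores in standard deviations of a reference group (e.g. controls)."""
    mu, sd = mean(reference), stdev(reference)
    return [(s - mu) / sd for s in scores]


# Hypothetical example: three variants, one participant.
weights = {"variant_1": 0.10, "variant_2": -0.05, "variant_3": 0.20}
dosages = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
print(polygenic_risk_score(dosages, weights))
```

Standardising against controls, rather than the whole dataset, keeps the scale comparable between cases and controls when estimating risk per standard deviation of PRS.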

Hormones and other assays

For a nested breast cancer case-control dataset, we hold assay measurements of oestradiol, testosterone, C-peptide, prolactin, sex hormone-binding globulin (SHBG), oestrone, oestrone sulphate, progesterone, insulin-like growth factor 1 (IGF-1), leptin, and anti-Müllerian hormone (AMH).


Physical activity

For a smaller subset of Generations Study participants, we have collected physical activity data using triaxial accelerometers recording at 100 Hz, worn continuously (24 hours a day) for 8 days. Data collection was suspended during the Covid-19 pandemic but has since restarted, now focusing on participants with breast cancer.
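To give a sense of scale, 8 days of continuous recording at 100 Hz yields 8 × 86,400 × 100 = 69,120,000 samples per axis, or about 207 million values across the three axes. This page does not describe how these data are processed; the sketch below assumes one widely used summary, the Euclidean norm minus one (ENMO, in g, with negative values set to zero), averaged over fixed epochs. The function names and the 5-second epoch length are illustrative assumptions, not the study's pipeline.

```python
import math

SAMPLE_RATE_HZ = 100  # from the text: 100 Hz recording
WEAR_DAYS = 8         # from the text: 8-day continuous wear


def samples_per_axis(days: int = WEAR_DAYS, rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples recorded on each axis over the wear period."""
    return days * 24 * 60 * 60 * rate_hz


def enmo(x: list[float], y: list[float], z: list[float]) -> list[float]:
    """Euclidean norm minus one (in g) per sample, truncated at zero."""
    return [max(math.sqrt(a * a + b * b + c * c) - 1.0, 0.0) for a, b, c in zip(x, y, z)]


def epoch_means(values: list[float], epoch_seconds: int = 5,
                rate_hz: int = SAMPLE_RATE_HZ) -> list[float]:
    """Mean of values over consecutive non-overlapping epochs.

    A trailing partial epoch is dropped.
    """
    n = epoch_seconds * rate_hz
    return [sum(values[i:i + n]) / n for i in range(0, len(values) - n + 1, n)]


print(samples_per_axis())  # 69120000
```

Real accelerometer pipelines typically also calibrate the raw signal and detect non-wear periods before computing such summaries; those steps are omitted here.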

Urine Samples

We have collected urine samples from a subset of pre- and post-menopausal women who were not using oral or other hormonal contraceptives.