Research Data Access

Over the last five years, we have been collecting data from families across the UK with children with intellectual disability of known genetic aetiology. IMAGINE ID has successfully collected the largest dataset of genotypic and phenotypic data on children with intellectual disability, with the aim of collaborating and sharing our data. The IMAGINE ID research data are available as a resource for the scientific community to maximize the value of the data for research and eventual patient and public benefit. The IMAGINE ID database will be available to the scientific community by the end of 2019. Detailed information on requesting data, data availability, data security and patient confidentiality is given below.

We are accepting data requests for genetic and phenotypic data. To apply for data please fill out our Data Access Form and submit it to ich.imagineid@ucl.ac.uk

From May/June 2019 all requests for genetic data linked to phenotypic data will be reviewed through the METADAC committee. Committee dates TBC.

The IMAGINE ID database will be uploaded to the European Genome-Phenome Archive and the UK Data Archive by the end of 2019.

Requesting data

The IMAGINE ID dataset has been obtained in confidence from study participants. Preserving the confidentiality of the data is of paramount importance. The IMAGINE ID database does not contain any individual identifiers such as names, addresses, telephone numbers, NHS numbers and the like, conforming to the UK Data Protection Act.

IMAGINE ID data cannot be used for commercial purposes. Data sharing at IMAGINE ID involves an application process for bona fide researchers with an established scientific record. This approach ensures the reputations of the funding bodies, the IMAGINE ID team and its participants are not compromised through unethical, premature or opportunistic data analysis. We follow the MRC definition of bona fide research:

MRC Policy and Guidance on Sharing of Research Data from Population and Patient Studies (page 24)

In line with this definition, we will accept applications from bona fide researchers who:

  1. conduct bona fide research. This involves high quality, ethical projects for research purposes using rigorous scientific methods. There must be an intention to publish the research findings for wider scientific and eventual public benefit, without restrictions and with minimal delay.
  2. have a formal relationship with a bona fide research organisation, which is an established academic institution, research body or organisation with the capability to lead or participate in high quality, ethical research. It is not a requirement that research is the primary business of that organisation, or that the organisation is publicly financed. In this context, a public-private partnership may qualify as a bona fide research organisation.

Please see the IMAGINE ID Data Access Policy and IMAGINE ID Data Access Form for full details of the application and approval process.

Data overview

The IMAGINE ID data available to bona fide researchers include demographic, genetic, medical, mental health diagnoses and adaptive behaviour data.

The full data catalogue for data collected online for the IMAGINE ID cohort is available here

The data catalogue for data collected face-to-face on a subset of the IMAGINE ID cohort is available here

The face-to-face data are collected on children with the following genetic changes: 1q21.1 deletion/duplication, 15q11.2 deletion/duplication, 15q13.3 deletion/duplication, 16p11.2 deletion/duplication, 22q11.2 deletion/duplication, Kleefstra syndrome (9q34 deletion), SHANK3 deletion, NRXN1 deletion and IDIC 15 duplication.

Data security

The approved projects will require evidence of ethical approval. Ethical approval from the data user’s local ethical committee is the responsibility of the data user.

IMAGINE ID data can be provided without restriction within the EEA, which comprises all countries in the European Union, Iceland, Liechtenstein, Norway and the States of Guernsey.

IMAGINE ID data can also be transferred to a country or territory outside the EEA if the applicant and their collaborators provide an adequate level of protection of personal data and operate under the Data Protection Scheme in place in their country. If requested by the IMAGINE ID team, a copy of this Data Protection registration should be sent to the IMAGINE ID Contact Researcher.

It is the responsibility of IMAGINE ID data users to ensure that no participant’s identity is disclosed under any circumstances. IMAGINE ID data users must also preserve the confidentiality of the data in outputs and publications (for example, tables with small numbers might lead to the identification of individuals).
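The small-numbers risk in published tables can be reduced by masking low counts before release. The following is a minimal sketch, not the IMAGINE ID team's own procedure: the threshold of 5, the function name and the inheritance categories are all assumptions made for illustration, since the policy does not state a minimum cell size.

```python
from collections import Counter

# Assumption: a minimum cell size of 5. The policy does not state a threshold.
MIN_CELL_SIZE = 5


def suppress_small_counts(counts, min_cell_size=MIN_CELL_SIZE):
    """Return a copy of a frequency table with counts below the threshold masked."""
    return {
        category: (n if n >= min_cell_size else f"<{min_cell_size}")
        for category, n in counts.items()
    }


# Hypothetical categories and counts, for illustration only.
observed = Counter(["de novo"] * 12 + ["maternal"] * 7 + ["paternal"] * 2)
table = suppress_small_counts(observed)
```

Here the two "paternal" cases are reported as "<5" rather than as an exact count. Applicants should confirm the appropriate threshold with the IMAGINE ID team or their own institution.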

It is forbidden to match or attempt to match individual records to any other data.

IMAGINE ID data are held securely in environments that meet the ISO 27001 standard. Secure access controls, such as passwords and firewalls, must be in place to keep the data secure. IMAGINE ID data users should access the provided dataset using a network drive set up by the academic organisation where they work, and avoid keeping datasets on their own PC or laptop unless the device is encrypted (minimum AES-256). This network drive should be password protected.

IMAGINE ID data will be transferred encrypted and password protected, and the passwords must not be shared with anyone not named on the Data Request Form.

IMAGINE ID data should be retained only for as long as it is needed. On completion of the project, the data must no longer be used and must be destroyed securely.

Participant confidentiality

The IMAGINE ID dataset has been obtained in confidence from study participants. Preserving the confidentiality of the data is of paramount importance.

The IMAGINE ID database does not contain any individual identifiers such as names, addresses, telephone numbers, NHS numbers and the like, conforming to the UK Data Protection Act.

Extra restrictions have been applied to the data released to external IMAGINE ID data users. We use two basic tools to achieve this:

  1. Participants are identified using an anonymised ID that will be different for each IMAGINE ID data sharing project. The purpose is to avoid linkages that could potentially result in the identification of individuals. Each identifier is unique, but members of the same family share the same prefix and are distinguished by the last two digits. For example, a single member of a family may be A0001_01, while two members of the same family may be A0002_01 and A0002_02.
  2. All datasets will be stripped of specific variables that can create a risk of participant identification (such as complete dates of birth and death) or that have low prevalence rates (such as medical event dates, job title, etc.), which could potentially enable the identification of subjects with unusual characteristics.
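The identifier scheme in item 1 can be illustrated with a short sketch that groups participants by family within a single data extract. The pattern below is an assumption inferred only from the examples A0001_01 and A0002_02 (a letter, digits, an underscore and a two-digit member number); the identifiers in a real extract may differ, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed pattern, inferred from the examples A0001_01 and A0002_02 only.
ID_PATTERN = re.compile(r"^(?P<family>[A-Z]\d+)_(?P<member>\d{2})$")


def split_id(participant_id):
    """Split an anonymised ID into (family part, member number)."""
    match = ID_PATTERN.match(participant_id)
    if match is None:
        raise ValueError(f"Unexpected ID format: {participant_id!r}")
    return match.group("family"), match.group("member")


def group_by_family(participant_ids):
    """Map each family part to the list of IDs that share it."""
    families = defaultdict(list)
    for pid in participant_ids:
        family, _member = split_id(pid)
        families[family].append(pid)
    return dict(families)


families = group_by_family(["A0001_01", "A0002_01", "A0002_02"])
```

Because IDs are regenerated for each data sharing project, this grouping is only meaningful within one extract and cannot be used to link records across projects.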

However, given the nature of the data, it is virtually impossible to prevent the identification of specific individuals were one so minded. Therefore, it will be the responsibility of data users to ensure that no participant’s identity is disclosed under any circumstances.

Data are stored internally in a MySQL database. Data will be made available as tab-delimited or .csv files. Each row in a data file represents one case and each column represents a different variable. In the genetics files, each row is one variant for one case, so there may be more than one row for each individual.

  • Demographic data
  • Genetic data
  • Medical questionnaire data
  • DAWBA data
  • ABAS-3 assessment data
  • Health, Development & Family Environment
  • Child and Adolescent Psychiatric Assessment (CAPA) data
  • Neurocognitive data
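The file layout described above (one row per case, except in the genetics files, where a case may have several rows) affects how tables should be read and linked. The sketch below uses Python's standard csv module on two small in-memory tab-delimited tables. The column names and values are hypothetical and chosen for illustration; the real variable names are given in the data dictionary.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extract contents; column names and values are illustrative only.
DEMOGRAPHICS_TSV = "id\tyear_of_birth\nA0001_01\t2008\nA0002_01\t2010\n"
GENETICS_TSV = (
    "id\tchromosome\tinheritance\n"
    "A0001_01\t16\tde novo\n"
    "A0002_01\t22\tmaternal\n"
    "A0002_01\t15\tunknown\n"
)


def read_rows(text, delimiter="\t"):
    """Parse delimited text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def variants_by_case(genetic_rows, id_column="id"):
    """Group genetics rows by case, since one case can have several variants."""
    grouped = defaultdict(list)
    for row in genetic_rows:
        grouped[row[id_column]].append(row)
    return dict(grouped)


# One row per case, so a plain dictionary keyed by ID is enough here.
demographics = {row["id"]: row for row in read_rows(DEMOGRAPHICS_TSV)}
variants = variants_by_case(read_rows(GENETICS_TSV))

# Number of variant rows linked to each case in the demographic table.
variant_counts = {pid: len(variants.get(pid, [])) for pid in demographics}
```

Grouping the genetics rows before joining avoids accidentally duplicating a case's demographic or questionnaire data once per variant. For .csv files, passing delimiter="," to read_rows would be the equivalent call.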

The IMAGINE ID study collects identifiable and demographic information as a necessary part of the study administration. The identifiable data are not available to access under any circumstances; however, some demographic data are available:

  • Unique identifier (unique to data extract, linked across different tables requested)
  • Gender
  • Deceased flag
  • DAWBA version completed
  • Year of birth
  • IMD decile
  • OAC supergroup

Information such as date of enrolment into the study, consent information and recruitment pathway, and data such as the dates on which assessments are started and completed, are collected for internal study tracking purposes.

If a participant withdraws from the study, a record of their identifiable and demographic data is maintained to record their initial inclusion in the study.

 

Genetic information is collected from Regional Genetic Centres – or genetic reports are supplied directly to the study team by non-UK families. Depending on the age of the report and the genetic test carried out, the specific data available may vary. If you are interested in researching specific genetic conditions and would like to find out more, please contact ich.imagineid@ucl.ac.uk

View a sample of the type of genetic data we have collected from families for both copy number variants (CNV) and single nucleotide variants (SNV) here

The CNV dataset includes:

  • type of report
  • type of genetic test
  • type of array technology (where applicable)
  • year of report
  • karyotype, chromosome, band, start and end coordinates, probes (where applicable)
  • consequence and size (where available)
  • genome build
  • inheritance
  • categorised pathogenicity as stated on report

Here is a summary of the CNV genetic data as of February 2018:

CNV in the study

The SNV dataset includes:

  • type of report
  • type of genetic test
  • year of report
  • chromosome, start and end coordinates
  • reference and alternate alleles
  • ENST and NM transcripts
  • consequence
  • coding sequence position
  • variant protein position
  • variant amino acid
  • genotype
  • inheritance
  • categorised pathogenicity as stated on laboratory report 
  • genome build

Here is a summary of the SNV genetic data as of February 2018:

SNV in the study

The IMAGINE ID medical questionnaire is a bespoke questionnaire created specifically for the study but with reference to existing medical history questionnaires used in other research studies and in clinical practice. Certain questions are mapped to HPO/SNOMED terms for ease of analysis.

The medical questionnaire dataset includes:

  • Ethnicity
  • Family structure and history of similar problems
  • Parental education and work
  • Pregnancy details
  • Birth details
  • Neonatal development
  • Infantile development
  • Childhood development
  • Current medical problems (by body system)
  • Medication
  • Current height, weight & head circumference
  • Inheritance of genetic variant

See our online data dictionary for more information

 

Parental reports of their child’s behavioural adjustment and mental health are obtained using the DAWBA (Development and Well-Being Assessment), which has been used in both UK national and international surveys of mental health in typical and ID children (Ford, T., et al. J. Am. Acad. Child Adolesc. Psychiatry 42, 1203-1211 (2003); Green, H., et al. Mental Health of Children and Young People in Britain, 2004; Heiervang, E., et al. J. Child Psychol. Psychiatry 49, 678-685 (2008); Emerson, E. & Hatton, C. Br. J. Psychiatry 191, 493-499 (2007)).

 

The DAWBA has a DSM-IV and a DSM-V version; the IMAGINE ID cohort includes some earlier participants who completed the DSM-IV version but the majority of participants have completed the DSM-V version. This distinction is clear in the dataset and there are only minor differences between the two versions.

 

The DAWBA is supplemented by additional measures tailored to record behaviour found mainly in severely and profoundly ID children (e.g. repetitive self-injury). The questionnaire is completed online (although it can also be completed over the phone or face to face with a researcher) and can take around 3 hours to complete. More information is available at www.dawba.info. The questionnaire uses screening questions and skip rules, so some data may be missing because the questions were not shown to the respondent. All questions shown to the respondent require an answer (except open text answers).
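Because skip rules leave some questions unasked, analysts may want to separate values that are missing by design from values that are unexpectedly missing. The sketch below is a simplified illustration only: it assumes a single screening question gates a single follow-up question, which is much simpler than the actual DAWBA skip logic, and the variable names and the "yes" coding are hypothetical.

```python
def classify_missing(record, screen_var, follow_up_var, screen_positive="yes"):
    """Classify a follow-up value as 'answered', 'skipped' or 'unexpected_missing'.

    Assumption: the follow-up question is shown only when the screening
    question has the value given by screen_positive.
    """
    value = record.get(follow_up_var)
    if value not in (None, ""):
        return "answered"
    if record.get(screen_var) != screen_positive:
        # The question was never shown, so the blank is structural.
        return "skipped"
    # Shown questions require an answer, so a blank here needs checking.
    return "unexpected_missing"


# Hypothetical records with made-up variable names.
records = [
    {"screen_q": "yes", "follow_up_q": "2"},
    {"screen_q": "no", "follow_up_q": ""},
    {"screen_q": "yes", "follow_up_q": ""},
]
statuses = [classify_missing(r, "screen_q", "follow_up_q") for r in records]
```

Treating skipped items as structurally missing, rather than imputing or dropping them, keeps the analysis consistent with how the questionnaire was administered.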

 

The DAWBA covers various areas of development and well-being. As well as recording the raw data entered by the respondent, it uses algorithms to assess the likelihood of the subject falling within a diagnosable range for particular disorders. Distinct variables can be requested from the DAWBA, but the diagnosis computed by the DAWBA is based on all variables in a particular section. The DAWBA also has an option for a clinical rating: clinicians on the IMAGINE ID study team rate the DAWBA on the basis of all available information, including open text answers, which are not currently available for access.

 

The DAWBA dataset has around 800 variables and broadly includes:

  • SDQ (strengths and difficulties questionnaire)
  • Separation anxiety
  • Specific phobia
  • Social phobia
  • Obsessions and compulsions
  • Generalized anxiety
  • DMDD
  • Depression
  • Mood disorder
  • ADHD
  • ODD
  • CD
  • Eating, sleeping, toileting
  • Tics
  • Social Aptitude Scale
  • ASD
  • ID behaviours
  • Other worries & strengths
  • Medical background
  • Parental stress
  • FLQ (family life questionnaire)
  • EFQ (everyday feelings questionnaire)
  • Education and help received
  • Clinical ratings of disorders

See our online data dictionary for more information

 

The ABAS-3 is a parental report of adaptive behaviour (Harrison, P. & Oakland, T. Adaptive Behavior Assessment System, Third Edition (ABAS-3). Western Psychological Services, Torrance, CA (2015)).

The ABAS-3 questionnaire assesses adaptive behaviours and is administered online (although it can also be completed over the phone with a researcher). The online system generates a report which is then transcribed into REDCap. The raw scores are converted into scaled scores and computed by the online system; no further computation is required. There are 2 versions of the ABAS: one for ages 0-5 and one for ages 6-21.

The ABAS dataset includes raw scores in the following areas:

  • Communication
  • Community use
  • Functional pre-academics (0-5 only)
  • Functional academics (6-21 only)
  • Home living
  • Health and safety
  • Leisure
  • Self-care
  • Self-direction
  • Social
  • Motor (0-5 only)
  • Work (6-21 only)

These scores are combined into 4 areas of adaptive functioning for further analysis:

  • GAC (General)
  • Conceptual
  • Social
  • Practical
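The age-specific skill areas listed above mean that the set of raw scores present for a case depends on which ABAS form was completed. As a minimal sketch, the function below returns the areas expected for a given age. It assumes age is supplied in whole years and that the form follows directly from the stated age ranges; the function names are hypothetical, and the dataset itself should be checked for which form each participant completed.

```python
# Skill areas common to both forms, taken from the list above.
COMMON_AREAS = [
    "Communication",
    "Community use",
    "Home living",
    "Health and safety",
    "Leisure",
    "Self-care",
    "Self-direction",
    "Social",
]


def abas_form(age_years):
    """Return the form label for an age in whole years (an assumption)."""
    if 0 <= age_years <= 5:
        return "0-5"
    if 6 <= age_years <= 21:
        return "6-21"
    raise ValueError(f"Age {age_years} is outside the 0-21 range")


def expected_areas(age_years):
    """List the skill areas expected to have raw scores at this age."""
    if abas_form(age_years) == "0-5":
        return COMMON_AREAS + ["Functional pre-academics", "Motor"]
    return COMMON_AREAS + ["Functional academics", "Work"]


areas_for_four_year_old = expected_areas(4)
areas_for_ten_year_old = expected_areas(10)
```

A check like this can help distinguish scores that are absent because they do not apply to the form from scores that are genuinely missing.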

See our online data dictionary for more information

The primary caregiver (usually the mother) provides information on the child’s health, development and the home environment, including:

  • Modified Lewis scale: family size and structure, social class, pregnancy and childbirth (age at birth, birth weight, ante- and perinatal health problems, smoking and alcohol use)
  • Life Events Checklist: screens for possible traumatic events experienced by the child
  • Family relationship quality:
    • Family Environment Scale: assesses overall family relationship quality
    • Iowa Family Interaction Rating Scales: assesses parent-child relationship quality (e.g. warmth and hostility)
  • Strengths and Difficulties Questionnaire (SDQ): assesses child prosocial and antisocial behaviour, ADHD and emotional symptoms
  • Developmental Behaviour Checklist (DBC): developed for children with intellectual disability, it assesses child development and behavioural problems.
  • Social Communication Questionnaire (SCQ; formerly Autism Screening Questionnaire (ASQ)): autism screening tool.
  • Petersen assessment: assesses child pubertal development
  • Developmental Coordination Disorder Questionnaire: assesses for developmental coordination disorders
  • Child Eating Behaviour Questionnaire and the Hyperphagia Questionnaire: assesses child eating style and behaviour
  • Epilepsy screening questionnaire: assesses severity and frequency of epilepsy and seizures

See our F2F data dictionary for more information

 The CAPA (duration 1-4 hours) provides DSM-IV diagnoses of many behavioural and psychiatric problems (including detailed assessment of psychotic symptoms), but does not include autism. The CAPA is conducted with the primary caregiver to provide information on the child and with the child for self-report. The child informant version of the CAPA provides self-reports of psychotic symptoms, mood and anxiety in cases and controls.

See our F2F data dictionary for more information

The full neurocognitive assessment is conducted over 2.5 hours.

Wechsler Abbreviated Scale of Intelligence (WASI)

The WASI assesses children’s cognition and intelligence. It provides age and sex normed scores of general cognitive ability, verbal ability, non-verbal ability and visuomotor/coordination skills.

The Wisconsin Card Sorting Test (WCST)

The WCST assesses executive function skills, specifically concept formation, planning and cognitive flexibility, visual spatial working memory, deductive reasoning, problem solving, and set shifting ability.

Cambridge Neuropsychological Test Automated Battery (CANTAB)

The CANTAB assesses cognitive function. As many subtests as possible were administered, taking into account the child’s mental and motivational state.

Subtests used:

  • Delayed Matching to Sample test (visual memory)
  • Stockings of Cambridge (spatial planning and working memory)
  • Spatial Working Memory (executive function)
  • Rapid Visual Information Processing (attention and general performance)
  • Reaction time

 

See our F2F data dictionary for more information

Contacts and further information

 For further information about the IMAGINE ID data, please contact the IMAGINE ID Principal Investigators:

Prof David Skuse d.skuse@ucl.ac.uk

Prof Lucy Raymond flr24@cam.ac.uk

Prof Marianne van den Bree Vandenbreemb@cardiff.ac.uk

Prof Jeremy Hall HallJ10@cardiff.ac.uk