
Data Preservation

Archiving data

Generating data and software, as well as publishing results in scientific journals, takes a lot of effort. It is therefore important to archive your research output properly. Archiving data means ensuring that the definitive data are kept in a secure location for the long term (10 years or more). The purpose of archiving may vary and includes research integrity, future reuse, reproducibility, and compliance with funding or institutional requirements.

Deciding which research data to archive involves considering the purpose of archiving and carefully evaluating several factors (such as data quality, significance, uniqueness, potential for reuse, and alignment with your research objectives) to ensure that valuable and relevant data are preserved. The table below summarizes what should be archived, depending on whether the purpose is research integrity or data reuse. Check the next section for more information.

| Perspective of scientific integrity | Perspective of reuse of data |
|---|---|
| All raw, processed and analysed data | Final versions of analysed data. If possible: also raw and processed data |
| Documentation (e.g., codebooks, lab journals, protocols, etc.) necessary for understanding the data | Documentation (e.g., codebooks, lab journals, protocols, etc.) necessary for understanding the data |
| Readme.txt file to help others understand the contents and purpose of the associated files or code | Readme.txt file to help others understand the contents and purpose of the associated files or code |
| Informed consent forms | Template of the informed consent form used in the study |
| Approval letter from the Ethical Review Board | |
| If applicable: Data Management Plan | |

Data retention & minimization

The checklist below will help you decide whether a given dataset is worth including in the archival package. Broadly speaking, consider three categories of reasons: (1) documentation & reproducibility, (2) scientific value, and (3) policy & legal compliance.

Documentation & reproducibility

💾 Reasons to retain data

  • Data are sufficiently documented, organized, and accompanied by metadata (e.g. methods, variables, file structure) so that others can understand and reuse them.
  • Data can be easily understood and reused by others, facilitating reproducibility and validation.
  • Whenever possible, raw and processed data, as well as any intermediate results or transformations, should be kept alongside the analyzed data.

🗜️ Reasons to minimize or remove data

  • Data lack essential documentation or contextual information and cannot reasonably be made understandable.
  • Data are redundant because equivalent, better-documented versions are already archived.

Scientific value

💾 Reasons to retain data

  • Data is required to reproduce the published results (that is, data in original form or data representing the publication).
  • Data is unique, and thus impossible or impractical to reproduce.
  • Data represent a landmark discovery or new precedent.

🗜️ Reasons to minimize or remove data

  • Data is intermediary or test data, which is neither the original data nor the analysis dataset for the publication.
  • Data can easily be reproduced with minimal effort (e.g., via code).
  • The quality of the data is poor (corrupted files, insufficient precision of measurement data, etc.) or the data is not documented well enough, which makes it impossible to reuse.

Policy & legal compliance

💾 Reasons to retain data

  • You promised to retain data in a contract.
  • A funding body or a journal requires you to retain the data (e.g., NWO asks that data be archived for at least 10 years).
  • Data may be used in future legal procedures (e.g., litigation, police investigations, Freedom of Information requests).

🗜️ Reasons to minimize or remove data

  • You promised your participants or your data provider that you would remove certain data.
  • You agreed in a contract to delete data at the end of the project.
  • You have personal data, and you want an extra layer of protection for privacy (e.g., destroying raw video or audio recordings before archiving, applying pseudonymization or anonymization).
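
As an illustration of the pseudonymization mentioned above, the following is a minimal sketch and not a prescribed TU/e procedure. It replaces participant identifiers with a keyed hash (HMAC-SHA256) before the data go into the archival package. The function name `pseudonymize`, the field name `participant`, the example records, and the placeholder key are all assumptions made for this example.

```python
import hashlib
import hmac


def pseudonymize(participant_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for an identifier using HMAC-SHA256.

    The same identifier and key always give the same pseudonym, so records
    of one participant stay linked without exposing the identifier.
    """
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability (assumption)


# Placeholder key (assumption): in practice, generate a random key and
# store it separately from the archived data.
secret_key = b"replace-with-a-randomly-generated-key"

# Hypothetical example records.
records = [
    {"participant": "P001", "score": 4},
    {"participant": "P002", "score": 7},
    {"participant": "P001", "score": 5},
]

# Copy each record, replacing only the identifier field.
pseudonymized = [
    {**record, "participant": pseudonymize(record["participant"], secret_key)}
    for record in records
]
```

Anyone holding the key can re-link pseudonyms to identifiers, so this sketch reduces risk but does not anonymize the data. Other fields may still identify a person indirectly.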

Data that include personal data cannot be archived indefinitely; the limits on retention are already defined in the informed consent letters and the privacy agreement.

Data sovereignty, data governance and digital sovereignty, LCRDM

Choose the best option for your situation to protect research participants' privacy while preserving valuable research data. Ensure you retain only the data necessary to demonstrate the integrity of the research. If any personal data is not essential for this purpose, consider removing it.

After considering the reasons above, it may still not be clear-cut whether to keep your data, as this decision can involve subjective judgment. Archiving decisions should be made consciously and, where relevant, in consultation with collaborators or co-authors. Keep a brief record of:

  • Which data were included or excluded
  • The rationale for these decisions (e.g. scientific value, legal constraints, privacy considerations)
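
One possible way to keep such a record is a small CSV log stored next to the archival package. The sketch below is only an illustration: the class `ArchivingDecision`, its field names, and the example datasets are hypothetical and are not part of any TU/e template.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ArchivingDecision:
    dataset: str      # hypothetical dataset name or folder
    included: bool    # True if the dataset goes into the archival package
    rationale: str    # e.g. scientific value, legal constraints, privacy
    decided_by: str   # who was consulted on the decision
    decided_on: str   # ISO date of the decision


def write_decision_log(decisions, stream):
    """Write one CSV row per decision to an open text stream."""
    fieldnames = [f.name for f in fields(ArchivingDecision)]
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    for decision in decisions:
        writer.writerow(asdict(decision))


# Hypothetical example entries.
decisions = [
    ArchivingDecision(
        dataset="analysis_dataset_v3",
        included=True,
        rationale="Required to reproduce the published results",
        decided_by="PI and co-authors",
        decided_on="2024-01-15",
    ),
    ArchivingDecision(
        dataset="raw_interview_audio",
        included=False,
        rationale="Privacy: deletion promised to participants",
        decided_by="PI and data steward",
        decided_on="2024-01-15",
    ),
]

# Written to an in-memory buffer here; a real log would go to a file.
buffer = io.StringIO()
write_decision_log(decisions, buffer)
log_text = buffer.getvalue()
```

A plain CSV file is used here because it stays readable without special software, which suits long-term archiving. A short text file or a section in the Readme.txt would serve the same purpose.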

Revisit archiving decisions periodically to ensure continued compliance with policies and alignment with good research practice.

FAQ

What is the difference between raw, processed, and analysed data?

Raw data are the original data that you have collected but have not yet processed or analysed, for instance audio files, archives, observations, field notes and data from experiments. Data that you have not collected yourself and that you are reusing may also be considered raw data.

Processed data are the data that you have digitised, translated, transcribed, cleaned, validated, checked and/or anonymised.

Analysed data are the models, graphs, tables, texts and so on that you have created based on the raw and the processed data, and that are intended to aid in the discovery of useful information, the presentation of conclusions, and decision-making.

For more information, please see the Research Data page.

Does TU/e have an archive for research data used in scientific publications?

Yes. Data and other relevant documents associated with a scientific publication can be archived in the TU/e Research Archival Package Solution.