Research Data
Research data refers to the information, records, and observations collected or generated during a research project to support or validate its development, results, or findings, including the contextual information needed to interpret them. Physical objects themselves are not research data, but information or descriptions about them are. Examples include:
- Questionnaires
- Field notes and diaries
- Measurements and statistical analyses
- Mathematical models, algorithms and scripts
- Images and audio/video recordings
- Interview questions and transcripts
Raw, processed & analysed data
At different stages of a research project, research data can be classified into the following types:
- Raw data is the original data that you have collected but have not yet processed or analysed. It can come from various collection methods, such as experiments, computer simulations, interviews, and observations. Data that you did not collect yourself but obtained from others for reuse (secondary data) may also be considered raw data.
- Processed data is data that you have made suitable for analysis, for example by digitizing, translating, transcribing, cleaning, organizing, transforming, anonymizing, validating, or checking it. This can include steps such as removing errors, normalizing values, or converting formats.
- Analysed data is data that has been examined and interpreted to draw conclusions or insights by applying statistical methods, algorithms, or other analytical techniques to processed data. It is usually the data presented in the models, graphs, tables, and text of your final report or scientific article.
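As a minimal illustration of how data moves between these stages, the Python sketch below takes a few hypothetical raw temperature readings, processes them (converting text to numbers, dropping entries that cannot be parsed, and rescaling with min-max normalization), and then analyses them with descriptive statistics. The readings, the function names `process`, `normalize` and `analyse`, and the choice of normalization are assumptions made for illustration only, not a prescribed workflow.

```python
import statistics

# Hypothetical raw data: readings exactly as recorded, including an empty
# entry and a typo. The raw list itself is never modified.
RAW_READINGS = ["21.4", "22.1", "", "21.9", "2x.0", "23.0"]


def process(raw: list[str]) -> list[float]:
    """Processing: convert text to numbers, dropping entries that cannot be parsed."""
    cleaned = []
    for value in raw:
        try:
            cleaned.append(float(value))
        except ValueError:
            # In a real project, record which entries were excluded and why.
            continue
    return cleaned


def normalize(values: list[float]) -> list[float]:
    """Processing: rescale values to the range [0, 1] (min-max normalization)."""
    low, high = min(values), max(values)
    if high == low:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def analyse(values: list[float]) -> dict[str, float]:
    """Analysis: summarize processed data with descriptive statistics."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


if __name__ == "__main__":
    processed = process(RAW_READINGS)
    print(processed)  # [21.4, 22.1, 21.9, 23.0]
    print(normalize(processed))
    print(analyse(processed))
```

Keeping the raw readings untouched and deriving the processed and analysed outputs from them with a script makes each step easier to repeat and document.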
Active & definitive data
Each of the data types above can also be described as either active or definitive.
- Active or mutable data refers to data that are still subject to change, for example data that are still being collected, processed, or analysed.
- Definitive or immutable data refers to data that have reached a stable state and are no longer expected to change, for example the datasets underlying published results.
Active data typically require flexible working environments, while definitive data should be preserved in a stable and well-documented form to ensure long-term accessibility, reproducibility, and integrity.
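One common way to help keep definitive data stable is to record a checksum for each file when the dataset is finalized and to re-check those checksums later, since any change to a file will almost certainly change its checksum. The sketch below assumes a hypothetical folder of finalized files and a manifest stored outside that folder, and uses SHA-256 from Python's standard `hashlib` module. The manifest layout (hash, two spaces, relative path) resembles the text-mode output of GNU `sha256sum`; the function names are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path, manifest: Path) -> None:
    """Record one 'checksum  relative/path' line per file in a (hypothetical) finalized dataset.

    The manifest should be stored outside data_dir so that it does not checksum itself.
    """
    lines = [
        f"{sha256_of_file(p)}  {p.relative_to(data_dir).as_posix()}"
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    ]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(data_dir: Path, manifest: Path) -> list[str]:
    """Return the relative paths of files that are missing or whose checksum has changed."""
    problems = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line:
            continue
        expected, name = line.split("  ", 1)
        path = data_dir / name
        if not path.is_file() or sha256_of_file(path) != expected:
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Demonstration in a temporary directory with one hypothetical data file.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data_dir = root / "dataset"
        data_dir.mkdir()
        (data_dir / "results.csv").write_text("x,y\n1,2\n", encoding="utf-8")
        manifest = root / "manifest.sha256"
        write_manifest(data_dir, manifest)
        print(verify_manifest(data_dir, manifest))  # [] (nothing changed)
        (data_dir / "results.csv").write_text("x,y\n1,3\n", encoding="utf-8")
        print(verify_manifest(data_dir, manifest))  # ['results.csv']
```

Storing the manifest alongside the published dataset lets anyone who later reuses the data confirm that their copy matches the one the results were based on.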
Other classifications
Research data can also be classified in other ways than those described above, for example by its nature (qualitative or quantitative), its collection method (experimental, simulation, interview, observational, or secondary data), and the level of confidentiality or protection it requires.
Knowing what type of data you are working with is important because it affects how you should handle and store it, and some data may come with specific requirements or restrictions. For example, if your dataset includes personal data, you will need to follow data protection laws and ethical guidelines. Such data is typically classified as having a medium to high confidentiality level, which means it must be stored securely using appropriate technical and organizational safeguards. You may also need to take extra steps, such as de-identifying the data or using client-side encryption to protect it. These and other considerations are explained further in the section Privacy in Research.
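De-identification can take several forms. One technique, pseudonymization, replaces direct identifiers with codes; the sketch below does this with a keyed hash (HMAC-SHA256) from Python's standard library. The records, field names and function names are hypothetical. Pseudonymized data can still be re-linked to individuals by whoever holds the key, so it is generally still treated as personal data (for example under the GDPR); this sketch illustrates the idea only and does not replace the guidance on protecting personal data.

```python
import hashlib
import hmac
import secrets

# Hypothetical interview records containing a direct identifier (the participant's email).
RECORDS = [
    {"email": "alice@example.org", "answer": "agree"},
    {"email": "bob@example.org", "answer": "disagree"},
]


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    The identifier is trimmed and lower-cased first so that trivial spelling
    differences map to the same pseudonym; the hex digest is shortened to
    16 characters for readability.
    """
    message = identifier.strip().lower().encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()[:16]


def deidentify(records: list[dict[str, str]], key: bytes) -> list[dict[str, str]]:
    """Return new records in which the email is replaced by a pseudonym."""
    return [
        {"participant": pseudonymize(r["email"], key), "answer": r["answer"]}
        for r in records
    ]


if __name__ == "__main__":
    # The key must be stored securely and separately from the data:
    # anyone holding it can recompute pseudonyms for known email addresses.
    key = secrets.token_bytes(32)
    for row in deidentify(RECORDS, key):
        print(row)
```

Using a secret key rather than a plain hash matters here: without a key, anyone could hash a list of candidate email addresses and match them against the pseudonyms.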