Processing
What is data processing?
Data processing is the phase in the project where data is converted into a desired format and prepared for analysis. When data has been freshly collected, data processing includes automated steps in a workflow that perform format conversion, quality checks and preprocessing following a standardised protocol. The main aims of processing are to:
- convert data into a readable format, giving it the shape and form necessary for downstream analysis;
- discard bad or low-quality data in order to create a clean, high-quality dataset for reliable results.
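The two aims above can be illustrated with a minimal Python sketch. It assumes a hypothetical raw CSV export with the columns `sample_id`, `value` and `quality`, and a hypothetical quality threshold of 0.8; none of these names or values come from RDMkit, and a real protocol would define its own fields and checks.

```python
import csv
import io


def process_raw(raw_csv, min_quality=0.8):
    """Convert raw CSV text into typed records and drop low-quality rows.

    Column names and the threshold are illustrative assumptions.
    """
    clean = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        try:
            # Format conversion: strings from the file become typed values.
            record = {
                "sample_id": row["sample_id"].strip(),
                "value": float(row["value"]),
                "quality": float(row["quality"]),
            }
        except (KeyError, ValueError, TypeError, AttributeError):
            # Quality check: skip rows with missing or unparsable fields.
            continue
        # Quality check: keep only identified rows above the threshold.
        if record["sample_id"] and record["quality"] >= min_quality:
            clean.append(record)
    return clean


# Hypothetical raw data: S2 has a bad value, S3 has a low quality score.
raw = "sample_id,value,quality\nS1,4.2,0.95\nS2,not_a_number,0.90\nS3,3.7,0.40\n"
clean = process_raw(raw)
```

Here only the row for `S1` is retained. In practice it is usually worth logging which rows were discarded and why, so that the processing remains reproducible.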
When data is imported from existing sources, e.g. data to be reused from another project, processing can also include manual steps to make it suitable for analysis. These steps include but are not limited to:
- making changes to data formats such that different datasets will be compatible for integration with each other;
- changing coding systems or ontologies so that all datasets describe the data with the same terms;
- filtering data such that only data suitable for the project is retained.
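As a rough illustration of these manual steps, the following Python sketch harmonises two small datasets from different sources. The field names, the code mapping in `SEX_CODE_MAP` and the species filter are all hypothetical assumptions chosen for the example; a real project would take its mappings from the ontologies or coding systems it has agreed on.

```python
# Hypothetical mapping from source-specific codes to one shared vocabulary.
SEX_CODE_MAP = {"1": "male", "2": "female", "m": "male", "f": "female"}


def harmonise(records, field_map, wanted_species):
    """Rename fields, recode values and filter records for one source."""
    out = []
    for rec in records:
        # Format change: rename source fields to the shared field names.
        renamed = {field_map.get(key, key): val for key, val in rec.items()}
        # Coding system change: map source codes onto the shared terms.
        code = str(renamed.get("sex", "")).strip().lower()
        renamed["sex"] = SEX_CODE_MAP.get(code, "unknown")
        # Filtering: retain only records suitable for this project.
        if renamed.get("species") == wanted_species:
            out.append(renamed)
    return out


# Two hypothetical sources with different field names and codes.
project_a = [{"id": "A1", "sex": "1", "species": "human"}]
project_b = [
    {"subject": "B7", "gender": "F", "organism": "human"},
    {"subject": "B8", "gender": "M", "organism": "mouse"},
]

combined = harmonise(project_a, {}, "human") + harmonise(
    project_b,
    {"subject": "id", "gender": "sex", "organism": "species"},
    "human",
)
```

After this step both sources share the fields `id`, `sex` and `species`, and the mouse record from the second source has been filtered out. Keeping the mappings in explicit tables like `SEX_CODE_MAP` makes the recoding decisions easy to review and document.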
After data processing, clean data is ready for analysis and should therefore be available to the members of the project team that need to perform the next steps.
ELIXIR. “RDMkit – Processing”.
For details, see RDMkit – Processing
Researcher tasks anticipated during the processing phase (external link)
Reference information for the processing phase
Amnesia Anonymization Tool
This free anonymization tool deletes identifying information from research data.