Data Preparation for Data Analysis I

Title: Data Preparation for Data Analysis – Part 1
Next Session: will be announced soon
Target Group: Anyone who regularly works with data and is interested in learning about data preprocessing steps that improve analysis results.
Number of participants: 15
Language: English / German
Format: Tutorial, hybrid

The terms Data Analysis and Data Mining describe the systematic application of statistical methods to identify structures, dependencies, and relationships in data sets, some of them very large, and to gain new knowledge from them. Computer-aided methods are used in the individual process steps. The content and scope of each step depend on, among other things, the problem domain, the analysis goal, and technical aspects such as the available data sources or the representation of the data. One important step is preprocessing the data (data preparation) to increase its quality for the subsequent analysis.

In this part of the training, various aspects of the Data Mining process and data preparation will be examined both theoretically and practically, using an example data set and prepared Jupyter Notebooks. The training covers restructuring and indexing the data as well as detecting and handling missing values.
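
To give a rough idea of these three topics, here is a minimal sketch in Python with Pandas. It is not taken from the course notebooks: the table, its column names, and all values are made up for illustration, and forward filling is only one of several possible ways to handle missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format table (one column per year); values are invented
wide = pd.DataFrame({
    "country": ["A", "B"],
    "indicator": ["life_expectancy", "life_expectancy"],
    "2000": [70.1, np.nan],
    "2001": [70.5, 65.2],
    "2002": [np.nan, 65.9],
})

# Restructuring: turn the year columns into rows (wide -> long)
tidy = wide.melt(id_vars=["country", "indicator"],
                 var_name="year", value_name="value")
tidy["year"] = tidy["year"].astype(int)

# Indexing: use (country, year) as a sorted hierarchical index
tidy = tidy.set_index(["country", "year"]).sort_index()

# Detecting missing values: count NaN entries per country
missing_per_country = tidy["value"].isna().groupby(level="country").sum()

# Handling missing values: forward-fill within each country,
# then drop rows that still have no value (no earlier observation exists)
filled = tidy.groupby(level="country")["value"].ffill()
cleaned = tidy.assign(value=filled).dropna(subset=["value"])

print(missing_per_country)
print(cleaned)
```

Filling within each country keeps values from leaking between countries. Whether to fill, interpolate, or drop missing entries depends on the analysis goal.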

Agenda

  • Introduction to general aspects of Data Mining and the process of data preparation (10%)
  • Hands-on tutorial with focus on data restructuring, indexing and missing values (90%)

Handouts

The following documents (slides, sample applications) will be provided to the participants: 

  • PDF of the slides for “Introduction to Data Mining and data preparation”
  • CSV file (world bank data on development and health indicators)
  • Jupyter Notebooks on data preparation with tasks and solutions

Prerequisites

  • Participants should have intermediate to advanced knowledge of Python 3.x
  • Basic knowledge of how to use Jupyter Notebooks
  • Basic knowledge of the Python packages NumPy and Pandas is recommended
  • Laptop with internet access – equipment can be provided on site if required (please enquire as numbers are limited)

Learning Outcomes

After the training, participants will be familiar with theoretical considerations and practical approaches to data preparation using Python with NumPy, Pandas and other packages.

Funded by:
Funded by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung).
Funded by the Free State of Saxony (Freistaat Sachsen).