Contact

Trainings

trainings.scads.ai@tu-dresden.de

Data Analysis – Data Preparation

Title: Data Analysis – Data Preparation
Next Session: will be announced soon
Target group: Intermediate to advanced knowledge of Python, basic knowledge of Pandas
Language: German
Format: Tutorial, hybrid

The term Data Analysis or Data Mining describes the systematic application of statistical methods to identify structures, dependencies, and relationships in sometimes very large data sets and to gain new knowledge from them. Computer-aided methods are used in the individual process steps of Data Mining. The content and scope of the respective steps depend, among other things, on the problem domain, the analysis goal, and other technical aspects like the available data sources or the representation of the data.

One key process step is data preparation: preprocessing the data to improve its quality for the subsequent analysis. This training examines various aspects of the Data Mining process and of data preparation, both theoretically and practically, by working through prepared Jupyter notebooks on an example data set. Topics include restructuring and indexing the data, handling missing values and outliers, and a final comparison of analysis results across different preprocessing variants.
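To give a rough impression of what such preparation steps can look like in Pandas, here is a minimal sketch. It is not taken from the training notebooks: the data frame `raw`, its column names, and all values are invented for illustration, and median imputation and the 1.5 × IQR outlier rule are just two common choices assumed here, not necessarily the methods covered in the course.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data (invented values, NOT the training's CSV file).
raw = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "indicator": ["life_expectancy", "gdp_per_capita"] * 4,
    "value": [71.0, 1200.0, np.nan, 1500.0, 69.0, 1100.0, 73.0, 90000.0],
})

# Restructuring and indexing: one row per country, one column per indicator.
wide = raw.pivot(index="country", columns="indicator", values="value")

# Missing values, variant 1: drop every row that has a missing entry.
dropped = wide.dropna()

# Missing values, variant 2: fill gaps with the column median (an assumption).
imputed = wide.fillna(wide.median())


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Outliers: flag extreme GDP values and remove the affected rows.
outlier_mask = iqr_outliers(imputed["gdp_per_capita"])
cleaned = imputed[~outlier_mask]

# Compare a simple statistic across the preprocessing variants.
comparison = pd.DataFrame({
    "rows_dropped": dropped.mean(),
    "imputed": imputed.mean(),
    "imputed_without_outliers": cleaned.mean(),
})
print(comparison)
```

Even in this toy example, the column means differ noticeably between the variants, which is why comparing analysis results across preprocessing choices is worthwhile.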

Agenda

Introduction to general aspects of Data Mining and the process step of data preparation (10%)
Tutorial on data preparation with prepared Jupyter notebooks on an example data set (90%)

Handouts

The following documents (slides, example applications) will be provided to the participants:

PDF of the slides for “Introduction to Data Mining and data preparation”
CSV file (World Bank data on development and health indicators)
Jupyter notebooks for working with Python and Pandas

Prerequisites

Participants should have intermediate to advanced knowledge of Python 3.x. Basic knowledge of the Python libraries Pandas and NumPy is also recommended. Participants without this background are advised to attend the Pandas tutorial beforehand.

In addition, participants are expected to have basic knowledge of Jupyter notebooks.

Learning Outcomes

After the training, participants will be familiar with selected theoretical considerations and practical approaches to data preparation in the Data Mining process, using Python with Pandas, NumPy and other libraries.

Funded by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung).

ScaDS.AI Dresden/Leipzig (Center for Scalable Data Analytics and Artificial Intelligence) is a center for Data Science, Artificial Intelligence and Big Data with locations in Dresden and Leipzig.