Title: Data Analysis – Data Preparation
Next Session: will be announced soon
Target group: Intermediate to advanced knowledge of Python, basic knowledge of Pandas
Language: German
Format: Tutorial, hybrid
The term Data Analysis or Data Mining describes the systematic application of statistical methods to identify structures, dependencies, and relationships in data sets, some of them very large, and to gain new knowledge from them. Computer-aided methods are used in the individual process steps of Data Mining. The content and scope of each step depend, among other things, on the problem domain, the analysis goal, and technical aspects such as the available data sources or the representation of the data.
One important process step is the preprocessing of the data (data preparation), which increases their quality for the subsequent analysis. In this training, various aspects of the Data Mining process and of data preparation are examined both theoretically and practically, using an example data set and prepared Jupyter notebooks. The training covers restructuring and indexing the data, handling missing values and outliers, and a final comparison of the analysis results obtained with different preprocessing variants.
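To give an impression of the kind of steps named above, here is a minimal sketch in Pandas. It is not taken from the training material: the toy data, the column names (sensor, day, value), median imputation, and the 1.5 x IQR outlier rule are all assumptions chosen purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing value and one obvious outlier.
raw = pd.DataFrame({
    "sensor": ["a", "a", "a", "b", "b", "b"],
    "day": [1, 2, 3, 1, 2, 3],
    "value": [10.0, np.nan, 12.0, 11.0, 500.0, 13.0],
})

# Indexing: address each measurement by (sensor, day).
values = raw.set_index(["sensor", "day"])["value"].sort_index()

# Missing values: impute with the median of the observed values
# (one of several possible strategies, assumed here for illustration).
filled = values.fillna(values.median())

# Outliers: flag values outside 1.5 * IQR around the quartiles.
q1, q3 = filled.quantile(0.25), filled.quantile(0.75)
iqr = q3 - q1
is_outlier = (filled < q1 - 1.5 * iqr) | (filled > q3 + 1.5 * iqr)
cleaned = filled[~is_outlier]

# Restructuring: long format to wide format, one column per sensor.
wide = cleaned.unstack("sensor")
print(wide)
```

Other choices, such as dropping incomplete rows instead of imputing, or clipping outliers instead of removing them, lead to different inputs for the later analysis, which is why comparing preprocessing variants is worthwhile.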
Documents (slides, example applications) will be provided to the participants.
Participants should have intermediate to advanced knowledge of Python 3.x. In addition, basic knowledge of the Python libraries Pandas and NumPy is recommended. Participants without this background are advised to attend the Pandas tutorial first.
Participants are also expected to have basic knowledge of Jupyter notebooks.
After the training, participants will be familiar with theoretical considerations and practical approaches to data preparation in the Data Mining process selected by the trainees, using Python with Pandas, NumPy and other libraries.