The terms Data Analysis and Data Mining describe the systematic application of statistical methods to identify structures, dependencies, and relationships in data sets, some of them very large, and to gain new knowledge from them. Computer-aided methods are used in the individual process steps of Data Mining. The content and scope of each step depend, among other things, on the problem domain, the analysis goal, and technical aspects such as the available data sources or the representation of the data.
An important process step is the preprocessing of the data (data preparation), which increases their quality for the subsequent analysis. In this training, various aspects of the Data Mining process and data preparation will be examined both theoretically and practically, using an example data set and working through prepared Jupyter notebooks. The training covers the restructuring and indexing of the data, the handling of missing values and outliers, and a final comparison of the analysis results obtained with different preprocessing variants.
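The preparation steps named above can be sketched in a few lines of Pandas. This is only a minimal illustration under stated assumptions: the data frame, its column names, and its values are invented for the example and are not taken from the training's CSV file, and linear interpolation and the 1.5 × IQR rule are just one common choice for handling missing values and outliers, not necessarily the approach used in the notebooks.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format records (invented values, not the training data).
raw = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "year": [2000, 2001, 2002] * 3,
    "life_expectancy": [70.0, np.nan, 72.0, 65.0, 66.0, 67.0, 71.0, 150.0, 73.0],
})

# Restructuring and indexing: one row per country, one column per year.
wide = raw.pivot(index="country", columns="year", values="life_expectancy")

# Missing values: interpolate linearly along the years of each country.
filled = wide.interpolate(axis=1, limit_direction="both")

# Outliers: flag values outside 1.5 * IQR, computed over all values.
q1, q3 = np.quantile(filled.to_numpy(), [0.25, 0.75])
iqr = q3 - q1
outlier_mask = (filled < q1 - 1.5 * iqr) | (filled > q3 + 1.5 * iqr)

# Replace flagged values with NaN so they are ignored in later statistics.
cleaned = filled.mask(outlier_mask)

# Comparison of preprocessing variants: per-country mean with and
# without the flagged outliers.
comparison = pd.DataFrame({
    "mean_with_outliers": filled.mean(axis=1),
    "mean_without_outliers": cleaned.mean(axis=1),
})
print(comparison)
```

Comparing the two columns of `comparison` shows how strongly a single implausible value can shift a simple summary statistic, which is the kind of effect the final comparison of preprocessing variants is meant to expose.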
Title: Data Analysis – Data Preparation
Next Session: New dates will be announced soon.
Target group: Intermediate to advanced knowledge of Python, basic knowledge of Pandas
Format: Tutorial, hybrid
- Introduction to general aspects of Data Mining and the process step of data preparation (10%)
- Tutorial on data preparation with prepared Jupyter notebooks on an example data set (90%)
The following documents (slides, example applications) will be provided to the participants:
- PDF of the slides for “Introduction to Data Mining and data preparation”
- CSV file (World Bank data on development and health indicators)
- Jupyter notebooks for working with Python and Pandas
Participants should have intermediate to advanced knowledge of Python 3.x. Basic knowledge of the Python libraries Pandas and NumPy is also recommended. Participants without this knowledge are advised to attend the Pandas tutorial beforehand.
Participants are also expected to have basic knowledge of Jupyter notebooks.
After the training, participants will be familiar with theoretical considerations and practical approaches to data preparation in the Data Mining process selected by the trainees, using Python with Pandas, NumPy and other libraries.
Check out the other trainings by ScaDS.AI Dresden/Leipzig.