Supervisor

Dr. Eric Peukert

Department of Computer Science

Leipzig University

peukert@informatik.uni-leipzig.de

Author

Oliver Swoboda

Server-Side Aggregation of Time-Series Data in Distributed NoSQL Databases

Status: finished / Type of Thesis: Master's thesis / Location: Leipzig

Motivation

Time-series data has become increasingly important for Industry 4.0, IoT and data-driven companies. As data volumes grow, NoSQL databases such as Apache Accumulo, Cassandra and HBase have gained extensions for working with time-series data:

Timely (Accumulo)
OpenTSDB (HBase)
KairosDB (Cassandra)

Unfortunately, they are either immature or do not provide exact results for aggregations (min, max, sum, avg, standard deviation, percentile) over large data sets.

Aim

This thesis aims to define a performant schema for exact aggregations, using either Apache Accumulo with server-side iterators or Apache Flink as a distributed computation framework.
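To illustrate the general idea behind server-side aggregation, the following Python sketch shows how each partition of a time series could be reduced to a small summary that is later merged, so that raw values never have to leave the server. It is a minimal sketch under stated assumptions, not the schema developed in the thesis: `PartialAggregate` and `summarize` are hypothetical names, and neither the Accumulo iterator API nor the Flink API is used. The variance update follows the well-known online (Welford-style) recurrence and its pairwise merge rule.

```python
from dataclasses import dataclass
import math


@dataclass
class PartialAggregate:
    """Mergeable summary of one partition (hypothetical helper, not a library class)."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def add(self, value: float) -> None:
        """Fold one measurement into the summary using the online update."""
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def merge(self, other: "PartialAggregate") -> "PartialAggregate":
        """Combine two partition summaries without revisiting the raw values."""
        if other.count == 0:
            return self
        if self.count == 0:
            return other
        count = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / count
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / count
        return PartialAggregate(
            count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            mean,
            m2,
        )

    def std_deviation(self) -> float:
        """Population standard deviation of all values folded in so far."""
        return math.sqrt(self.m2 / self.count) if self.count else float("nan")


def summarize(values) -> PartialAggregate:
    """Build the summary for one partition, as a server-side step might do."""
    aggregate = PartialAggregate()
    for value in values:
        aggregate.add(value)
    return aggregate


# Two partitions summarized independently, then merged on the client side.
merged = summarize([1.0, 2.0, 3.0]).merge(summarize([4.0, 5.0]))
print(merged.count, merged.minimum, merged.maximum, merged.total)
print(merged.mean, merged.std_deviation())
```

Min, max, sum, average and standard deviation can be merged from such fixed-size summaries. An exact percentile cannot, which is why it needs a separate distributed selection step.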

Contact

Dr. Eric Peukert – peukert@informatik.uni-leipzig.de
Matthias Kricke, M. Sc. – kricke@informatik.uni-leipzig.de

Student

Oliver Swoboda

Publication

Swoboda, O.: Serverseitige Aggregation von Zeitreihendaten in verteilten NoSQL-Datenbanken. In: BTW (Workshops), pp. 365-373, GI, 2017.

Funded by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung).
