Server-Side Aggregation of Time-Series Data in Distributed NoSQL Databases

Type of thesis: Master's thesis / Location: Leipzig / Status of thesis: Finished

Motivation

Time-series data has become increasingly important for Industry 4.0, IoT, and data-driven companies. As data volumes rise, NoSQL databases such as Apache Accumulo, Cassandra, and HBase are providing extensions for working with time-series data.

Unfortunately, these extensions are either immature or do not provide exact results for aggregations (min, max, sum, average, standard deviation, percentile) over large data sets.

Aim

This thesis aims to define a performant schema for exact aggregations, using either Apache Accumulo with server-side iterators or Apache Flink as a distributed computation framework.
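To illustrate why server-side aggregation can work without shipping raw values to the client, the following is a minimal sketch in plain Python. It is not the schema or implementation developed in the thesis, and it does not use the Accumulo or Flink APIs. The class name `PartialAggregate` and the helper `summarize` are hypothetical. The sketch assumes each server summarizes its own partition with a running update and that the summaries are then merged pairwise. Results are exact only up to floating-point rounding.

```python
from dataclasses import dataclass
import math


@dataclass
class PartialAggregate:
    """Summary of one data partition (hypothetical name, for illustration only)."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean

    def add(self, value: float) -> None:
        """Fold one value into the summary with a running mean/variance update."""
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def merge(self, other: "PartialAggregate") -> "PartialAggregate":
        """Combine two partition summaries without revisiting the raw values."""
        if other.count == 0:
            return self
        if self.count == 0:
            return other
        count = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / count
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / count
        return PartialAggregate(
            count,
            self.total + other.total,
            min(self.minimum, other.minimum),
            max(self.maximum, other.maximum),
            mean,
            m2,
        )

    def std_dev(self) -> float:
        """Population standard deviation of all values summarized so far."""
        return math.sqrt(self.m2 / self.count) if self.count else float("nan")


def summarize(values):
    """Build a summary for one partition (stands in for per-server work)."""
    agg = PartialAggregate()
    for v in values:
        agg.add(v)
    return agg


# Two partitions are summarized independently, then merged into one result.
left = summarize([1.0, 2.0, 3.0])
right = summarize([4.0, 5.0])
combined = left.merge(right)
```

Min, max, sum, average, and standard deviation can be combined this way because each has a small, mergeable summary. Percentiles generally do not, so an exact percentile needs a different approach, such as a distributed selection step.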

Contact

Student

  • Oliver Swoboda

Publication

  • Swoboda, O.: Serverseitige Aggregation von Zeitreihendaten in verteilten NoSQL-Datenbanken. In BTW (Workshops) (pp. 365-373), GI 2017 [PDF]

Contact person

Dr. Eric Peukert

Administration Director

Department of Computer Science

Universität Leipzig
