Immersive Web Observatory

The World Wide Web is the single largest repository of digital culture and knowledge. By strategically collecting, analyzing, and visualizing web data, business intelligence can extract decision-relevant insights, digital social sciences can explore current societal trends and social networks, and digital humanities can study cultural and historical questions using digital media. Additionally, the web is a focal point of computer science research for developing information systems and AI applications.

Aims

The Immersive Web Observatory (IWO) is a BMBF-funded infrastructure at the Digital Bauhaus Lab of Bauhaus-Universität Weimar, to which we have access. It leverages this potential by providing an extensive web crawl corpus of 8 petabytes, covering both current and historical web content taken from the Internet Archive's web archive. The corpus is an invaluable data resource for projects across many disciplines, particularly information retrieval, data mining, and visualization. The IWO also facilitates knowledge and technology transfer to local businesses through project collaborations, demonstrators, and open-access publications, and it supports the training of data scientists specializing in big data and cognitive computing.

Further information: https://www.uni-weimar.de/fileadmin/user/uni/dezernate/dfo/TOP-Projekte/2017/2017_14_Hagen_IWO-buw-projektbeschreibung.pdf

Team

  • Prof. Dr. Matthias Hagen, Friedrich-Schiller-Universität Jena
  • Prof. Dr. Martin Potthast
  • Prof. Dr. Benno Stein, Bauhaus-Universität Weimar

Selected Publications

  • Jan Heinrich Reimer, Sebastian Schmidt, Maik Fröbe, Lukas Gienapp, Harrisen Scells, Benno Stein, Matthias Hagen, and Martin Potthast. The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives. In Hsin-Hsi Chen et al., editors, 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023), pages 2848–2860, July 2023. ACM.
  • Theresa Elstner, Johannes Kiesel, Lars Meyer, Max Martius, Sebastian Schmidt, Benno Stein, and Martin Potthast. Visual Web Archive Quality Assessment. In Gianmaria Silvello et al., editors, 26th International Conference on Theory and Practice of Digital Libraries (TPDL 2022), pages 365–371, September 2022. Springer.
  • Niklas Deckers and Martin Potthast. WARC-DL: Scalable Web Archive Processing for Deep Learning. In Andreas Wagner, Christian Guetl, Michael Granitzer, and Stefan Voigt, editors, 4th International Symposium on Open Search Technology (OSSYM 2022), October 2022. International Open Search Symposium.
Funded by:

  • Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung)
  • Free State of Saxony (Freistaat Sachsen)