Interactive Exploration of Embedding Spaces

Type of thesis: Master's thesis / Location: Leipzig / Status of thesis: Theses in progress

Embeddings (distributed representations) are a powerful concept in machine learning because they map complex objects into a lower-dimensional space that retains information about the objects and their relations. While these representations can be used as input for further algorithms, they are also an interesting structure in their own right. For example, it has been shown for word embeddings that the relation between a country and its capital corresponds to approximately the same vector across country-capital pairs. Hence the question “What is the capital of X?” can be answered by simply following this vector from the starting point X. Another possibility is the construction of axes that separate the embedding space according to a certain concept. Such an axis can be used to normalize the objects along it, for example to remove bias from the data.
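The two ideas above can be illustrated with a minimal sketch. It assumes hand-made three-dimensional toy vectors rather than a trained word embedding, and the helper names nearest and remove_axis are hypothetical. Removing the projection onto an axis is only one possible way to normalize objects with respect to a concept.

```python
import numpy as np

# Toy 3-d vectors invented for this sketch; real embeddings come from a trained model.
emb = {
    "Germany": np.array([1.0, 0.0, 0.2]),
    "Berlin":  np.array([1.1, 1.0, 0.2]),
    "France":  np.array([0.0, 0.1, 1.0]),
    "Paris":   np.array([0.0, 1.0, 1.1]),
    "Italy":   np.array([-1.0, 0.0, -0.5]),
    "Rome":    np.array([-1.0, 1.1, -0.5]),
}

def nearest(query, exclude=()):
    """Return the word whose vector has the highest cosine similarity to `query`."""
    best, best_sim = None, -np.inf
    for word, vec in emb.items():
        if word in exclude:
            continue
        sim = query @ vec / (np.linalg.norm(query) * np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

def remove_axis(vec, axis):
    """Subtract the component of `vec` that lies along `axis`."""
    unit = axis / np.linalg.norm(axis)
    return vec - (vec @ unit) * unit

# Estimate the country -> capital offset from two known pairs.
offset = np.mean(
    [emb["Berlin"] - emb["Germany"], emb["Paris"] - emb["France"]], axis=0
)

# "What is the capital of Italy?": follow the offset starting from Italy.
answer = nearest(emb["Italy"] + offset, exclude={"Italy"})
print(answer)  # prints Rome for these toy vectors

# Normalizing with respect to an axis: the result has no component along it.
neutral = remove_axis(emb["Berlin"], offset)
```

With real embeddings the offset is only approximate, so the nearest neighbour of the shifted point is not guaranteed to be the expected answer.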

The goal of this work is to transfer these concepts to the field of fashion and to create software that allows the interactive exploration of a product catalog. The user can choose a product and an axis (e.g. summer – winter) and explore further products that are similar to the chosen one but lie in different regions of the chosen axis.
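One way such an exploration step could look is sketched below. This is an assumption for illustration, not the method the thesis prescribes: the axis is taken as the normalized difference between the means of a few seed products per pole, similarity is measured after removing the axis component, and the catalog, the seed choice, and the function names concept_axis and explore are all hypothetical.

```python
import numpy as np

def concept_axis(pos, neg):
    """Unit vector pointing from the mean of the `neg` seeds to the mean of the `pos` seeds."""
    axis = np.mean(pos, axis=0) - np.mean(neg, axis=0)
    return axis / np.linalg.norm(axis)

def explore(catalog, query_idx, axis, k=3):
    """Return the k products most similar to the query off-axis, ordered along the axis."""
    coords = catalog @ axis                      # position of each product on the axis
    residual = catalog - np.outer(coords, axis)  # embeddings with the axis component removed
    dist = np.linalg.norm(residual - residual[query_idx], axis=1)
    dist[query_idx] = np.inf                     # never return the query itself
    closest = np.argsort(dist)[:k]
    ordered = closest[np.argsort(coords[closest])]
    return [(int(i), float(coords[i])) for i in ordered]

# Invented toy catalog: one row per product.
names = ["sandal", "t-shirt", "boot", "wool sweater", "sneaker", "long-sleeve shirt"]
catalog = np.array([
    [1.0, 0.0, 1.0],
    [0.9, 1.0, 0.0],
    [-1.0, 0.0, 1.0],
    [-0.9, 1.0, 0.0],
    [0.1, 0.1, 0.9],
    [0.0, 0.9, 0.1],
])

# Summer seeds are rows 0 and 1, winter seeds are rows 2 and 3.
axis = concept_axis(pos=catalog[[0, 1]], neg=catalog[[2, 3]])

# Products similar to the sneaker, listed from the winter end to the summer end.
result = explore(catalog, query_idx=4, axis=axis, k=2)
for idx, position in result:
    print(names[idx], round(position, 2))
```

The brute-force distance computation is fine for a toy catalog; an efficient retrieval method for a large catalog would need an index structure instead.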

The work includes the following subtasks:

  • Research of a suitable data set
  • Research of related work
  • Creation of a product embedding
  • Implementation of a method to compute topical axes
  • Implementation of an efficient method for retrieving products along an axis
  • Implementation of a client to visualize the data and allow for exploration

To be successful in these tasks, experience in the following areas is helpful. However, if you are highly interested in the topic, it is always possible to acquire these skills in the course of the thesis work:

  • machine learning, deep learning and linear algebra basics
  • programming in Python (and a typical deep learning framework such as PyTorch or TensorFlow)
  • a JavaScript frontend framework

Documented, readable and clean code is appreciated. Experiments should also be conducted in a reproducible manner, and the whole software should be developed under version control.

Contact

Moritz Wilke

Universität Leipzig

Entity Resolution
