Visual Exploration

For this notebook we use a 4CAT corpus collected from TikTok about the 2024 Farmers’ Protest in Germany. Let’s take a look at the relevant columns. We will mostly work with the image_file column. Additionally, the image files should be extracted to the /content/media/images/ path. (See the linked notebook for the conversion from the original 4CAT files.)

df[['id', 'body', 'Transcript', 'image_file']].head()
id body Transcript image_file
0 7321692663852404001 #Fakten #mutzurwahrheit #ulrichsiegmund #AfD #... Liebe Freunde, schaut euch das an, das ist der... /content/media/images/7321692663852404001.jpg
1 7320593840212151584 Unstoppable 🇩🇪 #deutschland #8januar2024 #baue... the next, video!! /content/media/images/7320593840212151584.jpg
2 7321341957333060896 08.01.2024 Streik - Hoss & Hopf #hossundhopf #... scheiß Bauern, die, was weiß ich, ich habe auc... /content/media/images/7321341957333060896.jpg
3 7321355364950117665 #streik #2024 #bauernstreik2024 #deutschland #... 😎😎😎😎😎😎😎😎😎 /content/media/images/7321355364950117665.jpg
4 7321656341590789409 #🌞❤️ #sunshineheart #sunshineheartforever #🇩🇪 ... NaN /content/media/images/7321656341590789409.jpg
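Since the model will later read every frame from disk, it can help to first verify that each path in image_file actually points to an extracted file. A minimal sketch, assuming the standard library only (the helper name find_missing and the example path are illustrative):

```python
from pathlib import Path

def find_missing(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]

# Example check before fitting the model (path is illustrative)
missing = find_missing(["/content/media/images/7321692663852404001.jpg"])
print(f"{len(missing)} file(s) missing")
```

Rows whose files are missing can then be dropped from the dataframe before modeling.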


Let’s first install BERTopic, including the vision extensions.


The following code has been taken from the BERTopic documentation and was only slightly changed.

!pip install bertopic[vision]

Images Only

Next, we prepare the pipeline for an image-only model: we want to fit the topic model on the image content only. We follow the BERTopic multimodal documentation and generate image captions using the nlpconnect/vit-gpt2-image-captioning model. The documentation offers several options: we can incorporate textual content into the topic modeling, or fit the model on textual information only and then find and display the best-matching images for each cluster.

In our example we focus on image-only topic models.

from bertopic.representation import VisualRepresentation
from bertopic.backend import MultiModalBackend

# Image embedding model
embedding_model = MultiModalBackend('clip-ViT-B-32', batch_size=32)

# Image-to-text representation model
representation_model = {
    "Visual_Aspect": VisualRepresentation(image_to_text_model="nlpconnect/vit-gpt2-image-captioning")
}

Next, select the column with the paths of your image files (in this example, image_file) and convert it to a Python list.

image_only_df = df.copy()
images = image_only_df['image_file'].to_list()

Now it’s time to fit the model.

from bertopic import BERTopic

# Train our model with images only
topic_model = BERTopic(embedding_model=embedding_model, representation_model=representation_model, min_topic_size=5)
topics, probs = topic_model.fit_transform(documents=None, images=images)
100%|██████████| 7/7 [02:33<00:00, 21.88s/it]
100%|██████████| 7/7 [00:02<00:00,  2.99it/s]

Finally, let’s display the topics. Remember: Topic -1 collects all documents that do not fit into any topic.

# See linked notebook for code.
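Even without the display code, fit_transform returns the topic assignments as a plain list of integers, so a Counter gives a quick view of cluster sizes, including the -1 outlier bucket. A minimal sketch with made-up assignments:

```python
from collections import Counter

# Illustrative topic assignments; in the notebook these come from fit_transform
topics = [0, 0, 1, -1, 1, 0, -1]
sizes = Counter(topics)
print(sizes.most_common())  # largest clusters first; -1 holds the outliers
```

A large -1 bucket relative to the named topics is often a sign that min_topic_size needs tuning.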