About the Seminar

Author

Michael Achmann-Denkler

Published

October 9, 2023

Introduction

The research seminar Computational Analysis of Visual Social Media consists of project-centred work in groups, lectures on theory and practical sessions. Each group will follow their own research interests and datasets. Groups will be formed in the third session, together with preliminary topics. We have participants from different fields, the topics will mirror this interdisciplinarity, roughly drawn from the interesctions of media studies, political science, and communication science. The seminar aims at master students with first knowledge of at least one programming language.

What to Expect (Theoretical Skills)

By the end of the semester you should know more about:

  • The state of Social Media Research,
  • interesting questions to answer with social media data,
  • ethical and legal restrictions,
  • develop operationalizations for visual and textual data,

What to Expect (Practical Skills)

By the end of the semester you will be able to:

  • Collect Instagram stories, posts and TikTok videos,
  • apply OCR and automatically transcribe videos,
  • computationally classify text and images using GPT and CLIP,
  • evaluate and optimize your classifications using human annotations,
  • present your results.

Class Requirements

The following expectations and criteria must be met to pass the course:

  • Independent familiarization with your own scientific topic: Literature research, formulation of research questions, and operationalization.
  • Willingness to master new tools (supported by practical units and some provided Jupyter Notebooks).
  • Active and regular team collaboration on the individual project.
  • Continuous documentation of current progress through a project wiki.
  • Writing a project report at the end of the semester.

Project Documentation

  • We will openly document our project progress, incorporating a strong Open Science stance.
  • The project report template is a good starting point for your report and continuous documentation.
  • Through the semester we will come back to the draft and extend it towards the final report.
  • The goal is to publish the report on social-media-lab.net.

Project Report

  • The project report will be handed in collaboratively,
  • consists of app. 20 pages,
  • follows the IMRAD1 principle,
  • uses APA citation style,
  • needs to be handed in no later than 31.03.2024.

Project Ideas

  • Suggest your own project
  • Possible Projects:
    • Landtagswahl BY 2023
      • IG Stories & Posts
      • TikTok
      • Jugendorganisationen
    • Politische Influencer auf TikTok
    • War in Sozialen Medien:
      • Ukraine Invasion
      • Hamas Angriff auf Israel
    • Falschinformationen & KI-Generierte Inhalte

Social Media Lab

  • Most course material will be available on social-media-lab.net.
  • Additional material will be provided via GRIPS.
  • We will work collaboratively on the website through the semester.
  • The content is edited using Quarto and Markdown, you will need a GitHub Account.
  • Please provide your GitHub username to get access to the repository.
  • All content will be published under GPL-3.

Course Plan

Overview of our sessions
Date Content
16.10.23 Course Organization
23.10.23 Introduction to Social Media Analysis
30.10.23 Projects & Groups,
Getting Started: Tools
06.11.23 Data Collection: IG Posts & Stories
13.11.23 Data Collection: TikTok
20.11.23 - entfallen -
27.11.23 Data Preprocessing: OCR & Whisper
04.12.23 Exploration of Textual Data using GPT
11.12.23 Operationalization I & Computational Text Classification using GPT
18.12.23 Data Annotation: LabelStudio & Annotation Guides
08.01.24 Evaluation I: Optimizing Text Classification
15.01.24 Exploration of Visual Data
22.01.24 Operationalization II & Computational Image Classification using CLIP
29.01.24 Evaluation II: Optimizing Image Classification
05.02.24 Data Analysis as a Conversation: Exploring trends using ChatGPT
Visual Presentation of your Data & Results: RAWGraphs and more.

Introduction to Social Media Analysis

  • Overview of social media studies
    • Which academic disciplines are interested in plattforms like Instagram?
    • What is their interest, how do they study the user generated content?
    • Special focus: Political Communication on Instagram
  • How to conduct your own literature review
  • Theory: Digital Methods & Cultural Analytics
  • A short word about ethics & laws

Getting Started: Tools

  • Installation & Configuration of different tools.
    • Google Colab / Jupyter Notebooks
    • Quarto & Markdown for project documentation
    • Git & GitHub
    • Firefox Plugins
    • Figma
    • and more

Data Collection: IG Posts & Stories

Screenshot of the CrowdTangle interface.

Data Collection: TikTok

  • Post types and platform affordances of TikTok
  • Collecting TikToks using Zeeschuimer and 4CAT

Screenshot of the Firebase Backend and Zeeschuimer-F.

Data Preprocessing

  • OCR
  • We are going to use easyocr to detect and recognize text embedded in images, such as posts and stories.
  • We will export the first frame of videos for OCR and further analyses.
  • Automated Transcription
    • We will extract any audio from collected videos.
    • We will use whisper to transcribe the audio content of videos

Textual Exploration

  • We will first take a look at the textual data using simple frequency analyses and wordclouds.
  • We will use the GPT-API to explore the textual content of our data.
  • Optional we might use BERTopic to explore our textual data.

Operationalization I

  • This session depends on your own research: By december you should have developed an initial research request and explored related work in order to develop the first operationalization for content analysis.
  • We will learn more about content analysis in this session.
  • Based on your research, and the explorations of the previous sesssion, we will develop the first annotation guide.
  • Through the session we will explore how to (efficiently) use GPT for text data annotation.

Data Annotation

  • In this session we will import our data into LabelStudio and develop a final annotation manual.
  • Using the manuals and LabelStudio projects we will annotate the data.
  • We will shuffle annotators: Everyone will annotate for another group.

Evaluation I

  • Using the human annotations we will evaluate the performance of our computational text annotations / information extractions.
  • We can fine-tune our prompts using the annotation data to improve the annotation quality.
  • We will learn how to present and visualize the quality of the model.

Exploration of Visual Data

  • We will explore different tools to visualize images:
  • ImageJ
  • PixPlot
  • Memespector and Gephi3
  • The visualization forms the basis for image classification: In this stage we want to find similarities and differences.

Example of image exploration using PixPlot.

Operationalization II

  • Once more a dive in the literature: This time on visual content analysis.
  • Combining the results of our exploration, reserach interest and related work with content analysis, we will develop an annotation manual for the images.
  • We will learn how to use CLIP for image classification
  • Based on your previous experience you will create human annotations.
  • We will shuffle annotators: Everyone will annotate for another group.

Decomposition of different layers in a Story by @gruenebayern during the 2023 Bavarian state elections.

Evaluation II

  • Once more we will evaluate the quality of our model,
  • and fine-tune our prompts.
  • Work in Progress: We might organize this session differently on short notice, depending on the outcomes of my current research project.
  • Waiting in Progress: In case of visual GPT being published we might have to adapt.

Example of a visual inspection of classification results: Intermediary results of image types classifications using CLIP for the 2021 federal election. Two out of five stories posted by differnt parties have been misclassified.

Data Wrangling as a Conversation

  • The Advanced Data Analysis mode of ChatGPT is a powerful tool to (quickly) analyze metadata (and more) of social media data.
  • We will give it a shot with some simple analyses, like trends over time.
  • Experimental in case we have enough time left, we might try to create a workflow with LangChain and LlamaIndex to chat with our data.

Visual Presentation of your Data

  • Our last session of the semester will be all about telling a story with your data.
  • We will use Python (Jupyter Notebooks) to transform our data in CSV files.
  • We will import the data into RAWGraphs to create convincing plots
  • We will use Figma to collaboratively sketch the layout of your project report website.

Footnotes

  1. Introduction, Method, Result, Analaysis, Discussion↩︎

  2. I will not be able to provide access to the tool. We can, however, export data for our projects from the platform and you will learn how to use the exported data.↩︎

  3. following Omena’s concept for cloud vision labels, see related work.↩︎

Reuse