About the Seminar
Introduction
The research seminar Computational Analysis of Visual Social Media consists of project-centred work in groups, lectures on theory and practical sessions. Each group will follow their own research interests and datasets. Groups will be formed in the third session, together with preliminary topics. We have participants from different fields, the topics will mirror this interdisciplinarity, roughly drawn from the interesctions of media studies, political science, and communication science. The seminar aims at master students with first knowledge of at least one programming language.
What to Expect (Theoretical Skills)
By the end of the semester you should know more about:
- The state of Social Media Research,
- interesting questions to answer with social media data,
- ethical and legal restrictions,
- develop operationalizations for visual and textual data,
What to Expect (Practical Skills)
By the end of the semester you will be able to:
- Collect Instagram stories, posts and TikTok videos,
- apply OCR and automatically transcribe videos,
- computationally classify text and images using GPT and CLIP,
- evaluate and optimize your classifications using human annotations,
- present your results.
Class Requirements
The following expectations and criteria must be met to pass the course:
- Independent familiarization with your own scientific topic: Literature research, formulation of research questions, and operationalization.
- Willingness to master new tools (supported by practical units and some provided Jupyter Notebooks).
- Active and regular team collaboration on the individual project.
- Continuous documentation of current progress through a project wiki.
- Writing a project report at the end of the semester.
Project Documentation
- We will openly document our project progress, incorporating a strong Open Science stance.
- The project report template is a good starting point for your report and continuous documentation.
- Through the semester we will come back to the draft and extend it towards the final report.
- The goal is to publish the report on social-media-lab.net.
Project Report
Project Ideas
- Suggest your own project
- Possible Projects:
- Landtagswahl BY 2023
- IG Stories & Posts
- TikTok
- Jugendorganisationen
- Politische Influencer auf TikTok
- War in Sozialen Medien:
- Ukraine Invasion
- Hamas Angriff auf Israel
- Falschinformationen & KI-Generierte Inhalte
- Landtagswahl BY 2023
Course Plan
Date | Content |
---|---|
16.10.23 | Course Organization |
23.10.23 | Introduction to Social Media Analysis |
30.10.23 | Projects & Groups, Getting Started: Tools |
06.11.23 | Data Collection: IG Posts & Stories |
13.11.23 | Data Collection: TikTok |
20.11.23 | - entfallen - |
27.11.23 | Data Preprocessing: OCR & Whisper |
04.12.23 | Exploration of Textual Data using GPT |
11.12.23 | Operationalization I & Computational Text Classification using GPT |
18.12.23 | Data Annotation: LabelStudio & Annotation Guides |
08.01.24 | Evaluation I: Optimizing Text Classification |
15.01.24 | Exploration of Visual Data |
22.01.24 | Operationalization II & Computational Image Classification using CLIP |
29.01.24 | Evaluation II: Optimizing Image Classification |
05.02.24 | Data Analysis as a Conversation: Exploring trends using ChatGPT Visual Presentation of your Data & Results: RAWGraphs and more. |
Getting Started: Tools
- Installation & Configuration of different tools.
- Google Colab / Jupyter Notebooks
- Quarto & Markdown for project documentation
- Git & GitHub
- Firefox Plugins
- Figma
- and more
Data Collection: IG Posts & Stories
- Post types and platform affordances of Instagram
- How to use Instaloader
- How to use CrowdTangle2
- Collecting Stories using Zeeschuimer-F and the firebase backend.
- Collecting Posts using Zeeschuimer and 4CAT
Data Collection: TikTok
- Post types and platform affordances of TikTok
- Collecting TikToks using Zeeschuimer and 4CAT
Data Preprocessing
- OCR
- We are going to use easyocr to detect and recognize text embedded in images, such as posts and stories.
- We will export the first frame of videos for OCR and further analyses.
- Automated Transcription
- We will extract any audio from collected videos.
- We will use whisper to transcribe the audio content of videos
Textual Exploration
Operationalization I
- This session depends on your own research: By december you should have developed an initial research request and explored related work in order to develop the first operationalization for content analysis.
- We will learn more about content analysis in this session.
- Based on your research, and the explorations of the previous sesssion, we will develop the first annotation guide.
- Through the session we will explore how to (efficiently) use GPT for text data annotation.
Data Annotation
- In this session we will import our data into LabelStudio and develop a final annotation manual.
- Using the manuals and LabelStudio projects we will annotate the data.
- We will shuffle annotators: Everyone will annotate for another group.
Evaluation I
- Using the human annotations we will evaluate the performance of our computational text annotations / information extractions.
- We can fine-tune our prompts using the annotation data to improve the annotation quality.
- We will learn how to present and visualize the quality of the model.
Exploration of Visual Data
- We will explore different tools to visualize images:
- ImageJ
- PixPlot
- Memespector and Gephi3
- The visualization forms the basis for image classification: In this stage we want to find similarities and differences.
Operationalization II
- Once more a dive in the literature: This time on visual content analysis.
- Combining the results of our exploration, reserach interest and related work with content analysis, we will develop an annotation manual for the images.
- We will learn how to use CLIP for image classification
- Based on your previous experience you will create human annotations.
- We will shuffle annotators: Everyone will annotate for another group.
Evaluation II
- Once more we will evaluate the quality of our model,
- and fine-tune our prompts.
- Work in Progress: We might organize this session differently on short notice, depending on the outcomes of my current research project.
- Waiting in Progress: In case of visual GPT being published we might have to adapt.
Data Wrangling as a Conversation
- The Advanced Data Analysis mode of ChatGPT is a powerful tool to (quickly) analyze metadata (and more) of social media data.
- We will give it a shot with some simple analyses, like trends over time.
- Experimental in case we have enough time left, we might try to create a workflow with LangChain and LlamaIndex to chat with our data.
Visual Presentation of your Data
- Our last session of the semester will be all about telling a story with your data.
- We will use Python (Jupyter Notebooks) to transform our data in CSV files.
- We will import the data into RAWGraphs to create convincing plots
- We will use Figma to collaboratively sketch the layout of your project report website.
Footnotes
Introduction, Method, Result, Analaysis, Discussion↩︎
I will not be able to provide access to the tool. We can, however, export data for our projects from the platform and you will learn how to use the exported data.↩︎
following Omena’s concept for cloud vision labels, see related work.↩︎
Social Media Lab