Notes on Computational Social Media Research

Welcome to this collection of notes on social media analysis with a special focus on computational methods. It is a work-in-progress website, created as part of my PhD project and teaching at the Media Informatics Group at the University of Regensburg, Germany. My name is Michael Achmann-Denkler and I’m currently experimenting with computational approaches for multimodal analysis of social media content, like Instagram posts and stories. My aim for this website is to develop a collection of notes exploring various methodologies, techniques, and tools for social media research. As a first milestone, the website accompanied my research seminar Computational Analysis of Visual Social Media in the 2023/24 winter semester.

The research seminar is now in its second iteration in the 2024/25 winter semester. The notes are being updated to keep up with fast-changing data access conditions.

Content

Updates 2024/25

✨ Recently improved or updated with new content.
🚀 Significant update or major addition.
🔧 Needs fixes or improvement.
📝 Work in progress, more content will be added soon.

🚀 Course Organization: Introduction to the course including the course schedule.
Interdisciplinary Approaches to Social Media Analysis: Overview of Social Media Analysis (SMA) in academic and professional contexts, focusing on its intersection with communication science, political science, and computational methods.
Related Work: An overview of literature reviews on the (computational) analysis of social media, with a special focus on Instagram, plus some hands-on advice for writing your own literature review.
✨ GPT Literature Assistant: A simple notebook showcasing some possibilities of OpenAI’s GPT API and how to use it to support your literature selection.
Tools: A short guide for tools and software beneficial for visual social media analysis. Key tools discussed include Colab, a Google platform for collaborative work using Python and Jupyter notebooks, and Obsidian, a versatile note-taking app with plugins for task organization and literature notes.
🚀 Tools: OpenAI: An overview of the most important steps to sign up for an OpenAI API key and track spending.
✨ Data Collection: Instagram Posts: Code examples and notebooks to collect Instagram posts using instaloader, CrowdTangle, or Zeeschuimer & 4CAT.
🚀 Data Collection: Instagram Stories: Code examples and notebooks to collect Instagram stories using instaloader or Tidal Tales.
Data Collection: TikTok (External): Link to the Zeeschuimer & 4CAT manual for TikTok provided by the Digital Methods Initiative.
🚀 Text as Data and Data Organization: Provides an overview of using text as a data source in computational social science. It differentiates between structured and unstructured data, emphasizing the complexity of processing unstructured language data.
✨ Text Processing: A notebook to convert audio-visual social media data into text using OCR and Whisper.
✨📝 Text Exploration: Introduces two approaches for exploring textual content: topic modeling with BERTopic and prompting via OpenAI’s GPT API.
🚀 Text Classification: An introduction to text classification using GPT. The article presents several approaches, like Zero-Shot and Few-Shot classification. The accompanying notebook provides all the necessary code to get started with GPT.
🚀 Fine-Tuning BERT for Text Classification: A notebook to fine-tune a BERT model for binary classification tasks. Includes hyperparameter tuning, evaluation, and LIME for explainability. May be used as a foundation for Transformer-based visual classification.
✨ Gold Standard Validation: This chapter emphasizes the importance of validation in computational social media analysis, focusing on external validation through non-expert annotations, using Label Studio to create gold standard data. It discusses developing an annotation manual and setting up a Label Studio project for text data annotation, highlighting the iterative nature of manual development and the importance of clear, consistent guidelines.
✨ Agreement & Evaluation: This chapter discusses the assessment of annotation quality and the evaluation of machine learning models. It introduces Cohen’s Kappa and Krippendorff’s Alpha for measuring interrater agreement, ensuring the reliability of human annotations in datasets. The chapter then shifts focus to evaluating machine learning models with metrics like Accuracy, Precision, Recall, and F1 Score, crucial for understanding model performance in real-world scenarios.
🔧 Images as Data: This chapter introduces the concept of using images as data in computational social science and digital humanities, paralleling the “Text as Data” paradigm. It discusses automated visual analysis for transforming visual content into quantifiable insights, emphasizing the use of computer vision and machine learning techniques.
🔧 Exploration of Visual Data: The chapter covers various unsupervised techniques for analyzing image datasets. It discusses the use of commercial computer vision APIs for labeling images and examines network analysis and k-means clustering methods. Further, it introduces BERTopic’s multimodal functionality for image captioning and topic modeling.
🔧 Computational Image Classification: This chapter explores different methods for classifying images, focusing on zero-shot classifications with OpenAI’s CLIP and multimodal GPT-4. The first notebook demonstrates CLIP’s application in classifying image types in political communication. The second experiment involves multimodal GPT-4 for the classification of visual frames, based on Grabe & Bucy’s work. In a third subchapter we take a look at an ensemble classification approach: Using APIs we detect text and objects, and generate image captions. Combining these pieces of information into a single prompt we use GPT to classify the image.
✨ Collecting Human Image Annotations: This chapter offers an alternative path to the Gold Standard Validation chapter. Using the notebook provided here, we can upload images (or, alternatively, audio and video) to a Google Cloud bucket and annotate them inside Label Studio.

Citation and Licences

The website repository is available on GitHub and registered with Zenodo. Please use the citation data provided by Zenodo when quoting parts of this website in academic work. Code examples and computational notebooks are published in the supplement repository, which is also registered with Zenodo. All text content on this website is published under the Creative Commons Attribution (CC BY) license. All code is released under the GNU GPLv3.