Data Collection in a Post-API World
This section introduces how Instagram content can be collected for social-media research in a “post-API” era (Freelon 2018; Bruns 2019; Caliandro 2021; Trezza 2023). Today, you will find no single best access pathway. Instead, you must navigate a heterogeneous ecosystem:
- Official transparency interfaces when available
- Open-source tools and browser-based capture workflows
- Commercial services that offer managed scheduling and logging
This section aims to help you understand this landscape so that you can make informed, responsible methodological choices. While later tutorial pages will show you technical workflows, here we provide the conceptual grounding, legal/ethical boundaries, and historical motivations behind today’s practices.
Background
On April 4, 2018, Facebook shut down the Pages API, removing TOS-compliant pathways to extract public page content and signaling what many now call the post-API era (Freelon 2018). This wasn’t an isolated technical tweak—it marked a structural pivot in how platforms govern research access and how you, as a researcher, must plan data collection. Freelon argues that when platforms can revoke or restrict APIs at will, researchers should learn robust web-scraping workflows and understand the risks of violating Terms of Service, because heavy investment in platform-specific API skills can become obsolete overnight (Freelon 2018).
The broader backdrop is the Cambridge Analytica scandal, which catalyzed a wave of tightened access rules across platforms. Post-2018, access restrictions have reconfigured research practice: some scholars adapt via scraping and mixed tooling, others drift toward “easy-data” platforms (historically, Twitter) with better-documented endpoints, raising concerns about selection biases and the over-study of a few platforms (Trezza 2023). Trezza, building on the notion of an “APIcalypse” (Bruns 2019), describes not a clean break from APIs but a messier research landscape that demands diversification of sources, ethical caution with user data, and transparent documentation of trade-offs (Trezza 2023).
Legal and Ethical Grounding
In this course, you will work with publicly accessible social-media content. Classroom projects primarily rely on the European text-and-data-mining (TDM) exception for non-commercial research (Art. 3, Directive (EU) 2019/790; § 60d UrhG).
This exception permits automated analysis and temporary reproduction for scientific purposes, provided that no technical protection measures are circumvented. Expert guidance by the German Data Forum (Rat für Sozial- und Wirtschaftsdaten (RatSWD) 2019) notes that § 60d UrhG can apply even where platform Terms of Service prohibit automated access.
The EU’s Digital Services Act (DSA) introduces new data-access rights for vetted researchers studying systemic risks on very large platforms. However, our class projects fall outside the DSA’s vetted-researcher regime because they use only publicly accessible content, do not require vetted access, and do not request non-public platform data.
For further insight into current challenges and platform (non-)cooperation under the DSA, see Kupferschmidt (2025) and Keller (2025).
Because collection may involve personal data, GDPR applies. Public availability does not eliminate contextual privacy expectations (Zimmer 2010). Thus, we:
- limit collection to specific time windows, accounts, and purposes
- avoid profiling or inference about individuals
- store identifiable media only in controlled workspaces
- document decisions, procedures, and limitations
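One way to operationalise the last practice above is to keep a machine-readable record of scope and purpose for every collection run. The sketch below is a minimal illustration in Python; the `CollectionRecord` structure and all field names are our own assumptions for this example, not part of any collection tool or legal requirement.

```python
# Hypothetical sketch: a minimal, self-documenting record encoding the
# data-minimisation practices above (bounded time window, explicit account
# list, stated purpose, documented limitations). All names are illustrative.
from dataclasses import dataclass, field, asdict
from datetime import date
import json


@dataclass
class CollectionRecord:
    """Documents the scope and purpose of one collection run."""
    purpose: str                                   # research question the data serves
    accounts: list[str]                            # explicit allow-list, no open-ended crawls
    start: date                                    # collection window start
    end: date                                      # collection window end
    limitations: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        d = asdict(self)
        d["start"], d["end"] = self.start.isoformat(), self.end.isoformat()
        return json.dumps(d, indent=2)


record = CollectionRecord(
    purpose="Campaign imagery on public party accounts (class project)",
    accounts=["party_account_a", "party_account_b"],
    start=date(2024, 5, 1),
    end=date(2024, 6, 9),
    limitations=["Stories not captured", "Comments excluded to avoid profiling"],
)
print(record.to_json())
```

Storing such a record alongside the collected corpus makes the time window, account scope, and known gaps auditable later, which supports the transparency and proportionality principles discussed below.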
Ethical orientation
Legal compliance is only a baseline. Ethical practice requires contextual, case-based judgment. AoIR’s Internet Research: Ethical Guidelines 3.0 emphasizes that internet research should prioritize respect for persons and communities, minimize harm, and attend to context rather than rely on rigid rules (Franzke et al. 2020).
Key principles include:
- Context matters: Ethical decisions depend on how material is presented, shared, and understood in its original setting. Public visibility is not the same as ethical availability.
- Respect for persons & communities: Researchers should acknowledge that users have differing expectations of privacy and visibility, even in public spaces.
- Minimize harm: Avoid creating new risks for individuals or groups, including reputational harm, unwanted exposure, or identification.
- Proportionality: Collect only what is necessary to answer the research question; reduce detail where possible.
- Transparency & accountability: Explain what was collected, how, and why; document trade-offs and limitations.
In short: A responsible workflow is not only lawful — it is proportionate, well-documented, context-sensitive, and attentive to potential harm.
Summary
In today’s post-API landscape, social-media data collection requires deliberate methodological choices rather than reliance on a single unified access point. Your task is to begin from the research question, then select an approach that is legally grounded, ethically defensible, and technically feasible.
A responsible workflow:
- does not let platforms dictate what you study,
- selects tools that align with your goals and constraints,
- and clearly documents decisions, limitations, and trade-offs.
The next pages provide hands-on tutorials to help you apply these principles in practice. They walk you through concrete collection workflows — including browser-extension–based capture and managed commercial collection — and show you how to assemble platform-specific corpora for analysis.
Citation
@online{achmann-denkler2025,
author = {Achmann-Denkler, Michael},
title = {Data {Collection} in a {Post-API} {World}},
date = {2025-11-03},
url = {https://social-media-lab.net/data-collection/},
doi = {10.5281/zenodo.10039756},
langid = {en}
}