Data Collection in a Post-API World
This section introduces how Instagram content can be collected for social-media research in a “post-API” era (Freelon 2018; Bruns 2019; Caliandro 2021; Trezza 2023). Today, you will find no single best access pathway. Instead, you must navigate a heterogeneous ecosystem:
- Official transparency interfaces when available
- Open-source tools and browser-based capture workflows
- Commercial services that offer managed scheduling and logging
This section aims to help you understand this landscape so that you can make informed, responsible methodological choices. While later tutorial pages will show you technical workflows, here we provide the conceptual grounding, legal/ethical boundaries, and historical motivations behind today’s practices.
Background
On April 4, 2018, Facebook shut down the Pages API, removing TOS-compliant pathways to extract public page content and signaling what many now call the post-API era (Freelon 2018). This wasn’t an isolated technical tweak—it marked a structural pivot in how platforms govern research access and how you, as a researcher, must plan data collection. Freelon argues that when platforms can revoke or restrict APIs at will, researchers should learn robust web-scraping workflows and understand the risks of violating Terms of Service, because heavy investment in platform-specific API skills can become obsolete overnight (Freelon 2018).
The broader backdrop is the Cambridge Analytica scandal, which catalyzed a wave of tightened access rules across platforms. Post-2018, access restrictions have reconfigured research practice: some scholars adapt via scraping and mixed tooling, others drift toward “easy-data” platforms (historically, Twitter) with better-documented endpoints, raising concerns about selection biases and the over-study of a few platforms (Trezza 2023). Trezza, building on the notion of an “APIcalypse” (Bruns 2019), describes not a clean break from APIs but a messier research landscape that demands diversification of sources, ethical caution with user data, and transparent documentation of trade-offs (Trezza 2023).
Legal and Ethical Grounding
In this course, you will work with publicly accessible social-media content. Classroom projects primarily rely on the European text-and-data-mining (TDM) exception for non-commercial research (Art. 3, Directive (EU) 2019/790; § 60d UrhG).
This exception permits automated analysis and temporary reproduction for scientific purposes, provided that no technical protection measures are circumvented. Expert guidance by the German Data Forum (Rat für Sozial- und Wirtschaftsdaten (RatSWD) 2019) notes that § 60d UrhG can apply even where platform Terms of Service prohibit automated access.
The EU’s Digital Services Act (DSA) introduces new data-access rights for vetted researchers studying systemic risks on very large platforms. However, our class projects fall outside the DSA’s vetted-researcher regime because they use only publicly accessible content, do not require vetted access, and do not request non-public platform data.
For further insight into current challenges and platform (non-)cooperation under the DSA, see Kupferschmidt (2025) and Keller (2025).
Because collection may involve personal data, GDPR applies. Public availability does not eliminate contextual privacy expectations (Zimmer 2010). Thus, we:
- limit collection to specific time windows, accounts, and purposes
- avoid profiling or inference about individuals
- store identifiable media only in controlled workspaces
- document decisions, procedures, and limitations
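One way to operationalise the last practice above is to keep a machine-readable record of scope and purpose for every collection run. The sketch below is a minimal illustration in Python; the `CollectionRecord` structure and all field names are our own assumptions for this example, not part of any collection tool or legal requirement.

```python
# Hypothetical sketch: a minimal, self-documenting record encoding the
# data-minimisation practices above (bounded time window, explicit account
# list, stated purpose, documented limitations). All names are illustrative.
from dataclasses import dataclass, field, asdict
from datetime import date
import json


@dataclass
class CollectionRecord:
    """Documents the scope and purpose of one collection run."""
    purpose: str                                   # research question the data serves
    accounts: list[str]                            # explicit allow-list, no open-ended crawls
    start: date                                    # collection window start
    end: date                                      # collection window end
    limitations: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        d = asdict(self)
        d["start"], d["end"] = self.start.isoformat(), self.end.isoformat()
        return json.dumps(d, indent=2)


record = CollectionRecord(
    purpose="Campaign imagery on public party accounts (class project)",
    accounts=["party_account_a", "party_account_b"],
    start=date(2024, 5, 1),
    end=date(2024, 6, 9),
    limitations=["Stories not captured", "Comments excluded to avoid profiling"],
)
print(record.to_json())
```

Storing such a record alongside the collected corpus makes the time window, account scope, and known gaps auditable later, which supports the transparency and proportionality principles discussed below.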
Ethical orientation
Legal compliance is only a baseline. Ethical practice requires contextual, case-based judgment. AoIR’s Internet Research: Ethical Guidelines 3.0 emphasizes that internet research should prioritize respect for persons and communities, minimize harm, and attend to context rather than rely on rigid rules (Franzke et al. 2020).
Key principles include:
- Context matters: Ethical decisions depend on how material is presented, shared, and understood in its original setting. Public visibility is not the same as ethical availability.
- Respect for persons & communities: Researchers should acknowledge that users have differing expectations of privacy and visibility, even in public spaces.
- Minimize harm: Avoid creating new risks for individuals or groups, including reputational harm, unwanted exposure, or identification.
- Proportionality: Collect only what is necessary to answer the research question; reduce detail where possible.
- Transparency & accountability: Explain what was collected, how, and why; document trade-offs and limitations.
In short: A responsible workflow is not only lawful — it is proportionate, well-documented, context-sensitive, and attentive to potential harm.
Summary
In today’s post-API landscape, social-media data collection requires deliberate methodological choices rather than reliance on a single unified access point. Your task is to begin from the research question, then select an approach that is legally grounded, ethically defensible, and technically feasible.
A responsible workflow:
- does not let platforms dictate what you study,
- selects tools that align with your goals and constraints,
- and clearly documents decisions, limitations, and trade-offs.
The next pages provide hands-on tutorials to help you apply these principles in practice. They walk you through concrete collection workflows — including browser-extension–based capture and managed commercial collection — and show you how to assemble platform-specific corpora for analysis.
Citation
@online{achmann-denkler2025,
author = {Achmann-Denkler, Michael},
title = {Data {Collection} in a {Post-API} {World}},
date = {2025-11-03},
url = {https://social-media-lab.net/data-collection/},
doi = {10.5281/zenodo.10039756},
langid = {en}
}