Data Collection in a Post-API World

Author

Michael Achmann-Denkler

Published

November 3, 2025

This section introduces how Instagram content can be collected for social-media research in a “post-API” era (Freelon 2018; Bruns 2019; Caliandro 2021; Trezza 2023). Today, there is no single best access pathway. Instead, you must navigate a heterogeneous ecosystem of tools and access routes.

This chapter aims to help you understand this landscape so that you can make informed, responsible methodological choices. While later tutorial pages will show you technical workflows, here we provide the conceptual grounding, legal/ethical boundaries, and historical motivations behind today’s practices.

Background

On April 4, 2018, Facebook shut down the Pages API, removing TOS-compliant pathways to extract public page content and signaling what many now call the post-API era (Freelon 2018). This wasn’t an isolated technical tweak—it marked a structural pivot in how platforms govern research access and how you, as a researcher, must plan data collection. Freelon argues that when platforms can revoke or restrict APIs at will, researchers should learn robust web-scraping workflows and understand the risks of violating Terms of Service, because heavy investment in platform-specific API skills can become obsolete overnight (Freelon 2018).

The broader backdrop is the Cambridge Analytica scandal, which catalyzed a wave of tightened access rules across platforms. Since 2018, access restrictions have reconfigured research practice: some scholars adapt via scraping and mixed tooling, while others drift toward “easy-data” platforms (historically, Twitter) with better-documented endpoints, raising concerns about selection bias and the over-study of a few platforms (Trezza 2023). Trezza frames this shift, which Bruns (2019) termed the “APIcalypse”, not as a clean break from APIs but as a messier research landscape that demands diversification of sources, ethical caution with user data, and transparent documentation of trade-offs (Trezza 2023).

Collecting Social-Media Content Under Post-API Conditions

In a post-API environment, there is no single best tool for collecting social-media data. Platforms differ in what they make accessible, and they regularly change access rules without notice. As a researcher, you therefore need to select a collection strategy that fits your research question, your legal and ethical responsibilities, and your technical capacity — not the other way around.

Most importantly: do not let platforms dictate your research questions. Start from what you want to understand, then identify how you can feasibly and responsibly observe it.

Broadly, available approaches fall into three families. None is universally best; each is useful under different conditions.

  1. Browser-based capture: Tools such as Selenium, Playwright, or custom browser extensions collect content as rendered in the interface. Browser workflows can be powerful but require maintenance, responsible authentication management, and clear logging.

  2. Open-source & transparency-oriented tools: Packages such as instaloader or platform-provided transparency resources (e.g., Meta Content Library; TikTok Research Tools) can offer structured access to posts, accounts, or engagement metrics. Coverage varies: some support only selected public data, and availability is subject to change.

  3. Commercial collection services: Providers like Apify encapsulate scraping, scheduling, logging, and proxy management. These lower technical overhead but may obscure details about how data were obtained, and they require researchers to evaluate legal and ethical implications carefully.

Key idea: The “best” tool is the one that enables you to answer your question responsibly and transparently, given legal, ethical, and practical constraints.

Because constraints differ, it is perfectly legitimate for two teams studying similar topics to use different data-collection tools. What matters is that you:

  • explain why the chosen approach is appropriate
  • operate within the applicable legal/ethical framework
  • document what was (and was not) collected
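
One way to make these three points concrete is to keep a small, machine-readable log entry for every collection run. The following is a minimal sketch, not a prescribed format: the `CollectionRecord` class, its field names, and all example values are hypothetical and should be adapted to your own project and ethics requirements.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass
class CollectionRecord:
    """One entry in a hypothetical collection log (all field names are illustrative)."""
    tool: str                 # e.g. "Zeeschuimer" or "Apify"
    platform: str             # e.g. "Instagram"
    rationale: str            # why this approach is appropriate for the question
    legal_basis: str          # the legal/ethical framework you operate within
    collected: list[str]      # what was captured
    not_collected: list[str]  # known gaps and deliberate exclusions
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_json_line(record: CollectionRecord) -> str:
    """Serialize a record as a single JSON line, suitable for appending to a log file."""
    return json.dumps(asdict(record), ensure_ascii=False)


# Example values are placeholders, not recommendations.
record = CollectionRecord(
    tool="Zeeschuimer",
    platform="Instagram",
    rationale="Posts need to be captured as rendered in the interface",
    legal_basis="placeholder: name your ethics approval and legal basis here",
    collected=["public posts", "captions"],
    not_collected=["stories", "comments"],
)
line = to_json_line(record)
print(line)
```

Appending one such line per collection run gives you a simple audit trail that can later be summarized in a methods section.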

For our 2025/26 seminar, we will explore two common strategies in practice:

  • Browser-extension-based collection (Zeeschuimer, Tidal Tales)
  • Commercial managed collection (Apify)

Summary

In today’s post-API landscape, social-media data collection requires deliberate methodological choices rather than reliance on a single unified access point. Your task is to begin from the research question, then select an approach that is legally grounded, ethically defensible, and technically feasible.

A responsible workflow:

  • does not let platforms dictate what you study,
  • selects tools that align with your goals and constraints,
  • and clearly documents decisions, limitations, and trade-offs.

The next pages provide hands-on tutorials to help you apply these principles in practice. They walk you through concrete collection workflows — including browser-extension–based capture and managed commercial collection — and show you how to assemble platform-specific corpora for analysis.

References

Bruns, Axel. 2019. “After the ‘APIcalypse’: social media platforms and their fight against critical scholarly research.” Information, Communication and Society 22 (11): 1544–66. https://doi.org/10.1080/1369118X.2019.1637447.
Caliandro, Alessandro. 2021. “Repurposing digital methods in a post-API research environment: Methodological and ethical implications.” Italian Sociological Review. https://doi.org/10.13136/ISR.V11I4S.433.
Franzke, Aline Shakti, Anja Bechmann, Michael Zimmer, Charles M Ess, and the Association of Internet Researchers. 2020. “Internet research: ethical guidelines 3.0: association of internet researchers.” https://aoir.org/reports/ethics3.pdf.
Freelon, Deen. 2018. “Computational Research in the Post-API Age.” Political Communication 35 (4): 665–68. https://doi.org/10.1080/10584609.2018.1477506.
Keller, Daphne. 2025. “Using the DSA to study platforms.” Verfassungsblog, October. https://doi.org/10.59704/edb8c8d790ab8435.
Kupferschmidt, Kai. 2025. “Meta and TikTok are obstructing researchers’ access to data, European Commission rules.” https://www.science.org/content/article/meta-and-tiktok-are-obstructing-researchers-access-data-european-commission-rules.
Rat für Sozial- und Wirtschaftsdaten (RatSWD). 2019. “Big Data in den Sozial-, Verhaltens- und Wirtschaftswissenschaften: Datenzugang und Forschungsdatenmanagement - Mit Gutachten ‘Web Scraping in der unabhängigen wissenschaftlichen Forschung’.” RatSWD Output. German Data Forum (RatSWD). https://doi.org/10.17620/02671.39.
Trezza, Domenico. 2023. “To scrape or not to scrape, this is dilemma. The post-API scenario and implications on digital research.” Frontiers in Sociology 8 (March): 1145038. https://doi.org/10.3389/fsoc.2023.1145038.
Zimmer, Michael. 2010. “‘But the Data is Already Public’: On the Ethics of Research in Facebook.” Ethics and Information Technology 12 (4): 313–25. https://doi.org/10.1007/s10676-010-9227-5.

Citation

BibTeX citation:
@online{achmann-denkler2025,
  author = {Achmann-Denkler, Michael},
  title = {Data {Collection} in a {Post-API} {World}},
  date = {2025-11-03},
  url = {https://social-media-lab.net/data-collection/},
  doi = {10.5281/zenodo.10039756},
  langid = {en}
}
For attribution, please cite this work as:
Achmann-Denkler, Michael. 2025. “Data Collection in a Post-API World.” November 3, 2025. https://doi.org/10.5281/zenodo.10039756.