Severity: High · Category: Recon

Data Scraping

A scraper bot crawls your API or website, calling endpoints in rapid sequence to copy your data — user profiles, product listings, pricing, contact details — and export it. Unlike a real user, a scraper has no interest in the UI; it just wants the data underneath.

Think of it this way

Imagine someone walking into your store not to buy anything, but to photograph every price tag, product name, and customer name badge on display — then walking out and selling that information to your competitors. Data scraping is that, but done thousands of times per hour by a bot.

How it works

The attacker identifies which API endpoints return useful data — search results, user profiles, product details, pricing. They write a bot that systematically calls those endpoints, often with incremented IDs or paginated requests, extracting and storing the responses. They disguise the traffic by rotating IP addresses, mimicking browser behaviour, and pacing requests to stay under simple rate limits.

Real-world scenarios

Scenario 1

Competitor price monitoring

A competitor deploys a scraper that calls your pricing API every hour, automatically adjusting their prices to always be marginally cheaper than yours — without you knowing they have real-time visibility into your pricing.

Scenario 2

Marketplace data theft

A new startup scrapes all 500,000 listings from an established marketplace — including seller contacts, prices, and descriptions — to bootstrap their own competing platform without doing any original work.

Scenario 3

User contact harvesting

An attacker calls your user search or profile API with every name variation, harvesting email addresses and phone numbers to build a spam or phishing list.

How Anomira detects this

Anomira detects scrapers through abnormally high request rates to data-serving endpoints, sequential patterns in request parameters (incrementing IDs, paginating through all results), near-zero time between requests (non-human pacing), and the absence of standard browser signals.
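Two of these signals, non-human pacing and sequential request parameters, can be illustrated with a short Python sketch. This is not Anomira's implementation. The function name `scraper_signals`, the input shape (a list of `(timestamp, resource_id)` pairs for one client) and every threshold are assumptions chosen for illustration.

```python
from statistics import median

def scraper_signals(requests, min_requests=20, max_median_gap=0.5,
                    min_sequential_ratio=0.8):
    """Flag two scraper-like signals for a single client.

    requests: list of (timestamp_seconds, resource_id) tuples in arrival order.
    All thresholds are illustrative assumptions, not tuned values.
    """
    if len(requests) < min_requests:
        # Too few requests to say anything meaningful.
        return {"enough_data": False, "fast_pacing": False, "sequential_ids": False}

    times = [t for t, _ in requests]
    ids = [i for _, i in requests]

    # Time between consecutive requests; humans rarely sustain sub-second gaps.
    gaps = [b - a for a, b in zip(times, times[1:])]

    # Share of consecutive requests whose ID increased by exactly 1.
    steps = [b - a for a, b in zip(ids, ids[1:])]
    sequential_ratio = sum(1 for s in steps if s == 1) / len(steps)

    return {
        "enough_data": True,
        "fast_pacing": median(gaps) <= max_median_gap,
        "sequential_ids": sequential_ratio >= min_sequential_ratio,
    }

# A bot walking IDs 1000..1049 at 20 requests per second.
bot = [(i * 0.05, 1000 + i) for i in range(50)]

# A person opening unrelated records every few seconds.
human_ids = [12, 907, 13, 455, 88, 3021, 17, 640, 5, 77,
             1200, 43, 9, 300, 2, 801, 66, 1500, 31, 410]
human = [(i * 7.0, rid) for i, rid in enumerate(human_ids)]

bot_result = scraper_signals(bot)
human_result = scraper_signals(human)
```

In this sketch the bot trips both signals and the human trips neither. A scraper that paces its requests or shuffles IDs would evade these two checks, so in practice they would be combined with the other signals described above.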

What to do

  • Implement authentication and rate limiting on all data endpoints — even public ones.
  • Add pagination limits — no single session should be able to retrieve your entire dataset.
  • Return only the data fields each user role actually needs (field-level access control).
  • Use Anomira to block IPs showing scraper behaviour, and add CAPTCHA challenges to data-heavy endpoints.
  • Consider adding an API usage clause to your terms of service and pursuing legal action against large-scale scrapers.
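The first three recommendations (rate limiting, pagination limits and field-level access control) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the names `TokenBucket`, `clamp_page_size`, `filter_fields`, the role names, the field names and the limit of 50 are all hypothetical.

```python
import time

MAX_PAGE_SIZE = 50  # assumed cap; choose a value that suits your dataset

def clamp_page_size(requested):
    """Never return more than MAX_PAGE_SIZE records per page."""
    return max(1, min(requested, MAX_PAGE_SIZE))

# Hypothetical mapping of roles to the fields they may see.
VISIBLE_FIELDS = {
    "anonymous": {"id", "display_name"},
    "member": {"id", "display_name", "city"},
    "admin": {"id", "display_name", "city", "email", "phone"},
}

def filter_fields(record, role):
    """Drop every field the role is not allowed to see."""
    allowed = VISIBLE_FIELDS.get(role, VISIBLE_FIELDS["anonymous"])
    return {key: value for key, value in record.items() if key in allowed}

class TokenBucket:
    """Per-client rate limiter: `capacity` burst, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_second, now=None):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Return True if the request may proceed, consuming one token."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# Usage: a burst of 3 requests is allowed, the 4th is rejected,
# and capacity returns after 2 seconds (times passed explicitly for clarity).
bucket = TokenBucket(capacity=3, refill_per_second=1.0, now=0.0)
burst = [bucket.allow(now=0.0) for _ in range(4)]
later = bucket.allow(now=2.0)

profile = {"id": 7, "display_name": "Ada", "email": "ada@example.com"}
public_view = filter_fields(profile, "anonymous")
```

A real deployment would keep one bucket per API key or session rather than per IP address alone, since the attacker described above rotates IP addresses.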
