aeo.press monitors a wide variety of web bots, including search engine crawlers, AI agents, social media scrapers, and other automated tools. Tracking these bots helps users understand and control automated access to their websites.

Overview of Bot Detection in aeo.press

  • Detection Methods:
    • Uses both the DeviceDetector library and custom pattern matching for accurate identification.
    • Bot detection is applied both server-side and in client-side analytics scripts.
  • Transparency:
    • The complete list of monitored bots is maintained in the open-source robots.json file.
    • This list is updated regularly and is open to community contributions.

Categories of Tracked Bots

Bots are grouped by their primary purpose:

Category | Example Bots                                | Description
AI       | OpenAI GPTBot, Claude                       | Large language model and AI crawlers
Search   | Googlebot, Bingbot                          | Major search engine crawlers
Social   | Facebook, Twitterbot                        | Social media and preview scrapers
Crawler  | AhrefsBot, SemrushBot, MJ12bot, Unknown Bot | General web crawlers and scrapers
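
As an illustration of how these categories might be used in reporting, here is a minimal Python sketch that counts detected bots per category. The category assignments come from the table above; the dictionary layout and the function name count_by_category are assumptions made for this example, not part of aeo.press.

```python
from collections import Counter

# Category assignments copied from the table above.
# The dict structure itself is a hypothetical choice for this sketch.
BOT_CATEGORIES = {
    "OpenAI GPTBot": "AI",
    "Claude": "AI",
    "Googlebot": "Search",
    "Bingbot": "Search",
    "Facebook": "Social",
    "Twitterbot": "Social",
    "AhrefsBot": "Crawler",
    "SemrushBot": "Crawler",
    "MJ12bot": "Crawler",
    "Unknown Bot": "Crawler",
}


def count_by_category(bot_names):
    """Count bot hits per category; names missing from the table are skipped."""
    return Counter(
        BOT_CATEGORIES[name] for name in bot_names if name in BOT_CATEGORIES
    )


if __name__ == "__main__":
    hits = ["Googlebot", "OpenAI GPTBot", "Googlebot", "Unknown Bot"]
    for category, total in count_by_category(hits).items():
        print(category, total)
```

Keeping the mapping in one place means a new bot only needs one extra entry to show up under the right category.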

List of Tracked Bots

The following bots are explicitly tracked by aeo.press. The lowercase identifiers in parentheses are the patterns associated with each bot:

  • AI & Large Language Model Bots
    • OpenAI GPTBot (gptbot)
    • OpenAI SearchBot (oai-searchbot)
    • Claude / Anthropic (claude, anthropic)
    • Perplexity AI (perplexitybot)
  • Major Search Engines
    • Googlebot (googlebot)
    • Bingbot (bingbot)
    • DuckDuckBot (duckduckbot)
    • Baiduspider (baiduspider)
    • YandexBot (yandex)
  • Social & Scraper Bots
    • Facebook (facebookexternalhit, meta-externalagent)
    • Twitterbot (twitterbot)
    • LinkedInBot (linkedinbot)
  • Other Crawlers
    • AhrefsBot (ahrefsbot)
    • SemrushBot (semrushbot)
    • MJ12bot (mj12bot)
    • Unknown Bot (bot, spider, crawler)
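
The list above can be read as a lookup from user agent patterns to bot names. The following Python sketch shows one way such matching could work, assuming a case-insensitive substring check and assuming that specific patterns are tried before the generic "bot", "spider", and "crawler" fallbacks. The function name detect_bot and the ordering are assumptions for illustration; the actual aeo.press implementation (which also uses DeviceDetector) may differ.

```python
# (pattern, bot name) pairs taken from the list above.
# Order matters in this sketch: specific patterns first, generic fallbacks last,
# so that e.g. "googlebot" is not reported as "Unknown Bot".
BOT_PATTERNS = [
    ("gptbot", "OpenAI GPTBot"),
    ("oai-searchbot", "OpenAI SearchBot"),
    ("claude", "Claude / Anthropic"),
    ("anthropic", "Claude / Anthropic"),
    ("perplexitybot", "Perplexity AI"),
    ("googlebot", "Googlebot"),
    ("bingbot", "Bingbot"),
    ("duckduckbot", "DuckDuckBot"),
    ("baiduspider", "Baiduspider"),
    ("yandex", "YandexBot"),
    ("facebookexternalhit", "Facebook"),
    ("meta-externalagent", "Facebook"),
    ("twitterbot", "Twitterbot"),
    ("linkedinbot", "LinkedInBot"),
    ("ahrefsbot", "AhrefsBot"),
    ("semrushbot", "SemrushBot"),
    ("mj12bot", "MJ12bot"),
    ("bot", "Unknown Bot"),
    ("spider", "Unknown Bot"),
    ("crawler", "Unknown Bot"),
]


def detect_bot(user_agent):
    """Return the first matching bot name, or None if no pattern matches."""
    ua = user_agent.lower()
    for pattern, name in BOT_PATTERNS:
        if pattern in ua:
            return name
    return None


if __name__ == "__main__":
    print(detect_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
    print(detect_bot("ExampleSpider/1.0"))
```

Placing the generic patterns last is the key design choice here: almost every named bot also contains "bot" or "spider", so a different order would collapse them all into Unknown Bot.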

How Bot Data Is Used in Analytics

For sites with analytics enabled, aeo.press tracks bot activity using a browser script. The script checks the visitor's user agent string against known bot patterns and records two fields:

  • Whether the visitor is a bot (is_bot)
  • Detected bot name (bot_name)

This information is sent alongside regular analytics data, allowing for more accurate reporting and filtering of non-human traffic.
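
To make the reporting flow concrete, here is a minimal Python sketch that models the logic only; the real script runs in the browser, so this is not the aeo.press code. The field names is_bot and bot_name come from the description above. The path field, the function names build_event and human_events, and the shortened pattern table are hypothetical and exist only for this example.

```python
# Shortened, hypothetical pattern table used only for this example.
KNOWN_BOT_PATTERNS = {
    "gptbot": "OpenAI GPTBot",
    "googlebot": "Googlebot",
    "bingbot": "Bingbot",
}


def build_event(path, user_agent):
    """Build an analytics record carrying the is_bot and bot_name fields."""
    ua = user_agent.lower()
    bot_name = next(
        (name for pattern, name in KNOWN_BOT_PATTERNS.items() if pattern in ua),
        None,
    )
    return {
        "path": path,  # hypothetical extra field standing in for regular analytics data
        "is_bot": bot_name is not None,
        "bot_name": bot_name,
    }


def human_events(events):
    """Filter out records flagged as bots, leaving human traffic only."""
    return [event for event in events if not event["is_bot"]]


if __name__ == "__main__":
    events = [
        build_event("/pricing", "Mozilla/5.0 (compatible; bingbot/2.0)"),
        build_event("/pricing", "Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0"),
    ]
    print(len(human_events(events)))
```

Because the flag travels with each record, reports can include or exclude bot traffic at query time without re-running detection.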