Perplexity is repeatedly modifying their user agent and changing IPs and ASNs to hide their crawling activity, in direct conflict with explicit no-crawl preferences expressed by websites.
The article explains PerplexityBot respects robots.txt, but then sends a different request with a different IP and different user-agent. They could very well be using a different method to walk around it.
Or they haven’t been caught yet.
The article explains PerplexityBot respects robots.txt, but then sends a different request with a different IP and different user-agent. They could very well be using a different method to walk around it.
The article explains how they tested for that, and as far as they could tell OpenAI is respecting the rules.