lemmy.helios42.de
Pro@mander.xyz to Technology@programming.dev · English · 1 month ago

Cloudflare: Perplexity uses stealth crawling techniques, like undeclared user agents and rotating IP addresses, to evade robots.txt rules and network blocks

blog.cloudflare.com

7 comments · Cross-posted to: technology@lemmy.zip

Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives
blog.cloudflare.com
Perplexity is repeatedly modifying their user agent and changing IPs and ASNs to hide their crawling activity, in direct conflict with explicit no-crawl preferences expressed by websites.
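For context on why a changed user agent matters: robots.txt rules are grouped by user-agent token, so a crawler that stops announcing its own name falls through to whatever the wildcard group allows. Below is a minimal sketch using Python's standard `urllib.robotparser`, assuming a hypothetical site at example.com whose robots.txt blocks only the token `PerplexityBot` (used here as the declared crawler name for illustration) and allows everyone else:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one declared crawler, allow all other agents.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://example.com/article"  # hypothetical page on the example site

# A crawler that declares its token matches the first group and is disallowed.
declared = parser.can_fetch("PerplexityBot", url)

# The same client presenting a generic browser user agent matches no named
# group, so the wildcard "*" group applies and the fetch is allowed.
browser_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)
undeclared = parser.can_fetch(browser_ua, url)

print(declared, undeclared)  # prints: False True
```

Note that robots.txt compliance is voluntary: the parser only reports what the site asks for, and nothing forces a client to honor it or to report its real name.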

Perplexity blog.

  • beeng@discuss.tchncs.de · English · 3 points · 1 month ago

    Perplexity fired back in their blog.

    Pretty tasty.

Technology@programming.dev

Share interesting Technology news and links.

Rules:

  1. No paywalled sites at all.
  2. News articles have to be recent, not older than 2 weeks (14 days).
  3. No external video links; only native (.mp4, etc.) links under 5 minutes.
  4. Post only direct links.

To encourage more original sources and keep this space as commercial-free as I can, the following websites are blacklisted:

  • Al Jazeera;
  • NBC;
  • CNBC;
  • Substack;
  • Tom’s Hardware;
  • ZDNet;
  • TechSpot;
  • Ars Technica;
  • Vox Media outlets, with the exception of Axios;
  • Engadget;
  • TechCrunch;
  • Gizmodo;
  • Futurism;
  • PCWorld;
  • ComputerWorld;
  • Mashable;
  • Hackaday;
  • WCCFTECH;
  • Neowin.

More sites will be added to the blacklist as needed.

Encouraged:

  • Archive links in the body of the post.
  • Linking to the direct source, instead of linking to an article talking about the source.

Misc:

Relevant Communities:

  • Beehaw Technology - Technology-related discussions.
  • lemmy.zip Technology - Hard tech news.
Visibility: Public

This community can be federated to other instances and be posted/commented in by their users.

  • 378 users / day
  • 2.13K users / week
  • 4.92K users / month
  • 7.14K users / 6 months
  • 1 local subscriber
  • 552 subscribers
  • 1.6K Posts
  • 3.53K Comments
  • Mods: Pro@programming.dev