• pr06lefs@lemmy.ml
    2 months ago

    So you’re a Cloudflare customer and you wish they would let the Perplexity traffic multiplier through to your website? You can leave Cloudflare any time you want.

    • FauxLiving@lemmy.world
      2 months ago

      🙄 You’re an Internet user and you don’t like AI, so you can leave the Internet anytime you want.

      That’s not a good argument. What about the site owners who want to block mass scraping but still want their content available to people using these tools? Cloudflare exists because it lets through the legitimate traffic that websites want and blocks the mass scraping that they don’t want.

      If they’re not able to distinguish mass scraping traffic from user created traffic then they’re blocking legitimate users that some website owners want.

      • pr06lefs@lemmy.ml
        2 months ago

        Yes, your “leave the internet any time you want” strawman is not a good argument.

        If allowing Perplexity while blocking the bad guys is so easy, why not find a service that does that for you?

        • FauxLiving@lemmy.world
          2 months ago

          The topic is that Cloudflare is classifying human sourced traffic as bot sourced traffic.

          Saying “Just don’t use it” is a straw man. It doesn’t change the fact that Cloudflare, one of the largest CDNs representing a significant portion of the websites and services in the US, is misclassifying traffic.

          I used mine intentionally while knowing it was a straw man, did you?

          The same goes for “if it’s so easy, just don’t use it,” hopefully for obvious reasons.

          This affects both the customers of Cloudflare (the web service owners) and the users of the web services. A single site or user opting out doesn’t change the fact that a large portion of the Internet is classifying human sourced traffic as bot sourced traffic.

          • pr06lefs@lemmy.ml
            2 months ago

            LOL “human sourced traffic” oh the tragedy. I for one am rooting for Perplexity to go out of business forever.

            • FauxLiving@lemmy.world
              2 months ago

              “I for one am rooting for perplexity to go out of business forever.”

              Yeah, I know.

              You’re engaging in motivated reasoning. That’s why you’re saying irrational things, because you’re working backwards from a conclusion (AI bad).

              • pr06lefs@lemmy.ml
                2 months ago

                I don’t see how categorically blocking non-human traffic is irrational given the current environment of AI scanning. And what’s rational about demanding that Cloudflare distinguish between the ‘good guy’ AI and ‘bad guy’ AI without proposing any methodology for doing so?

                • FauxLiving@lemmy.world
                  2 months ago

                  It is blocking human traffic. That’s the entire premise of the article.

                  Attempting to say that this is non-human traffic makes no sense if you understand how a browser works. When you load a website your browser, acting as an agent, does a lot of tasks for you and generates a bunch of web requests across multiple hosts.

                  Your browser downloads the HTML from the website, parses that file for image, script and CSS links, retrieves those resources from the various hosts that serve them, then interprets the JavaScript and makes whatever web requests the scripts ask for. Often the scripting keeps sending requests to the website in the background to update the content (web-based email, for example).
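To make the "parses the file for links" step concrete, here is a minimal sketch using only Python's standard library. The page markup, the `example.com` URLs and the `find_subresources` helper are made up for illustration; a real browser does far more (preloading, caching, script execution), so treat this as a rough picture of one step rather than how any particular browser works.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SubresourceParser(HTMLParser):
    """Collects URLs of images, scripts and stylesheets referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            # Relative URLs are resolved against the page's own URL.
            self.resources.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.resources.append(urljoin(self.base_url, attrs["href"]))


def find_subresources(html, base_url):
    """Hypothetical helper: list the follow-up URLs a browser would fetch."""
    parser = SubresourceParser(base_url)
    parser.feed(html)
    return parser.resources


# Made-up page: one stylesheet, one third-party script, one image.
PAGE = """<html><head>
<link rel="stylesheet" href="/style.css">
<script src="https://cdn.example.net/app.js"></script>
</head><body><img src="logo.png"></body></html>"""

for url in find_subresources(PAGE, "https://example.com/news/"):
    print(url)
```

One click on one page turns into three more requests here, two to the original host and one to a different host, all made automatically on the user's behalf.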

                  All of this is automated and done on your behalf. But you wouldn’t classify this traffic as non-human because a person told the browser to do that task and the task resulted in a flurry of web requests and processing on behalf of the user.

                  Summarization is just another task, which is requested by a human.

                  The primary difference, and the reason it is incorrectly classified, is that the summarization tools use a stripped down browser. It doesn’t need JavaScript to be rendered or CSS to change the background color, so it doesn’t waste resources on rendering that stuff.

                  Cloudflare detects this kind of environment, one that doesn’t fully render a page, and assumes that it is a web scraper. This used to be a good way to detect scraping because the average user didn’t use web automation tools and scrapers did.

                  Regular users do use automation tools now, so detecting automation doesn’t guarantee that the agent is a scraper bot.

                  The point of the article is that their heuristics don’t work anymore. Users now use automation tools in a manner that doesn’t generate tens of millions of requests per second and overwhelm servers, so that traffic shouldn’t be classified the same way.

                  The point of Cloudflare’s bot blocking is to prevent a single user from overwhelming a site’s resources. These tools don’t do that. Go use any search summarization tool and see for yourself: it usually grabs one page from each source. That kind of traffic uses fewer resources than a human user (because it only grabs static content).
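The "classify by volume, not by whether the page gets rendered" idea can be sketched roughly as below. This is not how Cloudflare actually works; the `RateClassifier` name, the 30 requests per 60 seconds threshold and the client labels are all assumptions picked for illustration.

```python
from collections import defaultdict, deque


class RateClassifier:
    """Labels a client by request volume in a sliding time window."""

    def __init__(self, max_requests=30, window_seconds=60.0):
        # Both thresholds are hypothetical, not taken from any real product.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)

    def observe(self, client_id, timestamp):
        """Record one request and return the client's current label."""
        hits = self._hits[client_id]
        hits.append(timestamp)
        # Forget requests that have fallen out of the window.
        while hits and hits[0] <= timestamp - self.window_seconds:
            hits.popleft()
        return "bulk" if len(hits) > self.max_requests else "user-scale"


classifier = RateClassifier(max_requests=30, window_seconds=60.0)

# A summarizer grabbing three pages for one question.
summarizer = [classifier.observe("summarizer", t) for t in (0.0, 1.0, 2.0)]

# A scraper firing 100 requests within one second.
scraper = [classifier.observe("scraper", i * 0.01) for i in range(100)]

print(summarizer[-1])  # user-scale
print(scraper[-1])     # bulk
```

Neither client renders JavaScript, yet only the second one trips the threshold. A counter like this is easy to dodge by spreading requests across many addresses, so it is only a picture of the volume argument, not a working defense.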

                  • pr06lefs@lemmy.ml
                    2 months ago

                    So how would Cloudflare tell the difference between the good ‘stripped down’ queries and the bad? Still not hearing how that is supposed to work. If there’s no way to tell the difference, the baby will be thrown out with the bathwater, and I can’t blame them.