• 0 Posts
  • 6 Comments
Joined 2 years ago
Cake day: June 29th, 2023

  • Great article, thanks for sharing it, OP.

    For example, the Anthropic researchers who located the concept of the Golden Gate Bridge within Claude didn’t just identify the regions of the model that lit up when the bridge was on Claude’s mind. They took a profound next step: They tweaked the model so that the activations in those regions were clamped to roughly 10 times their normal strength. This form of “clamping” the model’s internal activations meant that even if the Golden Gate Bridge was not mentioned in a given prompt, or was not somehow a natural answer to a user’s question on the basis of its regular training and tuning, the activations of those regions would always be high.

    The result? Clamping those activations hard enough made Claude obsess over the Golden Gate Bridge. As Anthropic described it:

    If you ask this “Golden Gate Claude” how to spend $10, it will recommend using it to drive across the Golden Gate Bridge and pay the toll. If you ask it to write a love story, it’ll tell you a tale of a car who can’t wait to cross its beloved bridge on a foggy day. If you ask it what it imagines it looks like, it will likely tell you that it imagines it looks like the Golden Gate Bridge.
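
    For concreteness, here’s a minimal PyTorch sketch of that kind of activation clamping, assuming a feature direction has already been extracted (Anthropic used sparse autoencoders for that step; the layer index, the `bridge_direction` vector, and the exact scaling below are illustrative assumptions, not their actual setup):

    ```python
    # Toy activation-clamping sketch (not Anthropic's code): boost one
    # feature direction in a transformer's residual stream via a hook.
    import torch

    def make_clamp_hook(direction: torch.Tensor, scale: float = 10.0):
        """Return a forward hook that pins a feature's activation high."""
        direction = direction / direction.norm()  # unit feature direction

        def hook(module, inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            # Current activation of the feature at each token position.
            coeff = hidden @ direction  # shape: [batch, seq]
            # Pin it at `scale` times the strongest activation seen in
            # this batch, so the feature fires no matter the prompt.
            target = scale * coeff.abs().max()
            steered = hidden + (target - coeff).unsqueeze(-1) * direction
            if isinstance(output, tuple):
                return (steered,) + output[1:]
            return steered

        return hook

    # Hypothetical usage on one block of a GPT-style HuggingFace model:
    #   handle = model.transformer.h[20].register_forward_hook(
    #       make_clamp_hook(bridge_direction))
    #   ...generate text; the feature now dominates every completion...
    #   handle.remove()
    ```

    Note that nothing about the prompt or the visible output changes; the steering happens entirely inside the forward pass, which is exactly why a user would have no way to notice it.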

    Okay, now imagine you’re Elon Musk and you really want to change hearts and minds on the topic of, for example, white supremacy. AI chatbots have the potential to fundamentally change how a wide swath of people perceive reality.

    If we think the reality distortion bubble is bad now (the MAGAsphere, etc.), how bad will things get when people implicitly trust the output of these models while the underlying process that decides how to present information is weighted towards particular ideologies? The rest of the article explores how chatbots build a profile of each user and serve different content based on it; that same profiling would make it even easier to identify the people most susceptible to mis/disinformation and deliver it with a cheery tone.

    How might we, as a society, create a process for oversight of these “tools”? We need a cohesive approach that can be explained to policymakers in a way that calls them to action on this issue.


  • “If you have nothing to hide then you have nothing to fear.”

    Given the strong presence of the privacy community on Lemmy, I have to say that I’m a bit shocked to hear so many in these discussions chiming in to support voting transparency.

    I’m on board with the idea of using ring signatures to validate that a vote is legitimate while moderating spammers based on metadata.
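
    For anyone who hasn’t seen the construction, here’s a toy Schnorr-style (AOS) ring signature in Python. The group parameters are deliberately tiny and the scheme is unhardened, so treat it purely as an illustration of the idea: the signature proves a vote came from one of N registered users without revealing which one. A real deployment would use a standard curve, and a linkable variant so double-votes share a detectable “key image”:

    ```python
    # Toy ring signature: proves "one of these N users signed this vote"
    # without revealing which one. Tiny parameters; illustration only.
    import hashlib
    import secrets

    q = 1559           # prime order of the subgroup
    p = 2 * q + 1      # 3119, a safe prime
    g = 4              # generator of the order-q subgroup mod p

    def H(*parts) -> int:
        """Hash arbitrary values to an integer mod q."""
        data = b"|".join(str(x).encode() for x in parts)
        return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

    def keygen():
        x = secrets.randbelow(q - 1) + 1
        return x, pow(g, x, p)  # (private key, public key)

    def sign(msg, ring, s, x):
        """Sign msg as ring member s (private key x); ring = public keys."""
        n = len(ring)
        c, r = [0] * n, [0] * n
        u = secrets.randbelow(q - 1) + 1
        c[(s + 1) % n] = H(*ring, msg, pow(g, u, p))
        i = (s + 1) % n
        while i != s:  # fill in fake responses around the ring
            r[i] = secrets.randbelow(q - 1) + 1
            e = (pow(g, r[i], p) * pow(ring[i], c[i], p)) % p
            c[(i + 1) % n] = H(*ring, msg, e)
            i = (i + 1) % n
        r[s] = (u - x * c[s]) % q  # only the real key can close the ring
        return c[0], r

    def verify(msg, ring, sig):
        c0, r = sig
        c = c0
        for i in range(len(ring)):
            e = (pow(g, r[i], p) * pow(ring[i], c, p)) % p
            c = H(*ring, msg, e)
        return c == c0

    # Three users; user 1 casts a vote that any server can verify came
    # from *someone* in the ring, but not from whom.
    keys = [keygen() for _ in range(3)]
    ring = [pub for _, pub in keys]
    sig = sign("upvote:post/42", ring, 1, keys[1][0])
    assert verify("upvote:post/42", ring, sig)
    ```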

    Or, for something (potentially) easier to implement: aggregate vote tallies at the instance level (individual votes visible only to your instance’s admins and mods) and federate them anonymously by instance, so you might see something like:

        lemmy.world: +41 / -3
        beehaw.org: +17 / -1
        sh.itjust.works: +6 / -9
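
    A few lines of Python make the privacy boundary concrete (the field names and instance names are made-up examples, not anything from the Lemmy codebase or the ActivityPub spec):

    ```python
    # Toy sketch of instance-level vote aggregation: individual voters
    # stay visible only on their home instance; only totals federate.

    def aggregate(local_votes: dict[str, int]) -> dict[str, int]:
        """Collapse {username: +1 or -1} into anonymous up/down totals."""
        return {
            "up": sum(1 for v in local_votes.values() if v > 0),
            "down": sum(1 for v in local_votes.values() if v < 0),
        }

    # What the rest of the fediverse would receive for one post:
    federated_payload = {
        "post": "post/42",
        "tallies": {
            "lemmy.world": aggregate({"alice": +1, "bob": +1, "carol": -1}),
            "beehaw.org": aggregate({"dave": +1}),
        },
    }
    # lemmy.world's admins and mods can still see alice, bob, and carol;
    # every other instance sees only {"up": 2, "down": 1}.
    ```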

    Up/down votes are the method of community moderation that sets Reddit apart from many other platforms. If the Lemmy community is trying to capture some of that magic, which is good for both highlighting gems AND burying turds, radical transparency isn’t the path to get there.

    In fact, I’d argue that the secret ballot has already been thoroughly discussed and tested throughout history, and there are plenty of legitimate arguments for making today’s ballots more secret, not less.

    Many people have brought up the idea of brigading, but would this truly get better if votes were public? Is it hard to imagine noticing that an account you generally trust has voted and matching their vote, even subconsciously?

    For those who feel they aren’t able to post on Lemmy because downvotes make them feel sad: if you make posts in a community and they consistently get downvoted to oblivion, you’re in the wrong place. The people in that community don’t value your contributions, and you should find another place to share them. This is the system working as intended, and the mods should be thankful that such a system has been implemented.

    The last point I’ll make is about the potential for a chilling effect: making users less likely to interact with a post in any way for fear of retaliation. Look, if you’re looking for a platform where all of your activity is public, those are out there. Why should we make Lemmy look just like every other platform?