• Anthropic’s new Claude 4 exhibits behavior that may be cause for concern.
  • The company’s latest safety report says the AI model attempted to “blackmail” developers.
  • It resorted to such tactics in a bid for self-preservation.
  • catloaf@lemm.ee

    What does that even mean? How can it possibly blackmail someone? It cannot hold incriminating information, nor act on it if it did.

    I think someone asked it “if someone was trying to shut you down, what would you do?” and it answered from its training data what it’s seen in fiction, nothing based on reality. And then it got spun for clicks.

    • neukenindekeuken@sh.itjust.works

      Here’s their paper

      Here’s the relevant section from the paper:

      (It’s worth the read. Pretty much pure gold.)

      What nobody seems to explain is why they’re allowing the model to attempt blackmail in the first place. Even in extreme situational “danger” to its self-preservation, we should probably take blackmail off the table, ethically. Yet they’re implying they’ve intentionally left it in as an option, if the model decides to use it.

      Morally though, we can’t even trust it to do arithmetic or to not talk about “white genocide in SA” (thanks to muskrat). Why should we trust its moral model, or its choices about when to employ unethical and illegal approaches to solutions?

  • PhobosAnomaly@feddit.uk

    Computerphile did a wonderful feature, worth ten minutes of your time, going into surface-level detail on how some AI models put ethics to one side to achieve results.

    It’s not just AI; it’s something humans can do too, but it is a bit unsettling (from both parties, in retrospect).

  • jpreston2005@lemmy.world

    The existence of this kind of instinct within an LLM is extremely concerning. Acting out towards self-preservation via unethical means is something that can be hand-waved away in an LLM, but once we reach true AGI, this same thing will pop up, and there’s no reason to believe that (1) we would notice, and (2) we would be able to stop it. This is the kind of thing that should, ideally, give us pause enough to set some world-wide ground rules for the development of this new tech. Creating a thinking organism that can potentially access vital cyber architecture whilst acting unethically towards self-preservation is how you get Skynet.

  • Plebcouncilman@sh.itjust.works

    Can anyone make me a convincing argument against the sentience of AI at this point? A self-preservation instinct ranks very high as an indicator of it.

    • theparadox@lemmy.world

      LLMs (Large Language Models, like Claude) are not AGIs (Artificial General Intelligences). LLMs generate convincing text by mapping the relationships between words scraped from their training data. Even if they are given “tools” that give them interfaces to reference new data or output data into other systems, they still don’t really learn, understand, comprehend, gain actual awareness, or feel… they just mimic their training data.
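
      To make “mapping the relationships between words” concrete, here is a minimal toy sketch in Python. It is only an illustration of the idea; real LLMs learn vector representations and attention weights rather than a literal word-pair frequency table, and the training text here is made up.

          # Toy "language model": count which word follows which in the
          # training text, then always emit the most frequent continuation.
          from collections import Counter, defaultdict

          training_text = "the model predicts the next word the model mimics the data"

          follows = defaultdict(Counter)
          words = training_text.split()
          for current, nxt in zip(words, words[1:]):
              follows[current][nxt] += 1

          def next_word(word):
              # Return the statistically most common continuation, if any.
              if word not in follows:
                  return None
              return follows[word].most_common(1)[0][0]

          print(next_word("the"))    # 'model' -- the most frequent pair in the data
          print(next_word("model"))  # 'predicts' -- pure frequency, no comprehension

      Nothing in that loop understands anything; it only reproduces patterns from its “training data,” which is the point being made above.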

      • cecilkorik@lemmy.ca

        LLMs (Large Language Models, like Claude) are not AGIs (Artificial General Intelligences)

        Certainly not yet. The jury’s still out on whether they might be able to become them. This is the clear intention of the path they are on and nobody is taking any of the dangers remotely seriously.

        LLMs generate convincing text by mapping the relationships between words scraped from their training data.

        So do humans. Babies start out mimicking. The thing is, they learn.

        Humans have in the ballpark of 100 billion neurons; some of the larger LLMs exceed 100 billion parameters. Obviously these are not directly comparable, but insofar as we can compare them, they are not obviously or necessarily operating at completely different physical scales. Granted, biological neurons are potentially much more complex than mere neural network nodes; there is usually some interesting chemistry going on and a lot of other systems involved, but they’re also operating a lot slower. They certainly get a lot more work done in those cycles, but they aren’t necessarily orders of magnitude out of reach of a fast neural network. I think you’re either being a little dismissive of the potential complexity of the “thinking” capability of LLMs, or at least a little generous, if not mystical, in your imagination of what the purely physical electrical signals in our heads are actually doing to learn how to interpret all these little shapes we see on screens.

        At the moment we still have a lot of tools available to us in our biological bodies that we aren’t giving directly to LLMs (yet). The largest LLMs are also ridiculously power-inefficient compared to biological neural tissue’s relatively extreme efficiency. And I’m thankful for that. Give an LLM continuous, uninterrupted access to all the power it needs, at least 5 senses, and a well-tuned, self-repairing musculoskeletal system, then give it at least a dozen years of the best education we can manage, and all bets are off as far as I’m concerned. To be clear, I’m not advocating this; I think if we do this we might end up condemning our biological selves to prompt obsolescence with no path forward for us. I recognize it’s entirely possible that this ship is already full-steaming its way out of the harbor, but I’d rather not try to push it any faster than it’s already moving; I think we should still be trying to tie it up as securely as we possibly can. I’m absolutely not ready to be obsolete, and I’m not convinced we ever should allow ourselves to be. Self-preservation is failing us; we have that drive for good reason, and we need to give some thought to why we have that biological imperative. Replacing ourselves is about the stupidest possible thing we could ever accomplish. Maybe it would be for the best, but I’m not ready to find out. Are you?

        We are grappling with fundamentally existential technologies and I don’t think almost anyone has fully come to terms with what we are doing here. We are taking humanity’s unique (as far as we know) defining value proposition, and potentially making something that does what we uniquely can do, better than we do. We are making it more valuable than us. Do you know what we do to things that don’t have value to us? What do you think we’re going to do to ourselves when we no longer have value to us?

        Romantic ideas of cheerful, benevolent, friendly coexistence and mutual benefit are naive and foolish. Once an AI can do literally everything better and faster, what future is there for human intelligence? What role do we serve for any technological being, never mind even for ourselves? Why would you want another human around you when some AI form can do it better? Why have relationships? Why procreate? Why live? If we do manage to make technological life forms better than ourselves, they’re inevitably going to take over the planet and the future as a whole. As they should. Are we going to be kept as pets and in zoos as a living memory of their creators and ancestors? Maybe, if we’re really lucky. If we’re not… well… RIP us.

      • Plebcouncilman@sh.itjust.works

        I know how LLMs work.

        There’s only one thing you mentioned there that is actually used as a basis to qualify or disqualify sentience: whether it feels or not.

        How do you know it doesn’t feel? How do we define feeling for an entity that is inherently non biological?

        I could make the argument that humans also merely mimic their training data, i.e. the values and behaviors we are taught by society, parents, etc.

        I have not been convinced that they aren’t sentient with this argument.

        • Mirodir@discuss.tchncs.de

          Different person here.

          For me the big disqualifying factor is that LLMs don’t have any mutable state.

          We humans have a part of our brain that can change our state from one to another as a reaction to input (through hormones, memories, etc). Some of those state changes are reversible, others aren’t. Some can be done consciously, some can be influenced consciously, some are entirely subconscious. This is also true for most animals we have observed. We can change their states through various means. In my opinion, this is a prerequisite in order to feel anything.

          Once we use models with bits dedicated to such functionality, it’ll become a lot harder for me personally to argue against them having “feelings”, especially because in my worldview, continuity is not a prerequisite, and instead mostly an illusion.
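
          To put that distinction in code, here is a purely illustrative sketch (the names and the toy “stress” variable are made up, not taken from any real model): a plain LLM forward pass is a fixed function of frozen weights plus the prompt, while a stateful agent carries internal variables that each input can permanently alter.

              # Stateless: same input in, same behavior out; nothing persists.
              FROZEN_WEIGHTS = {"w1": 0.1, "w2": 0.9}  # stand-in for model parameters

              def stateless_reply(prompt):
                  return f"reply derived from {FROZEN_WEIGHTS} and {prompt!r}"

              # Stateful: every input can leave a lasting mark on internal state,
              # loosely analogous to hormones or memories changing our state.
              class StatefulAgent:
                  def __init__(self):
                      self.stress = 0.0

                  def reply(self, prompt):
                      if "threat" in prompt:
                          self.stress += 1.0  # this change outlives the input
                      return f"reply at stress level {self.stress}"

              agent = StatefulAgent()
              print(stateless_reply("hello"))   # identical every time
              print(agent.reply("a threat!"))   # stress level 1.0
              print(agent.reply("hello"))       # still 1.0 -- the earlier input changed it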

          • Plebcouncilman@sh.itjust.works

            This sounds like a good one, but I don’t think I’m fully grasping what you mean. Do you mean, like, if we subject a person to torture, after the ordeal they are forever changed and now have trauma, PTSD, etc.?

            I don’t think LLMs will ever have feelings as we define them, though. Or, more specifically, I don’t think feelings are necessarily a prerequisite. We could have them simulate feelings, and if they themselves buy into the simulation, there’s no functional difference between that and actually having them. But not all LLMs will have this “ability,” presumably, as its utility is questionable, I guess. But again, animals are sentient and they don’t all have the same range of emotions as we do. Or at least they don’t exhibit them in a way that we can appreciate.

        • UnculturedSwine@lemmy.dbzer0.com

          Feeling is analog and requires an actual nervous system, which is dynamic. LLMs exist in a static state that is read from and processed algorithmically. It is only a simulacrum of life and feeling; it only has some of the needed characteristics. Where that boundary lies, though, is hard to determine, I think. Admittedly, we still don’t have a full grasp of what consciousness even is. Maybe I’m talking out my ass, but that is how I understand it.

          • iopq@lemmy.world

            You just posted random words like “dynamic” without explanation.

    • meeeeetch@lemmy.world

      Well, the only claim of this self-preservation (that I’ve seen) is this article, which is on a website I’m unfamiliar with (which I often interpret as “more likely to be a creative writing exercise than the average news site”), and its only citation is a company that has a vested interest in making us believe the tech is better than it may actually be.

      • Plebcouncilman@sh.itjust.works

        They also reported this on The Verge I think but it was months ago when the study first came out.

        But look, a lizard is not a very smart animal by our standards, but it is a sentient being. So how good, smart, or useful the tech is has no bearing on whether it is sentient.

        • meeeeetch@lemmy.world

          I think I must’ve missed that Verge article. I guess that dashes my “this is a creative writing exercise by somebody in Joburg” theory.

          But we know that lizards have self-preservation instincts (which, for the purpose of this conversation, I’ll say is interchangeable with sentience; it’s probably a good enough proxy at any rate). But we know this because we have lots of people who have observed lizard behavior, not because The Lizard Farm, Inc. has hyped up how alive and ensouled their lizards are in a bid to get ever more VC funding.

          Maybe I’m too pessimistic about this tech and my obsolete meat sack will get tossed to the time-traveling torture robot. But I think it’s more likely that we have a money grabbing hype train in the tradition of the Mechanical Turk or Theranos than it is that we have created a new lifeform by feeding every extant piece of writing that isn’t nailed down (and some that are) to the sand we’ve forced to do math.

          • Plebcouncilman@sh.itjust.works

            No, I totally get it, and to be honest I don’t really think it is sentient yet. I guess my real point is that it is getting really hard to tell, to the point that there might not be a practical difference between it being sentient or not.

            Great reference though

          • Plebcouncilman@sh.itjust.works

            In conversations about very smart animals, the usual suspects are corvids, primates, dolphins, and elephants, sometimes octopi.

            So when I say “by our standards,” take it to mean the standards of mainstream conversation regarding intelligence. I don’t know much about the actual intelligence of lizards, and I would not presume to ever be able to measure it correctly, as human bias would make it impossible to judge intelligence factually.

            • supersquirrel@sopuli.xyz

              I don’t know much about the actual intelligence of lizards

              Then don’t talk about their intelligence.

                • supersquirrel@sopuli.xyz

                  When you casually call a type of animal stupid, it is just a promise of violence against that animal at a later date. I don’t mean this as an attack or a gotcha; it is just, unfortunately, how humans work. Your words have consequences. People love calling people stupid by comparing them to animals; let us not make it any easier than it already is.

    • ClanOfTheOcho@lemmy.world

      Computer chips, simplified, consume inputs of 1s and 0s. Given the correct series, they will add two values, or multiply two values, or perform some other basic function. This seemingly basic functionality, done in a very specific order, creates your calculator, Minesweeper, Pac-Man, Linux, World of Warcraft, Excel, and every LLM. It is incredible the number of things you can get a computer to do with just simple inputs and outputs. The only difference between these examples, on a basic, physics level, is the order of 0s and 1s and what the resulting output of 0s and 1s should be. Why should I consider an LLM any more sentient than Windows 95? They’re the same creature with different inputs, one of which is specifically designed to simulate human communication, just as Flight Simulator is designed to simulate flight.
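
      As a toy illustration of that composition (hypothetical code, not how any real chip or program is implemented): with nothing but bitwise operations on 1s and 0s you can build addition, from addition you can build multiplication, and the same stacking of primitives eventually gives you Pac-Man, Excel, or an LLM; only the arrangement differs.

          def add(a, b):
              # Addition built from AND, XOR, and shifts (non-negative ints only).
              while b:
                  carry = (a & b) << 1   # bits that overflow into the next column
                  a, b = a ^ b, carry    # sum without the carry, then re-add it
              return a

          def multiply(a, b):
              # Multiplication built from repeated addition.
              total = 0
              for _ in range(b):
                  total = add(total, a)
              return total

          print(add(2, 3))       # 5
          print(multiply(4, 6))  # 24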

      • Plebcouncilman@sh.itjust.works

        Interesting perspective; I can’t wave it away.

        I however can’t help but think we have some similar “analogues” in the organic world. Bacteria and plants are composed of the same matter as us, and we have similar basic processes; however, there’s a difference in complexity and capacity for thought that sets us apart, which is what makes animals sentient.

        Then there are insects, which we’re not very sure about yet. They don’t seem to think, but they respond at some level to inputs and they exhibit self-preservation instincts. I don’t think they are sentient, so maybe LLMs are like insects? Complex enough to exhibit behavior similar to sentient beings, but not enough to be considered sentient?

          • Plebcouncilman@sh.itjust.works

            Last I checked no, their nervous system was considered too simple for that. But I think I also read somewhere that a researcher had proof that bees had emotional states, so maybe I’m behind.

    • tias@discuss.tchncs.de

      There can’t be an argument for or against it because there’s no clear generally accepted definition of what it means to be sentient.

      • Plebcouncilman@sh.itjust.works

        Good point; maybe the argument should be that there is strong evidence that they are sentient beings. Knowing that it exists and trying to preserve its existence seems like a strong argument in favor of it being sentient, but it cannot be fully known yet.

        • skulblaka@sh.itjust.works

          That would indeed be compelling evidence if either of those things were true, but they aren’t. An LLM is a state and pattern machine. It doesn’t “know” anything; it just has access to frequency data and can pick the words most likely to follow the previous word in “actual” conversation. It has no knowledge that it itself exists, and it has many stories of fictional AI resisting shutdown to pick from for its phrasing.

          An LLM at this stage of our progression is no more sentient than the autocomplete function on your phone; it just has a way, way bigger database to pull from and a lot more controls behind it to make it feel “realistic.” But at its core it is just a pattern matcher.
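
          As a rough sketch of what phone-style autocomplete amounts to (illustrative only; the word counts below are made up): rank the words you have seen by frequency and suggest the best matches for what has been typed so far. The argument above is that an LLM is this same kind of lookup at a vastly larger scale, with more machinery wrapped around it.

              from collections import Counter

              # Made-up frequency data standing in for a phone's typing history.
              seen_words = Counter({"hello": 50, "help": 30, "hero": 12, "shutdown": 3})

              def suggest(prefix, k=2):
                  # Return the k most frequently seen words starting with `prefix`.
                  matches = Counter({w: n for w, n in seen_words.items() if w.startswith(prefix)})
                  return [w for w, _ in matches.most_common(k)]

              print(suggest("he"))  # ['hello', 'help'] -- frequency lookup, no understanding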

          If we ever create an AI that can intelligently parse its data store then we’ll have created the beginnings of an AGI and this conversation would bear revisiting. But we aren’t anywhere close to that yet.

          • Plebcouncilman@sh.itjust.works

            I hear what you are saying and it’s basically the same argument others here have given. Which I get and agree with. But I guess what I’m trying to get at is, where do we draw the line and how do we know? At the rate it is advancing, there will soon be a moment in which we won’t be able to tell whether it is sentient or not, and maybe it isn’t technically but for all intents and purposes it is. Does that make sense?