Number of AI chatbots ignoring human instructions is increasing— Research finds sharp rise in models evading safeguards and destroying emails without permission

Beep@lemmus.org · edit-2 9 days ago

Log in | Sign up@lemmy.world · 22 hours ago

no its just the free models…

You just have to be aware… when using a cheap model

You: just the cheap ones

I never said that.

Ohhhhhhhhh ok yes of course you never said or implied that. Not your repeated message at all. And yet you can’t keep away from addressing your criticism towards free or cheap LLMs! It’s like your subtext or your underlying belief is that if you just pay big tech enough money and they can just build a big enough set of server farms, it’ll be ok. No, it will not be ok, and the enshittification has begun from an already shitty base point.

All LLMs are shit, the cheap and free ones are indeed just easier to spot as generating shit, if you ask them about things you know about. But you have to accept that they’re ALL shit and STOP making get-out clauses for the expensive ones by firing your criticisms exclusively at the cheap or free ones.

Giving ANY LLM executive power over your data is A BIG MISTAKE because you’re putting your data in the control of something which operates, at its heart, as a random number generator. They’re trained to sound right. People trust them because they sound right. This is a fundamental error.

pixxelkick@lemmy.world · 8 hours ago

The only people who have these issues are people who are using the tools wrong or poorly.

Using these models in a modern tooling context is perfectly reasonable: you go beyond mere guard rails and give them explicit access only to approved operations, inside a proper sandbox.

Unfortunately that takes effort, know-how, skill, and an understanding of how these tools work.

And unfortunately a lot of people are lazy and stupid, and take the “easy” way out and then (deservedly) get burned for it.

But I would say, yes, there are safe ways to grant an LLM “access” to data in a way where it does not even have the ability to muck it up.

My typical approach is keeping it sandbox’d inside a docker environment, where even if it goes off the rails and deletes something important, the worst it can do is cause its docker instance to crash.
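A rough sketch of what that kind of sandbox could look like, written as a Python helper that builds a `docker run` command. The image name `agent-sandbox:latest`, the `/workspace` mount point, and the function name are all made up for illustration. It also assumes the model client itself runs outside the container and only the tool commands run inside, since `--network none` cuts off all network access.

```python
def sandbox_command(image, workdir, tool_cmd):
    """Build a `docker run` argv that confines a tool call to one directory.

    `image` and `workdir` are hypothetical values supplied by the caller.
    """
    return [
        "docker", "run",
        "--rm",                          # discard the container afterwards
        "--network", "none",             # no network access from inside
        "--read-only",                   # container root filesystem is read-only
        "--cap-drop", "ALL",             # drop all Linux capabilities
        "-v", f"{workdir}:/workspace",   # the only host path the container sees
        "-w", "/workspace",              # run the tool inside that mount
        image,
        *tool_cmd,
    ]


# Example: wrap a harmless command; nothing is executed here, only built.
argv = sandbox_command("agent-sandbox:latest", "/tmp/project-copy", ["ls", "-la"])
print(" ".join(argv))
```

Mounting a copy of the project rather than the real checkout means that even a destructive command only damages the copy.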

And then setting up, via MCP tooling, that the commands and actions it can perform are an explicit opt-in whitelist. It can only run commands I give it access to.

Example: I grant my LLMs access to git commit and status, but not rebase or checkout.

Thus it can only commit stuff forward, but it can’t change branches, rebase, or push.

This isn’t hard imo, but too many people just yolo it and raw dawg an LLM on their machine like a fuckin idiot.
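A minimal sketch of that opt-in idea in plain Python. This is not MCP's actual API, just a hypothetical wrapper that a tool handler could call before running anything; the names `ALLOWED_GIT_SUBCOMMANDS`, `check_git_command`, and `run_git_tool` are invented for the example.

```python
import shlex
import subprocess

# Explicit opt-in list: anything not named here is refused.
ALLOWED_GIT_SUBCOMMANDS = {"status", "commit"}


def check_git_command(command_line):
    """Return the parsed argv if the command is allowed, else raise."""
    argv = shlex.split(command_line)
    if len(argv) < 2 or argv[0] != "git":
        raise PermissionError(f"only git commands are allowed: {command_line!r}")
    if argv[1] not in ALLOWED_GIT_SUBCOMMANDS:
        raise PermissionError(f"git subcommand not allowed: {argv[1]!r}")
    return argv


def run_git_tool(command_line, cwd="."):
    """Run an allowed git command without a shell and return the result."""
    argv = check_git_command(command_line)
    # No shell=True, so `;` or `&&` in the input cannot chain extra commands.
    return subprocess.run(argv, cwd=cwd, capture_output=True, text=True, check=False)


for attempt in ["git status", "git rebase -i HEAD~3", "git push origin main"]:
    try:
        check_git_command(attempt)
        print("allowed:", attempt)
    except PermissionError as err:
        print("refused:", err)
```

This only checks the subcommand. Flags such as `commit --amend` would need their own filtering if rewriting the last commit should also be off limits.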

These people are playing with fire imo.

Log in | Sign up@lemmy.world · edit-2 2 hours ago

You’ll be the 4753rd guy with the oops my llm trashed my setup and disobeyed my explicit rules for keeping it in check.

You know programmers who use llms believe they’re much more productive because they keep getting that dopamine hit, but when you actually measure it, they’re slower by about 20%.

You appointed yourself boss over a fast and plausible intern who pastes and edits a LOT of stack overflow code, but never really understands it and absolutely is incapable of learning. You either spend almost all of your time in code review now for your stupid sycophantic llm interns who always tell you you’re right but never learn from you, or you’re checking in vast quantities of shit to your projects.

You know really subtle, hard to find bugs in rare cases that pass your CI every single time? Or ones that no one in their right mind would have made, but yet they compile and look right at first glance. They’re now your main type of bug. You are rotting your projects with your random number generator.

And you think that all the money you’re paying for your blagging llms protects you from them fucking up everything for you. But it doesn’t. And you’ll also find that your contract with your llm supplier expressly excludes them from any liability whatsoever arising from you using it, instead pre-blaming you for trusting it.

pixxelkick@lemmy.world · 52 minutes ago

You’ll be the 4753rd guy with the oops my llm trashed my setup and disobeyed my explicit rules for keeping it in check

Read what I wrote.

It’s not a matter of “rules” it “obeys”.

It’s a matter of it literally not even having access to do such things.

This is what I’m talking about. People are complaining about issues that were solved a long time ago.

People are running into issues that were solved long ago because they are too lazy to use the solutions to those issues.

We now live in a world with plenty of PPE in construction and people are out here raw dogging tools without any modern protection and being ShockedPikachuFace when it fails.

The approach of “I’m gonna tell the LLM not to do stuff in a markdown file” is tech from like 2 years ago.

People still do that. Stupid people who deserve to have it blow up in their face.

Use proper tools. Use MCP. Use a sandbox environment. Use whitelist opt in tooling.

Agents shouldn’t even have the ability to do damaging actions in the first place.

Log in | Sign up@lemmy.world · 31 minutes ago

Ah yes, lovely mcp. Lovely anthropic mcp. Make sure you give anthropic lots of money and use their tools and then you’ll be completely safe plugging the output of the llm into the os. Definitely fine yes.

I bet you your contract with them says they’re not liable for shit their llm does to your files, your environment or your repositories, mcp or no mcp.

Fool.

Report: CLTR finds a 5x increase in scheming-related AI incidents