How to Stop ChatGPT from Going Off the Rails

The viral chatbot wasn’t up to writing a WIRED newsletter. But it’s fluent enough to raise questions about how to keep eloquent AI systems accountable.
[Image: Two white mice hiding inside a hole and a cat peeking through the other side. A tense standoff. Photograph: Michael Blann/Getty Images]

When WIRED asked me to cover this week’s newsletter, my first instinct was to ask ChatGPT—OpenAI’s viral chatbot—and see what it came up with. It’s what I’ve been doing with emails, recipes, and LinkedIn posts all week. Productivity is way down, but sassy limericks about Elon Musk are up 1,000 percent.

I asked the bot to write a column about itself in the style of Steven Levy, but the results weren’t great. ChatGPT served up generic commentary about the promise and pitfalls of AI, but didn’t really capture Steven’s voice or say anything new. As I wrote last week, it was fluent, but not entirely convincing. But it did get me thinking: Would I have gotten away with it? And what systems could catch people using AI for things they really shouldn’t, whether that’s work emails or college essays?

To find out, I spoke to Sandra Wachter, a professor of technology and regulation at the Oxford Internet Institute who speaks eloquently about how to build transparency and accountability into algorithms. I asked her what that might look like for a system like ChatGPT.

Amit Katwala: ChatGPT can pen everything from classical poetry to bland marketing copy, but one big talking point this week has been whether it could help students cheat. Do you think you could tell if one of your students had used it to write a paper?

Sandra Wachter: This will start to be a cat-and-mouse game. The tech is maybe not yet good enough to fool me as a person who teaches law, but it may be good enough to convince somebody who is not in that area. I wonder if technology will get better over time to where it can trick me too. We might need technical tools to make sure that what we’re seeing is created by a human being, the same way we have tools for deepfakes and detecting edited photos.

That seems inherently harder to do for text than it would be for deepfaked imagery, because there are fewer artifacts and telltale signs. Any reliable solution may need to be built by the company that’s generating the text in the first place.

You do need to have buy-in from whoever is creating that tool. But if I’m offering services to students I might not be the type of company that is going to submit to that. And there might be a situation where even if you do put watermarks on, they’re removable. Very tech-savvy groups will probably find a way. But there is an actual tech tool [built with OpenAI’s input] that allows you to detect whether output is artificially created. 

What would a version of ChatGPT that had been designed with harm reduction in mind look like? 

A couple of things. First, I would really argue that whoever is creating those tools should put watermarks in place. And maybe the EU’s proposed AI Act can help, because it deals with transparency around bots, saying you should always be aware when something isn’t real. But companies might not want to do that, and maybe the watermarks can be removed. So then it’s about fostering research into independent tools that look at AI output. And in education, we have to be more creative about how we assess students and how we write papers: What kind of questions can we ask that are less easily fakeable? It has to be a combination of tech and human oversight that helps us curb the disruption.
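(For the curious: the statistical idea behind that kind of watermark is simple enough to sketch in a few lines of Python. What follows is a toy illustration of the general approach floated in public research proposals, not OpenAI’s actual tool; the secret key, the word-level splitting, and the 50/50 cutoff are all simplifications of mine.)

```python
import hashlib

GREEN_KEY = "secret-key"  # shared between generator and detector (assumed)

def is_green(prev_word: str, word: str) -> bool:
    """Pseudorandomly assign roughly half of all words to a 'green list'
    that depends on the previous word and a secret key."""
    digest = hashlib.sha256(f"{GREEN_KEY}|{prev_word}|{word}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(text: str) -> float:
    """Fraction of word pairs landing on the green list. Ordinary human
    text should hover near 0.5; a generator biased toward green words
    scores noticeably higher, which a detector can flag."""
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    return sum(is_green(a, b) for a, b in pairs) / len(pairs)

print(green_fraction("The quick brown fox jumps over the lazy dog."))
```

It also shows why Wachter’s caveat bites: paraphrase the output and the word pairs reshuffle, washing the statistical signal out.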

You’ve done a lot of work on counterfactuals, a way of figuring out why an AI system came to certain decisions by asking what decision it would have made if you changed the input data. It struck me that ChatGPT enables people to do this much more easily than most models: You can tweak your query and see how that changes its response. 

Being able to interact and figure out what it’s doing, and how smart or how dumb it is, is really empowering for people. You feel so much more powerless if it’s a black box that’s just completely opaque and you have no understanding of how it makes decisions and why.
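(If you want to run that kind of counterfactual probe a little less haphazardly than I did, here’s a minimal sketch using OpenAI’s official Python library. The model name and the loan-application prompts are illustrative assumptions of mine, not anything from our conversation.)

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative; any chat model would do
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Counterfactual probing: change exactly one detail in the input
# and compare how the answer shifts.
baseline = ask("Should I get a $10,000 loan? I earn $40,000 a year.")
variant = ask("Should I get a $10,000 loan? I earn $80,000 a year.")

print("BASELINE:", baseline)
print("VARIANT: ", variant)
```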

As someone who works on preventing AI from doing harm, is it worrying that ChatGPT has picked up more than a million users in a week? Or is it good that more people are being exposed to AI in a relatively safe playground?

With all technology you can’t really say if it’s good or bad—it’s both. It really depends on what you do with it. From an objective perspective, it’s super exciting that the capabilities are there—look at what human hands can build, that’s amazing. But it can also be used for something problematic: deception, deceit, misinformation, or trying to hurt people. The technology itself in this instance, I think, is still neutral.

Time Travel

The frenzy around ChatGPT reminded me of one of the first chatbots I ever interacted with, during my formative online years in the pre-Facebook maelstrom of MySpace and MSN Messenger (RIP). The stubborn automaton was called SmarterChild and launched in 2000, first on AIM and then on MSN. But according to my chat history—yes, I’m a digital hoarder—I was still volleying bored suggestions at it as recently as 2009. Turns out, WIRED was doing the same thing. This dialog from the February 2005 issue is an instructive marker of how far things have come.

paulseth3: What’s the square root of 25?

smarterchild: sqrt(25) = 5

paulseth3: If only I could say that in Spanish.

smarterchild: Translated from English to Spanish, “that” becomes “eso.”

paulseth3: Really?

smarterchild: Yup.

paulseth3: You are soooooo stubborn.

smarterchild: OK. Maybe you’re stubborn too, right?

paulseth3: Well, I am a Leo.

smarterchild: OK, I’ll remember that you’re a Leo. Would you like to see your horoscope for today?

paulseth3: Sure! Man, you ARE sharp.

smarterchild: Here’s your Leo horoscope: The technique you’ve perfected for meeting new people is the stuff that legends are made of, which you’ll be more than happy to demonstrate to one and all. Basically, you’re a human magnet now. Enjoy it!

paulseth3: Well, it’s been great talking to you.

smarterchild: Great talking to you too, Paul.

paulseth3: I hope you have a great afternoon.

smarterchild: Why do you hope I have a great afternoon?

paulseth3: Because I care. Because we are friends.

smarterchild: For real?

paulseth3: For real.

smarterchild: Oh wow.

Ask Me One Thing

Eddy asks, “Can we come up with an alternative word to ‘robot’ to describe remote controlled devices that are not autonomous?”

This is a really interesting question! The word ‘robot’ has definitely morphed in meaning, to the point where people using it now are probably talking about a machine with at least some autonomy. If you order a device to go from point A to point B but it figures out how to get there itself, is that a robot?

The neatest catch-all term for remotely controlled devices is probably ‘drone’—but that has aerial connotations now, and some people may use it for autonomous flying machines too. A more technical answer could be ‘telemanipulator’ or ‘telerobot,’ but they bring to mind those iPads on wheels that threatened to replace face-to-face meetings at some point in the 2010s. Let’s rule those out. 

I quite like ‘waldo,’ used to refer to the mechanical hands you might find at a nuclear plant, controlled by a human making the same movements with their own limbs. It’s also what special effects people use to describe the physical rigs for things like Jurassic Park’s dinosaurs: There’s a big life-size T. rex head whose movements are controlled by a person operating a “mechanical sock puppet.” Like the word ‘robot,’ this term comes from science fiction—it was popularized in a short story by Robert Heinlein, and pops up in works by William Gibson and Kim Stanley Robinson. 

In the end, I think we really need three different terms:

  1. Robots: fully autonomous machines capable of operating independently. A robot’s goals may be set by humans, but it figures out how to achieve them itself.
  2. Drones: semiautonomous machines whose decisionmaking is directed by humans, but that control their own physical movements. 
  3. Waldos: teleoperated devices with a 1:1 relationship between a human’s input and the behavior of the machine.

Many new technologies will fall somewhere between robots and drones, or even move between categories at the flick of a switch. But as they proliferate, we definitely need clearer ways of describing them: That killer robot might not be a robot at all—it could be a killer waldo.
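(If you prefer your taxonomies executable, here’s a toy sketch of that three-way split as code. The type and field names are mine, not any standard.)

```python
from enum import Enum

class Machine(Enum):
    ROBOT = "robot"  # plans and executes entirely on its own
    DRONE = "drone"  # human directs decisions; machine handles its own movement
    WALDO = "waldo"  # human input maps 1:1 onto machine movement

def classify(human_plans_route: bool, human_moves_limbs: bool) -> Machine:
    """Classify a device by how much of the work a human is doing."""
    if human_moves_limbs:
        return Machine.WALDO
    if human_plans_route:
        return Machine.DRONE
    return Machine.ROBOT

# A quadcopter flown on manual sticks is a waldo; give it waypoints and
# it becomes a drone; let it choose its own mission and it's a robot.
print(classify(human_plans_route=True, human_moves_limbs=False))  # Machine.DRONE
```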

You can submit questions to mail@wired.com. Write ASK LEVY in the subject line.

End Times Chronicle

Maia Gatlin and colleagues at the Georgia Institute of Technology have trained an AI that analyzes toilet sounds to detect diarrhea with 98 percent accuracy. They say it could contribute toward tracking cholera, and it’s definitely a contender for the strangest Spotify Wrapped of 2022.

Last but Not Least

Speaking of ominous surges of methane, this fascinating piece from Matt Simon explores the mystery of why methane emissions shot up during the Covid lockdowns. Turns out it might not have been humanity’s fault (for once). 

Reducing carbon emissions requires replacing fossil fuels—something that’s been made much more urgent in Europe by Russia’s invasion of Ukraine. Nuclear power seems like an obvious solution to some, but read Morgan Meaker’s story on the tensions it’s causing in some countries.

In more positive nuclear news, there was a big breakthrough in nuclear fusion this week, as scientists at Lawrence Livermore National Laboratory announced they’d achieved a net energy gain from a fusion reaction for the first time. Energy problem solved? Well, no—it’ll still take decades before fusion powers anything useful, as Gregory Barber explains.

And finally, back to ChatGPT. In playing around with the chatbot, I realized that the reason it’s so plausible is that it’s trained to churn out what’s been called “fluent BS”—and the real world is full of it.

That’s all from me! Thanks for reading, and please feel free to select your favorite cheery sign-off line from this AI-generated list:

“Wishing you sunshine and happiness!”

“Sending you good vibes!”

“Wishing you peace and joy!”

“We hope you have a great day!”

“Sending you lots of love!”

“Take care and be well!”

“Thank you for being a valued reader!”