How The ChatGPT Watermark Works And Why It Could Be Defeated

A cryptographic watermark could make ChatGPT-generated content easy to detect. This is how it works and why it could be defeated.

  • A cryptographic watermark could make ChatGPT-generated content easier to detect.
  • An OpenAI scientist has explained how the ChatGPT watermark could be overcome.
  • Scott Aaronson, a computer scientist, works on AI Safety and Alignment at OpenAI.

ChatGPT by OpenAI introduced an automated way to create content, but plans to add a watermarking feature that would make that content easier to spot have made some people nervous. This is how ChatGPT watermarking works and why it may be possible to defeat it.

ChatGPT is a powerful tool that both online publishers and affiliates love and hate simultaneously.

It’s a favorite tool of marketers because it allows them to create content briefs, outlines, and complex articles.

Online publishers fear that AI content will flood search results and replace expert articles written by human authors.

As a result, news of a watermarking technology that would allow ChatGPT-authored content to be detected is anticipated with both anxiety and hope.

Cryptographic Watermark

A watermark is a semitransparent mark, such as a logo or text, embedded into an image to indicate who the original author is.

It is often seen in photos and videos.

Watermarking ChatGPT text involves cryptography: a secret pattern is embedded in the choices of words and punctuation marks, forming a hidden code.

Scott Aaronson And ChatGPT Watermarking

In June 2022, OpenAI hired Scott Aaronson, an influential computer scientist, to work on AI Safety and Alignment.

AI Safety is a field of research that studies the ways AI could harm people and develops ways to prevent that harm.

The scientific journal Distill, which features authors affiliated with OpenAI, defines AI Safety along these lines:

“Long-term safety in artificial intelligence (AI) is the goal of ensuring that advanced AI systems align with human values and do what people want them to do.”

AI Alignment is the field of artificial intelligence research concerned with ensuring that AI acts in accordance with its intended goals.

ChatGPT, a large language model (LLM), could be used in ways that run contrary to the goal of AI Alignment, which is to create AI that benefits humankind.

Watermarking, therefore, is meant to help protect humanity from AI misuse.

Aaronson explained why OpenAI wants to watermark ChatGPT’s output:

“This could be an advantage for preventing academic plagiarism, but also, for example, the mass generation of propaganda…”

How does ChatGPT watermarking work?

ChatGPT watermarking embeds a statistical pattern, a kind of code, into the choices of words and punctuation marks.

Artificial intelligence generates content with predictable word choices.

The words that both humans and AI write follow a statistical pattern.

One way to “watermark” text is to alter that pattern of word choices, making it easier for a detection system to determine whether an AI text generator produced the text.

The watermark is undetectable to readers because the distribution of words retains a random appearance, similar to ordinary AI-generated text.

This is known as the pseudorandom distribution of words.

Pseudorandomness refers to a sequence of numbers or words that appears statistically random but is actually generated deterministically.
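
To make pseudorandomness concrete, here is a minimal Python sketch (an illustration of the concept only, not OpenAI’s code) of a keyed pseudorandom function: its output looks random to anyone without the key, yet it is completely determined by the key and the input.

    import hashlib
    import hmac

    def pseudorandom_value(key: bytes, message: str) -> float:
        # Map a message to a value in [0, 1) that looks random but is
        # fully determined by the key and the message.
        digest = hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()
        # Interpret the first 8 bytes of the digest as an integer,
        # then scale it into the interval [0, 1).
        return int.from_bytes(digest[:8], "big") / 2**64

    # Hypothetical key; in a real system, only the provider would hold it.
    key = b"secret-watermarking-key"
    print(pseudorandom_value(key, "the quick brown fox"))  # a value in [0, 1)
    print(pseudorandom_value(key, "the quick brown fox"))  # the same value again
    print(pseudorandom_value(key, "the quick brown cat"))  # an unrelated value

Without the key, these outputs are indistinguishable from random numbers; with the key, anyone can regenerate them exactly.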

ChatGPT watermarking does not currently exist. OpenAI’s Scott Aaronson has stated that this is in the works.

ChatGPT is currently in preview, which allows OpenAI to discover “misalignment” through real-world usage.

Watermarking could be added in ChatGPT’s final version or earlier.

Scott Aaronson wrote about how the watermarking works:

“My main project has been to statistically watermark the outputs of text models like GPT.

Whenever GPT generates some long text, we want there to be a subtle secret signal in its choices of words, which can later be used to prove that, yes, this came from GPT.”

Aaronson went on to explain how ChatGPT watermarking works. First, let’s understand what tokenization is.

Tokenization is a natural language processing step that breaks text into smaller units, called tokens, such as words and parts of words.

Tokenization transforms text into a structured format that can be used for machine learning.
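
GPT models use a learned subword vocabulary, but as a rough stand-in, here is a toy Python tokenizer (purely illustrative, not OpenAI’s tokenizer) that splits text into word and punctuation tokens:

    import re

    def toy_tokenize(text: str) -> list[str]:
        # Real GPT models use a learned byte-pair-encoding vocabulary of
        # roughly 100,000 subword tokens; this regex split is only meant
        # to show the text-to-tokens step.
        return re.findall(r"\w+|[^\w\s]", text)

    print(toy_tokenize("ChatGPT generates text, token by token."))
    # ['ChatGPT', 'generates', 'text', ',', 'token', 'by', 'token', '.']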

Text generation is the machine predicting which token comes next, based on the previous tokens.

This is accomplished with a mathematical function known as a probability distribution, which determines the probability of each possible next token.

So although the next token can be predicted as likely, it is never guaranteed.
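
As a sketch of that idea, using a made-up candidate list and made-up scores rather than real model output, the snippet below samples a next token from a probability distribution, with a temperature parameter controlling how much randomness enters the choice:

    import math
    import random

    def sample_next_token(logits: dict[str, float], temperature: float) -> str:
        # Convert raw scores (logits) into probabilities with a softmax.
        # Higher temperature means more randomness; a temperature near
        # zero almost always picks the single most likely token.
        scaled = {tok: score / temperature for tok, score in logits.items()}
        max_s = max(scaled.values())  # subtract the max for numerical stability
        weights = {tok: math.exp(s - max_s) for tok, s in scaled.items()}
        total = sum(weights.values())
        probs = {tok: w / total for tok, w in weights.items()}
        return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

    # Hypothetical scores for the token following "The cat sat on the".
    logits = {"mat": 2.5, "sofa": 1.8, "roof": 1.2, "moon": -0.5}
    print(sample_next_token(logits, temperature=0.8))  # usually "mat"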

Aaronson calls the watermarking pseudorandom in that, although there is a mathematical reason for a specific word or punctuation mark being there, the result is still statistically random.

This is his technical explanation of GPT watermarking:

“For GPT, every input and output is a string of tokens. These tokens can be words, but also punctuation marks or parts of words. There are approximately 100,000 tokens in total.

At its core, GPT generates a probability distribution over the next token, conditional on the string of tokens that came before.

The neural net will generate the distribution. The OpenAI server will then sample a token based on that distribution.

However, as long as the temperature is nonzero, there will be some randomness in the choice of the next token: you could run the same prompt repeatedly and get a different completion (i.e., a different string of output tokens) each time.

To watermark, instead of selecting the next token randomly, we will select it pseudorandomly, using a cryptographic pseudorandom function whose key is known only to OpenAI.”

Because the watermark mimics the randomness of the words, it looks natural to the person reading the text.

However, this randomness is subject to a bias that cannot be detected unless someone has the key.
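
Putting the pieces together, here is a simplified sketch of such a scheme as I understand it from Aaronson’s description; the key, the scoring function, and the candidate list are hypothetical stand-ins, not OpenAI’s implementation. Instead of drawing the next token at random, the sampler picks the candidate that maximizes a keyed pseudorandom score of the recent context plus the candidate (the function Aaronson calls g in the quote below):

    import hashlib
    import hmac

    KEY = b"watermark-key-known-only-to-the-provider"  # hypothetical key

    def g(key: bytes, context: tuple[str, ...], candidate: str) -> float:
        # Keyed pseudorandom score in [0, 1) for an (n-gram, candidate) pair.
        # It looks uniformly random without the key, reproducible with it.
        message = " ".join(context + (candidate,)).encode("utf-8")
        digest = hmac.new(key, message, hashlib.sha256).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def pick_watermarked_token(context: tuple[str, ...],
                               candidates: list[str]) -> str:
        # Among tokens the model judges (roughly) equally probable,
        # choose the one that maximizes the secret score g.
        return max(candidates, key=lambda tok: g(KEY, context, tok))

    # Hypothetical near-tied candidates for the next token.
    context = ("sat", "on", "the")  # the last few generated tokens
    candidates = ["mat", "rug", "sofa", "couch"]
    print(pick_watermarked_token(context, candidates))

To a reader without the key, the chosen token is simply one of several equally plausible options; nothing in the text itself gives the bias away.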

Aaronson gave this technical illustration:

“An example: in the case where GPT has many possible tokens that it judges equally probable, you can simply choose whichever token maximizes g. The choice would look uniformly random to someone who didn’t know the key, but someone who does know the key could sum g over all the n-grams and see that it is anomalously large.”
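
On the detection side, someone holding the key can replay the same scoring: sum (or average) g over every n-gram in a suspect text and check whether the result is anomalously large. A rough sketch, reusing the hypothetical g function from the sketch above:

    import hashlib
    import hmac

    def g(key: bytes, context: tuple[str, ...], candidate: str) -> float:
        # The same keyed score used during generation (see the sketch above).
        message = " ".join(context + (candidate,)).encode("utf-8")
        digest = hmac.new(key, message, hashlib.sha256).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def watermark_score(key: bytes, tokens: list[str], n: int = 3) -> float:
        # Average g over every (n-gram, next-token) pair in the text.
        # Unwatermarked text should average around 0.5; watermarked text,
        # whose tokens were chosen to maximize g, scores noticeably higher.
        scores = [g(key, tuple(tokens[i:i + n]), tokens[i + n])
                  for i in range(len(tokens) - n)]
        return sum(scores) / len(scores)

    KEY = b"watermark-key-known-only-to-the-provider"
    tokens = "the cat sat on the mat near the rug".split()
    print(f"average score: {watermark_score(KEY, tokens):.3f}")

The signal strengthens as the text gets longer, so longer passages allow a far more confident verdict than short ones.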

Could Watermarking Be A Privacy Concern?

In discussions on social media, I have seen suggestions that OpenAI could keep a record of every output ChatGPT produces and use it to detect AI-generated content.

Scott Aaronson confirmed that OpenAI could do this, but that doing so would pose a privacy issue. He mentioned a possible exception for law enforcement situations but did not give details.

How To Defeat ChatGPT Watermarking

Interestingly, and not yet widely known, Scott Aaronson noted that the watermarking can be defeated.

He didn’t merely say that defeating the watermark was theoretically possible; he stated outright that it can be defeated:

“Now, all of this can be overcome with enough effort.

For example, if you used another AI to paraphrase GPT’s output, well, okay, we won’t be able to detect that.”

So the watermarking could be defeated, at least as of November, when the statements above were made.

There is no indication yet that the watermarking is in use. But even if and when it does come into use, it may be possible to circumvent it.
