Guarding Your Art: How to Opt Out of OpenAI DALL-E Training
In the era of artificial intelligence, artists and creative content owners face new challenges in safeguarding their work. One of these challenges is the use of publicly available artworks to train AI models like OpenAI's DALL-E. Understanding how these models learn, and how you can opt your work out of their training datasets, is crucial for maintaining control over your creative output. This guide outlines the steps you can take to keep your art out of AI training.
Understanding AI Training: Learning from Public Data
AI models like DALL-E learn from a broad range of information, including publicly available images and text. These models "learn" concepts similarly to how humans do. For example, after seeing enough pictures of a cat, an AI can create a completely new image of a cat, even though it has never seen that specific cat before. This learning process allows AI to generate original content based on the concepts it has been trained on.
The Retention of Concepts: Beyond Direct Access
After AI models have learned from their training data, they no longer have direct access to that data. Instead, they retain only the concepts they have learned. When a user requests an image from the model, it generates output based on its understanding of these concepts, not by copying from an existing database. Knowing this distinction helps artists understand how their work might be used by AI.
Opting Out: Disallowing GPTBot Access
OpenAI provides mechanisms for artists and content owners to opt out of having their works used in AI training. One efficient way is to disallow GPTBot, OpenAI's web crawler, from accessing your site. This is done by adding a Disallow rule for the GPTBot user agent to your site's robots.txt file. With that rule in place, GPTBot is instructed not to gather your publicly available images for training purposes.
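As a minimal sketch of what such a rule looks like, the Python snippet below holds the two lines that would go into robots.txt and checks them with the standard library's robots.txt parser. The example.com URL and the name SomeOtherBot are placeholders chosen for illustration, and the check only shows how a crawler that honors robots.txt would read the file; it is not a statement about OpenAI's internal behavior.

```python
from urllib.robotparser import RobotFileParser

# The two lines to place in robots.txt at the root of your site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL; substitute an image URL from your own site.
image_url = "https://example.com/gallery/cat.png"

gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", image_url)
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", image_url)

print("GPTBot allowed:", gptbot_allowed)
print("Other crawler allowed:", other_allowed)
```

The rule names only GPTBot, so other crawlers, such as search engines, are unaffected by it.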
Submitting a Request: Direct Removal from Training Data
Alternatively, artists can submit a request to have their images removed from future training datasets. By filling out a form provided by OpenAI, content owners can specify which images they want excluded. OpenAI will review these requests and may contact you for additional information to verify ownership. Once verified, the specified images will be removed from future datasets, helping protect your creative rights.
Licensing Considerations: Third-Party Agreements
OpenAI also obtains licenses to datasets, and these may include your images if you have allowed third parties to license your work. Submitting a request to remove your images from OpenAI's training data may not affect these licensed images. Artists should review their licensing agreements to understand how their work might be used by third parties and address any concerns accordingly.
Handling High Volumes: Efficiency in Opting Out
For artists with a high volume of images hosted at specific URLs, adding a Disallow rule for GPTBot to your site’s robots.txt file might be more efficient than submitting individual requests. A single rule can keep the web crawler away from your site altogether, so a large collection is opted out of AI training at once instead of image by image.
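If only some parts of a site hold artwork, the same mechanism can be narrowed to those paths, since robots.txt rules match by path prefix. The sketch below assumes a hypothetical site layout with /portfolio/ and /commissions/ folders; the helper build_gptbot_rules is an illustrative name, not part of any OpenAI tooling.

```python
from urllib.robotparser import RobotFileParser


def build_gptbot_rules(blocked_prefixes: list[str]) -> str:
    """Build a robots.txt block that disallows GPTBot on the given path prefixes."""
    lines = ["User-agent: GPTBot"]
    lines += [f"Disallow: {prefix}" for prefix in blocked_prefixes]
    return "\n".join(lines) + "\n"


# Hypothetical folders that hold artwork; replace with your own paths.
rules = build_gptbot_rules(["/portfolio/", "/commissions/"])
print(rules)

# Check the generated rules the way a robots.txt-respecting crawler would.
parser = RobotFileParser()
parser.parse(rules.splitlines())

portfolio_blocked = not parser.can_fetch(
    "GPTBot", "https://example.com/portfolio/piece-001.jpg"
)
blog_open = parser.can_fetch("GPTBot", "https://example.com/blog/news.html")

print("Portfolio blocked for GPTBot:", portfolio_blocked)
print("Blog still open to GPTBot:", blog_open)
```

Passing ["/"] as the only prefix reproduces the site-wide rule, which remains the simplest choice when every page contains work you want excluded.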
Source: https://share.hsforms.com/1_OuT5tfFSpic89PqN6r1CQ4sk30