Copyright and Royalties in the World of Generative AI, by MyThoo, Medium, Aug 2023
The US Copyright Office noted that it has already ruled on some AI-content issues in the last year. In one case, involving a book by a human author that included AI-generated imagery, the Office granted copyright only for the text written by the author. Generative AI is also capable of producing content in the style of a particular artist.
If courts find that OpenAI illegally used Times articles to train its models, OpenAI could be forced to destroy its LLM dataset and rebuild it from scratch. Legally, these AI systems — including image generators, AI music generators and chatbots like ChatGPT and LaMDA — cannot be considered the author of the material they produce. Their outputs are simply a culmination of human-made work, much of which has been scraped from the internet and is copyright protected in one way or another. “So you use the training data to train this model and then in the final step, you have a trained model and then you can use it to create new outputs. Now, even the first step, even just taking the data and training an AI model can raise copyright issues because you’re now transforming this art into something new,” he said.
And in enterprise applications that leverage premium licensed content (many market research and competitive intelligence systems, for example), copyright issues are largely moot: the content license controls what is and is not permitted, superseding any general question of copyright compliance. However, a work containing AI-generated material may still be eligible for copyright protection if it also contains sufficient human authorship. Examples include a human selecting or arranging AI-generated content in a creative way, or an artist modifying AI-generated material to the extent that the modifications meet the standard for copyright protection.
A generative AI system takes a user’s instructions, or “input”, and creates new text, visual, or audio materials, or “outputs”, by drawing from the preexisting works used in its training. The concerns around generative AI and copyright protection revolve mainly around determining authorship in AI-generated works, the level of human involvement required for copyright protection, and the potential infringement of copyrights by AI outputs. These questions address the areas of authorship, fair use, and infringement in the evolving field of generative AI content, requiring careful evaluation within copyright law and policy. The D.C. District Court’s decision in Thaler v. Perlmutter highlights the complexity of copyright authorship for AI-generated works.
At the opposite extreme, fair use maximalism posits that the fair use defense should cover any (or almost any) Output Work because each is unique and created by a sufficiently transformative process.
Since copyright laws mainly aim to encourage art and its creation by protecting artists and their unique creative ideas, lawmakers might need to consider joint ownership. US copyright law includes the notion of ‘fair use’, which allows new creative work based on a copyrighted artwork, provided the new work is transformative enough to be meaningfully different from the original. Such an altered work is considered separate from the original work of art and is not subject to copyright infringement. Generative AI’s impact on ownership and copyright in AI-generated content is a subject of debate and legal challenges due to uncertainties surrounding human involvement and training data. Thaler applied in 2018 for a copyright covering “A Recent Entrance to Paradise,” a piece of visual art he said was created by his AI system without any human input. He challenged the Copyright Office’s refusal in federal court, arguing that human authorship is not a concrete legal requirement and that allowing AI copyrights would be in line with copyright’s purpose, as outlined in the U.S. Constitution, to “promote the progress of science and useful arts.”
The Copyright Office has ruled that images created solely by AI are not eligible for copyright protection, at least not as works attributed to the AI itself. Rather than being programmed like traditional computer software, generative AI models are trained using an enormous quantity of data. Large language models (LLMs) like OpenAI’s GPT-4 and Google’s LaMDA, for example, were trained on huge volumes of text culled from the internet.
Authors Alliance readers will surely have noticed that we have been writing a lot about generative AI and copyright lately. Since the Copyright Office issued its decision letter a few months back on copyright registration in a graphic novel that included AI-generated images, many in the copyright community and beyond have struggled with the open questions around generative AI and copyright. At Authors Alliance, we care deeply about access to knowledge because it supports free inquiry and learning, and we are enthusiastic about ways that generative AI can meaningfully further those ideals. In addition to the mundane but important efficiency gains generative AI can provide, we have already seen authors incorporate generative AI into their creative processes to produce new works. There are also clear concerns: generative AI tools can make it easier to engage in fraud and deception, and to perpetuate disinformation. There have been many calls for legal regulation of generative AI technologies in recent months, and we wanted to share our views on the copyright questions generative AI poses, recognizing that this is a still-evolving set of questions.
But a key difference is that this new set of tools relies explicitly on training data, and therefore creative contributions cannot easily be traced back to a single artist. If the resemblance is based only on general style or content, it is unlikely to violate copyright, because style is not copyrightable. “Even though copyright law currently doesn’t protect styles, if you create artwork that’s very similar to someone, maybe they kind of have a joint ownership,” he added. Meanwhile, a part of the AI Act that members of the European Parliament recently voted on addresses the need for companies that deploy AI-generated content to disclose any copyrighted material that was used to develop their systems. And on Friday, author and comedian Sarah Silverman, along with two other authors, started legal action against OpenAI and Meta for alleged copyright infringement.
One obvious step forward is for AI researchers to simply create databases where there is no possibility of copyright infringement — either because the material has been properly licensed or because it’s been created for the specific purpose of AI training. One such example is “The Stack” — a dataset for training AI designed to specifically avoid accusations of copyright infringement. It includes only code with the most permissive possible open-source licensing and offers developers an easy way to remove their data on request. Ryan Khurana, chief of staff at generative AI company Wombo, says most companies selling these services are aware of these differences.
There is some nuance in this, of course, as the specificity of prompts varies substantially. Along the spectrum of fair use minimalism, then, some might believe that the more specific the prompt, the less leeway should be given as to whether Output Works are infringing derivative works. Similarly, image diffusion GAIs analyze how the pixel arrangements of images returned for search engine queries correlate with the text queries that produced those results. After analyzing billions of data points, an image diffusion GAI can generate novel images from a user’s text prompt, based on the calculated correlation with the image outputs one would expect from a relevant Google search.
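As a loose illustration of this “learned correlation” idea, the sketch below scores how well hypothetical image feature vectors match a prompt embedding. This is a toy with made-up numbers, not the actual diffusion process, and the names (`image_features`, `best_match`) are inventions for illustration only:

```python
import numpy as np

# Toy illustration (NOT a real diffusion model): a generative image model
# learns statistical correlations between text prompts and image features.
# Here, three hypothetical training "images" are represented by 4-dimensional
# feature vectors with made-up values.
image_features = {
    "sunset over ocean": np.array([0.9, 0.1, 0.0, 0.2]),
    "city skyline":      np.array([0.1, 0.8, 0.3, 0.0]),
    "forest path":       np.array([0.0, 0.2, 0.9, 0.1]),
}

def cosine_similarity(a, b):
    # Standard cosine similarity: 1.0 means the vectors point the same way.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(prompt_embedding):
    # Pick the training image whose features correlate most strongly
    # with the prompt's embedding.
    return max(image_features,
               key=lambda k: cosine_similarity(prompt_embedding, image_features[k]))

# Hypothetical embedding for a prompt like "beach at dusk".
prompt = np.array([0.85, 0.15, 0.05, 0.1])
print(best_match(prompt))  # the closest learned correlation wins
```

A real diffusion model does not store or retrieve training images this way; it iteratively denoises random noise conditioned on a text embedding. The sketch only conveys the underlying point in the text: outputs are driven by correlations learned from the training data.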
- Consider that search engines have been providing search results based on copyrighted text for decades, including short summaries of the presented documents.
- The current generation of flashy AI applications, ranging from GitHub Copilot to Stable Diffusion, raises fundamental issues for copyright law.
- But the view, held by many, that it is unfair to use copyrighted training data without some sort of recognition of the original creator may support arguments for other regulatory or technical approaches that would encourage attribution and create pathways for distributing new revenue streams to creators.
- In general, copyright law gives the exclusive right to copy works, among other exclusive rights, to the applicable copyright holder.
- When responding to a prompt, AI platforms draw on the patterns and relationships they have extracted from training data, using them to form rules and then to make judgments and predictions.
All this uncertainty presents a slew of challenges for companies that use generative AI. There are risks regarding infringement — direct or unintentional — in contracts that are silent on generative AI usage by their vendors and customers. If a business user is aware that training data might include unlicensed works or that an AI can generate unauthorized derivative works not covered by fair use, a business could be on the hook for willful infringement, which can include damages up to $150,000 for each instance of knowing use.