© 2023 Bloomberg Industry Group, Inc. All Rights Reserved.
ChatGPT is an artificial intelligence (AI) language model. It can generate text output that feels human-generated, raising important questions about the implications under copyright law. Is text created by ChatGPT considered an “original work of authorship,” and if so, who owns the copyright? Additionally, ChatGPT is trained on a vast amount of copyrighted text, raising questions about the legality of using such copyrighted material.
Technical Overview of ChatGPT
ChatGPT is generative AI. Generative AI is any artificial intelligence tool that, given a prompt, generates something new, such as an image or text, from existing data. In the past few months, generative AI has dominated mainstream consciousness. From AI-generated selfies flooding TikTok and DALL-E-created images on Instagram to the skyrocketing popularity of ChatGPT, it seems that generative AI is everywhere.
ChatGPT was developed by OpenAI, an organization co-founded by Sam Altman and Elon Musk. The initial version of what would become ChatGPT was released in 2018 and a second version, GPT-2, in 2019. The latest version was initially made available to the public on Nov. 30, 2022, through OpenAI's website. According to a tweet by OpenAI CEO Sam Altman, ChatGPT had over 1 million users less than a week after its launch. By January 2023 ChatGPT had over 100 million users.
ChatGPT differs from search engines like Google. Traditional search engines work by crawling the web, creating an index of the content on websites, and using this index to return relevant results to users based on their search queries. The returns must then be curated by those users, which takes time and often leads to imperfect results. ChatGPT, on the other hand, analyzes the prompt and immediately generates text which gives the answer to a question or creates a work of authorship.
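The index-then-lookup approach described above can be illustrated with a minimal sketch. It assumes a tiny in-memory set of pages and uses hypothetical names (`build_index`, `search`, `pages`); a real search engine adds crawling, ranking, and much more.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    # Start with the pages for the first word, then intersect with the rest.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


# Hypothetical stand-ins for crawled web pages.
pages = {
    "page_a": "copyright law protects original works of authorship",
    "page_b": "fair use is an exception to copyright law",
    "page_c": "language models predict the next word",
}
index = build_index(pages)
print(search(index, "copyright law"))
```

The lookup only returns existing pages that match the query; the user still has to read and curate them, which is the contrast with ChatGPT drawn above.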
The “GPT” in ChatGPT stands for generative pre-trained transformer. ChatGPT is a transformer-based language model that uses a neural network AI system to generate human-like text. The model is trained on a large dataset of text and uses this data to learn patterns and relationships between words and phrases.
When given a prompt, the model actively generates a response by predicting the next word in the sequence, based on the context of the input and the patterns it learned during training. This is like the predictive text engines that complete words for users when typing on their smartphones. In addition, ChatGPT can remember previous prompts and output, allowing a user to iterate on, change and improve the output.
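To make next-word prediction concrete, here is a toy sketch that counts which word follows which in a small training text and then extends a prompt one word at a time. This is an illustration only: it assumes a simple word-pair (bigram) count, whereas ChatGPT uses a transformer neural network over far more context. All names (`train_bigrams`, `predict_next`, `generate`) and the sample corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, how often each other word follows it."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, start, max_words=5):
    """Extend `start` by repeatedly appending the predicted next word."""
    output = [start.lower()]
    for _ in range(max_words):
        next_word = predict_next(counts, output[-1])
        if next_word is None:
            break
        output.append(next_word)
    return " ".join(output)


# Hypothetical training text; a real model sees vastly more data.
corpus = "the cat sat on the mat . the cat sat on the sofa . the cat slept"
counts = train_bigrams(corpus)
print(predict_next(counts, "the"))
print(generate(counts, "the", max_words=3))
```

Note that the trained counts hold only statistics about word patterns, not the training text itself, which loosely mirrors the point that the model learns patterns and relationships between words.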
ChatGPT can be used to generate any type of written text, such as essays, poems, articles, stories, summaries of long text, blog posts, advertising copy, and even computer code. In fact, some portions of this article and the article's title were initially created by using ChatGPT.
Copyright Law Implications – The Ins and Outs
Given the hype around ChatGPT and the speculation that it could be widely used, it is important to understand the legal implications of the technology. First, do copyright owners of the text used to train ChatGPT have a copyright infringement claim against OpenAI? Second, can the output of ChatGPT be protected by copyright and, if so, who owns that copyright?
To answer these questions, we need to understand the application of US copyright law.
Copyright Law Basics
Based on rights in Article I, Section 8 of the Constitution, Congress passed the first copyright law in 1790. It has been amended several times. Today, US copyright law is governed by the Copyright Act of 1976. This law grants authors of original works exclusive rights to reproduce, distribute, and display their work. Copyright protection applies from the moment of creation, and, for most works, the copyright term is the life of the author plus 70 years after the author's death. Under copyright law, the copyright holder has the exclusive right to make copies of the work, distribute it, display it publicly, and create derivative works based on it. Others who want to use the work must obtain permission from the copyright holder or use one of the exceptions to copyright law, such as fair use.
The purpose of copyright law is to incentivize authors to create novel and creative works. It does this by granting authors exclusive rights to control the use of their work, thus allowing them to financially benefit from their works. Copyright law also encourages the dissemination of knowledge by allowing others to use copyrighted works under certain conditions, such as through the fair use doctrine, which allows for limited use of copyrighted material for the purposes of criticism, commentary, news reporting, teaching, scholarship, or research. By protecting the rights of authors and creators while also allowing for the use of copyrighted works for the public benefit, copyright law aims to strike a balance between the interests of authors and the public.
Inputs – Training ChatGPT with Copyrighted Material
ChatGPT was trained on a large training dataset sourced from the internet, including a vast amount of text from websites, articles, books, social media posts, and academic papers. Importantly, ChatGPT was not aware of the source of the data; it only saw the text and learned the patterns and relationships between the words, phrases and sentences. The vast majority of the text used to train ChatGPT was certainly subject to copyright protection, except for text that is in the public domain, like facts or discoveries or works whose term of copyright protection has ended.
Did the ChatGPT training process violate the copyright of the original authors? To answer that question, we must determine three things: (1) whether OpenAI copied the underlying works during the training process, because copyright infringement requires an act of copying works that are protected by copyright; (2) whether the output of ChatGPT is a “derivative work” under copyright law; and (3) whether such copying or production of a derivative work would fall under the “fair use” exception to copyright law.
Copying During Training
We are not aware of publicly available information regarding whether, during the training process, the training text for ChatGPT was copied or simply referenced. Therefore, we asked ChatGPT and received this answer:
Prompt: During the ChatGPT training process, was the underlying training text copied into a OpenAI database?
Answer: During the training process, text from various sources is used to train the ChatGPT model. It is likely that the text was copied into an internal database at OpenAI for the purpose of training the model. However, the training text is typically preprocessed to remove any sensitive information such as personal identifiers, copyrighted text, etc. In addition, the text is also used to train the model and not stored in a database. The model uses the text to learn patterns and structure of the language and once the training is done, it doesn't require the text anymore.
So, it seems that during the training process, text was copied. It is also interesting that ChatGPT stated that the training text was “preprocessed” to remove any copyrighted text. That seems highly unlikely since otherwise nearly all text would have been removed.
Is ChatGPT Output a Derivative Work?
Under US copyright law, the owner of a copyright has the exclusive right “to prepare derivative works based upon the copyrighted work.” A “derivative work” is “a work based upon one or more preexisting works.” ChatGPT is trained on preexisting works and generates output based on that training.
As Daniel Gervais, a professor at Vanderbilt Law School who specializes in intellectual property law, says, the definition of a derivative work under copyright law “could loosely be used as a definition of machine learning when applied to the creation of literary and artistic productions because AI machines can produce literary and artistic content (output) that is almost necessarily ‘based upon’ a dataset consisting of preexisting works.” Under this view, it seems that all ChatGPT output is a derivative work under copyright law.
On a related point, it is worth noting that in producing its output, ChatGPT is not “copying” anything. ChatGPT generates text based on the context of the input and the words and phrase patterns it was trained on. ChatGPT is not “copying” and then changing text.
What About Fair Use?
Let's assume that the underlying text was copied in some way during the ChatGPT training process. Let's further assume that outputs from ChatGPT are, at least sometimes, derivative works under copyright law. If that is the case, do copyright owners of the original works have a copyright infringement claim against OpenAI? Not if the copying and the output generation are covered by the doctrine of “fair use.” If a use qualifies as fair use, then actions that would otherwise be prohibited would not be deemed an infringement of copyright.
In determining whether the use made of a work in any particular case is a fair use, the factors include:
• The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes.
• The nature of the copyrighted work.
• The amount and substantiality of the portion used in relation to the copyrighted work as a whole.
• The effect of the use upon the potential market for or value of the copyrighted work.
In this case, assuming OpenAI copied copyrighted text as part of the ChatGPT training process, such copying was not for a commercial purpose and had no economic impact on the copyright owner. Daniel Gervais says “it is much more likely than not” that training systems on copyrighted data will be covered by fair use.
In determining if a commercial use will be considered “fair use,” the courts will primarily look at the scope and purpose of the use and the economic impact of such use. Does the use in question change the nature of the underlying copyright material in some material way (described as a “transformative” use) and does it economically impact the original copyright holder?
Without a specific example, it is difficult to determine whether a given output from ChatGPT would be fair use. Because ChatGPT does not copy and has been trained on millions of underlying works, it seems likely that most output would not use significant portions of any one protected work and would therefore be fair use. In addition, because of the vast corpus of text used to train ChatGPT, it seems unlikely that ChatGPT output will have a negative economic impact on any one copyright holder. But, given the capabilities of ChatGPT, that might not always be the case.
Imagine if you asked ChatGPT to “Write a long-form, coming of age, story in the style of J.K. Rowling, using the characters from Harry Potter and the Chamber of Secrets.” In that case, it would seem that the argument for fair use would be weak. This story could be sold to the public and could conceivably have a negative economic impact on J.K. Rowling. A person who wants to read a story about Harry Potter might buy this story instead of buying a book by J.K. Rowling.
Finally, it is worth noting that OpenAI is a non-profit entity that is an “AI research and deployment company.” It seems that OpenAI is the type of research company, and ChatGPT is the type of research project, that would have a strong argument for fair use. Relying on a research entity in this way has been criticized as “AI Data Laundering,” shielding commercial entities from liability by using a non-profit research institution to create the data set and train AI engines that might later be used in commercial applications.
Outputs – Can the Output of ChatGPT be Protected by Copyright
Is the output of ChatGPT protected by copyright law and, if so, who is the owner? As an initial matter, does the ChatGPT textual output fit within the definition of what is covered under copyright law: “original works of authorship fixed in any tangible medium of expression.”
The text generated by ChatGPT is the type of subject matter that, if created by a human, would be covered by copyright. However, most scholars have opined, and the US Copyright Office has ruled, that the output of generative AI systems like ChatGPT is not protectable under US copyright law because the work must be an original, creative work of a human author.
In 2022, the US Copyright Office, ruling on whether a picture generated completely autonomously by AI could be registered for copyright, stated “[b]ecause copyright law as codified in the 1976 Act requires human authorship, the [AI Generated] Work cannot be registered.” The US Copyright Office has issued several similar statements, informing creators that it will not register copyright for works produced by a machine or computer program. The human authorship requirement of the US Copyright Office is set forth as follows:
The Human Authorship Requirement - The U.S. Copyright Office will register an original work of authorship, provided that the work was created by a human being. The copyright law only protects “the fruits of intellectual labor” that “are founded in the creative powers of the mind.” Trade-Mark Cases, 100 U.S. 82, 94 (1879).
While such policies are not binding on the courts, the stance by the US Copyright Office seems to be in line with the purpose of copyright law flowing from the Constitution: to incentivize humans to produce creative works by giving them a monopoly over their creations for a limited period of time. Machines, of course, need and have no such motivation.
In fact, copyright law expressly allows a corporation or other legal entity to be the owner of a copyright under the “work made for hire” doctrine. However, to qualify as a work made for hire, the work must be either work prepared by an employee within the scope of his or her employment, or be prepared by a party who “expressly agrees in a written instrument signed by them that the work shall be considered a work made for hire.” Only humans can be employees and only humans or corporations can enter a legally binding contract—machines cannot.
If we consider it settled law that works created completely by AI are not protected by copyright, then what about a human who uses ChatGPT to create a story? Clearly humans can use machines to create works that are protected by copyright. For example, we use laptops to write stories, tablets to create copyright protected drawings, and electronic synthesizers to create copyright protected digital music.
Can ChatGPT be another electronic tool that humans use to create copyright protected works? The answer turns on the level of human involvement. If a person writes a brief prompt instructing ChatGPT to “write a story about a boy walking on a beach,” it seems unlikely that the output text was generated from enough human creativity to qualify for copyright protection. However, if a person added themes and directed ChatGPT to tailor a story (e.g., asking ChatGPT to change the color of the sky, develop additional characters, or change the plot), then it seems that there would be sufficient human involvement such that the output could be entitled to copyright protection. There is no clear answer as to how much human involvement is enough to qualify a work as protectable under copyright law. But increasing the level of human involvement will increase the chance that the output is subject to copyright protection.
Because most uses of ChatGPT appear to be designed around minimal human input, it seems that immediately upon creation, most output of ChatGPT is not protectable and would be part of the public domain where no copyright owner is capable of asserting rights.
Interestingly, the OpenAI terms and conditions make clear that, if there is a copyright in the output, as between OpenAI and the user, the output is owned by the user. Section 3(a) of the OpenAI terms state:
Your Content. Input (what is provided to the Service) and Output (what is returned by the Service based on Input) are collectively “Content.” As between the parties and to the extent permitted by applicable law, you own all Input, and subject to your compliance with these Terms, OpenAI hereby assigns to you all its right, title and interest in and to Output. OpenAI may use Content as necessary to provide and maintain the Services, comply with applicable law, and enforce our policies. You are responsible for Content, including for ensuring that it does not violate any applicable law or these Terms.
Based on the issues raised in this article, the risk shifting in the final sentence of the OpenAI terms is of interest. It is possible that ChatGPT may have used copyrighted material without a license to create the Output. But the burden of complying with applicable law for both the user-provided Input and the ChatGPT-provided Output has been shifted to the user of ChatGPT.
Generative AI tools like ChatGPT are challenging how we think about fundamental issues of authorship and copyright protection. Even with the current iteration of ChatGPT, it is unclear if certain outputs might be a derivative work of underlying copyrighted material, thus risking a potential infringement claim if the output is subject to copyright protection.
As these tools continue to develop, their use cases will increase and the line between man and machine creations will continue to blur. Therefore, it is likely that we will continue to have fundamental questions about the scope of copyright on these creations.
This guide covers artificial intelligence (AI) in higher education, including background, integration in learning, ethics, and resources. AI technology and AI tools in education are evolving quickly, so please check back for updates and developments. Like AI technology itself, this guide is under construction and a work in progress.
A chatbot is “a computer program that simulates and processes human conversation (either written or spoken), allowing humans to interact with digital devices as if they were communicating with a real person. Chatbots can be as simple as rudimentary programs that answer a simple query with a single-line response, or as sophisticated as digital assistants that learn and evolve to deliver increasing levels of personalization as they gather and process information.”
New ways to manage your data in ChatGPT (April 26, 2023)
Learn more about ChatGPT-4 from OpenAI
Watch an interview with OpenAI CEO Sam Altman (ABC News) March 16, 2023
Five things to know about ChatGPT 4 (from The Hill)
OpenAI (the company that designed ChatGPT) collects quite a bit of data from ChatGPT users.
ChatGPT is not always trustworthy.
How should AI systems behave, and who should decide? (from OpenAI)
Model Behavior Guidelines from OpenAI on how ChatGPT should respond when prompted about US “culture wars.” The rules include, for example, not affiliating with political parties and not judging one group as good or bad. https://cdn.openai.com/snapshot-of-chatgpt-model-behavior-guidelines.pdf
ChatGPT responses are not always true and factual.
Users of ChatGPT provide free labor to OpenAI.
Potential Benefits of Using Generative AI
Automating repetitive daily work: Multiple companies are exploring how the integration of generative AI tools into office programs and applications can improve workflows and communication.
Multilingual assistance: Generative AI tools may help students and professionals who are required to write in English but do not feel comfortable in this language.
Improving instruction: Instructors have been utilizing natural language processing tools, like ChatGPT, to brainstorm new assignments, improve assessments, develop differentiated instruction, and engage students.
Over the last year, several generative AI toolsets have become accessible to the general public. While many of the most popular toolsets are not open-source, several have been made available through waitlists and credit systems. Below, you will find the most popular generative toolsets, which range from search assistants to text generators to digital artwork generators. If you're interested in completing research with the assistance of an AI toolset, please review our Research Tools page.
According to UNESCO's 2023 "Chat GPT and Artificial Intelligence in Higher Education Quick Start Guide", the main challenges and implications of ChatGPT in higher education are:
Citing AI Generated Writing
If you choose to use ChatGPT or some other AI technology for writing, be transparent about your use of it with your teachers and publishers, and work within their policies and parameters. Each citation style has different recommendations for citing generative AI tools like ChatGPT.
Check with the individual style guides for more detailed instructions on in text citation and variations.
Citing AI in Your Research
Before including generative AI tools in a project you intend to get published, make sure your target journal and publisher allow the integration of AI generated text and images with manuscript submissions. Below are a few example publisher statements or policies on the use of AI in article submissions.
Ethics and AI
August 9, 2023, AP Fact Check article by Melissa Goldin.
August 8, 2023, London School of Economics and Political Science blog post by Leonard Bauersfeld, Angel Romero, Manasi Muglikar, and Davide Scaramuzza.
July 31, 2023, The Chronicle of Higher Education advice post by Kevin Gannon.
2023 guide to AI in Education from the University of New Mexico Libraries. Includes guidelines for privacy, free labor, AI assignments, and more.
2023 guide to AI in the Classroom by Kristen Palmer, Butler University Libraries and Center for Academic Technology. Includes info about chatbot assignments, detecting chatbots, and more.
January 6, 2023, guide for educators by Andrew Herfte, Centre for Education Statistics and Evaluation, New South Wales, Australia.
July 10, 2023, guide to using AI at UC Berkeley. Includes teaching recommendations, suggested writing prompts and activities for your students, and opportunities and risks of using ChatGPT for teaching and learning.
Policies and guidelines for AI and teaching at Duke University.
July 7, 2023, MIT Technology Review article by Rhiannon Williams.