John B. Coleman Library Ask A Librarian

Scholarly Communication & Institutional Repository: Copyright Chaos: Legal Implications of Generative AI

Scholarly communication (SC) is the process by which academics, scholars, and researchers share and publish their research findings so that they are available to the wider academic community and beyond.

ChatGPT

© 2023 Bloomberg Industry Group, Inc. All Rights Reserved.

Contributed by Shawn Helms and Jason Krieser, McDermott Will & Emery

March 2023

ChatGPT is an artificial intelligence (AI) language model. It can generate text output that feels human-generated, raising important questions about the implications under copyright law. Is text created by ChatGPT considered an “original work of authorship,” and if so, who owns the copyright? Additionally, ChatGPT is trained on a vast amount of copyrighted text, raising questions about the legality of using such copyrighted material.

Technical Overview of ChatGPT

ChatGPT is generative AI. Generative AI is any artificial intelligence tool that, when given a prompt, generates something new, like an image or text, from existing data. In the past few months, generative AI has dominated mainstream consciousness. From AI-generated selfies flooding TikTok and DALL-E-created images on Instagram to the skyrocketing popularity of ChatGPT, it seems that generative AI is everywhere.

ChatGPT was developed by OpenAI, an organization co-founded by Sam Altman and Elon Musk. The initial version of what would become ChatGPT was released in 2018 and a second version, GPT-2, in 2019. The latest version was initially made available to the public on Nov. 30, 2022, through OpenAI's website. According to a tweet by OpenAI CEO Sam Altman, ChatGPT had over 1 million users less than a week after its launch. By January 2023 ChatGPT had over 100 million users.

ChatGPT differs from search engines like Google. Traditional search engines work by crawling the web, creating an index of the content on websites, and using this index to return relevant results to users based on their search queries. The returns must then be curated by those users, which takes time and often leads to imperfect results. ChatGPT, on the other hand, analyzes the prompt and immediately generates text which gives the answer to a question or creates a work of authorship.
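To make the index-and-lookup step concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration of the general idea described above, not how Google or any real search engine is implemented; the page names and page text are invented for the example.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, query):
    """Return the sorted names of pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    # Intersect the page sets for each query word; unknown words match nothing.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

# Hypothetical "crawled" pages, invented for this example.
pages = {
    "page_a": "copyright law protects original works of authorship",
    "page_b": "fair use is an exception to copyright law",
    "page_c": "chatbots simulate human conversation",
}
index = build_index(pages)
results = search(index, "copyright law")
print(results)
```

The search returns a list of matching pages that the user must still read and evaluate, which is the curation step the paragraph above contrasts with ChatGPT's direct answers.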

The “GPT” in ChatGPT stands for generative pre-trained transformer. ChatGPT is a transformer-based language model that uses a neural network AI system to generate human-like text. The model is trained on a large dataset of text and uses this data to learn patterns and relationships between words and phrases.

When given a prompt, the model actively generates a response by predicting the next word in the sequence, based on the context of the input and the patterns it learned during training. This is like the predictive text engines that complete words for users when typing on their smartphones. In addition, ChatGPT can remember previous prompts and output, allowing a user to iterate on, change and improve the output.
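The next-word idea can be sketched with a toy model that simply counts which word most often follows another. This is a deliberately simplified stand-in, closer to phone-style predictive text than to ChatGPT itself, which uses a transformer neural network rather than word-pair counts; the training sentence and function names below are invented for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, how often each other word follows it."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never followed."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def generate(counts, start, length):
    """Extend `start` by up to `length` predicted words, one word at a time."""
    out = [start.lower()]
    for _ in range(length):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)

# Tiny invented training text; a real model learns from vastly more data.
training_text = "the cat sat on the mat and the cat sat on the sofa"
model = train_bigrams(training_text)
print(generate(model, "the", 4))
```

Note that the trained model holds only counts of word patterns, not the training sentence itself, which loosely mirrors the point that the model learns patterns and relationships from its training text.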

ChatGPT can be used to generate any type of written text such as essays, poems, articles, stories, summaries of long text, blog posts, advertising copy and even write computer code. In fact, some portions of this article and the article's title were initially created by using ChatGPT.

Copyright Law Implications – The Ins and Outs

Given the hype around ChatGPT and the speculation that it could be widely used, it is important to understand the legal implications of the technology. First, do copyright owners of the text used to train ChatGPT have a copyright infringement claim against OpenAI? Second, can the output of ChatGPT be protected by copyright and, if so, who owns that copyright?

To answer these questions, we need to understand the application of US copyright law.

Copyright Law Basics

Based on rights in Article I, Section 8 of the Constitution, Congress passed the first copyright law in 1790. It has been amended several times. Today, US copyright law is governed by the Copyright Act of 1976. This law grants authors of original works exclusive rights to reproduce, distribute, and display their work. Copyright protection applies from the moment of creation, and, for most works, the copyright term is the life of the author plus 70 years after the author's death. Under copyright law, the copyright holder has the exclusive right to make copies of the work, distribute it, display it publicly, and create derivative works based on it. Others who want to use the work must obtain permission from the copyright holder or use one of the exceptions to copyright law, such as fair use.

The purpose of copyright law is to incentivize authors to create novel and creative works. It does this by granting authors exclusive rights to control the use of their work, thus allowing them to financially benefit from their works. Copyright law also encourages the dissemination of knowledge by allowing others to use copyrighted works under certain conditions, such as through the fair use doctrine, which allows for limited use of copyrighted material for the purposes of criticism, commentary, news reporting, teaching, scholarship, or research. By protecting the rights of authors and creators while also allowing for the use of copyrighted works for the public benefit, copyright law aims to strike a balance between the interests of authors and the public.

Inputs – Training ChatGPT with Copyrighted Material

ChatGPT was trained on a large dataset sourced from the internet, including a vast amount of text from websites, articles, books, social media posts, and academic papers. Importantly, ChatGPT was not aware of the source of the data; it only saw the text and learned the patterns and relationships between the words, phrases and sentences. The vast majority of the text used to train ChatGPT was certainly subject to copyright protection, except for text that is in the public domain, such as facts or discoveries, or works whose term of copyright protection has ended.

Did the ChatGPT training process violate the copyright of the original authors? To answer that question we must determine: (1) whether OpenAI copied the underlying works during the training process, because copyright infringement requires an act of copying works that are protected by copyright; (2) whether the output of ChatGPT is a “derivative work” under copyright law; and (3) whether such copying or production of a derivative work would fall under the “fair use” exception of copyright law.

Copying During Training

We are not aware of publicly available information regarding whether, during the training process, the training text for ChatGPT was copied or simply referenced. Therefore, we asked ChatGPT and received this answer:

Prompt: During the ChatGPT training process, was the underlying training text copied into a OpenAI database?

Answer: During the training process, text from various sources is used to train the ChatGPT model. It is likely that the text was copied into an internal database at OpenAI for the purpose of training the model. However, the training text is typically preprocessed to remove any sensitive information such as personal identifiers, copyrighted text, etc. In addition, the text is also used to train the model and not stored in a database. The model uses the text to learn patterns and structure of the language and once the training is done, it doesn't require the text anymore.

So, it seems that during the training process, text was copied. It is also interesting that ChatGPT stated that the training text was “preprocessed” to remove any copyrighted text. That seems highly unlikely since otherwise nearly all text would have been removed.

Is ChatGPT Output a Derivative Work?

Under US copyright law, the owner of a copyright has the exclusive right “to prepare derivative works based upon the copyrighted work.” A “derivative work” is “a work based upon one or more preexisting works.” ChatGPT is trained on preexisting works and generates output based on that training.

As Daniel Gervais, a professor at Vanderbilt Law School who specializes in intellectual property law, says, the definition of a derivative work under copyright law “could loosely be used as a definition of machine learning when applied to the creation of literary and artistic productions because AI machines can produce literary and artistic content (output) that is almost necessarily ‘based upon’ a dataset consisting of preexisting works.” Under this view, it seems that all ChatGPT output is a derivative work under copyright law.

On a related point, it is worth noting that in producing its output, ChatGPT is not “copying” anything. ChatGPT generates text based on the context of the input and the words and phrase patterns it was trained on. ChatGPT is not “copying” and then changing text.

What About Fair Use?

Let's assume that the underlying text was copied in some way during the ChatGPT training process. Let's further assume that outputs from ChatGPT are, at least sometimes, derivative works under copyright law. If that is the case, do copyright owners of the original works have a copyright infringement claim against OpenAI? Not if the copying and the output generation are covered by the doctrine of “fair use.” If a use qualifies as fair use, then actions that would otherwise be prohibited would not be deemed an infringement of copyright.

In determining whether the use made of a work in any particular case is a fair use, the factors include:

• The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes.

• The nature of the copyrighted work.

• The amount and substantiality of the portion used in relation to the copyrighted work as a whole.

• The effect of the use upon the potential market for or value of the copyrighted work.

In this case, assuming OpenAI copied copyrighted text as part of the ChatGPT training process, such copying was not for a commercial purpose and had no economic impact on the copyright owner. Daniel Gervais says “it is much more likely than not” that training systems on copyrighted data will be covered by fair use.

In determining if a commercial use will be considered “fair use,” the courts will primarily look at the scope and purpose of the use and the economic impact of such use. Does the use in question change the nature of the underlying copyright material in some material way (described as a “transformative” use) and does it economically impact the original copyright holder?

Without a specific example, it is difficult to determine whether a given output from ChatGPT would be fair use. Because ChatGPT does not copy and has been trained on millions of underlying works, it seems likely that most output would be fair use, since it would not use significant portions of any one protected work. In addition, because of the vast corpus of text used to train ChatGPT, it seems unlikely that ChatGPT output will have a negative economic impact on any one copyright holder. But, given the capabilities of ChatGPT, that might not always be the case.

Imagine if you asked ChatGPT to “Write a long-form, coming of age, story in the style of J.K. Rowling, using the characters from Harry Potter and the Chamber of Secrets.” In that case, it would seem that the argument for fair use would be weak. This story could be sold to the public and could conceivably have a negative economic impact on J.K. Rowling. A person that wants to read a story about Harry Potter might buy this story instead of buying a book by J. K. Rowling.

Finally, it is worth noting that OpenAI is a non-profit entity that is an “AI research and deployment company.” It seems that OpenAI is the type of research company, and ChatGPT is the type of research project, that would have a strong argument for fair use. This practice has been criticized as “AI Data Laundering,” shielding commercial entities from liability by using a non-profit research institution to create the data set and train AI engines that might later be used in commercial applications.

Outputs – Can the Output of ChatGPT be Protected by Copyright

Is the output of ChatGPT protected by copyright law and, if so, who is the owner? As an initial matter, does the ChatGPT textual output fit within the definition of what is covered under copyright law: “original works of authorship fixed in any tangible medium of expression”?

The text generated by ChatGPT is the type of subject matter that, if created by a human, would be covered by copyright. However, most scholars have opined, and the US Copyright Office has ruled, that the output of generative AI systems, like ChatGPT, is not protectable under US copyright law because the work must be an original, creative work of a human author.

In 2022, the US Copyright Office, ruling on whether a picture generated completely autonomously by AI could be registered as a valid copyright, stated “[b]ecause copyright law as codified in the 1976 Act requires human authorship, the [AI Generated] Work cannot be registered.” The U.S. Copyright Office has issued several similar statements, informing creators that it will not register copyright for works produced by a machine or computer program. The human authorship requirement of the US Copyright Office is set forth as follows:

The Human Authorship Requirement - The U.S. Copyright Office will register an original work of authorship, provided that the work was created by a human being. The copyright law only protects “the fruits of intellectual labor” that “are founded in the creative powers of the mind.” Trade-Mark Cases, 100 U.S. 82, 94 (1879).

While such policies are not binding on the courts, the stance by the US Copyright Office seems to be in line with the purpose of copyright law flowing from the Constitution: to incentivize humans to produce creative works by giving them a monopoly over their creations for a limited period of time. Machines, of course, need and have no such motivation.

In fact, copyright law expressly allows a corporation or other legal entity to be the owner of a copyright under the “work made for hire” doctrine. However, to qualify as a work made for hire, the work must be either work prepared by an employee within the scope of his or her employment, or be prepared by a party who “expressly agrees in a written instrument signed by them that the work shall be considered a work made for hire.” Only humans can be employees and only humans or corporations can enter a legally binding contract—machines cannot.

If we consider it settled law that works created completely by AI are not protected by copyright, then what about a human that uses ChatGPT to create a story? Clearly humans can use machines to create works that are protected by copyright. For example, we use laptops to write stories, tablets to create copyright protected drawings, and electronic synthesizers to create copyright protected digital music.

Can ChatGPT be another electronic tool that humans use to create copyright protected works? The answer turns on the level of human involvement. If a person writes a brief prompt instructing ChatGPT to “write a story about a boy walking on a beach,” it seems unlikely that the output text was generated from enough human creativity to qualify for copyright protection. However, if a person added themes and directed ChatGPT to tailor a story—e.g., asking ChatGPT to change the color of the sky, develop additional characters, change the plot, etc.— then it seems that there would be sufficient human involvement such that the output could be entitled to copyright protection. There is no clear answer as to how much human involvement is enough to qualify a work as protectable under copyright law. But increasing the level of human involvement will increase the chance that the output is subject to copyright protection.

Because most uses of ChatGPT appear to be designed around minimal human input, it seems that immediately upon creation, most output of ChatGPT is not protectable and would be part of the public domain where no copyright owner is capable of asserting rights.

Interestingly, the OpenAI terms and conditions make clear that, if there is a copyright in the output, as between OpenAI and the user, the output is owned by the user. Section 3(a) of the OpenAI terms states:

Your Content. Input (what is provided to the Service) and Output (what is returned by the Service based on Input) are collectively “Content.” As between the parties and to the extent permitted by applicable law, you own all Input, and subject to your compliance with these Terms, OpenAI hereby assigns to you all its right, title and interest in and to Output. OpenAI may use Content as necessary to provide and maintain the Services, comply with applicable law, and enforce our policies. You are responsible for Content, including for ensuring that it does not violate any applicable law or these Terms.

Based on the issues raised in this article, the risk shifting in the final sentence of the OpenAI terms is of interest. It is possible that ChatGPT may have used copyright material without a license to create the Output. But the burden of complying with applicable law for both the user-provided Input and the ChatGPT-provided Output has been shifted to the user of ChatGPT.

Conclusion

Generative AI tools like ChatGPT are challenging how we think about fundamental issues of authorship and copyright protection. Even with the current iteration of ChatGPT, it is unclear if certain outputs might be a derivative work of underlying copyright material, thus risking a potential infringement claim if the output is subject to copyright protection.

As these tools continue to develop, their use cases will increase and the line between man and machine creations will continue to blur. Therefore, it is likely that we will continue to have fundamental questions about the scope of copyright on these creations.

Introduction To Artificial Intelligence

This guide covers artificial intelligence (AI) in higher education, including background, integration in learning, ethics, and resources. AI technology and its tools for education are evolving quickly, so please check back for updates and developments. Like AI technology itself, this guide is under construction and a work in progress.

What Are Chatbots?

A chatbot is "a computer program that simulates and processes human conversation (either written or spoken), allowing humans to interact with digital devices as if they were communicating with a real person. Chatbots can be as simple as rudimentary programs that answer a simple query with a single-line response, or as sophisticated as digital assistants that learn and evolve to deliver increasing levels of personalization as they gather and process information."

Source: https://www.oracle.com/chatbots/what-is-a-chatbot/

What Do You Need To Know About ChatGPT?


New ways to manage your data in ChatGPT (April 26, 2023)

Learn more about ChatGPT-4 from OpenAI

Watch an interview with OpenAI CEO Sam Altman (ABC News) March 16, 2023

Five things to know about ChatGPT 4 (from The Hill)

ChatGPT Privacy Policies

OpenAI (the company that designed ChatGPT) collects quite a bit of data from ChatGPT users.

  • The privacy policy states that this data can be shared with third-party vendors, law enforcement, affiliates, and other users.
  • This tool should not be used by children under 13 (data collection from children under 13 violates the United States’ Children’s Online Privacy Protection Rule - COPPA).
  • The Terms of Use state that “you must be 18 years or older and able to form a binding contract with OpenAI to use the Services” (OpenAI, 2022, para. 2).
  • While you can request to have your ChatGPT account deleted, the prompts and questions you input into ChatGPT cannot be deleted. If you were to ask ChatGPT about sensitive or controversial topics, this data cannot be removed.

TIP: Before using ChatGPT, please read over the privacy policy and terms of use and review how data is collected and shared as outlined in the policies. 

ChatGPT is not always trustworthy.

  • ChatGPT was trained using a massive dataset of text written by humans that was pulled from the Internet.
  • Responses can reflect the biases of the humans who wrote the text used in the training dataset.
  • As of January 2023, ChatGPT is not connected to the Internet, and the data used to train it was collected prior to 2021.
  • According to the OpenAI FAQs, ChatGPT “has limited knowledge of world and events after 2021 and may also occasionally produce harmful instructions or biased content”. 

How should AI systems behave, and who should decide? (from OpenAI)

Model behavior guidelines from OpenAI on how ChatGPT should respond when prompted about US “culture wars.” The rules include, for example, not affiliating with political parties and not judging any one group as good or bad. https://cdn.openai.com/snapshot-of-chatgpt-model-behavior-guidelines.pdf

ChatGPT responses are not always true and factual. 

  • To make up for knowledge gaps ChatGPT will provide a response to the best of its ability (often fabricated) rather than say “error” or “cannot compute.”

Users of ChatGPT provide free labor to OpenAI.

  • ChatGPT is in its infancy. It will continue to become a more intelligent form of artificial intelligence with the help of users who provide feedback to the responses it generates.

ChatGPT Tutorial

Potential Cons and Pros of Using Generative AI

Potential Benefits of Using Generative AI

  • Automating repetitive daily work:  Multiple companies are exploring how the integration of generative AI tools into office programs and applications can improve workflows and communication.

  • Multilingual assistance:  Generative AI tools may help students and professionals who are required to write in English but do not feel comfortable in this language.

  • Improving instruction:  Instructors have been utilizing natural language processing tools, like ChatGPT, to brainstorm new assignments, improve assessments, develop differentiated instruction, and engage students.

Websites and Other Resources

Popular Generative Toolsets

Over the last year, several generative AI toolsets have become accessible to the general public. While many of the most popular toolsets are not open-source, several have been made available through waitlists and credit systems. Below, you will find the most popular generative toolsets, ranging from search assistants to text generators and even digital artwork generators. If you're interested in completing research with the assistance of an AI toolset, please review our Research Tools page.

Select Articles

LibGuides AI from universities

Challenges and Ethical Implications

According to UNESCO's 2023 "ChatGPT and Artificial Intelligence in Higher Education: Quick Start Guide", the main challenges and implications of ChatGPT in higher education are:

  • Academic integrity
    • ChatGPT raises academic integrity concerns in higher education due to potential plagiarism and cheating. Reliable ChatGPT detection tools have yet to be developed.
  • Lack of regulation
    • ChatGPT's unregulated development raises concerns. Over 1,000 academics and leaders call for a pause to investigate risks and develop shared protocols.
  • Privacy concerns
    • In April 2023, Italy became the first country to block ChatGPT over privacy concerns and ethical issues regarding data collection and age verification, setting a precedent for AI-related data practices.
  • Cognitive bias
    • ChatGPT lacks ethical principles and can't differentiate between truth and bias or truth and fiction ("hallucination"). Critical analysis and cross-referencing with other sources are crucial when using its results.
  • Gender and diversity
    • Gender and discrimination concerns extend beyond ChatGPT to all AI forms due to underrepresentation of females in AI-related fields and generative AI's capability to perpetuate biased content and stereotypes.
  • Accessibility
    • Two main accessibility concerns for ChatGPT are restricted availability due to government regulations and uneven internet access, raising issues of equity and regional disparities in AI education and development.
  • Commercialization
    • ChatGPT offers both free and subscription options. Careful regulation is necessary for AI tools run by profit-driven companies, which may lack openness and use data for commercial purposes in higher education settings.

AI Tools and Resources

Citing AI Generated Writing

If you choose to use ChatGPT or some other AI technology for writing, be sure you are transparent about your use of it with your teachers and publishers and are working within their policies and parameters. Each citation style has different recommendations for citing generative AI tools like ChatGPT.

  • APA 7:  OpenAI. (Year). ChatGPT (month day version) [Large language model]. https://chat.openai.com/chat
  • MLA 9:  "Prompt text" prompt. ChatGPT, day month. version, OpenAI, day month year, chat.openai.com/chat
  • Chicago:  ChatGPT, response to "Prompt text," OpenAI, month day, year, https://chat.openai.com/chat

Check with the individual style guides for more detailed instructions on in-text citations and variations.
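As a rough illustration, the three patterns above can be filled in with a few small Python helper functions. The function names, the sample prompt, and the sample dates are all invented for this sketch, and the output only loosely follows the templates listed in this guide, so always confirm the final form against the official style guide.

```python
CHAT_URL = "chat.openai.com/chat"

def cite_apa(year, version):
    """Fill in the APA 7 pattern, e.g. version='Mar 14'."""
    return (
        f"OpenAI. ({year}). ChatGPT ({version} version) "
        f"[Large language model]. https://{CHAT_URL}"
    )

def cite_mla(prompt, version, access_date):
    """Fill in the MLA 9 pattern, e.g. version='13 Feb.', access_date='8 Mar. 2023'."""
    return (
        f'"{prompt}" prompt. ChatGPT, {version} version, '
        f"OpenAI, {access_date}, {CHAT_URL}"
    )

def cite_chicago(prompt, date):
    """Fill in the Chicago pattern, e.g. date='March 8, 2023'."""
    return f'ChatGPT, response to "{prompt}," OpenAI, {date}, https://{CHAT_URL}'

# Hypothetical prompt and dates, used only to show the output shape.
prompt = "Summarize the fair use doctrine"
print(cite_apa(2023, "Mar 14"))
print(cite_mla(prompt, "13 Feb.", "8 Mar. 2023"))
print(cite_chicago(prompt, "March 8, 2023"))
```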

Citing AI in Your Research

Publisher policies and requirements

Before including generative AI tools in a project you intend to get published, make sure your target journal and publisher allow the integration of AI-generated text and images with manuscript submissions. Below are a few example publisher statements and policies on the use of AI in article submissions.

AI Exploration and Instruction

Ethics and AI

AI In the Classroom

Use Caution

Websites and Other Resources

John B. Coleman Library
Mailing Address: P.O. Box 519, MS 1040, Prairie View, Texas 77446
Physical Address: L.W. Minor St. / University Drive, Prairie View, Texas 77446
Reference: (936) 261-1535, Circulation: (936) 261-1542
Email: askalibrarian@pvamu.edu
