Copyright Could Change AI as We Know It in 2024

Ryan Seow
Jan 8, 2024

In the ever-evolving tech landscape, generative AI has taken centre stage, thanks to trailblazers like OpenAI, Meta Platforms, and Midjourney. This advancement isn't just a technological leap; it's akin to opening Pandora's box in the creative world. There is a new artist in town, one that is digital and can mimic the styles of countless others (as detailed in a couple of our earlier articles). Unsurprisingly, this has sparked a wave of legal battles, with writers, artists, and other creators claiming that these AI systems have reached their heights by standing on the shoulders of their hard work.

The courts now face a novel challenge. While judges have so far approached infringement claims against AI-generated content with a degree of scepticism, they have yet to address the deeper, more complex issue. The crux of the matter is a potentially billion-dollar question: are AI companies crossing legal boundaries, even inadvertently, by feeding their algorithms a vast array of images, writings, and other internet-sourced data?

The Argument

The legal landscape around generative AI is becoming increasingly complex, as evidenced by the range of lawsuits filed by copyright holders. These actions span from class-action lawsuits brought by visual artists and music publishers to significant claims by major corporations such as The New York Times.

In early 2024, the generative AI sector is mired in a web of legal battles that illustrates the contentious intersection of technology and copyright law. Key players like OpenAI, Microsoft, and Midjourney face multiple lawsuits from media companies and content producers over allegations of using copyrighted material to train their AI models. Notably, The New York Times is suing OpenAI and Microsoft, accusing them of using its content to train AI models that rival traditional journalism. Similarly, Getty Images has taken legal action against Stability AI for unauthorized use of its images, while GitHub, Microsoft, and OpenAI grapple with a class-action suit concerning GitHub's Copilot tool. Furthermore, Stability AI, Midjourney, and DeviantArt are embroiled in lawsuits for allegedly training AI on artists' work. OpenAI faces additional legal challenges from authors Paul Tremblay and Mona Awad, as well as comedian Sarah Silverman. These cases are just the tip of the iceberg.

An independent analysis of Stability AI’s dataset found that Getty Images and other stock image sites constitute a large portion of its contents, and evidence of Getty Images’ presence can be seen in the AI software’s tendency to recreate the company’s watermark.

The lawsuits mark an escalation in the developing legal battle between AI firms and content creators over credit, profit, and the future direction of the creative industries. AI art tools like Stable Diffusion rely on human-created images for training data, which companies scrape from the web, often without their creators’ knowledge or consent. AI firms claim this practice is covered by laws like the US fair use doctrine, but many rights holders disagree and say it constitutes a copyright violation. Legal experts are divided on the issue but agree that such questions will ultimately have to be settled in the courts.

The Counter Argument

Despite facing legal challenges, companies like Stability AI maintain that their practices do not pose any legal or ethical issues. However, in a move that seems to acknowledge the concerns of content creators, Stability AI has announced plans to allow artists to opt out of having their work included in future versions of its software. This step indicates a shift towards more creator-friendly practices amidst the ongoing controversy, and other companies could follow suit, such as OpenAI with its AI image generator, DALL-E 3.

In the legal arena, judges have so far shown scepticism towards claims of copyright infringement based on AI-generated content. Yet they have not fully addressed the more complex and potentially high-stakes issue of whether AI companies are committing large-scale infringement by using extensive collections of images, writings, and other data from the internet to train their AI systems. Tech companies, including giants like Meta, are vigorously defending their AI training methods. They draw parallels between AI learning processes and human learning, arguing that their use of existing material falls under the 'fair use' doctrine of copyright law.

The Legislative Response

The legislative response to these complex issues comes in the form of the AI Foundation Model Transparency Act. Proposed by Reps. Anna Eshoo (D-CA) and Don Beyer (D-VA), this bill aims to bring clarity and accountability to the process of training AI models. It would require creators of foundation models to disclose their sources of training data, so that copyright holders know when their information has been used. This requirement is part of a broader set of guidelines that companies would need to follow, including describing how data is retained during inference, explaining the limitations and risks of their models, and aligning with federal standards such as NIST’s AI Risk Management Framework. These measures are intended not only to protect intellectual property but also to mitigate risks in sensitive areas like healthcare, cybersecurity, and public services.

This legislative move is a direct reaction to the mounting legal disputes and calls for transparency in AI development. By specifically referencing cases like the artists’ lawsuit against Stability AI, the bill highlights the growing urgency to address these copyright concerns. If passed, it could significantly alter the landscape of AI development, prompting an increased focus on ethical sourcing and transparency. This, in turn, may lead to a shift in how AI companies operate, potentially sparking a race for partnerships with data providers who can offer large, legally sound datasets.

The Potential Arms Race

The emerging narrative in generative AI is heavily centred on the transparency of data sources, a concept that seems to resonate widely. Should the AI Foundation Model Transparency Act become law, it could cause a significant shift in industry dynamics. We might witness what could be described as an "arms race" for strategic alliances. Generative AI companies may find themselves in a competitive scramble to form partnerships with entities holding vast, compliant data troves. These data-rich companies, in turn, stand to monetize their assets significantly. Such a scenario could tilt the balance of power, at least temporarily, towards those who possess extensive datasets.

However, this change could also present challenges. Requiring newer AI models to adhere strictly to these transparency laws might result in less robust AI capabilities. Without access to the same breadth of data as established players, newer entrants in the generative AI market might struggle to forge advantageous partnerships, and their lack of bargaining power could push the industry towards a monopolistic scenario. Even for the dominant players, the restrictions imposed on training datasets might still prove to be a significant impediment.

In the race towards AI dominance, it seems even our smartest machines have to take a detour through the bureaucracy.

4 Comments

Rachel Wong Mei Yin

Jan 08, 2024

I think a formidable challenge emerges in navigating the delicate equilibrium between legislative frameworks and the unhindered progression of AI. Undeniably, the advent of AI has ushered in unparalleled efficiencies, liberating us from mundane tasks to devote our energies to matters of greater import and immediacy. Yet, this transformative journey is not without peril, particularly as it pertains to the welfare of our artists and content creators. The very essence of creativity, the soulful emanations from minds scattered across the globe, finds expression in the artworks AI now generates. The dilemma arises when this profound wellspring of inspiration becomes subject to replication, a form of intellectual larceny. The recently instituted AI Act in the EU, with its noble aim to…

Walter Tong

It will be interesting to see if these developments present potential new revenue streams for the various data sources that can be used as training data, especially for print media and artists. In the case of traditional print media, where newspapers have been experiencing declining readership in the past few years, their vast archives of reference material can function as an alternative income source as training data for AI companies.

I also think that while this could be perceived as a headwind for AI as a whole, I view it as a tailwind for the incumbents. The added complexity of copyright infringement for the training of LLMs makes navigating the already bottlenecked training phase of AI companies even more difficult…

婷予鄭

Interesting to point out the potential monopoly in the generative AI industry given the restrictions on data access. Though the copyright war is still tentative so far, it's predictable that there will be a stronger generative AI alliance in the future. Besides, whether or not it can evolve into an inclusive industry is crucial for market entrants and future investment opportunities.

Alex Salov

A very interesting and conflicting dilemma... Although data privacy is crucial for our own safety, I think that the Act has the potential to create a significant headwind for AI's development. Given AI's ability to notably affect global productivity, I think that this growth should be disrupted as little as possible. The power that data providers derive from this Act may also be deemed unfair. A monopolistic scenario, like in any market, may also lead to further inefficiencies and disrupt productivity globally.
