A Deep Dive Into the World of AI Voice Cloning

AI voice cloning, or voice synthesis, leverages sophisticated machine learning algorithms to recreate a specific human voice. To accomplish this, algorithms are trained on large volumes of recordings of the target voice, homing in on unique characteristics such as tone, pace, accent, and more nuanced vocal idiosyncrasies.
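
For readers curious what that training step involves, the sketch below shows the kind of feature extraction that typically precedes model training: converting recordings of the target voice into log-mel spectrograms that encode timbre, pitch, and pacing. The file paths and parameter values are illustrative assumptions rather than a reference to any particular product’s pipeline.

```python
# Minimal sketch of the feature-extraction step that usually precedes training
# a voice-cloning model. Paths and parameter values are illustrative
# assumptions, not any specific vendor's pipeline.
import glob

import librosa
import numpy as np

SAMPLE_RATE = 22050  # a common rate for speech and singing models (assumption)
N_MELS = 80          # mel bands often used by neural vocoders (assumption)


def extract_mel_features(path: str) -> np.ndarray:
    """Load a recording of the target voice and return a log-mel spectrogram.

    Features like these capture timbre, pitch contour, and pacing, which are
    the vocal traits a cloning model learns to reproduce.
    """
    audio, _ = librosa.load(path, sr=SAMPLE_RATE, mono=True)
    mel = librosa.feature.melspectrogram(y=audio, sr=SAMPLE_RATE, n_mels=N_MELS)
    return librosa.power_to_db(mel, ref=np.max)


# Hypothetical directory of consented recordings of the target voice.
features = [extract_mel_features(f) for f in glob.glob("target_voice/*.wav")]
print(f"Extracted features for {len(features)} recordings")
```

A full cloning system would then train a synthesis model on these features, but even this first step makes clear how heavily the output depends on the source recordings.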

This article provides an in-depth examination of AI voice cloning technology. From its innovative applications in music and creative industries to pressing legal and ethical concerns such as copyright infringement and identity theft, we discuss the need for adapted regulations to balance risks and opportunities.

The applications of AI voice cloning technology are manifold and benefit diverse sectors. For instance, the technology can create more personalized, human-like virtual assistants for visually impaired individuals or serve as a communication tool for people who have lost the ability to speak. Animators can also use it to generate voiceovers for their creations. For the music industry, AI can offer an entirely new dimension of creativity and experimentation by assisting in the creation of tracks, or even entire albums, from scratch.

It’s essential to note that while AI voice cloning offers exciting opportunities for music creation, its immense capabilities also raise significant concerns regarding copyright infringement, misrepresentation, and identity theft.

The advancement of AI in the music industry has posed challenging questions regarding copyright and ownership. AI models trained on vast datasets that potentially encompass copyrighted music can generate compositions that violate an original artist’s rights. The issue becomes even more convoluted when AI systems employ voice cloning, reproducing an artist’s distinct performance and venturing into potential legal pitfalls.

At the heart of this debate is a pressing legal question: who should own AI-generated music? Is it the user interacting with the AI or the organization that developed the algorithm? Or, given the absence of human creativity, does the output not qualify for copyright protection at all?

This dilemma is at the epicenter of ongoing discussions regarding AI’s role in legal authorship and ownership rights for content produced by machines rather than humans. AI lacks intentional agency, so while it’s highly capable of mimicking human creative processes, its output ultimately depends on the underlying training data.

Examples of AI Voice Cloning in Music

As the world of music witnesses the emergence of voice cloning and AI-generated vocals, the lines between human and machine creation become blurry. For instance, 2022 saw the release of “Libra,” a fabricated collaboration track between The Weeknd and Drake, featuring vocals cloned from Drake using AI. Similarly, the track “From the D 2 the LBC” by Eminem and Snoop Dogg leveraged AI startup Mammoth’s Ghostwriter technology to emulate their signature styles skillfully.

In 2022, Capitol Records made headlines by signing virtual musician FN Meka, presented as an entirely AI-voiced rapper. Created by the music tech company Factory New, FN Meka’s debut single “Florida Water” garnered attention for its AI-cloned vocals. However, fans later discovered that a human voice actor had provided the voice behind FN Meka. That voice actor was never credited for his talent and did not receive compensation from Factory New.

As intrigue around FN Meka’s voice grew, skeptics scrutinized the artist’s public persona and creative choices. Presented as a black male cyborg, FN Meka was shrouded in further controversy for using African American vernacular speech and mannerisms. Critics argued that this amounted to digital blackface and cultural appropriation.

Capitol Records faced backlash for profiting from an AI rapper amid layoffs of real artists. Consequently, the record label terminated the contract due to widespread criticism and pulled FN Meka’s music within a week of signing. Factory New removed FN Meka from online platforms, issued an apology, and promised to evaluate their creative process to prevent future issues of cultural insensitivity.

The repercussions of the FN Meka case highlighted the significance of authenticity and appropriate cultural representation in music, revealing the potential pitfalls associated with the misuse of AI technology. Ultimately, the controversy showed how AI’s capabilities, when used without proper consideration, can damage an artist’s reputation, even when the technology is intended to enhance the musical experience.

With its ability to generate and manipulate sounds, AI offers the possibility of innovative contributions to the music industry, but only if applied responsibly. This responsibility includes considering ethics and cultural sensitivities when employing AI for voice cloning and similar creative processes.

The incident with FN Meka is just one example of how an unbridled entry of AI into the music industry can lead to unexpected consequences and spark intense debate.

Another incident underscoring these AI-induced challenges occurred in 2020, when an anonymous individual used AI to emulate Jay-Z’s distinct voice for a rendition of Shakespeare’s iconic soliloquy, “To be, or not to be.” The video quickly gained traction on YouTube and was removed at the behest of Jay-Z’s record label, which cited copyright infringement. However, YouTube eventually reinstated the video citing a lack of valid legal grounds for removal.

Jay-Z’s case highlights the power of such AI technologies to infringe on an artist’s rights and affect the perception of their work. While this incident may seem innocuous, it raises questions about how voice cloning can misrepresent an artist or damage their reputation and brand. As the technology continues to evolve and AI grows more sophisticated, a framework is needed to help navigate its challenges. Alongside the support of legal and regulatory bodies, the music industry must balance artists’ rights with technological innovation.

Impact on Music Creators’ Livelihood

AI’s capacity to generate music tirelessly, without the need for breaks and at minimal cost, poses a significant challenge to human artists. While a human artist may require years of practice and substantial financial investment to produce an album, AI can churn out a comparable album in minutes at virtually no human cost. Left unconstrained, this cost disparity could flood the market with AI-generated music, undercutting the value of human-created work.

OpenAI’s MuseNet, launched in 2019, is a prime example of how anyone can use AI to create music. MuseNet can generate four-minute-long compositions with ten different instruments and produce music in various styles. It was trained on a diverse dataset of music from multiple genres, showcasing the potential of AI in the realm of music composition. OpenAI, also known for developing ChatGPT, is not the only company venturing into the AI music space. Tech giants like Meta and Alphabet are actively developing their music-generation applications. While these efforts are still early, they are part of an increasingly competitive race to harness AI for music creation.

Moreover, replicating established artists’ voices and styles can further complicate matters. As AI becomes more skilled at impersonation, fans may be less inclined to purchase the original artist’s work, opting for AI-generated renditions instead. This scenario could lead to dwindling concert attendance and declining music sales, causing a substantial loss of revenue for artists.

But it’s important to note that AI doesn’t necessarily spell complete doom for artists. Instead, it presents an opportunity for reevaluation and adaptation. Artists can leverage AI technology to enhance their work and diversify their income streams. AI tools can be incredible assistants for songwriting, composition, and production. Furthermore, AI could also enable more artists to produce music independently, reduce financial reliance on record labels, and allow for more direct-to-fan sales.

On the subject of opportunities, some artists are already venturing into AI-assisted music creation, such as Grimes’ experimentation with an AI co-composition. Companies are also angling to create ‘ethical’ voice-focused AI products, such as Resemble AI, a voice cloning platform that recently raised $8 million in Series A funding. The platform’s goal is to develop voice cloning technology in collaboration with musicians to prevent malicious usage. Rather than replacing artists, such companies aim to provide tools that empower creators.

AI also brings other opportunities for innovation to artists. Musicians could explore the use of AI in live performances, combining their human artistry with AI’s capabilities to create unique and exciting performances. Such innovation could drive concert ticket sales, offering a potential buffer against any loss in music sales revenue.

Technological advancements in music often evolve differently than initially predicted and tend to complement rather than replace human creativity. The widespread use of drum machines, for example, hasn’t eliminated the need for drummers — even though it has transformed the percussion and drums landscape. Likewise, integrating AI into music will likely shape the industry in unexpected ways, opening new avenues for artistic expression.

By harnessing the positive potential of AI, while mitigating risks through thoughtful regulation and artistic ingenuity, the music industry could thrive on an AI wave. The key for the music industry will be to shape an ecosystem that allows AI to support rather than rule.

Music has already survived several disruptive technologies in the past, and it can do it again — once artists navigate the legal and creative challenges ahead. There are always opportunities amidst uncertainty — as long as we shape technological change through an ethical lens.

Misrepresentation and Its Repercussions

While voice cloning technology opens up a vast new world of possibilities, it also creates significant potential for misuse. Without the artist’s permission, an entity may clone an artist’s voice and use it to create music in the artist’s unique style. This unethical practice infringes upon the artist’s rights over their voice and style, can mislead fans, and can harm the artist’s reputation.

The picture painted above is not merely hypothetical, as demonstrated by the recent case of Singaporean singer Stefanie Sun. An unauthorized AI model cloned Sun’s voice, leading to the production of a song impersonating her without her consent. When fans encounter such deceptively produced music, they may mistakenly attribute it to the original artist. If the piece is substandard or inconsistent with the artist’s brand, it can damage their credibility and tarnish their image.

The incident involving Stefanie Sun is a stark illustration of the risk of misrepresentation, where a cloned voice led to a song that compromised Sun’s reputation. Misappropriating artists’ likenesses and creative voices can undermine their control over their content, sow confusion among fans, and ultimately harm their careers and livelihoods.

As AI capabilities advance rapidly, strong regulations, policies, and rights protections are needed to counter the growing threat of AI model misuse and content forgery. Artists must have the power to control their creative works and public image to prevent their careers from being hijacked by AI.

A parallel concern lies in AI’s ability to generate compositions autonomously. Developers can train machine learning models on a dataset featuring a specific artist’s style and develop new music that closely mirrors that style. This issue introduces a unique form of misrepresentation where the AI isn’t cloning the artist’s voice but replicating their distinctive style without their permission. In such cases, the artist’s musical signature — their unique arrangements, rhythms, and melodies — is effectively exploited.

In 2016, a team of researchers from Sony CSL Research Laboratory used an AI tool called Flow Machines to create a pop song titled “Daddy’s Car.” The AI was trained on a dataset of 13,000 music samples from various genres and periods, and the result was a catchy song reminiscent of The Beatles’ style.
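
To make the kind of style replication described above concrete, here is a deliberately toy sketch: a first-order Markov chain “trained” on note sequences attributed to one artist, then sampled to produce new melodies with a similar statistical fingerprint. The corpus, note names, and function are invented for illustration; systems like Flow Machines are far more sophisticated, but the underlying principle of learning statistical patterns from existing material is similar.

```python
# Toy sketch of style mimicry via a first-order Markov chain over notes.
# The "corpus" is a hard-coded, hypothetical stand-in for an artist's melodies;
# real systems learn from far richer representations of music.
import random
from collections import defaultdict

# Hypothetical corpus: short note sequences transcribed from one artist's work.
corpus = [
    ["C4", "E4", "G4", "E4", "C4", "D4", "E4"],
    ["E4", "G4", "A4", "G4", "E4", "D4", "C4"],
    ["C4", "D4", "E4", "G4", "E4", "D4", "C4"],
]

# "Training": count which note tends to follow which in the artist's material.
transitions = defaultdict(list)
for melody in corpus:
    for current, following in zip(melody, melody[1:]):
        transitions[current].append(following)


def generate(start: str, length: int, seed: int = 0) -> list:
    """Sample a new melody that statistically mirrors the training corpus."""
    rng = random.Random(seed)
    melody = [start]
    for _ in range(length - 1):
        options = transitions.get(melody[-1]) or list(transitions)
        melody.append(rng.choice(options))
    return melody


print(generate("C4", 12))
```

Even this toy model produces output that echoes its training data, which is precisely why the provenance of that data matters legally.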

Voice Cloning as a New Avenue for Identity Theft

While the use of AI in generating music can be innovative and entertaining, the broader implications of voice cloning technology extend beyond the realm of music, posing new risks and challenges. Voice cloning technology, now increasingly accessible and sophisticated, opens the door to a different form of identity theft. Unscrupulous individuals or entities could use this technology to clone an artist’s voice for potentially fraudulent activities unrelated to music creation.

For instance, in 2019, an AI company named Dessa produced an eerily accurate clone of the voice of Joe Rogan, a well-known podcast host. This realistic voice clone highlighted the potential misuse of the technology, raising concerns about the authenticity of online content and the potential for identity theft.

Such misuse could range from spreading false information or making fraudulent claims under the guise of the artist’s voice to more sinister criminal activities like impersonating the artist for financial fraud. Given the public’s trust in the authenticity of an artist’s voice, this form of identity theft could cause significant harm to the artist, their fans, and the general public.

In extreme cases, ill-intentioned individuals can also misuse voice cloning to create deepfake audio, in which the artist’s voice is convincingly manipulated to say things they never said. In the era of social media and rapid information sharing, the propagation of such a deepfake could have swift and severe implications, from manipulating public opinion to causing emotional distress to the artist.

Addressing these threats will require concerted efforts from multiple stakeholders, including artists, technology companies, lawmakers, and the public. These efforts could involve creating clearer legal frameworks for voice ownership, developing technology to detect and flag voice cloning, and promoting public awareness of potential abuses. To keep pace with the rapidly evolving landscape of voice cloning technology, the ethical guidelines that govern AI development and usage will require continuous reassessment and updating.

Addressing the risks of AI involves proactive efforts on multiple fronts, including law, technology, and ethics. As AI technologies continue to reshape the music industry, tackling the accompanying legal complexities is imperative. Artists, rights holders, and technology developers are navigating uncharted waters, where a lack of clarity could lead to disputes or obstacles to innovation. The need for a modernized legal approach tailored to the evolving AI landscape in music is critical.

Voice cloning and impersonation enabled by AI present two distinct copyright challenges. Voice cloning involves using AI to synthesize new vocal content by mimicking a singer’s timbre and style. The process raises questions about legal ownership of the output: does it belong to the user or the developer, or does it go unprotected altogether?

Voice impersonation, on the other hand, involves imitating a vocal performance without consent, such as an AI impersonator covering a song by mimicking the original artist’s protected style. While the underlying composition may be publicly available, the unauthorized use of protected performance works could constitute infringement. In short, voice cloning grapples with ownership of AI creations, while voice impersonation deals with the misuse of protected performance IP: two sides of the copyright dilemma posed by increasingly realistic AI vocal mimicry.

Legal Measures to Counter Unauthorized Voice Cloning

With the advancement of AI in music creation, there is a clear need for laws to adapt to these technological changes. Artists, representatives from the music industry, and legal experts are lobbying for changes in copyright laws to encompass the issues brought forth by AI. One such change could be recognizing a performer’s voice as a unique and copyright-protected aspect of their work. Currently, copyright laws focus on protecting tangible creative expressions like lyrics and melodies, leaving intangible aspects like voice unprotected. By modifying copyright laws to cover an artist’s distinctive voice, artists would have legal grounds to contest unauthorized voice cloning. This upgraded legal framework would significantly deter potential misuse of voice cloning technology.

In the US, the ‘fair use’ doctrine allows limited use of copyrighted material without permission from the rights holders. But in the context of AI-generated music, what constitutes ‘fair use’? Establishing more precise guidelines could serve as an indispensable rulebook as we head toward an evolved, AI-supported music industry.

Impressively, the EU has been at the forefront of policy updates for the AI era, proposing reforms that address copyright issues in the digital space. One notable effort is the European Union Copyright Directive, which aims to strengthen protections for copyright holders and to modernize and unify EU copyright law for the digital age.

In April 2021, the European Commission introduced the AI Act to regulate AI development and usage by setting a framework for developers and users. With a primary focus on transparency, the framework requires AI systems to notify users when they are interacting with AI. Specific requirements vary based on the type of AI system but may include details on its functionality, human oversight, and decision-making responsibilities. Although the Act’s initial draft did not cover general-purpose AI, recent updates now address “foundation models” used in generative AI. These changes mandate transparency, data governance, and risk mitigation for providers of such models. Additionally, developers must disclose summaries of the copyrighted data used in training to help prevent generated content from violating the law.

Training AI on Non-Copyrighted Music

One potential solution to mitigate copyright issues is for AI developers to train their models exclusively on non-copyrighted music or to use music for which they have obtained the necessary permissions.

A logical next step is transitioning from a model where AI developers rely solely on permissions to use copyrighted music for training to a more structured approach involving licensing agreements. By engaging directly with artists and negotiating licensing terms, AI developers can create a more comprehensive framework that respects artists’ rights. This shift would give artists more control over their music and ensure they receive fair compensation when their work is used to train AI models. Establishing such licensing agreements could be a win-win situation, allowing AI to evolve while protecting and rewarding human creativity.

For instance, legal bodies could arrange licensing agreements between AI developers and artists. Alternatively, implementing a royalty system could ensure artists receive a portion of the profits from AI-generated music that uses their style or voice.

However, this approach might prove to be flawed. Even when trained on non-copyrighted music, an AI system might still unintentionally replicate the ‘style’ of copyrighted music, further complicating the matter. Notably, the high-profile legal case surrounding the song “Blurred Lines” has already set a precedent that could impact future challenges. In that case, the court ruled that the song had copied the “feel” or “sound” of Marvin Gaye’s “Got to Give It Up” — elements that are typically considered intangible and non-copyrightable. Such legal interpretations may introduce ambiguity and uncertainty into the already complex landscape of AI and music copyright.

Technical Safeguards Against Voice Cloning Misuse

Despite its potential for misuse, technology can also be a vital tool in countering the negative impacts of AI advancements. Developers could create advanced algorithms to identify AI-generated music and voice clones, akin to existing systems that recognize copyrighted music. Such algorithms could analyze parameters such as sound frequencies, tonal shifts, and speech patterns to differentiate between human and AI-generated voices.

These advanced detection systems could then be integrated into streaming platforms, music stores, and social media sites, allowing for the automatic removal or flagging of unauthorized uses of an artist’s voice. Additionally, these systems could assist in enforcing the revised copyright laws, making it easier to identify and act against instances of copyright infringement.
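
As a rough illustration of the idea, a baseline detector could summarize each clip with standard spectral features and train an off-the-shelf classifier on labeled human and synthetic examples. The folder layout and labels below are hypothetical, and production systems would rely on much larger datasets and learned representations; this is only a sketch of the approach.

```python
# Minimal sketch of a feature-based detector for AI-generated vocals.
# The labeled dataset of "human" vs. "synthetic" clips is a hypothetical
# input; real detectors are far more sophisticated than this baseline.
import glob

import librosa
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def spectral_features(path: str) -> np.ndarray:
    """Summarize a clip with features sensitive to frequency content and tone."""
    audio, sr = librosa.load(path, sr=16000, mono=True)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)
    centroid = librosa.feature.spectral_centroid(y=audio, sr=sr)
    flatness = librosa.feature.spectral_flatness(y=audio)
    return np.hstack([mfcc.mean(axis=1), centroid.mean(), flatness.mean()])


# Hypothetical folders of labeled training clips.
human_paths = glob.glob("clips/human/*.wav")
synthetic_paths = glob.glob("clips/synthetic/*.wav")
paths = human_paths + synthetic_paths
labels = [0] * len(human_paths) + [1] * len(synthetic_paths)

X = np.array([spectral_features(p) for p in paths])
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2)

clf = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)
print(f"Held-out accuracy: {clf.score(X_test, y_test):.2f}")
```

In practice, a detector along these lines would feed the platform integrations described above, flagging suspect uploads for human review rather than removing them automatically.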

Watermarking: Embedding Identity in Digital Content

The concept of watermarking, traditionally used to protect visual and digital media, could be applied as an additional protective measure against voice cloning misuse. In the context of AI voice cloning, watermarking would involve embedding an imperceptible audio code within the AI-generated content. This code, detectable only through specific software, would contain information about the origin of the content.

This watermark would serve multiple purposes. First, it could help trace the origin of the content, providing valuable evidence in cases of misuse or copyright infringement. Second, it could discourage potential misuse, as a watermark would indicate that rights owners can trace the content back to its source. Lastly, watermarks could also facilitate automated detection systems, making flagging and removing unauthorized uses of an artist’s voice easier.
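
To make the concept tangible, the sketch below shows a heavily simplified spread-spectrum-style watermark: a faint pseudorandom signal keyed by a secret seed is mixed into the audio and later detected by correlation. The function names, seed, and thresholds are illustrative assumptions; production schemes are perceptually shaped and engineered to survive compression and editing.

```python
# Heavily simplified spread-spectrum-style audio watermark. A faint
# pseudorandom signal keyed by a secret seed is added to the audio and later
# detected by correlation. Parameters here are illustrative assumptions.
import numpy as np


def embed_watermark(audio: np.ndarray, seed: int, strength: float = 0.002) -> np.ndarray:
    """Mix a low-amplitude pseudorandom sequence derived from `seed` into the audio."""
    rng = np.random.default_rng(seed)
    mark = rng.standard_normal(audio.shape[0])
    return audio + strength * mark


def detect_watermark(audio: np.ndarray, seed: int, strength: float = 0.002) -> bool:
    """Correlate against the keyed sequence; a high score implies the mark is present."""
    rng = np.random.default_rng(seed)
    mark = rng.standard_normal(audio.shape[0])
    score = float(np.dot(audio, mark)) / audio.shape[0]
    return score > strength / 2  # crude threshold, adequate for this illustration


# Demo on a synthetic 10-second tone; SECRET_SEED is an illustrative key.
SECRET_SEED = 42
clean = 0.1 * np.sin(2 * np.pi * 440 * np.arange(220500) / 22050)
marked = embed_watermark(clean, SECRET_SEED)

print(detect_watermark(marked, SECRET_SEED))  # True: watermark detected
print(detect_watermark(clean, SECRET_SEED))   # False: no watermark present
```

One appealing property of this style of scheme is that detection requires only the secret key, not access to the original unmarked recording, which makes automated scanning at scale more practical.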

Encouraging Creative Exploration and Protecting Artists’ Rights

These considerations highlight the delicate balance between encouraging creative exploration and protecting artists’ rights in the age of AI voice cloning and impersonation. The laws governing these issues may differ significantly from one country to another, but many aspects of voice impersonation fall into legal gray areas. Therefore, fostering a culture of respect for original work and encouraging responsible attribution while developing a nuanced understanding of these complex issues is essential.
