
Smita Rajmohan, senior product counsel and co-head of the AI Practice Group at software company Autodesk, offers expert advice to in-house counsel on how their companies can implement AI systems

Companies worldwide are embracing new artificial intelligence (AI) technology at a rapid pace, but they need to tread with care.

As with all good things, it is important to be thoughtful when adopting these new technologies, and to carefully consider potential harms and risks.

Many such risks involve legal issues around data protection, intellectual property and liability. This article delves into the evolving intersection of AI and the associated legal and ethical issues, as well as how to think about AI governance when implementing the technology.

It also looks at how, through a strategic risk management framework, businesses can mitigate major compliance risks to drive innovation in a responsible way, while upholding customer trust.

Data for AI training


The first step in any AI project should be to assess whether the data used to train your chosen AI model complies with applicable laws. Companies must consider factors such as data ownership, consent and compliance with data protection laws such as India’s Digital Personal Data Protection Act or the EU’s General Data Protection Regulation (GDPR). Timely legal review helps determine whether the data can be lawfully used for machine learning purposes, and helps prevent future headaches should regulators or the courts come knocking at the door.

This legal assessment might involve a deep dive into your company’s existing terms of service (TOS), privacy policy statements, and other contractual terms to determine what rights, if any, have been obtained from a customer or user. The next step would require a determination as to whether those rights will suffice for the proposed purpose of training an AI model, and whether additional customer notice or consent may be required.

Your strategy for obtaining consent, or relying on another lawful basis for processing such data, will likely depend on the type of data you propose to use. The data might be personally identifiable, it may be synthetic content (generated by technology), or it may be someone else’s intellectual property. As always, data minimisation is a good principle to apply at this stage: only use what you need.
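As a loose illustration of data minimisation in practice, the Python sketch below (with purely hypothetical field names) keeps only the columns a training pipeline actually needs and drops direct identifiers before any further processing.

```python
import pandas as pd

# Hypothetical user dataset; the column names are illustrative only.
raw = pd.DataFrame({
    "user_id": [101, 102],
    "email": ["a@example.com", "b@example.com"],
    "prompt_text": ["draw a gear", "sketch a bracket"],
    "country": ["IN", "DE"],
})

# Data minimisation: keep only the fields the model actually needs
# and drop direct identifiers before the data reaches the training pipeline.
FIELDS_NEEDED_FOR_TRAINING = ["prompt_text", "country"]
training_view = raw[FIELDS_NEEDED_FOR_TRAINING].copy()

print(training_view)
```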

Close attention should also be paid to how this data is obtained if it is not already in your possession. Companies such as OpenAI have been sued for scraping personal data to train AI algorithms. Scraping data for training models raises its own set of questions with respect to copyright infringement in India and the US. In addition, US tort laws (civil actions), such as trespass to chattels, can apply to scraping in violation of a website’s TOS. US security-focused laws, such as the Computer Fraud and Abuse Act, may arguably be applied extraterritorially to prosecute foreign bad actors if data is stolen from secure systems.

IP problems

Many AI practitioners are likely aware that The New York Times recently sued OpenAI for using its content for training OpenAI models. The New York Times based its claims on copyright infringement arguments and trademark dilution.

There is an important lesson here for all companies engaged in AI development: be careful when using copyrighted content to train models, particularly when licensing the content from the copyright owner is readily feasible. Companies such as Apple have explored such options with news publishers, and licensing will likely emerge as the best route for companies building generative AI to mitigate potential copyright infringement claims.

Intellectual property issues also extend to the potential for inadvertent leakage of confidential and trade secret information by AI products. While allowing their employees to use technologies such as ChatGPT for text and GitHub Copilot for code generation internally, companies must pay close attention to the fact that such generative AI tools often train on user prompts and outputs to further improve their models. Fortunately, these providers also offer more secure enterprise offerings and the ability to opt out of model training.

Some companies, including Microsoft, have also offered to stand behind the outputs of their AI assistants by promising to defend customers against any potential copyright infringement claims. It is possible these provisions will become industry standard, and companies may need to offer IP protection to savvy customers.

Hallucination issues

Copyright infringement claims and data protection-related issues also emerge when generative AI models reproduce training data in their outputs. This is often a result of “overfitting”, a failure mode in which the model memorises its training data rather than learning general patterns, and therefore responds poorly to new prompts.
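For readers who want a concrete picture, the following is a minimal sketch of one rough signal engineering teams often watch for: a validation loss that is much worse than the training loss can indicate memorisation. The numbers and threshold are illustrative assumptions, not a standard.

```python
def overfitting_warning(train_loss: float, val_loss: float, tolerance: float = 0.15) -> bool:
    """Flag a model whose validation loss is much worse than its training loss.

    A widening gap is a common (if rough) signal that the model is memorising
    training examples rather than generalising -- the behaviour that can lead
    to training data being regurgitated in outputs.
    """
    return (val_loss - train_loss) > tolerance

# Illustrative numbers only.
print(overfitting_warning(train_loss=0.20, val_loss=0.90))  # True: investigate further
print(overfitting_warning(train_loss=0.20, val_loss=0.25))  # False: gap within tolerance
```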


This memorisation by the AI model leads to training data being regurgitated as output, which may be a disaster from a copyright or data protection perspective, depending on the nature of the leak. Generative models can also produce inaccurate output, sometimes referred to as a “hallucination”. In a case involving a reporter from The New York Times and Bing chat, the AI chatbot, “Sydney”, professed its love to the reporter in a disturbing way, prompting a viral discussion internationally about the need to monitor the use of such tools, especially for younger users who may have a tendency to humanise AI.

This demonstrates why rigorous testing and validation are a must for companies to avoid not only the legal risks of a badly functioning AI product, but also the reputational harm that could follow. Many companies have devoted engineering talent and resources to developing content filters that help ensure output is accurate and is not offensive, abusive, inappropriate or defamatory.
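As a simplified illustration only, the sketch below shows the basic shape of an output filter: a blocklist of patterns that causes a response to be withheld for review. Real content filters are far more sophisticated, typically combining classifiers, blocklists and human review; the patterns here are hypothetical.

```python
import re

# Hypothetical blocklist; real deployments combine multiple safeguards,
# not a single regular-expression pass.
BLOCKED_PATTERNS = [
    re.compile(r"\b(credit card number|social security number)\b", re.IGNORECASE),
]

def filter_output(text: str) -> tuple[bool, str]:
    """Return (allowed, text), withholding output that matches a blocked pattern."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(text):
            return False, "[output withheld pending review]"
    return True, text

allowed, safe_text = filter_output("Here is the user's social security number: ...")
print(allowed, safe_text)  # False [output withheld pending review]
```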

Respecting access rights

If you have access to personally identifiable user data, it is vitally important to ensure you can use and handle the data securely, and that you can delete the data and/or exclude it from machine learning if required by a user, a regulator, or a court.
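One hedged sketch of what honouring such rights can look like in a training pipeline is below. It assumes each record carries hypothetical deletion and opt-out flags, and simply excludes deleted or opted-out records before training.

```python
from dataclasses import dataclass

@dataclass
class Record:
    user_id: int
    text: str
    ml_opt_out: bool   # user has withdrawn consent for ML training
    deleted: bool      # user or regulator has required deletion

def eligible_for_training(records: list[Record]) -> list[Record]:
    """Exclude deleted records and records whose owners opted out of ML use."""
    return [r for r in records if not r.deleted and not r.ml_opt_out]

corpus = [
    Record(1, "sample prompt", ml_opt_out=False, deleted=False),
    Record(2, "another prompt", ml_opt_out=True, deleted=False),
]
print(eligible_for_training(corpus))  # only record 1 survives
```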

The need to maintain data provenance and robust infrastructure for complex AI training is paramount for any AI/ML engineering team. These technical requirements also affect legal risk.

In the US, regulators such as the Federal Trade Commission have relied on punitive measures such as “algorithmic disgorgement”. This mandates that companies that have run afoul of applicable laws while collecting data for ML training must delete not only incorrectly obtained data, but also the models that were trained on such “tainted data”.

This is hugely detrimental to a company from a resource and financial perspective, taking into account the computing costs of training a model. Keeping an accurate record of which datasets were used to train various models is a good way to be prepared for such a scenario, should it present itself.
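A simple way to keep such records is a per-training-run manifest. The sketch below, with hypothetical model and dataset names, writes one such manifest to a JSON file so that affected models can later be traced back to the datasets that trained them.

```python
import json
from datetime import date

# A per-training-run manifest recording which datasets (and versions) fed which
# model, so that affected models can be identified if a dataset must later be deleted.
manifest = {
    "model_name": "support-assistant",      # hypothetical model
    "model_version": "2024.03",
    "trained_on": str(date.today()),
    "datasets": [
        {"name": "user_prompts", "version": "v7", "licence_basis": "TOS consent"},
        {"name": "public_docs", "version": "v2", "licence_basis": "owned content"},
    ],
}

with open("training_manifest.json", "w") as fh:
    json.dump(manifest, fh, indent=2)
```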

Bias in algorithms

One of the major challenges addressed by AI governance is the potential for bias and discrimination ingrained within AI algorithms. As discussed earlier, AI models often depend on large datasets that may harbour historical biases. When these biases are not mitigated prior to product launch, such AI applications can perpetuate or even worsen existing discrimination.

AI algorithms employed for predictive policing by law enforcement have been discovered to reinforce prevailing biases, resulting in the disproportionate targeting of specific communities. In the US, for example, AI has exhibited bias against Black and Latino individuals.

In the context of loan approvals or job recruitment, biased algorithms can lead to discriminatory outcomes. Experts and policymakers agree it is the responsibility of developers and organisations to ensure fairness, transparency and accountability in AI systems. The implications of inaccuracies are likely to have a tangible and deeply problematic impact on civil liberties and human rights.
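To make the idea of measuring bias concrete, the sketch below computes approval rates per group and the gap between them, a rough “demographic parity” check. The data are illustrative; a large gap is a prompt for deeper review, not legal proof of discrimination on its own.

```python
from collections import defaultdict

def approval_rates(decisions: list[tuple[str, bool]]) -> dict[str, float]:
    """Approval rate per group from (group, approved) pairs."""
    totals, approved = defaultdict(int), defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / totals[g] for g in totals}

# Illustrative loan-approval data only.
rates = approval_rates([("A", True), ("A", True), ("A", False),
                        ("B", True), ("B", False), ("B", False)])
gap = max(rates.values()) - min(rates.values())
print(rates, f"demographic parity gap = {gap:.2f}")
```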

Transparency and ethics

Many companies have established ethics review bodies to ensure their business practices are aligned with principles of transparency and accountability. It is best practice to be transparent about data use and be accurate in your statements to customers when touting the abilities of your artificial intelligence products.

US regulators frown on companies that over-promise AI capabilities in their marketing to consumers. They have also warned companies against quietly and unilaterally changing the data licensing terms in their customer contracts to expand the scope of their rights to customer data.

Risk-based approach

When thinking about AI governance, it can be hard to know where to start. A good starting point is a harm- and risk-based approach.

This involves mapping the various AI projects at the company, scoring them on a risk scale, and managing those risks by implementing actions for mitigation. This is a methodology recommended by many standards bodies and experts, and many companies now incorporate risk assessments into existing processes to measure privacy-based impacts of proposed features.
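As an illustration of such a register, the sketch below scores hypothetical projects on likelihood and impact, and maps the product to a review tier. The scales and thresholds are assumptions for demonstration, not drawn from any particular standard.

```python
# A minimal sketch of a harm- and risk-based register: each AI project is scored
# on likelihood and impact, and higher-risk projects trigger heavier review.
projects = [
    {"name": "marketing copy assistant", "likelihood": 2, "impact": 2},
    {"name": "resume screening model", "likelihood": 3, "impact": 5},
]

def risk_tier(likelihood: int, impact: int) -> str:
    score = likelihood * impact            # 1 (low) to 25 (high)
    if score >= 15:
        return "high - full legal and ethics review before launch"
    if score >= 8:
        return "medium - privacy/AI impact assessment required"
    return "low - standard product review"

for p in projects:
    print(p["name"], "->", risk_tier(p["likelihood"], p["impact"]))
```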

When establishing AI policies at a company, it is important to ensure that the rules and guidelines you implement will adequately mitigate risk and support compliance in a holistic manner, taking into account the latest developments in international law.

A regionalised approach to AI governance may be cost-intensive and error-prone. The European Union recently passed the AI Act, an extensively detailed set of requirements for companies developing and using AI. Many such laws, with extraterritorial applicability, are likely to emerge soon, both within Asia and beyond.

Conclusion

As we have now learnt, legal and ethical review and an AI governance process are important throughout the entire life cycle of AI development: when training a model, during model development and testing, at launch, and even post-launch.

It is a valuable investment for any company wishing to dip into the waters of artificial intelligence to think proactively about how it can implement AI most usefully to remove inefficiencies, while also preserving the confidentiality and intellectual property of its own data and its customers’ data.

Finally, companies should invest in training programmes to upskill their workforce so staff understand how best to benefit from AI tools, and use them to propel the company’s business to new heights.


Smita Rajmohan is senior product counsel and co-head of the AI/ML Practice Group at Autodesk, a US-based multinational software company. She leads Autodesk legal’s generative AI and machine learning platform strategy, and leads legal support for the machine learning platform team behind Autodesk’s AI and generative AI special projects.
