Compliance with personal data use for machine learning

By Yu Rong and He Wei, Hylands Law Firm

In recent years, in addition to enterprises specialising in artificial intelligence (AI) development technologies, an increasing number of internet enterprises have begun to use the data collected in the course of their business operations in machine learning, to improve product performance and develop new products. For example, a ride-hailing enterprise that originally collected recordings to protect the safety of passengers may also use those recordings in machine learning, possibly to develop a voice recognition product.

俞蓉 Yu Rong, Partner, Hylands Law Firm

Since such data will usually contain a large quantity of users’ personal information, an enterprise using this data in machine learning is required to abide by compliance requirements relating to personal information. This article analyses the key compliance points that enterprises that do not specialise in AI technology development need to pay attention to when using users’ personal information in machine learning.

Securing valid consent of users

In general, the privacy policy for an internet product only specifies that the enterprise will use the personal information it collects to realise specific service functions. For example, the privacy policy for a job search and recruitment app will generally specify that the user’s educational information (e.g., school, educational history, major, etc.) collected by the enterprise will only be used for preparing and submitting a résumé.

Pursuant to the Cybersecurity Law, an enterprise that collects and uses personal information should expressly state the purpose, manner and scope of the collection and use, and secure the consent of the user whose information is to be collected. If an enterprise intends to use the personal information it collects in machine learning, it should add this purpose to its privacy policy, and secure users’ express consent.

It should be noted that the Cybersecurity Law requires compliance with the “minimum necessary” principle in the collection of personal information. Because the use of personal information in machine learning is not required for the realisation of a specific service function, an enterprise does not have the right, even if it secures users’ consent, to collect personal information solely for machine learning purposes, or to collect more than is necessary for the realisation of the service functions.

何为 He Wei, Associate, Hylands Law Firm

The Information Security Technology – Personal Information Security Specification, implemented on 1 October 2020, also addresses the above-mentioned issues. Article 5.3 of the specification adds a new requirement that users’ personal information may not be mandatorily collected for the sole purpose of improving service quality, enhancing the user experience, or developing new products.

Although the specification is only a recommended national standard, the added provision is consistent with the requirements of the Cybersecurity Law, and in practice the competent authorities already treat the specification as a regulatory basis. Therefore, subject to user consent, the personal information that an enterprise can use in machine learning is limited to the personal information whose collection is necessary for realising the service functions.
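The two-part test described above (consent to the machine learning purpose, plus necessity for the service function) can be sketched as a simple filter applied before data enters a training pipeline. This is only an illustration under stated assumptions: the purpose label, the field names and the `UserRecord` structure are hypothetical and do not come from the Cybersecurity Law or the specification. Such a filter is a downstream safeguard, not a substitute for limiting collection in the first place.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical purpose label; a real privacy policy would use its own wording.
ML_PURPOSE = "machine_learning"


@dataclass
class UserRecord:
    user_id: str
    data: Dict[str, str]  # field name -> collected value
    consented_purposes: Set[str] = field(default_factory=set)


def fields_usable_for_ml(record: UserRecord, necessary_fields: Set[str]) -> Dict[str, str]:
    """Return only the fields that pass both tests: consent and necessity."""
    # Test 1: the user must have consented to the machine learning purpose.
    if ML_PURPOSE not in record.consented_purposes:
        return {}
    # Test 2: only fields necessary for the service function may be used.
    return {name: value for name, value in record.data.items() if name in necessary_fields}


# Illustrative values for a job search app; "contacts_list" stands in for
# data that is not needed to prepare and submit a resume.
NECESSARY_FOR_RESUME = {"school", "major", "educational_history"}
record = UserRecord(
    user_id="u1",
    data={"school": "Example University", "major": "Law", "contacts_list": "..."},
    consented_purposes={"resume_submission", ML_PURPOSE},
)
usable = fields_usable_for_ml(record, NECESSARY_FOR_RESUME)
```

In this sketch, a record without consent to the machine learning purpose yields nothing, and a record with consent yields only the fields that the service function itself requires.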

Engagement of third parties to carry out data annotation

Data generally needs to be annotated before it can be used in machine learning. To enhance efficiency and save on costs, an enterprise will usually engage a data service company to complete data annotation. Current laws do not specifically address the engagement of a third party to process personal information, but the specification sets out express requirements in this regard.

Based on the specification, when an enterprise engages a third party to annotate data, it needs to pay attention to the following points. First, in engaging the third party, the enterprise should not exceed the scope of the authorisation and consent secured from users.

Second, the enterprise should stipulate the contractor’s obligations by way of a contract, mainly including: (1) setting out in the contract the specific requirements for processing by the contractor; (2) requiring the contractor to secure the enterprise’s consent before any sub-contracting; (3) requiring the contractor to promptly report to the enterprise any failure by it to process personal information in the manner required by the enterprise, its inability to provide an adequate level of security protection, or the occurrence of a security breach; and (4) requiring the contractor to cease storing the personal information once the engagement is terminated.

Finally, the enterprise should manage and supervise the contractor, including conducting a personal information security impact assessment, recording and storing information on the processing of the personal information by the contractor, auditing the contractor, etc.
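As a rough aid, the four contractual obligations listed above could be tracked as a checklist when reviewing a draft annotation agreement. The sketch below is an assumption-laden illustration: the term identifiers are invented labels for the four items in the text, not wording from the specification, and a real review would of course be done by counsel against the contract itself.

```python
from typing import Iterable, List

# Hypothetical identifiers for the four obligations summarised in the text.
REQUIRED_CONTRACT_TERMS = {
    "processing_requirements": "specific requirements for processing by the contractor",
    "subcontracting_consent": "enterprise's consent required before any sub-contracting",
    "incident_reporting": "prompt reporting of non-compliant processing, inadequate security or a breach",
    "cease_storage_on_termination": "contractor ceases storing the data once the engagement ends",
}


def missing_contract_terms(agreed_terms: Iterable[str]) -> List[str]:
    """Return the required terms that the draft contract does not yet cover."""
    return sorted(set(REQUIRED_CONTRACT_TERMS) - set(agreed_terms))


# Example: a draft that covers only two of the four obligations.
draft = {"processing_requirements", "incident_reporting"}
gaps = missing_contract_terms(draft)
```

Here `gaps` lists the obligations still to be negotiated; an empty list would indicate that all four items are addressed on paper, which says nothing about the separate duty to manage and supervise the contractor.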

In a scenario where the processing is done by a contractor, the latter only has the right to process personal information within the scope authorised, and control of the personal information entrusted to the contractor remains with the enterprise. Accordingly, the enterprise does not need to secure the specific authorisation of the users for the processing by the contractor.

However, if the data service company is able to control users’ personal information as it deems fit after the enterprise provides that information to it, the arrangement constitutes the sharing of personal information, not processing by a contractor, and the enterprise is required to separately secure the consent of users.
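The distinction drawn above between processing by a contractor and sharing turns on who controls the personal information. A minimal sketch of that decision follows, assuming a single hypothetical flag captures the question of control; in practice this is a factual and legal assessment, not a boolean.

```python
from dataclasses import dataclass
from enum import Enum


class Arrangement(Enum):
    ENTRUSTED_PROCESSING = "processing by a contractor"
    SHARING = "sharing of personal information"


@dataclass(frozen=True)
class Engagement:
    # Hypothetical flag: True if the third party can decide for itself how
    # the personal information is used, rather than acting only as instructed.
    third_party_controls_data: bool


def classify(engagement: Engagement) -> Arrangement:
    """Classify the engagement according to who controls the data."""
    if engagement.third_party_controls_data:
        return Arrangement.SHARING
    return Arrangement.ENTRUSTED_PROCESSING


def separate_consent_required(engagement: Engagement) -> bool:
    """Separate user consent is needed only when the arrangement is sharing."""
    return classify(engagement) is Arrangement.SHARING


annotation_vendor = Engagement(third_party_controls_data=False)
data_partner = Engagement(third_party_controls_data=True)
```

Under these assumptions, `annotation_vendor` is treated as a contractor and needs no separate consent, provided the enterprise stays within the scope of the consent already secured, while `data_partner` is treated as a recipient of shared data.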

Special restrictions on personal biometric information

Personal biometric information includes fingerprints, voice prints, iris and facial identifying features, etc. Personal biometric information is unique, and, once disclosed or misused, is likely to jeopardise the safety of the person and his/her property.

Article 6.3 of the specification sets out for the first time the requirement that, in principle, raw personal biometric information should not be stored, and specifies that an enterprise can take alternative measures such as storing a summary of personal biometric information, directly using this information at the collection terminal to realise the service function, or deleting, after completion of the relevant service function, the raw images from which the personal biometric information can be extracted.

After implementation of the specification, even if an enterprise collects personal biometric information based on the valid consent of users, the above-mentioned alternative measures mean that it will be unable to use the raw personal biometric information in machine learning directly, or to use it over a relatively extended period of time.
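To illustrate why storing only a summary leaves nothing for a training pipeline, the sketch below keeps a keyed digest of each sample and discards the raw bytes. It rests on loud assumptions: the class and method names are hypothetical, HMAC-SHA256 is used merely as a stand-in for the "summary" mentioned in the specification, and exact-match digests are a simplification, since real biometric captures vary from one reading to the next and practical systems rely on tolerant templates.

```python
import hashlib
import hmac
from typing import Dict, List


class BiometricStore:
    """Keeps only a keyed digest of each sample; raw samples are never stored."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._digests: Dict[str, str] = {}

    def _digest(self, raw_sample: bytes) -> str:
        return hmac.new(self._key, raw_sample, hashlib.sha256).hexdigest()

    def enrol(self, user_id: str, raw_sample: bytes) -> None:
        # Only the digest is retained; raw_sample goes out of scope afterwards.
        self._digests[user_id] = self._digest(raw_sample)

    def verify(self, user_id: str, raw_sample: bytes) -> bool:
        expected = self._digests.get(user_id)
        if expected is None:
            return False
        # Constant-time comparison of the stored and freshly computed digests.
        return hmac.compare_digest(expected, self._digest(raw_sample))

    def raw_samples_for_training(self) -> List[bytes]:
        # Nothing to return: the raw biometric data was never kept.
        return []


store = BiometricStore(key=b"illustrative-key-only")
store.enrol("u1", b"fingerprint-sample-bytes")
```

The service function (verification) still works, but `raw_samples_for_training()` is necessarily empty, which is the practical consequence for machine learning described above.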

Yu Rong is a partner at Hylands Law Firm. She can be contacted on +86 10 6502 8935 or by email at

He Wei is an associate at Hylands Law Firm. She can be contacted on +86 10 6502 8745 or by email at