A review of the Chinese legal framework on copyright and generative AI training data reveals a lack of judicial clarification
The rapid development of generative AI models has triggered global debate over the legal and ethical use of copyrighted materials for model training. While jurisdictions including the European Union, Japan and the United States have developed divergent approaches, China is formulating its own regulatory response to balance copyright protection with technological innovation.
Copyright Law and its limitations
China’s Copyright Law forms the cornerstone of domestic copyright protection. However, it has no explicit provision specific to AI training. Article 24 of the Copyright Law enumerates certain scenarios (eg, personal study, news reporting and scientific research) where use of a published work is allowed without prior authorisation. For example, Article 24 permits use of published works “for the individual’s study, research or appreciation”, but expressly prohibits commercial exploitation or publication of such copies without permission.
This scenario-based approach means that large-scale, commercial use of copyrighted works for AI training generally falls outside the statutory exceptions. Unlike some jurisdictions that have introduced text and data mining exceptions to support AI development, China currently offers no dedicated provision of this kind. As a result, using protected works for AI model training, which generally serves a commercial purpose, is not automatically exempt from infringement liability under the Copyright Law.
Emerging regulations and standards
As of September 2025, there is no standalone AI law in China, and there has been persistent academic debate over whether one is needed. The Standing Committee of the National People’s Congress included legislative initiatives to promote the healthy development of the AI industry in its 2025 Annual Legislative Workplan, but no further legislative development has yet been observed. To address gaps in the law, Chinese regulators have been rolling out targeted rules and standards to provide guidance and support.
The first and most notable is the ‘Interim Measures for the Administration of Generative AI Services’ (2023), issued by the Cyberspace Administration of China and six other state-level regulatory authorities. The interim measures explicitly require that providers of generative AI services use data and foundational models from legitimate sources and refrain from infringing others’ lawful IP rights or personal information. However, they do not directly answer whether using copyrighted data for AI training purposes constitutes infringement.
In addition, several AI-related national standards have been published, requiring AI developers to:
conduct IP risk assessments on training data;
establish mechanisms for complaint handling; and
inform users of potential copyright issues relating to AI-generated content.
These standards are usually drafted by working groups of legal and technology experts, research institutes and company representatives. While the standards are neither laws nor mandatory, they serve as benchmarks for best practice and may be used by the authorities when evaluating AI systems for security and compliance.
In 2024, the National Technical Committee 260 on Cybersecurity of Standardisation Administration of China released the ‘Basic Security Requirements for Generative AI Service’ (TC260-003), which includes a section on managing and tracing AI training data. It recommends:
using open-source data according to the open-source licence;
maintaining a collection record for self-collected or self-produced data, and refraining from collecting data whose owners have explicitly prohibited its collection (eg, web data clearly marked as off-limits via the Robots Exclusion Protocol or other technical restrictions, and personal information for which the individual has refused consent);
maintaining the relevant legal agreements and records for data obtained via commercial channels, and auditing the documents provided by data providers;
formulating data administration rules and designating persons responsible; and
setting up reporting channels to receive complaints and feedback.
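The Robots Exclusion Protocol check referenced in the list above can be illustrated with a short sketch using Python’s standard-library `urllib.robotparser`. The crawler name `TrainingBot` and the robots.txt directives here are hypothetical examples, not taken from any real site or from TC260-003 itself.

```python
# Sketch: before collecting a web page for a training dataset, consult
# the site's Robots Exclusion Protocol (robots.txt) directives.
# "TrainingBot" and the robots.txt content below are hypothetical.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: TrainingBot
Disallow: /articles/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Pages under /articles/ are marked off-limits for this crawler,
# so a compliant collector should skip them.
print(rp.can_fetch("TrainingBot", "https://example.com/articles/1"))  # False
print(rp.can_fetch("TrainingBot", "https://example.com/about"))       # True
```

In practice, a collector would fetch each site’s live robots.txt (via `RobotFileParser.set_url` and `read`) and log the result alongside the collection record that TC260-003 recommends maintaining.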
The ‘Basic Security Requirements for Generative AI Services’ (GB/T 45654-2025, consolidated and released by the State Administration for Market Regulation and the National Standardisation Administration) shares similar recommendations. There is also a companion standard, the ‘Security Specification for Generative AI Pre-Training and Fine-Tuning Data’ (GB/T 45652-2025).
China is moving quickly through administrative regulations to set ground rules. It has also introduced a registration and security assessment regime for generative AI models before services are deployed, such that adherence to data requirements is increasingly critical to obtaining approval. AI developers should be prepared to demonstrate the lawful origin of training data during security reviews and model filings.
Ongoing debate and controversies
Whether training generative AI models on copyrighted works can be considered fair use is a key point of contention. Some argue that AI training is transformative and non-expressive use: the AI “reads” the works to discern patterns, not to enjoy or publicly exploit the expression. They contend that this process does not substitute for the original markets of the works and yields public benefits by advancing technology.
In 2024, the Hangzhou Internet Court ruled on a case involving an AI product trained on comics (No (2024) Zhe 01 Civil Second Instance No 10332). In its decision, the court suggested that using works for training without directly harming the copyright owner’s market may be seen as permissible. This aligns with a more liberal ‘fair use’ approach seen in some overseas jurisdictions, emphasising the transformative nature and lack of direct market competition.
Opponents, especially rights holders and many legal scholars, maintain that AI training without permission constitutes infringement under China’s current Copyright Law, absent a specific statutory or regulatory exemption. They emphasise that copying works into training datasets involves reproducing protected content in full, often on a massive scale and possibly creating derivative analysis of those works. These actions directly implicate the author’s exclusive rights of reproduction, adaptation and, possibly, distribution, if the model outputs parts of the training data. Furthermore, they caution that an overly broad fair use interpretation could undermine the incentive system of copyright: if AI developers could freely use anyone’s content to build lucrative models, it might disincentivise authors and content platforms from creating and investing in original works.
Therefore, legal uncertainty persists. Companies face a risk that the courts could go either way in future disputes; some judges may be sympathetic to AI innovation and apply an expansive fair use rationale, while others might strictly enforce the letter of the Copyright Law. Until legislative or higher judicial clarification is provided in China, AI developers are advised to be cautious and seek licences or use legitimately open data wherever possible.
Key challenges and future outlook
China’s legal and regulatory response to the data needs of generative AI is rapidly evolving. Current laws impose a generally strict requirement of authorisation for copyrighted content, and new regulations reinforce the mandate for legitimate data sourcing and respect for rights. At the same time, there is a clear policy intention not to stifle AI innovation.
In addition to legal ambiguity, another key challenge is building an effective licensing regime if AI firms are required to obtain licences for training data. While the regulations and national standards above provide some guidance, licensing remains costly and burdensome given the massive scale of data needed to train AI models effectively.
There has been discussion about how collective licensing solutions may help (eg, leveraging China’s existing copyright collective management organisations for music, text and images) to broker deals that allow AI firms to use large catalogues of works and pay fair remuneration to rights holders.
Stakeholders in China’s AI sector must navigate this nuanced environment by combining technological solutions with legal diligence. By doing so, they contribute to an emerging model of AI development that seeks to be both innovative and compliant, a model that could also influence global norms.