California has passed Cal. AB 2013, or the California Generative Artificial Intelligence Training Data Transparency Act, a groundbreaking new law set to take effect January 1, 2026. Cal. AB 2013 requires companies that design, code, produce, or significantly update certain generative AI systems to publicly disclose on their websites information concerning the datasets used to train those generative AI systems. Our questionnaire below is a starting point for determining (1) whether your company is subject to the requirements of Cal. AB 2013 and (2) if so, what information to include in your company's website disclosure.
Part A. Is Our Company Subject to the Requirements of Cal. AB 2013?
1. Have we designed, coded or produced a generative AI system1 that we (or a third party) have made publicly available?
2. Have we retrained, fine-tuned or made any other significant update to a generative AI system that changes its functionality or performance that we (or a third party) have made publicly available?
If the answer to both questions in Part A is no, your company is not subject to the requirements of Cal. AB 2013.
If the answer to either question in Part A is yes, please move to Part B to determine whether an exception is applicable.
Part B. Is Our Generative AI System Subject to Any Exceptions in Cal. AB 2013?
1. Are any of the following statements true about this generative AI system?
a. The system's sole purpose is to help ensure security and integrity;2
b. The system's sole purpose is the operation of aircraft in the national airspace;
c. The system was developed for national security, military, or defense purposes and is made available only to a U.S. federal government entity.
If the answer to any of the subparts in Part B is yes, the generative AI system is not within the scope of Cal. AB 2013.
If the answer to each of the subparts in Part B is no, your company is subject to the requirements of Cal. AB 2013 and you should move to Part C to determine the information required for your company's website disclosure.
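For companies that track their AI systems in an internal inventory, the screening logic of Parts A and B can be sketched in code. The Python below is a minimal illustration only: the class, field, and function names are hypothetical and simply mirror the questions above, and the result is a first-pass screen rather than a legal conclusion.

```python
from dataclasses import dataclass


@dataclass
class SystemFacts:
    """Yes/no answers to Parts A and B for one generative AI system.

    All field names are hypothetical labels for the questionnaire items.
    """
    developed_and_public: bool                      # Part A, question 1
    updated_and_public: bool                        # Part A, question 2
    sole_purpose_security_integrity: bool = False   # Part B, 1(a)
    sole_purpose_aircraft_operation: bool = False   # Part B, 1(b)
    national_security_federal_only: bool = False    # Part B, 1(c)


def needs_disclosure(facts: SystemFacts) -> bool:
    """Return True when the questionnaire points to Part C."""
    # Part A: a "yes" to either question brings the company in scope.
    in_scope = facts.developed_and_public or facts.updated_and_public
    if not in_scope:
        return False

    # Part B: a "yes" to any subpart takes the system out of scope.
    excepted = (
        facts.sole_purpose_security_integrity
        or facts.sole_purpose_aircraft_operation
        or facts.national_security_federal_only
    )
    return not excepted


# Hypothetical example: a fine-tuned public chatbot with no exception.
chatbot = SystemFacts(developed_and_public=False, updated_and_public=True)
print(needs_disclosure(chatbot))
```

The function evaluates Part A first because Part B only matters once a system is in scope, which matches the order of the questionnaire.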
Part C. What Information Should We Disclose Regarding Our Generative AI System?
1. For each dataset (i.e., each single, pre-packaged collection of data) used to test, validate or fine-tune the generative AI system subject to Cal. AB 2013,3 please:
a. Identify the source or owner of the dataset and indicate whether the dataset was purchased or licensed;
b. Provide the time period when the data in the dataset were collected, indicate whether collection is ongoing, and identify when the dataset was first used in the development of the generative AI system;
c. Provide a general range for the number of data points included in the dataset (e.g., for text, in tokens; for images, in numbers of images; for video, in hours of video content), including an estimated figure if the dataset is a dynamic dataset;
d. Confirm whether the dataset contains:
i. Data protected by copyright, trademark or patent law (or, if applicable, indicate whether the data is entirely in the public domain, and therefore not subject to copyright, trademark or patent law);
ii. Personal information;4
iii. Aggregate consumer information;5
iv. Synthetic data6.
e. Confirm whether we cleaned, processed or otherwise modified the dataset.
i. If so, please describe the intended purpose of the cleaning, processing or modification of the dataset.
f. Describe:
i. If the dataset includes labels, the types of labels used;
ii. If the dataset does not include labels, the general characteristics of the data;7 and
iii. With respect to both (i) and (ii), as applicable, how the dataset will contribute to the purpose of the generative AI system.8
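The Part C items can also be captured as one structured record per dataset, which makes it easier to spot gaps before drafting the website disclosure. The following Python sketch is an assumption about how a company might organize its answers; the record layout, field names, and the `open_items` helper are hypothetical and are not prescribed by Cal. AB 2013.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetDisclosure:
    """One record per dataset; comments map each field to Part C."""
    source_or_owner: str                       # 1(a)
    purchased_or_licensed: bool                # 1(a)
    collection_period: str                     # 1(b), free text
    collection_ongoing: bool                   # 1(b)
    first_used: str                            # 1(b)
    size_range: str                            # 1(c), e.g. tokens, images, hours
    size_is_estimate: bool                     # 1(c), for dynamic datasets
    has_ip_protected_data: bool                # 1(d)(i)
    has_personal_information: bool             # 1(d)(ii)
    has_aggregate_consumer_information: bool   # 1(d)(iii)
    has_synthetic_data: bool                   # 1(d)(iv)
    was_modified: bool                         # 1(e)
    contribution_to_purpose: str               # 1(f)(iii)
    modification_purpose: Optional[str] = None             # 1(e)(i)
    label_types: List[str] = field(default_factory=list)   # 1(f)(i)
    general_characteristics: Optional[str] = None          # 1(f)(ii)


def open_items(d: DatasetDisclosure) -> List[str]:
    """List the conditional Part C answers that are still missing."""
    items = []
    # 1(e)(i) is only required when the dataset was modified.
    if d.was_modified and not d.modification_purpose:
        items.append("1(e)(i): describe the purpose of the modification")
    # 1(f): either label types or general characteristics must be given.
    if not d.label_types and not d.general_characteristics:
        items.append("1(f): give label types or general characteristics")
    if not d.contribution_to_purpose.strip():
        items.append("1(f)(iii): explain the contribution to the system's purpose")
    return items
```

A record that returns an empty list from `open_items` has an answer for each conditional item; whether those answers are sufficient remains a question for review against the statute.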
Footnotes
1. "Generative artificial intelligence" refers to AI that can generate synthetic content, such as text, images, video, and audio.
2. For purposes of this questionnaire, "Security and Integrity" means the ability to detect security incidents; to resist malicious, deceptive, fraudulent, or illegal actions and help prosecute those responsible for those actions; and to ensure physical safety.
3. For pre-existing AI models that we fine-tuned, distilled or otherwise modified, these questions should be answered with respect to the training content we used and our own modifications of the generative AI system, not the training content or process used by the underlying AI model provider.
4. "Personal information" means information that identifies or is reasonably capable of being associated or linked with a particular consumer or household.
5. "Aggregate consumer information" means information that relates to a group or category of individuals, from which individual identities have been removed such that the information is not reasonably linkable to any individual or household, including via a device.
6. "Synthetic data" refers to data generated when seed data are used to create artificial data that have some of the statistical characteristics of the seed data.
7. General characteristics may include the format (e.g., image, audio, video, text, other) and sample values of the underlying data points. Sample values will depend on the format of the data: (i) for image, examples may include photography, visual art works, infographics, social media images, logos, or brands; (ii) for audio, examples may include musical compositions and recordings, audiobooks, radio shows and podcasts, private audio communication; (iii) for video, examples may include music videos, films, TV programs, performances, video games, video clips, journalistic videos, social media videos; (iv) for text, examples may include fiction and non-fiction text, scientific text, press publications, legal and official documents, social media comments, and source code.
8. This purpose explanation can be relatively high-level (e.g., "because the dataset is composed of images of trees, it will help our AI system, which is intended to identify objects in nature, achieve its intended purpose", or "because the dataset is composed of guitar sounds, it will help our AI system, which is intended to create music based on specific genres, achieve its intended purpose").
The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.