Why this framework exists
Every AI music system is built on training data. As platforms, distributors, and regulators examine how these systems are developed, one question has become increasingly important: what data entered the model, and can that be shown?
Attribution at the point of generation is complex. Provenance at the point of training is more tractable, but it requires discipline at the moment of data acquisition, before any model is trained.
iah.fit inc., the company behind HitZERØ and the IAH Resonance Engine™, developed this framework to bring structure and verifiability to that process. HitZERØ was built to operate within it. The framework is published so that qualified partners and their counsel can evaluate our approach to training data sourcing.
The eight criteria
A training corpus aligns with the IAH Provenance Framework when it satisfies all eight criteria below. Each criterion is written to be verifiable through documentation.
1. Documented Source.
Every dataset in the corpus is recorded in a provenance registry and traces to a named rights holder. No dataset enters from an unidentified or anonymous origin.
2. Acquired Under Written Terms.
Every dataset is acquired under documented terms, whether through platform-mediated IP assignment, a commercial license, an open academic license, or in-house creation. The terms permit machine learning and model training use, and they are secured in writing before training begins.
3. Affirmative Exclusions.
The company acquires no audio through scraping or through rips from any streaming platform. Third-party datasets enter only under written license. These exclusions are enforced at the point of data acquisition.
4. No User Data.
Customer prompts, customer recordings, and customer generations remain outside the training corpus. The company maintains this as standing policy and records it in its provenance documentation.
5. Auditable Registry.
The corpus is documented in a master provenance registry that records, for each source, the rights holder, the acquisition terms, any restrictions, and compliance status. The registry is maintained as a living document and is available for review by qualified partners and their counsel under non-disclosure agreement.
6. Accountability.
The integrity of the corpus is supported by executed declarations and documentation maintained by the company.
7. Indemnified Commitment.
The company is prepared to extend contractual indemnification for training data provenance to partners under written agreement, scoped to the terms of each arrangement.
8. Standards Readiness.
The company maintains its provenance documentation in a form structured to support the source-level traceability contemplated by emerging AI transparency frameworks, including the EU AI Act and guidance from national copyright authorities.
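Criterion 5 lists the fields the registry records for each source, and criteria 1 and 2 state conditions that can be checked from those fields. The following is a minimal illustrative sketch only: the field names, the `AcquisitionBasis` values, the `check_entry` helper, and the placeholder data are all assumptions made for this example, not the company's actual registry schema or tooling.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class AcquisitionBasis(Enum):
    # The four kinds of documented terms named in criterion 2.
    IP_ASSIGNMENT = "platform-mediated IP assignment"
    COMMERCIAL_LICENSE = "commercial license"
    OPEN_ACADEMIC_LICENSE = "open academic license"
    IN_HOUSE = "in-house creation"


@dataclass
class RegistryEntry:
    # Hypothetical field names, chosen to mirror criterion 5.
    dataset: str
    rights_holder: str
    basis: AcquisitionBasis
    permits_ml_training: bool
    terms_secured_on: Optional[date]  # None means no written terms on file
    restrictions: List[str] = field(default_factory=list)
    compliance_status: str = "pending verification"


def check_entry(entry: RegistryEntry, training_start: date) -> List[str]:
    """Return a list of problems; an empty list means the entry passes."""
    problems = []
    if not entry.rights_holder.strip():
        problems.append("no named rights holder (criterion 1)")
    if not entry.permits_ml_training:
        problems.append("terms do not permit ML training (criterion 2)")
    # Assumption: terms dated on the training start day count as too late.
    if entry.terms_secured_on is None or entry.terms_secured_on >= training_start:
        problems.append("terms not secured before training (criterion 2)")
    return problems


# Placeholder data, not a real registry entry.
example = RegistryEntry(
    dataset="example-dataset",
    rights_holder="Example Rights Holder",
    basis=AcquisitionBasis.COMMERCIAL_LICENSE,
    permits_ml_training=True,
    terms_secured_on=date(2023, 1, 1),
    compliance_status="verified",
)
print(check_entry(example, training_start=date(2024, 1, 1)))
```

A check of this kind returns the reasons an entry fails rather than a bare yes or no, which keeps the result reviewable alongside the underlying documentation.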
How HitZERØ conforms
HitZERØ maintains verified conformance with the IAH Provenance Framework. The IAH Resonance Engine™ was trained on a corpus drawn from four categories of lawfully acquired data, recorded in the Master Training Data Provenance Registry:
- Original in-house works. Original compositions produced internally by iah.fit inc. and wholly owned by the company.
- Commissioned works. Music created by independent creators and delivered to iah.fit inc. under platform terms of service that assign intellectual property rights in the delivered work to the buyer upon payment, with all creators compensated in full.
- Commercial licenses. Datasets acquired under paid commercial licenses that expressly grant rights to use the content for machine learning training and to commercially exploit models and outputs trained on the data. This includes a machine learning license agreement with Rightsify Group, LLC.
- Open academic research. Vocal and musical datasets published by research institutions under open academic and Creative Commons terms, including VocalSet (Wilkins et al., 2018), licensed under CC BY 4.0.
Each source is recorded in the registry with its rights basis. Provenance documentation is available to qualified partners and their counsel under non-disclosure agreement. Affirmative assurances apply to sources recorded as verified. Sources recorded as pending verification are excluded from these assurances.
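The scoping rule in the last two sentences can be expressed as a simple filter. This is a hedged sketch: the dictionary layout and the function name are hypothetical, and the source names are placeholders; only the two status labels come from the wording above.

```python
from typing import Dict, List


def sources_covered_by_assurances(registry: Dict[str, str]) -> List[str]:
    """Return the names of sources recorded as verified, in sorted order."""
    return sorted(
        name for name, status in registry.items() if status == "verified"
    )


# Placeholder registry mapping source name to recorded status.
registry = {
    "source-a": "verified",
    "source-b": "pending verification",
    "source-c": "verified",
}
covered = sources_covered_by_assurances(registry)
print(covered)
```

Matching on the exact label "verified" means any other status, including an unrecognized one, falls outside the assurances by default.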
The conformance program
Transparency Shield™ is HitZERØ’s conformance program for the IAH Provenance Framework.
Proof-of-Origin is the per-generation record that allows the origin of a creation to be evidenced without exposing private data.
What this framework does not require
This framework governs the provenance and rights of training data. It does not require disclosure of model architecture, generation methods, proprietary conditioning systems, or the commercial terms of any individual license. A company can document its data sources without disclosing how its technology works.
The IAH Provenance Framework is published and maintained by iah.fit inc. Conformance claims are subject to the documentation and verification mechanisms described herein.