Buyer Guide: ML Training & Industrial Licensing
This guide explains how Applesauce prices its Derived Audio Data SKU for Machine Learning, AI training, and industrial applications. We use a programmatic, utility-based model to ensure transparency and legal safety for enterprise buyers.
Sensitive Inference Policy
All datasets adhere to our Sensitive Inference Governance policy (see Appendix A). Labels for sensitive attributes (e.g., emotion, demographics) are opt-in only, and use for surveillance or hiring-eligibility decisions is strictly prohibited.
Embeddings Usage: Embeddings are provided for modeling utility; buyers are prohibited from using them for identification, authentication, or re-identification of individuals.
The Price Foundation
Unlike standard licensing, which relies on subjective artist weights, ML Training data is priced on Industrial Utility and Scale Economics.
1. The 70/30 Transparency Split
Every dollar you spend is allocated through a fixed ratio:
- 70% to the Human Pool: This goes directly to the creators whose data is included in your bundle.
- 30% to Applesauce: This covers the cost of identity verification (KYC), secure data extraction, high-fidelity hosting, and legal chain-of-custody management.
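As a minimal sketch of the fixed ratio above, the split can be computed in whole cents. The function name `split_payment` and the rounding rule (round the Human Pool down, give the remainder to Applesauce) are assumptions for illustration; the guide does not state how fractional cents are handled.

```python
def split_payment(total_cents: int) -> tuple[int, int]:
    """Split a payment into (human_pool_cents, applesauce_cents) at 70/30.

    ASSUMPTION: amounts are whole cents and any fractional cent from the
    70% share is rounded down; the guide does not specify a rounding rule.
    """
    if total_cents < 0:
        raise ValueError("total_cents must be non-negative")
    human_pool = total_cents * 70 // 100   # 70% to the Human Pool
    applesauce = total_cents - human_pool  # remainder (30%) to Applesauce
    return human_pool, applesauce


# A $1,000.00 deal expressed in cents.
human_pool, applesauce = split_payment(100_000)
print(human_pool, applesauce)
```

Working in integer cents keeps the two shares summing exactly to the amount paid.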
2. Industrial Utility Buckets
Your price is determined by how you use the data. We use multipliers ($W_t$) to reflect the transformative value of the assets:
| Bucket | Use Case | Multiplier | Data Provided |
|---|---|---|---|
| I. Analysis | Research, Feature Extraction, LLM Labeling | 1.0x | Features + Metadata |
| II. Transformative | Style Transfer, Real-time Remixing | 2.0x | Features + Watermarked Stream |
| III. Foundational | Model Training, Voice Cloning, GenAI | 10.0x | Derived Training Package (Features + Embeddings) |
Scale & Efficiency
The per-user price in a bundle falls as the bundle grows. As you increase the number of users in your dataset, the per-user price decreases to reflect the efficiency of processing large cohorts.
- Small Batches: Standard pricing applies to smaller, highly targeted cohorts.
- Industrial Batches: Significant volume discounts are applied automatically for large-scale datasets (5,000+ users).
Pricing Model: The final deal price is calculated based on the number of users, the Industrial Utility Bucket (Multiplier), and the volume discount tier.
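The guide names the three inputs but does not publish the formula, so the following is only one plausible reading: price = users × base price × multiplier × (1 − volume discount). The multipliers and the 5,000-user threshold come from this guide; the base price, the discount rate, and every function and constant name are hypothetical placeholders, not Applesauce's actual values.

```python
from decimal import Decimal

# Multipliers W_t from the Industrial Utility Buckets table.
BUCKET_MULTIPLIERS = {
    "analysis": Decimal("1.0"),
    "transformative": Decimal("2.0"),
    "foundational": Decimal("10.0"),
}

# From the guide: industrial volume discounts start at 5,000+ users.
INDUSTRIAL_BATCH_THRESHOLD = 5000

# ASSUMPTION: placeholder values, not published by the guide.
HYPOTHETICAL_BASE_PRICE_PER_USER = Decimal("1.00")
HYPOTHETICAL_INDUSTRIAL_DISCOUNT = Decimal("0.20")


def volume_discount(users: int) -> Decimal:
    """Return the discount fraction for a cohort size (single assumed tier)."""
    if users >= INDUSTRIAL_BATCH_THRESHOLD:
        return HYPOTHETICAL_INDUSTRIAL_DISCOUNT
    return Decimal("0")


def deal_price(users: int, bucket: str) -> Decimal:
    """Price one cohort: users x base price x W_t x (1 - discount)."""
    if users <= 0:
        raise ValueError("users must be positive")
    gross = HYPOTHETICAL_BASE_PRICE_PER_USER * users * BUCKET_MULTIPLIERS[bucket]
    return gross * (Decimal("1") - volume_discount(users))


print(deal_price(1000, "analysis"))
print(deal_price(5000, "foundational"))
```

A real quote may use several discount tiers or a non-linear curve; the single tier here only mirrors the one threshold the guide mentions.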
Security & Intellectual Property
Applesauce provides more than just data; we provide a Legal Shield:
- Verified Human Chain: 100% of data is sourced from KYC-verified humans. No "scraped" data.
- Consent Management: Every asset includes explicit "Synthetic & Transformative" usage rights.
- Evolutionary Access: Your license follows the creator's data as it evolves. If a creator updates their model or adds new high-quality samples, your "Foundational" license stays in sync with those updates.
FAQ for Buyers
Q: Why is the "Foundational" bucket priced higher?
A: Foundational provides a downloadable derived training package (features, embeddings, and labels) alongside expanded contractual rights for model training. Raw audio is excluded in this SKU to reduce biometric risk.
Q: Can I mix buckets in a single deal?
A: Yes. You can request a cohort-based deal (e.g., 3,000 users for Analysis and 500 for Foundational training). Our Pricing Oracle will calculate a single transparent price for the entire bundle.
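To illustrate how a mixed-bucket bundle might roll up into one figure, the sketch below sums each cohort's users weighted by its bucket multiplier. It is not the Pricing Oracle: `bundle_price` and the base price are hypothetical, and volume discounts are left out because the guide does not say how tiers apply across mixed cohorts.

```python
from decimal import Decimal

# Multipliers W_t from the Industrial Utility Buckets table.
BUCKET_MULTIPLIERS = {
    "analysis": Decimal("1.0"),
    "transformative": Decimal("2.0"),
    "foundational": Decimal("10.0"),
}


def bundle_price(cohorts, base_price_per_user: Decimal) -> Decimal:
    """Sum users x base price x W_t over (bucket, users) cohorts.

    ASSUMPTION: no volume discount is applied; the guide does not specify
    how discount tiers interact with mixed-bucket bundles.
    """
    total = Decimal("0")
    for bucket, users in cohorts:
        total += base_price_per_user * users * BUCKET_MULTIPLIERS[bucket]
    return total


# The FAQ example: 3,000 Analysis users plus 500 Foundational users,
# priced with a hypothetical base of 1.00 per user.
example = [("analysis", 3000), ("foundational", 500)]
print(bundle_price(example, Decimal("1.00")))
```

Under these assumptions the 500 Foundational users contribute more to the total than the 3,000 Analysis users, which is the practical effect of the 10.0x multiplier.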