Buyer Guide: ML Training & Industrial Licensing

This guide explains how Applesauce prices its Derived Audio Data SKU for Machine Learning, AI training, and industrial applications. We use a programmatic, utility-based model to ensure transparency and legal safety for enterprise buyers.

Sensitive Inference Policy

All datasets adhere to our Sensitive Inference Governance policy (see Appendix A). Labels for protected attributes (e.g., emotion, demographics) are opt-in only, and use for surveillance or hiring eligibility is strictly prohibited.

Embeddings Usage: Embeddings are provided for modeling utility; buyers are prohibited from using them for identification, authentication, or re-identification of individuals.

The Price Foundation

Unlike standard licensing, which relies on subjective artist weights, ML training data is priced on Industrial Utility and Scale Economics.

1. The 70/30 Transparency Split

Every dollar you spend is allocated through a fixed ratio:

  • 70% to the Human Pool: This goes directly to the creators whose data is included in your bundle.
  • 30% to Applesauce: This covers the cost of identity verification (KYC), secure data extraction, high-fidelity hosting, and legal chain-of-custody management.
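As a minimal sketch of the fixed ratio above, the split of a deal price could be computed as follows. The function name `split_payment`, the rounding to whole cents, and the choice to give any rounding remainder to the Applesauce share are assumptions made for illustration; the guide does not specify how rounding is handled.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Tuple

# The 70% share comes from the guide; everything else here is illustrative.
HUMAN_POOL_SHARE = Decimal("0.70")


def split_payment(deal_price: Decimal) -> Tuple[Decimal, Decimal]:
    """Return (human_pool, applesauce) amounts for a deal price.

    Assumption: the Human Pool amount is rounded to cents and the
    remainder goes to Applesauce, so the two parts always sum to the total.
    """
    cent = Decimal("0.01")
    human_pool = (deal_price * HUMAN_POOL_SHARE).quantize(cent, rounding=ROUND_HALF_UP)
    applesauce = deal_price - human_pool
    return human_pool, applesauce


human_pool, applesauce = split_payment(Decimal("1000.00"))
print(human_pool, applesauce)
```

Using `Decimal` rather than floats avoids binary rounding drift when amounts are summed across many creators.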

2. Industrial Utility Buckets

Your price is determined by how you use the data. We use multipliers ($W_t$) to reflect the transformative value of the assets:

| Bucket | Use Case | Multiplier | Data Provided |
|---|---|---|---|
| I. Analysis | Research, Feature Extraction, LLM Labeling | 1.0x | Features + Metadata |
| II. Transformative | Style Transfer, Real-time Remixing | 2.0x | Features + Watermarked Stream |
| III. Foundational | Model Training, Voice Cloning, GenAI | 10.0x | Derived Training Package (Features + Embeddings) |

Scale & Efficiency

The per-user price in a bundle falls as the cohort grows. Adding users to your dataset lowers the price per user, reflecting the efficiency of processing large cohorts.

  • Small Batches: Standard pricing applies to smaller, highly targeted cohorts.
  • Industrial Batches: Significant volume discounts are applied automatically for large-scale datasets (5,000+ users).

Pricing Model: The final deal price is calculated based on the number of users, the Industrial Utility Bucket (Multiplier), and the volume discount tier.
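The relationship between these three inputs can be sketched as below. This is an illustration under stated assumptions, not the actual pricing formula: the multipliers and the 5,000-user threshold come from this guide, while the base per-user price (`BASE_PRICE_PER_USER`), the 20% discount rate, and the multiplicative form of the formula are hypothetical placeholders.

```python
from decimal import Decimal

# Multipliers (W_t) as listed in the Industrial Utility Buckets table.
MULTIPLIERS = {
    "analysis": Decimal("1.0"),
    "transformative": Decimal("2.0"),
    "foundational": Decimal("10.0"),
}

# HYPOTHETICAL values: the guide does not publish a base price or discount rates.
BASE_PRICE_PER_USER = Decimal("10.00")
# (minimum users, discount), highest tier first. Only the 5,000-user
# threshold comes from the guide; the 20% rate is a placeholder.
VOLUME_TIERS = [(5000, Decimal("0.20")), (0, Decimal("0"))]


def volume_discount(n_users: int) -> Decimal:
    """Return the discount for the first tier whose minimum is met."""
    for min_users, discount in VOLUME_TIERS:
        if n_users >= min_users:
            return discount
    return Decimal("0")


def deal_price(n_users: int, bucket: str) -> Decimal:
    """Assumed form: users x base price x W_t x (1 - volume discount)."""
    if n_users <= 0:
        raise ValueError("n_users must be positive")
    gross = BASE_PRICE_PER_USER * n_users * MULTIPLIERS[bucket]
    return (gross * (1 - volume_discount(n_users))).quantize(Decimal("0.01"))


print(deal_price(100, "analysis"))
print(deal_price(5000, "foundational"))
```

Keeping the tiers in a table rather than in branching logic makes it easy to swap in real rates once they are known.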


Security & Intellectual Property

Applesauce provides more than just data; we provide a Legal Shield:

  1. Verified Human Chain: 100% of data is sourced from KYC-verified humans. No "scraped" data.
  2. Consent Management: Every asset includes explicit "Synthetic & Transformative" usage rights.
  3. Evolutionary Access: Your license follows each creator's contributions over time. If a creator updates their model or adds new high-quality samples, your "Foundational" license stays in sync with those additions.

FAQ for Buyers

Q: Why is the "Foundational" bucket priced higher?
A: Foundational provides a downloadable derived training package (features, embeddings, and labels) alongside expanded contractual rights for model training. Raw audio is excluded from this SKU to reduce biometric risk.

Q: Can I mix buckets in a single deal?
A: Yes. You can request a cohort-based deal (e.g., 3,000 users for Analysis and 500 for Foundational training). Our Pricing Oracle will calculate a single transparent price for the entire bundle.
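
To make the mixed-bucket case concrete, here is a rough sketch that prices the example cohort deal from the answer above. It does not describe how the Pricing Oracle actually works. The `Cohort` type, the base per-user price, the 20% discount rate, and the choice to set the volume tier from the bundle's total user count are all hypothetical; only the multipliers and the 5,000-user threshold come from this guide.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List

# Multipliers as listed in the Industrial Utility Buckets table.
MULTIPLIERS = {
    "analysis": Decimal("1.0"),
    "transformative": Decimal("2.0"),
    "foundational": Decimal("10.0"),
}

BASE_PRICE_PER_USER = Decimal("10.00")  # HYPOTHETICAL placeholder
INDUSTRIAL_THRESHOLD = 5000             # from the guide
INDUSTRIAL_DISCOUNT = Decimal("0.20")   # HYPOTHETICAL placeholder


@dataclass(frozen=True)
class Cohort:
    users: int
    bucket: str


def bundle_price(cohorts: List[Cohort]) -> Decimal:
    """Sum per-cohort prices, then apply one bundle-wide discount.

    Assumption: the discount tier depends on total users in the bundle,
    not on each cohort separately.
    """
    total_users = sum(c.users for c in cohorts)
    discount = INDUSTRIAL_DISCOUNT if total_users >= INDUSTRIAL_THRESHOLD else Decimal("0")
    gross = sum(
        (BASE_PRICE_PER_USER * c.users * MULTIPLIERS[c.bucket] for c in cohorts),
        Decimal("0"),
    )
    return (gross * (1 - discount)).quantize(Decimal("0.01"))


# The example deal from the FAQ: 3,000 Analysis users plus 500 Foundational users.
bundle = [Cohort(3000, "analysis"), Cohort(500, "foundational")]
print(bundle_price(bundle))  # 80000.00 under the placeholder values above
```

With these placeholder numbers, the 500 Foundational users account for more of the total than the 3,000 Analysis users, which shows how strongly the 10.0x multiplier drives a mixed bundle.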