Training Data Inventory
Data Inventory and Rights
Leena AI maintains an inventory of the data used to train and fine-tune its model, and has implemented processes to ensure that appropriate rights and permissions are obtained for that data.
The approach relies on three distinct data sources, each with its own rights and permissions considerations:
Data Sources
1. Manually Curated Internal Data
- Source: Internal team.
- Time frame: Collected over the year.
- Rights consideration: As this data is generated internally, Leena AI likely has full rights to its use.
2. Permissively Licensed Public Data
- Source: Public datasets.
- Focus: Specific to use cases Leena AI addresses.
- Rights consideration: Leena AI uses public datasets whose licenses permit the intended use, which supports legal compliance.
3. Synthetic Data
- Source: Generated from larger open-source models.
- Rights consideration: Because the synthetic data is generated from Apache-licensed models, it avoids many of the copyright issues associated with real-world data.
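To make the three source categories concrete, the following is a minimal sketch of how an inventory record and a basic rights check might be represented. It is illustrative only and does not describe Leena AI's actual system: the names `SourceType`, `InventoryEntry`, and `rights_issues`, the dataset names, and the rule that public and synthetic entries must record a license are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class SourceType(Enum):
    """The three data source categories described above."""
    INTERNAL = "manually_curated_internal"
    PUBLIC = "permissively_licensed_public"
    SYNTHETIC = "synthetic"


@dataclass(frozen=True)
class InventoryEntry:
    """One hypothetical row in a training-data inventory."""
    name: str
    source_type: SourceType
    dataset_license: Optional[str] = None    # license of a public dataset
    generator_license: Optional[str] = None  # license of the generating model


def rights_issues(entry: InventoryEntry) -> List[str]:
    """Return open rights questions for an entry (empty list if none)."""
    issues = []
    if entry.source_type is SourceType.PUBLIC and not entry.dataset_license:
        issues.append("public dataset has no recorded license")
    if entry.source_type is SourceType.SYNTHETIC and not entry.generator_license:
        issues.append("synthetic data has no recorded generator model license")
    return issues


# Hypothetical entries; none of these names come from Leena AI.
inventory = [
    InventoryEntry("internal-curated-set", SourceType.INTERNAL),
    InventoryEntry("public-helpdesk-set", SourceType.PUBLIC,
                   dataset_license="CC-BY-4.0"),
    InventoryEntry("synthetic-dialogues", SourceType.SYNTHETIC,
                   generator_license="Apache-2.0"),
    InventoryEntry("unreviewed-public-set", SourceType.PUBLIC),
]

# Map each entry with open issues to its list of issues.
flagged = {e.name: rights_issues(e) for e in inventory if rights_issues(e)}
```

In this sketch only the entry with a missing license ends up in `flagged`, which is the kind of gap an inventory is meant to surface before data reaches a training run.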
Implications
- Comprehensive Data Management: By maintaining an inventory of all training and fine-tuning data, Leena AI demonstrates a commitment to responsible AI development and data governance.
- Legal Compliance: Relying on internally generated data, permissively licensed public data, and synthetic data reflects a careful approach to avoiding copyright and licensing issues.
- Diverse Data Sources: The combination of internal, public, and synthetic data likely provides a rich and varied dataset for training and fine-tuning, potentially leading to a more robust and versatile AI model.
- Transparency: Having a complete inventory allows for greater transparency in the AI development process, which can be crucial for audits, regulatory compliance, or client inquiries.
- Quality Control: An inventory system enables better tracking and management of data quality, allowing for easier identification and correction of any issues in the training data.
- Ethical Considerations: By carefully sourcing and tracking its data, Leena AI is better positioned to address ethical concerns related to data privacy and responsible AI use.
- Iterative Improvement: Maintaining a detailed inventory facilitates ongoing refinement of the model, as the team can easily identify which data sources contribute most effectively to model performance.
