Product Overview of LastMile AI
LastMile AI is a developer platform for building, debugging, evaluating, and improving generative artificial intelligence (AI) applications, with a particular focus on those built on large language models (LLMs). Here’s a detailed look at what the product does and its key features.
Core Purpose
LastMile AI addresses the “last mile” problem in generative AI development: the lack of robust evaluation metrics for measuring the performance, accuracy, safety, and reliability of AI applications. The platform is aimed at engineering and product teams that need to build, test, and deploy AI-driven apps efficiently.
Key Features and Functionality
AutoEval
AutoEval is a central component of the LastMile AI platform. It allows developers to create highly customizable “evaluator models” to test the capabilities of their generative AI applications. Here are some key aspects of AutoEval:
- Custom Metrics: Developers can design and fine-tune custom evaluation metrics tailored to their specific application needs. This includes metrics such as faithfulness, relevance, toxicity, summarization quality, and more.
- Synthetic Label Generation: AutoEval can generate synthetic data labels to augment the original training dataset, reducing the need for manual curation by subject matter experts.
- Fine-Tuning Service: The platform provides a fine-tuning service to develop custom evaluators from labeled datasets, which can be refined with human-in-the-loop feedback (a minimal sketch of this labeling-and-refinement loop follows the list).
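The SDK itself isn’t reproduced here, so the following is only a minimal sketch of how a synthetic-labeling step might be structured; the `Example` type and `augment_with_synthetic_labels` helper are illustrative assumptions, not LastMile’s documented API.

```python
# Hypothetical sketch of an AutoEval-style labeling workflow; the types and
# function below are illustrative assumptions, not the documented LastMile API.
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

@dataclass
class Example:
    query: str
    context: str
    response: str
    label: Optional[float] = None  # e.g. 1.0 = faithful, 0.0 = hallucinated

def augment_with_synthetic_labels(
    seed: list[Example],
    unlabeled: Iterable[Example],
    synthetic_labeler: Callable[[Example], float],
) -> list[Example]:
    """Grow a small human-labeled seed set by labeling additional examples
    automatically, reducing the manual curation burden on experts."""
    dataset = list(seed)
    for ex in unlabeled:
        ex.label = synthetic_labeler(ex)  # e.g. an LLM judge or heuristic
        dataset.append(ex)
    return dataset

# The augmented dataset would then feed the fine-tuning service, with
# low-confidence labels routed back to humans for review (human-in-the-loop).
```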
alBERTa Model
LastMile AI introduces alBERTa, a family of small language models optimized for evaluation tasks. These models are characterized by:
- Small Size: 400M parameters, allowing for efficient deployment on CPUs.
- Fast Inference: Inference runs in under 300 milliseconds, as the sketch after this list illustrates.
- Customizability: alBERTa models can be fine-tuned for specific evaluation tasks and support context windows of up to 128k tokens.
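alBERTa’s distribution isn’t assumed here, so the sketch below uses the standard Hugging Face `transformers` API with a placeholder model id simply to show the shape of a fast, CPU-only evaluator call; the model id and the question/response framing are assumptions.

```python
# Illustrative only: shows why a ~400M-parameter encoder can score a response
# on CPU in well under a second. The model id is a placeholder, not alBERTa.
import time
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "path/to/small-evaluator"  # placeholder, not a real model id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

inputs = tokenizer(
    "Did the response stay faithful to the context?",  # assumed task framing
    "The response under evaluation goes here.",
    return_tensors="pt",
    truncation=True,
)

start = time.perf_counter()
with torch.no_grad():
    logits = model(**inputs).logits  # single forward pass, CPU-friendly
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"logits={logits.tolist()} in {elapsed_ms:.0f} ms")
```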
Real-Time Guardrails
The platform offers real-time guardrails that act as fast online evaluators within the application runtime. These guardrails can check for hallucinations, toxicity, safety violations, or custom criteria, ensuring the AI application operates within defined parameters.
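To make the idea concrete, here is a minimal sketch of a guardrail wrapper around an LLM call; `generate` and `evaluate_faithfulness` are hypothetical stand-ins for the application’s model call and a fast online evaluator, and the threshold is illustrative.

```python
# Sketch of a real-time guardrail around an LLM call; the evaluator callable
# is a stand-in for whatever fast online evaluator the platform provides.
from typing import Callable

FAITHFULNESS_THRESHOLD = 0.7  # illustrative cutoff, tuned per application

def guarded_generate(
    prompt: str,
    context: str,
    generate: Callable[[str], str],
    evaluate_faithfulness: Callable[[str, str], float],
    fallback: str = "I can't answer that reliably from the available data.",
) -> str:
    """Run the model, score the response in-line, and suppress it if the
    guardrail evaluator flags a likely hallucination."""
    response = generate(prompt)
    score = evaluate_faithfulness(context, response)  # fast CPU evaluator
    return response if score >= FAITHFULNESS_THRESHOLD else fallback
```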
Evaluation Metrics
LastMile AI provides a suite of out-of-the-box evaluation metrics (a worked example follows the list), including:
- Faithfulness: Measures how well an LLM response adheres to the provided context.
- Relevance: Measures the semantic similarity between two strings (e.g., a query and a response).
- Summarization Quality: Quantifies the quality of summarization responses.
- Toxicity: Quantifies the toxicity level in LLM responses.
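To make the distinction between these metrics concrete, the illustrative snippet below shows inputs that a faithfulness metric and a relevance metric would treat differently; the behavior described in the comments is an expectation, not actual API output.

```python
# Illustrative inputs showing what faithfulness vs. relevance capture;
# the score behavior in the comments is hypothetical, not API output.
context = "Acme's Q3 revenue was $12M, up 8% from Q2."
query = "What was Acme's Q3 revenue?"

grounded = "Acme reported $12M in revenue for Q3."     # faithful AND relevant
hallucinated = "Acme reported $20M in revenue for Q3." # relevant, NOT faithful
off_topic = "Acme was founded in 1999."                # neither grounded in the
                                                       # context nor relevant

# A faithfulness metric would be expected to score `grounded` high and
# `hallucinated` low, because it checks adherence to the context. A relevance
# metric would be expected to score both `grounded` and `hallucinated` high
# and `off_topic` low, because it checks semantic similarity to the query.
```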
Developer Tools and Integration
The platform supports various developer tools and integrations, such as:
- Notebook-like Environments: For engineers to build, test, and iterate on AI applications collaboratively.
- Parametrized Workbooks: For creating templates and chaining model outputs across different modalities.
- API Support: Quickstart guides for Python and Node.js let developers compute evaluation metrics within minutes; a sketch of what such a call might look like follows this list.
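The actual quickstart isn’t reproduced here; the snippet below only sketches what computing a metric through a Python client might look like, and every name in it (the `lastmile` package path, `Client`, `evaluate`, the metric id, the result shape) is an assumption for illustration.

```python
# Hypothetical quickstart-style call: the import path, Client constructor,
# evaluate() signature, and metric name are all assumptions made for
# illustration, not the documented SDK.
import os

from lastmile import Client  # assumed import path

client = Client(api_key=os.environ["LASTMILE_API_KEY"])

result = client.evaluate(  # assumed method and parameters
    metric="faithfulness",
    query="What was Acme's Q3 revenue?",
    context="Acme's Q3 revenue was $12M, up 8% from Q2.",
    response="Acme reported $12M in revenue for Q3.",
)
print(result.score)  # assumed: a float in [0, 1]
```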
Security and Privacy
LastMile AI gives teams complete control over their data by supporting deployment within a Virtual Private Cloud (VPC), preserving data privacy and security.
Development Approach
The platform promotes an “Eval-Driven Development” approach that mirrors traditional test-driven development: establish evaluation criteria, measure baseline performance, and iteratively improve the AI application against those metrics.
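As a rough sketch of this loop (not LastMile’s prescribed workflow), the helper below measures an application against a fixed query set so each iteration can be compared to the baseline; `app` and `evaluator` are hypothetical stand-ins.

```python
# Sketch of an eval-driven loop mirroring test-driven development. The app
# and evaluator are hypothetical stand-ins: plug in your own application and
# an evaluator where indicated.
from typing import Callable

def measure(
    app: Callable[[str], str],
    evaluator: Callable[[str, str], float],  # (query, response) -> score
    test_queries: list[str],
) -> float:
    """Step 2 of eval-driven development: measure average performance on a
    fixed query set so each iteration can be compared to the baseline."""
    scores = [evaluator(q, app(q)) for q in test_queries]
    return sum(scores) / len(scores)

# 1) Establish the criterion, e.g. TARGET = 0.85 average faithfulness.
# 2) baseline = measure(app, evaluator, queries)  -> record it.
# 3) Improve prompts/retrieval/model, re-run measure(), and repeat until
#    the score clears TARGET.
```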
In summary, LastMile AI provides a comprehensive platform for developing, evaluating, and improving generative AI applications, with a strong focus on customization, efficiency, and real-time monitoring. Its innovative features and tools make it an essential resource for engineering and product teams aiming to deploy accurate, safe, and reliable AI-driven applications.