Sumatra - Short Review




Product Overview: Sumatra

Sumatra is a real-time data platform designed to simplify the deployment of real-time machine learning (ML) and analytics. Founded in 2020 by Greg Kuhlmann and Lucas McGrew, Sumatra aims to let data scientists of all skill levels build and deploy real-time data pipelines without the traditional dependence on data engineering teams.



Key Features



1. Self-Service Tools

Sumatra provides self-service tools that allow data scientists to build their own streaming data pipelines. This eliminates the need for extensive data engineering, which often delays or blocks real-time ML and analytics deployments.



2. Real-Time Data Processing

The platform is built on a serverless cloud architecture, ensuring scalability and easy deployment. It enables real-time processing of event-based data, making it suitable for various applications such as fraud prevention, cybersecurity, conversion optimization, credit underwriting, and delivery logistics.



3. Aggregate Functions and Feature Engineering

Sumatra’s platform includes aggregate functions that let users run complex computations over past event data. These functions include exact, approximate, temporal, and collection aggregates, all of which can be expressed with a simple syntax in Scowl, Sumatra’s query language.
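To make the idea of a temporal aggregate concrete, here is a minimal Python sketch of a per-key count and sum over a trailing time window. It is not Scowl and not Sumatra's implementation; the class name, method name, and window length are hypothetical, and the sketch assumes events for a given key arrive in non-decreasing timestamp order.

```python
from collections import defaultdict, deque


class TrailingWindowAggregate:
    """Per-key count and sum over a trailing time window (hypothetical helper)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> deque of (timestamp, value)
        self._sums = defaultdict(float)    # key -> running sum of values in window

    def update(self, key, timestamp, value):
        """Record an event, evict expired ones, and return (count, sum) for the key."""
        events = self._events[key]
        events.append((timestamp, value))
        self._sums[key] += value

        # Drop events that are window_seconds or more older than the current one.
        cutoff = timestamp - self.window_seconds
        while events and events[0][0] <= cutoff:
            _, old_value = events.popleft()
            self._sums[key] -= old_value

        return len(events), self._sums[key]
```

In a fraud-prevention setting, for example, calling `update("card_1", timestamp, amount)` on each transaction would give the number and total value of that card's transactions in the last `window_seconds`, which could then serve as model features.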



4. Machine Learning Integration

The platform supports integrating machine learning models directly into the data pipeline. Users can train models in various packages (such as scikit-learn, MLlib, and Dataiku) and upload them to Sumatra in PMML format. This allows fast, self-service deployment of models without managing additional microservices. The platform also supports post-scoring business rules and consistent predictions between online and offline environments.
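As a rough illustration of what post-scoring business rules mean in practice, the Python sketch below maps a model score and the triggering event to a decision. It does not use Sumatra's API; the function name, rule format, thresholds, and decision labels are all hypothetical, and the score is assumed to come from an already deployed model.

```python
def apply_business_rules(score, event, rules, default="approve"):
    """Return the decision of the first matching rule, or the default if none match."""
    for condition, decision in rules:
        if condition(score, event):
            return decision
    return default


# Hypothetical rules for a fraud use case; thresholds are illustrative only.
RULES = [
    (lambda score, event: score >= 0.9, "decline"),
    (lambda score, event: score >= 0.5 and event["amount"] > 1000, "review"),
]
```

Rules are evaluated in order, so a high score is declined before the amount-based review rule is considered. Keeping rules separate from the model in this way lets thresholds change without retraining.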



5. Conversion Optimization and Experimentation

Sumatra Optimize, a component of the Sumatra AI Platform, is designed for conversion optimization and experimentation. It features an easy-to-use UI, a reliable visual editor, and native integration with tools like Framer. The tool automates the analysis and improvement of experiments using AI, making it easier to build and optimize experiments.



Functionality

  • Event-First Data Engine: Sumatra runs on an event-first data engine that processes data in real time, enabling immediate insights and actions.
  • Scalability: The serverless cloud architecture ensures that the platform can scale to meet the needs of various applications and organizations.
  • User-Friendly Interface: The platform offers a user-friendly interface that simplifies the process of building and deploying real-time ML and analytics services, even for data scientists with limited experience.
  • Collaboration and Use Cases: Sumatra’s tools are designed to be versatile, allowing teams to quickly leverage the platform for new use cases across the organization. For example, The Zebra, a leading insurance comparison site, has used Sumatra to gain real-time access to customer behavior data and deploy multiple real-time ML and analytics services.


Benefits

  • Democratization of Real-Time ML: By letting data scientists deploy real-time ML without relying on costly and overstretched data engineers, Sumatra broadens access to advanced analytics and ML capabilities.
  • Fast Deployment: The platform enables quick iteration and deployment of real-time ML and analytics services, which can significantly enhance the value created by these technologies.
  • Focus on Core Business: Organizations can focus on their core business while leveraging Sumatra to build and deploy real-time data capabilities, reducing the complexity and cost associated with in-house development.

In summary, Sumatra is a powerful real-time data platform that simplifies the deployment of machine learning and analytics, making it accessible to data scientists of all levels and enabling organizations to derive immediate value from their data.
