Financial institutions are working hard to stop fraud before it occurs, but the sheer speed of digital transactions makes this difficult. Traditional approaches often rely on slow batch processing or on adding a separate streaming engine, which complicates operations and delays detection. Databricks aims to simplify this with a new solution that combines Spark real-time mode and Lakebase for end-to-end fraud detection on a single platform.
Visual TL;DR. The speed of digital fraud has pushed institutions toward complex infrastructure. Spark real-time mode addresses that complexity and integrates with Lakebase to form a unified platform. The unified platform enables fraud detection in milliseconds and simplifies operations.
The speed of digital fraud: Fraudsters exploit stolen card details in seconds, making real-time intervention critical.
Complex infrastructure: Adding separate streaming engines duplicates systems and splits governance.
Spark real-time mode: Sub-second processing without the overhead of traditional streaming engines.
Lakebase: Integrated Postgres for low-latency serving of fraud detection results.
Unified platform: Spark RTM and Lakebase integrated on a single platform.
Fraud detection in milliseconds: Enables financial institutions to stop fraud before it occurs.
Simplified operations: Eliminates complex infrastructure and reduces engineering burden.
The core issues are speed and simplicity. Fraudsters can exploit stolen card details within seconds, making real-time intervention critical. However, building and managing a separate streaming infrastructure alongside an existing data platform duplicates systems, splits governance, and increases the engineering burden. This dual-system approach has historically forced a choice between speed and ease of operation.
Spark real-time mode: sub-second processing without a separate streaming engine
Spark Real-Time Mode (RTM) is an evolution of Spark Structured Streaming designed for latency-sensitive applications. It reportedly delivers stream processing in under 300 ms, outperforms Apache Flink on key workloads, and lets companies such as Coinbase compute hundreds of ML features with sub-100 ms latency. Importantly, RTM runs within the existing Spark engine, so no separate streaming stack is needed. Because the engine is shared, the same code used for offline training can be applied to real-time scoring, which prevents logic drift between the two paths. It also consolidates operational tooling and reduces the on-call burden.
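The point about reusing training code for real-time scoring can be illustrated with a minimal sketch in plain Python. It is not Spark code and not the Databricks implementation. The `Transaction` fields, the two features, and the function names are all hypothetical, chosen only to show one feature function feeding both the offline and the online path.

```python
from dataclasses import dataclass
from math import log1p


@dataclass(frozen=True)
class Transaction:
    # Hypothetical event schema, for illustration only.
    card_id: str
    amount: float
    merchant_country: str
    card_country: str


def extract_features(txn: Transaction) -> dict[str, float]:
    """Single definition of the feature logic (hypothetical features)."""
    return {
        "log_amount": log1p(txn.amount),
        "is_cross_border": float(txn.merchant_country != txn.card_country),
    }


def build_training_rows(history: list[Transaction]) -> list[dict[str, float]]:
    """Offline path: apply the shared function to a batch of past events."""
    return [extract_features(t) for t in history]


def featurize_live_event(txn: Transaction) -> dict[str, float]:
    """Online path: apply the very same function to one incoming event."""
    return extract_features(txn)
```

Because both paths call `extract_features`, a change to the feature logic reaches training and scoring together, which is the property that guards against logic drift.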
As we discussed in the article "Spark Streaming reaches millisecond latencies", this technology is an important step toward low-latency processing.
Lakebase: Integrated Postgres for low-latency serving
The solution also uses Databricks Lakebase, a fully managed serverless PostgreSQL database built into the Databricks platform. Lakebase acts as a low-latency serving layer for enrichment, supplying context such as merchant risk profiles and cardholder data. This avoids the delays typically associated with broadcast joins in streaming pipelines.
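As a rough illustration of enrichment by point lookup, the sketch below attaches a merchant risk score to one transaction through a primary-key query. It assumes a hypothetical `merchant_risk` table and a hypothetical default for unknown merchants. Python's built-in `sqlite3` is used purely as a stand-in so the example runs anywhere; a real deployment would connect to Lakebase through a Postgres driver.

```python
import sqlite3

# sqlite3 is only a stand-in so the sketch is runnable; it is NOT Lakebase.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE merchant_risk (merchant_id TEXT PRIMARY KEY, risk_score REAL)"
)
conn.executemany(
    "INSERT INTO merchant_risk VALUES (?, ?)",
    [("m-001", 0.05), ("m-002", 0.80)],  # made-up sample rows
)

DEFAULT_RISK = 0.5  # hypothetical fallback for merchants with no profile


def enrich(txn: dict) -> dict:
    """Return a copy of the transaction with merchant risk attached."""
    row = conn.execute(
        "SELECT risk_score FROM merchant_risk WHERE merchant_id = ?",
        (txn["merchant_id"],),
    ).fetchone()
    return {**txn, "merchant_risk": row[0] if row else DEFAULT_RISK}
```

Each event triggers one indexed lookup, so the reference data can change in the database without the streaming job reloading or redistributing a whole table.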
The architecture is demonstrated through a credit card transaction scenario: transactions are ingested from Kafka, and Spark RTM parses each one, tracks velocity (how often a card is used within a short window), enriches it, and scores it. The resulting decision is to approve, flag, or block the transaction. End-to-end testing shows P99 latency between 215 and 392 ms, which is presented as evidence of production readiness without external infrastructure.
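The stages described above can be sketched in plain Python to make the data flow concrete. This is a toy, single-process illustration, not the Spark RTM pipeline. The JSON message shape, the 60-second window, the rule weights, and the decision thresholds are all assumptions made for the example, and events are assumed to arrive in timestamp order per card.

```python
import json
from collections import defaultdict, deque

WINDOW_SECONDS = 60    # hypothetical velocity window
FLAG_THRESHOLD = 0.5   # hypothetical decision thresholds
BLOCK_THRESHOLD = 0.8

# card_id -> timestamps of recent transactions (assumes in-order arrival)
_recent: dict[str, deque] = defaultdict(deque)


def parse(raw: bytes) -> dict:
    """Decode one message value; a JSON payload is assumed."""
    return json.loads(raw)


def velocity(card_id: str, ts: float) -> int:
    """Count the card's transactions in the trailing window, including this one."""
    window = _recent[card_id]
    window.append(ts)
    while window and window[0] <= ts - WINDOW_SECONDS:
        window.popleft()
    return len(window)


def score(txn: dict, txn_count: int, merchant_risk: float) -> float:
    """Toy rule-based score in [0, 1]; the weights are illustrative only."""
    s = 0.4 * merchant_risk
    s += 0.3 if txn["amount"] > 1000 else 0.0
    s += 0.3 if txn_count >= 3 else 0.0
    return min(s, 1.0)


def decide(s: float) -> str:
    """Map a score to one of the three outcomes."""
    if s >= BLOCK_THRESHOLD:
        return "block"
    if s >= FLAG_THRESHOLD:
        return "flag"
    return "approve"


def process(raw: bytes, merchant_risk: float) -> str:
    """Parse, track velocity, score, and decide for one event."""
    txn = parse(raw)
    count = velocity(txn["card_id"], txn["ts"])
    return decide(score(txn, count, merchant_risk))
```

Here `merchant_risk` is passed in as an argument to stand for the enrichment step. In the architecture described above, that value would come from the serving layer.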
Upgrading to machine learning
The solution goes beyond static rules by integrating machine learning models. This upgrade reduces false positives and lets the system adapt to evolving fraud patterns. MLflow’s experiment tracking and model versioning provide the model lineage needed for regulatory compliance. Lakebase is continuously updated with per-card features to enable dynamic model scoring.
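To make the idea of continuously updated per-card features concrete, here is a minimal sketch in plain Python. The feature set, the logistic form, and the coefficients are all hypothetical and are not taken from the Databricks solution. A real model would be trained and versioned rather than hard-coded, and the features would live in the serving database rather than in memory.

```python
from dataclasses import dataclass
from math import exp


@dataclass
class CardFeatures:
    """Running per-card statistics (hypothetical feature set)."""
    txn_count: int = 0
    mean_amount: float = 0.0

    def update(self, amount: float) -> None:
        # Incremental mean, so no transaction history has to be stored.
        self.txn_count += 1
        self.mean_amount += (amount - self.mean_amount) / self.txn_count


# Hypothetical coefficients, hard-coded only for illustration.
WEIGHTS = {"amount_ratio": 0.9, "merchant_risk": 2.0}
BIAS = -4.0


def fraud_probability(
    amount: float, feats: CardFeatures, merchant_risk: float
) -> float:
    """Logistic score: sigmoid of a weighted sum of the features."""
    # Ratio of this amount to the card's running mean; 1.0 if no history yet.
    ratio = amount / feats.mean_amount if feats.mean_amount > 0 else 1.0
    z = (
        BIAS
        + WEIGHTS["amount_ratio"] * ratio
        + WEIGHTS["merchant_risk"] * merchant_risk
    )
    return 1.0 / (1.0 + exp(-z))
```

One reasonable design choice is to score a transaction against the card's features first and call `update` afterwards, so that an unusually large purchase is compared with the card's prior baseline rather than one it has already shifted.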