Image credit: AapfDesign/Getty Images
Data has emerged as one of the world’s greatest resources, powering everything from video recommendation engines and digital banking to the burgeoning AI revolution. But in a world where data is increasingly distributed across databases, data warehouses, data lakes, and other locations, combining it all into a compatible format for use in real-time scenarios can be a huge amount of work.
For context, applications that do not require immediate, real-time data access can simply batch data together at fixed intervals. This so-called “batch data processing” is useful, for example, for processing monthly sales data. In many cases, however, a company needs to access data as it is created, in real time. This is critical, for example, for customer support software that relies on up-to-date information on all sales.
Elsewhere, ride-hailing apps need to process all manner of data points to connect riders with drivers, which is not something that can wait a few days. This kind of scenario requires what is known as “stream data processing,” which collects data and combines it for real-time access, and which is much more complicated to configure.
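The contrast between the two models can be sketched in a few lines of Python. This is an illustrative toy, not Dozer code: batch processing recomputes a result over all records collected in an interval, while stream processing folds each event into a running result the moment it arrives.

```python
# Toy contrast between batch and stream processing (illustrative only).

def batch_total(sales: list) -> float:
    """Batch model: a job that runs periodically over all records
    collected during an interval (e.g. a month of sales)."""
    return sum(sales)

class StreamingTotal:
    """Stream model: the result is updated incrementally as each
    event arrives, so the current value is always available."""
    def __init__(self) -> None:
        self.total = 0.0

    def on_event(self, amount: float) -> float:
        self.total += amount
        return self.total

# A month of sales processed in one batch job...
monthly = batch_total([120.0, 75.5, 9.99])

# ...versus the same figures folded in as they happen.
live = StreamingTotal()
for amount in [120.0, 75.5, 9.99]:
    live.on_event(amount)

assert monthly == live.total  # same answer, very different freshness
```

The end results match; the difference is that the streaming version has the correct answer at every point in time, which is exactly the property real-time applications need.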
This is what Dozer seeks to address by powering fast read-only APIs directly from any source via a plug-and-play data infrastructure backend.
Dozer is the handiwork of Vivek Gudapuri and Matteo Pelati, who founded the Singapore-based company almost a year ago. The duo have assembled a 10-person team spread across Asia and Eastern Europe, and they are preparing to expand the product from its current source-available (i.e. not fully open source) incarnation into a fully monetizable version.
Dozer has tested its product privately with a handful of design partners, but today it’s emerging from stealth with developer access. The company also revealed that it has raised $3 million in seed funding from Sequoia Capital’s India arm (through its Surge program), Google’s Gradient Ventures, and January Capital.
Dispersion
There is already a myriad of tools designed to transform, integrate, and leverage distributed data: streaming databases and ETL (extract, transform, load) tools such as Apache Flink, Airbyte, and Fivetran; caching layers for temporary data storage such as Redis; and instant APIs powered by the likes of Hasura and Supabase for moving data between systems.
Dozer works across all of these different categories, taking what it deems the best and removing the friction associated with building the infrastructure and plumbing that underpins real-time data apps.
Users connect Dozer to their existing data stacks, including databases, data warehouses, and data lakes, and Dozer handles real-time data extraction, caching, indexing, and surfacing via low-latency APIs. So while Airbyte, Fivetran, and others help get data into a data warehouse, Dozer focuses on the other side: “making this data accessible in the most efficient way,” as Gudapuri explained to TechCrunch.
Gudapuri said Dozer “takes an opinionated approach,” addressing a very specific problem and nothing more. Other products that provide real-time data updates and APIs try to solve many problems far beyond what Dozer offers.
“We solve the right amount of issues in each of these categories to give developers a quick build experience and out-of-the-box performance,” said Gudapuri. “Developers currently have to integrate several tools to achieve the same thing.”
As an example, an existing streaming database will typically try to provide users with the entire database experience, complete with a query engine, data exploration, OLAP (online analytical processing), and more. Dozer deliberately does not provide these capabilities, instead focusing on what Pelati calls “precomputed views” built using SQL, Python, and JavaScript, all of which can be accessed via low-latency gRPC and REST APIs.
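The general idea behind a precomputed view can be sketched as follows. This is a generic illustration of the technique in Python, not Dozer’s actual implementation: rather than scanning the underlying data at query time, the view is maintained incrementally as writes arrive, so serving a read is a cheap lookup.

```python
# Generic sketch of a precomputed (materialized) view: an aggregate
# kept up to date on every write, so reads never scan raw data.
# Illustrative only -- not Dozer code.
from collections import defaultdict

class OrderTotalsView:
    """Maintains the total order value per customer, one event at a time."""
    def __init__(self) -> None:
        self._totals = defaultdict(float)

    def apply(self, customer_id: str, amount: float) -> None:
        # Incremental maintenance: constant work per incoming order.
        self._totals[customer_id] += amount

    def get(self, customer_id: str) -> float:
        # Serving a query is a constant-time lookup -- this is what
        # keeps latency low for customer-facing reads.
        return self._totals[customer_id]

view = OrderTotalsView()
view.apply("alice", 40.0)
view.apply("bob", 15.0)
view.apply("alice", 10.0)
assert view.get("alice") == 50.0
```

Because the expensive work happens at write time rather than read time, queries against a view like this stay fast no matter how much history accumulates, which is the trade-off behind the latency claims discussed below.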
This, according to Pelati, is why Dozer can promise better data query latency.
“With these design choices, Dozer provides the much better query latencies that customer-facing applications require,” said Pelati. “A single developer can spin up an entire data app in minutes, something that typically takes months of effort, saving time and money.”
The (not fully) open source factor
Dozer is advertised as an “open source” platform, but a quick look at its license on GitHub shows that it uses the Elastic License 2.0 (ELv2), the same license that enterprise search firm Elastic adopted two years ago as part of its transition away from true open source. Indeed, the Elastic license is not recognized as open source because it prevents third parties from taking the software and offering it as a hosted or managed service.
More precisely, ELv2 can be called a “source available” license, meaning it effectively offers many of the benefits of more permissive open source licenses such as MIT, including codebase transparency and the ability to extend Dozer’s functionality or contribute tweaks and bug fixes. This alone should be enough to win the hearts and minds of companies of all sizes, unless they are AWS or another cloud giant looking to monetize Dozer directly.
However, the company says it intends to switch to dual licensing “soon,” with all of its core Dozer projects becoming MIT-licensed except for “one core module.” Additionally, the company emphasizes that all of its client libraries, including those for Python, React, and JavaScript, are already MIT-licensed.
It’s worth noting that some companies have created in-house tools to solve problems similar to those Dozer is working on, such as Netflix, which built Bulldozer a few years ago. Notably, Ioannis Papapanagiotou, one of the main creators behind Bulldozer, now serves as an advisor to Dozer.
Dozer is still in its early stages, but with $3 million in the bank from a number of high-profile backers, the company is well funded as it readies its commercial, hosted add-on features, which Gudapuri said will launch in the next few months.
“The hosted service handles autoscaling, instant deployment, security, compliance, rate limiting, and a few additional features,” said Gudapuri.