Benjamin Cowen, Forward Deployed Machine Learning Engineer at Modal, recently gave a talk titled “What lies beneath the API” that explored the evolving landscape of AI model development and deployment. Cowen discussed the growing trend of enterprises fine-tuning their own models rather than relying solely on generic APIs, and how serverless platforms make this more accessible.
Benjamin Cowen on using Modal to fine-tune AI models (from AI Engineer)
Visual TL;DR
Frontier APIs and scratch servers sit at the two ends of the model spectrum. The spectrum points toward domain-specific models, which require fine-tuning; fine-tuning calls for serverless infrastructure, which in turn makes custom AI accessible. A set of key signals indicates when fine-tuning is needed.
Frontier API: Quick start, no infrastructure required, powerful pre-trained models
Scratch server: Full control, precise fine-tuning to suit your specific needs
Model spectrum: Moving from general-purpose APIs to custom solutions
Domain-specific models: A growing trend toward customized AI performance
Need for fine-tuning: Customization delivers better and more predictable AI performance
Serverless infrastructure: Simplifies AI training and inference
Accessible custom AI: Fine-tuning within easy reach of enterprises
Key fine-tuning signals: Identify when custom models are beneficial
The model spectrum: from frontier APIs to custom solutions
Cowen introduced the concept of a “model spectrum” to describe the progression from using readily available “frontier APIs” to building and managing models on “scratch servers.” Frontier APIs let you get started quickly with no infrastructure overhead and provide access to powerful pre-trained models. However, the lack of customization can result in unpredictable performance.
Scratch servers, on the other hand, give you full control and the ability to fine-tune a model precisely to your needs. This approach offers maximum customization, including the ability to define custom metrics. The trade-off is a heavy infrastructure burden: you manage the cluster and maintain the software stack yourself.
The rise of domain-specific models and the need for fine-tuning
Cowen emphasized that as companies mature, they increasingly need to fine-tune models on their own data to achieve better performance, lower latency, and custom functionality. He cited examples such as Intercom’s Fin Apex, which reportedly outperformed GPT-5.4 at a fifth of the cost, and Pinterest CEO Ben Silverman’s statement that fine-tuning open-source models delivered “order-of-magnitude cost savings” compared with using frontier APIs.
This trend signals a shift in how AI is viewed: the model becomes the raw material, and the fine-tuned, domain-specific system becomes the actual product. Cowen added that the fine-tuning process itself is becoming more accessible.
Serverless infrastructure for AI training and inference
The presentation showed how serverless platforms like Modal bridge the gap between ease of use and control. Cowen explained that Modal’s infrastructure, including integrated GPUs and a sandbox environment, enables AI training and inference at scale with significantly less code and management overhead.
He demonstrated that fine-tuning workloads, from supervised fine-tuning of large language models (LLMs) to reinforcement learning (RL) tasks, can be implemented in surprisingly concise codebases, often in as few as 300 lines of Python. This is made possible by open-source libraries and by serverless infrastructure that handles scaling and parallel hyperparameter sweeps.
Cowen provided a code example that shows how to set up a fine-tuning job and deploy a model efficiently. He pointed to the ability to scale containers on demand and the abstraction of infrastructure management as the main benefits of using such a platform. This allows developers to focus on model development and data curation rather than infrastructure plumbing.
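The talk's own code is not reproduced here. As a rough, platform-agnostic sketch of what a parallel hyperparameter sweep looks like, the example below fans a small grid of configurations out to workers and keeps the best result. Everything in it is an assumption for illustration: `train` is a toy stand-in for a fine-tuning job, `sweep` and `GRID` are hypothetical names, and the local thread pool stands in for whatever fan-out primitive a serverless platform provides.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Toy dataset for y = 3x, so the ideal weight is 3.0 (illustrative only).
DATA = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]

# Hypothetical search grid; a real sweep would cover fine-tuning settings.
GRID = {"lr": [0.001, 0.01, 0.1], "steps": [10, 50]}


def train(config):
    """Toy stand-in for one fine-tuning job: fit w in y = w * x by gradient descent."""
    w = 0.0
    for _ in range(config["steps"]):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in DATA) / len(DATA)
        w -= config["lr"] * grad
    loss = sum((w * x - y) ** 2 for x, y in DATA) / len(DATA)
    return {"config": config, "loss": loss}


def sweep(grid, max_workers=4):
    """Run every configuration in the grid in parallel and return the lowest-loss result."""
    configs = [dict(zip(grid, values)) for values in product(*grid.values())]
    # The thread pool is a local substitute for a platform's remote fan-out.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(train, configs))
    return min(results, key=lambda r: r["loss"])


if __name__ == "__main__":
    best = sweep(GRID)
    print(best["config"], best["loss"])
```

On a serverless platform, each call to `train` would typically run in its own container with a GPU attached, while the shape of the sweep (build the grid, map over it, pick the best) stays the same.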
Key signals for fine-tuning
Cowen also outlined some signals that indicate it’s time for a product to move toward a fine-tuned, domain-specific model.
Evaluation scores have plateaued despite prompt engineering.
You need to reduce latency or increase throughput.
Unit economics do not scale effectively.
Core features are still under development.
There is a lack of relevant data collected for prompt engineering.
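The first signal, a plateau in evaluation results, can be checked mechanically once scores are logged per iteration. The sketch below is one simple way to do so and is not taken from the talk: the function name `has_plateaued`, the `window` and `min_gain` thresholds, and the score history are all hypothetical.

```python
def has_plateaued(scores, window=3, min_gain=0.01):
    """Return True if the best of the last `window` scores beats the best
    earlier score by less than `min_gain`."""
    if len(scores) <= window:
        return False  # not enough history to judge
    best_before = max(scores[:-window])
    best_recent = max(scores[-window:])
    return best_recent - best_before < min_gain


# Hypothetical eval scores from successive iterations, oldest first.
history = [0.61, 0.68, 0.72, 0.74, 0.745, 0.743, 0.746]

if __name__ == "__main__":
    print(has_plateaued(history))
```

The thresholds are judgment calls: a noisy eval suite needs a larger `window`, and `min_gain` should reflect the smallest improvement that matters for the product.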
He concluded by emphasizing that if your product already includes agent utilization, evaluation suites, AI engineers, and data collection, the hard part of building domain-specific models may already be done and moving to fine-tuning may be the logical next step.