ArcticTraining supports a wide range of techniques beyond full fine-tuning, including LoRA (Low-Rank Adaptation) fine-tuning and Arctic Long Sequence Training. The new LLM fine-tuning quickstart guide includes recipes for both full fine-tuning and LoRA fine-tuning.
Under the hood
ML jobs run on Snowflake Container Runtime, which is preconfigured with GPU drivers, ML frameworks, and Ray. ArcticTraining directly interfaces with Ray for multi-node training and coordinates batch distribution and gradient synchronization between workers. ArcticTraining also integrates with the DataConnector API to efficiently stream table data directly to training workers.
Each ML job mounts a stage volume that exposes an internal Snowflake stage as a filesystem path within the container. ArcticTraining detects the stage mount and uses it for model checkpoints, so model weights persist within Snowflake for later retrieval and evaluation.
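Because the stage appears as an ordinary filesystem path, checkpoints can be located with standard file operations. The sketch below is illustrative only: the `checkpoint-<step>` directory naming and the `latest_checkpoint` helper are assumptions made for this example, not ArcticTraining's documented layout or API, and a temporary directory stands in for the mounted stage.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

# Assumed naming convention for this sketch: checkpoint-<step>
CHECKPOINT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(checkpoint_dir: Path) -> Optional[Path]:
    """Return the checkpoint subdirectory with the highest step, if any."""
    candidates = []
    for child in checkpoint_dir.iterdir():
        match = CHECKPOINT_PATTERN.match(child.name)
        if child.is_dir() and match:
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    # Compare on the numeric step so checkpoint-1000 sorts after checkpoint-250.
    return max(candidates, key=lambda item: item[0])[1]


# A temporary directory stands in for the mounted stage path.
with tempfile.TemporaryDirectory() as mount:
    root = Path(mount)
    for step in (100, 250, 1000):
        (root / f"checkpoint-{step}").mkdir()
    newest = latest_checkpoint(root)
    print(newest.name)  # checkpoint-1000
```

Sorting on the parsed step rather than the directory name avoids the lexicographic trap where `checkpoint-250` would rank above `checkpoint-1000`.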
Why LoRA?
Full fine-tuning updates all model weights. For a typical LLM, this means training tens or even hundreds of billions of parameters. This requires a large amount of GPU memory and can take several hours even on high-end hardware.
LoRA takes a different approach. It freezes the pre-trained model and injects small trainable low-rank matrices into the transformer layers, typically training only 0.1% to 1% of the original parameters. The result is faster training, lower memory usage, and small adapter files (often tens of megabytes) that can be swapped at inference time. You can train multiple adapters for different tasks and serve them all from a single base model.
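To make the parameter savings concrete, here is a rough arithmetic sketch. It assumes a single 2048 x 2048 projection matrix and a LoRA rank of 8; both values are illustrative and not taken from the quickstart. Instead of updating the full d_out x d_in matrix, LoRA trains two factors of shape (d_out, r) and (r, d_in).

```python
from typing import Tuple


def lora_param_counts(d_in: int, d_out: int, rank: int) -> Tuple[int, int]:
    """Return (full, lora) trainable parameter counts for one weight matrix."""
    full = d_in * d_out              # every entry of W is trainable
    lora = rank * (d_in + d_out)     # B is (d_out, r), A is (r, d_in)
    return full, lora


# Illustrative dimensions (assumed, not from the quickstart).
full, lora = lora_param_counts(d_in=2048, d_out=2048, rank=8)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.2%}")
# full: 4,194,304  LoRA: 32,768  ratio: 0.78%
```

For this one matrix the adapter trains under 1% of the parameters, which is consistent with the range quoted above. The fraction for a whole model is usually lower still, since adapters are typically attached to only some of the layers.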
Evaluation of results
Assessing the quality of generated text is a difficult problem. Traditional metrics such as BLEU and ROUGE cannot account for paraphrases and synonyms, nor can they capture tone or semantic meaning. Human evaluation, on the other hand, is subjective and prohibitively expensive at scale. Although the state of the art is constantly evolving, the most successful approach today is LLM-as-judge, where a powerful judge model evaluates the generated output against ground truth or predefined scoring criteria. This scales better than human annotation and gives a more nuanced evaluation than simple text matching.
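The LLM-as-judge pattern can be sketched in a few lines. This is a minimal illustration, not the quickstart's evaluation code: the rubric wording, the 1 to 5 scale, and the `call_judge` callable are all assumptions, and a stub stands in for the real model call so the example runs on its own.

```python
import re
from typing import Callable, Optional

# Hypothetical rubric prompt; a real rubric would be tailored to the task.
RUBRIC_PROMPT = """You are grading a model response.

Reference answer:
{reference}

Candidate answer:
{candidate}

Rate the candidate from 1 (poor) to 5 (excellent) for agreement with the
reference. Reply in the form "Score: <number>"."""


def parse_score(reply: str) -> Optional[int]:
    """Extract a 1-5 score from the judge's reply, or None if it is malformed."""
    match = re.search(r"Score:\s*([1-5])\b", reply)
    return int(match.group(1)) if match else None


def judge(candidate: str, reference: str,
          call_judge: Callable[[str], str]) -> Optional[int]:
    """Build the rubric prompt, query the judge model, and parse its score."""
    prompt = RUBRIC_PROMPT.format(reference=reference, candidate=candidate)
    return parse_score(call_judge(prompt))


def stub_judge(prompt: str) -> str:
    """Stand-in for a real LLM call; always returns the same reply."""
    return "Score: 4"


print(judge("Patient reports mild headache.",
            "Patient has a mild headache.",
            stub_judge))  # 4
```

Passing the model call in as a function keeps the scoring logic testable without a live endpoint. Returning `None` for unparseable replies lets the caller decide whether to retry or discard the sample.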
Learn more
We are making significant investments to make advanced ML technology available to all Snowflake customers.
- Integrated experiment tracking: Compare runs, tune hyperparameters, and manage model versions
- Seamless deployment: From training to production inference within Snowflake
- Reinforcement learning: For further performance improvement
Fine-tuning LLMs on your own data shouldn't require a dedicated infrastructure team or risky data exports. With ML Jobs and ArcticTraining, it doesn't.
Ready to try it for yourself? We’ve published a quickstart guide that walks you through the complete process from data preparation to training to evaluation. This pattern applies to any domain with unique text and structured output requirements, such as financial reports, legal documents, and customer support. Check out our guide.
Beyond the quickstart
Our quickstart prioritizes accessibility. It runs in under an hour on mid-tier GPUs, making it easy to experiment and iterate. For production workloads, we've compiled a list of recommended optimizations to consider for better performance in real-world scenarios.
- Larger base model: We used Qwen3-1.7B for faster training and better memory efficiency. Scaling up to an 8B or 14B parameter model provides stronger baseline capabilities and may result in better fine-tuned performance, especially for complex medical reasoning.
- More training data: The max_length setting in the training recipe excludes long dialogs to fit within GPU memory constraints. Increasing this limit (or using gradient checkpointing to handle longer sequences) retains more training examples and exposes the model to more diverse clinical scenarios.
- Extended training: We trained for just 1-2 epochs to quickly demonstrate the workflow. Longer training runs with learning rate scheduling and early stopping based on validation metrics can help your model converge more fully.
- Hyperparameter tuning: LoRA rank, learning rate, and batch size all affect final quality. Systematic experimentation, made easy by reproducible ML Jobs submissions, can yield meaningful gains.
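Two of the ideas above, a systematic hyperparameter sweep and early stopping on validation metrics, can be sketched without any training framework. The key names (`lora_rank`, `learning_rate`, `batch_size`), the candidate values, and the `should_stop` helper are hypothetical choices for this example; they are not ArcticTraining recipe fields.

```python
from itertools import product
from typing import Dict, List, Sequence

# Hypothetical search space; values are illustrative, not recommendations.
search_space = {
    "lora_rank": [8, 16],
    "learning_rate": [1e-4, 2e-4],
    "batch_size": [8, 16],
}


def expand_grid(space: Dict[str, Sequence]) -> List[dict]:
    """Return one config dict per combination of values in the search space."""
    keys = list(space)
    return [dict(zip(keys, values))
            for values in product(*(space[k] for k in keys))]


def should_stop(val_losses: Sequence[float], patience: int = 2,
                min_delta: float = 0.0) -> bool:
    """Stop once the last `patience` evaluations fail to beat the earlier best."""
    if len(val_losses) <= patience:
        return False  # not enough history yet
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best >= best_before - min_delta


configs = expand_grid(search_space)
print(len(configs))                          # 8
print(should_stop([1.0, 0.8, 0.81, 0.82]))   # True
print(should_stop([1.0, 0.8, 0.7]))          # False
```

Each config in the grid could be submitted as its own job. A full grid grows multiplicatively with every added parameter, so random or adaptive search may be a better fit for larger spaces.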
Forward-Looking Statements: This content contains forward-looking statements, including regarding future product features. These statements are not promises to deliver any material, code or functionality, and actual results may vary.
