Machine learning is no longer a niche tool. It drives decisions that affect billions of dollars and millions of lives. Whether you are approving loans, forecasting global demand, or recommending the right seller strategy, the models behind these decisions must be accurate, fair, and explainable.
That's where Hatim Kagarwara comes in. More than just a data scientist, he is an expert in building machine learning solutions where the margin for error is thin. As the first data scientist at Reliability Capital, he developed a credit model from scratch. At American Express, his forecasts shaped regulatory strategy. At Amazon, his work on causal inference and credit scoring helped generate more than $500 million in incremental revenue.
In this interview, Hatim shares his approach to high-stakes modeling, what it takes to build trust in a data-scarce environment, and why trustworthy models matter more than ever when businesses depend on them.
Hatim, you helped build Amazon's credit risk capabilities for customers in emerging markets. What motivated the company to expand in this direction, and how did you get involved in the work?
In today's consumer economy, flexible and innovative payment options are no longer a luxury; they are an expectation. Large retailers like Best Buy, Macy's, and Target offer co-branded credit cards to build loyalty and boost purchasing power. Even small purchases, like food orders, can be financed through buy-now-pay-later services from companies like Klarna. These offerings reflect a broader shift: especially in fast-growing economies, retailers need financial products that reduce friction and build customer trust in order to stay competitive and accessible.
Amazon had a clear focus on expanding digital commerce in emerging markets. However, in many of these regions, traditional credit systems are either limited or unreliable. This has created an opportunity to design custom credit risk models that can safely expand purchasing power despite the lack of traditional financial data.
I got involved in the initiative based on my previous experience with credit and fraud risk at American Express. Working with a cross-functional team, I helped build machine learning models tailored to the specific challenges of these markets. It was an opportunity to combine applied science with direct business impact and create something from the ground up. For example, in one emerging market we used alternative data sources such as mobile top-up behavior and delivery reliability to estimate creditworthiness.
What were the technical and operational challenges in developing credit risk models for markets with limited access to traditional financial data?
One of the biggest challenges in emerging markets is the lack of reliable data. Traditional credit bureaus either do not exist or have very limited coverage, making it difficult to assess creditworthiness using conventional methods. We had to find creative solutions, looking for alternative signals that could support responsible lending decisions while still managing risk.
Another major challenge was behavior. Consumers in these markets often respond to credit products very differently than consumers in more developed economies. Financial literacy varies, and there are cultural nuances in how credit is perceived and used. For example, standard financial histories often fail to reflect how consumers actually interact with credit, so we considered behavioral shopping data and mobile device metadata as proxies for credit behavior.
Furthermore, in some areas there is a legacy of predatory lending practices, which creates deep distrust of new credit offerings. So, beyond the modeling and data work, the focus was on building trust and designing products that were transparent, fair, and suited to local needs. For some of these projects, we began using causal machine learning, especially when we needed to estimate long-term business outcomes. As a result, we could see not only what was likely to happen, but what actually moved the needle.
You leveraged advanced machine learning techniques to rank customers by credit risk. What guided your modeling approach, and how did you measure its effectiveness?
When building credit models in a data-scarce environment, we needed solutions beyond binary classification that could provide a relative sense of credit risk across customers. A ranking approach offered a more flexible and nuanced way to prioritize decisions, especially when ground-truth labels are limited or noisy.
We investigated a variety of machine learning techniques that could learn patterns effectively from alternative data while maintaining interpretability and fairness. The focus was on building models that generalize well across regions and customer segments without relying heavily on traditional credit metrics.
To assess performance, we used a combination of ranking-specific metrics and business-aligned outcomes. These included how well the model distinguished low-risk customers from high-risk ones, and how its predictions translated into repayment behavior and default rates. We also tracked fairness and explainability metrics to ensure the model aligned with broader principles of responsible AI and fair lending.
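To make the idea of ranking-specific evaluation concrete, here is a minimal sketch of two metrics commonly used for credit ranking models, AUC and the Kolmogorov-Smirnov (KS) statistic. The scores and default labels are made-up toy data, not figures from any actual model discussed in the interview.

```python
# Sketch: evaluating a credit-risk ranking model with AUC and the KS statistic.
# Toy data only; a real pipeline would compute these on held-out portfolios.

def auc(scores, labels):
    """Probability that a random defaulter scores above a random non-defaulter."""
    pos = [s for s, y in zip(scores, labels) if y == 1]  # defaulters
    neg = [s for s, y in zip(scores, labels) if y == 0]  # non-defaulters
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ks_statistic(scores, labels):
    """Maximum gap between the cumulative score distributions of the two classes."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    cum_pos = cum_neg = best = 0.0
    for i in order:
        if labels[i] == 1:
            cum_pos += 1 / n_pos
        else:
            cum_neg += 1 / n_neg
        best = max(best, abs(cum_pos - cum_neg))
    return best

# Higher risk scores should concentrate among defaulters (label 1).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   0,   1,   0]
print(f"AUC = {auc(scores, labels):.3f}, KS = {ks_statistic(scores, labels):.3f}")
```

Both metrics are threshold-free: they evaluate how well the model orders customers by risk rather than how accurate any single approve/decline cutoff is, which matches the ranking framing described above.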
You mentioned the application of causal machine learning. Can you tell me more about that?
Causal machine learning is fundamentally different from traditional predictive modeling. Traditional machine learning typically has ground-truth outcomes and evaluates a model's performance by how accurately it predicts them. Causal inference, by contrast, estimates what would have happened if a particular action had, or had not, been taken.
For example, in a healthcare setting, this could mean estimating how a patient would have responded had they not received treatment. In business contexts, we often measure the true impact of programs and interventions, such as marketing campaigns or policy changes, by comparing actual results with estimated results under counterfactual scenarios.
This is still an emerging field, but it is gaining momentum across the industry. Large companies like Amazon, Google, and Netflix are investing heavily in causal methods to drive better decisions. Causal models not only predict what is likely to happen, but also help prioritize what should be done to achieve the best outcome.
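The simplest instance of the counterfactual comparison described above is a difference-in-means estimator on a randomized experiment: the control group stands in for the counterfactual "no intervention" world. The sales figures below are invented for illustration, not real program data.

```python
# Sketch of counterfactual impact estimation via a difference-in-means
# estimator on a randomized experiment. Toy numbers only.

def average_treatment_effect(outcomes, treated):
    """ATE = mean outcome of treated units minus mean outcome of controls."""
    t = [y for y, d in zip(outcomes, treated) if d == 1]
    c = [y for y, d in zip(outcomes, treated) if d == 0]
    return sum(t) / len(t) - sum(c) / len(c)

# Weekly sales per seller; `treated` marks who received the intervention.
sales   = [120, 135, 150, 90, 100, 95]
treated = [1,   1,   1,   0, 0,   0]
print(average_treatment_effect(sales, treated))  # → 40.0
```

Real applications on observational data need more machinery (propensity adjustment, instrumental variables, or learned outcome models) to justify the control group as a valid counterfactual, but the estimand, the difference between the factual and counterfactual outcomes, is the same.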
Amazon applies these techniques to assess the impact of key programs, both financial and behavioral, and to focus on the initiatives that matter most when leaders face competing priorities. One such model estimates potential sales lift, using causal inference to quantify how seller revenue changes with different listing actions.
You were involved in capital stress testing at American Express as part of the Comprehensive Capital Analysis and Review (CCAR). What was your contribution to this process, and why was it important to the company's financial stability?
At American Express, I worked on the statistical models used in the Comprehensive Capital Analysis and Review (CCAR) process led by the Federal Reserve. My specific contributions included developing predictive models for credit card spending and payment rates over a 13-quarter horizon, based on Fed-defined macroeconomic scenarios. These forecasts served as foundational inputs to our projected P&L and capital plans.
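To illustrate the shape of scenario-conditioned forecasting, here is a minimal sketch that projects a spending series over a 13-quarter horizon from macro inputs. The sensitivity coefficients and the scenario path are entirely made up for illustration; they are not the models or parameters used at American Express.

```python
# Sketch of scenario-conditioned forecasting: project a spending index over
# a 13-quarter horizon from macroeconomic inputs using a simple (assumed)
# linear-sensitivity model. All coefficients and scenario values are illustrative.

def project_spend(base_spend, unemployment_path, gdp_growth_path,
                  beta_unemp=-0.02, beta_gdp=0.015):
    """Each quarter, scale spend by assumed sensitivities to macro changes."""
    spend, path = base_spend, []
    for du, dg in zip(unemployment_path, gdp_growth_path):
        spend *= 1 + beta_unemp * du + beta_gdp * dg
        path.append(round(spend, 1))
    return path

# A 13-quarter "severely adverse"-style scenario: unemployment rises early,
# GDP contracts, then both stabilize and growth resumes.
unemp_changes = [1.0] * 4 + [0.5] * 4 + [0.0] * 5   # pp change per quarter
gdp_growth    = [-2.0] * 4 + [0.0] * 4 + [1.5] * 5  # % growth per quarter
forecast = project_spend(100.0, unemp_changes, gdp_growth)
print(len(forecast), forecast)
```

The point is the structure, not the numbers: the Fed supplies the macro paths, the model supplies the sensitivities, and the output feeds downstream P&L and capital projections quarter by quarter.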
The main goal of this work was to ensure that American Express maintains a sufficient capital buffer to withstand a severe economic downturn. The process is not only essential for internal risk management, but also serves as a public signal of the company's financial resilience. Failing to meet the Fed's standards could result in significant regulatory penalties and restrictions on shareholder distributions.
The importance of robust capital planning has become even more apparent in recent years. For example, the 2023 failure of Silicon Valley Bank highlighted how inadequate stress testing of liquidity and interest rate risk can lead to a rapid loss of confidence and, ultimately, collapse. Institutions that take regulatory stress testing seriously are better able to navigate uncertainty and maintain the trust of regulators, investors, and customers.
Your career spans both fintech and e-commerce. What differences do you see in how data science applies to these two industries?
Both fintech and e-commerce rely heavily on data science, but the stakes and goals of their applications can be quite different.
The stakes are very high in fintech, especially in areas such as credit risk and fraud detection. Decisions often have direct financial consequences for individuals, such as whether someone is approved for a loan or how much credit they receive. These decisions must be explainable, fair, and compliant with a strict regulatory framework. For example, during my time at American Express, I was keenly aware that even small modeling errors could invite regulatory scrutiny or harm customers' financial well-being. That instilled in me a deep sense of responsibility for model governance, fairness, and transparency.
In contrast, the e-commerce space is just as data-driven and complex, but tends to allow for broader experimentation. At Amazon, I worked on a wide range of machine learning initiatives, from causal inference to credit models for customers with limited or no credit history. Many of these projects allowed for rapid testing and iteration, making it possible to experiment, learn, and optimize for long-term results. Although models still need to be robust and responsible, the environment is generally more tolerant of failures during early development, especially when testing new features and recommendation strategies.
That said, my experience in both domains shows how data science skills transfer across industries. The goals may vary, but the underlying principles of responsible modeling, experimentation, and impact measurement remain the same, whether mitigating risk in fintech or improving customer experience in e-commerce. This crossover allowed me to apply risk-aware thinking in fast-moving environments and to bring an experimentation-driven approach to more regulated settings.
Many of your projects carry high business stakes and significant uncertainty. How do you manage the pressure when millions of dollars or millions of users depend on model accuracy?
When working on projects that impact millions of dollars or millions of users, I manage the pressure by grounding everything in disciplined processes and clear communication. First, I make sure the model development pipeline is rigorous, from data validation to feature selection to interpretability. No matter how innovative a technique is, it needs to be auditable, repeatable, and justifiable to both technical and business stakeholders.
Second, I emphasize stress testing and scenario analysis early in the process. Understanding where a model breaks, or how sensitive it is to certain assumptions, is key to building trust and resilience in a high-stakes environment. I also rely on causal inference frameworks when it is necessary to assess not only predictive performance, but also true business impact.
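One lightweight way to probe the sensitivity mentioned here is a one-at-a-time perturbation check: bump each input feature slightly and measure how much the model's score moves. The scoring function below is a made-up linear stand-in for a real credit model, with hypothetical feature names and weights.

```python
# Sketch of a simple input-sensitivity check: perturb one feature at a time
# and record the resulting score shift. The linear scorer and its weights
# are illustrative stand-ins, not a real production model.

def score(features):
    """Toy risk score: weighted sum of (assumed pre-normalized) features."""
    weights = {"utilization": 0.5, "payment_history": -0.3, "tenure": -0.2}
    return sum(weights[k] * v for k, v in features.items())

def sensitivity(features, bump=0.1):
    """Score change from bumping each feature by `bump`, holding others fixed."""
    base = score(features)
    deltas = {}
    for k in features:
        perturbed = dict(features, **{k: features[k] + bump})
        deltas[k] = round(score(perturbed) - base, 4)
    return deltas

applicant = {"utilization": 0.6, "payment_history": 0.8, "tenure": 0.4}
print(sensitivity(applicant))  # per-feature score shift for a +0.1 bump
```

For a nonlinear model the same loop reveals where the score surface is steep; features with outsized deltas are the assumptions worth stress testing first.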
Finally, I believe in transparency. When the stakes are high, it is essential to clearly communicate trade-offs, risks, and limitations to leadership. I have found that being upfront about what a model can and cannot do builds credibility and leads to better decisions. Pressure is part of the job, but with the right tools, mindset, and collaboration, it becomes manageable and even motivating.
