Goldman believes tracking AI usage misses the point

As American companies scramble to measure how their employees are using AI, Goldman Sachs is taking a different path.

Many companies are working on tracking individuals. JPMorgan monitors a dashboard that displays AI-related activity for tens of thousands of users, allowing employees to compare themselves to their colleagues. Social media giant Meta Platforms Inc. is installing software on the computers of its U.S. employees that tracks keystrokes and mouse movements to train its AI, Business Insider reported last month.

Marco Argenti, Goldman Sachs’ chief information officer, is using AI tools to assess team velocity rather than individual user metrics, which he says can result in “not seeing the forest for the trees.”

Argenti, who oversees about 12,000 engineers, is steering the company through rapid change as AI reshapes the way developers create software. He focuses on how quickly Goldman engineers move from an idea to a product ready for deployment, and on whether their output is actually improving.

While Goldman has access to data on individuals’ use of its tools, including its own AI products, the company is more focused on gaining a team-wide perspective to shorten project timelines, perform quality control evaluations, and track the consumption of AI tokens for budgeting. The bank hasn’t built a tracking dashboard to enforce the use of AI or to let developers compare their adoption rates with those of their peers.

I sat down with Argenti to discuss how Goldman defines developer success in the age of AI, and why he says monitoring developer activity in isolation risks missing the point.

Below is our conversation, edited for length and clarity.

There is debate over whether to track individual AI usage at all. What do you think as a manager? Are some paths more effective than others in accelerating AI adoption?

Because work is typically done by teams, now hybrid teams of agents and humans, we tend to focus on team metrics. Most of the time, that means the speed at which features are developed.

We focus on flow: how long it takes to go from idea to production. You know it’s working when your team has a certain backlog and suddenly you see that backlog start to burn down.

Why is it more effective to consider things on a team level rather than on an individual basis?

When you look at individuals, you really can’t see the forest for the trees. It would be like watching just one player on the field.

Okay, this player is moving more, so why aren’t we scoring more goals? Well, because they need to pass the ball.

What is the right way to analyze how AI is improving engineer productivity?

As we know, measuring developer productivity is something companies have been chasing forever. There is no single magic metric. Some companies ask, “How many lines of code?” But that’s not a very good measure. After all, useful output isn’t necessarily a matter of the number of lines of code.

Let’s say you join a fitness training program. It’s probably more effective to look at changes across some of your vitals than at any individual number. If your cholesterol is starting to drop or your blood sugar is starting to reach better levels than before, that may be a sign that you’re on the right track.

Another big topic on everyone’s minds is the rising cost of tokens. How do you measure whether your spending is producing demonstrable results?

If you’re using a lot of tokens but the output isn’t working, it means you’re probably still in the experimental stage at that point. We have identified a threshold. Below that, there was no change in the output metrics, but above that threshold, productivity began to improve.

Our research shows that people are going back and forth with AI on the plan itself, using tokens to document implementation plans and business requirements before diving into coding. This preparatory work is done before the developer begins writing code, so the coding output is not immediately visible.

Therefore, although token usage is accelerating, there is no immediate change in output. Once the plan is created, the agent begins building the code. After that, token consumption increases further and the results show up as coding output.

How do engineers feel about the usefulness of AI to complete tasks quickly?

We’ve crossed a tipping point where excitement outweighs fear.

I actually just came out of a little showcase, an innovation type of conference that we have. The dominant emotion is a feeling of empowerment. People almost feel liberated. Of course, there was a little bit of doubt and fear a few weeks or months ago, but I associate that with people who aren’t actually using AI.

How does that speed change the way they present their work to you? Are you seeing a shift from a “PowerPoint culture” to something more hands-on?

They come to us with very specific problems that they have solved. They start prototyping new products almost immediately, in some cases before the idea is fully formalized. Today, prototyping can be done in near real time. Even if you’re in a meeting, you can just talk to them and have them make changes right in front of you.

In the past, they would have come in with a PowerPoint or a six-page handout, and I had to imagine it. Today I see the real thing. You can literally say, “How about this?” and the change gets made right there in the meeting. The time from idea to prototype is zero. It’s like “3D printing” software.