In the Author Spotlight series, TDS editors chat with members of our community about their career paths in data science and AI, their writing, and their sources of inspiration. Today, we're excited to share our conversation with Ida Silfverskiöld.
Ida is a generalist, educated as an economist and self-taught in software engineering. She has a professional background in product and marketing management, which gives her an unusual blend of product, marketing, and development skills. Over the past few years, she has been teaching and building in the LLM, NLP, and computer vision space, digging into areas such as agentic AI, chain-of-thought strategies, and the economics of hosting models.
You studied economics, then learned to code and moved through product, growth, and now hands-on AI building. Does that generalist path give you a perspective that specialists sometimes miss?
I'm not sure.
People tend to think of generalists as having shallow knowledge, but generalists can also dig deep.
I see generalists as people with many interests who are driven to understand the whole, not just the parts. As a generalist, you look at the technology, the customer, the data, the market, the cost of the architecture, and more. It gives you an edge: you can move across topics and still do good work.
I'm not saying specialists can't do this, but generalists tend to adapt faster because they're used to picking things up quickly.
You've written a lot recently about agentic systems. When do "agents" actually outperform simpler LLM + RAG patterns, and when are we overcomplicating things?
It depends on the use case, but in general we throw AI at a lot of things that probably don't need it. If you can control the system programmatically, you should. LLMs are great for translating human language into something a computer can understand, but they also introduce unpredictability.
As for RAG, adding an agent means adding cost, so doing it just for the sake of having an agent is not a great idea. You can work around this by using a small model as a router (but this adds work). I once added an agent to a RAG system because I knew there would be questions about building it out to also "act." So again, it depends on the use case.
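To make the router idea concrete, here is a minimal sketch in Python. It is not taken from Ida's setup: the route labels, the `keyword_router` heuristic (a stand-in for a small routing model), and the demo pipelines are all hypothetical. The point is only that the cheaper RAG path is the default and the agent is called when the router asks for it.

```python
from typing import Callable

# Hypothetical route labels; the interview does not specify a routing scheme.
ROUTE_RAG = "rag"
ROUTE_AGENT = "agent"


def keyword_router(question: str) -> str:
    """Stand-in for a small routing model: flags requests that ask for an action."""
    action_words = {"create", "send", "update", "schedule", "delete"}
    tokens = {t.strip(".,!?").lower() for t in question.split()}
    return ROUTE_AGENT if tokens & action_words else ROUTE_RAG


def answer(
    question: str,
    router: Callable[[str], str],
    rag_pipeline: Callable[[str], str],
    agent: Callable[[str], str],
) -> str:
    """Use the cheap RAG path unless the router explicitly picks the agent."""
    if router(question) == ROUTE_AGENT:
        return agent(question)
    return rag_pipeline(question)


# Placeholder pipelines so the sketch runs without any model or API.
def demo_rag(question: str) -> str:
    return f"[rag] {question}"


def demo_agent(question: str) -> str:
    return f"[agent] {question}"


if __name__ == "__main__":
    print(answer("What is our refund policy?", keyword_router, demo_rag, demo_agent))
    print(answer("Please send the report to Anna", keyword_router, demo_rag, demo_agent))
```

In a real system, the router would be a small classifier or a small LLM call rather than a keyword list, which is where the extra work mentioned above comes in.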
When you say agentic AI needs "evaluations," what is your go-to list of metrics? And how do you decide which ones to use?
I wouldn't say you need evals all the time, but companies will ask for them, so it's good to know what teams measure for product quality. If a product will be used by a lot of people, make sure you have some in place. I did quite a lot of research here to understand the frameworks and metrics that have been defined.
Generic metrics are probably not enough, though. Each use case needs some custom ones, so the evals differ from application to application.
For a coding copilot, you could track the percentage of completions a developer accepts (acceptance rate) and whether the full chat reached its goal (completeness).
For commerce agents, you might measure whether the agent picked the right products and whether its answers are grounded in the store's data.
Security and safety-related metrics are also important, such as bias, toxicity, and how easy it is to break the system (jailbreaks, data leaks).
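As a rough illustration of the coding-copilot metrics mentioned above, the sketch below computes an acceptance rate and a completeness rate from logged sessions. The `Session` fields are assumptions made for this example; real products will log different things and may define both metrics differently.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical log fields, assumed for illustration only.
    suggestions_shown: int
    suggestions_accepted: int
    goal_reached: bool


def acceptance_rate(sessions: list[Session]) -> float:
    """Share of shown completions that the developer accepted."""
    shown = sum(s.suggestions_shown for s in sessions)
    accepted = sum(s.suggestions_accepted for s in sessions)
    return accepted / shown if shown else 0.0


def completeness(sessions: list[Session]) -> float:
    """Share of sessions in which the full chat reached its goal."""
    if not sessions:
        return 0.0
    return sum(s.goal_reached for s in sessions) / len(sessions)


if __name__ == "__main__":
    logs = [Session(10, 4, True), Session(10, 6, False)]
    print(acceptance_rate(logs), completeness(logs))
```

How `goal_reached` gets labeled (by a human reviewer, a rule, or an LLM judge) is the hard part, and it is left open here.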
For RAG, see my article where I break down the usual metrics. Personally, I have only set up metrics for RAG so far.
It could be interesting to map out, in an article, how different AI apps set up their evals. For example, Shopify Sidekick for commerce agents, and other tools such as legal research assistants.
In your Agentic RAG Applications article, you built a Slack agent that takes company knowledge into account (using LlamaIndex and Modal). What design choice ended up mattering more than you expected?
The retrieval part is where you get stuck, especially the chunking. When you work with RAG applications, you split the process into two. The first part is about fetching the correct information, and getting it right is important. To make retrieval precise, the chunks need to be quite small and relevant to the search query.
However, if you make the chunks too small, you risk giving the LLM too little context. Chunks that are too large can make the search system imprecise.
I set up a system that chunks based on the type of document, but right now I have an idea for using context expansion after retrieval.
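One way to read "context expansion after retrieval" is to search over small chunks for precision, then hand the LLM each hit together with its neighboring chunks. The sketch below illustrates that reading only; it is not Ida's implementation. The word-overlap `retrieve` function is a toy stand-in for a real retriever, and the `window` parameter is an assumption.

```python
def retrieve(query: str, chunks: list[str], top_k: int = 1) -> list[int]:
    """Toy retriever: rank chunk indices by words shared with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(
        range(len(chunks)),
        key=lambda i: len(q_words & set(chunks[i].lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def expand(hit_indices: list[int], chunks: list[str], window: int = 1) -> list[str]:
    """Return each hit plus `window` neighbors on both sides, in document order."""
    keep: set[int] = set()
    for i in hit_indices:
        lo = max(0, i - window)
        hi = min(len(chunks) - 1, i + window)
        keep.update(range(lo, hi + 1))
    return [chunks[i] for i in sorted(keep)]


if __name__ == "__main__":
    chunks = [
        "Refunds are handled by support.",
        "The refund window is 30 days.",
        "After that, store credit applies.",
        "Shipping is free over 50 EUR.",
    ]
    hits = retrieve("refund window", chunks)
    for line in expand(hits, chunks, window=1):
        print(line)
```

The design trade-off is the one described above: the search index sees small chunks, while the LLM sees a wider slice of the document around each match.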
Another design choice to keep in mind is that although retrieval often benefits from hybrid search, that may not be enough. Semantic search can connect a question to text that answers it without using the exact wording, while sparse methods can identify exact keywords. However, sparse methods like BM25 are token-based by default, so plain BM25 does not match substrings.
So if you also want to search for substrings (part of a product ID, that kind of thing), you need to add a search layer that supports partial matches.
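The substring problem can be shown with a small, self-contained sketch. It does not implement BM25 scoring; `token_search` only mimics the whole-token matching that token-based sparse methods rely on. The character n-gram index is one possible partial-match layer, chosen here for illustration, and the product IDs are made up.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens, similar to a basic sparse-search analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_search(query: str, docs: list[str]) -> list[int]:
    """Whole-token matching: every query token must appear as a full token."""
    q_tokens = set(tokenize(query))
    return [i for i, d in enumerate(docs) if q_tokens <= set(tokenize(d))]


def char_ngrams(text: str, n: int = 3) -> set[str]:
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def build_ngram_index(docs: list[str], n: int = 3) -> dict[str, set[int]]:
    """Map each character n-gram of each token to the documents containing it."""
    index: dict[str, set[int]] = {}
    for doc_id, doc in enumerate(docs):
        for token in tokenize(doc):
            for gram in char_ngrams(token, n):
                index.setdefault(gram, set()).add(doc_id)
    return index


def partial_search(query: str, docs: list[str],
                   index: dict[str, set[int]], n: int = 3) -> list[int]:
    """Find docs where the query occurs inside some token."""
    q = query.lower()
    grams = char_ngrams(q, n)
    if grams:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # Query shorter than n characters: fall back to checking every doc.
        candidates = set(range(len(docs)))
    # Verify candidates, since shared n-grams alone can give false positives.
    return sorted(i for i in candidates
                  if any(q in tok for tok in tokenize(docs[i])))


if __name__ == "__main__":
    docs = ["Order AB48213X shipped", "Order CD77120Y delayed"]
    index = build_ngram_index(docs)
    print(token_search("48213", docs))           # whole-token matching
    print(partial_search("48213", docs, index))  # partial-match layer
```

In the demo, the whole-token search finds nothing for the partial ID because "48213" is not a token on its own, while the n-gram layer finds the first document.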
There's a lot more, but if I keep going, I risk this turning into a whole article.
Across your consulting projects over the past two years, what issues have come up most often for your clients? And how do you deal with them?
The problem I see is that most companies look to consultants for best practices, but building these systems in-house is full of complexity, especially for people who have never done it before. I saw the 95% figure from the MIT study on failed projects, and I'm not surprised. I think consultants need to get good at specific use cases where they can quickly implement and tune the product for their clients. But we will see what happens.
You write about so many different topics on TDS. Where do your article ideas come from? Client work, tools you want to try, or your own experiments? And what topics are top of mind for you right now?
Frankly, a bit of everything. The articles also help me ground my own knowledge and fill in the missing pieces I haven't yet researched myself. Right now I'm researching whether smaller models (mid-sized, around 3B-7B) can be used in agent systems and security, and in particular how to improve RAG.
Zooming out: what is one non-obvious capability that teams should cultivate over the next 12-18 months (technical or cultural) to become truly productive, rather than just AI-busy?
Probably learning to build in the space (especially for business people): just getting an LLM to do something consistently is a way to understand how unpredictable LLMs are. It makes you a little more humble.
To learn more about Ida's work and stay up to date with her latest articles, you can follow her on TDS or LinkedIn.
