The litmus test for AI startups


Right now, there’s nothing cooler than saying you’re building a company powered by AI or GPT-4. But being cool is not enough. Jack Dorsey didn’t raise money by saying Twitter was building a social media site on Heroku.

When Spanning Labs (Drew’s company) was building Web3 infrastructure, we found we constantly had to explain that we were not a technology looking for a problem, but the reverse. Too many people have spent years building and investing in Web3 technology instead of businesses. Similar patterns are emerging in generative AI today.

Generative AI is a powerful toolset that does change the game. But it is not an end in itself.

This is a double-edged sword. Building with AI has never been easier. The (great) benefit is that business ideas that were previously impossible (or previously only linearly scalable) can now be turned into great, money-making businesses. Incumbents are also leveraging AI to build new efficiencies and go farther, faster. All this is very exciting. The downside is that it creates a lot of noise that has to be sifted through.

We recently attended a demo day with three promising companies. Then we looked at the open source libraries those companies had just built skins for. At this point, it can be difficult to tell what separates a hard, unique, or defensible company from a weekend project built on top of an open source project. Founders and investors need to answer a few litmus test questions when evaluating the next generative AI company, especially in a moment of rapid growth and consensus.

Luckily, even a small team can build something hard, unique, and defensible in today’s AI landscape if they know where to look. So we (Morgan and Drew; yes, we’re brothers) pooled our brains over a recent family dinner and compiled notes for founders pitching new startups that incorporate generative AI, along with red flags for the investors evaluating them.

Only AI inside

The first note is the simplest and the most important. Is this still a good business if you remove every reference to AI from your pitch?

AI can be integral to solving the problem and scaling the solution, but we treat it as a black box in this test. If you’re building a product that users love, and you can defend it in ways other than AI, congratulations (and come talk to us)!

Notice how a good “AI inside” product is still tied to the core value proposition of the business. “ChatGPT for X” is not a business. X is the business. From there, X can be evaluated using the same frameworks and approaches that have already been validated. Is X a large enough market? Are people willing to pay for X? Can X be defended in any way?

If you’re building a product whose quality relies heavily on generative AI, others with access to the underlying models (i.e., everyone) may shine just as brightly. On the other hand, if you understand the end user better than anyone else and the user experience is unparalleled, you might be onto something.

Good business is good business.

Where are you on the spectrum of correctness?

AI models get things wrong. They are probabilistic solutions to problems: most LLMs work by guessing the next most likely word. This means they hallucinate facts, draw false conclusions, and lie. As models get smarter and context windows get bigger, this problem is getting better, but it will always be a problem.
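To make “probabilistic” concrete, here is a toy sketch of next-word sampling. It is not how any particular LLM is implemented: the prompt, the three candidate words, and their probabilities are all invented for illustration, and a real model scores an entire vocabulary. The point is only that if a wrong continuation has nonzero probability, a sampled answer will sometimes be wrong.

```python
import random

# Hypothetical probabilities a model might assign to the next word after
# "The capital of Australia is". The numbers are invented for illustration.
NEXT_WORD_PROBS = {"Canberra": 0.70, "Sydney": 0.25, "Melbourne": 0.05}


def sample_next_word(probs, rng):
    """Draw one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]


def observed_error_rate(probs, correct_word, trials=10_000, seed=0):
    """Fraction of sampled completions that differ from the correct word."""
    rng = random.Random(seed)
    wrong = sum(sample_next_word(probs, rng) != correct_word for _ in range(trials))
    return wrong / trials


if __name__ == "__main__":
    rate = observed_error_rate(NEXT_WORD_PROBS, "Canberra")
    print(f"Wrong in about {rate:.0%} of samples")
```

With these made-up numbers, the sampled answer is wrong roughly 30% of the time, however confident each individual completion sounds.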

For you, this means several things. First, consider how often your product can fail at a task and still be worth using. A chatbot that gives medical advice directly to patients has to clear an essentially perfect-answer bar. A chatbot that gives medical advice to doctors is just one of many tools they use, so the bar to clear is lower. Consider where your bar sits, and make sure you are serving the right product to the right people with this in mind.

By focusing on the right industries and user personas, you can turn this problem into an advantage. In some creative fields, where there is not always a “wrong” answer, these hallucinations can be a valuable form of expression.

The self-driving car industry has spent much of its time on this accuracy problem and on frameworks for solving it in complex models and systems. The central concept is commonly referred to as the Operational Design Domain (ODD). By limiting the ODD of the AI, it becomes much easier to test and verify the correctness of the pipeline. For self-driving cars, this means confining the vehicle to very specific road conditions, maps, and scenarios, and simulating millions of miles of driving.

The simplest example of this method is an LLM with no verified ODD at all. Whatever question or task you give the model, it just says that it doesn’t know how to answer or how to proceed. “I don’t know” isn’t particularly helpful, but at least it’s not wrong.

Donald Rumsfeld said: “We also know there are known unknowns; that is to say, we know there are some things we do not know. But there are also unknown unknowns, the ones we don’t know we don’t know. … It is the latter category that tends to be the difficult ones.”

If we can define all the “knowns” and build guardrails that prevent the “unknown unknowns,” we can achieve a high standard of accuracy for certain use cases.

While this is not trivial and is very case-specific, it is possible to define an ODD and restrict a product to it. Doing so can make businesses that require a high level of accuracy viable across a wide range of applications, and fairly unique.
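As a minimal sketch of the idea, the guardrail below only calls the model when a request falls inside a declared ODD and otherwise answers “I don’t know.” Everything here is a hypothetical illustration: `OperatingDomain`, `guarded_answer`, the topic labels, and the length limit are made-up names, and the sketch assumes some upstream step has already classified the question’s topic. A real ODD would be far richer and specific to the product.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

REFUSAL = "I don't know how to answer that."


@dataclass(frozen=True)
class OperatingDomain:
    """Hypothetical ODD: the topics and input sizes the product was verified on."""
    verified_topics: FrozenSet[str]
    max_question_chars: int = 500

    def contains(self, topic: str, question: str) -> bool:
        return topic in self.verified_topics and len(question) <= self.max_question_chars


def guarded_answer(odd: OperatingDomain, topic: str, question: str,
                   model: Callable[[str], str]) -> str:
    """Call the model only inside the ODD; otherwise refuse."""
    if not odd.contains(topic, question):
        return REFUSAL
    return model(question)


if __name__ == "__main__":
    def stub_model(question: str) -> str:  # stand-in for a real LLM call
        return f"(model answer to: {question})"

    empty_odd = OperatingDomain(verified_topics=frozenset())
    billing_odd = OperatingDomain(verified_topics=frozenset({"billing"}))
    print(guarded_answer(empty_odd, "billing", "Why was I charged twice?", stub_model))
    print(guarded_answer(billing_odd, "billing", "Why was I charged twice?", stub_model))
    print(guarded_answer(billing_odd, "medical", "Is this mole dangerous?", stub_model))
```

An empty ODD reproduces the “always says I don’t know” case; each topic added to `verified_topics` is a claim that the product has been tested there.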

Differentiate your AI pipeline

We are moving further and further into a world where the underlying model is king. You can’t expect to build and maintain the cutting edge of AI research without billions of dollars.

In other words, we need to focus our time on what we can build around these foundation models and how we can optimize them. NFX’s five-layer generative tech stack is a great place to start.

Remember that foundation models are just foundations. There are many areas around foundation model inference where you can build solid differentiation into your AI pipeline.

Some examples:

– Choose the right base model for the right task. Summarizing information with a model like GPT-NeoX and feeding the summary into the context window of GPT-3 lets you optimize both performance and infrastructure costs.

– Use prompt engineering and data preprocessing with custom embeddings to significantly improve performance on specific generation tasks.

– Post-process results to enforce ODD bounds.

– Fine-tune the base model on specific datasets for unique performance.

– Build robust infrastructure that supports end-to-end validation, enabling rapid testing, iteration, and refinement of your company’s pipeline.
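A rough sketch of how a few of these pieces might compose is shown here. It is an assumption-laden illustration, not a reference architecture: `build_pipeline`, the character threshold, and the prompt layout are hypothetical, and the summarizer and generator are plain callables standing in for whatever base models you would actually call.

```python
from typing import Callable, Sequence

TextModel = Callable[[str], str]     # prompt in, text out (stand-in for a model call)
OutputCheck = Callable[[str], bool]  # True if the output is acceptable


def build_pipeline(summarizer: TextModel, generator: TextModel,
                   checks: Sequence[OutputCheck], fallback: str,
                   max_context_chars: int = 2000) -> Callable[[str, str], str]:
    """Wire a cheap summarizer, a stronger generator, and output checks together."""

    def run(document: str, question: str) -> str:
        # Pre-processing: only summarize when the document is too long for the context.
        context = document if len(document) <= max_context_chars else summarizer(document)
        prompt = f"Context:\n{context}\n\nQuestion: {question}"
        draft = generator(prompt)
        # Post-processing: return the draft only if every check passes.
        return draft if all(check(draft) for check in checks) else fallback

    return run


if __name__ == "__main__":
    pipeline = build_pipeline(
        summarizer=lambda text: text[:200],                     # stub "small model"
        generator=lambda prompt: "ANSWER: see section 2",       # stub "large model"
        checks=[lambda out: out.startswith("ANSWER:")],         # stub ODD-style check
        fallback="I don't know.",
    )
    print(pipeline("A long contract ... " * 500, "What is the notice period?"))
```

The differentiation lives in what replaces the stubs: which model handles which step, how the context is prepared, and which checks an output must pass before a user sees it.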

Evaluate AI startup teams

First, don’t hire on hype.

Founders, consider that it might be more important to hire an infrastructure engineer with 10 years of experience at Cisco than a hot new ML PhD from Stanford. A lot of what we’ve talked about so far in terms of building differentiation doesn’t actually require a ton of research-focused ML engineers.

Don’t get us wrong. You need in-house experts who can read the latest papers and incorporate those techniques into your product. And we won’t lie: at this point, hiring an ML PhD will probably make early-stage fundraising a little easier. But how quickly and how well those changes can be incorporated usually boils down to having the right infrastructure. Hiring strong infrastructure engineers makes it easier to build and maintain cutting-edge products and sustainable businesses.

The balance between flashiness and sustainability is a difficult one. It’s great to be able to do new research and contribute to advances in AI capabilities, but without a very large research team, turning it into a repeatable process is difficult.

The key lesson here for both founders and investors is that good infrastructure engineers should be considered some of the best production ML engineers. In fact, ML engineers with experience on production systems have often spent much of their time working on infrastructure.

Another lesson is that in the age of generative AI, startups don’t need as many people as you think. And you might be surprised at the kind of talent you really need.

Some good litmus tests

Here are some specific questions that you should ask yourself, and that investors will no doubt ask you.

Q: Will foundation models eat your lunch?

  • What are you spending your time building that would become even better with the release of a new base model?
  • What are you building in your AI pipeline, around the calls to base models, that is patentable?

Q: If someone else had the same idea and access to the same models (which they do), what are you building that is unique and difficult enough that they could not replicate it?

  • Is differentiation reflected in pre-processing, post-processing, test pipelines, etc.?
  • Or does it exist outside the tech stack in terms of user experience, business model, or problem focus?

Q: How often must the product give the user the right answer to be viable?

  • How do you define your ODD and build guardrails around the known unknowns and the unknown unknowns?
  • How do you measure correctness?

Q: How long would it take a user to perform the same tasks as your core product using ChatGPT+?

  • Is it just a matter of knowing the right prompts to add?
  • Does it need a fair amount of setup, like copying and pasting between documents?
  • Is it even possible?

Q: How do you validate that changes to your AI pipeline have improved your product, and how long does that validation take?

  • What metrics do you use to measure improvement, and what are their inherent biases?
  • Are your processes manual or automatic?
  • If automated, does the validation pipeline make semantic sense? Is the reason for the metric change obvious?
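One way to start answering these questions is with a small, automated regression harness that runs the old and new pipeline on the same fixed test cases. The sketch below is an illustration under stated assumptions: `is_correct`, `accuracy`, and `compare` are hypothetical helpers, the pipelines are plain callables, and the exact-match metric is deliberately crude (it is biased against answers that are right but phrased differently).

```python
from typing import Callable, Dict, List, Tuple

Pipeline = Callable[[str], str]
Case = Tuple[str, str]  # (input, expected output)


def is_correct(output: str, expected: str) -> bool:
    """Exact match after trimming and lowercasing; a deliberately simple metric."""
    return output.strip().lower() == expected.strip().lower()


def accuracy(pipeline: Pipeline, cases: List[Case]) -> float:
    return sum(is_correct(pipeline(q), exp) for q, exp in cases) / len(cases)


def compare(baseline: Pipeline, candidate: Pipeline, cases: List[Case]) -> Dict[str, object]:
    """Report both accuracies plus the inputs the candidate newly gets wrong."""
    regressions = [q for q, exp in cases
                   if is_correct(baseline(q), exp) and not is_correct(candidate(q), exp)]
    return {
        "baseline_accuracy": accuracy(baseline, cases),
        "candidate_accuracy": accuracy(candidate, cases),
        "regressions": regressions,
    }


if __name__ == "__main__":
    cases = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]
    old_answers = {"2+2": "4", "capital of France": "paris", "3*3": "6"}
    new_answers = {"2+2": "4", "capital of France": "Lyon", "3*3": "9"}
    report = compare(lambda q: old_answers.get(q, ""), lambda q: new_answers.get(q, ""), cases)
    print(report)
```

In this toy run the headline accuracy is unchanged, yet the candidate breaks a case the baseline handled, which is why a list of regressions can be more informative than a single aggregate number.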

Q: What makes you the right team to do this?

  • How well do you know your customer personas?
  • Do you have experience setting up and optimizing a production ML infrastructure?
  • Do you understand how to evaluate and test the strengths and weaknesses of AI models?




