The AI-Native Office Suite: Can AI Work for You?

Applications of AI


AI is no longer just a feature; it's becoming a teammate. From writing drafts to designing slides, researching markets, or building financial models, a new layer of “agentic” tools is emerging that resembles an AI-native office suite.

But here is the challenge: for now, the landscape is fragmented, and new tools emerge each week. Anthropic launched Claude's “Create and Edit Files” feature this week! Consumers wonder: which tools should they actually use, and in which scenarios can they embed agent tools in their daily work?

To find out how these tools actually perform, we mapped the market and benchmarked AI-native tools across a variety of everyday office tasks. Our benchmark found impressive performance from many generalist tools, some outstanding vertical applications, and some hints about how the market is developing.

Two paths to agent productivity: generalist vs specialist

The market is split into two approaches to agent productivity. On one side are “all-in-one” horizontal tools built to work across apps and tasks. On the other are vertical specialists designed to go deep into a single workflow, such as email, slides, or spreadsheets. Both have evolved quickly, and both have trade-offs.

Generalist – Horizontal Tools
Generalist tools are designed for flexibility. They can move across a variety of contexts, apps, and tasks, but often at the expense of polish and accuracy. Three forms stand out within this camp.

  1. General assistants: Horizontal, typically web-based tools that are multimodal, prompt-based, and sometimes memory-enabled, and that perform several types of tasks.
    • Examples: Operator, Manus, Genspark.
  2. Agentic browsers: Browsers that browse autonomously and execute tasks across the web. Several options, such as Comet, add more sophisticated features, such as shortcuts that replay workflows when triggered by keywords.
    • Examples: Dia, Perplexity Comet, Browserbase.
  3. Browser extensions: Lightweight helpers that overlay existing workflows and interfaces.
    • Examples: MaxAI, Merlin, Monica.

Specialist – Vertical Tools
Specialist tools are built for depth and reliability. Instead of trying to do it all, these tools focus on structured workflows where trust, polish, and user control are key. Today's vertical landscape is anchored by tools that cover the core professional workflows.

  1. Email assistants: Assistants that draft structured replies, manage inbox triage, and handle scheduling tasks.
    • Examples: Fyxer, Serif, Jace.
  2. Presentation tools: AI-powered tools that generate slides, with an emphasis on visual design, speed, and editing.
    • Examples: Gamma, Chronicle, Beautiful.ai.
  3. Notes and documentation tools: Tools for structured writing, note-taking, knowledge capture, and collaborative editing.
    • Examples: Mem, Notion, Granola.
  4. Spreadsheet tools: Applications that handle data extraction, formatting, and analysis. They can also extend into research and workflows.
    • Examples: Paradigm, Shortcut, Meridian, Julius.

Benchmark: Do these products actually work?

To see how these tools handle real-world tasks, we benchmarked them to measure where they succeeded and where they fell short.

The prompts were designed to span six core dimensions: summarization, communication, file understanding, research, planning, and execution.

Use Case 1: PowerPoint
prompt: Design a visually heavy 7-slide deck on the trends in Gen Z internet behavior in 2025.

Chart showing how different AI products performed on the task: designing a 7-slide deck

Gamma is a vertical AI-powered presentation tool with built-in templates and design features that can generate a deck in less than 2 minutes. As a complete presentation editor, it offers a wide range of controls for editing after generation. Users can adjust layouts, change visuals and fonts, add charts, and prompt the AI to suggest text or designs.

Genspark and Manus operate as general assistants and tend to produce decks with more content, often closer to research reports. Their outputs take longer to produce, but tend to show deeper analysis and stronger prompt alignment. ChatGPT Agent created a simpler deck that resembled a text-based report, with weaker design capabilities and a much longer generation time.

Anthropic launched file creation and editing in Claude this week. On the presentation generation task, Claude was the fastest general-purpose agent tested, but its design needs improvement.

Overall, Gamma is the best choice if you need a presentation for external use, where visual quality and post-generation control are critical. If you're looking for a content-heavy deck that emphasizes research and analysis, Genspark is the better choice.

Use Case 2: Spreadsheet
prompt: Extract all data from this PDF and calculate the operating margin.

Chart showing how different AI products performed on the task: extracting data from a PDF and calculating the operating margin

Spreadsheets are a demanding use case. Their complexity is particularly pronounced in outputs such as complex financial models, where both format and precise accuracy matter. Still, AI spreadsheet tools are beginning to show promise on basic and mid-complexity tasks, such as extracting data from PDFs and performing basic financial calculations.

In this test, we uploaded a page from an S-1 filing and asked each tool to calculate the company's operating margin. Among the horizontal agents, Manus performed best: it extracted the data into a structured spreadsheet format and immediately returned accurate results. Claude was again the fastest on the spreadsheet task and produced the correct answer, but its output was limited. It provided minimal analysis and did not pull the complete set of data into the sheet.

Shortcut, a vertical Excel-focused agent, provided more comprehensive analysis in a native Excel environment, but it took longer to run and extracted only the data relevant to the calculation rather than the complete dataset.
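For reference, the calculation the tools were asked to perform is simple once the figures are extracted: operating margin is operating income divided by revenue. The sketch below is only an illustration of that arithmetic. The function name and the figures are hypothetical and are not taken from the S-1 page used in the test.

```python
def operating_margin(operating_income: float, revenue: float) -> float:
    """Return operating margin as a fraction of revenue."""
    if revenue == 0:
        raise ValueError("revenue must be non-zero")
    return operating_income / revenue


# Hypothetical extracted figures (in millions), not from any real filing.
extracted = {"revenue": 500.0, "operating_income": 75.0}

margin = operating_margin(extracted["operating_income"], extracted["revenue"])
print(f"Operating margin: {margin:.1%}")
```

A negative result simply means the company reported an operating loss, so the same formula applies to unprofitable filers.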

Use Case 3: Email
prompt: Email to schedule dinner next Thursday

Chart showing how different AI products performed on the task: writing an email to schedule dinner

Fyxer, Serif, and Jace are vertical assistants for email. Each can generate a competent draft and maintain context across the thread. Serif stands out for its level of customization: it supports playbooks, email labels, and preference settings, which give users a way to encode best practices and apply consistent workflows to similar scenarios.

Their scheduling approaches diverge, but all three could handle simple scheduling tasks.

  • Serif allows asynchronous coordination. You can CC its email agent to handle the back-and-forth of scheduling and send calendar invitations.
  • Fyxer generates Calendly-style links so that others can book a time.
  • Jace generates an event, but waits for user approval before sending it.

In contrast, Comet brings general assistant functions to email. You can prompt it to schedule a meeting, send invitations, and search the inbox. However, its drafting feels less tailored than that of the dedicated email assistants, which incorporate customization features such as playbooks, labels, and preferences.

Use Case 4: Research
prompt: Compare the latest quarterly cloud revenue growth from Microsoft, Amazon, and Google in a table using sources, and analyze the results in a short report.

Chart showing how different AI products performed on the task: summarizing and comparing quarterly cloud revenue growth in a table

AI tools allow consumers to generate deep, research-based analyses in seconds.

All of the products we tested were able to extract the correct cloud revenue growth figures and organize them into a table. The differences came down to nuance and speed, which reflect the underlying optimizations and constraints of each product.

Comet and Dia, the two AI-native browsers, were the fastest. They returned results within 20 seconds, but their output was lighter on analysis and less structured than that of Manus, which provided a more comprehensive table and a deeper explanation of the drivers behind the numbers.

The quality of the sources also varied. Comet and ChatGPT Agent stood out for drawing directly from reputable sources such as earnings reports and Yahoo Finance.

Overall, the trade-offs are clear. If you prioritize deeper analysis and are not sensitive to processing time, Manus is the strongest choice. If you value speed and need a quick, decent answer, Comet is the better choice.
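The comparison table in this task rests on one formula: year-over-year growth is the current quarter's revenue minus the same quarter's revenue a year earlier, divided by the earlier figure. The sketch below shows that arithmetic under stated assumptions. The provider names and revenue figures are placeholders, not the numbers the tools returned.

```python
def yoy_growth(current: float, prior: float) -> float:
    """Return year-over-year growth as a fraction of the prior-year figure."""
    if prior == 0:
        raise ValueError("prior-year revenue must be non-zero")
    return (current - prior) / prior


# Placeholder quarterly revenue (current, same quarter last year), in billions.
# These are illustrative values, not real company results.
revenue = {
    "Provider A": (120.0, 100.0),
    "Provider B": (230.0, 200.0),
}

# Build the table rows and sort by growth, highest first.
rows = sorted(
    ((name, yoy_growth(cur, prior)) for name, (cur, prior) in revenue.items()),
    key=lambda row: row[1],
    reverse=True,
)

for name, growth in rows:
    print(f"{name:<12} {growth:>6.1%}")
```

Sorting by growth rather than by absolute revenue matters here, because a smaller provider can grow faster than a larger one.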

Use Case 5: Note-Taking
prompt: Take notes during a meeting

Chart showing how different AI products performed on the task: taking meeting notes

Taking meeting notes is one of the most natural AI applications, letting users focus on the conversation rather than on typing. Tools in this category usually work in a notepad format and automatically transcribe and structure discussions, while ChatGPT's record mode offers a lighter alternative. All of the benchmarked products support keyword search, but their strengths diverge across note quality, customization, and collaboration.

Mem creates the most thorough records, capturing discussions and action items in detail, while ChatGPT's record mode provides a high-level summary that is easy to skim but not exhaustive. Granola distinguishes itself with customizable templates that adapt to different meeting types, giving users more control over structure and output.

Granola, Mem, and Notion all allow users to prepare notes in advance, add guidance during meetings, and follow the real-time transcription. Notion stands out in collaboration: tasks can be assigned directly within the notes and synced to Notion Calendar, tying into wider team workflows.

Overall, if you want comprehensive capture, Mem is the best option. Granola excels at structure and customization. And for team coordination, Notion is the strongest choice.

Observations from the test

After running the tests on these use cases, several patterns emerged.

  1. The patterns of differentiation are already clear. Vertical products stand out through design and workflow polish. They focus on the work “surface” or canvas and embed deeply into professional workflows. This makes them particularly strong in external-facing use cases where refinement and presentation matter. In contrast, horizontal products emphasize breadth. They compete to be the “all-in-one” entry point by stacking adjacent tasks. Manus, for example, already spans research, presentations, and spreadsheets, positioning itself as a single place where work begins.
  2. Competition from model companies in horizontal products is intensifying. General assistants and agentic browsers are racing to become the core UI of work. Given the importance of both speed and accuracy, companies closer to model development may have an advantage. Major research labs are already in the race: Anthropic recently launched its Claude browser co-pilot, and we expect to see more attempts from OpenAI and other players.
  3. Convergence is coming. The sharp lines between vertical and horizontal agents are beginning to blur as vertical products try to “jump” to new categories and horizontal platforms double down on popular vertical use cases. If you are building vertically, make sure you keep up with the latest model primitives and build on them. If you are building horizontally, make the workflow and iteration loop deep enough to prevent vertical players from carving out use cases.


