May 29, 2026

Between hype and cynicism: why we need to test AI better

Between shallow enthusiasm and lazy skepticism, the most honest way to understand AI is to test it better.

As much as I try to write something timeless, this text could go stale at any moment. So keep in mind that it was written in May 2026.

When the subject is AI, I mostly see two big arguments:

It's useless
It's going to replace even your dog

The two sound like opposites, but they come from the same problem: these people aren't actually testing AI tools before reaching a definitive conclusion.

On one side, you have people who opened ChatGPT once, a year or more ago, asked a vague question, got a generic answer, and concluded that the thing only spits out nonsense. On the other, you have people who watched an impressive demo, heard promises of infinite productivity, saw Claude browsing the internet or generating code, and concluded that we're a few months away from entire companies running without humans.

But research, benchmarks, and productivity studies show a much more "boring" reality (I put that in quotes because, for anyone who actually digs into it, it's all very fascinating).

I saw this play out in a very concrete way when I mentioned on Threads that you could use AI as support for filing your income tax. Not to delegate everything blindly, nor to replace an accountant, but to help organize things on the last day before the deadline.

The reaction came fast. Some accountants showed up in a panic saying it would all go wrong. Some hobbyists were certain it would trigger an audit flag. And the curious part is that, in many cases, the criticism didn't seem to come from anyone who had actually tested the limits of the tool, or even tried to. It looked more like an automatic reaction: if AI is involved in anything serious, then it can only be irresponsible.

But shortly after, a friend sent me a snippet of his tax return, prepared by an accountant. There was an error in it, and ChatGPT spotted it easily. I'm not saying this proves AI replaces a professional. Quite the opposite. The point is different: a tool that helps you find an inconsistency, frame a question, or review a detail can already have value, as long as there's method, context, and human review.

That difference seems small, but it's central. Using AI as support is not the same as handing an important decision over to it. And a lot of the discussion about AI ignores exactly this middle ground.

What real AI adoption in companies looks like

AI is already being adopted at scale. McKinsey's The State of AI 2025 survey reports that 88% of the organizations surveyed use AI regularly in at least one business function. Stanford's AI Index 2025 also shows an important acceleration: in 2024, 78% of organizations reported using AI, up from 55% the year before.

Source: McKinsey, The State of AI 2025.

In other words: you can no longer treat AI as a niche curiosity, limited to a handful of people on Twitter, LinkedIn, or tech communities. It has already entered companies, banks, customer service systems, healthcare, and much more. The truth is there's probably far more AI involved in your daily life than you suspect.

One detail from McKinsey seems more important to me than the adoption number itself: 80% of respondents say their companies use AI for efficiency, but the organizations that capture the most value tend to combine efficiency with growth and innovation. That changes the reading quite a bit. The companies that get the most out of it seem to look at AI less as cost-cutting scissors and more as a way to actually rework how they operate.

The Klarna case became the symbol of this: in 2024 the company bragged about having replaced around 700 customer service agents with AI; in 2026 it backtracked and started rehiring, with the CEO himself admitting that focusing only on efficiency and cost dragged down the quality of service.

After bragging about a massive layoff, Klarna had to backtrack.

So the real gain seems to lie in testing ideas and changing processes that nobody had time to touch before, not in replacing entire departments. Lots of companies are using AI. Few companies know exactly what to do with it.

Not just companies: workers face the same dilemma

The same goes for individuals. The Pew Research Center showed that a large share of American workers rarely or never use AI chatbots at work and are genuinely worried about their future. Every technological revolution reshapes capitalism and, almost always, redistributes opportunities in a way that's painful for workers. But, looking more closely, in many cases the competitive pressure doesn't come from some abstract AI taking everything all at once. It comes from people and companies learning to incorporate these tools better into real work.

Being yourself is harder than it looks.

That same Pew Research survey shows this. Nearly 80% of those who use AI reported some improvement in the speed of their work, and a small percentage reported an improvement in quality as well. But that's in the short term, and there's a part this survey doesn't cover.

A creative person who's excellent at their job can benefit from automating the mechanical parts of it (like processing pages and pages of bank statements or reports), while gaining more time to study, practice, and improve what really matters, and gain quality as a result.

Source: Pew Research Center.

Between shallow enthusiasm and lazy skepticism

While only about 20% of respondents say they use AI routinely, it seems like absolutely everyone already has a formed opinion about it. And that is, to say the least, curious (I see it in practice interacting with people every day, actually).

It turns out a lot of opinions about AI are still being formed by people who had superficial contact with the technology. And superficial contact with most things already tends to produce bad conclusions. With AI it's worse, because it's very recent knowledge that changes almost daily. A prompt that worked yesterday may not work today, and then work again tomorrow.

I say this from a very specific place: I test AI every day. My first public repository down this path is from March 2023, Chat-With-Your-Docs, an exploration of how to integrate AI and long documents back when context windows were still far too small to simply dump an entire PDF into the conversation and hope for the best. Since then, my GitHub has turned into a lab notebook: my own MCPs, skills, video analyzers, timeline tools, and experiments that sometimes become products and sometimes just serve to understand where the tool breaks.

GitHub screenshot showing recent AI contributions and experiments

The giant woke up. This year I decided to make my experiments open source.

One of the most important studies I've found on the topic, How People Can Create - and Destroy - Value with Generative AI, run with BCG consultants and Harvard researchers, describes AI as a technology with a "jagged frontier." The idea is simple: AI can greatly improve performance on some tasks and make it worse on others. It's increasingly important to understand which tasks are which, by understanding better how LLMs actually work.

And an interesting bit about that BCG study: it's from 2023! It was done with GPT-4. If that model was already capable of narrowing the knowledge gap between professionals and producing very strong innovation results, imagine the current models. It's the generalists' turn (spoiler for my next article).

Source: BCG.

Leading means finding where AI helps and where it gets in the way

And that's why testing and using it, with intention, matters. Testing a tool isn't opening it once a week and throwing some random question at it. Really testing AI is discovering where it increases your capability and where it creates risk. It's comparing results with and without AI. It's measuring time, quality, rework, clarity, creativity, consistency, and error. It's understanding in which tasks it works as an assistant, in which it works as an accelerator, in which it works as a copilot, and in which it simply gets in the way.
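A with-and-without comparison like this can be sketched in a few lines of Python. This is only an illustration: the `Trial` fields, the `summarize` helper, and every number below are made up for the example, and a real test would track whichever of the criteria above matter for your task. The one deliberate choice is counting review time as part of the total, since that is where a confident but wrong answer costs you.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    """One attempt at a task (hypothetical structure, for illustration)."""
    task: str
    used_ai: bool
    minutes_doing: float      # time spent producing the result
    minutes_reviewing: float  # time spent checking and fixing it
    errors_found: int         # errors caught during review


def summarize(trials, used_ai):
    """Average total time and errors for one condition (with or without AI)."""
    group = [t for t in trials if t.used_ai == used_ai]
    return {
        "avg_total_minutes": mean(t.minutes_doing + t.minutes_reviewing for t in group),
        "avg_errors": mean(t.errors_found for t in group),
    }


# Made-up numbers: the same task done twice without AI and twice with it.
trials = [
    Trial("monthly report", used_ai=False, minutes_doing=60, minutes_reviewing=10, errors_found=1),
    Trial("monthly report", used_ai=False, minutes_doing=50, minutes_reviewing=10, errors_found=1),
    Trial("monthly report", used_ai=True, minutes_doing=15, minutes_reviewing=25, errors_found=2),
    Trial("monthly report", used_ai=True, minutes_doing=10, minutes_reviewing=30, errors_found=4),
]

without_ai = summarize(trials, used_ai=False)
with_ai = summarize(trials, used_ai=True)

# Positive means AI saved time overall, review included.
minutes_saved = without_ai["avg_total_minutes"] - with_ai["avg_total_minutes"]
print(f"Minutes saved per task: {minutes_saved}")
print(f"Average errors without AI: {without_ai['avg_errors']}, with AI: {with_ai['avg_errors']}")
```

In this invented data the AI runs are faster overall but leave more errors to catch, which is exactly the kind of trade-off that only shows up when you measure both.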

Don't be that kind of CEO. Source: Instagram, @sindpdsp.

For people in leadership roles, this is essential; otherwise, the fear of not knowing the tool well leads to banning its use in the workplace and, as a consequence, to a clear competitive disadvantage. What makes the difference is leaders who test alongside their teams, build good practices, and develop method, instead of simply telling the team to "use AI."

Banning AI out of fear can look like prudence, but it's often just a lack of strategy. Between panic and hype, there's real work: defining usage criteria, good practices, human review, data protection, and measurable experiments.

Source: McKinsey, The State of AI 2025.

In December 2025, a survey by FGV/Sebrae/Google also indicated growing familiarity with AI among companies of different sizes, showing that even among the groups with the least familiarity, around 87% of companies had already had some contact with AI. I found it interesting that, in this case, sole proprietors (MEIs) did worse. This suggests that, without leadership focused on understanding this technology and bringing it into the workplace, the individual worker can end up falling behind, which keeps widening the competitiveness gap between large, medium, and small businesses.

Chart on AI familiarity in businesses in Brazil

Source: FGV/Sebrae/Google.

How not to jump to premature conclusions

The first step is not refusing the debate.

Data from IBGE/PINTEC show that the percentage of Brazilian industrial companies with 100 or more employees using artificial intelligence rose from 16.9% in 2022 to 41.9% in 2024.

That was in 2024; imagine today. I hope we get more and more research in this area to confirm it, but it's reasonable to expect that number has grown. So anyone who refuses to at least open the debate on the subject is going to fall behind.

The next step is experimentation. Before saying AI is useless, test it on different tasks. Not just asking trivia, but applying it to real problems: reviewing a text, analyzing financial statements, explaining a piece of code, building a checklist... Think about the bottleneck you have today. A real bottleneck! Tip: today's LLMs are excellent at interpreting images. So drop a screenshot into the prompt and ask for help with whatever it is.

And before saying AI replaces an entire team, run the same test on tasks that require a higher degree of responsibility. See where it looks convincing but is wrong. Believe me, LLMs are excellent at sounding confident even while talking nonsense. See where the cost of reviewing is greater than the cost of just doing it yourself.

Then document. Yes, document the process. Record, above all, which processes AI helped with and how, and how you changed each process so it would fit better and produce better results. If it made things worse, document that too, and work out which of your and your team's skills improve the results. This will be important so you can repeat the tests when new models are released.
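If a Notion page or a spreadsheet feels like too much, even a plain text log works. Below is a minimal sketch, assuming one JSON record per line in a local file; the field names, the `log_experiment` and `tasks_to_retest` helpers, and the model names in the usage note are all hypothetical, not a standard format.

```python
import json
from datetime import date
from pathlib import Path


def log_experiment(path, *, task, model, prompt, outcome, notes, day=None):
    """Append one experiment record to a JSON Lines file and return it."""
    record = {
        "date": (day or date.today()).isoformat(),
        "task": task,
        "model": model,
        "prompt": prompt,
        "outcome": outcome,  # e.g. "helped", "neutral", "made it worse"
        "notes": notes,
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def tasks_to_retest(path, new_model):
    """Return tasks that appear in the log but were never tried with `new_model`."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    records = [json.loads(line) for line in lines if line.strip()]
    already_tested = {r["task"] for r in records if r["model"] == new_model}
    all_tasks = {r["task"] for r in records}
    return sorted(all_tasks - already_tested)
```

For example, after logging a few runs with a placeholder name like "model-a", calling `tasks_to_retest("ai_log.jsonl", "model-b")` lists everything you still need to repeat when "model-b" comes out.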

Screenshot of my first Notion for AI studies, from 2023

My first attempt at documenting my studies, back in 2023. This Notion was shared with my friends, but I was the only one contributing. Today I keep a mix of a private Notion, my GitHub (public), and this site, where I write up some research.

Repeat the tests when new models come out! There's no point in doing all this work with GPT-3 and not even testing GPT-5. Please.

Image of the Will Smith eating spaghetti benchmark

The famous Will Smith eating spaghetti benchmark. Source: BBC Bitesize.

The future is uncertain, but the present doesn't have to be

People want a simple definition: is AI good or not? Is it going to replace them or not? Should I use it or not?

The most honest answer is: it depends. It depends on the task, the person, the context, and the method of use.

The best option is to test, test a lot, and intentionally. And to analyze those tests.

If you still haven't figured out how to use it intelligently, I'll leave here a video by Jeremy Utley, a creativity professor at Stanford and a specialist in the practical use of AI. He'll know how to convince you.

That's about it: AI isn't magic, nor is it a miracle. But running away from understanding how it really works is maybe one of the fastest ways to fall behind.

So here's my challenge to you: pick a real bottleneck of yours from this week and test AI on it. Try to solve the problem together with Claude or ChatGPT. If you test something this week and discover a use that worked (or one that failed badly), share it and tag me. I want to put together a collection of these real cases.

If you want to test things with me and join the conversation (and occasionally talk about games), follow me on Twitter and/or Threads =D