Is Agentic AI’s Perfect Home Your Browser?

Summary

  • Agentic AI in browsers can automate tasks like buying event tickets, booking hotels, and filling forms.
  • ChatGPT’s remote Operator runs on a server, while Opera’s version operates directly in your browser.
  • Despite some rough edges, a refined Operator could become a daily-use tool for various browsing tasks.

Those who know me know that I’m a skeptic of a lot of stuff that comes with the word “AI” attached. A lot of it is just gimmicks, and some companies are definitely doing AI as an easy way to get investor cookie points.

Agentic AI is going down the same route, but its most realistic application so far might actually be one you’ve overlooked—browsers.

How Agentic AI In Browsers Works

I was recently invited to Opera’s Browser Days event in Lisbon, Portugal. There, I got to see, among other things, a live demo of the company’s new Operator feature. It’s, in a way, an extension of the browser’s built-in Aria chatbot: it performs actions in the browser and within websites based on your text prompts.

You can tell it to buy something for you on a website, find and book a hotel or an Airbnb, fill in a form, buy plane tickets… You should be able to offload anything that’s “tedious” in your day-to-day web browsing to the Operator once it’s live.

All you need to do is tell it exactly what you need it to do, and give it as much detail as possible. For example, if you want to buy tickets for an event, you should tell the Operator exactly which website to go to, where you want to sit, and how much you’re willing to spend.

Similar to how you shouldn’t doze off while driving a self-driving car, you should keep an eye on the Operator and be ready to take control at any time while it’s working. If it gets to the checkout screen and can’t go through because it’s missing your credit card details, you should wait for it to give up and input them yourself rather than giving the AI your credit card number, for obvious reasons. Still, it’s pretty neat.
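
To make the "be ready to take control" idea concrete, here is a minimal Python sketch of an agent that stops and hands control back before it would type into a sensitive field. This is not how Opera's Operator is built; the `Step` class, the `run_until_handoff` function, and the field names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Field names treated as sensitive in this sketch (hypothetical examples).
SENSITIVE_FIELDS = {"card_number", "cvv", "password"}


@dataclass
class Step:
    action: str  # "click" or "type"
    target: str  # hypothetical name of a button or form field


def run_until_handoff(steps):
    """Run steps in order, stopping before the first one that types into a sensitive field.

    Returns (completed_steps, handoff_step); handoff_step is None if every step ran.
    """
    completed = []
    for step in steps:
        if step.action == "type" and step.target in SENSITIVE_FIELDS:
            return completed, step
        completed.append(step)  # a real agent would perform the action here
    return completed, None


plan = [
    Step("click", "seat_picker"),
    Step("type", "email"),
    Step("type", "card_number"),
    Step("click", "pay_button"),
]
completed, handoff = run_until_handoff(plan)
print(f"Agent completed {len(completed)} steps")
if handoff is not None:
    print(f"Your turn: please fill in '{handoff.target}' yourself")
```

The point of the design is that the sensitive value never passes through the agent at all: it simply stops, and the person at the keyboard finishes the step.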

Opera Browser Days Demo Lisbon
Arol Wright / How-To Geek

During the live demo shown to How-To Geek, the Opera team made the Operator go to a flower delivery website, pick out some yellow flowers, buy them, and deliver them to the hotel room of one of the journalists in attendance. The flowers were delivered the next day, just like they would’ve been if a human had bought them.

According to the company, the Operator goes deep into a website’s underlying structure and strips it down internally rather than just looking at the front-end layout and buttons and trying to guess what they do. It “reads” the page structure to figure out how to perform actions like clicking, typing, and navigating.
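
As a rough illustration of what "reading" a page's structure can mean, the Python sketch below parses raw HTML and lists the elements a user could click or type into, without looking at the rendered layout at all. It is only an assumption-laden toy, not Opera's actual method: the list of "actionable" tags, the `find_actionable_elements` helper, and the sample markup are all made up for this example.

```python
from html.parser import HTMLParser

# Tags treated as "actionable" in this sketch (an assumption, not Opera's list).
ACTIONABLE_TAGS = {"a", "button", "input", "select", "textarea"}


class ActionableElementCollector(HTMLParser):
    """Collects interactive elements found in raw HTML markup."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in ACTIONABLE_TAGS:
            attr_map = dict(attrs)
            self.elements.append({
                "tag": tag,
                "id": attr_map.get("id"),
                "name": attr_map.get("name"),
                "type": attr_map.get("type"),
            })


def find_actionable_elements(html):
    """Return a list of dicts describing clickable or typeable elements."""
    collector = ActionableElementCollector()
    collector.feed(html)
    collector.close()
    return collector.elements


# Made-up markup loosely resembling a flower shop checkout form.
SAMPLE_PAGE = """
<form id="checkout">
  <input type="text" name="recipient">
  <select name="bouquet"><option>Yellow</option></select>
  <button id="buy" type="submit">Buy now</button>
</form>
"""

elements = find_actionable_elements(SAMPLE_PAGE)
for element in elements:
    print(element)
```

An agent working from a list like this knows there is a text field named `recipient` and a submit button with the id `buy`, regardless of where they happen to sit on the screen.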

Now, mind you, this Operator still has a lot of rough edges to polish, which is probably why Opera isn’t committing to a specific release timeframe just yet. The demo itself hit snags a few times, either handling a step poorly or failing a specific task outright. This is kind of a larger problem with agentic AI in general right now (more on that later). But when it’s actually ready for prime time, I feel like a browser is a good spot for this technology.

How It Compares To Other Agentic AI

OpenAI's Operator AI agent running automation on the TripAdvisor website.
OpenAI

As much as this is cool, the keen-eyed among you probably know that this isn’t exactly “new.” A nearly identical application of agentic AI is ChatGPT’s identically named Operator, currently only available on the chatbot’s $200/month tier.

Like Opera’s implementation, ChatGPT’s can browse the web and perform actions there for you. The key difference is that ChatGPT’s runs on a remote server, while Opera’s runs right in your browser, with the same cookies and browsing data you already have. The company also says this data never leaves your browser while you’re using the feature.

It should be noted that ChatGPT’s implementation also tends to mess up a lot. The best way to get these kinds of agents working well is probably to train them on specific websites, which is why pilot programs such as Amazon’s Buy For Me feature or Microsoft Copilot’s Actions work with only a handful of websites at a time. I would guess that the eventual goal is to get everything into the same “catch-all” mode ChatGPT and Opera currently use, but if you allow full functionality from the get-go, it’s way easier to poke holes in it.

Agentic AI is a pretty broad term, too: it’s just AI that can autonomously make decisions and perform tasks without necessarily requiring user intervention. There are endless different agentic AI applications, and “Operators” that can perform tasks for you within a browser are just one type. This is also what I meant at the beginning of the article when I said it was going down the same gimmicky route other AI has gone down: some of it is useful, but a lot of it is just not something people will use more than once or twice.

There are also ways to run local AI models that can perform autonomous actions, such as with AnythingLLM. Still, the vast majority of people probably won’t go so far as to install a local LLM on their computer or smartphone, so a browser-based agent is probably a good middle ground.

Is It Actually Worth It?

Whether it’s “worth it” or not will depend on how the final implementation of this feature ends up looking. After all, all we have now are prototypes. Opera hasn’t committed to a specific release timeframe for this just yet—while it might be released over the coming months, it’s clear it still needs some time in the oven. But I think it can become something people will use on a daily basis.

In a faster and more accurate implementation of this, you could have an Operator fight with a concert tickets website to get concert tickets at regular sale prices, or an older person could use one to perform tasks they wouldn’t know how to do properly themselves. This, of course, is contingent on it actually improving.

I believe this has the potential to become an actually useful tool if developed properly, and I wouldn’t rule out other browsers, at least some of the minor players, trying to implement a version of this in the future. I wouldn’t say that it’s a true game-changer for me, though, at least in its current implementation. It doesn’t do anything faster than I would do it myself, and if anything, I would spend more time getting through the frequent snags it would hit. But it has potential. Hopefully, by the time this is actually out, it’s a more polished product.
