AI Focus

AI Assisted Web Development

Wed, 04 Jun 2025 16:00:06 +0000

AI-assisted programming is becoming more and more common among developers. Its usage ranges from asking AI tools like Gemini questions and pasting the answers into the IDE, to using AI as a glorified autocomplete, to full-on vibe coding, where most of the code is delegated to AI agents.

Models are getting more capable, but they are still not perfect. Developers are still ultimately responsible for the code produced, and should review it before publishing to avoid nasty surprises like leaking API keys. But it’s amazing to see how models like Gemini 2.5 Pro are capable of building entire (even if reasonably simple) web applications from a one-shot prompt, like this example, where my prompt was “Build a force directed graph demo”.

One thing to note, however, is that AI models are not equally performant for all programming languages and stacks. Because model performance depends on the availability of training data, AI models tend to be quite good at web development, and have the potential to boost the productivity of web developers more than that of developers on other stacks.

Maybe that’s because the web has been around for such a long time and because of the open nature of web implementations: there is plenty of training data available for web development. Another characteristic that favours the web is that it excels at backward compatibility (remember spacejam.com, from ‘96? It’s still up and running), so even when a model produces slightly outdated code, it will generally still work without issues.

Another point that makes the web a great platform to build with AI tooling is low-friction deployment. Because the web doesn’t have review queues or requirements, developers can easily deploy their creations, iterate, and share them.

The combination of AI models’ capabilities in web development with the lack of friction in deploying the results is spurring a new type of developer, one that is first and foremost a vibe coder and frequently doesn’t know other types of development.

This became clear to me when chatting with the founders of usecurling.com, a Brazilian startup aiming to differentiate itself by using AI to deliver products faster. The team there also hosts a closed vibe coding community where the vast majority of participants don’t have a software development background, but are using tools like Replit and Lovable to build projects.

What those developers can achieve is, of course, limited to what the AI model can do. The team reported that those developers will often run into brick walls, either because they aren’t able to get the AI tooling to implement what they want or because they run into issues, like security or scalability, when productionizing their projects.

And that’s where their community comes in: it’s a forum where the Curling AI team, who are experienced developers, can help vibe coders overcome those issues.

AI tooling does indeed lower the barrier to entry for web development, allowing people who otherwise wouldn’t have the time or resources to learn to effectively realize their ideas. That doesn’t mean there will be less work for traditional developers. In fact, my view is quite the opposite. With a lower barrier to entry, the need for developers who can take over when AI tooling hits a brick wall will only grow.

Another interesting trend is developers who moved into leadership or management positions, like CTOs, VPs, directors and managers, starting to build again, because AI tooling gives them the opportunity to quickly put together projects within the time they have left after dealing with the other responsibilities that come with those roles.

While, on one hand, this may be contributing to a dissonance between leadership and developers on generative AI, it’s also great to see AI bringing those developers back to building things, and excited about the future of web development.

A URL is a powerful tool for sharing web projects. With more people becoming developers and publishing their creations online, I wonder if we will start seeing more directories where those developers can showcase their applications and users can discover new ones, sort of an itch.io, but for web applications.

Maybe this won’t make sense in today’s world, but the concept of more people being able to transform their ideas into web applications, with reduced friction to build and deploy, reminds me of the early days of the web - distribution of native applications was always an issue, but the web allowed developers to share their creations with a simple link. AI Tooling is changing that by allowing more people to become developers, and that’s great!

embedding

Wed, 28 May 2025 10:59:06 +0000

No, not that one. The E in SLICE is for embedding content…. Oh wait, that E was for ephemeral. Hmm, never mind. I guess this is the other E which is actually the C in SLICE. “Composable”. After linking, composability is one of the most critical features of the medium that is the web. It’s the only platform that I know of that enables expression by integrating live content from nearly any other site or service directly into the UI. Yes, we have APIs, but it’s the <iframe> and <embed> that have helped to make the web unique.

There’s a story that I heard before joining Google, that our very first developer API was a way to embed Google Maps into your website. It wasn’t that we invented an API, it was that the web made it easy to pull content from other sites and embed it into your own, and lots of people wanted to do this. While embedding of maps is declining (according to BuiltWith), between 12% and 15% of the top 1 million sites still embed Google Maps.

In “A link is all you need” and super-apps I touched on the ability for LLMs to create or recall content on the fly, and how it’s potentially a huge shift for how we think about the web. In AI-powered site mashups, Andre Bandarra suggests that agents will be able to create the ultimate mashup of sites and services because they attempt to solve for the user’s goal.

If the only thing really stopping this is the latency of the LLMs to generate the UI then it is a “when” and not “if” question. We really need to think about some of the downstream implications of this.

One extreme is where there is a super-app and it is the agent that can do everything for the user, generating content and UIs on the fly to fulfil a goal. Where does the web sit in this? The web is a legacy fallback and it’s not the web I want to see. Is there a possibility that an exchange of value could happen between the site and super-app? I believe that many site owners and businesses would want some way of keeping their brand, or enabling specific actions like up-sell on checkouts, so could we embed some functionality that brings my brand or service in front of the user in any site or app, including the super-app? Maybe it’s a checkout form, or a registration page, or, well anything that needs a user-action.

Fictional example of embedding existing web functionality into a ‘chat app’

In 2020, while on my team, Jason Miller documented the Islands architecture (first proposed by Katie Sylor-Miller). At the time it felt like a logical extension to “AppShell”: here’s some static code and here’s the dynamic bit. On a technical level that is what it describes, but at an architectural level it is something rather different. Islands are a way to think about how to compose your web app into different bits of functionality. While the idea is still nascent, frameworks like Fresh and Astro have adopted it, but it’s still just a framework-level concept and not a platform-level primitive.

When I look at the extreme that is “the super-app”, it feels like embedding and composability need to be key parts of the future of the web, and it needs to be something that developers and businesses can opt in to, keeping control of their brand and experience to as great an extent as possible.

Now there is a natural reaction: Well, I don’t want super apps or LLMs. The technology is now here and it’s being used for good and for bad, and as I learnt from the desktop to mobile transition, the answer is to differentiate and not follow. Lean in to the areas that other platforms can’t compete on.

So, how does the web differentiate then?

One area that is ripe for innovation is the act of hyper-linking. We should actively investigate hyper-embedding (also known as transclusions). That is, we need to go beyond just being able to embed a site in a page (iframe) or an API (fetch()), or just merely linking to something (<a>), but instead enable the seamless embedding of functionality that is useful and composable and secure.

The boundary between functional components as described in islands offers so many opportunities for the web. By exposing islands/components/widgets to the browser in a way that it understands that a) there is something that is embeddable, b) what it can do, and c) how to talk to it, all while ensuring there can be security and privacy boundaries between the islands if required, could enable:

A cleaner separation for site authors for use across their site. Authors could reuse islands and functionality across their current sites, and then render them in the page. Because the browser understands the intent of islands, it enables page-level actions, automations, and chat-bots by the browser to help the user interact with the page.
Deeper integrations across sites. Developers have a habit of injecting any and all 3P JS into the page. A new primitive could separate the pages, ensure memory safety, and prevent data leakage, while enabling even more composability across sites.
Native apps, or other agents, to load these islands from other sites, and then render just the island inside their app.
A marketplace and discovery mechanism for functionality for any given island’s intent and any given contract.
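
To make points a) to c) a little more concrete, here is a purely hypothetical sketch of what an island descriptor and an intent lookup might look like. Nothing here is an existing browser API: the field names (`intent`, `actions`, `endpoint`), the example origins and the `findIslands` helper are all assumptions made for illustration.

```javascript
// Hypothetical island descriptors. Every field name is an assumption,
// not part of any existing web platform API.
const islands = [
  {
    origin: "https://shop.example",
    endpoint: "https://shop.example/islands/checkout", // (a) the embeddable unit
    intent: "checkout",                                // (b) what it can do
    actions: ["confirm", "cancel"],                    // (c) messages it understands
  },
  {
    origin: "https://maps.example",
    endpoint: "https://maps.example/islands/map",
    intent: "show-map",
    actions: ["pan", "zoom"],
  },
];

// Find every island that declares the given intent and supports the action.
function findIslands(registry, intent, action) {
  return registry.filter(
    (island) => island.intent === intent && island.actions.includes(action)
  );
}

const matches = findIslands(islands, "checkout", "confirm");
console.log(matches.map((island) => island.endpoint));
```

A host page, native app or agent could use a lookup like this to decide which island to load for a user goal. How the descriptor is published and how the security boundary is enforced are left open here.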

This might sound like Web Components, but we don’t have clear contracts or cross-platform embed-ability. It’s something that I started to think about in Custom Elements Ecosystem but at the time there wasn’t a clear need for it. Now there is.

It might also sound like an <iframe>, but these are too heavy. I might just want to embed a small bit of functionality like a checkout form, or a map that has all my own branding.

It might also sound like the <portal> element, which was meant as a more privacy-preserving <iframe> element, but again it’s too high-level and doesn’t allow for the embedding of functionality at a level that is smaller than a page.

It might also look like Web Intents, but that worked at a page level and not at a component level (and it got pulled out of Chrome).

We are at the start of an era where the web will be headless, and we don’t have the correct primitives to enable the web to be composable in a way that is useful and for it to thrive.

It’s too early to prescribe solutions, but I do believe that a way to define ‘islands’ or web components as embeddable, and a way to embed sub-trees and components (islands), is needed. As we do that, I think it’s time again to think about exposing intents and contracts on sites, pages and components, and making them discoverable.

My hope is that the designers of the web platform, that is browser vendors and participants of the W3C, are imagining what the platform could and should look like and how it should continue to differentiate itself in the future. These are just a couple of early opportunities that I see.

Mashups 2.0

Sat, 24 May 2025 16:00:06 +0000

Paul Kinlan recently wrote about how the latency of AI models generating UIs with HTML, CSS and JavaScript is decreasing significantly, and how that can lead to UIs that are ephemeral, dynamically generated, and specialized to the user need at hand.

To me, the direction of travel is clear. UI generation to service user-goals is going to happen.

Combining this with Model Context Protocol (MCP) has got me thinking about the good old site mashups, and how AI agents can unleash a new, modernized version of them. If you haven’t heard about mashups before, here’s what Gemini has to say about them:

A site mashup combines data, functionalities, or applications from two or more distinct web sources into a single, integrated user experience. This is typically achieved by leveraging publicly available APIs (Application Programming Interfaces) or RSS feeds, allowing developers to extract and re-present content in a new and innovative way, often without owning the original data sources. For example, a real estate mashup might combine map data from Google Maps with property listings from a real estate website to visually display homes for sale in a specific area.

One of the challenges with site mashups was that, while combining functionality from different sites led to unique experiences, those were also frequently niche, sometimes specific to individuals, and the effort to build them meant they would rarely pay off beyond developers building toy applications for themselves.

AI models solve this particular problem by removing the need for a developer to create the UI, and MCP servers provide a standardized description of APIs that AI agents can call or use in the application they are building.
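
As a rough sketch of what such a standardized description looks like, the object below follows the general shape MCP uses when a server lists its tools: a `name`, a `description`, and an `inputSchema` written as JSON Schema. The `search_hotels` tool, its parameters and the `missingArguments` helper are hypothetical and exist only for this example.

```javascript
// A hypothetical tool description, shaped like an entry in an MCP tool listing.
const searchHotelsTool = {
  name: "search_hotels",
  description: "Find hotels in a city for a date range",
  inputSchema: {
    type: "object",
    properties: {
      city: { type: "string" },
      checkIn: { type: "string" },
      checkOut: { type: "string" },
      maxPrice: { type: "number" },
    },
    required: ["city", "checkIn", "checkOut"],
  },
};

// Return the required arguments that a proposed call has not supplied yet.
// This is an illustrative helper, not part of any MCP SDK.
function missingArguments(tool, args) {
  return tool.inputSchema.required.filter((key) => !(key in args));
}

const missing = missingArguments(searchHotelsTool, { city: "Lisbon" });
console.log(missing); // the arguments the agent still has to obtain
```

Because the schema is machine-readable, an agent can work out what it still needs to ask the user before calling the service, which is also a natural trigger for generating a small UI.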

The ability of AI agents to use tools, standardized with the MCP protocol, allows them to integrate services into the conversation with the user. However, many user interactions don’t work well in a chat interface, and having a UI can be a better way to show structured information or ask for user input. That’s where mini-apps with dynamic UIs and MCP servers come in.

Imagine planning a holiday trip: the user may want to find a suitable flight and a hotel that matches their preferences, create an itinerary of local attractions, choose restaurants that match their taste and, finally, make their bookings. This usually requires interacting with different services and keeping track of the itinerary yourself.

An AI agent, through the MCP protocol, can use different sources to check reviews for hotels, attractions and restaurants, check their prices and availability, and finally create all the bookings needed, taking into account the user’s own preferences. In this workflow, some bits of information can work well in a chat interface, like showing a summary of a restaurant’s reviews. Others might work better with a UI, like showing the location of available hotels on a map, or asking the user to pick the attractions they are interested in from a list by checking them.

Being able to create UIs dynamically and instantly would allow those integrations to happen with the best UI possible for the task at hand, aligned with that user’s preferences.

It’s possible to imagine entire businesses that only provide services via MCP, being effectively UI-less, and relying on AI agents to drive business to them.

While the performance of the current models is incredible, they are not instant (yet), with that being a significant blocker for this kind of mashup.

Another important blocker is figuring out the monetization model for applications providing content to AI agents. For flight, restaurant, or hotel booking services the benefit is clear, since the AI agent is directly generating business for them, but services like review sites will need a good way to monetize the content they are providing. Maybe the AI agent would pay for access to those services on behalf of its user.

latency

Thu, 22 May 2025 19:59:06 +0000

I spent an evening in a fictitious web. The faux-browser window hosted at WebSim.ai gave me a view into a virtual world that didn’t exist, but one that felt like it did. Every page that I visited was created in the moment that I requested it, willed into existence by a generative AI model.

It was like the early days of the web. Every page felt fresh and unique. Some were high quality, some were low-fi. All were incredibly slow to load. I was on a dial-up connection in 2024. Even when my college’s shared connection in ‘98 was on a slow leased line, sites frequently took minutes to load, but at the time it didn’t matter, I had this new world to explore.

It wasn’t until much later in my career that I learned about the importance of latency in web applications. The speed at which a page loads and responds to user interactions can make or break the user experience. In 1993, Jakob Nielsen published his first paper on the topic of response times and how they affect user experience. He identified three key limits for response times:

0.1 second is about the limit for having the user feel that the system is reacting instantaneously, meaning that no special feedback is necessary except to display the result.

1.0 second is about the limit for the user’s flow of thought to stay uninterrupted, even though the user will notice the delay. Normally, no special feedback is necessary during delays of more than 0.1 but less than 1.0 second, but the user does lose the feeling of operating directly on the data.

10 seconds is about the limit for keeping the user’s attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. Feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.

— Jakob Nielsen - 1993

This was written at the dawn of the web and was later refined in 2014 for web applications. I think it’s interesting that streaming of responses has been used as a way to keep people engaged with LLMs. Yes, streaming has been an interesting hack to improve the perception of speed, and yes, fundamentally these models are doing trillions upon trillions of calculations to get us an answer, but it doesn’t change the fact that the underlying model is slow to generate the content.

We are in the early days of the web again. The content or the “apps” are currently slow to generate, and sometimes experiences like those created with the “Canvas” apps can feel a little low-fi too, but we are in a transition, and because these tools feel valuable we are happy to put up with the latency to get a complete response. Seeing these responses generate and stream in feels like the progressive loading of HTML on a slow connection, when you could see the page UI progressively load and JPEGs slowly unblur into full view. It seems to me that we are in the modem phase right now, waiting for the broadband transition to happen.

It’s not clear to me that the current “chat” interfaces are the future (it can be tiring to engage when all I want to do is prod buttons and swipe on things). I’d argue that if the future of computing is through tools like LLMs, be it a super-app or any existing app, chewing through arbitrary tasks that the user requests, we are going to need goal-based generative UI.

Ben Thompson has frequently noted that if there is to be a future in VR/XR based experiences, then given the sheer amount of content that needs to be created, combined with the complexity of creating that content, there will need to be a massive shift to systems that generate UI to service a user’s need based on context and intent.

AI, however, will enable generative UI, where you are only presented with the appropriate UI to accomplish the specific task at hand. This will be somewhat useful on phones, and much more compelling on something like a smartwatch; instead of having to craft an interface for a tiny screen, generative UIs will surface exactly what you need when you need it, and nothing else.

Where this will really make a difference is with hardware like Orion. Smartphone UI’s will be clunky and annoying in augmented reality; the magic isn’t in being pixel perfect, but rather being able to do something with zero friction. Generative UI will make this possible: you’ll only see what you need to see, and be able to interact with it via neural interfaces like the Orion neural wristband. Oh, and this applies to ads as well: everything in the world will be potential inventory.

— Ben Thompson - Stratechery - Meta’s AI Abundance, October 2024

Last year, I wrote a little experiment in goal-based UI generation using the reactive-prompts library. Given a goal and the data that you already have to solve that goal, it would create a user interface that captures the rest of the information. I was surprised that even 12 months ago it was possible to generate UIs for simple data-collection goals.
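
As a rough illustration of the idea (this is not the reactive-prompts code, which isn’t shown here), the data-collection case can be sketched without a model at all: given the fields a goal needs and the data already known, only ask for what is missing. The goal shape and the `missingFields` and `buildForm` helpers are assumptions made for this sketch.

```javascript
// Hypothetical goal: the fields needed before the task can proceed.
const goal = {
  name: "book-table",
  fields: ["name", "email", "partySize", "date"],
};

// Data the system already holds for this user.
const known = { name: "Ada", email: "ada@example.com" };

// Work out which fields still need to be collected from the user.
function missingFields(goal, known) {
  return goal.fields.filter((field) => !(field in known));
}

// Build a minimal HTML form that only asks for the missing fields.
function buildForm(goal, known) {
  const inputs = missingFields(goal, known)
    .map((field) => `  <label>${field} <input name="${field}" /></label>`)
    .join("\n");
  return [
    `<form data-goal="${goal.name}">`,
    inputs,
    `  <button type="submit">Continue</button>`,
    `</form>`,
  ].join("\n");
}

const formHtml = buildForm(goal, known);
console.log(formHtml);
```

In a generative version, a model would take the place of `buildForm`, choosing layout and input types from the goal and context, while the “only capture what is missing” step stays the same.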

Data collection feels like a good first step for LLMs because we don’t need full applications to get a job done, and the parameters seem to be more easily knowable to our tools. It raises a fundamental question about whether the concept of an application as we know it today will exist in the future, given that Chain of Thought tools are breaking down a goal (an app in the old context) into finite tasks, and then only requiring intervention when they can’t progress.

Today, these UIs can take many seconds to create, and because of the progressive nature of HTML, you can see the UI incrementally load. This might be OK, given that people seem quite happy to wait while the models “think” or stream their response, but we will see a step change in engagement and interaction when these UIs start getting to Jakob Nielsen’s thresholds for interaction.
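
To make those thresholds easier to reason about, here is a small sketch that classifies a response time against the limits quoted earlier. The labels are my own shorthand for each band, not Nielsen’s wording, and `classifyLatency` is a name made up for this example.

```javascript
// Response-time limits in seconds, with shorthand labels (assumed wording).
const LIMITS = [
  { max: 0.1, label: "feels instantaneous" },
  { max: 1.0, label: "flow of thought uninterrupted" },
  { max: 10, label: "attention still on the dialogue" },
];

// Classify a response time against the limits above.
function classifyLatency(seconds) {
  const hit = LIMITS.find((limit) => seconds <= limit.max);
  return hit ? hit.label : "user wants to do something else";
}

for (const seconds of [0.05, 0.7, 5, 33]) {
  console.log(`${seconds}s: ${classifyLatency(seconds)}`);
}
```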

Ben Thompson also noted in ‘Sora, Groq and Virtual Reality’: “which means the speed of token calculation is at an absolute premium.” 100%. How far away are we from getting truly instant UIs generated?

Naively, you have to generate HTML, CSS and JS. Estimating the number of tokens via tcnt, the following form is 251 tokens.

<form>
  <input type="text" name="name" />
  <input type="text" name="email" />
  <button type="submit">Submit</button>
</form>
<script>
  document.querySelector("form").addEventListener("submit", (e) => {
    e.preventDefault();
    const name = e.target.name.value;
    const email = e.target.email.value;
    console.log(name, email);
  });
</script>
<style>
  form {
    display: flex;
    flex-direction: column;
  }
  input {
    margin-bottom: 10px;
  }
  button {
    background-color: blue;
    color: white;
    border: none;
    padding: 10px;
    cursor: pointer;
  }
  button:hover {
    background-color: darkblue;
  }
</style>

I found this LLM speed benchmark to be a good indicative reference for the current state of play. Speeds range from 50 tokens per second for the slower but higher quality models to 350 tokens per second for the faster, potentially lower quality models. Obviously a lot has changed since 2024, but the order of magnitude is the same.

My first reaction (and probably yours) is “Hey, it should only take 1 second to generate that form… what’s the problem?”

But this is not a realistic example because it was hand-crafted by me for a contrived scenario. When building UI with a prompt, there are a number of other things we have to consider:

What is the prompt? We have to include the prompt in the token count and processing time.
Is there “thinking” required, or is there error correction required? This is a non-linear process and can take a long time to get right.
The latency induced by the network request. Setting up a TLS connection can take 200ms.
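
The considerations above can be folded into a rough back-of-the-envelope model. This is only a sketch: the prompt size, the prompt-processing rate and the number of thinking tokens are assumptions picked for illustration, not measurements. Only the 200ms connection cost, the 251-token form and the 350 tokens per second figure come from the text.

```javascript
// Rough latency model: network setup + prompt processing + thinking + output.
function estimateLatencySeconds({
  networkMs,          // connection setup and round trip
  promptTokens,       // tokens sent to the model (assumed)
  promptTokensPerSec, // rate at which the prompt is processed (assumed)
  thinkingTokens,     // reasoning tokens produced before the UI (assumed)
  outputTokens,       // tokens in the generated UI
  outputTokensPerSec, // generation speed
}) {
  const network = networkMs / 1000;
  const prompt = promptTokens / promptTokensPerSec;
  const generation = (thinkingTokens + outputTokens) / outputTokensPerSec;
  return network + prompt + generation;
}

const simpleForm = estimateLatencySeconds({
  networkMs: 200,
  promptTokens: 500,
  promptTokensPerSec: 5000,
  thinkingTokens: 300,
  outputTokens: 251,
  outputTokensPerSec: 350,
});
console.log(`${simpleForm.toFixed(2)} seconds`);
```

Even with generous assumptions, the extra terms push the small form well past the naive one-second estimate, and error correction would add further round trips on top.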

A more realistic scenario might be a checkout form with a number of items pre-populated, where you need to get the user’s confirmation for a purchase.

import React, { useState } from "react";

// Main App component
const App = () => {
 // State for form fields
 const [formData, setFormData] = useState({
 fullName: "",
 email: "",
 address: "",
 city: "",
 zip: "",
 cardNumber: "",
 expiryDate: "",
 cvv: "",
 });

 // State for shopping basket items
 const [basketItems, setBasketItems] = useState([
 { id: 1, name: "Wireless Headphones", price: 129.99, quantity: 1 },
 { id: 2, name: "Smartwatch", price: 199.99, quantity: 1 },
 { id: 3, name: "Portable Bluetooth Speaker", price: 79.99, quantity: 1 },
 ]);

 // Handle input changes for form fields
 const handleInputChange = (e) => {
 const { name, value } = e.target;
 setFormData({ ...formData, [name]: value });
 };

 // Handle deleting an item from the basket
 const handleDeleteItem = (id) => {
 setBasketItems(basketItems.filter((item) => item.id !== id));
 };

 // Calculate total price of items in the basket
 const calculateTotal = () => {
 return basketItems
 .reduce((total, item) => total + item.price * item.quantity, 0)
 .toFixed(2);
 };

 // Handle checkout button click
 const handleCheckout = () => {
 // In a real application, you would send formData and basketItems to a server
 console.log("Checkout initiated!");
 console.log("Form Data:", formData);
 console.log("Basket Items:", basketItems);
 alert("Checkout successful! (This is a demo)"); // Using alert for demo purposes
 };

 return (
 <div className="min-h-screen bg-gray-100 flex items-center justify-center p-4">
 <div className="bg-white p-8 rounded-xl shadow-lg w-full max-w-4xl flex flex-col lg:flex-row gap-8">
 {/* Customer Information Section */}
 <div className="flex-1">
 <h2 className="text-3xl font-extrabold text-gray-800 mb-6 text-center">
 Checkout
 </h2>

 {/* Contact Information */}
 <div className="mb-6">
 <h3 className="text-xl font-semibold text-gray-700 mb-4">
 Contact Information
 </h3>
 <div className="grid grid-cols-1 md:grid-cols-2 gap-4">
 <div>
 <label
 htmlFor="fullName"
 className="block text-sm font-medium text-gray-600 mb-1"
 >
 Full Name
 </label>
 <input
 type="text"
 id="fullName"
 name="fullName"
 value={formData.fullName}
 onChange={handleInputChange}
 className="w-full p-3 border border-gray-300 rounded-lg focus:ring-2 focus:ring-blue-500 focus:border-transparent transition duration-200"
 placeholder="John Doe"
 />
 </div>
 <div>
 <label
 htmlFor="email"
 className="block text-sm font-medium text-gray-600 mb-1"
 >
 Email
 </label>
 <input
 type="email"
 id="email"
 name="email"
 value={formData.email}
 onChange={handleInputChange}
 className="w-full p-3 border border-gray-300 rounded-lg focus:ring-2 focus:ring-blue-500 focus:border-transparent transition duration-200"
 placeholder="john.doe@example.com"
 />
 </div>
 </div>
 </div>

 {/* Shipping Address */}
 <div className="mb-6">
 <h3 className="text-xl font-semibold text-gray-700 mb-4">
 Shipping Address
 </h3>
 <div className="grid grid-cols-1 gap-4">
 <div>
 <label
 htmlFor="address"
 className="block text-sm font-medium text-gray-600 mb-1"
 >
 Address
 </label>
 <input
 type="text"
 id="address"
 name="address"
 value={formData.address}
 onChange={handleInputChange}
 className="w-full p-3 border border-gray-300 rounded-lg focus:ring-2 focus:ring-blue-500 focus:border-transparent transition duration-200"
 placeholder="123 Main St"
 />
 </div>
 <div className="grid grid-cols-1 md:grid-cols-2 gap-4">
 <div>
 <label
 htmlFor="city"
 className="block text-sm font-medium text-gray-600 mb-1"
 >
 City
 </label>
 <input
 type="text"
 id="city"
 name="city"
 value={formData.city}
 onChange={handleInputChange}
 className="w-full p-3 border border-gray-300 rounded-lg focus:ring-2 focus:ring-blue-500 focus:border-transparent transition duration-200"
 placeholder="Anytown"
 />
 </div>
 <div>
 <label
 htmlFor="zip"
 className="block text-sm font-medium text-gray-600 mb-1"
 >
 Zip Code
 </label>
 <input
 type="text"
 id="zip"
 name="zip"
 value={formData.zip}
 onChange={handleInputChange}
 className="w-full p-3 border border-gray-300 rounded-lg focus:ring-2 focus:ring-blue-500 focus:border-transparent transition duration-200"
 placeholder="12345"
 />
 </div>
 </div>
 </div>
 </div>

 {/* Payment Information */}
 <div>
 <h3 className="text-xl font-semibold text-gray-700 mb-4">
 Payment Information
 </h3>
 <div className="grid grid-cols-1 gap-4">
 <div>
 <label
 htmlFor="cardNumber"
 className="block text-sm font-medium text-gray-600 mb-1"
 >
 Card Number
 </label>
 <input
 type="text"
 id="cardNumber"
 name="cardNumber"
 value={formData.cardNumber}
 onChange={handleInputChange}
 className="w-full p-3 border border-gray-300 rounded-lg focus:ring-2 focus:ring-blue-500 focus:border-transparent transition duration-200"
 placeholder="**** **** **** ****"
 />
 </div>
 <div className="grid grid-cols-2 gap-4">
 <div>
 <label
 htmlFor="expiryDate"
 className="block text-sm font-medium text-gray-600 mb-1"
 >
 Expiry Date
 </label>
 <input
 type="text"
 id="expiryDate"
 name="expiryDate"
 value={formData.expiryDate}
 onChange={handleInputChange}
 className="w-full p-3 border border-gray-300 rounded-lg focus:ring-2 focus:ring-blue-500 focus:border-transparent transition duration-200"
 placeholder="MM/YY"
 />
 </div>
 <div>
 <label
 htmlFor="cvv"
 className="block text-sm font-medium text-gray-600 mb-1"
 >
 CVV
 </label>
 <input
 type="text"
 id="cvv"
 name="cvv"
 value={formData.cvv}
 onChange={handleInputChange}
 className="w-full p-3 border border-gray-300 rounded-lg focus:ring-2 focus:ring-blue-500 focus:border-transparent transition duration-200"
 placeholder="123"
 />
 </div>
 </div>
 </div>
 </div>
 </div>

 {/* Shopping Basket Section */}
 <div className="flex-1 bg-gray-50 p-6 rounded-xl shadow-inner">
 <h3 className="text-2xl font-extrabold text-gray-800 mb-6 text-center">
 Your Basket
 </h3>
 {basketItems.length === 0 ? (
 <p className="text-center text-gray-500">Your basket is empty.</p>
 ) : (
 <div className="space-y-4">
 {basketItems.map((item) => (
 <div
 key={item.id}
 className="flex items-center justify-between bg-white p-4 rounded-lg shadow-sm border border-gray-200"
 >
 <div className="flex-grow">
 <p className="font-semibold text-gray-800">{item.name}</p>
 <p className="text-gray-600 text-sm">
 ${item.price.toFixed(2)} x {item.quantity}
 </p>
 </div>
 <button
 onClick={() => handleDeleteItem(item.id)}
 className="ml-4 p-2 bg-red-500 text-white rounded-full hover:bg-red-600 focus:outline-none focus:ring-2 focus:ring-red-500 focus:ring-opacity-50 transition duration-200"
 aria-label={`Delete ${item.name}`}
 >
 <svg
 xmlns="http://www.w3.org/2000/svg"
 className="h-5 w-5"
 viewBox="0 0 20 20"
 fill="currentColor"
 >
 <path
 fillRule="evenodd"
 d="M9 2a1 1 0 00-.894.553L7.382 4H4a1 1 0 000 2v10a2 2 0 002 2h8a2 2 0 002-2V6a1 1 0 100-2h-3.382l-.724-1.447A1 1 0 0011 2H9zM7 8a1 1 0 012 0v6a1 1 0 11-2 0V8zm6 0a1 1 0 012 0v6a1 1 0 11-2 0V8z"
 clipRule="evenodd"
 />
 </svg>
 </button>
 </div>
 ))}
 </div>
 )}

 <div className="mt-8 pt-4 border-t-2 border-gray-200 flex justify-between items-center">
 <p className="text-xl font-bold text-gray-800">Total:</p>
 <p className="text-xl font-bold text-blue-600">
 ${calculateTotal()}
 </p>
 </div>

 <button
 onClick={handleCheckout}
 className="mt-6 w-full py-4 bg-blue-600 text-white text-lg font-semibold rounded-lg shadow-md hover:bg-blue-700 focus:outline-none focus:ring-2 focus:ring-blue-500 focus:ring-opacity-50 transition duration-200"
 >
 Proceed to Checkout
 </button>
 </div>
 </div>
 </div>
 );
};

export default App;

4982 tokens. At 150 tokens per second just for the response we are looking at 33 seconds to generate the UI, and this is still a relatively simple UI.

Latency is across the full stack and we’re going to need a step change in performance to get to the 0.1 second threshold.

There seem to be multiple approaches to improve this performance and reduce latency. On one hand you have Groq making custom hardware, and on the other you have algorithmic changes like Text Diffusion (showcased at Google I/O 2025), with both appearing to deliver between 1,000 and 2,000 tokens per second. At those speeds, the checkout form above would be generated in roughly 2.5 to 5 seconds.

That’s an order of magnitude improvement in generation speed in the space of 2 years, but to get sub-second it looks like we need another order of magnitude improvement, so something in the 10,000 tokens per second range.
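
The arithmetic behind these figures is simple enough to sketch. The snippet below only divides the 4982-token count quoted above by different generation speeds, and then inverts the calculation to see what speed a target time would need. It ignores prompt processing, thinking and network time, and the function names are made up for this example.

```javascript
const CHECKOUT_TOKENS = 4982; // token count quoted above for the checkout form

// Seconds needed to emit a response at a given generation speed.
function generationSeconds(tokens, tokensPerSecond) {
  return tokens / tokensPerSecond;
}

// Tokens per second needed to finish within a target time.
function requiredTokensPerSecond(tokens, targetSeconds) {
  return tokens / targetSeconds;
}

for (const speed of [150, 1000, 2000, 10000]) {
  const seconds = generationSeconds(CHECKOUT_TOKENS, speed);
  console.log(`${speed} tok/s -> ${seconds.toFixed(1)} s`);
}

for (const target of [1.0, 0.1]) {
  const speed = requiredTokensPerSecond(CHECKOUT_TOKENS, target);
  console.log(`${target} s target -> ${Math.round(speed)} tok/s`);
}
```

At 10,000 tokens per second the form takes about half a second, while hitting the 0.1 second limit for output alone would need roughly 50,000 tokens per second.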

To me HTML, CSS and JS feel like the right level of abstraction for generating UI inside LLMs, firstly because we can generate them for any platform, Web or Native app. But given the languages’ relative verbosity, it does raise the question of whether an intermediate representation of UI that is more compressed and quicker to generate might be a better approach. For example, I could imagine a constrained set of “Web Component interfaces”, or maybe we just use smaller “lower quality” models, or maybe we just wait for another step change to happen in the models and hardware.

To me, the direction of travel is clear. UI generation to service user-goals is going to happen.

A link is all you need

Sat, 17 May 2025 19:59:06 +0000

I’ll keep playing here while the rest of you flirt with apps. I’ll be here when you come back. I know it’s going to happen. Here’s why.

Linking.

Dave Winer - 2011

The web has a lot going for it. We coined the term SLICE (Secure, Linkable, Indexable, Composable, and Ephemeral) to describe its benefits, but at its purest essence the hyperlink is the thing that makes the web the web. It’s a thing of beauty. It’s why I fell in love with the Web. Click. Something new! It’s why I still love the web, and it’s the thing that is unique to the medium: in most cases there is something at the end of a link. Yes, there is link rot, but you don’t have to install anything to get it running.

But there’s something bugging me. It’s the over-confidence of the industry that the web will weather any storm. “The web will always win” is often quoted whenever someone suggests that there is an existential threat to the web and the other side doesn’t think any of the proposals to address the challenge are needed.

Mobile and the rise of native apps was one of those challenges that I believe was a potential extinction-level event for the web. The web as experienced by people didn’t work well on mobile, so people looked for apps (they were even told to: “There’s an app for that”). At the same time, billions of people were getting their first computing experience through a mobile device, and the web wasn’t something they had grown up around; it just didn’t occur to them as a thing that they should use.

Yes, Apple through Safari introduced many technologies that would let the web work well on mobile (e.g., viewports, touch, multi-touch and media queries were probably the biggest innovations at the time), but web developers just didn’t shift to match the expectations people had of their mobile devices.

We needed a change in how we thought about the web and how it should be used in this new context. From my own personal experience, it wasn’t until Google’s “mobile-first” push in 2015 that you really started to see a change in how the web was experienced on mobile.

As I look at my own usage of LLMs today, there is a change in how I use the web and I am uneasy, but it wasn’t until recently that I was able to put my finger on why. Yes, as LLMs drive down the cost of creating content, a lot more low-quality content gets created, and at the same time a lot of good experiences become easy to create (the structure of this very blog was made using an LLM). But the only way you get to discover and experience content is if you can navigate to it.

I was chatting with a colleague about the intersection of AI tooling and the web, and the following thought popped into my mind: “If you had a machine that could instantly recall or create any facet of information, do you need a link?”

It’s the link that I am worried about and I now think about this constantly.

The way that we, the people who build sites, create links is as a way of saying “I think this is important”, “I think this is related or has more context”, “I think you should look at this”. And what we know as a hyperlink, a thin blue line underneath some text, made by wrapping it in an <a> tag and connecting two documents together in a directed graph structure, is just a construct of the technology that renders web pages.

What I am experiencing doesn’t feel a million miles away from the original definition of hypertext. The idea that you could have a machine that could recall any piece of information and then connect it with any other piece of text. Specifically, Transclusions feel pretty close to where I see LLMs going.

In computer science, transclusion is the inclusion of part or all of an electronic document into one or more other documents by reference via hypertext. Transclusion is usually performed when the referencing document is displayed, and is normally automatic and transparent to the end user. The result of transclusion is a single integrated document made of parts assembled dynamically from separate sources, possibly stored on different computers in disparate places.

Emphasis mine — Transclusions
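As a rough illustration of transclusion, here is a minimal sketch that assembles a document from referenced parts at display time. The `{{include:id}}` marker syntax and the `transclude` function are hypothetical, made up for this example; they are not part of any hypertext standard.

```javascript
// Hypothetical syntax: {{include:id}} marks a reference to another document.
// `documents` is a Map from document id to its text.
function transclude(docId, documents, seen = new Set()) {
  if (!documents.has(docId)) throw new Error(`Unknown document: ${docId}`);
  if (seen.has(docId)) throw new Error(`Circular transclusion at: ${docId}`);
  // Track the chain of includes so a document cannot include itself.
  const nextSeen = new Set(seen).add(docId);
  return documents
    .get(docId)
    .replace(/\{\{include:([\w-]+)\}\}/g, (_, refId) =>
      transclude(refId, documents, nextSeen)
    );
}

const documents = new Map([
  ["quote", "the hyperlink is the thing that makes the web the web"],
  ["post", "As I wrote before: {{include:quote}}."],
]);

console.log(transclude("post", documents));
// prints "As I wrote before: the hyperlink is the thing that makes the web the web."
```

The source text stays in one place and is pulled in by reference, which is the part that feels close to how an LLM merges material from many documents into a single response.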

It feels like the directed edge that defines a link in this massive directed graph that we know as the web is changing, as LLMs seem to be able to connect concepts across many documents and just merge them into the response.

This directed graph nature of the web has been fundamental to how we experience the web. It enables things like PageRank to exist, which to my understanding has the link imply some level of authority. It’s unclear to me if this is of any importance to an LLM. Is the link just a way to point a web crawler to another page so it can be ingested? If so, then is the only way to have an LLM not ingest your content to make it undiscoverable? Maybe it’s to put it behind a login wall. This feels like a big step backwards on both fronts; however, with things like Substack and Medium, the latter seems to be the way it’s going.
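To show how a link can imply authority, here is a minimal sketch of the textbook PageRank power iteration over a tiny link graph. It is a simplified illustration, not how any search engine computes rankings today; the page names and the damping value of 0.85 are assumptions chosen for the example.

```javascript
// links: object mapping each page to the list of pages it links to.
function pageRank(links, { damping = 0.85, iterations = 50 } = {}) {
  const pages = Object.keys(links);
  const known = new Set(pages);
  const n = pages.length;
  // Start with rank spread evenly across all pages.
  let rank = Object.fromEntries(pages.map((p) => [p, 1 / n]));

  for (let i = 0; i < iterations; i++) {
    // Every page keeps a small baseline, independent of inbound links.
    const next = Object.fromEntries(pages.map((p) => [p, (1 - damping) / n]));
    for (const page of pages) {
      const outgoing = links[page].filter((target) => known.has(target));
      // A page with no outgoing links shares its rank with every page.
      const targets = outgoing.length > 0 ? outgoing : pages;
      const share = (damping * rank[page]) / targets.length;
      for (const target of targets) next[target] += share;
    }
    rank = next;
  }
  return rank;
}

// Hypothetical four-page site: nothing links to "orphan".
const web = {
  home: ["about", "blog"],
  about: ["home"],
  blog: ["home", "about"],
  orphan: ["home"],
};

console.log(pageRank(web));
```

In this toy graph "home" ends up with the highest score because the most pages point at it, while "orphan" keeps only the baseline share: each inbound link acts as a vote.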

The link can still point to content in the open or content that is private behind a login and enable instant access to any experience. So we still have that for now, but in a world where LLMs become the super app because they can recall and generate content and functionality in an instant, then what next?

I’m not sure if it’s a technology problem (i.e., links need to change) or if, like the mobile-first push, we just need to work out where the web fits in the grand scheme of things, but I don’t believe that “the web will always win” is a good enough answer.

Maybe we always just need to ask LLMs to include citations. Maybe we need to redefine what a link is. Maybe we need to rethink the capabilities of the platform. Maybe we need to create more incentives for putting content on the open web and linking to it directly.

This is something that I want to explore more in the future, and I don’t want to sit idly by and be reacting in 5 years like we did with mobile. I want to be proactive and help shape the future of the web.

super-apps

Mon, 12 May 2025 12:05:38 +0000

I spent two weeks in April wandering around Japan with my wife, daughter and parents - it was incredible. I used a browser twice. Most of the time that I spent with my phone wasn’t on the web or in traditional apps, it was in an LLM.

I would give the LLM photos of packets of food and ask what they were, and it would handily tell me the brand. Then I could follow up with a photo of the back of the packet and ask if there was milk in it (my daughter can’t drink milk), and it would explain the ingredients and potential allergens. Given my non-existent ability to read Japanese, I had to trust the LLM.

We went to Kyoto and I would ask the LLM what was written on the noticeboard of the shrine and what the cultural importance of it was, and it would tell me. I sat on a tourist train trundling through a valley pointing my camera at something that looked like a nuclear reactor - turns out it was a stadium.

While waiting outside a pharmacy I would ask the LLM to tell me the latest news about the local area and it would present a quick overview of what had happened in English from Japanese sources.

As I wandered around Himeji castle and had questions, I could just check the LLM. As I saw the fish gargoyles that adorn each of the roofs on the castle, I could ask what they were and their cultural relevance and get a comprehensive answer. A lot more than I could from the placards dotted around the site.

As we were riding the Shinkansen back to Tokyo I wondered how many sites were blocking LLM User-Agents, so I asked Gemini to build a script that would check (and copied it into my Android Linux terminal - it didn’t work because Python wasn’t installed, but it was so close).
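For a sense of what such a check involves, here is a minimal sketch in JavaScript (not the Python script Gemini produced). It is deliberately simplified: it only reports whether the robots.txt group matching a user agent contains `Disallow: /`, and ignores `Allow` rules, wildcards and path matching. The function names are hypothetical, and `siteBlocksAgent` assumes a runtime with a global `fetch`.

```javascript
// Split robots.txt text into groups of { agents, disallow }.
function parseRobotsGroups(robotsTxt) {
  const groups = [];
  let current = null;
  let lastWasAgent = false;
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue; // blank or malformed line
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group of rules.
      if (!lastWasAgent) {
        current = { agents: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      if (field === "disallow" && current) current.disallow.push(value);
      lastWasAgent = false;
    }
  }
  return groups;
}

// True when the group for `userAgent` (or "*" as a fallback) disallows "/".
function blocksAgent(robotsTxt, userAgent) {
  const groups = parseRobotsGroups(robotsTxt);
  const name = userAgent.toLowerCase();
  const specific = groups.filter((g) => g.agents.includes(name));
  const chosen =
    specific.length > 0 ? specific : groups.filter((g) => g.agents.includes("*"));
  return chosen.some((g) => g.disallow.includes("/"));
}

// Assumption: global fetch is available (modern browsers, recent Node.js).
async function siteBlocksAgent(origin, userAgent) {
  const response = await fetch(new URL("/robots.txt", origin));
  if (!response.ok) return false; // no robots.txt means nothing is declared
  return blocksAgent(await response.text(), userAgent);
}

const sampleRobots = ["User-agent: GPTBot", "Disallow: /", "", "User-agent: *", "Disallow: /private/"].join("\n");
console.log(blocksAgent(sampleRobots, "GPTBot")); // true
```

Running `siteBlocksAgent` over a list of origins, for crawler tokens such as `GPTBot` or `Google-Extended`, would give a rough count. Note that robots.txt is only a declaration; it says nothing about blocking done at the server or CDN level.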

I rarely left these tools and it’s been on my mind a lot.

A couple of months ago I had two ideas running around my head. The first was me musing about whether it was possible to have a future programming language built around prompts, and the second was whether it would be possible to build UIs based on a goal. Combining the two ideas, I created a little toy library called f and a demo that would build a UI for a site based on a request to a JSON API, using something as close to plain English as I can currently get. I was blown away by how far you can get by describing your goals. Want a form that collects data? Great, just describe it! Want a UI built for a random API? Just point it at the data and ask it to build the UI.

const getSpaceData =
 await f`fetch JSON from https://api.spaceflightnewsapi.net/v4/articles/`;

const spaceData = await getSpaceData();

// Describe the data structure so that the UI prompt has a better idea of what to build.

const generateSchema =
 await f`Return a JSON Schema for a given object. The schema should be in the format defined in https://json-schema.org/understanding-json-schema/reference/object.html and should include all the properties of the object. The schema should include the type of the property, the format of the property, the required status of the property, and the description of the property. The schema should include all the properties of the object. The schema should include the type of the property, the format of the property, the required status of the property, and the description of the property.`;

// Describe the data
const schemeDescription = await generateSchema(spaceData);

const buildSpaceUI =
 await f`Using the data defined in <output> create a UI that will best display the space flight information. The developer will provide the data as a parameter and it will be in the format defined in <output>.

<output>${JSON.stringify(schemeDescription)}</output>`;

document.body.appendChild(await buildSpaceUI(spaceData));

Dynamically generated UI from prompts
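To show the general mechanics of a prompt-as-function tagged template, here is a minimal sketch. It is not the implementation of the f library: `makeF` and `fakeModel` are hypothetical names, and `fakeModel` is a canned stand-in for a real LLM call so the example runs offline.

```javascript
// Hypothetical stand-in for an LLM: maps a prompt to the source of a function body.
async function fakeModel(prompt) {
  if (prompt.includes("add")) return "return args[0] + args[1];";
  throw new Error("fakeModel only understands prompts about adding");
}

// Builds a tagged-template function bound to a given model.
function makeF(model) {
  return async function f(strings, ...values) {
    // Rebuild the prompt by interleaving literal chunks and interpolated values.
    const prompt = String.raw({ raw: strings }, ...values);
    const body = await model(prompt);
    // Compile the generated body. Real generated code should be sandboxed
    // (for example in a worker or iframe) rather than executed like this.
    const compiled = new Function("args", body);
    return async (...args) => compiled(args);
  };
}

const f = makeF(fakeModel);

f`add the two numbers passed in`
  .then((add) => add(2, 3))
  .then((sum) => console.log(sum)); // logs 5
```

The design choice worth noticing is that the template returns a function rather than a value, so the expensive generation step happens once and the result can be called repeatedly with different data.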

So why are Japan and ‘f’ in the same article?

I had to work for two days on this vacation, and I was describing my use of Gemini and ChatGPT to a friend, relating it to how every trip I make to China shows me how pervasive WeChat is across people’s lives. Gemini and ChatGPT were my super-app. Yes, on this trip I wasn’t ordering food, cars, laundry or anything else, but for my needs both Gemini and ChatGPT gave me everything I needed: translations, background information, local news, and even a bit of work that I needed to think about.

In that conversation, I was describing how I took a photo of what looked like a nuclear reactor and ChatGPT built a little program that scanned and panned the image to find where I was and what I was looking at. It built a mini application to solve the problem (see below), and it hit me…

Chat GPT thinking and building - part 1

Chat GPT thinking and building - part 2

We’re not far away from tasks being completed, whether through text replies and “thinking tokens” or through dynamic UIs and applications built to service a single user request, directly inside the LLM.

I started to describe this in “The Disposable Web”: it is becoming easier to create software that solves one problem once. When I compare what can be created today in the canvases of these tools against the many run-of-the-mill CRUD-style experiences that operate inside WeChat, it feels easy (for me at least) to draw a connection: we are not far away from these applications being built dynamically inside an LLM to service the needs of the user. And when that happens, what’s next for the web?

Many of the apps inside WeChat are not incredibly complex, and I can see a straight line between WeChat and the experience I had in Gemini and ChatGPT. Yes, the experience in ChatGPT took a while to create (according to the “thinking” timing, it was over 90 seconds) and today that is far too slow for applications as we know them. If we use the Jakob Nielsen research, “0.1 seconds (100 ms) creates the illusion of instantaneous response” (2025), as a target, then it looks like we have to make at least a 100-fold improvement to token generation to build blocking-free UI. How many tokens/s do we need to make things feel instant? I think that might come sooner rather than later: model improvements seem inevitable, and hardware improvements like Groq’s are showing that there is already a path.

HTML, CSS and JavaScript are the most expressive languages available today to render a UI and LLMs are pretty good today at generating them, so to me there is a world where this will be the easiest route to build UI that will service the specific needs of a user request directly in one of these LLMs and you rarely ever need to leave.

If you combine this with Agent communication protocols like MCP which are changing the way that we chain more complex apps and tasks together, it really feels like we are at the start of a major transition in computation and user interaction.

I argue pretty strongly that the web’s superpower is the link. It lets anyone click on it and navigate to an experience. App platforms would kill for this power, because today they are constrained by restrictive review processes and by restrictions on what can run in their sandboxes (there are limits to what they allow developers to run dynamically; e.g., iOS only relatively recently allowed non-browser apps to run dynamic JS). Tools like ChatGPT are doing an end-run around this restriction, and to me the power of the link is in question. Who needs a link anymore when you can recall any text or will any experience into existence?

HTML and JavaScript are the most expressive languages to render a UI. It’s not a stretch for a UI to be created to service the specific needs of a user request, or when an Agent wants some form of human input. If any application can do this, then what is the future for web apps? Will apps just live inside these apps like mini-apps live inside WeChat? Should the Web Platform engineers at browser companies invest more in in-app experiences (i.e., WebView)? Should we as an industry do more to enable any website in the browser to do the same?

There’s a lot the web can already do to be the primary platform for this type of experience… We have sandboxing for arbitrary code execution, be it JS or any other flavour with WASM. We have the ability to run arbitrary code in a WebView, and we have the ability to run arbitrary code in a web worker away from the UI. I wonder if there will be a future where we pull in small parts of existing web apps’ DOM and run them inside a new super-app, or even define custom-element contracts that will enable us to load widgets from other sites and enable “my” UI to be surfaced in the app.

I actually don’t know where the future will go, but my recent experiences lean me towards Gemini or ChatGPT being a new type of headless web, and I don’t think we are far away from having the everything app for the West.

Who needs a browser anymore?

transition

Fri, 09 May 2025 19:05:38 +0000

I remember the exact moments when major transitions in my life happened. The first time I got a BASIC program working. The first time someone used one of my programs. My first email. The first time I visited a website. The first time I made a website. The first time I saw my future wife. The first time I held each of my children.

In each of these moments, I knew things had changed, and at the same time I had no comprehension of how they would change the direction of my life. It was like there was a fog in front of me; I could only see vague outlines of things.

When I look back on my career as a Web Developer, I’ve been involved in many major transitions: The Web being a thing that you had to care about; Dial-up to Broadband enabling a step-change in the types of experiences we could interact with; The Desktop to Mobile transition and all the change that this brought for the web - At Google we felt that we had to go bring the web to mobile, so I focused on Mobile-first as a primary motivator, then a push on Progressive Web Apps as the way that all apps should be built and experienced.

Through these later transitions I felt I could clearly see the path that I or my teams should take. As soon as I got my hands on an iPhone, it felt clear to me that the web needed to work well on mobile because this was where the future would be. Later, given the growth of mobile and the fact that for billions of people it was their first experience with a computer, it became even more obvious. It felt clear that there was an existential risk to the web and that there would need to be a solution to the centralization of apps. Our solution would become “Progressive Web Apps”. It all felt pretty clear.

But each of these transitions to the web wrought a lot of change.

Dial-up to Broadband - The web became a desktop-class hosting platform for applications and fundamentally changed how Windows worked, leading to the rise of Web aggregators like Google and the creation of services like YouTube, all because the increase in bandwidth and reduction in latency meant that you could surf more and developers could deliver more. Many of the services we once relied on (anyone remember MapQuest?) were obliterated nearly overnight by more interactive and engaging services. Always being on meant always having access to email, instant messaging, social, Flash games, audio and video.
Desktop to Mobile - You could see the web just not working well on iPhone-like mobile devices, and while it took a bit of time for apps to find their mobile-first footing (i.e., not just a port of the desktop experience), the web clearly took a lot longer to be mobile-first, or even responsive. Yes, responsive design had been around prior to the launch of the iPhone, but it wasn’t until 2015, when Google Search started a mobile-first push, that the ecosystem really started to move. My own involvement in this project started a lot earlier, as I could see explosive growth in mobile: reductions in the price of devices and massive improvements to connectivity in India and China meant that people’s first computing devices might never highlight the web.

Some people see the transition far ahead of time. I had friends who built a web browser for Windows CE because they saw the rise of Nokia phones, could see the next jump to a more powerful experience, and wanted to ensure that the web worked well on these devices. I didn’t see that then, I just saw terrible WAP sites. However, I could feel the change with the iPhone.

It was the same with LLMs. The first time that I used an LLM was in the OpenAI playground… I believe it was GPT 2, and I just didn’t think that this was going to be a fundamental shift in the way that we interact with computers. I couldn’t see how this was going to change the way that we interacted with the web. I didn’t see how this would change how I build software almost overnight. I just couldn’t see how this could change the way that we interacted with our devices.

I remember thinking: “That’s a neat trick”. And then I got on with my life.

Like many people, I think this changed when we first played with ChatGPT. It wasn’t perfect, it was slow, but I got it building a simple web app in a few minutes and I could feel that this had the potential to change the way that I work and the way that people interact with computers. I remember clear as day, sitting next to my wife on the couch saying that everything is going to change.

It feels clear that we are in the midst of another major transition, and I’m at a personal transition too. I’m at a point where I am thinking about what I want to do next in what I think will be the next huge shift for computing. I’ve been working on the web for nearly 30 years, and working at Google for 15 of those, and my fundamental question is: What is going to happen to the web?

What I do know is that I love the web, I think it’s the best platform to write once and reach everyone. I love seeing that the web is the place where people are experimenting with the entire range of delivery of AI experiences to people, be it access to Large Language Models, Image Segmentation, Video analysis, or even new flavours of search.

I want to be at the forefront of the medium that is the web, and the potential new platform that is “AI” (we need a much better name) and aifoc.us is my own personal place to muse on a lot of questions about this transition.

ps, Blame Barry Pollard for this domain name.