Interfaces
Interfaces are the access points of augmented intelligence. They determine not just how you use AI, but what you can use it for. A chatbox constrains you to text. A voice agent frees your hands. A visual interface reveals patterns invisible in words. The interface shapes the cognition.
The Interface Shapes What Is Possible
Most people's experience of AI is a chat window. You type a message, the AI types back. This is the dominant paradigm, and it is profoundly limiting — not because chat is bad, but because it represents a single, narrow channel through which human-AI collaboration must squeeze.
Consider a parallel from history. The first automobiles were designed to look like horse-drawn carriages without the horse. It took decades before designers realised that a self-powered vehicle did not need to replicate the form of a horse-drawn one. The same mistake is happening with AI interfaces. We have a technology capable of processing language, vision, audio, and structured data simultaneously — and we have given it a text box.
The interface is not a neutral conduit. It actively shapes what problems you think to solve, how you frame them, and what kinds of answers are possible. No one would work through a mail slot when the door stands open. Yet that is essentially what a chat-only interface imposes on AI collaboration.
This is why interfaces are a core element of the augmented intelligence framework. AI's ability to help you is bounded not just by the model's capability but by the interface through which you access that capability.
Natural Language as an Interface Paradigm
Natural language is the oldest and most intuitive interface humans possess. We have been communicating in natural language for tens of thousands of years. The fact that AI can now understand and generate natural language is revolutionary — it removes the need to learn specialised query languages, programming syntax, or complex menu systems to interact with powerful computation.
But natural language has specific properties that make it both powerful and problematic as an AI interface:
- Ambiguity is a feature, not a bug. Human language is inherently ambiguous. "Make this better" means different things to a writer, a designer, and an engineer. In human conversation, we resolve ambiguity through context, tone, and shared understanding. AI must resolve it through inference, which means it frequently resolves it incorrectly.
- Implicit context is invisible. When you ask a colleague to "draft the proposal," you share context about which proposal, what format, what audience, what tone. When you ask AI the same thing, all of that context must be made explicit — or the AI will guess, and guessing is where errors begin.
- Natural language is lossy. Complex ideas lose precision when expressed in everyday language. A data structure, a mathematical formula, a visual layout — these can be described in words, but the description is always less precise than the thing itself.
The paradox: Natural language is the most accessible AI interface and the most error-prone. The skill of working with AI through natural language lies in knowing when to be precise, when to provide context, and when to switch to a different modality entirely.
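One practical way to manage the invisible-context problem is to attach, explicitly, the context a colleague would already share. The sketch below is illustrative, not a prescribed API; the field names (audience, format, tone, background) are assumptions about what context typically matters.

```python
# A minimal sketch of making implicit context explicit before sending a
# natural-language request to an AI model. All field names are illustrative.

def build_prompt(task: str, *, audience: str, format: str, tone: str,
                 background: str = "") -> str:
    """Attach the context a colleague would already share to a bare request."""
    parts = [
        f"Task: {task}",
        f"Audience: {audience}",
        f"Format: {format}",
        f"Tone: {tone}",
    ]
    if background:
        parts.append(f"Background: {background}")
    return "\n".join(parts)

prompt = build_prompt(
    "Draft the proposal",
    audience="the client's finance team",
    format="two-page summary with a budget table",
    tone="formal",
    background="follow-up to last week's scoping call",
)
print(prompt)
```

The point is not the specific fields but the discipline: every line in the output is context the AI would otherwise have to guess.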
Voice Agents and Conversational AI
Voice interfaces add a dimension that text cannot: they free your body. You can interact with AI while driving, cooking, walking, or working with your hands. This is not a convenience — it is a fundamental expansion of when and where AI collaboration can happen.
Voice also changes the nature of the interaction. Speaking is faster than typing for most people. It encourages longer, more exploratory prompts. It naturally invites back-and-forth dialogue rather than one-shot queries. People tend to explain more context verbally than they would type, which often produces better results from the AI.
The challenges are real, however. Voice is ephemeral — once spoken, the words are gone unless recorded and transcribed. It is difficult to edit a voice prompt the way you would revise a written one. It is poorly suited for precise, structured input. And voice interfaces require robust speech recognition, which still struggles with accents, background noise, and domain-specific terminology.
The most effective voice AI systems are not trying to replicate human conversation. They are designed around the strengths of voice — speed, hands-free operation, natural exploration — while compensating for its weaknesses through confirmation, summarisation, and seamless handoff to visual interfaces when precision is needed.
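The confirm-and-summarise pattern can be sketched as a small control loop. This is a sketch under stated assumptions: `transcribe`, `summarise`, `confirm`, and `execute` are hypothetical hooks standing in for whatever speech and model back ends you use, not functions from any real library.

```python
# A sketch of the confirm-and-summarise pattern for voice input: because
# speech recognition is error-prone and spoken words are ephemeral, restate
# the inferred intent and get confirmation before acting on it.

def handle_utterance(audio, transcribe, summarise, confirm, execute):
    text = transcribe(audio)                   # speech -> text (error-prone)
    intent = summarise(text)                   # condense to a checkable restatement
    if confirm(f"Did you mean: {intent}?"):    # cheap guard against mishearing
        return execute(intent)
    return None                                # decline: e.g. hand off to a visual interface

# Stub back ends, purely for demonstration:
result = handle_utterance(
    audio=b"...",
    transcribe=lambda a: "book a table for two at seven",
    summarise=lambda t: t,
    confirm=lambda q: True,
    execute=lambda i: f"done: {i}",
)
print(result)
```

The design choice worth noting: the confirmation step trades a little speed for a large reduction in acted-upon transcription errors, and the `None` branch is where a handoff to a screen would live.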
Visual Interfaces and Information Display
Humans are visual creatures. We process visual information faster than text, we detect patterns in images that are invisible in tables of numbers, and we remember visual information more reliably than verbal information. Yet most AI interfaces present their output as text.
Visual AI interfaces open powerful possibilities:
- Data visualisation. AI that can generate and manipulate charts, graphs, and dashboards transforms analysis from "read the numbers" to "see the pattern." This is not decoration — visual pattern recognition is a cognitive capability that text cannot replicate.
- Direct manipulation. Instead of describing what you want in words, you show the AI. Drag elements, draw boundaries, highlight sections. Direct manipulation interfaces reduce the translation gap between your intent and the AI's understanding.
- Generative visual tools. Image generation, design systems, and layout tools that let you iterate visually — sketch an idea, have the AI refine it, adjust, iterate. The feedback loop is faster and more intuitive than describing visual changes in text.
- Annotation and overlay. AI that can annotate existing content — highlighting key passages in a document, flagging anomalies in an image, marking areas of concern in a codebase — adds a visual layer of intelligence to your existing workflows.
The challenge is design. A bad visual interface is worse than no interface at all — it creates confusion, hides information, and adds cognitive load rather than reducing it. The principles of meta-cognition apply directly: the interface should help you think, not distract you from thinking.
AR, VR, and Spatial Computing
Spatial computing represents the frontier of AI interfaces — the possibility of embedding AI assistance directly into your physical environment. Instead of switching to a screen to ask AI a question, the information appears where you need it, when you need it, anchored to the physical objects and spaces you are working with.
This is not science fiction. Augmented reality headsets can already overlay information on physical objects. A technician repairing equipment can see step-by-step instructions floating next to the component they are working on, generated by an AI that understands the equipment model, the specific fault, and the technician's skill level. A surgeon can see real-time data overlaid on the patient's body. An architect can walk through a building that does not yet exist.
The implications for augmented intelligence are profound. Spatial interfaces dissolve the boundary between "using AI" and "doing work." You are no longer switching between your task and your AI assistant — the assistance is woven into the task itself. This reduces cognitive switching costs, preserves flow state, and enables forms of human-AI collaboration that are impossible on a flat screen.
The technology is still maturing. Current devices are expensive, bulky, and limited in field of view. But the trajectory is clear, and the interface paradigm it represents — AI as an invisible layer of intelligence over your physical reality — will define the next generation of augmented intelligence tools.
The UX Challenge: Making AI Accessible
The most capable AI in the world is useless if people cannot figure out how to use it effectively. This is the core UX challenge of augmented intelligence, and it is harder than it appears.
Traditional software has discoverable interfaces — menus, buttons, tooltips. You can explore and learn by clicking around. AI systems, particularly those driven by natural language, have an "empty text box" problem: the interface gives no indication of what the system can do, how to ask for it, or what good input looks like. This is why so many people underuse AI — not because the capability is missing, but because the interface fails to reveal it.
Good AI interface design addresses this through:
- Progressive disclosure. Start simple, reveal complexity as the user's skill grows. Do not present every capability at once.
- Suggested actions. Show the user what they can do, not just wait for them to figure it out. Pre-built prompts, template galleries, and contextual suggestions lower the barrier to effective use.
- Transparent limitations. Make it clear what the AI cannot do, where it is likely to make errors, and when the user should verify output. Hiding limitations does not make them go away — it makes them dangerous.
- Feedback loops. Let users indicate when output is good or bad, and use that feedback to improve future interactions. The interface should get better the more you use it.
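Two of these principles, suggested actions and feedback loops, can be combined in a very small mechanism: surface pre-built prompts, and rank them by how users rated past results. The sketch below is a minimal illustration; the class name, templates, and scoring rule are all assumptions, not a real product's API.

```python
# A minimal sketch combining "suggested actions" with a "feedback loop":
# show users what they can do, ordered by how helpful past users found it.

from collections import defaultdict

class SuggestionRanker:
    def __init__(self, templates):
        self.templates = list(templates)
        self.score = defaultdict(int)   # template -> net thumbs up/down

    def record_feedback(self, template: str, helpful: bool) -> None:
        """One thumbs-up adds a point; one thumbs-down removes a point."""
        self.score[template] += 1 if helpful else -1

    def suggestions(self, k: int = 3):
        """Return the k best-rated templates, best first."""
        return sorted(self.templates, key=lambda t: -self.score[t])[:k]

ranker = SuggestionRanker([
    "Summarise this document",
    "Find anomalies in this data",
    "Draft a reply in my tone",
])
ranker.record_feedback("Draft a reply in my tone", helpful=True)
ranker.record_feedback("Summarise this document", helpful=False)
print(ranker.suggestions(2))
```

Even this toy version satisfies the principle: the empty text box is replaced with concrete starting points, and the interface improves with use.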
The design principle: The best AI interface is one that makes your knowledge more powerful, not one that replaces the need for knowledge. It should amplify your existing capability, not create a dependency.
Continue Learning
Interfaces provide the access points. Next, explore the building blocks of structured thought — cognitive artifacts.