How to Work With AI Effectively: A Practical Guide to Prompts, Memory, and Token-Efficient Conversations

The single biggest predictor of whether you get a useful answer from an AI assistant is not the model — it is how you drive the conversation. The same system that produces a vague, hedging response for one person produces a sharp, usable one for another, because the second person framed the request well, managed the context, and knew when to start over.

This guide covers the four levers that matter most, in order: how you phrase requests, how you control the model’s “memory,” when to start a new conversation, and how to do all of it without burning tokens. The advice draws on the published prompting guidance from Anthropic and OpenAI, distilled into habits you can apply in any chat window.

1. Phrase requests so the model can’t misread them

Most weak answers trace back to a weak prompt. A model cannot read your intent — it can only read your words — so the goal is to remove ambiguity, not to be polite or elaborate.

Lead with the goal and the output format. Say what you want and how you want it back (“Give me a 5-bullet summary,” “Return valid JSON,” “Write 3 subject lines”). Format instructions do more work than any adjective.
Give the minimum context that changes the answer. Audience, constraints, tone, and the decision you’re trying to make — not your entire backstory. Relevant context sharpens the answer; irrelevant context dilutes it (and costs tokens).
Show one example when the format is specific. A single input/output example (a “few-shot” example) communicates a pattern far better than describing it. Examples beat adjectives.
Assign a role. “You are a senior security engineer reviewing this code” sets vocabulary, standards, and what the model treats as important.
Structure long prompts. Separate your instructions from your data with headings or tags so the model never confuses the two. Anthropic recommends XML-style tags; OpenAI recommends clear sections — identity, instructions, examples, context.
Ask for reasoning only when it helps. For genuinely hard problems, “think step by step” improves accuracy. For simple lookups it just wastes tokens.
Say what not to do and how long to be. “No preamble, under 150 words” prevents the wall of text most models default to.

2. Control the model’s “memory” — don’t leave it to chance

A chat model has no memory in the human sense. Each turn, it re-reads the visible conversation from scratch. What feels like “memory” is really two separate things: the rolling context window (everything currently visible in the thread) and the product’s explicit memory or custom-instructions features. Manage both deliberately.

Put durable rules where they persist. System prompts, custom instructions, “project” instructions, or a standing instructions file are re-applied on every turn. That is where standing preferences belong — not buried in message #3 of a long chat.
Lock the key facts by restating them. Models weight recent tokens heavily, so re-state the two or three constraints that must not be violated right before a critical step, even if you said them earlier.
Use the memory feature on purpose. If your tool can save facts across chats, save the ones you actually want remembered — and periodically review and prune what it has stored, so stale facts don’t quietly steer future answers.
Keep a reusable context block. Maintain a short paragraph you can paste into any new chat: the goal, the hard constraints, the decisions made so far, and the current state. This is your portable memory.
Watch for “context rot.” Once a thread fills with detours, corrections, and abandoned ideas, the model starts honoring stale or contradictory instructions. That is your signal to reset.

3. Know when to start a new conversation

Long threads are not free, and they are not always better. A fresh conversation is often the highest-leverage move you can make. Start over when:

The topic has materially changed. Old context now adds only noise and cost, and can bias the answer toward the previous subject.
Quality is degrading. The model repeats itself, forgets instructions you just gave, contradicts earlier turns, or loops on a fix that doesn’t work. A bloated context dilutes its attention — a reset usually beats arguing with it.
You’ve pivoted after many dead ends. Don’t drag the wreckage forward. Carry a clean summary instead.
The thread has grown huge. Every turn re-sends the entire history, so a long chat gets slower and more expensive with each message (more on that below).

The clean way to reset: ask the model for a tight handoff summary — decisions made, current state, open questions — start a new conversation, and paste that summary as your context block. You keep the signal and drop the noise.

4. Get the best result with the fewest tokens

Tokens are the unit AI models read and bill in, and the whole visible conversation is re-read and re-charged on every turn. That means token efficiency and answer quality usually move together: a lean, focused context produces both cheaper and better responses.

Send the relevant slice, not the whole thing. Paste the one function, not the entire repository; the section, not the whole document.
Summarize and prune. Replace a long pasted history with a five-line summary, and clear dead branches by starting fresh rather than scrolling past them.
Reference instead of re-pasting. “Using the spec above” is cheaper than pasting the spec a second time.
Batch related questions. One well-structured message with three questions beats three separate round-trips, each of which re-reads the whole thread.
Front-load stable content. If your tool supports prompt caching, putting reusable instructions and context at the very start of the prompt lets them be cached and reused cheaply across calls — a tactic both OpenAI and Anthropic call out explicitly.
Match the model to the job. Use a smaller, faster model for routine work and reserve the frontier model for genuinely hard reasoning. Anthropic notes that cost and latency are often better solved by choosing a different model than by prompt engineering.
Constrain the output. If you don’t need an essay, cap the length. Output tokens cost too.

The 60-second cheat sheet

Prompt: goal first, format explicit, minimum context, one example, a role, no ambiguity.
Memory: durable rules go in system/custom instructions; restate critical constraints; keep a reusable context block.
Reset: when the topic shifts, quality drops, or the thread balloons — summarize, start fresh, paste the summary.
Tokens: send the relevant slice, reference don’t repaste, batch questions, cache stable content, right-size the model.

None of this requires a more powerful model or a paid course. It is a handful of habits — be explicit, manage context on purpose, reset early, and keep the conversation lean — that compound into noticeably better and cheaper results every time you open a chat window. And if your work lives in a workspace instead of a chat box, the same habits carry over to Notion AI.

How to Work With AI Effectively: A Practical Guide to Prompts, Memory, and Token-Efficient Conversations

1. Phrase requests so the model can’t misread them

2. Control the model’s “memory” — don’t leave it to chance

3. Know when to start a new conversation

4. Get the best result with the fewest tokens

The 60-second cheat sheet

Want AI news before everyone else?

Best Open-Source LLMs in 2026: The Models and How to Run Them

DeepSeek V4: The Complete Guide to the Open-Weight Frontier Model

Grok by xAI: The Complete Guide to Models, Features, and the API

Perplexity: The Answer Engine That Replaced Google Searches for AI Users