How versim builds a persona. The methodology.

What a persona actually is, here

A versim persona is a typed role-play of a specific real person from a community we have permission to model. Not a generic "senior X" stereotype. One person's career, tools, tones, opinions, and signature phrasing, compressed into a structured profile and made queryable.

The real person never sees your question. You never see the real person. The anonymiser is the only path between you.

the pipeline

Six stages, archive to answer

Archive in, anonymized when we load it. We start with a community's archive: messages, threads, timestamps, social context. We only work with archives we have permission to use. Identifiers are stripped at load time, so the rest of the pipeline never sees raw names, handles, or contact information. Communities can request deletion at any time.
A dossier per person. For each real person in the archive, we build one structured profile. It covers their career, the tools they use and how deeply, their speaking style, their tone in conversation, rough seniority and current goals, who they listen to, personal details, and a short summary. Every claim points to the exact message that supports it, checked against the source text at build time. Made-up content is a bug. We reject it in tests.
A team of agents builds it, not one big agent. Different aspects of a person need different lenses. We use a team of small specialised agents: one for career, one for tools, one for speaking style, one for tone, and so on. Each agent reads the archive and returns its specific slice with sources. One big monolithic agent would lose specificity and make things up. Many small agents stay focused.
The anonymiser is a one-way valve. Nothing crosses to you without going through it. The anonymiser strips real user IDs, message IDs, handles, display names, exact source-text spans of fifteen words or more, and any blocked phrases on the per-archive list. What passes through: anonymized response text, public labels like persona_7_of_8 that don't link back to anyone, and a count of how many spans were removed. You never see a dossier, a real name, or a long exact quote.
A panel builder picks who answers. You don't ask one persona. You ask a panel. The builder takes typed filters (size, role tier, activity level, tone exclusions, tool requirements, location, and current goals) and picks a balanced sample that matches them. The build is repeatable: the same filters with the same starting point give you the same panel every time.
The panel answers, a summarizer combines. Each persona in the panel reads the question and answers in their own voice, grounded in their dossier plus a balanced sample of their real messages for cadence. Each claim is linked to evidence. The per-persona answers go through the anonymiser, then to a summarizer agent that builds the panel's collective answer, surfaces themes, and counts the sentiment split. That's what reaches you.
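
To make the ordering in stage 6 concrete, here is a minimal Python sketch of the hand-off: raw persona answers are converted to a public shape before the summarizer sees them, and the summarizer then counts the sentiment split. Everything named here (RawAnswer, PublicAnswer, anonymise, summarize, the sentiment labels, and the scrub callback) is hypothetical and for illustration only. The label format is modelled on the example labels in this page, and in practice the summarizer is an agent, not a counter.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RawAnswer:
    """Hypothetical internal shape: stays inside the pipeline."""
    persona_id: str
    text: str
    sentiment: str


@dataclass(frozen=True)
class PublicAnswer:
    """Hypothetical public shape: the only thing the summarizer receives."""
    label: str
    text: str
    sentiment: str


def anonymise(raw, index, panel_size, scrub):
    # The public label carries a position in the panel, never the internal ID.
    width = len(str(panel_size))
    label = f"persona_{index:0{width}d}_of_{panel_size}"
    return PublicAnswer(label, scrub(raw.text), raw.sentiment)


def summarize(answers):
    # Stand-in for the summarizer agent: only the sentiment count is sketched.
    split = Counter(a.sentiment for a in answers)
    return {"panel_size": len(answers), "sentiment_split": dict(split)}


# Made-up toy data, not taken from any archive.
raw = [
    RawAnswer("u-101", "Dana here: I'd adopt it tomorrow.", "positive"),
    RawAnswer("u-102", "Too expensive for what it does.", "negative"),
    RawAnswer("u-103", "Works well for small teams.", "positive"),
]
scrub = lambda text: text.replace("Dana", "[name]")  # toy scrubber
public = [anonymise(r, i, len(raw), scrub) for i, r in enumerate(raw, start=1)]
summary = summarize(public)
```

Because summarize only accepts the public shape, the internal persona_id has no route into the collective answer in this sketch.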

stage 2 · zoomed in

What a dossier looks like

A dossier is the long-form output of the team of agents: a JSON document with typed fields. The current shape covers:

Career timeline

Roles, how long, transitions, role types.

Tools and depth

Which tools they actually use, and how deeply, based on how they talk about them.

Speaking style

Formal, casual, terse, verbose; how it shifts with topic.

Conversation tone

Friendly-casual, helpful-terse, sarcastic-but-helpful, complaining, self-deprecating, and so on.

Seniority + goals

Rough seniority and what they're currently trying to do: looking for info, switching jobs, hiring, investing, selling.

Who they listen to

Whose advice they take, who they push back on, who they ignore.

Personal details

Hobbies, opinions, location, signature humor, idioms, emoji.

Summary

A short summary that ties everything together.

Every entry in every field points to the exact messages that support it. If the dossier claims someone is a value investor, there is a real message backing that claim, with the exact text saved and checked against the source. The dossier itself stays private. It never leaves our infrastructure.
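
The build-time check described above could look roughly like the following Python sketch. It assumes each claim stores the exact quoted text alongside a message ID, and it flags any claim with no evidence or with a quote that does not appear in the referenced message. The Evidence and Claim types, the field names, and the unsupported_claims helper are assumptions made for illustration, not the actual dossier schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Evidence:
    message_id: str  # hypothetical internal ID, used only at build time
    quote: str       # exact text saved from the source message


@dataclass(frozen=True)
class Claim:
    field: str       # e.g. "career", "tools", "personal"
    statement: str
    evidence: Tuple[Evidence, ...]


def unsupported_claims(claims: List[Claim], archive: Dict[str, str]) -> List[Claim]:
    """Return claims with no evidence, or with a quote missing from its source."""
    rejected = []
    for claim in claims:
        supported = bool(claim.evidence) and all(
            ev.quote != "" and ev.quote in archive.get(ev.message_id, "")
            for ev in claim.evidence
        )
        if not supported:
            rejected.append(claim)
    return rejected


# Made-up toy archive and claims.
archive = {"m1": "I only buy companies trading below book value."}
claims = [
    Claim("personal", "Value investor", (Evidence("m1", "below book value"),)),
    Claim("tools", "Heavy spreadsheet user", ()),  # no evidence at all
    Claim("career", "Ex-banker", (Evidence("m1", "ten years at a bank"),)),
]
rejected = unsupported_claims(claims, archive)
```

A test suite can then assert that the rejected list is empty for every dossier, which is one way to treat made-up content as a bug.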

stage 5 · zoomed in

Building a panel

A panel is a typed sample drawn from the dossier set. The builder accepts filters like these:

Size — number of personas (typically 5 to 30).
Role tier — for example, only senior individual contributors.
Activity level — light, standard, or heavy posters.
Tone exclusions — drop toxic, gatekeeping, or other tones you don't want.
Tool requirements — must use a specific tool, optionally at a given depth.
Locations — markets or regions.
Goals — looking for info, investing, hiring, selling, and so on.

The build is repeatable. Two runs with the same filters and the same starting point give you the same panel. Two runs with the same filters and different starting points give you different but statistically matched panels. Either way, the resulting panel is inspectable.
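
As a rough illustration of the repeatability property, the Python sketch below reads "starting point" as a random seed, which is an assumption. It applies a few of the filters and then draws a seeded sample. The PersonaMeta type, the build_panel function, and its parameters are hypothetical, and the sketch uses a plain random draw where the real builder balances the sample.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PersonaMeta:
    """Hypothetical filterable metadata for one persona."""
    label: str
    role_tier: str
    activity: str
    tone: str
    tools: frozenset


def build_panel(pool, size, seed, role_tier=None, exclude_tones=(), required_tool=None):
    eligible = [
        p for p in pool
        if (role_tier is None or p.role_tier == role_tier)
        and p.tone not in exclude_tones
        and (required_tool is None or required_tool in p.tools)
    ]
    # Fix the order so that the seed alone decides the draw.
    eligible.sort(key=lambda p: p.label)
    if size > len(eligible):
        raise ValueError("not enough personas match the filters")
    return random.Random(seed).sample(eligible, size)


# Made-up pool of 40 personas.
pool = [
    PersonaMeta(
        label=f"p{i:02d}",
        role_tier="senior_ic" if i % 2 == 0 else "manager",
        activity="standard",
        tone="toxic" if i % 10 == 0 else "friendly-casual",
        tools=frozenset({"sql"}) if i % 3 else frozenset({"sql", "excel"}),
    )
    for i in range(40)
]
filters = dict(role_tier="senior_ic", exclude_tones=("toxic",))
panel_a = build_panel(pool, 5, seed=7, **filters)
panel_b = build_panel(pool, 5, seed=7, **filters)  # same seed: same panel
panel_c = build_panel(pool, 5, seed=8, **filters)  # new seed: a fresh draw
```

Sorting the eligible pool before sampling matters here: without a fixed order, the same seed could pick different personas if the pool arrived in a different sequence.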

stage 6 · zoomed in

How a persona answers

When a question comes in, each persona in the panel:

Reads its full structured dossier (career, tools, tone, personal details, summary).
Reads a balanced sample of its real messages, for voice cadence. The sample is picked for spread across topics, not concentrated on one thread.
Receives the question exactly as you wrote it.
Produces an answer in the source person's voice, grounded in dossier facts and message phrasings.
Links each claim to evidence: which dossier entries and which message spans support the answer.

The persona answer goes through the anonymiser before reaching the summarizer. The summarizer never sees the dossier or the raw answer. You never see them either.
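
One simple way to pick a message sample that is spread across topics is a round-robin draw, sketched below in Python. This assumes every message already carries a topic label. How topics are assigned is not specified here, and the balanced_sample function and the dictionary keys are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(messages, k, seed=0):
    """Pick up to k messages, cycling through topics so no topic dominates."""
    rng = random.Random(seed)
    by_topic = defaultdict(list)
    for msg in messages:
        by_topic[msg["topic"]].append(msg)
    for bucket in by_topic.values():
        rng.shuffle(bucket)

    topics = sorted(by_topic)
    picked = []
    # Take one message per topic per round until k is reached or all are used.
    while len(picked) < k and any(by_topic[t] for t in topics):
        for t in topics:
            if by_topic[t] and len(picked) < k:
                picked.append(by_topic[t].pop())
    return picked


# Made-up messages: one busy topic and two quiet ones.
messages = (
    [{"topic": "pricing", "text": f"pricing note {i}"} for i in range(10)]
    + [{"topic": "hiring", "text": f"hiring note {i}"} for i in range(2)]
    + [{"topic": "tooling", "text": f"tooling note {i}"} for i in range(2)]
)
sample = balanced_sample(messages, k=6)
spread = Counter(m["topic"] for m in sample)
```

With these toy inputs the busy topic supplies only two of the six messages, where a uniform draw over all fourteen would tend to favour it.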

the one-way valve

The trust boundary, in detail

The anonymiser is the only path from real-person-grounded text to anything you see. It runs on every response, every time, with no toggle to turn it off.

Removed at the boundary

User IDs and message IDs from the source archive
Real names, display names, and handles
Exact source-text spans of fifteen words or more
Blocked phrases on the per-archive list
Contact information of any form

Passes through to you

Anonymized answer text in the source community's voice
Public persona labels (e.g., persona_07_of_30) that don't link back to any real person
A count of how many spans were removed

You never receive a real name, a real handle, or a long exact quote. Per-account question limits prevent reconstruction by repeated querying. The anonymiser is enforced by code and tested on every release.
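
The fifteen-word rule can be illustrated with a small Python sketch: any run of fifteen or more consecutive words that appears verbatim in a source message is replaced by a marker, and the number of replaced spans is returned. This is a simplified stand-in, not the production anonymiser. The function names and the marker are assumptions, it splits on whitespace only, and it does not cover IDs, names, blocked phrases, or contact information.

```python
def ngrams(words, n):
    """All runs of n consecutive words, as a set of tuples."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def redact_long_quotes(response, sources, min_words=15, marker="[removed]"):
    """Replace verbatim source runs of min_words or more; return (text, count)."""
    source_grams = set()
    for text in sources:
        source_grams |= ngrams(text.lower().split(), min_words)

    words = response.split()
    lowered = [w.lower() for w in words]
    flagged = [False] * len(words)
    for i in range(len(words) - min_words + 1):
        if tuple(lowered[i:i + min_words]) in source_grams:
            for j in range(i, i + min_words):
                flagged[j] = True

    out, removed, i = [], 0, 0
    while i < len(words):
        if flagged[i]:
            # Collapse one contiguous flagged run into a single marker.
            removed += 1
            out.append(marker)
            while i < len(words) and flagged[i]:
                i += 1
        else:
            out.append(words[i])
            i += 1
    return " ".join(out), removed


# Synthetic 20-word source; the response reuses 16 of those words in a row.
source = " ".join(f"w{i}" for i in range(20))
quoted = " ".join(f"w{i}" for i in range(2, 18))
clean_text, removed_count = redact_long_quotes(f"I think {quoted} honestly", [source])
short_text, short_count = redact_long_quotes("I think w2 w3 w4 honestly", [source])
```

Overlapping matches merge into one span, so a 16-word quote counts as a single removal. Shorter overlaps, such as the three-word reuse in the second call, pass through unchanged.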

honest scope

What this doesn't yet cover

We're committed to publishing the methodology and to being clear about its current scope.

One archive today, more in progress. The pipeline runs end-to-end on one curated community archive. Replication across more archives is in progress, and each new archive gets its own validated build.
Same model family across stages. The team of agents that builds dossiers, the persona that answers, and the judge that scores all currently use the same LLM family. Cross-model checking (a different model as the judge) is on the roadmap.
Static dossiers. Dossiers are built once and updated on a schedule. By design, they do not learn from your questions: learning from queries would be a leakage vector, and we don't want one.
Multi-turn behaviour is exploratory. Panel questions (one question, one answer) ship with a fidelity number. Multi-turn simulations (the Sim surface) are coming, but we don't yet have a fair test that maps cleanly to multi-turn behaviour.

How we measure fidelity — the fair test that scores a persona's match to the real source. 17.80/20 grounded vs 10.60/20 ungrounded on the same model.
How we treat the data — the privacy posture. What stays private, what you ever see, how communities can opt out, and what we will never do.
Frequently asked questions — plain answers to common questions about versim, the data, pricing, and the waitlist.
Back to versim — the landing page, with the sample panel run and the demo.
