privacy · what we do with the data
How we treat the data.
versim turns real online communities into queryable personas. This page lays out exactly what we do with the underlying data, what we don't do, and what communities can ask of us at any time.
Our commitments at a glance
- Source data stays private.
- The archive is never sold.
- Communities can opt out at any time.
where the data comes from
We only work with archives we have permission to use.
We don't scrape private channels. We don't buy or trade scraped archives. We don't work with data of unclear origin.
The archives we work with come from one of two paths:
- Open community archives that the community has itself chosen to make publicly accessible.
- Direct partnerships with community owners who have invited us in to model their community.
Either way, we have a clear legal basis to use the archive before any of the rest of the pipeline runs.
what stays private, what you see
You never see the source data.
The trust boundary is the anonymiser. It's the only path from real-person-grounded text to anything you ever receive.
Stays with us, always
- The source archive (raw messages)
- Dossiers (structured profiles per person)
- Real names, handles, display names, email addresses
- Contact information of any form
- Anything that could identify a real person
Visible to you
- The panel's combined answer (anonymized)
- Per-persona answers, labeled persona_NN_of_MM
- Themes the summarizer surfaced
- Sentiment count (positive, neutral, negative, mixed)
- The fidelity score and the panel filters that were used
how the anonymiser protects the source
The anonymiser is enforced by code, not policy.
What it removes from any output before it reaches you:
- User IDs and message IDs from the source archive
- Real names, display names, and handles
- Exact source-text spans of fifteen words or more
- Blocked phrases on the per-archive list
- Contact information of any form
It runs on every persona answer and on the summarizer's output. It is tested on every release. A failure to anonymize is a hard release blocker, not a warning to be reviewed later.
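The removal rules listed above can be pictured as a gate that inspects each output before release. The following is a minimal sketch under stated assumptions, not versim's implementation: the function names, the email pattern, and the plain substring matching are all hypothetical, and the sketch only detects problems where the real anonymiser removes them. The fifteen-word threshold is the one stated above.

```python
import re

MAX_SPAN_WORDS = 15  # from the text: exact spans of fifteen words or more

# Illustrative pattern only; real contact-info detection would cover more forms.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def _words(text):
    """Lowercased word tokens; case and punctuation are ignored when matching."""
    return re.findall(r"[a-z0-9']+", text.lower())


def source_ngrams(source_messages, n=MAX_SPAN_WORDS):
    """Collect every n-word window that appears in any source message."""
    grams = set()
    for message in source_messages:
        tokens = _words(message)
        for i in range(len(tokens) - n + 1):
            grams.add(tuple(tokens[i:i + n]))
    return grams


def has_long_verbatim_span(output, grams, n=MAX_SPAN_WORDS):
    """True if the output shares at least one n-word window with the source.

    Any copied span of n or more words contains such a window, so checking
    windows of exactly n words is enough.
    """
    tokens = _words(output)
    return any(tuple(tokens[i:i + n]) in grams
               for i in range(len(tokens) - n + 1))


def check_output(output, source_messages, blocked_phrases, identifiers):
    """Return the reasons an output must not be released (empty list = clean)."""
    problems = []
    lowered = output.lower()
    if has_long_verbatim_span(output, source_ngrams(source_messages)):
        problems.append("verbatim span")
    if any(phrase.lower() in lowered for phrase in blocked_phrases):
        problems.append("blocked phrase")
    if any(ident.lower() in lowered for ident in identifiers):
        problems.append("identifier")
    if EMAIL_RE.search(output):
        problems.append("contact info")
    return problems
```

In a release test of the kind described above, a non-empty result from a check like `check_output` on any synthetic panel answer would fail the build.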
For more on the engineering, see the methodology page.
community rights
Communities can opt out at any time.
- Right to be removed. A community can ask to be removed at any time, for any reason. We don't ask why. Email hello@versim.ai from an address connected to the community.
- What gets deleted. The source archive. All dossiers built from it. Any cached panel output that draws on it. Any fidelity scores computed from it. We delete derived artefacts too, not just the raw data.
- How fast. Promptly. Days, not months. We confirm completion in writing once the deletion is done.
- Audit on request. We can show you exactly what was held and when it was deleted. The deletion runbook is exercised on a synthetic archive in our tests, so we know it works.
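To illustrate what deleting derived artefacts means in practice, here is a small sketch of a cascading delete. It assumes a hypothetical in-memory `Store` with one container per artefact type named above; the class, its fields, and the shape of the returned report are illustrative only and say nothing about how versim's storage is actually organised.

```python
class Store:
    """Hypothetical in-memory stand-in for the real storage layers."""

    def __init__(self):
        self.archives = {}         # archive_id -> list of raw messages
        self.dossiers = {}         # (archive_id, person_key) -> profile
        self.panel_cache = {}      # cache_key -> {"archives": set, "output": str}
        self.fidelity_scores = {}  # archive_id -> score

    def delete_archive(self, archive_id):
        """Remove the archive and everything derived from it.

        Returns a count of deleted items per artefact type, which could back
        a written confirmation or an audit answer.
        """
        report = {}
        report["archive"] = int(
            self.archives.pop(archive_id, None) is not None)
        report["fidelity_scores"] = int(
            self.fidelity_scores.pop(archive_id, None) is not None)

        dossier_keys = [k for k in self.dossiers if k[0] == archive_id]
        for key in dossier_keys:
            del self.dossiers[key]
        report["dossiers"] = len(dossier_keys)

        # A cached panel output is removed if it draws on this archive at all,
        # even when other archives also contributed to it.
        cache_keys = [k for k, v in self.panel_cache.items()
                      if archive_id in v["archives"]]
        for key in cache_keys:
            del self.panel_cache[key]
        report["panel_cache"] = len(cache_keys)
        return report
```

A test on a synthetic archive would populate such a store, call the delete, and assert that nothing referencing the archive remains while unrelated data is untouched.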
commitments to the source community
What we will never do.
Hard "no"s
- Sell, share, or license the underlying archive to anyone
- Extract contact information from the archive
- Enable advertising or marketing automation against the source community
- Train other models on the source data
- Publish the source archive or its raw contents
- Surface real names, handles, or contact data in any output
- Pretend a synthetic answer is from the real person
safeguards in code
Enforcement, not policy.
Privacy commitments only matter if they're enforced. We enforce ours in the code path, not just in this document.
- Anonymiser is a hard gate. Every persona answer and every summarizer output passes through it. Bypass is not a configurable setting.
- Test on every release. A regression test runs the anonymiser against a synthetic-but-realistic test panel. If any identifier or long exact source-text span slips through, the build fails.
- Per-account question limits. When the hosted product launches, every account will be capped on questions per panel and questions per day. The cap exists to prevent reconstruction-by-repeated-querying.
- IP addresses are stored as salted hashes. We never log or persist the raw IP of anyone who hits our waitlist or eventual product. The salt rotates.
- Audit logs of account activity. Useful for abuse detection. They don't store account data beyond what is needed to enforce the question limits.
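Two of the safeguards above, salted IP hashing and per-account question limits, can be sketched in a few lines. This is an illustration under assumptions rather than the production code: the page does not say which hash is used, so HMAC-SHA256 is a stand-in, and the class names, the in-memory counters, and the cap values passed in are all hypothetical.

```python
import hashlib
import hmac
import secrets
from collections import defaultdict


class SaltedIpHasher:
    """Keeps only a keyed hash of an IP address, never the raw value."""

    def __init__(self):
        self._salt = secrets.token_bytes(32)

    def rotate(self):
        """Replace the salt; digests made before and after cannot be linked."""
        self._salt = secrets.token_bytes(32)

    def digest(self, ip):
        """Hex digest of the IP under the current salt (HMAC-SHA256, assumed)."""
        return hmac.new(self._salt, ip.encode("utf-8"),
                        hashlib.sha256).hexdigest()


class QuestionLimiter:
    """Caps questions per panel and per day for each account."""

    def __init__(self, per_panel, per_day):
        self.per_panel = per_panel
        self.per_day = per_day
        self._panel_counts = defaultdict(int)  # (account, panel) -> count
        self._day_counts = defaultdict(int)    # (account, day) -> count

    def allow(self, account, panel, day):
        """Permit and record the question only if both caps have room."""
        if self._panel_counts[(account, panel)] >= self.per_panel:
            return False
        if self._day_counts[(account, day)] >= self.per_day:
            return False
        self._panel_counts[(account, panel)] += 1
        self._day_counts[(account, day)] += 1
        return True
```

The limiter stores only counters keyed by account, panel, and day, which is in the spirit of logs that hold no more than the limits require. A refused question is not counted, so hitting one cap does not consume the other.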
honest scope
What we're still figuring out.
We're committed to publishing our privacy posture and to being clear about what isn't yet locked down.
- Per-archive variation. Different partnerships may have different terms. Some communities may ask for stricter rules than this baseline. We respect the strictest applicable rules per archive.
- Cross-archive panels. A panel that draws from more than one community archive enforces the strictest per-archive rule across all included sources. We are working out the consent flow for partners who want their archive included in cross-archive panels.
- Compliance certifications. SOC 2, ISO 27001, and similar certifications are on the roadmap once the hosted product launches. Today our claim is structural (anonymiser, code-enforced, testable), not certified.
- Regulatory regimes. Specific compliance (GDPR, CCPA, sectoral laws) varies by jurisdiction. We are designing toward the strictest regimes by default. If you have a specific compliance requirement before signing on as a customer, get in touch.
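The "strictest rule wins" behaviour described for cross-archive panels can be sketched as a merge over per-archive rule sets. Everything here is an assumption made for illustration: the `ArchiveRules` fields, their defaults, and the idea that these particular settings vary per archive are hypothetical, not a description of versim's configuration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ArchiveRules:
    """Hypothetical per-archive rule set; field names are illustrative."""
    max_verbatim_words: int = 15       # lower is stricter
    blocked_phrases: frozenset = field(default_factory=frozenset)
    max_questions_per_day: int = 100   # lower is stricter


def strictest(rule_sets):
    """Combine rule sets so the result is at least as strict as every input."""
    rule_sets = list(rule_sets)
    if not rule_sets:
        raise ValueError("a panel needs at least one archive")
    return ArchiveRules(
        # Numeric limits: take the smallest value any archive asks for.
        max_verbatim_words=min(r.max_verbatim_words for r in rule_sets),
        max_questions_per_day=min(r.max_questions_per_day for r in rule_sets),
        # Block lists: a phrase blocked by any archive is blocked for the panel.
        blocked_phrases=frozenset().union(
            *(r.blocked_phrases for r in rule_sets)),
    )
```

For example, merging an archive that allows 15-word spans with one that allows only 10 yields a panel limit of 10, and the panel's block list is the union of both lists.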
Related
- How we build a persona — the end-to-end pipeline. The anonymiser sits between every persona's grounded answer and you.
- How we measure fidelity — the fair test that scores a persona against real source messages. The score is a published number, but the source data behind it stays private.
- Frequently asked questions — plain answers to common questions about versim, the data, pricing, and the waitlist.
- Back to versim — the landing page, with the sample panel run and the demo.
Get in touch
Questions? Write to hello@versim.ai.