Pocket Socrates: Privacy Architecture

Identity resolution · PII detection · abstraction pipeline · field-level encryption · right to erasure

External Services (not part of PocketSoc platform)
Auth provider
Clerk
Issues signed JWTs containing real userId. Handles authentication, session management, and MFA. PocketSoc receives the JWT; it never stores the raw userId in its own database.
JWT · userId · session
Billing provider
Stripe
Manages subscription state and payment processing. Stripe's customerId is a separate identifier that does not map directly to Convex user records or pseudoIds. No PII flows between Stripe and the reflection data layer.
customerId · subscription tier
↕ Clerk and Stripe do not communicate with each other
AWS Lambda · Isolated AWS account
Bridge: Identity Resolution Service
On first auth, the Bridge generates a cryptographically random UUID as the user's pseudoId. It is not derived, hashed, or computed from any identifier; it has no mathematical relationship to the Clerk userId or Stripe customerId. It is looked up, never calculated.
The Bridge writes the three-way mapping (pseudoId ↔ Clerk userId ↔ Stripe customerId) to DynamoDB, then returns only the pseudoId to the application layer. Subscription tier is returned as a metadata claim alongside the pseudoId. No Stripe identifier is ever passed to Convex.
crypto.randomUUID() · Node.js · AWS us-east-1
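The lookup-or-create step can be sketched as follows. This is a minimal illustration, assuming an in-memory Map in place of the DynamoDB table. The names resolvePseudoId and MappingEntry, the field names, and the tier argument are hypothetical; only crypto.randomUUID() comes from the description above.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical shape of one mapping entry; field names are assumptions.
interface MappingEntry {
  pseudoId: string;
  clerkUserId: string;
  stripeCustomerId: string | null;
}

// In-memory stand-in for the DynamoDB mapping table, indexed by Clerk userId.
const byClerkUserId = new Map<string, MappingEntry>();

// Returns only the pseudoId and the tier claim; no external identifier is returned.
function resolvePseudoId(
  clerkUserId: string,
  tier: string,
): { pseudoId: string; tier: string } {
  let entry = byClerkUserId.get(clerkUserId);
  if (!entry) {
    // First auth: the pseudoId is random, not derived from clerkUserId.
    entry = { pseudoId: randomUUID(), clerkUserId, stripeCustomerId: null };
    byClerkUserId.set(clerkUserId, entry);
  }
  return { pseudoId: entry.pseudoId, tier };
}
```

Because the pseudoId is random, a second call for the same Clerk userId can only return the same value by looking it up, which is why the mapping table has to exist.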
AWS DynamoDB · KMS encrypted at rest
Three-Way Identity Mapping Table
Stores: { pseudoId ↔ clerkUserId ↔ stripeCustomerId }. This table must exist because the pseudoId is randomly generated; it cannot be computed from either external identifier. Clerk and Stripe also issue different, unrelated identifiers, so neither can be derived from the other.
Why clerkUserId? Every auth request arrives as a Clerk JWT containing the real userId. Without the mapping, there is no way to resolve which Convex pseudoId that user's session belongs to.
Why stripeCustomerId? Stripe webhooks (subscription changes, payment events) arrive with only the Stripe customerId. Without the mapping, subscription status cannot be applied to the correct pseudoId in Convex.
Accessible only to the Bridge Lambda, with no IAM path from the application layer. Encrypted at rest via AWS KMS (CMK). On deletion: entry is nulled, severing both links while preserving the slot.
AES-256 · CMK · isolated account
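The two other operations on the mapping table, resolving a Stripe webhook and nulling an entry on deletion, can be sketched as follows. This is an illustration under assumptions: MappingTable, MappingSlot, and the method names are hypothetical, and a Map stands in for DynamoDB. A real table would likely use an index on stripeCustomerId rather than the linear scan shown here.

```typescript
// Hypothetical slot shape; both external links are nullable so they can be severed.
interface MappingSlot {
  pseudoId: string;
  clerkUserId: string | null;
  stripeCustomerId: string | null;
}

class MappingTable {
  private slots = new Map<string, MappingSlot>();

  put(slot: MappingSlot): void {
    this.slots.set(slot.pseudoId, { ...slot });
  }

  // Stripe webhooks carry only the customerId; this finds the matching pseudoId.
  pseudoIdForStripeCustomer(stripeCustomerId: string): string | undefined {
    for (const slot of this.slots.values()) {
      if (slot.stripeCustomerId === stripeCustomerId) return slot.pseudoId;
    }
    return undefined;
  }

  // Deletion: sever both links but keep the slot itself.
  nullEntry(pseudoId: string): boolean {
    const slot = this.slots.get(pseudoId);
    if (!slot) return false;
    slot.clerkUserId = null;
    slot.stripeCustomerId = null;
    return true;
  }

  has(pseudoId: string): boolean {
    return this.slots.has(pseudoId);
  }
}
```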
pseudoId only. No real identity crosses this boundary.
Entry point
User Input: The Crucible
Raw text entered during a Thread. May contain names, dates, locations, religious or political identifiers, medical information, or other PII. Keyed only to pseudoId from this point forward.
plaintext · pseudoId-scoped
Real-time scan · every message
Pocket Soc: PII Detection Layer
Pocket Socrates scans each message before it reaches Soc's context window. Detects: person names, locations, dates, medical information, religion, political affiliation, phone numbers, email addresses, and government IDs. Returns structured JSON with flagged excerpts and entity types.
Presidio (Microsoft NLP) is used separately in the offline analytics ETL pipeline, not in real-time message processing.
claude-haiku-4-5 · structured JSON output
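One way to handle the detector's structured JSON is to validate it and then choose a branch. The sketch below is an assumption throughout: the source does not specify the JSON schema, so the flagged array, the excerpt and entityType field names, the entity type labels, and the functions parseDetection and routeMessage are all hypothetical.

```typescript
// Hypothetical labels for the entity types listed above.
const ENTITY_TYPES = [
  "PERSON", "LOCATION", "DATE", "MEDICAL", "RELIGION",
  "POLITICAL_AFFILIATION", "PHONE", "EMAIL", "GOVERNMENT_ID",
] as const;
type EntityType = (typeof ENTITY_TYPES)[number];

interface FlaggedItem {
  excerpt: string;
  entityType: EntityType;
}

type Route =
  | { kind: "passthrough" }
  | { kind: "contextCard"; items: FlaggedItem[] };

// Validates the raw JSON string; throws on anything that does not match the assumed schema.
function parseDetection(raw: string): FlaggedItem[] {
  const parsed: unknown = JSON.parse(raw);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("expected a JSON object");
  }
  const flagged = (parsed as { flagged?: unknown }).flagged;
  if (!Array.isArray(flagged)) throw new Error("expected a 'flagged' array");
  return flagged.map((item: unknown) => {
    const { excerpt, entityType } = (item ?? {}) as {
      excerpt?: unknown;
      entityType?: unknown;
    };
    const known = (ENTITY_TYPES as readonly string[]).includes(entityType as string);
    if (typeof excerpt !== "string" || !known) {
      throw new Error("malformed flagged item");
    }
    return { excerpt, entityType: entityType as EntityType };
  });
}

// No flagged items: forward to Soc. Otherwise: one card listing every item.
function routeMessage(items: FlaggedItem[]): Route {
  return items.length === 0
    ? { kind: "passthrough" }
    : { kind: "contextCard", items };
}
```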
no PII detected
PII flagged
Passthrough
Forwarded to Soc
No PII detected. Input enters Soc's context window directly and the Thread continues.
In-app · User-facing
Context Card
An inline privacy card surfaces in the chat stream, visually distinct from Soc's messages. All flagged items listed in one card. User authors their own abstraction first (therapeutic intent: isolating the variables). Soc suggests an alternative if requested.
Single confirm writes all abstractions to memory. Card collapses to "Context saved." Soc's response below the card naturally references the abstraction.
user-authored first · Soc-suggested fallback
Pre-write · Silent · All artifact surfaces
Pocket Soc sanitizeForWrite: Secondary Scan
A second pass fires silently before every write to Records, Roots, Echoes, and Context Documents. Catches any PII that survived conversational abstraction. Auto-abstracts without surfacing a Context Card; this is a backend guard, not a user interaction.
sanitizeForWrite · piiGateway.ts
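A minimal sketch of what a silent pre-write pass could look like. The actual sanitizeForWrite in piiGateway.ts is not shown in this document, so the signature, the Finding shape, and the bracketed placeholder format are assumptions. The detector is injected as a synchronous function to keep the sketch short; a model-backed detector would be asynchronous.

```typescript
// Hypothetical detector output: the excerpt to remove and its entity type.
interface Finding {
  excerpt: string;
  entityType: string;
}
type Detector = (text: string) => Finding[];

interface SanitizeResult {
  text: string;          // text that is safe to write
  abstracted: string[];  // entity types that were auto-abstracted (no excerpts)
}

// Replaces every detected excerpt with a type placeholder before the write happens.
function sanitizeForWrite(text: string, detect: Detector): SanitizeResult {
  let clean = text;
  const abstracted: string[] = [];
  for (const finding of detect(text)) {
    // Skip empty or stale excerpts that no longer occur in the text.
    if (finding.excerpt.length === 0 || !clean.includes(finding.excerpt)) continue;
    // split/join replaces every occurrence without regex escaping concerns.
    clean = clean.split(finding.excerpt).join(`[${finding.entityType}]`);
    abstracted.push(finding.entityType);
  }
  return { text: clean, abstracted };
}
```

Returning only entity types alongside the cleaned text matches the audit approach described for piiAbstractionLog, where no raw PII is stored.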
Convex DB · pseudoId-keyed only
Reflection Data Store
AES-256-GCM field-level encryption · no real identity stored
threads · messages
Conversation history. Solo Thread messages are client-side only, not persisted to Convex.
records
Completed Thread outputs. Soc-generated titles, summaries, and Named Doors. Abstracted content only.
roots · echoes
Persistent memory artifacts. PII-free guaranteed; Haiku double-pass enforced before write.
contexts · xpEvents
Thematic domains and XP ledger. Ledger is append-only; entries are never mutated or deleted.
abstractions
Stored user-authored and Soc-suggested abstractions. Encrypted at field level.
piiAbstractionLog
Audit record of each detected event: entity type, surface, resolution. No raw PII stored.
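Field-level AES-256-GCM encryption can be sketched with Node's built-in crypto module. This is an illustration, assuming a Node.js runtime and a 32-byte key supplied by the caller; the EncryptedField shape and the function names are hypothetical, and key management is out of scope here.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical stored form of one encrypted field; all parts are base64 strings.
interface EncryptedField {
  iv: string;
  tag: string;
  ciphertext: string;
}

// Encrypts a single field value with a fresh 12-byte IV per call.
function encryptField(plaintext: string, key: Buffer): EncryptedField {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(plaintext, "utf8"),
    cipher.final(),
  ]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    ciphertext: ciphertext.toString("base64"),
  };
}

// Decrypts a field; throws if the ciphertext or tag has been altered.
function decryptField(field: EncryptedField, key: Buffer): string {
  const decipher = createDecipheriv(
    "aes-256-gcm",
    key,
    Buffer.from(field.iv, "base64"),
  );
  decipher.setAuthTag(Buffer.from(field.tag, "base64"));
  return Buffer.concat([
    decipher.update(Buffer.from(field.ciphertext, "base64")),
    decipher.final(),
  ]).toString("utf8");
}
```

GCM is an authenticated mode, so a tampered field fails at decryption instead of yielding corrupted plaintext.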

The offline analytics pipeline runs on a schedule, entirely outside the application layer. It never touches the Bridge or the DynamoDB mapping table. Its input is Convex data that is already pseudoId-keyed and AES-256-GCM encrypted. Its output is a PostgreSQL analytics store with no individual-level identifiability. This data is used for analytics and training purposes; one example is the feedback users submit on Soc's responses.

ETL trigger · scheduled
Fetch & Decrypt
Authenticated batch export from Convex via a gateway secret. Fields encrypted with AES-256-GCM are decrypted inside the ETL process; plaintext never leaves the pipeline boundary.
analyticsGateway · batchFetch
Microsoft Presidio · NLP batch scan
Presidio Scrub
Presidio Analyzer detects PII entities in decrypted message text. Presidio Anonymizer replaces detected entities with type tags: <PERSON>, <LOCATION>, <DATE>, etc.
Threshold: 0.75 standard · 0.5 strict for messages already pre-screened by Haiku at write time. A has_pii audit flag is written if residual PII is detected despite upstream Haiku passes.
Analyzer + Anonymizer services · spaCy NER
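The replace-with-type-tag step can be illustrated without calling Presidio itself. The sketch below takes analyzer-style results (entity_type, start, end, score, the fields Presidio's analyzer reports) and applies the replacement locally; in the pipeline described above this is done by the Presidio Anonymizer service. The function name scrub, its return shape, and the assumption that spans do not overlap are hypothetical.

```typescript
// One detected entity, using the field names Presidio's analyzer reports.
interface AnalyzerResult {
  entity_type: string;
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
  score: number; // confidence between 0 and 1
}

interface ScrubResult {
  scrubbed: string;
  entityTypes: string[]; // distinct types found, no excerpts
  hasPii: boolean;       // candidate value for a has_pii audit flag
}

// Replaces each span at or above the threshold with a <TYPE> tag.
// Assumes spans do not overlap.
function scrub(
  text: string,
  results: AnalyzerResult[],
  threshold: number,
): ScrubResult {
  const kept = results
    .filter((r) => r.score >= threshold)
    // Work from the end of the text so earlier offsets stay valid.
    .sort((a, b) => b.start - a.start);
  let scrubbed = text;
  for (const r of kept) {
    scrubbed = scrubbed.slice(0, r.start) + `<${r.entity_type}>` + scrubbed.slice(r.end);
  }
  const entityTypes = Array.from(new Set(kept.map((r) => r.entity_type))).sort();
  return { scrubbed, entityTypes, hasPii: kept.length > 0 };
}
```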
Salt re-anonymization · monthly rotation
anonId Generation
Every pseudoId and record ID is re-anonymized before entering the analytics store: anonId = SHA-256(ANON_SALT + pseudoId).slice(32).
The salt rotates monthly, so the same pseudoId yields a different anonId each month and monthly datasets cannot be cross-joined by ID. Without the salt, the analytics store cannot be linked back to Convex records even if both stores were compromised simultaneously.
SHA-256 · rotating salt · one-way
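The formula above can be sketched with Node's crypto module. This is an illustration under assumptions: the digest is taken as a hex string, so .slice(32) keeps the last 32 of its 64 hex characters (128 bits), and the function name anonId and the salt values shown in usage are hypothetical.

```typescript
import { createHash } from "node:crypto";

// anonId = SHA-256(ANON_SALT + pseudoId).slice(32), assuming a hex-encoded digest.
function anonId(anonSalt: string, pseudoId: string): string {
  const hex = createHash("sha256").update(anonSalt + pseudoId).digest("hex");
  // The hex digest has 64 characters; slice(32) returns characters 32 to 63.
  return hex.slice(32);
}

// Usage: the same pseudoId maps to unrelated IDs under different monthly salts.
const january = anonId("salt-2025-01", "3f2b8c1e-0000-4000-8000-000000000000");
const february = anonId("salt-2025-02", "3f2b8c1e-0000-4000-8000-000000000000");
console.log(january !== february);
```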
Content reduction · structural only
Data Minimization
Thread rows store structural metadata only: type, stage, status, root/echo counts. No free text.
Message rows store: word count, scrubbed text (Presidio-cleaned), PII entity types found (not excerpts), and a boolean for whether Haiku had pre-screened the message at write time.
no raw content · no excerpts · counts only
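The message row described above can be sketched as a type plus a small builder. All field names, the function toAnonMessageRow, and the choice to count words on the decrypted original text are assumptions; the source lists only what the row contains.

```typescript
// Hypothetical row shape for anonymized messages; column names are assumptions.
interface AnonMessageRow {
  anonThreadId: string;      // salted hash, never a pseudoId
  wordCount: number;
  scrubbedText: string;      // Presidio-cleaned text
  piiEntityTypes: string[];  // entity types only, never excerpts
  haikuPrescreened: boolean; // whether Haiku screened the message at write time
}

interface MinimizeInput {
  anonThreadId: string;
  originalText: string;      // decrypted text; used for counting only
  scrubbedText: string;
  piiEntityTypes: string[];
  haikuPrescreened: boolean;
}

// Builds the row; the original text contributes a count and is then dropped.
function toAnonMessageRow(input: MinimizeInput): AnonMessageRow {
  const words = input.originalText.trim().split(/\s+/).filter(Boolean);
  return {
    anonThreadId: input.anonThreadId,
    wordCount: words.length,
    scrubbedText: input.scrubbedText,
    piiEntityTypes: Array.from(new Set(input.piiEntityTypes)).sort(),
    haikuPrescreened: input.haikuPrescreened,
  };
}
```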
PostgreSQL · analytics store
Anonymized Output
Tables: anon_threads, anon_messages, etl_runs. No real identifiers. No pseudoIds. All IDs are salted SHA-256 hashes.
Message and thread rows include feedback columns: feedback_rating, feedback_comment_scrubbed (Presidio-cleaned). User-submitted feedback is routed through the same anonymization pipeline; it is never reviewed using raw Convex data.
no real identity · no pseudoId · anonId only
Trigger
Deletion Request
User initiates from Account Settings. deletionRequestedAt timestamp written. Account access immediately suspended.
Grace period
30-Day Hold
Cancellation window for accidental requests. Data remains intact but inaccessible. Cancelling resets deletionRequestedAt to null.
Convex cron · daily
processPendingDeletions
Identifies accounts past the 30-day threshold. Executes cascading wipe. Calls Bridge to null the DynamoDB mapping entry.
threads · messages: full delete
records · roots · echoes: full delete
contexts · xpEvents: full delete
abstractions · piiLog: full delete
Bridge mapping entry: nulled · KMS
users row: tombstoned
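The daily deletion pass can be sketched as a pure function with its side effects injected. This is an illustration, not the actual Convex cron: the UserRow shape, the DeletionSteps interface, and the use of millisecond timestamps are assumptions, and the function shown here only mirrors the name processPendingDeletions.

```typescript
const GRACE_PERIOD_MS = 30 * 24 * 60 * 60 * 1000; // 30-day hold

// Hypothetical users row; timestamps are epoch milliseconds.
interface UserRow {
  pseudoId: string;
  deletionRequestedAt: number | null; // null means no pending request
  tombstoned: boolean;
}

// Side effects are injected so the selection logic can be tested in isolation.
interface DeletionSteps {
  wipeReflectionData(pseudoId: string): void; // threads, records, roots, and so on
  nullBridgeMapping(pseudoId: string): void;  // severs the identity links
}

// Processes every account whose request is at least 30 days old.
function processPendingDeletions(
  users: UserRow[],
  now: number,
  steps: DeletionSteps,
): string[] {
  const processed: string[] = [];
  for (const user of users) {
    if (user.tombstoned || user.deletionRequestedAt === null) continue;
    if (now - user.deletionRequestedAt < GRACE_PERIOD_MS) continue;
    steps.wipeReflectionData(user.pseudoId);
    steps.nullBridgeMapping(user.pseudoId);
    user.tombstoned = true;
    processed.push(user.pseudoId);
  }
  return processed;
}
```

An account whose request was cancelled has deletionRequestedAt reset to null, so it is skipped by the first check.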