phase-1 · email-scanner · 2026-06-25

Reading mail without touching it

The email scanner must never alter mailbox state. A scanner that marks mail as read, or worse, moves it, is indistinguishable from the attacks it is supposed to catch. So every fetch is read-only: BODY.PEEK throughout, so the \Seen flag is never set, on a mailbox opened read-only (EXAMINE, the read-only form of SELECT).
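A minimal sketch of the read-only discipline, assuming Python's standard imaplib as the client. The helper names open_read_only and fetch_headers are hypothetical and not taken from the scanner itself; conn is any already-authenticated imaplib.IMAP4 or IMAP4_SSL object.

```python
import imaplib  # conn below is expected to be an imaplib.IMAP4 / IMAP4_SSL instance


def open_read_only(conn, mailbox="INBOX"):
    # With readonly=True, imaplib issues EXAMINE instead of SELECT,
    # so the server itself refuses flag changes for this session.
    status, data = conn.select(mailbox, readonly=True)
    if status != "OK":
        raise RuntimeError(f"could not open {mailbox!r} read-only: {data!r}")
    return conn


def fetch_headers(conn, uid):
    # BODY.PEEK[...] returns the same bytes as BODY[...] but never
    # sets the \Seen flag on the message.
    status, data = conn.uid("FETCH", str(uid), "(BODY.PEEK[HEADER])")
    if status != "OK":
        raise RuntimeError(f"header fetch failed for UID {uid}: {data!r}")
    # imaplib returns literals as (envelope, payload) tuples,
    # interleaved with plain bytes such as b")".
    for item in data:
        if isinstance(item, tuple):
            return item[1]
    return None
```

The two guards are deliberately redundant: EXAMINE protects against a stray non-PEEK fetch, and PEEK protects against a mailbox accidentally opened read-write.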

The fetch itself is two-phase. Phase one pulls the headers and the BODYSTRUCTURE — the full MIME tree — with zero body bytes transferred. Phase two fetches only the text parts the feature extractor actually needs.
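The two phases can be sketched as follows. This assumes the BODYSTRUCTURE response has already been parsed into a tree; the Part dataclass, its field names, and the rule "fetch every leaf whose type starts with text/" are illustrative assumptions, not the extractor's actual selection logic.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

# Phase one: headers plus the MIME tree, no body bytes.
PHASE_ONE_ITEMS = "(BODY.PEEK[HEADER] BODYSTRUCTURE)"


@dataclass
class Part:
    """Hypothetical node of a parsed BODYSTRUCTURE tree."""
    section: str                 # IMAP section number, e.g. "1.2"; "" for a multipart root
    mime_type: str               # e.g. "text/plain" or "multipart/mixed"
    size: int = 0                # declared size in octets (0 for multipart containers)
    children: List["Part"] = field(default_factory=list)


def text_parts(part: Part) -> Iterator[Part]:
    # Depth-first walk that yields only leaf parts of type text/*.
    if part.children:
        for child in part.children:
            yield from text_parts(child)
    elif part.mime_type.lower().startswith("text/"):
        yield part


def phase_two_items(root: Part) -> Optional[str]:
    # Phase two: one BODY.PEEK item per wanted section, or None if nothing qualifies.
    specs = [f"BODY.PEEK[{p.section}]" for p in text_parts(root)]
    return "(" + " ".join(specs) + ")" if specs else None


# Example: multipart/mixed holding an alternative (plain + html) and a PDF.
message = Part("", "multipart/mixed", children=[
    Part("1", "multipart/alternative", children=[
        Part("1.1", "text/plain", 1200),
        Part("1.2", "text/html", 5400),
    ]),
    Part("2", "application/pdf", 900_000),
])
print(phase_two_items(message))  # (BODY.PEEK[1.1] BODY.PEEK[1.2])
```

The PDF is never requested: its existence, type, and size are already known from phase one.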

Hard caps bound the worst case: any single MIME part over 5 MB is skipped and recorded rather than fetched, and once the running total of fetched bytes passes 25 MB the fetch stops entirely.
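A sketch of how the caps might be applied before any body bytes move, using the sizes declared in BODYSTRUCTURE. Several details here are assumptions: MB is taken as 2**20 bytes, the total cap is checked before a part is fetched rather than after, and plan_fetch and its return shape are hypothetical.

```python
MAX_PART_BYTES = 5 * 1024 * 1024     # assumed: 5 MB read as 5 * 2**20 bytes
MAX_TOTAL_BYTES = 25 * 1024 * 1024   # assumed: 25 MB read as 25 * 2**20 bytes


def plan_fetch(parts, max_part=MAX_PART_BYTES, max_total=MAX_TOTAL_BYTES):
    """Decide which sections to fetch from (section, declared_size) pairs.

    Returns (to_fetch, skipped, stopped_early):
      to_fetch      - sections that fit under both caps, in order
      skipped       - (section, size) pairs over the per-part cap, kept for the record
      stopped_early - True if the running total would have passed the total cap
    """
    to_fetch, skipped, total = [], [], 0
    for section, size in parts:
        if size > max_part:
            # Oversized part: record it, do not fetch it, keep going.
            skipped.append((section, size))
            continue
        if total + size > max_total:
            # Total cap reached: stop the whole fetch, not just this part.
            return to_fetch, skipped, True
        to_fetch.append(section)
        total += size
    return to_fetch, skipped, False
```

With small caps for illustration, plan_fetch([("1", 10), ("2", 60), ("3", 30), ("4", 30)], max_part=50, max_total=50) fetches sections 1 and 3, records section 2 as skipped, and stops before section 4.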

The privacy boundary holds here too: bodies are parsed for structure and indicators. Raw content never leaves the machine.

49421bc