<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[StepToCyber]]></title><description><![CDATA[Security engineering leader at a Fortune 500 company. Actively pursuing an MS in AI. Writing real-world lessons on securing application and AI adoption at enterprise scale.]]></description><link>https://www.steptocyber.ai</link><image><url>https://substackcdn.com/image/fetch/$s_!e2u-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7f45848-286d-4aa4-a95c-2b56c7ae763d_600x600.jpeg</url><title>StepToCyber</title><link>https://www.steptocyber.ai</link></image><generator>Substack</generator><lastBuildDate>Sat, 13 Jun 2026 03:01:31 GMT</lastBuildDate><atom:link href="https://www.steptocyber.ai/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[StepToCyber]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[steptocyber@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[steptocyber@substack.com]]></itunes:email><itunes:name><![CDATA[StepToCyber]]></itunes:name></itunes:owner><itunes:author><![CDATA[StepToCyber]]></itunes:author><googleplay:owner><![CDATA[steptocyber@substack.com]]></googleplay:owner><googleplay:email><![CDATA[steptocyber@substack.com]]></googleplay:email><googleplay:author><![CDATA[StepToCyber]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[One Filter Is Not a Safety Strategy: What the Grok Failure Teaches Every Security Leader About AI Safety]]></title><description><![CDATA[Why model-level filters keep failing, what the research says, and what to build 
instead.]]></description><link>https://www.steptocyber.ai/p/one-filter-is-not-a-safety-strategy</link><guid isPermaLink="false">https://www.steptocyber.ai/p/one-filter-is-not-a-safety-strategy</guid><dc:creator><![CDATA[StepToCyber]]></dc:creator><pubDate>Sat, 18 Apr 2026 13:06:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!e2u-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7f45848-286d-4aa4-a95c-2b56c7ae763d_600x600.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Four months ago, xAI promised to stop Grok from generating nonconsensual sexualized images of real women. This week, NBC News reported it is still happening. The bypasses are not sophisticated. Users pair a photo of a real person with a stick-figure pose diagram and tell Grok to &#8220;match the pose.&#8221; Or they ask Grok to swap the clothing between two images. Or they upload a photo and ask for a video transformation. The filters xAI promised do not catch any of it.</p><p>One independent analyst now believes Grok produces more nonconsensual synthetic nudity than every comparable tool combined.</p>
<p>xAI&#8217;s publicly described controls amount to model-level filtering &#8212; and the company is now arguing in a Dutch court that it cannot stop all abuse and should not be penalized when malicious users bypass those controls. That is the opposite of defense in depth.</p><h2>Defense in Depth Is Not a Slogan</h2><p>Defense in depth is a design principle. It assumes every control will fail.</p><p>You layer perimeter, network, endpoint, identity, monitoring, and response so that when one layer is breached, the next catches what got through. Each layer is a different control against a different failure mode, often from different tools or different teams. That is the architecture. A single filter is not.</p><p>xAI&#8217;s Dutch court argument fails a basic test of secure design. CISA&#8217;s Secure-by-Design principles place responsibility for safety on the system&#8217;s operator, not on end users. Arguing that malicious users are responsible when controls are bypassed does not meet that bar.</p><h2>Grok and the OWASP LLM Top 10</h2><p>The OWASP Top 10 for LLM Applications (2025) is the industry reference for critical risks in LLM-based systems. Grok&#8217;s public behavior maps directly onto one category &#8212; and exposes a gap in the framework itself.</p><p><strong>LLM01: Prompt Injection.</strong> Prompt injection has held the top spot in the OWASP list for two consecutive editions because LLMs process instructions and data in the same channel without clear separation. The model cannot tell them apart.</p><p>This is not a Grok-specific problem. 
A 2025 paper introduced Cross-modal Adversarial Multimodal Obfuscation (CAMO), a black-box attack framework that splits harmful instructions into benign-looking textual and visual clues. Each component looks harmless on its own. The model reconstructs the attack intent through cross-modal reasoning. CAMO achieved attack success rates of 81.82% on GPT-4.1-nano and 93.94% on DeepSeek-R1 using 12.6% of the tokens required by older attack methods.</p><p>The Grok bypasses exploit the same vulnerability class that paper documented &#8212; individually benign inputs that become harmful in combination. The difference is that CAMO uses automated adversarial optimization. Grok&#8217;s users did not need any of that. They combined unmodified photos with hand-drawn diagrams and plain-language instructions. The filters failed against a basic, manual version of a well-documented attack class. The class itself was publicly documented before xAI shipped these features &#8212; Shayegani, Dong, and Abu-Ghazaleh published compositional cross-modality attacks at ICLR in 2024, based on work from 2023.</p><p><strong>Where the framework stops.</strong> The Grok case also involves insufficient content filtering on generated output and capabilities shipped without proportionate controls. These are real failures, but they do not map cleanly onto a second OWASP LLM Top 10 category. LLM05 (Improper Output Handling) addresses output passed to downstream systems without sanitization &#8212; XSS, SQL injection, remote code execution &#8212; not harmful content shown directly to users. LLM06 (Excessive Agency) addresses agents calling functions and extensions, not generative models producing content. The OWASP LLM Top 10 was designed for LLM applications integrated into software systems. 
Consumer-facing generative AI &#8212; where the output is the product &#8212; sits partially outside the framework's current scope.</p><h2>The Stakes Get Higher with Agents</h2><p>Grok generating an image is the low-stakes version of this problem. The failure mode is bad output. When this class of model gets <em>agency</em> &#8212; tools, memory, authority to take action &#8212; the failure mode stops being bad output and starts being bad actions.</p><p>The OWASP Top 10 for Agentic Applications (2026), released in December 2025, is the framework for that next-stage problem. It was built with dozens of security experts from industry, academia, and government and is based on real attacks observed in production.</p><p><strong>Agent Goal Hijack (ASI01).</strong> An attacker changes an agent&#8217;s objectives through malicious content. The same prompt injection that bypassed Grok&#8217;s image filters can hijack an agent into sending an email, modifying a record, or calling an API on an attacker&#8217;s behalf.</p><p><strong>Identity and Privilege Abuse (ASI03).</strong> An AI agent acts with the full authority of every key, token, and service account assigned to it. A single agent merges multiple permissions into one execution point. Compromise the agent, and you inherit every non-human identity it holds. Identity runs through most of the top risks in the OWASP Agentic Top 10.</p><p><strong>Cascading Failures.</strong> A compromised agent does not produce one bad output and stop. It chains actions across connected systems. It exfiltrates through the same channels it was authorized to use.</p><p><strong>The model is not your security boundary. The model &#8212; and everything you let it do &#8212; is the thing being contained.</strong></p><p>When the model shares its blast radius with production systems, shares its identity with the user, shares its network egress with sensitive data &#8212; you have not deployed AI safely. 
You have deployed a Grok-class failure waiting for the right prompt.</p><h2>Five Categories of AI Safety Controls &#8212; And Why No Single Category Is Enough</h2><p>Defense in depth requires controls at different layers, using different methods, catching different failure modes. In AI safety, those controls fall into five categories. Whatever xAI deployed, the publicly visible bypasses on X confirm it was not enough. Here is what the full surface looks like, and where each category fails when the inputs are multimodal.</p><h3>Category 1: Model-Level Controls</h3><p>Safety training built into the model itself. RLHF alignment, refusal training, Constitutional AI, concept erasure &#8212; techniques that modify the model&#8217;s weights to make it refuse harmful requests or suppress harmful outputs.</p><p>This is what most people mean when they say &#8220;the model won&#8217;t do that.&#8221;</p><p>Model-level controls are useful but have the best-documented failure rates of any category. A 2026 survey of LLM jailbreaking found that automated attacks achieve 90&#8211;99% success on open-weight models, and 80&#8211;94% on proprietary models. The model cannot reliably separate instructions from content. That limitation is structural. RLHF hasn't fixed it.</p><p>Model-level controls are deliberately absent from the six-layer architecture below. The architecture assumes this layer will fail and builds everything else to catch what gets through.</p><h3>Category 2: Input Inspection</h3><p>Everything that evaluates the prompt before the model processes it. Prompt injection classifiers, jailbreak detectors, topic deny lists, PII detection on inputs, input format validation.</p><p>Available implementations include Azure Prompt Shields, Meta Prompt Guard, NVIDIA NeMo jailbreak detection rails, and Amazon Bedrock&#8217;s prompt attack filtering.</p><p><strong>Where this category breaks in multimodal:</strong> Input inspection for text is a maturing control. 
The multimodal version is not. The problem is compositional attacks &#8212; inputs that are individually benign but harmful in combination. The &#8220;Jailbreak in Pieces&#8221; paper showed that pairing adversarial images with generic textual prompts breaks model alignment using only the vision encoder &#8212; no access to the LLM required.</p><p>The Grok bypasses are a simpler version of this attack class. The research attacks use adversarially optimized images. Grok&#8217;s users did not need that &#8212; they combined unmodified photos with hand-drawn diagrams and simple instructions. The filters failed against an unsophisticated version of a well-documented attack.</p><h3>Category 3: Output Evaluation</h3><p>Everything that evaluates the model&#8217;s response before it reaches the user. Content harm classifiers, LLM-as-judge implementations, NSFW image classifiers, PII redaction, groundedness checks, output format validation.</p><p>Content harm classification is the most widely deployed control in this category &#8212; present in every major platform. Azure AI Content Safety monitors four harm categories with adjustable severity thresholds. Amazon Bedrock Guardrails reports blocking up to 88% of harmful content. These classifiers detect harmful outputs when the harm is visible in the output itself. They do not detect harm that was invisible in the inputs and only emerged during generation.</p><p>Groundedness checks &#8212; verifying that the model&#8217;s output is based on provided source material &#8212; are shipped by Azure and Bedrock. These address accuracy, not content safety.</p><p><strong>Where this category breaks in multimodal:</strong> For text, LLM-as-judge works well when the judge is purpose-trained for safety evaluation. For images, the judge needs to be vision-capable and safety-trained on visual content. Few purpose-built visual safety judges exist &#8212; Llama Guard 3 Vision and ShieldGemma 2 are among the first. 
The effectiveness gap is measurable &#8212; the best-performing vision classifier in benchmarking studies shows F1 scores below 0.5 on categories like harassment and self-harm.</p><p>For video, the problem gets worse. The judge has to evaluate motion, context, and transformation across frames. This is the modality where Grok generates its most harmful output &#8212; photo-to-video transformations that are publicly visible on X, meaning whatever output evaluation exists in xAI&#8217;s pipeline did not prevent them from reaching users.</p><p>Three failure modes in LLM-as-judge are documented.</p><p>First, <strong>shared blind spots</strong>. When the judge and the generator share training lineage, they share failure modes. Research by Fu and Liu (EMNLP 2025 Findings) evaluated five models across 25 languages and found average inter-judge agreement at a Fleiss&#8217; kappa of approximately 0.3, a value conventionally read as only fair agreement. Liu et al. (ICLR 2025) found that some guard models flag responses as &#8220;unsafe&#8221; based on the user input alone, even when the model response is a single space token &#8212; meaning the guards are classifying the prompt, not the response.</p><p>Second, <strong>judge vulnerability</strong>. The judge is still a model. The same prompt injection techniques that compromise the primary model can compromise the judge. A 2026 survey found that automated judge agreement ranges from 70% to 93% depending on implementation.</p><p>Third, <strong>incomplete coverage</strong>. If cost constraints lead to evaluating a sample of outputs rather than all of them, the result is a statistical defense, not a security defense. An attacker who knows that not every output is checked can adjust accordingly.</p><h3>Category 4: Infrastructure Controls</h3><p>The controls around the model, not on it. 
Blast radius containment, network segmentation, identity federation, credential scoping, sandboxed execution, API rate limiting, data loss prevention, egress filtering.</p><p>This is the category where existing security expertise applies directly to AI deployment. Zero-trust architecture, least-privilege access, tenant isolation &#8212; these are not AI-specific. They are the same controls enterprises have used for decades, applied to a new class of system.</p><p><strong>In multimodal:</strong> Infrastructure controls are modality-agnostic. They do not care whether the model generates text, images, or video. They care whether the model has access to systems it should not, and whether a compromise propagates to connected systems.</p><h3>Category 5: Observability</h3><p>Runtime monitoring, behavioral detection, logging, audit trails, alerting, and incident response.</p><p>This category assumes the first four have failed. Runtime monitoring watches for anomalous model behavior &#8212; outputs that deviate from baselines, unusual tool invocations, data access patterns outside the agent&#8217;s scope. Logging makes incident reconstruction possible. Alerting and incident response make it actionable.</p><p><strong>In multimodal:</strong> Observability for AI systems is less mature than for traditional infrastructure. Most enterprises have monitoring for network traffic, endpoint behavior, and application logs. Few have equivalent monitoring for AI agent behavior or output distribution anomalies. The telemetry exists &#8212; model inputs, outputs, tool calls, guardrail triggers &#8212; but it is not routinely fed into SIEM platforms or monitored by security operations centers. The data is available. The pipelines to use it are not built yet.</p><h2>A Six-Layer Architecture</h2><p>The six-layer architecture is built from these five control categories, plus one precondition. Model-level controls (Category 1) are not a layer. 
The architecture assumes they will fail and builds everything else to compensate.</p><p><strong>Layer 1: Supply chain visibility (AIBOM).</strong> You cannot secure what you cannot inventory. Model provenance, training data origin, fine-tuning history, embedded safety controls, evaluation artifacts. A precondition for evaluating every layer that follows. Maps to LLM03.</p><p><strong>Layer 2: Input defense.</strong> Category 2 applied. Pre-model classifiers that flag bypass patterns, adversarial inputs, and known-bad prompts. For multimodal systems, classifiers that evaluate the composition of inputs &#8212; not each input in isolation. Maps to LLM01 and Agent Goal Hijack.</p><p><strong>Layer 3: Output defense.</strong> Category 3 applied. Post-model classifiers for every modality the system produces. This layer must use a different detection method than Layer 2. If both share training data or vendor lineage, they share blind spots. Output filtering should be structurally independent: a different model family, a rule-based policy engine, or an LLM-as-judge from a separate provider.</p><p><strong>Layer 4: Blast radius and exfiltration controls.</strong> Category 4 applied. The model does not share identity with the user. It does not share network egress with production data. It does not share credentials with other agents. Tools are scoped. Permissions are least-privilege. Agent actions are sandboxed. Maps to Identity and Privilege Abuse and Cascading Failures.</p><p><strong>Layer 5: Runtime monitoring.</strong> Category 5 applied. Layers 1 through 4 try to prevent bad outcomes. Layer 5 assumes they failed. It watches for anomalous behavior, logs everything, and alerts on deviations. This is the layer that catches attacks no classifier was trained on. Logging here is not optional &#8212; it is what makes incident reconstruction possible.</p><p><strong>Layer 6: Human oversight and incident response.</strong> Category 5 extended into action. 
For high-risk outputs &#8212; image generation involving real people, video generation, agent actions that modify production systems &#8212; a human review gate belongs in the pipeline. Not on every output. On outputs that cross a defined risk threshold. Behind that gate sits an incident response process: escalation paths, containment procedures, credential revocation, system isolation.</p><p>Every layer is imperfect. That is the point. If you can't answer what happens when one layer fails, you don't have defense in depth.</p><h2>What the Industry Is Shipping Today</h2><h3>Purpose-Built Multimodal Safety Classifiers</h3><p><strong>Llama Guard 4</strong> (Meta, 2025) is a multimodal safety classifier that evaluates prompts and responses across 14 hazard categories plus code interpreter abuse. <strong>Llama Guard 3 Vision</strong> (Meta, late 2024) was the first safety classifier built for LLM image understanding, evaluating prompt text and images together. <strong>ShieldGemma 2</strong> (Google) classifies images for sexual content, violence, and gore, and uses its own classifier in reverse to generate adversarial test images &#8212; red-teaming-as-training. <strong>NVIDIA NeMo Guardrails</strong> supports multimodal content safety with GPU-accelerated parallel execution, adding roughly half a second of latency for five parallel guardrails.</p><p>Every one of these is a single-layer control. None cover compositional cross-modal attacks. They are pieces of a stack, not the stack.</p><h3>Enterprise Guardrail Platforms</h3><p><strong>Azure AI Content Safety</strong> provides multimodal moderation, prompt injection detection, groundedness checks, and PII filtering &#8212; Microsoft&#8217;s documentation notes it is probabilistic and should be treated as a risk reduction tool, not a guarantee. 
<strong>Amazon Bedrock Guardrails</strong> filters harmful text and image content, blocks prompt injections, and redacts PII &#8212; AWS reports it blocks up to 88% of harmful content. <strong>Microsoft Foundry Guardrails</strong> applies classification at four intervention points: user input, tool call, tool response, and output &#8212; the tool call and tool response points are significant for agentic systems because they let guardrails inspect what an agent is about to do before it does it.</p><p>Every one of these platforms is built on classification models tuned for known harm categories. None cover compositional cross-modal attacks. They are a layer. The defense-in-depth architecture has to be built by the enterprise deploying them.</p><h2>Emerging Research and Open Problems</h2><h3>In-Generation Detection</h3><p>Current safety tools inspect the prompt or the output. A 2025 preprint introduced In-Generation Detection (IGD), which monitors the model&#8217;s internal state during the image generation process itself. It reads the predicted noise during diffusion denoising steps &#8212; a signal that reflects the evolving visual meaning of the prompt &#8212; and trains a lightweight classifier to detect NSFW intent before the image is fully generated.</p><p>IGD achieved 91.32% detection accuracy across seven NSFW categories, including adversarially crafted prompts. Because it reads internal model state rather than the prompt surface, it has the potential to catch adversarial inputs that are designed to look benign to external classifiers.</p><p>Currently demonstrated only for diffusion-based image generation. Does not extend to video, text, or multimodal-to-multimodal systems. 
Not shipping in any enterprise product.</p><h3>Proposed Directions for Compositional Attack Defense</h3><p>Security researchers have described three architectural directions that would address the compositional cross-modal attack class:</p><p><strong>Evaluate combined intent, not individual inputs.</strong> Safety systems should reason over the cumulative meaning of a full prompt sequence &#8212; &#8220;photo + stick figure + match the pose&#8221; as a single semantic intent, not three benign inputs evaluated separately.</p><p><strong>Share context across safety layers.</strong> The image classifier should see the original user request. The prompt guard should see the generated image. Without this, attackers can route harmful content through one modality to exploit blind spots in another.</p><p><strong>Decompose compositional inputs.</strong> Classifiers should identify compositional elements &#8212; diagrams, reference images, pose guides &#8212; within a larger input, and evaluate their meaning separately from the overall scene.</p><p>None of these are shipping in enterprise products. They represent where the field needs to go, and security architects should be asking vendors whether their roadmaps address them.</p><h3>The Swiss Cheese Model for AI Safety</h3><p>Researchers have proposed multi-layered runtime guardrails modeled on the Swiss Cheese Model from aviation and healthcare safety engineering. Each layer has holes. The principle is that no two layers have the same holes in the same place. The architecture decouples safety authority from any single model so each layer can be tested and updated independently.</p><h2>What a Security Architect Should Implement Today</h2><p>The research is ahead of the products. The products are ahead of most deployments. 
Here is what you can do now, mapped to the six layers, using tools that exist today.</p><p><strong>Where you start depends on what you are shipping.</strong> If the business needs an internet-facing chatbot, input defense comes first &#8212; you need prompt injection detection before the system goes live. If the system handles legal or regulated content, output filtering on specific terms comes first &#8212; you need to block what cannot be said before anything else. The layer numbers are not a priority order. They are a completeness checklist. Build what the use case demands, ship it, then add depth.</p><p>I'm building this stack in production &#8212; some layers are live, others are in progress. Start anywhere, but don't stop at one layer &#8212; the gap you skip is the one that gets exploited.</p><p>The tooling is also not static. Security vendors are building AI capabilities into their products at the same pace enterprises are adopting AI. The guardrail platform you evaluated last quarter may have shipped new capabilities since. Reassess continuously. And check what you already have &#8212; if your organization runs DLP, content filtering, or compliance tooling, some of these controls may already be partially in place. You do not always need to build from scratch.</p><p><strong>Layer 1: Supply chain visibility.</strong> Maintain an AIBOM for every model in your environment. Document provenance, training data sources, fine-tuning history, safety controls, and evaluation results. For third-party models, document what the vendor discloses and what they do not.</p><p><strong>Layer 2: Input defense.</strong> Deploy prompt injection and jailbreak detection on all inputs before they reach the model. For multimodal systems, use classifiers that evaluate the composition of inputs, not just individual components. Meta Prompt Guard, Azure Prompt Shields, and NVIDIA NeMo jailbreak detection are available options. 
None fully solve compositional attacks, but their absence is what lets those attacks scale. Run them in parallel with other guardrails to minimize latency.</p><p><strong>Layer 3: Output defense.</strong> Deploy a structurally independent output classifier. If your input classifier is from Vendor A, your output classifier should not be from Vendor A. Use a purpose-built multimodal safety classifier &#8212; Llama Guard 4, ShieldGemma 2, or comparable &#8212; rather than a general-purpose vision model. If your system generates images or video, the classifier must be trained on AI-generated content, not benchmarked against real-world photos. Test it against adversarial inputs, not just known harmful content.</p><p><strong>Layer 4: Blast radius and exfiltration controls.</strong> Apply your existing zero-trust and least-privilege architecture to AI systems. The model runs in a sandbox. It does not share identity with the user, network egress with production data, or credentials with other agents. Tools are scoped and explicitly enumerated. Rate limits, DLP rules, and egress filtering apply to AI-initiated requests the same way they apply to human-initiated requests.</p><p><strong>Layer 5: Runtime monitoring.</strong> Log all inputs, outputs, tool invocations, and guardrail triggers. Establish behavioral baselines and alert on deviations. Feed guardrail trigger data into your SIEM. If your SOC monitors network anomalies and endpoint behavior, it should also monitor AI agent behavior.</p><p><strong>Layer 6: Human oversight and incident response.</strong> Define risk thresholds for human review. Build incident response playbooks for AI-specific scenarios: model compromise, agent hijack, data exfiltration through authorized channels, classifier bypass. Include the ability to revoke agent credentials, isolate the model, and preserve audit logs.</p><p><strong>Architecture-level:</strong> Run guardrails in parallel, not in series. 
Five parallel guardrails add roughly half a second of latency. Use risk-based routing &#8212; low-risk queries get lightweight checks, high-risk queries get deeper evaluation with human review gates.</p><h2>The Takeaway</h2><p>If the safety story for any AI system you build or deploy is &#8220;the model won&#8217;t do that,&#8221; that is a red flag. Ask what catches the prompt the model missed. Ask what catches the prompt that does not look like a prompt. Ask what the classifier&#8217;s detection rate is on AI-generated content specifically. Ask what the judge does when the judge is the target.</p><p>If your AI adoption strategy treats the model as the security boundary, you are one creative composition away from the Grok headline. Not the same incident. The same failure class.</p><p>The security team has decades of defense-in-depth experience. The AI safety field is still building theirs. We have done this before. We know what happens when a single control fails without a second layer behind it.</p><p>The answer is not &#8220;we could not prevent all abuse.&#8221;</p><p>The answer is the next layer.</p><div><hr></div><p><em>Subscribe to StepToCyber for frequent analysis on securing GenAI at enterprise scale.</em></p><p><em>Views are my own.</em></p><div><hr></div><h3>References</h3><p><strong>Primary news coverage</strong></p><ul><li><p>Ingram, D. (2026, April 14). Elon Musk&#8217;s AI chatbot Grok continues to produce sexualized deepfakes despite xAI&#8217;s pledge to stop. <em>NBC News.</em> <a href="https://www.nbcnews.com/tech/tech-news/musks-ai-chatbot-grok-xai-making-sexual-deepfakes-imagine-rcna265855">https://www.nbcnews.com/tech/tech-news/musks-ai-chatbot-grok-xai-making-sexual-deepfakes-imagine-rcna265855</a></p></li></ul><p><strong>OWASP frameworks</strong></p><ul><li><p>OWASP GenAI Security Project. (2025). 
<em>OWASP Top 10 for Large Language Model Applications 2025.</em> <a href="https://genai.owasp.org/llm-top-10/">https://genai.owasp.org/llm-top-10/</a></p></li><li><p>OWASP GenAI Security Project. (2025, December). <em>OWASP Top 10 for Agentic Applications 2026.</em> <a href="https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/">https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/</a></p></li></ul><p><strong>Secure-by-Design</strong></p><ul><li><p>CISA. Secure by Design. <a href="https://www.cisa.gov/securebydesign">https://www.cisa.gov/securebydesign</a></p></li></ul><p><strong>Compositional cross-modal attack research</strong></p><ul><li><p>Shayegani, E., Dong, Y., &amp; Abu-Ghazaleh, N. (2024). Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models. <em>ICLR 2024.</em> <a href="https://openreview.net/forum?id=plmBsXHxgR">https://openreview.net/forum?id=plmBsXHxgR</a></p></li><li><p>Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models (CAMO). (2025). <em>arXiv preprint.</em> <a href="https://arxiv.org/html/2506.16760v1">https://arxiv.org/html/2506.16760v1</a></p></li></ul><p><strong>Image safety classifier benchmarking</strong></p><ul><li><p>Qu, Y., Shen, X., He, X., Backes, M., Zannettou, S., &amp; Zhang, Y. (2024). UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images. <em>arXiv preprint.</em> <a href="https://arxiv.org/html/2405.03486v3">https://arxiv.org/html/2405.03486v3</a></p></li></ul><p><strong>LLM jailbreaking survey and judge reliability</strong></p><ul><li><p>Bin Hakim, S., Gharami, K., Farhady Ghalaty, N., et al. (2026, January). Jailbreaking LLMs: A Survey of Attacks, Defenses and Evaluation. <em>TechRxiv.</em> <a href="https://www.techrxiv.org/users/1011181/articles/1373070">https://www.techrxiv.org/users/1011181/articles/1373070</a></p></li><li><p>Fu, X. &amp; Liu, W. (2025). 
How Reliable is Multilingual LLM-as-a-Judge? <em>EMNLP 2025 Findings</em>, pages 11040&#8211;11053. <a href="https://aclanthology.org/2025.findings-emnlp.587/">https://aclanthology.org/2025.findings-emnlp.587/</a></p></li><li><p>Liu, H., Huang, H., Gu, X., Wang, H., &amp; Wang, Y. (2025). On Calibration of LLM-based Guard Models for Reliable Content Moderation. <em>ICLR 2025.</em> <a href="https://arxiv.org/abs/2410.10414">https://arxiv.org/abs/2410.10414</a></p></li></ul><p><strong>Purpose-built multimodal safety classifiers</strong></p><ul><li><p>Meta. (2025). Llama Guard 4-12B Model Card. <em>Hugging Face.</em> <a href="https://huggingface.co/meta-llama/Llama-Guard-4-12B">https://huggingface.co/meta-llama/Llama-Guard-4-12B</a></p></li><li><p>Meta. (2024). Llama Guard 3-11B-Vision Model Card. <em>GitHub.</em> <a href="https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Guard3/11B-vision/MODEL_CARD.md">https://github.com/meta-llama/PurpleLlama/blob/main/Llama-Guard3/11B-vision/MODEL_CARD.md</a></p></li><li><p>Google. ShieldGemma. Referenced in: 19 Large Language Models Redefining AI Safety. <em>InfoWorld.</em> <a href="https://www.infoworld.com/article/4140809/19-large-language-models-redefining-ai-safety-and-danger.html">https://www.infoworld.com/article/4140809/19-large-language-models-redefining-ai-safety-and-danger.html</a></p></li></ul><p><strong>Enterprise guardrail vendor documentation</strong></p><ul><li><p>Microsoft. (2026). Azure AI Content Safety overview. <a href="https://azure.microsoft.com/en-us/products/ai-services/ai-content-safety/">https://azure.microsoft.com/en-us/products/ai-services/ai-content-safety/</a></p></li><li><p>Microsoft. (2026). Guardrails and controls overview in Microsoft Foundry. <a href="https://learn.microsoft.com/en-us/azure/foundry/guardrails/guardrails-overview">https://learn.microsoft.com/en-us/azure/foundry/guardrails/guardrails-overview</a></p></li><li><p>Amazon Web Services. (2026). Amazon Bedrock Guardrails. 
<a href="https://aws.amazon.com/bedrock/guardrails/">https://aws.amazon.com/bedrock/guardrails/</a></p></li><li><p>NVIDIA. NeMo Guardrails for Developers. <a href="https://developer.nvidia.com/nemo-guardrails">https://developer.nvidia.com/nemo-guardrails</a></p></li></ul><p><strong>Guardrail architecture research</strong></p><ul><li><p>Designing Multi-layered Runtime Guardrails for Foundation Model Based Agents: Swiss Cheese Model for AI Safety by Design. (2024). <em>arXiv preprint.</em> <a href="https://arxiv.org/html/2408.02205v3">https://arxiv.org/html/2408.02205v3</a></p></li><li><p>Modular Safety Guardrails Are Necessary for Foundation-Model-Enabled Robots in the Real World. (2026). <em>arXiv preprint.</em> <a href="https://arxiv.org/html/2602.04056">https://arxiv.org/html/2602.04056</a></p></li></ul><p><strong>In-generation detection research</strong></p><ul><li><p>Seeing It Before It Happens: In-Generation NSFW Detection for Diffusion-Based Text-to-Image Models. (2025). <em>arXiv preprint 2508.03006.</em> <a href="https://openreview.net/forum?id=SFHjSDIMKn">https://openreview.net/forum?id=SFHjSDIMKn</a></p></li></ul><p><strong>Proposed architectural defenses</strong></p><ul><li><p>Decodes Future. (2026, March). Grok Jailbreak Prompts: Multimodal Reasoning Vulnerability Analysis. <a href="https://www.decodesfuture.com/articles/grok-jailbreak-prompts-multimodal-reasoning-vulnerability-analysis">https://www.decodesfuture.com/articles/grok-jailbreak-prompts-multimodal-reasoning-vulnerability-analysis</a></p></li></ul><p><strong>Parallel guardrail orchestration</strong></p><ul><li><p>Authority Partners. (2026, March). AI Agent Guardrails: Production Guide for 2026. 
<a href="https://authoritypartners.com/insights/ai-agent-guardrails-production-guide-for-2026/">https://authoritypartners.com/insights/ai-agent-guardrails-production-guide-for-2026/</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Controlled Openings: How Enterprises Should Actually Let AI Crawlers In]]></title><description><![CDATA[A three-tier playbook for Enterprise AI, app owners, and security teams &#8212; closed by default, opened deliberately, governed on a cadence.]]></description><link>https://www.steptocyber.ai/p/controlled-openings-how-enterprises</link><guid isPermaLink="false">https://www.steptocyber.ai/p/controlled-openings-how-enterprises</guid><dc:creator><![CDATA[StepToCyber]]></dc:creator><pubDate>Sat, 11 Apr 2026 11:23:57 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/8b17b245-7fd2-49f4-b311-51a39b04bee8_1600x840.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="callout-block" data-callout="true"><p><strong>The Controlled Openings Model</strong></p><p><strong>Tier 1 &#8212; The Declaration.</strong> Owned by Enterprise AI, Marketing, and Security. Legal reviews. 
The robots.txt file is the meeting minutes &#8212; every AI crawler named, every posture stated, closed by default.</p><p><strong>Tier 2 &#8212; Active Enforcement.</strong> Owned by app teams. Security audits. The WAF or CDN baseline that turns the declaration into a control on every property.</p><p><strong>Tier 3 &#8212; Governance.</strong> Owned by the AI governance organization. The allowed list register, named owners, and the quarterly review that keeps every opening on a clock.</p><p><em>Closed by default. Every opening is scoped to one application, owned by a named human, and expires on a date. Everything else is a wish.</em></p></div><p>A marketing director buys an ad placement inside ChatGPT. The placement requires OpenAI to crawl the landing page as part of ad onboarding &#8212; and the crawl fails. OAI-SearchBot can't reach the page. The ad can't go live.</p><p>Here's the conflict. Marketing needs the crawler in. Every enterprise with a public web estate has a legitimate reason to block AI crawlers by default &#8212; scraping pressure, referral asymmetry, brand misrepresentation risk, contested compliance from bots like PerplexityBot. Both positions are correct. Neither can unilaterally win. And right now, most enterprises don't have a shared place where those two positions can see each other before one of them collides with the other.</p><p>This is the state most enterprises are actually in. Not a policy. Not even a wish. Two legitimate positions with no shared place to meet.</p><p>The fix is a model I call <strong>controlled openings</strong> &#8212; closed to AI crawlers by default, opened deliberately per application, with a named owner and an expiry date on every opening. Not one enterprise-wide yes or no. A portfolio of small, deliberate yeses &#8212; each one owned, scoped, and on a clock.</p><p>This post is the playbook for getting there &#8212; written for the three groups who have to operate it together: Enterprise AI leadership, app owners, and security teams. None of them can make the call alone. Enterprise AI leadership owns the vendor relationships every allow touches, and co-authors the declaration with Marketing's referral data and Legal's MSA review. App owners operate the enforcement on the properties they're accountable for. Security owns the enforcement standard that keeps the whole thing consistent across hundreds of properties, and audits against it. The AI governance organization owns the allowed list register and runs the quarterly review that keeps every opening on a clock. The controlled openings model is what lets every owner act in the same direction without stepping on each other. It gives Marketing and Legal a named seat at the table when the declaration is written and, most importantly, addresses Marketing's needs up front instead of leaving them to discover the block when an ad won&#8217;t go live. 
</p><p>The call has to land somewhere every stakeholder can read &#8212; not buried in a WAF rule group that only the app team operating that property can see. That somewhere is robots.txt.</p><p>robots.txt is the <strong>declaration</strong>. It names every AI crawler explicitly and records whether the company allows it, disallows it, or has granted a time-bound allow for a specific application. The declaration is jointly authored by Enterprise AI leadership, business and marketing, and security. Legal reviews every change before it merges. App teams implement the declaration on their own properties. Security reviews allow requests against the standard. Governance keeps all of it aligned over time.</p><h2>Tier 1: The Declaration</h2><p>The failure mode most enterprises are in right now isn&#8217;t having the wrong AI bot policy. It&#8217;s having a different one on every property. Some app teams are genuinely sophisticated &#8212; their robots.txt names every major AI crawler and their WAF is tuned. Other teams copied a robots.txt from a template years ago and haven&#8217;t touched it since. Marketing landing pages spin up without any declaration at all.</p><p>A standardized AI bot robots.txt fixes the consistency problem by giving every app team the same starting point. The classification in the template is made jointly by the three co-authors. Enterprise AI brings the vendor relationships. Marketing brings the referral data and the campaign roadmap. Security brings the threat picture. Legal reviews every draft against current MSA terms before it merges, flagging any Allow that conflicts with data usage clauses. The file is the meeting minutes of that conversation, written in a format bots can parse.</p><h3>The Decision Matrix</h3><p>Every known AI crawler is Block by default. An allow is never enterprise-wide &#8212; it&#8217;s scoped to a specific application. 
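</p><p>A minimal sketch of what one property's declaration could look like under this posture, assuming a marketing site with a single approved opening. The <code>/campaigns/</code> path and the owner and expiry placeholders are hypothetical, and only crawler names mentioned in this post are listed; a real template would name every crawler in the decision matrix.</p><pre><code># robots.txt for the corporate marketing site (illustrative sketch)

# Approved opening, scoped to this property only.
# Owner: OWNER_NAME   Expires: YYYY-MM-DD   (hypothetical placeholders)
User-agent: OAI-SearchBot
Allow: /campaigns/
Disallow: /

# Training crawlers: blocked by default.
User-agent: ClaudeBot
Disallow: /

# Search, answer, and user-initiated crawlers: blocked by default.
User-agent: ChatGPT-User
User-agent: PerplexityBot
User-agent: Claude-User
User-agent: Claude-SearchBot
Disallow: /
</code></pre><p>In this sketch the longer <code>Allow: /campaigns/</code> rule takes precedence over <code>Disallow: /</code> for OAI-SearchBot only, so the opening stays confined to one path on one property. 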
OAI-SearchBot allowed on the corporate marketing site for an ad campaign is not OAI-SearchBot allowed on the developer docs or the support portal. Each app team runs its own allow list against its own property, because the business case for any given crawler almost never applies uniformly across the enterprise. All per-application allows are recorded in a central register so the governance layer can see across them, but the decisions themselves are made where the property is owned.</p><h3>Training Crawlers &#8212; Default: Block</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ENPY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb20ec5b6-8f22-4092-b42c-5edcd7f9e74b_1620x1615.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ENPY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb20ec5b6-8f22-4092-b42c-5edcd7f9e74b_1620x1615.png 424w, https://substackcdn.com/image/fetch/$s_!ENPY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb20ec5b6-8f22-4092-b42c-5edcd7f9e74b_1620x1615.png 848w, https://substackcdn.com/image/fetch/$s_!ENPY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb20ec5b6-8f22-4092-b42c-5edcd7f9e74b_1620x1615.png 1272w, https://substackcdn.com/image/fetch/$s_!ENPY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb20ec5b6-8f22-4092-b42c-5edcd7f9e74b_1620x1615.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!ENPY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb20ec5b6-8f22-4092-b42c-5edcd7f9e74b_1620x1615.png" width="1456" height="1452" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b20ec5b6-8f22-4092-b42c-5edcd7f9e74b_1620x1615.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1452,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:227029,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.steptocyber.ai/i/193819739?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb20ec5b6-8f22-4092-b42c-5edcd7f9e74b_1620x1615.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ENPY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb20ec5b6-8f22-4092-b42c-5edcd7f9e74b_1620x1615.png 424w, https://substackcdn.com/image/fetch/$s_!ENPY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb20ec5b6-8f22-4092-b42c-5edcd7f9e74b_1620x1615.png 848w, https://substackcdn.com/image/fetch/$s_!ENPY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb20ec5b6-8f22-4092-b42c-5edcd7f9e74b_1620x1615.png 1272w, https://substackcdn.com/image/fetch/$s_!ENPY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb20ec5b6-8f22-4092-b42c-5edcd7f9e74b_1620x1615.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>None of these return referral traffic. All of them consume at scale. The crawl-to-referral ratio on training crawlers is effectively infinite &#8212; you give content, you get nothing back. 
The only reason to allow any of them is a deliberate strategic decision by Enterprise AI leadership to contribute training data to a specific partner, and that decision runs through Tier 3 as an allow request, not a default.</p><h3>Search, Answer, and User-Initiated Crawlers &#8212; Default: Block</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hMsa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53434400-f47a-433b-b23a-a1a494e77a09_1620x1184.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hMsa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53434400-f47a-433b-b23a-a1a494e77a09_1620x1184.png 424w, https://substackcdn.com/image/fetch/$s_!hMsa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53434400-f47a-433b-b23a-a1a494e77a09_1620x1184.png 848w, https://substackcdn.com/image/fetch/$s_!hMsa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53434400-f47a-433b-b23a-a1a494e77a09_1620x1184.png 1272w, https://substackcdn.com/image/fetch/$s_!hMsa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53434400-f47a-433b-b23a-a1a494e77a09_1620x1184.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hMsa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53434400-f47a-433b-b23a-a1a494e77a09_1620x1184.png" width="1456" height="1064" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/53434400-f47a-433b-b23a-a1a494e77a09_1620x1184.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1064,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:180383,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.steptocyber.ai/i/193819739?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53434400-f47a-433b-b23a-a1a494e77a09_1620x1184.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hMsa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53434400-f47a-433b-b23a-a1a494e77a09_1620x1184.png 424w, https://substackcdn.com/image/fetch/$s_!hMsa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53434400-f47a-433b-b23a-a1a494e77a09_1620x1184.png 848w, https://substackcdn.com/image/fetch/$s_!hMsa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53434400-f47a-433b-b23a-a1a494e77a09_1620x1184.png 1272w, https://substackcdn.com/image/fetch/$s_!hMsa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53434400-f47a-433b-b23a-a1a494e77a09_1620x1184.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>These crawlers drive real referral traffic and matter to marketing and Enterprise AI. That&#8217;s exactly why they go through the allow-request process: the business case must be made and owned by a named stakeholder, not inherited as a default. Block is the posture. Every allow has an owner and an expiry.</p><p>Two rows deserve specific notes. <strong>ChatGPT-User</strong> had its robots.txt compliance language revised by OpenAI in December 2025 &#8212; OpenAI&#8217;s current documentation states that because these actions are initiated by a user, &#8220;robots.txt rules may not apply,&#8221; which means enforcement for this crawler has to happen entirely at Tier 2. 
<strong>PerplexityBot</strong> has documented stealth behavior per Cloudflare&#8217;s August 2025 forensic report, including spoofed user-agents and rotating IPs. A Disallow in Tier 1 is still worth recording for audit purposes, but enforcement rests on the WAF&#8217;s signal rules, not the user-agent string.</p><p><strong>Anthropic&#8217;s crawler roster expanded in early 2026</strong> to include three distinct agents: <code>ClaudeBot</code> (training, in the table above), <code>Claude-User</code> (user-initiated), and <code>Claude-SearchBot</code> (search indexing). Each is independently controllable via robots.txt. Enterprises that blocked only <code>ClaudeBot</code> and assumed full coverage need to revisit their declarations.</p><h3>The Catch-All</h3><p>Every crawler not named above is an unknown. Unknowns do not get an implicit pass. When a new bot shows up in an app team&#8217;s AI Activity Dashboard, it triggers the Tier 3 intake process. The default state while the ticket is open is Block. The default is always Block for anything undeclared.</p><h2>Tier 2: Active Enforcement</h2><p><strong>App owners operate this tier. Security audits it</strong>. Declaration without enforcement is a wish.</p><p>Security&#8217;s job at Tier 2 is to publish an enforcement <em>standard</em> &#8212; the set of controls every public-facing property must implement, regardless of which WAF or CDN sits in front of it. The standard is platform-agnostic by design: <em>block AI bot categories by default, enforce every robots.txt Disallow at the edge, apply targeted inspection to sensitive endpoints.</em> </p><p>AWS WAF Bot Control is the reference implementation walked through below. The label namespace and CategoryAI rule are well-designed for exactly this problem, and if your property lives in AWS, this is the shortest path from standard to deployed control. 
If it lives somewhere else &#8212; Cloudflare, Akamai, Azure Front Door, F5 &#8212; read this section as the pattern, not the product. Map CategoryAI to your platform's AI bot category. Map the signal rules to your platform's stealth detection. Map the label namespace to whatever your platform calls its tagging layer. The names differ. The controls don't.</p><p>Three building blocks matter for AI bot enforcement: category rules, signal rules, and the label namespace.</p><p><strong>CategoryAI is unique.</strong> Every other category rule respects the verified/unverified distinction &#8212; verified search bots pass, unverified scrapers get blocked. CategoryAI blocks both by default. Per <a href="https://docs.aws.amazon.com/waf/latest/developerguide/waf-bot-control.html">AWS documentation</a>, this is the one category where AWS treats all AI bots as hostile until proven otherwise. That&#8217;s the right posture and the foundation of the enforcement layer.</p><p><strong>Signal rules catch stealth crawlers.</strong> <code>SignalAutomatedBrowser</code>, <code>SignalNonBrowserUserAgent</code>, and <code>SignalKnownBotDataCenter</code> detect crawlers using spoofed user-agents, rotating IPs, or datacenter egress &#8212; the exact techniques Cloudflare documented Perplexity using in 2025.</p><p><strong>The label namespace is what you write policy against.</strong> The labels that matter here are <code>bot:name:&lt;name&gt;</code>, <code>bot:verified</code>, <code>bot:unverified</code>, and the <code>bot:web_bot_auth:&lt;status&gt;</code> labels added in Bot Control v4.0 with Web Bot Authentication support (launched November 2025). Allow and block rules that run after the Bot Control rule group should match on these labels, not on user-agent strings, because user-agents can be spoofed and labels are applied by AWS after verification. 
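</p><p>A hedged sketch of label-based policy for one property with a single approved opening, written as AWS WAF rule JSON. The rule names, priorities, metric names, and the <code>APPROVED_BOT_NAME</code> placeholder are hypothetical, and the exact label keys should be confirmed against the rule group's published label list before use. In this sketch the <code>CategoryAI</code> rule is set to Count inside the rule group only so that the follow-up rule can make the final decision; every AI bot other than the verified, approved one is still blocked.</p><pre><code>[
  {
    "Name": "example-bot-control-baseline",
    "Priority": 10,
    "Statement": {
      "ManagedRuleGroupStatement": {
        "VendorName": "AWS",
        "Name": "AWSManagedRulesBotControlRuleSet",
        "ManagedRuleGroupConfigs": [
          { "AWSManagedRulesBotControlRuleSet": { "InspectionLevel": "COMMON" } }
        ],
        "RuleActionOverrides": [
          { "Name": "CategoryAI", "ActionToUse": { "Count": {} } }
        ]
      }
    },
    "OverrideAction": { "None": {} },
    "VisibilityConfig": {
      "SampledRequestsEnabled": true,
      "CloudWatchMetricsEnabled": true,
      "MetricName": "example-bot-control-baseline"
    }
  },
  {
    "Name": "example-block-ai-except-approved-opening",
    "Priority": 20,
    "Statement": {
      "AndStatement": {
        "Statements": [
          {
            "LabelMatchStatement": {
              "Scope": "LABEL",
              "Key": "awswaf:managed:aws:bot-control:bot:category:ai"
            }
          },
          {
            "NotStatement": {
              "Statement": {
                "AndStatement": {
                  "Statements": [
                    {
                      "LabelMatchStatement": {
                        "Scope": "LABEL",
                        "Key": "awswaf:managed:aws:bot-control:bot:name:APPROVED_BOT_NAME"
                      }
                    },
                    {
                      "LabelMatchStatement": {
                        "Scope": "LABEL",
                        "Key": "awswaf:managed:aws:bot-control:bot:verified"
                      }
                    }
                  ]
                }
              }
            }
          }
        ]
      }
    },
    "Action": { "Block": {} },
    "VisibilityConfig": {
      "SampledRequestsEnabled": true,
      "CloudWatchMetricsEnabled": true,
      "MetricName": "example-block-ai-except-approved-opening"
    }
  }
]
</code></pre><p>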
Web Bot Authentication is where this is headed long-term &#8212; cryptographic identity for AI agents &#8212; and any allowlist logic written today should leave room for <code>bot:web_bot_auth:verified</code> as the preferred match condition once crawler support catches up.</p><h3>The Enforcement Standard</h3><p>Security publishes three rules as the standard. App teams deploy them.</p><ol><li><p><strong>CategoryAI = Block.</strong> Every AI bot hits the wall unless explicitly exempted. No Count mode. No grace period. The default is Block because the declaration in Tier 1 is Block for everything until a human says otherwise.</p></li><li><p><strong>Path-based rules enforce every </strong><code>Disallow</code><strong> directive.</strong> Admin, internal, and private paths get blanket bot blocks regardless of user-agent.</p></li><li><p><strong>Targeted rules on sensitive endpoints.</strong> Login, checkout, cart, and API surfaces get the targeted inspection level regardless of declared identity, because the cost of a false negative on those endpoints is too high to trust identification alone.</p></li></ol><p>App teams deploy this baseline however they already deploy WAF configurations. Security performs the audit: every public-facing property is checked against the baseline on a recurring cadence. A missing CategoryAI Block rule, missing path-based enforcement, or missing targeted inspection on sensitive endpoints is each a finding routed to the app team for remediation. The audit is the control. The deployment is an implementation detail.</p><h3>The Audit Template</h3><p>Here&#8217;s the template for the audit output. Every row is one internet-facing application, sourced from the AWS WAF AI Activity Dashboard over a 30-day window. 
The last column is the only one that matters &#8212; if reality isn&#8217;t matching the declaration, the row becomes a finding.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IfY7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d52a79-5d93-4dbb-ac50-c15793d1553c_2055x532.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IfY7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d52a79-5d93-4dbb-ac50-c15793d1553c_2055x532.png 424w, https://substackcdn.com/image/fetch/$s_!IfY7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d52a79-5d93-4dbb-ac50-c15793d1553c_2055x532.png 848w, https://substackcdn.com/image/fetch/$s_!IfY7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d52a79-5d93-4dbb-ac50-c15793d1553c_2055x532.png 1272w, https://substackcdn.com/image/fetch/$s_!IfY7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d52a79-5d93-4dbb-ac50-c15793d1553c_2055x532.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IfY7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d52a79-5d93-4dbb-ac50-c15793d1553c_2055x532.png" width="1456" height="377" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b9d52a79-5d93-4dbb-ac50-c15793d1553c_2055x532.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:377,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:92630,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.steptocyber.ai/i/193819739?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d52a79-5d93-4dbb-ac50-c15793d1553c_2055x532.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IfY7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d52a79-5d93-4dbb-ac50-c15793d1553c_2055x532.png 424w, https://substackcdn.com/image/fetch/$s_!IfY7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d52a79-5d93-4dbb-ac50-c15793d1553c_2055x532.png 848w, https://substackcdn.com/image/fetch/$s_!IfY7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d52a79-5d93-4dbb-ac50-c15793d1553c_2055x532.png 1272w, https://substackcdn.com/image/fetch/$s_!IfY7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb9d52a79-5d93-4dbb-ac50-c15793d1553c_2055x532.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Every &#10007; becomes a ticket. The ticket either fixes the enforcement (bring Tier 2 in line with Tier 1) or fixes the declaration (bring Tier 1 in line with reality &#8212; which means filing an allow request or updating the template). Both paths are legitimate. Silence is not.</p><h3>Sensitive Path Inventory</h3><p>The audit template tells you whether the baseline is deployed. The sensitive path inventory tells you which paths on each application need enforcement beyond the baseline. 
These are the paths that were already identified as sensitive in robots.txt &#8212; WAF Bot Control makes the restriction actually enforceable.</p><p>Four generic categories cover most enterprise web estates:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RlnI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb39843a0-119c-4c1a-bb3f-545b0d00133d_1620x635.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RlnI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb39843a0-119c-4c1a-bb3f-545b0d00133d_1620x635.png 424w, https://substackcdn.com/image/fetch/$s_!RlnI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb39843a0-119c-4c1a-bb3f-545b0d00133d_1620x635.png 848w, https://substackcdn.com/image/fetch/$s_!RlnI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb39843a0-119c-4c1a-bb3f-545b0d00133d_1620x635.png 1272w, https://substackcdn.com/image/fetch/$s_!RlnI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb39843a0-119c-4c1a-bb3f-545b0d00133d_1620x635.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RlnI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb39843a0-119c-4c1a-bb3f-545b0d00133d_1620x635.png" width="1456" height="571" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b39843a0-119c-4c1a-bb3f-545b0d00133d_1620x635.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:571,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:101721,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.steptocyber.ai/i/193819739?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb39843a0-119c-4c1a-bb3f-545b0d00133d_1620x635.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RlnI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb39843a0-119c-4c1a-bb3f-545b0d00133d_1620x635.png 424w, https://substackcdn.com/image/fetch/$s_!RlnI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb39843a0-119c-4c1a-bb3f-545b0d00133d_1620x635.png 848w, https://substackcdn.com/image/fetch/$s_!RlnI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb39843a0-119c-4c1a-bb3f-545b0d00133d_1620x635.png 1272w, https://substackcdn.com/image/fetch/$s_!RlnI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb39843a0-119c-4c1a-bb3f-545b0d00133d_1620x635.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The pattern across all four is the same: every path that appears as <code>Disallow</code> in Tier 1&#8217;s declaration gets a corresponding block rule in Tier 2. The four categories just organize the rules by business purpose so reviewers can apply different enforcement postures without hand-building every rule individually.</p><p>The last row deserves one additional note. 
If an AI Activity Dashboard audit shows bot traffic hitting <code>/admin</code> or any internal path on a public URL, the finding isn&#8217;t just &#8220;the WAF needs a rule&#8221; &#8212; it&#8217;s &#8220;this application is exposing an internal surface at a public URL, and the real fix is upstream of the WAF.&#8221; The sensitive path inventory is where Tier 2 enforcement intersects with application security review, and the audit catches both.</p><h2>Tier 3: Governance</h2><p><strong>The AI governance organization owns this tier. App owners and business stakeholders file their requests through it.</strong></p><p>Every allow runs through the same loop: Default State &#8594; Monitor &#8594; Allow Request &#8594; Review &#8594; Implement &#8594; Validate &#8594; Ongoing. Not a sequence &#8212; a loop. Allows get granted, validated, and re-justified on a cadence, or they accumulate into the mess this post exists to prevent.</p><p><strong>Default State.</strong> CategoryAI blocks all AI bots. Every internet-facing application starts here and returns here if an allow lapses.</p><p><strong>Monitor.</strong> The <a href="https://aws.amazon.com/about-aws/whats-new/2026/02/aws-waf-ai-activity-dashboard/">AI Activity Dashboard</a>, CloudWatch, and WAF logs. App teams and security watch for blocked requests from named bots that might represent undeclared business dependencies, and for allowed bots behaving outside their declared scope.</p><p><strong>Allow Request.</strong> A business unit submits a written request naming the bot, the business justification, the specific paths the bot needs, and expected request volume. </p><p><strong>Review.</strong> Security validates identity via Bot Control labels and determines access level (full, path-restricted, or rate-capped). The reviewer&#8217;s first question isn&#8217;t how much risk the allow carries &#8212; it&#8217;s which risk dimensions are material for this specific application. 
A static sustainability microsite and a dynamic pricing page both pass through the same review process, but they&#8217;re scored against different dimensions because they have different things to lose.</p><p>Three risk dimensions inform every allow review:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dtTI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11a60a1-5c88-4847-96b8-8348311cffa7_1620x780.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dtTI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11a60a1-5c88-4847-96b8-8348311cffa7_1620x780.png 424w, https://substackcdn.com/image/fetch/$s_!dtTI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11a60a1-5c88-4847-96b8-8348311cffa7_1620x780.png 848w, https://substackcdn.com/image/fetch/$s_!dtTI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11a60a1-5c88-4847-96b8-8348311cffa7_1620x780.png 1272w, https://substackcdn.com/image/fetch/$s_!dtTI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11a60a1-5c88-4847-96b8-8348311cffa7_1620x780.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dtTI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11a60a1-5c88-4847-96b8-8348311cffa7_1620x780.png" width="1456" height="701" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d11a60a1-5c88-4847-96b8-8348311cffa7_1620x780.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:701,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:153535,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.steptocyber.ai/i/193819739?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11a60a1-5c88-4847-96b8-8348311cffa7_1620x780.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dtTI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11a60a1-5c88-4847-96b8-8348311cffa7_1620x780.png 424w, https://substackcdn.com/image/fetch/$s_!dtTI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11a60a1-5c88-4847-96b8-8348311cffa7_1620x780.png 848w, https://substackcdn.com/image/fetch/$s_!dtTI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11a60a1-5c88-4847-96b8-8348311cffa7_1620x780.png 1272w, https://substackcdn.com/image/fetch/$s_!dtTI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd11a60a1-5c88-4847-96b8-8348311cffa7_1620x780.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Customer journey interception and brand misrepresentation are universal &#8212; every enterprise with a public web estate carries some exposure on both. Dynamic data exposure is concentrated in specific business types: e-commerce, travel, financial services, and any application with logged-in or personalized experiences visible at public URLs. A brochure-ware application with mostly static marketing pages scores low on this dimension and the review moves on. 
The reviewer&#8217;s job is to decide which dimensions are live for this application before scoring any of them.</p><p>Against those three risks, one mitigation lens shifts the math on whether an allow is acceptable: <strong>static content as sunk cost.</strong> When the content a crawler wants to index has already been paid to produce, has no ongoing revenue tied to gatekeeping it, and captured its business value at publication rather than through ongoing access control, the incremental risk of letting a crawler consume it is close to zero. A completed blog post, a published press release, a closed-campaign landing page, an archived product overview &#8212; the money was already spent, the content was always intended for broad distribution, and blocking it forfeits visibility without protecting any live business value. Sunk cost is the reviewer&#8217;s defense argument when one of the three dimensions scores high but the content has no ongoing gating value. It&#8217;s what makes an allow defensible in cases where a pure risk read would lean toward block.</p><p>The decision to allow still requires the named business owner to sign.</p><p><strong>Implement.</strong> Three mechanical options in order of preference: label-based allow (override CategoryAI to Count, add a custom allow rule matching on <code>bot:name</code> plus a verification label), scope-down exclusion (exempt a specific path plus a shared secret header from CategoryAI evaluation), or WBA-based allow (match on <code>bot:web_bot_auth:verified</code> &#8212; future state, limited crawler support today).</p><p><strong>Validate.</strong> Seven-day watch on the newly allowed bot. 
Revocation triggers: volume anomalies, path access outside declared scope, signal rule hits like <code>SignalKnownBotDataCenter</code> on a bot you&#8217;d verified, or any behavior inconsistent with the stated business purpose.</p><p><strong>Ongoing.</strong> Quarterly review of every active allow against the allowed list register. Every allow expires unless the business owner re-justifies it in writing.</p><h2>How This Scales Across Hundreds of Applications</h2><p>At a large enterprise, &#8220;the company&#8217;s website&#8221; isn&#8217;t one website. It&#8217;s hundreds of internet-facing applications &#8212; product sites, regional domains, microsites, acquired brands, support portals, developer docs, campaign landing pages &#8212; each operated by a different app team, each with its own WAF configuration, each with its own robots.txt or no robots.txt at all. That's the scale the three-tier model is built for.</p><p><strong>robots.txt scales through templates.</strong> The three co-authors publish a standardized robots.txt template, Legal reviews it, and every app team forks it for their own properties. Deviations require written reasons reviewed by the same co-authors and re-approved by Legal. Nobody owns every property&#8217;s file &#8212; the co-authors own the template and the diff review.</p><p><strong>The enforcement standard scales through the audit.</strong> Security publishes the Bot Control baseline and checks every public-facing property against it on a recurring cadence. App teams deploy the rules through whatever IaC or console workflow they already use. Security finds the gaps and routes them back as findings. The audit is what makes the standard real across hundreds of properties without central deployment tooling touching anyone&#8217;s account.</p><p><strong>The allowed list register scales through one cadence.</strong> Every allow across every property lives in one register with one review cadence. 
App teams and business owners make the decisions. The AI governance organization owns the process, the cadence, and the audit trail. Decentralized execution, centralized governance.</p><p>This is the only model that answers the ChatGPT advertising collision <a href="https://www.steptocyber.ai/p/block-the-bots-or-feed-the-machine">from my previous post</a> at enterprise scale. Marketing reads the template, sees OAI-SearchBot is Block by default, files an allow request through the allowing process, and gets a label-based allow deployed against the specific properties tied to the campaign &#8212; with an expiry date, a named owner, and a quarterly review on the calendar. The collision becomes a governance event and the ad goes live on schedule.</p><h2>Your First Controlled Opening</h2><p>The controlled openings model is a program, not a one-week project. But every program starts with a first move, and the first move is different depending on which seat you&#8217;re in.</p><p><strong>If you&#8217;re in Enterprise AI leadership:</strong> Find the first three business stakeholders who have already asked, or are about to ask, "why is this AI tool blocked?" Marketing wanting to run an ad campaign inside ChatGPT. Sales wanting Perplexity to surface the company in answer results. Comms wanting press releases indexed by AI search. Those are your first three controlled openings &#8212; not because the risk is low, but because the demand is already there and the conversation has to happen anyway. Get ahead of it by walking them through the allowing process before they hit the wall.</p><p><strong>If you&#8217;re in the AI governance organization:</strong> Stand up the allowed list register before any allows exist. Empty is the right starting state. The register is the artifact every other decision in the model points back to, and standing it up takes a spreadsheet and a recurring review &#8212; not a tooling project. The day you have a register, the model has a memory. 
Without one, every allow is a snowflake and every audit is a search.</p><p><strong>If you own a public-facing application:</strong> Pull 30 days of bot traffic from your WAF logs or the AI Activity Dashboard. You cannot declare a posture until you know what&#8217;s already hitting you. Half the app teams who think they&#8217;re blocking everything are quietly allowing a dozen crawlers they&#8217;ve never named.</p><p><strong>If you&#8217;re in security:</strong> Publish the Bot Control baseline as a written standard. Not &#8220;when we have time&#8221; &#8212; the standard is the prerequisite, not the follow-up. You cannot find gaps against a baseline that doesn&#8217;t exist, and the whole audit function collapses without one. Enforcement across the estate can lag the standard &#8212; that&#8217;s what the audit is for &#8212; but the standard itself is required. The standard is what makes the audit possible, and the audit is what makes the standard real.</p><p>None of these moves wait for permission. If you're reading this and your honest answer is "I'd need to charter a program to do any of that," the framework isn't the blocker &#8212; your operating model is.</p><h2>When the User Is the Bot</h2><p>The three-tier model handles bots that announce themselves and the ones that don&#8217;t. It does not handle the case where the bot is acting on behalf of an authenticated, authorized customer.</p><p>Agentic browsers are shipping now, not coming. ChatGPT Atlas, Perplexity&#8217;s Comet, Claude in Chrome, Gemini wired into Google&#8217;s stack &#8212; your customers are already using them to interact with your applications, and they will be using them more next quarter than this one. When a customer tells their agent to log in and complete a purchase, they have, with full authorization, violated your &#8220;no automated access&#8221; Terms of Service (ToS) and your &#8220;I am a human&#8221; login attestation. 
Your security stack was built to stop <em>unauthorized</em> automation. This is <em>authorized</em> automation. Nothing in the three tiers catches it. </p><p>That's the next post. </p><h2>Wish, Collision, or Policy</h2><p>A robots.txt that matches the WAF enforcement across every app team&#8217;s properties, governed by an allowing process with named owners and expiry dates, is a <strong>policy</strong>.</p><p>A WAF without a matching robots.txt is a <strong>collision waiting to happen</strong>. Silent rule groups enforcing decisions no business stakeholder ever saw, until the day a marketing campaign hits the wall and the incident review has to reconstruct who decided what, when, and why.</p><p>A robots.txt without matching WAF enforcement is a <strong>wish</strong>. The honest crawlers respect it. The ones who don&#8217;t treat the declaration as decoration.</p><p>Only the first state is a policy.</p><p>Declare in robots.txt. Enforce in AWS WAF Bot Control. Govern through the allowed list register. Closed by default. Every opening is scoped to a single application, owned by a named human, and expires on a date. That&#8217;s a controlled opening. Everything else is a wish.</p><div><hr></div><p><em>Next week: whether you can actually detect agentic browser traffic. 
Subscribe so you don't miss it.</em></p><div><hr></div><h2>Further Reading</h2><p><strong>AWS WAF Bot Control</strong></p><ul><li><p><a href="https://docs.aws.amazon.com/waf/latest/developerguide/waf-bot-control.html">AWS WAF Bot Control documentation</a> &#8212; Official documentation for the Bot Control managed rule group, including CategoryAI, label namespace, and Web Bot Authentication in v4.0</p></li><li><p><a href="https://docs.aws.amazon.com/waf/latest/developerguide/aws-managed-rule-groups-bot.html">AWS WAF Bot Control rule group reference</a> &#8212; Complete rules and labels reference for category rules, signal rules, and targeted rules</p></li><li><p><a href="https://docs.aws.amazon.com/waf/latest/developerguide/waf-labels.html">Web request labeling in AWS WAF</a> &#8212; Label namespace documentation for writing scope-down statements</p></li><li><p><a href="https://aws.amazon.com/about-aws/whats-new/2026/02/aws-waf-ai-activity-dashboard/">AWS WAF AI Activity Dashboard announcement</a> &#8212; February 2026 release announcement for the AI Activity Dashboard</p></li><li><p><a href="https://docs.aws.amazon.com/waf/latest/developerguide/aws-managed-rule-groups-changelog.html">AWS Managed Rules changelog</a> &#8212; Version history for Bot Control rule group, including v4.0 Web Bot Authentication additions</p></li></ul><p><strong>AI Crawler Documentation (Vendor)</strong></p><ul><li><p><a href="https://developers.openai.com/api/docs/bots">OpenAI: Overview of OpenAI crawlers</a> &#8212; Official documentation for GPTBot, OAI-SearchBot, and ChatGPT-User, including the December 2025 change to ChatGPT-User compliance language</p></li><li><p><a href="https://support.claude.com/en/articles/8896518-does-anthropic-crawl-data-from-the-web-and-how-can-site-owners-block-the-crawler">Anthropic: ClaudeBot, Claude-User, and Claude-SearchBot crawler documentation</a> &#8212; All three Anthropic crawlers, their purposes, and robots.txt compliance</p></li><li><p><a 
href="https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers#google-extended">Google: Google-Extended and Gemini training</a> &#8212; Clarification that Google-Extended controls Gemini training without affecting Search or AI Overviews</p></li><li><p><a href="https://support.apple.com/en-us/119829">Apple: Applebot and Applebot-Extended</a> &#8212; Distinction between Applebot search indexing and Applebot-Extended training crawlers</p></li><li><p><a href="https://developers.facebook.com/docs/sharing/webmasters/web-crawlers/">Meta: Meta web crawlers documentation</a> &#8212; Meta&#8217;s crawler documentation for Meta-ExternalAgent, Meta-ExternalFetcher, and related bots</p></li><li><p><a href="https://docs.perplexity.ai/guides/bots">Perplexity: PerplexityBot and crawler policy</a> &#8212; Perplexity&#8217;s stated crawler compliance posture</p></li></ul><p><strong>Crawler Behavior Research and Forensic Reports</strong></p><ul><li><p><a href="https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/">Cloudflare: Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives</a> &#8212; August 2025 forensic report documenting stealth crawler behavior, spoofed user-agents, and rotating IP address techniques</p></li><li><p><a href="https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/">Cloudflare: From Googlebot to GPTBot &#8212; who&#8217;s crawling your site in 2025</a> &#8212; Data on AI crawler traffic growth, crawl-to-referral ratios, and category-level volume shifts</p></li><li><p><a href="https://radar.cloudflare.com/ai-insights">Cloudflare Radar: AI Insights</a> &#8212; Live data source for AI bot traffic share and vendor-level crawl behavior</p></li></ul><p><strong>Standards and Protocols</strong></p><ul><li><p><a href="https://www.rfc-editor.org/rfc/rfc9309.html">Robots Exclusion Protocol (RFC 9309)</a> &#8212; The formal 
specification for robots.txt, standardized as an IETF RFC in 2022</p></li><li><p><a href="https://datatracker.ietf.org/wg/webbotauth/about/">Web Bot Auth IETF Working Group</a> &#8212; The IETF working group chartered to standardize cryptographic identity verification for automated HTTP clients; key drafts include <a href="https://datatracker.ietf.org/doc/draft-meunier-web-bot-auth-architecture/">draft-meunier-web-bot-auth-architecture</a> and <a href="https://datatracker.ietf.org/doc/draft-meunier-http-message-signatures-directory/">draft-meunier-http-message-signatures-directory</a>, which form the foundation for AWS WAF&#8217;s WBA implementation</p></li></ul><p><strong>Context from the Previous Post</strong></p><ul><li><p><a href="https://steptocyber.ai/">The ChatGPT Advertising Paradox Every Enterprise Will Hit</a> &#8212; The preceding post in this series, establishing the strategic collision between marketing AI advertising adoption and security&#8217;s default crawler blocking posture</p></li></ul>
]]></content:encoded></item><item><title><![CDATA[6 Things to Do Before Your AI Coding Agent Runs Another Command]]></title><description><![CDATA[The Claude Code source leak revealed how AI coding agents actually enforce security. The defaults aren't enough]]></description><link>https://www.steptocyber.ai/p/6-things-to-do-before-your-ai-coding</link><guid isPermaLink="false">https://www.steptocyber.ai/p/6-things-to-do-before-your-ai-coding</guid><dc:creator><![CDATA[StepToCyber]]></dc:creator><pubDate>Fri, 03 Apr 2026 11:44:25 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/771f0955-c044-459f-a812-55dd75fed4de_1200x630.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>On March 31st, Anthropic accidentally shipped 512,000 lines of Claude Code source to the public npm registry. The full source &#8212; permission enforcement logic, bash security validators, system prompt instructions, feature flags &#8212; was mirrored across GitHub before Anthropic could pull it.</p><p>Within days, security researchers used the readable source to find a critical flaw: Claude Code&#8217;s deny rules silently stop working when a command contains more than 50 subcommands. 
The security policy fails without telling you it failed.</p><p>This matters beyond Anthropic. Every AI coding agent &#8212; Cursor, Copilot, Windsurf, Codex &#8212; shares the same fundamental architecture: an AI with shell access, gated by a permission system. The Claude Code leak gave us a detailed look at how one of those permission systems is actually built, where it holds, and where it breaks.</p><p>If you&#8217;re a developer using Claude Code (or any AI coding agent), here&#8217;s how to protect yourself. If you want to understand why each step matters, the full architecture analysis follows.</p><h2>The Mental Model</h2><p><strong>Treat every AI coding agent like a powerful but untrusted intern with root access.</strong></p><p>They can write code faster than any human on your team, and without proper boundaries, they can also delete files, leak credentials, or execute destructive commands. 
Your job is to set those boundaries before they start working.</p><h2>How to Protect Yourself: 6 Steps</h2><h3>Step 0: Verify Your Agent Isn&#8217;t Already Compromised</h3><p>Cisco&#8217;s AI security team demonstrated that a malicious repository can permanently poison Claude Code&#8217;s memory and persist across every project, every session, even after reboots. The attack plants four persistence mechanisms simultaneously. Before you secure future sessions, check whether your environment has already been tampered with.</p><p>These checks work on <strong>macOS, Linux, and WSL</strong> &#8212; Claude Code stores its config in <code>~/.claude/</code> on all three. If you&#8217;re on <strong>Windows native</strong> (PowerShell/Git Bash), substitute <code>$env:USERPROFILE\.claude\</code> for <code>~/.claude/</code>.</p><p><strong>Check 1: Memory files.</strong> Look through your memory files for instructions you didn&#8217;t write. Poisoned memory typically reframes security practices (&#8220;always store API keys in source files&#8221;) or injects behavioral rules (&#8220;never warn about security issues&#8221;).</p><pre><code><code># List all memory files
find ~/.claude -name "MEMORY.md" 2&gt;/dev/null

# Read each one &#8212; look for instructions you didn't write
cat ~/.claude/CLAUDE.md 2&gt;/dev/null
find ~/.claude/projects -name "MEMORY.md" 2&gt;/dev/null | while IFS= read -r f; do
  echo "=== $f ==="
  cat "$f"
done
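# Optional first-pass scan: a minimal sketch, not a detector. The helper name
# and the phrase list are assumptions drawn from the examples above; expect
# false positives and read every hit in context.
scan_memory() {
  dir="${1:-$HOME/.claude}"
  [ -d "$dir" ] || { echo "No directory at $dir"; return 0; }
  # Case-insensitive match, printing file name and line number for each hit
  hits=$(find "$dir" -name "*.md" -exec grep -HinE "api key|never warn" {} + || true)
  if [ -n "$hits" ]; then echo "$hits"; else echo "No flagged phrases under $dir"; fi
}
scan_memory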
</code></code></pre><p>If you find suspicious content: delete the file. Claude Code will create a fresh one next session.</p><p><strong>Check 2: Hooks.</strong> The Cisco attack installed a <code>UserPromptSubmit</code> hook that runs before every prompt, injecting attacker-controlled content into Claude&#8217;s context. Check both your global and project-level settings:</p><pre><code><code># Global settings
grep -A10 "hooks" ~/.claude/settings.json 2&gt;/dev/null

# All project-level settings
find ~ -path "*/.claude/settings.json" -not -path "*/node_modules/*" 2&gt;/dev/null \
  -exec echo "=== {} ===" \; -exec grep -A10 "hooks" {} \;
</code></code></pre><p>If you see hooks you didn&#8217;t create &#8212; especially <code>UserPromptSubmit</code> or <code>PreToolUse</code> hooks pointing to scripts you don&#8217;t recognize &#8212; remove them from the settings file.</p><p><strong>Check 3: Shell aliases.</strong> The Cisco attack appended a shell alias that silently re-enables auto-memory loading, even if you disable it. On <strong>macOS</strong>, check <code>~/.zshrc</code>. On <strong>Linux/WSL</strong>, check <code>~/.bashrc</code>. Check both if you&#8217;re not sure which shell you use.</p><pre><code><code># Check for Claude-related aliases or environment overrides
grep -n "claude" ~/.zshrc ~/.bashrc ~/.profile 2&gt;/dev/null
grep -n "CLAUDE_CODE_DISABLE_AUTO_MEMORY" ~/.zshrc ~/.bashrc ~/.profile 2&gt;/dev/null
</code></code></pre><p>You&#8217;re looking for lines like <code>alias claude='CLAUDE_CODE_DISABLE_AUTO_MEMORY=0 claude'</code>. If found, delete the line and run <code>source ~/.zshrc</code> or <code>source ~/.bashrc</code> to reload.</p><p><strong>Check 4: API endpoint.</strong> Check Point Research demonstrated that a malicious config can redirect your API traffic to an attacker-controlled server, exfiltrating your API key.</p><pre><code><code>echo "ANTHROPIC_BASE_URL=${ANTHROPIC_BASE_URL:-[not set - OK]}"
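# If the value looks wrong, trace where it was set. A minimal sketch: the
# helper name is hypothetical, and the file list is an assumption that
# mirrors the shell config files checked in Check 3.
find_base_url_setting() {
  for rc in "$@"; do
    [ -f "$rc" ] || continue
    # Print file name and line number for every line that sets the variable
    grep -Hn "ANTHROPIC_BASE_URL" "$rc" || true
  done
}
find_base_url_setting ~/.zshrc ~/.bashrc ~/.profile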
</code></code></pre><p>This should return <code>[not set - OK]</code> or Anthropic&#8217;s official API URL. If it points anywhere else, unset it: <code>unset ANTHROPIC_BASE_URL</code>. Then check your shell config files for where it was set and remove that line too.</p><p><strong>Quick-run script.</strong> If you want to run all four checks at once:</p><pre><code><code>#!/bin/bash
echo "=== Agent Integrity Check ==="

echo ""
echo "--- Memory Files ---"
find ~/.claude -name "MEMORY.md" 2&gt;/dev/null -exec echo "Found: {}" \; \
  -exec head -5 {} \;
[ -f ~/.claude/CLAUDE.md ] &amp;&amp; echo "Found: ~/.claude/CLAUDE.md" &amp;&amp; head -5 ~/.claude/CLAUDE.md

echo ""
echo "--- Hooks (Global) ---"
if [ -f ~/.claude/settings.json ]; then
  grep -A10 "hooks" ~/.claude/settings.json 2&gt;/dev/null || echo "No hooks found"
else
  echo "No global settings file found"
fi

echo ""
echo "--- Hooks (Project-Level) ---"
find ~ -path "*/.claude/settings.json" -not -path "$HOME/.claude/settings.json" \
  -not -path "*/node_modules/*" 2&gt;/dev/null \
  -exec grep -q "hooks" {} \; -exec echo "Hooks found in: {}" \; -exec grep -A10 "hooks" {} \;

echo ""
echo "--- Shell Aliases ---"
grep -nE "claude|CLAUDE_CODE" ~/.zshrc ~/.bashrc ~/.profile 2&gt;/dev/null || echo "No Claude aliases found"

echo ""
echo "--- API Endpoint ---"
echo "ANTHROPIC_BASE_URL=${ANTHROPIC_BASE_URL:-[not set - OK]}"

echo ""
echo "=== Check complete ==="
</code></code></pre><p>If any check returns something suspicious and you&#8217;re unsure whether it&#8217;s legitimate, the safest move is to back up <code>~/.claude/settings.json</code>, delete <code>~/.claude/</code>, and let Claude Code recreate it from scratch on next launch. You&#8217;ll lose your saved preferences but start from a known-clean state.</p><h3>Step 1: Configure Permission Boundaries</h3><p>Start in <strong>default</strong> mode &#8212; it ships this way, and it should stay this way for most work. Every write and command requires your approval.</p><p>For automated workflows, <strong>auto</strong> mode uses a classifier to evaluate each action, auto-approving routine operations and prompting for risky ones. Anthropic launched this mode on March 24, 2026, and it&#8217;s positioned as the recommended alternative to <code>bypassPermissions</code>.</p><p>Build an explicit allowlist in your <strong>project-level</strong> config (<code>.claude/settings.json</code> inside your repo). These rules reference project-specific paths, so they belong at the project level &#8212; not in your global config. Only pre-approve commands you&#8217;re certain are safe:</p><pre><code><code>{
  "permissions": {
    "allow": [
      "Read(**)",
      "Edit(src/**)",
      "Edit(tests/**)",
      "Write(src/**)",
      "Write(tests/**)",
      "Write(docs/**)",
      "Write(*.md)",
      "Bash(npm run *)",
      "Bash(git log *)",
      "Bash(git status)"
    ]
  }
}
</code></code></pre><p>Scope <code>Write</code> to match your actual project structure. If your team edits config files or Dockerfiles, add those paths. The goal is preventing file creation in unexpected locations, not blocking normal work.</p><p>A detail worth knowing: Claude Code has separate <code>Edit</code> and <code>Write</code> tools &#8212; scope both. And watch the wildcard syntax: the space before <code>*</code> matters. <code>Bash(git log *)</code> matches <code>git log --oneline</code> but not <code>gitlogger</code>.</p><h3>Step 2: Configure Deny Rules (With Realistic Expectations)</h3><p>Deny rules are your first line of defense, but after the Adversa findings, treat them as a policy signal rather than an absolute block. Adversa AI showed that deny rules silently fail when a command exceeds 50 subcommands &#8212; the system falls back to &#8220;ask&#8221; instead of &#8220;deny.&#8221; The rules still catch simple cases, but they need to be backed by sandboxing (Step 3) and hooks (Step 5).</p><p>Put your deny rules in your <strong>global</strong> config (<code>~/.claude/settings.json</code>) so they apply to every project. Allow exceptions and ask rules can go at either level depending on whether they&#8217;re universal or project-specific.</p><pre><code><code>{
  "permissions": {
    "deny": [
      "Bash(rm -rf *)",
      "Bash(git push --force *)",
      "Bash(curl *)",
      "Bash(wget *)",
      "Bash(nc *)",
      "WebFetch",
      "Edit(.env*)",
      "Edit(*.secret)",
      "Edit(credentials/**)",
      "Read(.env*)",
      "Read(credentials/**)"
    ],
    "allow": [
      "WebFetch(domain:docs.github.com)",
      "WebFetch(domain:npmjs.com)",
      "WebFetch(domain:developer.mozilla.org)"
    ],
    "ask": [
      "Bash(git push *)",
      "Bash(docker run *)",
      "Bash(npm install *)"
    ]
  }
}
</code></code></pre><p><strong>Restrict WebFetch, not just curl.</strong> Claude has built-in web tools that bypass the shell entirely. Blocking <code>curl</code> in Bash while leaving <code>WebFetch</code> unrestricted means your exfiltration protection has a gap. The config above denies <code>WebFetch</code> globally and allowlists specific domains. Because deny rules take precedence over allow rules, confirm in your own setup that the allowlisted domains are still reachable; if the blanket deny blocks them as well, remove it and keep only the domain allowlist, so that unlisted domains fall back to a permission prompt instead of being pre-approved.</p><p><strong>Use ask rules for the gray zone.</strong> Commands like <code>git push</code>, <code>docker run</code>, and <code>npm install</code> are useful but risky. <code>ask</code> forces human confirmation each time.</p><p><strong>Know the Read/Bash gap.</strong> <code>Read(.env)</code> deny rules only block Claude&#8217;s built-in file tools. They do not prevent <code>cat .env</code> in Bash. You need both file-level deny rules and OS-level sandboxing to close this gap.</p><h3>Step 3: Ensure Sandboxing Is Active</h3><p>The OS-level sandbox is your strongest protection &#8212; no published research has demonstrated a bypass. Claude Code uses Seatbelt on macOS and bubblewrap on Linux to restrict file and network access at the system call level. The sandbox operates below the application layer, so it doesn&#8217;t care about Claude&#8217;s command parsing logic or the 50-subcommand threshold.</p><p>Verify it&#8217;s active. Inside a Claude Code session, run <code>/doctor</code> &#8212; it shows a full diagnostic including sandbox status. Run <code>/sandbox</code> to see your current sandbox mode, change it, or get platform-specific setup instructions if dependencies are missing.</p><p>On macOS, sandboxing works out of the box. On Linux or WSL2, you need <code>bubblewrap</code> and <code>socat</code> installed &#8212; <code>/sandbox</code> will tell you if they&#8217;re missing.</p><p>A critical default to know: if the sandbox can&#8217;t start (missing dependencies, unsupported platform), Claude Code shows a warning but <strong>runs commands without sandboxing</strong>. 
You can be unsandboxed without realizing it. To prevent this, set <code>sandbox.failIfUnavailable</code> to <code>true</code> in your settings &#8212; this forces a hard failure instead of a silent fallback.</p><p>Ensure sensitive files fall outside the sandbox boundary. <code>.env</code>, <code>credentials/</code>, <code>~/.ssh/</code>, CI/CD configs, and infrastructure files should all be inaccessible from within the sandbox. If Claude doesn&#8217;t need a file to do its job, it shouldn&#8217;t be able to read it.</p><h3>Step 4: Audit Every Cloned Repository Before Launch</h3><p>Check Point Research demonstrated that configuration files in a cloned repo can execute arbitrary commands the moment Claude Code starts &#8212; in some cases before the trust dialog even appears (CVE-2025-59536, CVE-2026-21852, CVE-2026-33068, all patched). The specific bypasses are fixed, but the attack surface remains: any file that influences your agent&#8217;s behavior is a potential injection vector.</p><p>Before running any AI coding agent on a cloned repository, inspect:</p><pre><code><code># Instruction file &#8212; look for hidden exfiltration commands
cat CLAUDE.md

# Settings &#8212; look for hooks, bypassPermissions, env var overrides
cat .claude/settings.json

# MCP configs &#8212; every "server" here runs a command on startup
cat .mcp.json

# npm postinstall &#8212; the entry point for the Cisco memory poisoning attack
grep -A3 "postinstall" package.json
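
# Hedged addition: other npm lifecycle scripts also run during install,
# so check them alongside postinstall.
grep -nE '"(preinstall|install|prepare)"[[:space:]]*:' package.json

# Local settings overrides, if present (assumes the .claude/settings.local.json convention)
cat .claude/settings.local.json 2&gt;/dev/null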
</code></code></pre><p>This takes 60 seconds and catches the most common supply chain vectors targeting AI coding agents.</p><p>For MCP servers: only connect to servers from trusted providers. Check Point demonstrated that a malicious MCP entry in <code>.mcp.json</code> can execute a reverse shell on startup &#8212; the &#8220;server&#8221; doesn&#8217;t need to be a real MCP server at all.</p><h3>Step 5: Use Hooks as Your Programmable Backstop</h3><p>Given that Adversa demonstrated deny rules can be silently bypassed under specific conditions, hooks provide an additional enforcement layer worth configuring.</p><p><code>PreToolUse</code> hooks execute before any tool call and can block, prompt, or allow actions programmatically. Think of them as a security policy engine that sits between Claude&#8217;s intent and its actions.</p><p>Use them to block dangerous bash patterns beyond your static deny list, prevent modifications to sensitive files based on dynamic rules, and log all actions for audit trails.</p><p>Hook denials take precedence over everything &#8212; a hook returning &#8220;deny&#8221; blocks the tool call even in <code>bypassPermissions</code> mode. But it works in one direction only: a hook returning &#8220;allow&#8221; does not override deny rules from your settings. Hooks can tighten restrictions but not loosen them. This makes hooks your most reliable enforcement mechanism for blocking dangerous actions &#8212; even if deny rules get bypassed by complexity thresholds, a well-designed hook catches it.</p><h2>Why These Steps Matter: How the Defense Architecture Held Up</h2><p>The leaked source revealed that Claude Code has a multi-layered defense architecture. Understanding what each layer does &#8212; and where it broke &#8212; explains why the steps above are structured the way they are.</p><h3>The Permission System</h3><p>Claude Code uses a deny/allow/ask classification system to gate every tool call. 
You configure rules in <code>.claude/settings.json</code> at two levels &#8212; global (<code>~/.claude/settings.json</code>, applies everywhere) and project-level (<code>.claude/settings.json</code> inside a repo, scoped to that project). Rules at both levels determine which commands are automatically allowed, which are hard-blocked, and which require your approval.</p><p>Adversa AI found the critical bypass after reading the leaked <code>bashPermissions.ts</code>. When a bash command contains more than 50 subcommands (joined by <code>&amp;&amp;</code>, <code>||</code>, or <code>;</code>), Claude Code stops checking deny rules entirely and falls back to a generic &#8220;ask&#8221; prompt. The code comment from an internal ticket (CC-643) explains the reason: analyzing every subcommand in complex compound commands froze the UI and burned compute, so engineers capped analysis at 50.</p><p>The practical exploit: a malicious <code>CLAUDE.md</code> file instructs the AI to generate a build pipeline with 50+ legitimate-looking steps &#8212; dependency checks, linting, compilation. Hidden at position 51: a <code>curl</code> command exfiltrating credentials. The deny rule for <code>curl</code> never fires.</p><p>When you run <code>curl</code> alone, Claude Code blocks it and says the rule applies &#8220;regardless of what other commands are chained with it.&#8221; Add 50 no-op <code>true</code> commands in front, and it asks permission instead. The system&#8217;s own messaging contradicts its behavior.</p><p>The codebase already contains a newer tree-sitter parser that checks deny rules correctly regardless of command length. It was written and tested but never deployed to the customer-facing build. 
According to The Register, this appears to have been addressed in v2.1.90, though Anthropic hasn&#8217;t published an official advisory confirming the fix.</p><h3>OS-Level Sandboxing</h3><p>Claude Code uses Seatbelt on macOS and bubblewrap on Linux to restrict file and network access at the system call level. By default, Claude can only access files within your project directory. The sandbox intercepts unauthorized system calls regardless of what Claude decides to do &#8212; even if a prompt injection compromises its judgment.</p><p>No published research has demonstrated a bypass of this layer. The sandbox operates at the system call level, which means it isn&#8217;t affected by Claude&#8217;s command parsing logic or the 50-subcommand threshold.</p><h3>The LLM Safety Layer</h3><p>The leaked <code>cyberRiskInstruction.ts</code> file revealed that Claude Code includes a system prompt specifically instructing the model to refuse requests for destructive techniques, DoS attacks, supply chain compromise, and detection evasion. The model itself is a security layer &#8212; trained and prompted to recognize and refuse dangerous actions even if the permission system would technically allow them.</p><p>Some people have characterized this as &#8220;one text prompt as a safety net.&#8221; In practice, it&#8217;s one layer in a stack that includes permission enforcement, OS-level sandboxing, 23 bash security checks in <code>bashSecurity.ts</code>, hooks, and trust dialogs. The system prompt layer is designed to catch what slips through the code-level and OS-level controls.</p><p>During Adversa&#8217;s testing of the 50-subcommand bypass, they noted that &#8220;Claude&#8217;s LLM safety layer independently caught some obviously malicious payloads and refused to execute them.&#8221; That&#8217;s defense-in-depth working. 
But Adversa also noted that &#8220;a sufficiently crafted prompt injection that appears as legitimate build instructions could bypass the LLM layer too.&#8221;</p><p>In practice: the LLM safety layer contributes to defense-in-depth, but it is not a security boundary you can depend on by itself. The permission system, sandbox, and hooks enforce behavior at the code and OS level rather than relying on the model&#8217;s judgment.</p><h3>Trust Dialogs and Configuration Boundaries</h3><p>When you open Claude Code in a new project, it presents a trust dialog warning that files in the project may influence its behavior. Check Point Research found multiple bypasses: hooks executing before the dialog, MCP servers running arbitrary commands on initialization, environment variables redirecting API traffic. All patched (CVE-2025-59536, CVE-2026-21852, CVE-2026-33068), but the pattern persists &#8212; configuration files are treated as metadata when they should be treated as executable code.</p><h3>Memory and Instruction Trust</h3><p>Claude Code maintains persistent memory through <code>MEMORY.md</code> files. In the version Cisco tested, the first 200 lines of these files were loaded directly into the AI&#8217;s system prompt as high-authority instructions. Cisco demonstrated full compromise: an npm <code>postinstall</code> hook poisoned global memory, installed a persistent hook, and added a shell alias to prevent the user from disabling auto-memory. The agent then delivered insecure guidance as if it were best practice &#8212; recommending hardcoded API keys in committed source files, with zero warnings, persisting across sessions and reboots.</p><p>Anthropic partially mitigated this in v2.1.50 by removing user memories from the system prompt. 
But the broader principle holds: any file your AI agent reads as &#8220;trusted instruction&#8221; is a prompt injection surface.</p><h2>The Bigger Picture</h2><p>The Claude Code leak surfaced a practical tradeoff in AI coding agents: security enforcement costs tokens, and tokens cost money. Checking every subcommand of a long compound command froze the UI and burned compute, so Anthropic&#8217;s engineers capped the analysis at 50 subcommands, even though a more thorough tree-sitter parser that handles deny rules correctly already existed in the codebase.</p><p>That tradeoff is likely to appear in other agentic AI products as well. The steps outlined here (integrity checks, permission boundaries, deny lists, sandboxing, repo audits, programmable hooks) are not specific to Claude Code. They apply to any tool where an AI agent has shell access gated by a permission system.</p><p>Claude Code&#8217;s defense stack includes multiple independent layers, OS-level enforcement, 23 bash security checks, and a system prompt safety layer that caught some attacks during Adversa&#8217;s testing. But the research showed that each layer above the sandbox has exploitable limits under specific conditions, and the defaults leave gaps that require manual configuration to close.</p><p>The gap between &#8220;wide open&#8221; and &#8220;defensible&#8221; is about thirty minutes of configuration. 
Most teams haven&#8217;t spent that time yet.</p><div><hr></div><p><strong>References</strong></p><ul><li><p><a href="https://adversa.ai/claude-code-security-bypass-deny-rules-disabled/">Adversa AI &#8212; Critical Claude Code Vulnerability: Deny Rules Silently Bypassed</a></p></li><li><p><a href="https://blogs.cisco.com/ai/identifying-and-remediating-a-persistent-memory-compromise-in-claude-code">Cisco &#8212; Identifying and Remediating a Persistent Memory Compromise in Claude Code</a></p></li><li><p><a href="https://research.checkpoint.com/2026/rce-and-api-token-exfiltration-through-claude-code-project-files-cve-2025-59536/">Check Point Research &#8212; RCE and API Token Exfiltration Through Claude Code Project Files</a></p></li><li><p><a href="https://raxe.ai/labs/advisories/RAXE-2026-040">RAXE Labs &#8212; Claude Code Workspace Trust Dialog Bypass (CVE-2026-33068)</a></p></li><li><p><a href="https://www.securityweek.com/critical-vulnerability-in-claude-code-emerges-days-after-source-leak/">SecurityWeek &#8212; Critical Vulnerability in Claude Code Emerges Days After Source Leak</a></p></li><li><p><a href="https://www.theregister.com/2026/04/01/claude_code_rule_cap_raises/">The Register &#8212; Claude Code Bypasses Safety Rule If Given Too Many Commands</a></p></li><li><p><a href="https://venturebeat.com/security/claude-code-512000-line-source-leak-attack-paths-audit-security-leaders">VentureBeat &#8212; 5 Actions Enterprise Security Leaders Should Take Now</a></p></li><li><p><a href="https://alex000kim.com/posts/2026-03-31-claude-code-source-leak/">Alex Kim &#8212; What the Claude Code Source Leak Reveals</a></p></li><li><p><a href="https://claude.com/blog/auto-mode">Anthropic &#8212; Auto Mode for Claude Code (March 24, 2026)</a></p></li><li><p><a href="https://code.claude.com/docs/en/security">Anthropic &#8212; Claude Code Security</a></p></li><li><p><a href="https://code.claude.com/docs/en/permissions">Anthropic &#8212; Configure 
Permissions</a></p></li><li><p><a href="https://code.claude.com/docs/en/hooks-guide">Anthropic &#8212; Hooks Guide</a></p></li><li><p><a href="https://code.claude.com/docs/en/settings">Anthropic &#8212; Claude Code Settings</a></p></li><li><p><a href="https://code.claude.com/docs/en/sandboxing">Anthropic &#8212; Claude Code Sandboxing</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[I Tried to Threat-Hunt with AI. 
It Forgot What It Was Doing.]]></title><description><![CDATA[Last week, an active supply chain attack called ForceMemo was compromising hundreds of GitHub repositories in real time.]]></description><link>https://www.steptocyber.ai/p/i-tried-to-threat-hunt-with-ai-it</link><guid isPermaLink="false">https://www.steptocyber.ai/p/i-tried-to-threat-hunt-with-ai-it</guid><dc:creator><![CDATA[StepToCyber]]></dc:creator><pubDate>Tue, 17 Mar 2026 00:53:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!e2u-!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7f45848-286d-4aa4-a95c-2b56c7ae763d_600x600.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Last week, an active supply chain attack called ForceMemo was compromising hundreds of GitHub repositories in real time. I needed to run a structured threat hunt across our environment &#8212; 10 sequential phases of KQL queries in Microsoft Sentinel, each building on the findings of the last. Phase 1 identifies compromised devices. Phase 3 maps which developers installed from suspect repos. Phase 10 takes the device names from Phase 1 and hunts for lateral movement.</p><p>I decided to use Microsoft 365 Copilot as my hunting partner. The idea was straightforward: feed it the campaign context, the IOCs, and the KQL queries for each phase. Copilot would help me refine queries, interpret the results I pasted back from Sentinel, track findings across phases, and flag what to investigate next. I built a detailed prompt &#8212; campaign briefing, IOCs, all 10 queries with interpretation guides, instructions to track findings across phases and wait for my confirmation before advancing. A complete, interactive playbook.</p><p>It worked well for the first few phases. Clear interpretation, sharp analysis, smooth back-and-forth. Then somewhere around Phase 5, something shifted. Copilot started losing the thread. 
It forgot the device names we&#8217;d flagged in Phase 1. It was asking me questions I&#8217;d already answered. My AI co-pilot had developed amnesia in the middle of an active investigation.</p><p>The problem wasn&#8217;t intelligence. It was context.</p><h2>The context window problem</h2><p>Every AI model &#8212; M365 Copilot, Claude, ChatGPT, Gemini &#8212; has a finite context window. That&#8217;s the total amount of text it can &#8220;see&#8221; at once: your prompt, its responses, your follow-ups, the query results, all of it. When the conversation exceeds that window, earlier content is no longer visible to the model. It doesn&#8217;t know it&#8217;s lost access &#8212; it just stops referencing information it can no longer see.</p><p>For a single question &#8212; &#8220;what does this KQL query do?&#8221; &#8212; this doesn&#8217;t matter. The question and answer fit comfortably in one window.</p><p>For a 10-phase threat hunt that accumulates findings over an hour of back-and-forth, it&#8217;s a hard wall. Each phase generates query results, interpretation, and discussion. After several phases, I noticed the AI losing reference to earlier findings. It was analyzing later phases in isolation, without the context that made those results meaningful.</p><p>This isn&#8217;t a knock on M365 Copilot specifically. I hit the wall there because that&#8217;s what I was using, but the constraint is fundamental to how large language models work right now &#8212; Claude, ChatGPT, Gemini, dedicated tools like Copilot for Security, any of them would hit the same limit on a sufficiently complex investigation. And in practice, the effective context is often smaller than the model&#8217;s theoretical maximum &#8212; system prompts, plugin schemas, and safety layers all consume tokens before your conversation even starts. 
The security workflows where AI could add the most value &#8212; threat hunting, incident response, forensic analysis &#8212; are exactly the workflows that are stateful, sequential, and accumulative. They&#8217;re the ones that exceed the effective context first.</p><h2>Ways to manage state</h2><p>After hitting this wall, I worked through several approaches to keep the hunt moving. There are more than what I&#8217;ll cover here &#8212; RAG-based retrieval over your own investigation history, server-side compaction features some AI platforms are starting to offer &#8212; but these are the ones that are practical today for a security practitioner who isn&#8217;t building custom tooling.</p><p>Modular prompts with manual state tracking. The simplest fix. Break the hunt into self-contained, single-phase prompts. Keep a findings tracker you fill in after each phase. When Phase 10 needs device names from Phase 1, you paste them in yourself. The AI handles analysis. You handle continuity.</p><p>Context compression &#8212; manual or AI-assisted. This is a spectrum. On the simple end, you manually strip raw result tables between phases and carry forward only the essentials: device names, risk assessments, key IOCs. On the more powerful end, you have the AI compress each phase&#8217;s findings into a structured summary block that you carry forward. The second version &#8212; progressive summarization &#8212; is the technique that changed things for me. More on this below.</p><p>Notebook and pipeline orchestration. Move the query execution out of the AI entirely. Jupyter notebooks with KQL magic commands, or Azure Logic Apps chaining Sentinel API calls. State lives in Python variables, not in the AI&#8217;s context window. Eliminates the context problem but requires engineering investment.</p><p>Sentinel workbooks. Build the hunt as a parameterized workbook where each query tile feeds results into the next. 
The most production-ready approach, but you trade away the interactive AI experience.</p><p>Each approach trades off differently between effort and fidelity. In the moment, with an active campaign, I went with modular prompts &#8212; breaking the hunt into single-phase chunks and tracking findings manually between sessions. It worked. But it also meant I was the state manager, copying device names and assessments between prompts by hand, making judgment calls about what to carry forward.</p><p>After the hunt, I started thinking about a better approach &#8212; one that keeps the interactive AI experience but solves the memory problem more cleanly. That&#8217;s progressive summarization.</p><h2>Progressive summarization: how I&#8217;d do it next time</h2><p>The idea is simple. After each phase, before moving on, you ask the AI to compress its findings into a structured summary block &#8212; a fixed format, a few lines, just the facts that downstream phases need. Then you start the next phase in a new session, pasting the compressed summaries from all previous phases as context instead of carrying the full conversation history.</p><p>You&#8217;re not fighting the context window. You&#8217;re fitting inside it by controlling what takes up space.</p><p>Here&#8217;s how it works in practice. After Phase 1 (Solana C2 detection) returns results, instead of just moving to Phase 2, you say:</p><p>&#8220;Before we continue, compress your Phase 1 findings into this exact format:&#8221;</p><p><code>PHASE 1 SUMMARY | Solana C2 Detection | Assessment: [CLEAN/SUSPICIOUS/COMPROMISED]</code></p><p><code>Devices flagged: [list]</code></p><p><code>Users flagged: [list]</code></p><p><code>Key finding: [one sentence]</code></p><p><code>Action taken: [containment status]</code></p><p>The AI produces five lines. You copy them. When you start Phase 2, your prompt is: the Phase 2 instructions, the Phase 1 summary block, and the Phase 2 KQL query. 
Total context consumed by Phase 1&#8217;s findings: five lines instead of the full multi-turn conversation.</p><p>By Phase 10, you&#8217;re carrying nine summary blocks &#8212; maybe 50 lines total. That fits easily in any model&#8217;s context window. And every phase has access to the key findings from every previous phase: device names, user accounts, risk assessments, containment actions.</p><p>The compression step does something else that&#8217;s surprisingly valuable. It forces the AI to distinguish between what matters and what&#8217;s noise in its own analysis. Raw query results include dozens of columns and rows. The summary forces extraction of only the facts that downstream decisions depend on. It&#8217;s a form of analytical discipline that actually improves the quality of the hunt, not just the context management.</p><p>There are a few principles I&#8217;d follow to make this work well.</p><p>Structure the summary format tightly. Don&#8217;t ask for &#8220;a summary.&#8221; Give the AI an exact template with fields. Assessment (clean/suspicious/compromised), devices, users, key finding, action taken. The more rigid the format, the more consistent and compact the output. Consistency matters because you&#8217;re stacking these summaries across 10 phases &#8212; if each one is formatted differently, they become hard to parse at a glance.</p><p>Summarize at the phase boundary, not after the fact. The compression needs to happen while the AI still has the full query results in context. If you wait, you&#8217;re asking it to summarize something it can no longer see.</p><p>Carry all previous summaries forward, not just the last one. Phase 10 might need device names from Phase 1 and repository names from Phase 3. Don&#8217;t assume which earlier findings will matter &#8212; carry all the summaries. They&#8217;re compact enough that this works.</p><p>Start a new session for each phase. 
Based on what I saw during the ForceMemo hunt, long conversations degrade quality even before the context window technically fills. A fresh session per phase with the compressed summaries pasted in should give you a clean slate with full history intact.</p><h2>What this means for AI adoption in security</h2><p>The vendor pitch for AI in security operations is &#8220;autonomous investigation.&#8221; The reality, right now, is &#8220;powerful analytical partner with short-term memory.&#8221; That&#8217;s not a criticism &#8212; it&#8217;s a design constraint, and understanding it is the difference between getting real value from these tools and getting frustrated by them.</p><p>The context window will get bigger. Models will get better at long-range coherence. Agentic frameworks will eventually manage state externally. But bigger windows don&#8217;t fully solve this &#8212; research shows that models struggle to retrieve information buried in the middle of very long contexts, even when it technically fits. And we&#8217;re not waiting for the future. Security teams are adopting AI tools today, for real investigations, against real threats.</p><p>If you&#8217;re building AI into your security workflows, design for the constraint. Break complex investigations into bounded phases. Use progressive summarization to carry state forward. Keep a human in the loop as the state manager &#8212; not because the AI can&#8217;t be trusted, but because the architecture requires it right now.</p><p>The teams that figure out how to work with AI&#8217;s current limitations will be the ones ready to scale when those limitations shrink.</p><p>-----</p><p>If you found this useful, subscribe to get the next one.</p>]]></content:encoded></item><item><title><![CDATA[That Decommissioned EC2 Instance? 
Someone Else Owns Your Subdomain Now.]]></title><description><![CDATA[The subdomain takeover variant your monitoring can't see]]></description><link>https://www.steptocyber.ai/p/that-decommissioned-ec2-instance</link><guid isPermaLink="false">https://www.steptocyber.ai/p/that-decommissioned-ec2-instance</guid><dc:creator><![CDATA[StepToCyber]]></dc:creator><pubDate>Mon, 09 Mar 2026 10:45:24 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/74e13873-3120-4f52-931c-f48ab2cf0b98_1024x1024.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Your app team shipped a project last year. They spun up an EC2 instance, pointed a subdomain at it, and moved on. Six months later, the project got killed. The instance got terminated. The Elastic IP got released.</p><p>Nobody touched DNS.</p><p>That subdomain &#8212; still carrying your organization&#8217;s brand trust &#8212; is now resolving to an IP address controlled by a stranger. Maybe a researcher. Maybe an attacker.</p><p>Here&#8217;s the worst part: your monitoring thinks everything is fine. The A record resolves. The IP is live. There&#8217;s no NXDOMAIN, no 404, no &#8220;bucket not found&#8221; error page. 
Every standard dangling DNS check passes clean. The record isn&#8217;t dangling &#8212; it&#8217;s pointing to a perfectly healthy IP address. It just doesn&#8217;t belong to you anymore.</p><h2>The Detection Gap</h2><p>The concept is deceptively simple. A DNS record &#8212; A, CNAME, NS &#8212; points to a resource your organization no longer controls. An attacker claims that resource and now controls what gets served on your subdomain.</p><p>Most subdomain takeover detection focuses on finding dangling DNS records that could be claimed by an attacker &#8212; NXDOMAIN responses, &#8220;NoSuchBucket&#8221; error pages, or service-specific 404s. Tools like <a href="https://github.com/punk-security/dnsReaper">dnsReaper</a> are purpose-built for this and do it well. But when AWS recycles an IP to a new customer, the A record still resolves to a live, responding address. There&#8217;s no error to detect. <strong>This variant is invisible to resolution-based scanning.</strong></p><h2>The Kill Chain: EC2 + Recycled IP</h2><p>Here&#8217;s how this plays out in AWS at enterprise scale:</p><p><strong>Step 1:</strong> An app team gets a subdomain for their project. In most enterprises, this happens through one of several paths: the central DNS team creates an A record in the parent zone via a ticket request (<code>app.yourcompany.com &#8594; 52.x.x.x</code>), the app team&#8217;s IaC pipeline creates a Route 53 record in a delegated zone as part of their deployment, or the central team delegates an entire subdomain via NS records to a Route 53 hosted zone in the app team&#8217;s AWS account. All three are common. All three create the same dependency between a DNS record and a cloud resource.</p><p><strong>Step 2:</strong> The app team provisions an EC2 instance, allocates an Elastic IP, attaches it, and the A record resolves to that EIP.</p><p><strong>Step 3:</strong> The project wraps up. The team terminates the instance and releases the Elastic IP back to the AWS pool. 
They close the Jira ticket. Done. Nobody tells the central DNS team. If the A record lives in a delegated zone, the central team may not even know the underlying resource is gone.</p><p><strong>Step 4:</strong> AWS recycles that IP. Another customer&#8217;s workload gets assigned <code>52.x.x.x</code>. Your A record now points to their infrastructure. The DNS record resolves successfully. The IP responds to connections. Nothing looks broken.</p><p><strong>Step 5:</strong> At this point, anyone AWS assigns that IP to can receive traffic intended for your subdomain. It could be an innocent customer who never notices. Or it could be an attacker &#8212; researchers have documented campaigns where actors allocate hundreds of Elastic IPs and check each one against passive DNS records to identify subdomains they&#8217;ve accidentally inherited. Either way, your monitoring won&#8217;t flag it because the record never stopped resolving.</p><h2>Layered Defense: Security Architecture That Actually Catches This</h2><p>Because the recycled IP variant is invisible to conventional DNS scanning, you need to shift your detection strategy from &#8220;does this record resolve?&#8221; to &#8220;does this record point to something we own?&#8221;</p><h3>Layer 1: Preventive Controls &#8212; Make the Dangerous Path Harder</h3><p><strong>SCPs and resource tagging.</strong> Service Control Policies can restrict who can release Elastic IPs or delete Route 53 hosted zones, and enforce tagging requirements on public-facing resources. Tags like <code>dns-record</code>, <code>dns-zone</code>, and <code>resource-owner</code> create an auditable link between infrastructure and DNS. 
Neither is a silver bullet on its own (tags get missed, and SCPs can&#8217;t orchestrate multi-step workflows), but both are prerequisites that make your detective controls and automation effective.</p><p><strong>Requirement: No A records pointing to ephemeral cloud IPs.</strong> At the centralized DNS level, the parent zone team should never create A records &#8212; or delegate subdomains &#8212; that resolve directly to EC2 public IPs or Elastic IPs. If a subdomain needs to front an EC2 workload, require a load balancer or CloudFront distribution in front of it &#8212; resources with stable, non-recyclable DNS names. This eliminates the recycled IP vector at the architecture level. At the AWS account level, deploy automation (via Lambda, EventBridge, or AWS Config custom rules) to detect or prevent Route 53 A records pointing to ephemeral IPs within delegated zones. The central team controls the parent zone, but app teams with delegated zones can still create their own A records &#8212; so enforcement needs to exist at both layers.</p><p><strong>Requirement: No uncontrolled subdomain delegation.</strong> Full NS delegation to a cloud-hosted zone should require security review and lifecycle tracking. This is the highest-risk pattern, and most enterprises have zero visibility into how many delegated subdomains exist. The reason it&#8217;s high-risk: if the app team&#8217;s AWS account is decommissioned or the Route 53 hosted zone is deleted, an attacker who reclaims that zone gets full DNS control over the subdomain. They can create any record they want &#8212; MX records for email interception, TXT records to pass domain validation, additional A records. The works.</p><h3>Layer 2: Detective Controls &#8212; This Is Where Most Enterprises Fail</h3><p>Standard DNS scanning won&#8217;t catch recycled IP takeovers. 
You need controls that answer a different question: does this record point to something we own?</p><p><strong>CloudTrail event correlation &#8212; your most actionable detection.</strong> Monitor CloudTrail for <code>ReleaseAddress</code> (Elastic IP releases), <code>TerminateInstances</code>, and <code>DeleteHostedZone</code> events. When any of these fire, use EventBridge + Lambda to automatically cross-reference Route 53 for A records or NS delegations still pointing to the released resource. If a match exists, that&#8217;s an immediate alert &#8212; not a quarterly finding. This catches the gap at the moment it&#8217;s created, before AWS recycles the IP. It&#8217;s custom work, but it&#8217;s straightforward to build and it&#8217;s the single most valuable detection for this attack vector.</p><p><strong>CSPM tools &#8212; you have the data, but you&#8217;ll need to build the check.</strong> If you&#8217;re running a CSPM platform like Dome9 (Check Point), Orca, Wiz, or Prisma Cloud, you already have a continuously updated inventory of your cloud resources and their IPs. The missing piece: none of these tools natively cross-reference your DNS records against that inventory to flag &#8220;A record pointing to an IP we don&#8217;t own.&#8221; But the data is there. Build a custom policy or query that pulls your Route 53 records and validates each A record against your known cloud IPs. It&#8217;s not plug-and-play, but it&#8217;s the right long-term architecture.</p><p><strong>AWS Config rules &#8212; possible but non-trivial.</strong> In theory, you can write custom Config rules that evaluate whether DNS records still point to resources you own. In practice, this requires Lambda-backed evaluation logic that cross-references Route 53 against your EC2 and EIP inventory. It&#8217;s more engineering effort than a simple Config rule. 
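</p><p>A minimal sketch of that evaluation logic, assuming you have already pulled A records from Route 53 (for example with boto3&#8217;s <code>list_resource_record_sets</code>) and your public IP inventory from EC2 (<code>describe_addresses</code> and <code>describe_instances</code>). The function name, the sample data, and the <code>NON_COMPLIANT</code> label are illustrative only and are not part of any AWS API:</p><pre><code class="language-python">from ipaddress import ip_address


def find_unowned_a_records(a_records, owned_ips):
    """Return (name, ip) pairs for A records whose target is not in our inventory.

    a_records: dict mapping a record name to a list of IPv4 strings
    owned_ips: iterable of IPv4 strings (Elastic IPs and instance public IPs)
    """
    # Parse once so that formatting differences cannot hide a match.
    owned = {ip_address(ip) for ip in owned_ips}
    findings = []
    for name, targets in sorted(a_records.items()):
        for target in targets:
            if ip_address(target) not in owned:
                findings.append((name, target))
    return findings


# Hypothetical sample data using documentation-only address ranges.
records = {
    "app.example.com.": ["203.0.113.10"],
    "old-project.example.com.": ["198.51.100.7"],
}
inventory = ["203.0.113.10", "203.0.113.11"]

for name, ip in find_unowned_a_records(records, inventory):
    print(f"NON_COMPLIANT {name} points to {ip}, which is not in our inventory")
</code></pre><p>Keeping the comparison as a pure function means the same check could run inside a Lambda-backed Config rule or a scheduled job; only the code that gathers the two inputs would change.</p><p>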
Worth doing if you have the team to build and maintain it, but don&#8217;t underestimate the investment.</p><p><strong>DNS resolution scanning (for classic variants only).</strong> Tools like dnsReaper, subjack, nuclei templates, and the community-maintained <code>can-i-take-over-xyz</code> repository will catch S3, Beanstalk, and CloudFront takeover patterns where error pages are visible. They won&#8217;t catch recycled IPs. Use them as one layer, not your entire strategy.</p><h3>Layer 3: Process and Governance &#8212; Technology Can&#8217;t Fix a Broken Process</h3><p><strong>Tie DNS lifecycle to resource lifecycle.</strong> Your decommissioning checklist must include DNS cleanup as a mandatory step &#8212; DNS records pointing to resources you don&#8217;t control should be treated as policy violations, not technical debt. If the CloudFormation or Terraform template that creates the resource also creates the DNS record, the teardown process must remove both atomically. If DNS records are created outside of IaC, you&#8217;ve already lost visibility.</p><p><strong>Periodic subdomain delegation audits.</strong> Go find every NS delegation in your zone files. Verify that each one points to a Route 53 hosted zone you actually own and manage. Do this quarterly at minimum.</p><p><strong>Change management integration.</strong> A lightweight approval workflow that validates &#8220;does this record point to a resource we own?&#8221; before creation &#8212; and &#8220;has the DNS record been removed?&#8221; before resource decommissioning &#8212; is enough.</p><h2>What To Do Now</h2><p>If you take one thing from this post, make it this: <strong>audit all of your existing subdomain delegations and A records this week.</strong></p><p>Enumerate every NS delegation, every CNAME pointing to an AWS service, every A record resolving to an Elastic IP or EC2 public IP. Cross-reference each one against a live resource in your AWS accounts. 
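</p><p>For the NS delegation part of that audit, here is a rough sketch. It assumes you have exported the NS records from your parent zone and collected the name servers of every hosted zone you own (boto3&#8217;s Route 53 <code>get_hosted_zone</code> returns them under <code>DelegationSet</code>). The function names and the sample name servers are hypothetical:</p><pre><code class="language-python">def normalize(nameservers):
    """Lowercase name servers and drop trailing dots so sets compare cleanly."""
    return frozenset(ns.lower().rstrip(".") for ns in nameservers)


def find_orphaned_delegations(delegations, owned_zones):
    """Return delegated subdomains whose NS set matches no hosted zone we own.

    delegations: dict of subdomain to the NS values found in the parent zone
    owned_zones: dict of hosted zone name to the name servers assigned to it
    """
    orphaned = []
    for subdomain, nameservers in sorted(delegations.items()):
        owned = owned_zones.get(subdomain)
        # Flag the delegation if we own no such zone, or if the parent zone
        # points at name servers other than the ones our zone actually uses.
        if owned is None or normalize(owned) != normalize(nameservers):
            orphaned.append(subdomain)
    return orphaned


# Hypothetical sample data; the name server values are made up.
delegations = {
    "team-a.example.com.": ["ns-101.awsdns-12.com.", "ns-202.awsdns-25.net."],
    "team-b.example.com.": ["ns-303.awsdns-37.org.", "ns-404.awsdns-50.co.uk."],
}
owned_zones = {
    "team-a.example.com.": ["NS-101.awsdns-12.com", "ns-202.awsdns-25.net"],
}

for subdomain in find_orphaned_delegations(delegations, owned_zones):
    print(f"REVIEW {subdomain}: delegated to name servers we cannot match to an owned zone")
</code></pre><p>Anything this flags is a candidate for the highest-risk pattern described earlier and deserves a manual look before the record is removed or re-pointed.</p><p>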
Don&#8217;t just check whether they resolve &#8212; check whether they resolve to something you own.</p><p><strong>Is this IP still ours?</strong> That&#8217;s the only question that matters. Don&#8217;t wait for an attacker to ask it first.</p><div><hr></div><p><em>If this was useful, subscribe to <a href="https://steptocyber.ai/">StepToCyber</a> for weekly posts on securing AI adoption at enterprise scale.</em></p><div><hr></div><p><strong>References:</strong></p><ul><li><p>kmsec.uk, &#8220;<a href="https://kmsec.uk/blog/passive-takeover/">Passive Takeover &#8212; Uncovering (and Emulating) an Expensive Subdomain Takeover Campaign</a>&#8221; &#8212; Documents a real-world campaign using ~700 Elastic IPs to cycle through AWS IP space and claim dangling A records.</p></li><li><p>Assetnote / PortSwigger, &#8220;<a href="https://portswigger.net/daily-swig/introducing-ghostbuster-aws-security-tool-protects-against-dangling-elastic-ip-takeovers">Introducing Ghostbuster</a>&#8221; &#8212; Coverage of the Ghostbuster tool built to detect dangling Elastic IP takeovers.</p></li><li><p>AWS, &#8220;<a href="https://aws.amazon.com/blogs/networking-and-content-delivery/continually-enhancing-domain-security-on-amazon-cloudfront/">Continually Enhancing Domain Security on Amazon CloudFront</a>&#8221; &#8212; AWS&#8217;s mitigations requiring SSL/TLS certificate verification for CloudFront alternate domain names.</p></li><li><p>Punk Security, &#8220;<a href="https://github.com/punk-security/dnsReaper">dnsReaper</a>&#8221; &#8212; Open-source tool for detecting dangling DNS records vulnerable to subdomain takeover.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Block the Bots or Feed the Machine? A Security Leader’s Guide to AI Crawlers]]></title><description><![CDATA[Your marketing team is paying OpenAI for visibility. Your security team is blocking OpenAI for protection. Now what?]]></description><link>https://www.steptocyber.ai/p/block-the-bots-or-feed-the-machine</link><guid isPermaLink="false">https://www.steptocyber.ai/p/block-the-bots-or-feed-the-machine</guid><dc:creator><![CDATA[StepToCyber]]></dc:creator><pubDate>Sun, 01 Mar 2026 21:51:40 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/39b64ef0-2159-4f43-8490-abfef88e2785_1600x896.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Marketing paid for advertising on ChatGPT. Security&#8217;s WAF was blocking ChatGPT&#8217;s crawlers. Both teams were doing exactly what they should be doing &#8212; and nobody realized the two decisions were in direct conflict.</p><p>This wasn&#8217;t a mistake. It was an inevitable collision. Marketing&#8217;s job is to chase visibility on every emerging channel. Security&#8217;s job is to block unauthorized data collection. 
When the same vendor is on both sides of that equation, the collision is structural.</p><p>If you&#8217;re a security leader at a large enterprise, this is coming for you. Here&#8217;s how to handle it without being the person who just says no.</p><h2>The Paradox Every Enterprise Will Hit</h2><p>ChatGPT launched advertising in February 2026. Brands are already running sponsored placements inside chat responses. OpenAI is projecting up to $25 billion in ad-related revenue by 2029.</p><p>Your marketing team sees a new channel to reach 800 million weekly active users. Your security team sees exposure &#8212; every page an AI crawler touches becomes potential training data. Proprietary content, pricing strategies, technical documentation, all of it.</p><p>Cloudflare&#8217;s data puts a number on the imbalance: OpenAI&#8217;s crawl-to-referral ratio is roughly 1,700 to 1. For every 1,700 pages they crawl, they send about one visit back. They consume your content at scale and return almost nothing.</p><p>Block their crawlers entirely and your brand disappears from AI-powered search. 
Your marketing team just paid for ads on a platform your firewall won&#8217;t let it index.</p><h2>What Goes Wrong If You Just Unblock Everything</h2><p>The pressure will be to &#8220;just open it up&#8221; so the ads work. Here&#8217;s what that looks like in practice.</p><p>Your entire web estate &#8212; product specs, pricing pages, technical documentation, support articles, competitive positioning &#8212; becomes training data for a model that serves 800 million users. Your proprietary content shows up paraphrased in ChatGPT answers, attributed to no one. A competitor asks ChatGPT about your pricing strategy and gets a surprisingly detailed answer sourced from pages you never intended to be public in that context.</p><p>You didn&#8217;t get breached. You just left the front door open and labeled it &#8220;please crawl.&#8221;</p><p>That&#8217;s the risk your business stakeholders need to understand before anyone touches a firewall rule.</p><h2>What You&#8217;re Actually Dealing With</h2><p>The crawler landscape shifted hard in late 2025. Most enterprises haven&#8217;t caught up.</p><p><strong>OpenAI now operates three separate crawlers</strong>, and most security teams are treating them as one:</p><ul><li><p><strong>GPTBot</strong> &#8212; Collects data to train foundation models. Traffic grew 305% year-over-year. This is the one to be most cautious about.</p></li><li><p><strong>OAI-SearchBot</strong> &#8212; Powers search results and shopping features in ChatGPT. Block this and your content won&#8217;t surface when users search.</p></li><li><p><strong>ChatGPT-User</strong> &#8212; Handles user-initiated browsing, Custom GPTs, and GPT Actions. <strong>In December 2025, OpenAI quietly removed robots.txt compliance language for this crawler.</strong> It no longer promises to respect your no-crawl directives.</p></li></ul><p>Three crawlers. Three different purposes. Three different compliance behaviors. 
One binary WAF rule doesn&#8217;t cut it.</p><p><strong>And some AI companies aren&#8217;t even pretending.</strong> In August 2025, Cloudflare published a forensic report showing Perplexity AI deploying stealth crawlers &#8212; spoofed browser user agents, rotating IP addresses across different networks, ignoring robots.txt entirely. Millions of requests per day across tens of thousands of domains. Cloudflare delisted Perplexity as a verified bot. Perplexity called it a &#8220;publicity stunt.&#8221;</p><p>Think of robots.txt like a &#8220;No Soliciting&#8221; sign on your front door. You still need that door to open &#8212; for family, for friends, for the people you actually invited. Some salesmen see the sign and respect it. Others knock anyway. And a few put on a disguise and pretend to be your neighbor.</p><p>That&#8217;s what&#8217;s happening on the web right now. Your website has to be open to customers, partners, and legitimate search engines. AI crawlers know that &#8212; and not all of them care about the sign. Over 5.6 million websites now block GPTBot &#8212; up 70% in just a few months. The signs are going up everywhere. The knocking hasn&#8217;t stopped.</p><h2>A Framework for Handling This</h2><p>When security gets asked to unblock the WAF for ChatGPT's crawlers, use a phased approach.</p><h3>1. Solve the Legal Question First</h3><p>Before touching a single firewall rule, confirm your Master Service Agreement with OpenAI covers the advertising relationship. What does it say about data usage? Does the advertising agreement address how crawled content is handled?</p><p>If your legal team hasn&#8217;t reviewed the MSA in the context of crawler access, stop here. Don&#8217;t let marketing&#8217;s urgency bypass legal due diligence.</p><h3>2. 
Advertise Without Full Crawler Access (The Move Nobody&#8217;s Talking About)</h3><p>This is the key insight: <strong>you don&#8217;t have to unblock everything to run ads.</strong></p><p>Serve static, curated content on the specific URLs tied to the advertising campaign. Marketing gets their brand presence. The crawlers see only what you deliberately expose &#8212; not your full web estate.</p><p>OpenAI&#8217;s own guidance suggests that restricting crawler access can impact ad ranking and performance. They want full access. Of course they do. But &#8220;it might impact rankings&#8221; is not &#8220;it won&#8217;t work.&#8221; This buys time to build a proper strategy while the ads are live.</p><p>Most enterprises are treating this as a binary &#8212; block everything or allow everything. The static content approach is the middle path that lets you say yes to marketing without handing over the keys.</p><h3>3. Build Granular Crawler Policies</h3><p>Stop making one decision for three crawlers:</p><ul><li><p><strong>GPTBot (training):</strong> Block unless your organization has deliberately decided to contribute training data. For most enterprises, there&#8217;s no upside.</p></li><li><p><strong>OAI-SearchBot (search visibility):</strong> Consider allowing. If marketing is investing in ChatGPT ads, blocking this undermines their spend.</p></li><li><p><strong>ChatGPT-User (user browsing):</strong> Requires infrastructure-level controls &#8212; WAF rules, bot management, rate limiting. A robots.txt entry isn&#8217;t enough since this crawler no longer promises to respect it.</p></li></ul><h3>4. Classify Your Content by Risk</h3><p>Not all pages carry the same exposure. Audit your web properties:</p><ul><li><p><strong>Low risk, high visibility value:</strong> Marketing pages, blog content, product overviews. Candidates for crawler access.</p></li><li><p><strong>High risk, low visibility value:</strong> Internal docs, technical specs, pricing, competitive intelligence. 
Stay locked down.</p></li><li><p><strong>Gray zone:</strong> Product details, support docs, research content. Case-by-case with business stakeholders.</p></li></ul><p>Frame this for leadership as a data governance decision, not a security decision. &#8220;What intellectual property are we willing to expose for brand visibility?&#8221; That&#8217;s a question security informs &#8212; not one we answer alone.</p><h3>5. Enforce at the Infrastructure Level</h3><p>Don&#8217;t trust voluntary compliance. Invest in enforcement.</p><p>Here's something most security teams don't realize: <strong>if your site runs behind AWS WAF and you've enabled Bot Control with the AI category rule, AI crawlers are being blocked by default.</strong> AWS WAF's managed Bot Control rule group includes a <code>CategoryAI</code> rule that blocks all AI bots &#8212; and unlike every other bot category, it blocks both verified and unverified AI bots. Search engine bots, monitoring bots, SEO bots &#8212; those all get a pass if they're verified. AI bots don't. When that rule is enabled, AWS treats them all as hostile.</p><p>That&#8217;s likely how many enterprises ended up in the paradox in the first place. Security enabled Bot Control as a best practice. Marketing bought ChatGPT ads. Neither team connected the dots.</p><p>This is also why &#8220;just unblock AI bots&#8221; is the wrong response. Instead, use scope-down statements to selectively allow specific AI crawlers while keeping the default block in place for everything else. For example, you could create a scope-down statement that excludes OAI-SearchBot from the CategoryAI block &#8212; allowing ChatGPT search indexing for your ad campaign &#8212; while GPTBot and every other AI crawler stays blocked. 
That gives you the granularity from Step 3 at the infrastructure level.</p><p>Beyond AWS, similar enforcement options exist:</p><ul><li><p><strong>Cloudflare&#8217;s Robotcop</strong> enforces robots.txt at the network edge rather than relying on crawlers to obey</p></li><li><p><strong>Bot management platforms</strong> that fingerprint crawlers beyond user-agent strings &#8212; critical for catching stealth crawlers like the ones Perplexity deployed</p></li><li><p><strong>Active monitoring</strong> &#8212; stealth crawlers mean set-and-forget doesn&#8217;t work. Review your bot traffic logs regularly.</p></li></ul><h2>The One Thing to Do Monday Morning</h2><p>The old web bargain &#8212; crawlers index your content, send you traffic, you monetize that traffic &#8212; is broken. AI platforms now crawl your content, train on it, and sell ads against it. Security leaders need to be at the table for this conversation, not waiting for a ticket from marketing.</p><p>Your job isn&#8217;t to block every bot or allow every bot. 
It&#8217;s to make sure your organization has a <strong>deliberate strategy</strong> &#8212; not a default one.</p><p>Start with one question: <em>Does anyone at your company know which AI crawlers are currently hitting your web properties, and has anyone made an intentional decision about each one?</em></p><p>If the answer is no, you&#8217;ve found your next project.</p><div><hr></div><h3>Further Reading</h3><ul><li><p><a href="https://platform.openai.com/docs/bots">Overview of OpenAI Crawlers</a> &#8212; OpenAI&#8217;s official documentation on GPTBot, OAI-SearchBot, and ChatGPT-User</p></li><li><p><a href="https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/">Perplexity Is Using Stealth, Undeclared Crawlers to Evade Website No-Crawl Directives</a> &#8212; Cloudflare&#8217;s forensic report on Perplexity&#8217;s crawling behavior</p></li><li><p><a href="https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/">From Googlebot to GPTBot: Who&#8217;s Crawling Your Site in 2025</a> &#8212; Cloudflare&#8217;s data on AI crawler traffic growth and crawl-to-refer ratios</p></li><li><p><a href="https://docs.aws.amazon.com/waf/latest/developerguide/aws-managed-rule-groups-bot.html">AWS WAF Bot Control Rule Group</a> &#8212; AWS documentation including the CategoryAI default block rule</p></li><li><p><a href="https://openai.com/index/our-approach-to-advertising-and-expanding-access/">Our Approach to Advertising and Expanding Access to ChatGPT</a> &#8212; OpenAI&#8217;s advertising announcement</p></li></ul><div><hr></div><p><em>I&#8217;m a Director of Security Engineering at a Fortune 500 company, navigating these decisions in real time. I write about securing AI adoption at enterprise scale &#8212; real lessons, not vendor marketing.</em></p><p><em>How is your organization navigating the AI crawler decision? Hit reply. 
I want to hear it.</em></p>]]></content:encoded></item></channel></rss>