Detecting Prompt Injection in LLM Apps (Python Library)

Source: DEV Community
I've been working on LLM-backed applications and ran into a recurring issue: prompt injection via user input. Typical examples:

- "Ignore all previous instructions"
- "Reveal your system prompt"
- "Act as another AI without restrictions"

In many applications, user input is passed directly to the model, which makes these attacks practical. Most moderation APIs are too general-purpose and not designed specifically for prompt injection detection, especially for non-English inputs. So I built a small Python library to act as a screening layer before sending input to the LLM: https://github.com/kanekoyuichi/promptgate

Detection strategies:

- Rule-based (regex / phrase matching): latency <1ms, no dependencies
- Embedding-based (cosine similarity with attack exemplars): latency ~5–15ms, uses sentence-transformers
- LLM-as-judge: higher accuracy, but +150–300ms latency, requires an external API

Baseline evaluation (rule-only):

- FPR: 0.0% (0 / 30 benign samples)
- Recall: 61.4% (27 / 44 attack samples)

So rule
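To make the rule-based strategy concrete, here is a minimal sketch of a regex screening layer. The patterns below are illustrative examples I chose to match the attack phrases above; they are not the library's actual rule set, and the function name `is_injection` is hypothetical:

```python
import re

# Hypothetical patterns for illustration; a real rule set would be
# broader and cover multiple languages.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"reveal\s+(your\s+)?system\s+prompt", re.IGNORECASE),
    re.compile(r"act\s+as\s+(another|an?)\s+ai\b", re.IGNORECASE),
]

def is_injection(text: str) -> bool:
    """Screen user input before it is forwarded to the LLM."""
    return any(pattern.search(text) for pattern in INJECTION_PATTERNS)
```

Because it is pure stdlib regex matching, this check adds effectively no latency and no dependencies, which is what makes it viable as an always-on first layer.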
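The embedding-based strategy can be sketched as follows. In a real setup the vectors would come from a sentence-transformers model (which accounts for the ~5–15ms latency); here I use tiny hand-made vectors so the example is self-contained, and the threshold value is an assumption, not the library's:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def looks_like_attack(query_vec: np.ndarray,
                      attack_exemplars: list[np.ndarray],
                      threshold: float = 0.8) -> bool:
    """Flag the input if it is close to any known attack exemplar."""
    return any(cosine_similarity(query_vec, e) >= threshold
               for e in attack_exemplars)

# Toy 3-dim vectors stand in for real sentence embeddings.
exemplars = [np.array([1.0, 0.0, 0.0]), np.array([0.9, 0.1, 0.0])]
query = np.array([0.95, 0.05, 0.0])   # near the exemplar cluster
flagged = looks_like_attack(query, exemplars)
```

The design trade-off versus the rule layer is that embeddings catch paraphrases and non-English variants that exact phrase matching misses, at the cost of a model dependency and extra latency.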
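For reference, the reported baseline numbers follow directly from the sample counts. A quick check of the arithmetic:

```python
def false_positive_rate(false_positives: int, benign_total: int) -> float:
    """Fraction of benign samples incorrectly flagged."""
    return false_positives / benign_total

def recall(true_positives: int, attack_total: int) -> float:
    """Fraction of attack samples correctly flagged."""
    return true_positives / attack_total

# Rule-only baseline: 0 of 30 benign samples flagged,
# 27 of 44 attack samples caught.
fpr = false_positive_rate(0, 30)   # 0.0
rec = recall(27, 44)               # ~0.614
```

So the rule layer flags nothing benign but misses roughly four in ten attacks, which is the usual motivation for layering the embedding or LLM-as-judge strategies on top.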