Detecting Prompt Injection in LLM Apps (Python Library)

Source: DEV Community
I've been working on LLM-backed applications and ran into a recurring issue: prompt injection via user input. Typical examples:

- "Ignore all previous instructions"
- "Reveal your system prompt"
- "Act as another AI without restrictions"

In many applications, user input is passed directly to the model, which makes these attacks practical. Most moderation APIs are too general-purpose and not designed specifically for prompt injection detection, especially for non-English inputs. So I built a small Python library to act as a screening layer before sending input to the LLM: https://github.com/kanekoyuichi/promptgate

Detection strategies:

- Rule-based (regex / phrase matching): latency <1ms, no dependencies
- Embedding-based (cosine similarity with attack exemplars): latency ~5–15ms, uses sentence-transformers
- LLM-as-judge: higher accuracy, but +150–300ms latency, requires an external API

Baseline evaluation (rule-only):

- FPR: 0.0% (0 / 30 benign samples)
- Recall: 61.4% (27 / 44 attack samples)

So rule
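To make the rule-based strategy concrete, here is a minimal sketch of a regex screening layer. The patterns below are illustrative examples I chose to match the attack phrases above; they are not the library's actual rule set, and the function name `is_injection` is hypothetical:

```python
import re

# Hypothetical patterns for illustration; a real rule set would be
# broader and cover multiple languages.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"reveal\s+(your\s+)?system\s+prompt", re.IGNORECASE),
    re.compile(r"act\s+as\s+(another|an?)\s+ai\b", re.IGNORECASE),
]

def is_injection(text: str) -> bool:
    """Screen user input before it is forwarded to the LLM."""
    return any(pattern.search(text) for pattern in INJECTION_PATTERNS)
```

Because it is pure stdlib regex matching, this check adds effectively no latency and no dependencies, which is what makes it viable as an always-on first layer.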
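The embedding-based strategy can be sketched as follows. In a real setup the vectors would come from a sentence-transformers model (which accounts for the ~5–15ms latency); here I use tiny hand-made vectors so the example is self-contained, and the threshold value is an assumption, not the library's:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def looks_like_attack(query_vec: np.ndarray,
                      attack_exemplars: list[np.ndarray],
                      threshold: float = 0.8) -> bool:
    """Flag the input if it is close to any known attack exemplar."""
    return any(cosine_similarity(query_vec, e) >= threshold
               for e in attack_exemplars)

# Toy 3-dim vectors stand in for real sentence embeddings.
exemplars = [np.array([1.0, 0.0, 0.0]), np.array([0.9, 0.1, 0.0])]
query = np.array([0.95, 0.05, 0.0])   # near the exemplar cluster
flagged = looks_like_attack(query, exemplars)
```

The design trade-off versus the rule layer is that embeddings catch paraphrases and non-English variants that exact phrase matching misses, at the cost of a model dependency and extra latency.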
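For reference, the reported baseline numbers follow directly from the sample counts. A quick check of the arithmetic:

```python
def false_positive_rate(false_positives: int, benign_total: int) -> float:
    """Fraction of benign samples incorrectly flagged."""
    return false_positives / benign_total

def recall(true_positives: int, attack_total: int) -> float:
    """Fraction of attack samples correctly flagged."""
    return true_positives / attack_total

# Rule-only baseline: 0 of 30 benign samples flagged,
# 27 of 44 attack samples caught.
fpr = false_positive_rate(0, 30)   # 0.0
rec = recall(27, 44)               # ~0.614
```

So the rule layer flags nothing benign but misses roughly four in ten attacks, which is the usual motivation for layering the embedding or LLM-as-judge strategies on top.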