Deep Dive: Semantic Duplicate Detection with AST Analysis - How AI Keeps Rewriting Your Logic

Source: DEV Community
You've just asked your AI assistant to add email validation to your new signup form. It writes this:

    function validateEmail(email: string): boolean {
      return email.includes('@') && email.includes('.');
    }

Simple enough. But here's the problem: this exact logic (checking for '@' and '.') already exists in four other places in your codebase, just written differently:

    // In src/utils/validators.ts
    const isValidEmail = (e) => e.indexOf('@') !== -1 && e.indexOf('.') !== -1;

    // In src/api/auth.ts
    if (user.email.match(/@/) && user.email.match(/\./)) { /* ... */ }

    // In src/components/EmailForm.tsx
    const checkEmail = (val) => val.split('').includes('@') && val.split('').includes('.');

    // In src/services/user-service.ts
    // search() treats a string argument as a regex, so the dot is written as an escaped regex literal
    return email.search('@') >= 0 && email.search(/\./) >= 0;

Your AI didn't see these patterns. Why? Because they look different syntactically, even though they're semantically identical. This is semantic duplication, and it's one of th