What “reverse-engineer” actually means here
Before answering the headline, it’s worth being precise about what an attacker actually wants. There are three different goals that get bundled under “reverse-engineer”:
- Lexical recovery — rename _0x4f1a back to plausible English names like calculateHash, userToken, parseSettings.
- Semantic recovery — understand what the code does well enough to reimplement it in a different codebase.
- Tampering — modify the running code, e.g. to bypass a paywall check or grant a privileged feature.
LLMs are good at goal 1, weaker at goal 2, and mostly orthogonal to goal 3 (tampering happens at runtime, not in the source). Most of the public anxiety about “AI deobfuscation” is about goal 2, which is the hardest.
What today’s LLMs are good at
The 2026 generation of code assistants — ChatGPT’s GPT-class models, Claude, Copilot, and the open-weight models that follow the same architecture — are very good at three specific tasks that intersect with reverse engineering:
- Variable renaming from context. Given a function that does x = a * 37 + b.charCodeAt(0) and returns a hex string, an LLM will confidently rename it to calculateLicenseHash or similar. This works because the operations in the function are unambiguous semantic anchors.
- Library pattern recognition. Even with renamed identifiers, an LLM will recognize jQuery's $.ajax shape, React's reconciler internals, or Lodash utility patterns. It has seen enough of each in training to identify them by structure alone.
- Static-obfuscator pattern matching. The output of the open-source javascript-obfuscator npm package, of basic minifiers, of UglifyJS — all of this is in the training data. Models have seen thousands of examples of the same transforms applied to different code, which means they've learned the inverse.
A skilled developer working with Claude or GPT-class models can take a basic minified or simply-obfuscated file and produce a readable approximation in minutes. This is genuinely a step change from the manual deobfuscation tools of five years ago.
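As a concrete illustration of what that lexical recovery looks like (a toy example, not output captured from any particular model), consider a minified helper whose operations act as unambiguous anchors:

// Minified input: the name is gone, but split('.')[1], atob, and JSON.parse
// are semantic anchors that survive renaming.
function c(t){return JSON.parse(atob(t.split('.')[1]))}

// A plausible reconstruction: the anchors alone identify a JWT payload parser.
function parseJwtPayload(token) {
  var payloadSegment = token.split('.')[1];   // second dot-separated JWT segment
  return JSON.parse(atob(payloadSegment));    // base64-decode, then parse the claims
}

Nothing in the reconstruction came from the identifiers; it all came from the operations, which is exactly why removing those anchors is the focus of the transforms in the next section.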
Where they break down
The places where AI assistants stop being useful are structural, not stylistic. Three transforms specifically defeat the way LLMs reason about code, and a fourth, more practical constraint compounds all three:
1. Per-build polymorphic decoders
If your obfuscator emits the same decoder function in every build, an LLM only needs to see that decoder once. Once it understands the inverse, it can apply it to every release you ship for the next decade. This is exactly what happens with the output of the popular open-source obfuscators — their decoder is fixed, deterministic, and well-represented in training data.
A polymorphic decoder is the inverse: every build emits a decoder with a different shape. Different identifier prefix. Different key derivation routine. Different constant-pool encoding. Different state variable names in the dispatcher. The LLM that “solved” yesterday’s release sees a structurally different program today and cannot apply the same approach. It has to start over each time, against a target whose shape it has never seen.
This is the load-bearing claim behind anti-LLM obfuscation. Without per-build randomization, every other transform eventually loses to scale: an LLM that sees 1,000 builds of the same shape learns the shape. With per-build randomization, the LLM that sees 1,000 builds learns 1,000 unique shapes that each appear once.
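A deliberately tiny sketch of the difference. The decoder names, pool encodings, and key derivations here are invented for illustration; real per-build decoders are much larger:

// Build A: constant pool stored as XOR-masked bytes; key supplied by the call site.
var _p = [0x10, 0x09, 0x15, 0x77];
function _dA(n, k) {
  var s = '';
  for (var i = 0; i < n; i++) { s += String.fromCharCode(_p[i] ^ k); }
  return s;
}
// call site in build A:  _dA(4, 0x5a)  -> 'JSO-'

// Build B: same constant, but an additive encoding and a key derived from a seed string.
var _q = [232, 241, 237, 203], _seed = 'k3';
function _dB(n) {
  var k = 0, s = '', i;
  for (i = 0; i < _seed.length; i++) { k = (k + _seed.charCodeAt(i)) % 256; }
  for (i = 0; i < n; i++) { s += String.fromCharCode(_q[i] - k); }
  return s;
}
// call site in build B:  _dB(4)  -> 'JSO-'

A model that learned to invert Build A's shape gains nothing against Build B, and the next build ships a third shape. In real output the structural gap between builds is far larger than in this sketch.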
2. Encrypted constant pools
Strings, numbers, and regex literals are normally the LLM’s anchor points. A function that contains the literal "Bearer " and the literal "https://api.example.com/v2/auth" is doing OAuth, full stop — the LLM doesn’t need to read the rest of the function.
An encrypted constant pool removes those anchors. The strings exist in the file, but only as bytes that look like noise. They’re reconstructed at runtime by the decoder. To recover them, the LLM has to either:
- Run the decoder — which it can’t, because it’s reading text, not executing code.
- Symbolically simulate the decoder — which it can sometimes do for simple decoders, but not when the decoder uses runtime state (function table indices, build-time keys, hash chains).
Without those anchors, the LLM is left reasoning about a function that takes opaque inputs and produces opaque outputs. It can guess the shape, but it can’t verify the guess.
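A minimal before/after sketch of the anchor removal. The pool contents and names are illustrative, and a fixed XOR key is used so the example stays checkable; real decoders derive the key from runtime state, which is what blocks symbolic simulation:

var headers = {}, token = 'abc123';

// Before: the literal is an anchor. An LLM reads 'Bearer ' and knows this is
// an Authorization header without reading anything else in the function.
headers.Authorization = 'Bearer ' + token;

// After: the literal exists in the file only as opaque bytes in a pool.
var _pool = [[109, 74, 78, 93, 74, 93, 15]];
function _g(i, k) {
  return _pool[i].map(function (b) { return String.fromCharCode(b ^ k); }).join('');
}
headers.Authorization = _g(0, 0x2f) + token;   // _g(0, 0x2f) === 'Bearer '

With the literal gone, the statement tells a text-only reader nothing; the decoded value only exists once _g has run.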
3. Flat-transformed control flow
Normal control flow has structural cues. if (x > 0) tells the LLM that x is a number being compared. for (let i = 0; i < arr.length; i++) tells the LLM this is an array iteration. The LLM uses these patterns to build a model of what the code is doing.
Flat-transformed control flow eliminates those cues. if/else becomes a switch statement that dispatches on a state variable. for/while becomes a state machine. Function calls become indirect through a dispatcher table. What was 30 lines of structured logic becomes 200 cases of a switch statement, each case advancing the state variable to the next case.
To understand a flat-transformed function, the LLM has to reason about state transitions across hundreds of cases without an execution trace. This is the kind of analysis humans need a debugger for. Models can do it for trivial cases, but their accuracy drops sharply as the case count grows, and they cannot verify their reconstruction without running the code.
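A simplified before/after sketch, with the cases left in order so the result stays checkable; real output reorders them and adds decoy states:

// Structured: the for-loop shape itself announces "array iteration".
function sum(arr) {
  var total = 0;
  for (var i = 0; i < arr.length; i++) { total += arr[i]; }
  return total;
}

// Flattened: the same logic as a dispatcher over a state variable.
function sumFlat(arr) {
  var st = 0, i = 0, total = 0;
  while (st !== -1) {
    switch (st) {
      case 0: total = 0; i = 0; st = 1; break;       // init
      case 1: st = (i < arr.length) ? 2 : 3; break;  // loop test
      case 2: total += arr[i]; i++; st = 1; break;   // body + increment
      case 3: return total;                          // exit
    }
  }
}

The loop test, body, and exit still exist, but only as cases whose ordering has to be recovered by tracking st, and every added case multiplies the transitions a reader must hold in their head.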
4. Context window limits
A real obfuscated bundle for a production application is somewhere between 500 KB and 5 MB. Every consumer-facing LLM in 2026 has a bounded context window; even the largest, measured in hundreds of thousands of tokens, corresponds to at most a couple of MB of code, and effective reasoning degrades well before the window is full. A multi-megabyte bundle either doesn't fit, or doesn't fit usefully.
What happens in practice: developers paste the first portion of the bundle, get a partial analysis, paste more, get a different partial analysis, and the LLM cannot hold the full call graph in memory long enough to reason about it. It produces summaries that sound right but contain hallucinated structure where the real code wasn’t in the window.
A walkthrough of what each transform removes from the LLM’s reasoning surface
Consider a tiny but realistic function: a license-key hash routine.
Original source
function calculateLicenseHash(userId, productKey) {
const seed = userId * 37 + productKey.charCodeAt(0);
const padded = seed.toString(16).padStart(8, '0');
return 'JSO-' + padded.toUpperCase();
}
The original is trivially readable. Variable names, the arithmetic, and the literal 'JSO-' all tell an LLM exactly what this function is for.
After basic obfuscation (renaming + string array, no per-build randomization)
var _s=['JSO-'];
function _0x1a4(_a,_b){
var _c=_a*37+_b.charCodeAt(0);
var _d=_c.toString(16).padStart(8,'0');
return _s[0]+_d.toUpperCase();
}
Lexical recovery is easy here for any 2026 LLM. The string 'JSO-' is right there in the array, the arithmetic shape is preserved, and a model that has seen the open-source javascript-obfuscator output thousands of times will rename _0x1a4 to calculateLicenseHash with reasonable confidence.
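The kind of readable approximation a capable model produces from the sample above (illustrative; the recovered names are the model's guesses, not anything present in the file):

// Plausible reconstruction of _0x1a4: structure and constants were preserved
// by the obfuscator, so only the names had to be guessed.
function calculateLicenseHash(userId, productKey) {
  var seed = userId * 37 + productKey.charCodeAt(0);
  var hex = seed.toString(16).padStart(8, '0');
  return 'JSO-' + hex.toUpperCase();
}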
After Maximum-mode protection (per-build decoder, encrypted strings, flat-transformed control flow)
// Decoder shape and keys regenerated for this build only.
// Identifier prefix “_0x9c2f1a” differs in the next release.
(function(){var _0x9c2f1a=_dec(0x4a),_0x9c2f1b=_dec(0x4b);
var _s=function(i,k){/* per-build decoder body, ~80 lines */};
function _0x9c2f1c(_p0,_p1){
var _st=0,_acc=0;
while(_st!==-1){switch(_st){
case 0:_acc=_p0*0x25;_st=1;break;
case 1:_acc+=_p1.charCodeAt(_dec(0x4c));_st=2;break;
case 2:_acc=_acc.toString(0x10).padStart(0x8,_dec(0x4d));_st=3;break;
case 3:return _s(_dec(0x4e),_0x9c2f1a)+_acc.toUpperCase();
}}}})();
Same function, three structural changes the LLM has to defeat:
- The literal 'JSO-' is gone — recovering it requires running _s(_dec(0x4e), _0x9c2f1a), which depends on the build-time key.
- The arithmetic constants 37, 16, 8 are now 0x25, 0x10, 0x8 — readable, but the argument to charCodeAt is now _dec(0x4c), which is opaque, so even an LLM that decodes the hex constants still can't tell which character position is being read.
- The control flow is a state machine. case 0 → 1 → 2 → 3 happens to be linear here for clarity, but in real Maximum-mode output the cases are reordered with dead transitions and decoy states. The LLM has to follow the state variable to know what runs next.
And critically: this exact set of transforms appears in this build only. Tomorrow’s release of the same source will produce a structurally different file.
The honest limits
Obfuscation does not prevent every form of attack. Things it explicitly does not stop:
- Live execution and observation. The code has to run in a real browser or runtime. An attacker can always start a debugger, set breakpoints in the decoder, and dump the constant pool at runtime (see the sketch after this list). Obfuscation slows this down. It does not prevent it.
- Determined human + sandbox. Given enough time, a skilled attacker with a debugger can reverse most obfuscation. The question is whether what the code is worth to the attacker exceeds the cost of that effort; for most production code, it doesn't.
- LLMs paired with execution sandboxes. An LLM that can send code to a sandbox and observe the output is meaningfully more capable than an LLM working from text alone. This is a category of attack that is becoming more accessible. Per-build randomization still helps because each new build forces the sandboxed LLM to redo its analysis from scratch — but it doesn’t make the attack impossible.
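To make the first point concrete, here is a sketch of the observe-rather-than-analyze attack against the walkthrough's decoder: the attacker serves a locally patched copy of the bundle (for example via DevTools local overrides) with the decoder wrapped, then reads the log instead of reversing anything:

// Patch inserted right after the decoder _s is defined in the local copy.
// Every constant is printed as the application decodes it at runtime.
var _realDecoder = _s;
_s = function (index, key) {
  var plaintext = _realDecoder(index, key);
  console.log('decoded[0x' + index.toString(16) + '] =', plaintext);
  return plaintext;
};

Per-build randomization doesn't stop this; it only forces the attacker to re-locate the decoder in each new build.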
The mental model that holds up: Obfuscation is a multiplier on the attacker’s time and cost. For most threat models, that multiplier is enough to be uneconomical for the attacker. For threat models where it isn’t, you pair obfuscation with controls that attack the parts obfuscation doesn’t cover — runtime monitoring, server-side authority, distribution restrictions.
How to think about your threat model
Three buckets cover most production code:
- Anti-casual-copying. A competitor scrapes your script and runs it on their site. Standard obfuscation handles this — they get a working file but can’t maintain or extend it, and the cost of doing so exceeds the cost of writing their own.
- Anti-IP-theft. Someone wants to understand your algorithm well enough to reimplement it. Maximum-mode protection is the right answer: per-build randomization, encrypted constants, flat control flow. The attacker has to invest meaningful manual effort, and the effort restarts on every release.
- Anti-active-attacker. Someone wants to tamper with your running code — bypass a paywall, grant a feature, manipulate a game. Maximum-mode protection plus a runtime monitoring suite plus server-side authority on anything that matters. Obfuscation alone is not the right answer at this tier; it’s a layer.
Practical guidance
- Use Maximum mode for anything that took more than a person-month to build. The effort an attacker is willing to invest in reverse engineering scales with what the code is worth to them, and Maximum mode is meaningfully harder to defeat than Standard.
- Don’t put secrets in client code. No API keys, no signing keys, no validation logic that grants access. Obfuscation hides; it does not authorize. Anything an attacker can recover with execution and observation, they will recover.
- Pair with monitoring when active attackers are part of the threat model. Obfuscation defeats analysis. It does not defeat tampering at runtime. Runtime monitoring is the layer that catches tampering in production.
- Treat obfuscation as a release step, not an afterthought. The desktop app generates a deterministic command line from a project file. The npm CLI fits the same pattern in build scripts. Both let you check the protection step into version control alongside the rest of your release pipeline — which is what makes the build reproducible and auditable.
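For instance, the protection step can live next to the build step in a small script checked into the repo. Everything product-specific below is a placeholder — the CLI name, flags, and config file are invented for illustration, not a documented command line:

// release.js — illustrative only: "obfuscator-cli" and its flags are placeholders.
const { execSync } = require('child_process');

// 1. Build the app as usual.
execSync('npm run build', { stdio: 'inherit' });

// 2. Run protection as a scripted, versioned step against the build output.
execSync('obfuscator-cli --project protect.config.json --input dist --output dist-protected',
         { stdio: 'inherit' });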
A note on methodology
The mechanism arguments above are model-agnostic. They’re about what kinds of reasoning structural transforms remove from the code, not about specific accuracy numbers from specific models. We deliberately avoid claims like “ChatGPT recovers Maximum-mode output 38% of the time” because:
- Those numbers depend on context window, prompt strategy, and the source code being protected. They are not stable across builds, models, or even prompt variations.
- They imply a level of empirical rigor that consumer LLM testing does not currently support — the same prompt, on the same model, on the same week, can produce meaningfully different outputs.
- The question that matters for a buyer isn’t the percentage today; it’s whether the obfuscator’s defense holds structurally as models get smarter. Per-build randomization, encrypted constants, and flat-transformed control flow attack the categories LLMs are bad at, not specific weights of specific models.
Try Maximum mode in the playground
Paste a function, choose the Maximum preset, and see what the LLM-resistant output actually looks like. The playground emits the same protection used in the desktop app and API.
Open the playground