A well-crafted system prompt will improve the quality of code produced by your coding assistant. It does make a difference. If you provide guidelines in your system prompt for writing code and tests, coding assistants will follow the guidelines.
Although that depends on your definition of "will follow." If your definition is "will follow sometimes," then it's accurate. If your definition is "will follow always" or even "will follow most of the time," then it's inaccurate (unless you've found a way to make them reliable that I haven't; please let me know).
Coding agents will regularly ignore instructions in the system prompt. As the context window fills up and starts to poison their behavior, all bets are off.
Even with the latest Opus 4.5 model, I haven't noticed a major improvement. So if we can't rely on models to follow system prompts, we need to invest in feedback loops.
I'll show you how I'm using Claude Code hooks to implement automatic code review of all AI-generated code so that code quality is higher before it reaches the human in the loop.
| You can find a code example that demonstrates the concepts discussed in this post on my GitHub. |
Auto Code Review for Fast, Semantic Feedback
When I talk about auto code review in this post, I'm describing a fast feedback mechanism intended to review common code quality issues. It will run every time Claude has finished making edits, so it needs to be fast and efficient.
I also use coding assistants for detailed code reviews, when reviewing a PR, for example. That may spin up multiple subagents and take a bit longer. That's not what I'm talking about here.
The goal of the auto code review is to reinforce what's in your system prompt, project documentation, and on-demand skills. Things that Claude may have ignored. Part of a multipronged approach.
Wherever possible, I recommend using your lint and test rules to bake in quality, and leave auto code review for more semantic issues that tools can't check.
If you want to set a maximum length for your files or a maximum level of indentation, use your lint tool. If you want to enforce a minimum test coverage, use your test framework.
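For example, if the project happened to use ESLint, limits like these map directly onto built-in rules (`max-lines` and `max-depth` are real ESLint rules; the thresholds below are arbitrary examples, not recommendations):

```json
{
  "rules": {
    "max-lines": ["error", { "max": 300 }],
    "max-depth": ["error", { "max": 3 }]
  }
}
```

Test frameworks cover the coverage side of this, e.g. Jest's `coverageThreshold` configuration option.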
Semantic Code Review
A semantic code review looks at how well the code is designed. For example, naming: does the code accurately describe the business concepts it represents?
AI will often default to names like "helper" and "utils." But AI is also good at understanding the nuance and finding better names if you challenge it, and it can do that quickly. So this is a good example of a semantic rule.
You can ban certain words like "helper" and "utils" with lint tools. (I recommend doing that.) But that won't catch everything.
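As one concrete option, ESLint's built-in `id-denylist` rule bans identifier names outright. Note it only checks identifiers, not file names or near-synonyms, which is part of why it won't catch everything (the banned words below are this example's choices):

```json
{
  "rules": {
    "id-denylist": ["error", "helper", "helpers", "util", "utils"]
  }
}
```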
Another example is logic leaking out of the domain model. When a use case/application service queries an entity and then makes a decision, it's highly likely your domain logic is leaking into the application layer. Not so easy to catch with lint tools, but worth addressing.
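A minimal sketch of what this smell looks like, using a hypothetical `Order` entity and free-shipping rule (all names and the rule itself are invented for illustration):

```python
# Hypothetical example: domain logic leaking into the application layer
# versus staying encapsulated in the domain model.

class Order:
    def __init__(self, total: float) -> None:
        self.total = total

    # Encapsulated: the business rule lives on the domain entity.
    def qualifies_for_free_shipping(self) -> bool:
        return self.total >= 50.0


# Leaky: the use case inspects entity state and makes the decision itself,
# duplicating the domain rule in the application layer.
def checkout_leaky(order: Order) -> str:
    if order.total >= 50.0:
        return "free shipping"
    return "standard shipping"


# Better: the use case delegates the decision to the domain model.
def checkout(order: Order) -> str:
    if order.qualifies_for_free_shipping():
        return "free shipping"
    return "standard shipping"
```

A linter can't tell that the `>= 50.0` comparison in `checkout_leaky` is a business rule in the wrong place, but a review subagent often can.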

Another example is default fallback values. When Claude has an undefined value where a value is expected, it will set a default value. It seems to hate throwing exceptions or challenging the type signature and asking, "Should we allow undefined here?" It wants to make the code run no matter what, and no matter how much the system prompt tells it not to.

You can catch some of this with lint rules, but it's very nuanced and depends on the context. Sometimes falling back to a default value is acceptable.
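Here's a hypothetical illustration of the pattern, contrasting a silent fallback with code that surfaces the missing value (the function names and discount rule are invented for this example):

```python
from typing import Optional


def apply_discount_silent(price: float, rate: Optional[float]) -> float:
    # AI-style fallback: a missing rate silently becomes "no discount",
    # so the code always runs and the bug is hidden downstream.
    return price * (1 - (rate or 0.0))


def apply_discount(price: float, rate: Optional[float]) -> float:
    # Often better: fail loudly, or tighten the signature so a missing
    # rate is impossible in the first place.
    if rate is None:
        raise ValueError("discount rate is required; got None")
    return price * (1 - rate)
```

Whether the silent version is a bug or a legitimate default depends entirely on the business context, which is exactly why this is a semantic check rather than a lint rule.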
Building an Auto Code Review with Claude Hooks
In case you’re utilizing Claude Code and need to construct an auto code evaluation for checks which you could’t simply outline with lint or testing instruments, then an answer is to configure a script that runs on the Cease hook.
The Cease hook is when Claude has completed working and passes management again to the person to decide. So right here, you may set off a subagent to carry out the evaluation on the modified information.
To set off the subagent it’s worthwhile to return the error standing code which blocks the primary agent and forces them to learn the output.
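As a sketch, a Stop hook can be registered in `.claude/settings.json` along these lines. The script path is this example's choice, and you should verify the exact schema and the blocking convention (exit code 2 from the hook command, with stderr fed back to the model, at the time of writing) against the Claude Code hooks documentation for your version:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "python3 .claude/hooks/auto_review.py" }
        ]
      }
    ]
  }
}
```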

I feel it’s usually thought-about a greatest follow to make use of a subagent targeted on the evaluation with a really crucial mindset. Asking the primary agent to mark its personal homework is clearly not an excellent strategy, and it’ll burn up your context window.
| The solution I use is available on GitHub. You can install it as a plugin in your repo and customize the code review instructions, or just use it as inspiration for your own solution. Any feedback is welcome. |
In the example above you can see it took 52 seconds. Probably quicker than me reviewing and providing the feedback myself. But that's not always the case. Sometimes it can take a few minutes.
If you're sitting there blocked waiting for the review, this can be slower than doing it yourself. But if you're not blocked and are working on something else (or watching TV), this saves you time because the end result will be higher quality and require less of your time to review and fix.
Scanning for Updated Files
I want my auto code review to only review files that have been modified since the last pull request. But Claude doesn't provide this information in the context passed to the Stop hook.
I could find all modified or unstaged files using Git, but that's not good enough.
What I do instead is hook into PostToolUse, keeping a log of each modified file.
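A minimal sketch of such a PostToolUse logger, assuming the hook receives JSON on stdin with the edited path under `tool_input.file_path` (the shape Claude Code uses for its file-editing tools at the time of writing; verify the payload format against the docs for your version). The log location is an arbitrary choice for this example:

```python
#!/usr/bin/env python3
# Sketch of a PostToolUse hook: append each file Claude edits to a log
# so the Stop hook later knows exactly what to review.
import json
import sys
from pathlib import Path

LOG_FILE = Path(".claude/changed-files.log")


def record_change(payload: dict) -> None:
    # Edit/Write tool calls carry the target path in tool_input.file_path;
    # other tools (Bash, Read, ...) simply have nothing to log.
    file_path = payload.get("tool_input", {}).get("file_path")
    if not file_path:
        return
    LOG_FILE.parent.mkdir(parents=True, exist_ok=True)
    with LOG_FILE.open("a") as log:
        log.write(file_path + "\n")


if __name__ == "__main__":
    raw = sys.stdin.read()
    if raw.strip():
        record_change(json.loads(raw))
```

You would register this under `PostToolUse` in `.claude/settings.json`, with a matcher restricting it to the file-editing tools (e.g. `Edit|Write`).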

When the Stop hook is triggered, the review finds the files modified since the last review and asks the subagent to review only those. If there are no modified files, the code review is not activated.
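Continuing the sketch, the Stop-hook script can read that log, skip the review when nothing changed, and otherwise block with instructions for the review subagent. The exit-code-2 blocking convention and the `stop_hook_active` loop-protection field are taken from the Claude Code hooks documentation; verify both for your version, as the subagent name and wording here are this example's choices:

```python
#!/usr/bin/env python3
# Sketch of a Stop hook: review only the files logged by the PostToolUse
# hook, and skip the review entirely when nothing changed.
import json
import sys
from pathlib import Path

LOG_FILE = Path(".claude/changed-files.log")


def changed_files() -> list:
    """Files logged since the last review, de-duplicated, in order."""
    if not LOG_FILE.exists():
        return []
    return list(dict.fromkeys(LOG_FILE.read_text().split()))


def main(payload: dict) -> int:
    # stop_hook_active is set when Claude is already continuing because of
    # a Stop hook; bail out to avoid an infinite block/review loop.
    if payload.get("stop_hook_active"):
        return 0
    files = changed_files()
    if not files:
        return 0  # nothing changed, don't activate the review
    LOG_FILE.unlink()  # reset the log for the next round of edits
    # Exit code 2 blocks the main agent; stderr becomes its instructions.
    print(
        "Use the code-reviewer subagent to review these files and then "
        "address its findings: " + ", ".join(files),
        file=sys.stderr,
    )
    return 2


if __name__ == "__main__":
    raw = sys.stdin.read()
    sys.exit(main(json.loads(raw)) if raw.strip() else 0)
```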
Challenges with the Stop Hook
Unfortunately the Stop hook is not 100% reliable for this use case, for a few reasons. Firstly, Claude might stop to ask a question, e.g. for you to clarify some requirements. You might not want the auto review to trigger here until you've answered Claude and it has finished.
The second reason is that Claude can commit changes before the Stop hook fires. So by the time the subagent performs the review, the changes are already committed to Git.
That might not be a problem, and there are simple ways to resolve it if it is. It's just extra things to keep in mind and set up.
The ideal solution would be for Anthropic (or other tool vendors) to provide hooks that are higher level in abstraction: more aligned with the software development workflow, not just low-level file-modification operations.
What I would really love is a CodeReadyForReview hook which provides all the files that Claude has modified. Then we could throw away our custom solutions.
Let Me Know If You Have a Better Approach
I don't know if I'm not looking in the right places or if the information isn't out there, but I feel like this solution is solving a problem that should already be solved.
I'd be really grateful if you can share any advice that helps to bake in code quality before the human in the loop has to review it.
Until then, I'll continue to use this auto code review solution. When you're giving AI some autonomy to implement tasks and reviewing what it produces, this is a useful pattern that can save you time and reduce the frustration of having to repeat the same feedback to AI.

