We’re open-sourcing Trailmark, a library that parses source code into a queryable call graph of functions, classes, call relationships, and semantic metadata, then exposes that graph through a Python API that Claude skills can call directly. Install it now:

    uv pip install trailmark

“Defenders think in lists. Attackers think in graphs. As long as this is true, attackers win.” John Lambert’s widely cited observation about network security applies just as well to AI-assisted software analysis. When Claude reasons about a codebase, it reasons about lists: findings from static analyzers, surviving mutants from mutation testing, and line-by-line coverage reports. But the question that actually matters is a graph question: can untrusted input reach this code, and what breaks if it’s wrong? We built Trailmark to answer that question. It gives Claude a graph to think with instead of a list.

We’re also releasing eight Claude Code skills we’ve built on top of it, designed for mutation triage, test vector generation, protocol diagramming, and more.

When lists fall short

Mutation testing is a great example of a method that benefits from graph-level reasoning. It’s one of the best ways to measure test quality. It makes small changes to your source code (e.g., swapping one operator for another, replacing + with -) and checks whether your tests catch the difference. Mutants that survive reveal gaps in your test suite that code coverage metrics might miss.

The downside is that a mutation testing run on a real codebase can produce hundreds of surviving mutants of varying significance. This is very much a list.

Some surviving mutants are equivalent: the mutation doesn’t change the program’s behavior because of structural or mathematical constraints that the mutation testing tool can’t see. Some are in dead code; some are in error message formatting; some are in the finite field arithmetic that underpins every cryptographic operation in y
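To make the list-versus-graph distinction concrete, here is a minimal sketch of triaging surviving mutants by reachability. It does not use Trailmark's API, which this excerpt does not show: the call graph is a plain dictionary, and every function name, entry point, and mutant description below is hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical call graph: caller -> callees. Names are illustrative only.
CALL_GRAPH = {
    "handle_request": ["parse_header", "verify_signature"],
    "parse_header": [],
    "verify_signature": ["field_mul"],
    "field_mul": [],
    "format_error": [],
    "legacy_export": ["format_error"],
}

# Hypothetical entry points that receive untrusted input.
ENTRY_POINTS = ["handle_request"]

# Hypothetical surviving mutants as (function, description) pairs: the "list".
SURVIVING_MUTANTS = [
    ("format_error", "replaced + with - in message padding"),
    ("field_mul", "replaced + with - in reduction step"),
    ("legacy_export", "removed a bounds check"),
]


def reachable_from(graph, roots):
    """Return every function reachable from the roots (breadth-first search)."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        caller = queue.popleft()
        for callee in graph.get(caller, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def triage(mutants, graph, roots):
    """Split mutants into those in reachable code and those that are not."""
    reachable = reachable_from(graph, roots)
    exposed = [m for m in mutants if m[0] in reachable]
    unexposed = [m for m in mutants if m[0] not in reachable]
    return exposed, unexposed


exposed, unexposed = triage(SURVIVING_MUTANTS, CALL_GRAPH, ENTRY_POINTS)
```

In this toy graph, only the mutant in field_mul sits on a path from the untrusted entry point, so it would be reviewed first. The mutants in format_error and legacy_export are unreachable from that entry point and can be deprioritized. A flat list of three mutants carries none of that ordering.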

Read Full Article at Trail of Bits Blog →