
Steelflo vs Manual Steel Estimating: A Real-World Comparison

SteelFlo Team · 7 min read

The Test: Same Drawings, Human vs AI

We ran a head-to-head comparison on a real structural drawing set — the Hutchings package, a 7-page US structural set using AISC designations. Billy, an experienced steel estimator, performed a manual takeoff. Steelflo processed the same PDF through its multi-stage extraction pipeline.

No cherry-picking. No pre-processing the drawings to make them easier for the AI. Same input, measured by the same criteria: how many pieces found, how many member types identified, and what got missed.

The Results

| Metric | Billy (Manual) | Steelflo (AI) |
|--------|----------------|---------------|
| Total pieces found | 41 | 53 |
| Member types identified | 17 | 18 |
| Missed members | 1 (W10x12) | 0 |
| False positives | 0 | 0 |
| Time required | Hours of focused work | Minutes |
| Audit trail | Spreadsheet (no page links) | Every detection linked to source page with bounding box |

The numbers tell a clear story, but they need context.

Why Steelflo Found More: 53 vs 41

Steelflo's 53 detections versus Billy's 41 pieces doesn't mean the AI hallucinated 12 extra steel members. The difference comes from how AI and humans handle members that appear in multiple drawing views.

A single W12x26 beam might appear in:

  • The floor framing plan (as a line with a label)
  • A section cut on another page (as a cross-section)
  • A connection detail (showing the end condition)

Billy, as an experienced estimator, recognized these as the same member and counted it once. Steelflo detected every occurrence — each one correctly identified as a real steel member label on the drawing, linked to the specific page and location where it appears.

The extra detections aren't errors. They're the same members appearing in plan views and detail views. This is actually useful during verification because it gives you multiple visual confirmations of each member, but the raw detection count needs deduplication to get to a true piece count.
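The deduplication step described above can be sketched in a few lines. This is an illustrative example, not Steelflo's actual schema or algorithm: the detection records, field layout, and grouping rule are all invented to show how raw label detections collapse into per-type counts while keeping every source location for verification.

```python
from collections import defaultdict

# Hypothetical detection records: (designation, page, bounding box).
# Values are invented; the same W12x26 appears in three views.
detections = [
    ("W12x26", 2, (120, 340, 180, 352)),  # floor framing plan
    ("W12x26", 5, (410, 90, 470, 102)),   # section cut
    ("W12x26", 6, (55, 210, 115, 222)),   # connection detail
    ("W10x12", 3, (300, 400, 352, 412)),  # the easy-to-miss small member
]

def summarize(detections):
    """Group raw label detections by designation, keeping source locations."""
    by_type = defaultdict(list)
    for designation, page, bbox in detections:
        by_type[designation].append((page, bbox))
    return by_type

summary = summarize(detections)
print(len(detections))         # raw detections: 4
print(len(summary))            # member types after grouping: 2
print(len(summary["W12x26"]))  # views confirming the same W12x26: 3
```

Grouping alone gives member types; converting view counts into a true piece count still takes the estimator's judgment about which occurrences are the same physical member.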

The Member Billy Missed

The more significant finding: Steelflo identified 18 member types to Billy's 17. The extra type was a W10x12 that Billy didn't catch during his manual pass.

This happens. A W10x12 is a small member — 10-inch nominal depth, 12 pounds per foot. In a drawing full of larger W14s and W16s, it's easy to overlook, especially if it appears in only one location or is partially obscured by dimensions and notes.

This is exactly the kind of miss that costs money on a bid. One W10x12 won't break the budget, but the same kind of oversight on a larger member, repeated across a bigger drawing set, adds up fast.

What Billy Does Better

The comparison isn't all in Steelflo's favor. Billy brings capabilities that AI doesn't have:

Design intent. When Billy sees a note that says "W12x26 OR EQUAL," he knows the engineer is open to substitutions. He might price a W12x22 as an alternative. Steelflo reads the label literally and records W12x26.

Drawing errors. If the plan says W12x26 but the beam schedule says W14x26, Billy catches the discrepancy and flags it for an RFI before bid day. Steelflo reports both labels as found, without cross-checking for consistency.
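The cross-check Billy performs mentally can be sketched as a simple reconciliation between plan labels and the beam schedule. The mark names and data below are invented for illustration; this is not a Steelflo feature, just the shape of the check.

```python
# Hypothetical labels keyed by beam mark. In practice these would come
# from two different drawing sheets for the same members.
plan = {"B-12": "W12x26", "B-13": "W16x31"}
schedule = {"B-12": "W14x26", "B-13": "W16x31"}

# Any mark whose plan label disagrees with its schedule label
# gets flagged for an RFI before bid day.
discrepancies = {
    mark: (plan[mark], schedule[mark])
    for mark in plan.keys() & schedule.keys()
    if plan[mark] != schedule[mark]
}
print(discrepancies)  # {'B-12': ('W12x26', 'W14x26')}
```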

Contextual judgment. Billy knows that the transfer beam on grid line C is the most critical member in the package and prices it accordingly. He knows that the cantilever condition on the roof means that beam needs extra attention during fabrication. Steelflo treats every detected member the same.

Unusual conditions. Sloped members, coped connections, members that need to be field-welded in specific sequences — Billy accounts for these in his estimate. Current AI extraction captures the profile designation but not the fabrication complexity.

The Practical Comparison

| Capability | Manual (Billy) | Steelflo | Combined |
|-----------|----------------|----------|----------|
| Initial detection speed | Hours | Minutes | Minutes |
| Detection completeness | 41/42 (missed 1) | 42/42 (all found) | 42/42 |
| Deduplication | Automatic (human judgment) | Requires verification step | Human reviews AI detections |
| Design intent interpretation | Yes | No | Yes (human layer) |
| Drawing error detection | Yes | No | Yes (human layer) |
| Source traceability | Weak (spreadsheet) | Strong (bounding box + page link) | Strong |
| Confidence flagging | N/A | Low-confidence items flagged | Estimator focuses on flagged items |
| Consistency across projects | Varies with fatigue/pressure | Consistent | Consistent |
| Multi-standard recognition | Depends on estimator experience | Auto-detects AISC, BS/IS, AS/NZS, EN | Automatic |
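To make the multi-standard recognition row concrete, here is a toy classifier using simplified regex patterns. Real designation grammars are far broader than this (fractional AISC sizes, metric W-shapes, channel and angle sections), so treat these patterns as illustrations, not Steelflo's actual detection logic.

```python
import re

# Deliberately simplified patterns, one common shape family per standard.
PATTERNS = {
    "AISC":   re.compile(r"\bW\d+[xX]\d+\b"),           # e.g. W12x26
    "BS/IS":  re.compile(r"\bUB\s?\d+x\d+x\d+\b|\bISMB\s?\d+\b"),
    "AS/NZS": re.compile(r"\b\d+UB\d+\b"),              # e.g. 310UB40
    "EN":     re.compile(r"\b(?:IPE|HE[AB])\s?\d+\b"),  # e.g. IPE 300
}

def detect_standard(label):
    """Return the first standard whose pattern matches the label, else None."""
    for standard, pattern in PATTERNS.items():
        if pattern.search(label):
            return standard
    return None

print(detect_standard("W12x26"))   # AISC
print(detect_standard("310UB40"))  # AS/NZS
print(detect_standard("IPE 300"))  # EN
```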

Where Scale Changes the Equation

The Hutchings test was a 7-page package — small enough that a manual takeoff is manageable. The advantage of AI extraction compounds dramatically as project size increases.

Consider the results Steelflo has produced on larger projects:

| Project | Standard | Detections |
|---------|----------|------------|
| Hutchings (7-page US commercial) | AISC | 53 |
| Indian convention center | BS/IS | 1,047 |
| Australian commercial building | AS/NZS | 237 |

An estimator manually counting 1,047 BS/IS labels across a convention center package is looking at multiple days of work — with fatigue-induced errors becoming increasingly likely as the count passes into the hundreds. The AI runs the same extraction without performance degradation regardless of set size.

The Best Workflow Uses Both

The takeaway from the Hutchings test isn't "replace your estimator with AI." It's "use AI for what it's good at and humans for what they're good at."

The most effective workflow looks like this:

  1. Steelflo runs the initial detection — captures all member labels with source page links and confidence scores
  2. Estimator reviews the output — deduplicates members appearing in multiple views, verifies low-confidence detections, dismisses any false matches
  3. Estimator adds judgment — design intent, unusual conditions, fabrication complexity, potential substitutions
  4. Export for pricing — CSV for the estimating spreadsheet, order sheet with nesting optimization (Steelflo's first-fit decreasing bin packing with 1/4" kerf), highlighted PDF for documentation. See how Steelflo works for the full technical breakdown.
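The nesting optimization mentioned in step 4 (first-fit decreasing with a 1/4" kerf) can be sketched as follows. The stock length, cut list, and kerf accounting here are illustrative assumptions, not Steelflo's production implementation.

```python
KERF = 0.25  # 1/4" saw kerf consumed by each cut, in inches

def nest_ffd(cut_lengths, stock_length):
    """Pack cut lengths onto stock bars using first-fit decreasing:
    sort cuts longest-first, place each on the first bar it fits."""
    bars = []  # each bar is the list of cuts assigned to one stock length
    for cut in sorted(cut_lengths, reverse=True):
        for bar in bars:
            used = sum(bar) + KERF * len(bar)  # material + kerf so far
            if used + cut + KERF <= stock_length:
                bar.append(cut)
                break
        else:
            bars.append([cut])  # no bar had room: start a new stock bar
    return bars

# Example cuts in inches against 480" (40 ft) stock
bars = nest_ffd([300, 180, 150, 120, 90], 480)
print(len(bars))  # stock bars required: 2
```

Sorting longest-first is what makes this "first-fit decreasing": big cuts claim bars early, and small cuts backfill the leftover stock, which typically wastes less material than packing in drawing order.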

This combined approach captured all 42 members on the Hutchings test (including the W10x12 Billy missed), with full traceability back to drawing locations, in a fraction of the time the manual-only approach required.

What This Means for Your Estimating Team

If you have experienced estimators like Billy, AI takeoff tools don't threaten their jobs — they make their time more valuable. Instead of spending hours on the mechanical task of finding and counting members, they spend that time on the judgment calls that actually affect bid accuracy and profitability.

The firms that get this right will bid more projects in the same hours, catch more of the small members that get missed under deadline pressure, and produce takeoffs with audit trails that hold up when someone questions the numbers.

The Hutchings test gave us real data: 41 pieces found manually versus 53 detections by AI, with the AI catching a member the human missed. Not a theoretical improvement — a measured one, on a real drawing set, compared against a real estimator's work. Ready to try it? Start free at steelfloai.com or view pricing.