How Does Automated Steel Quantity Extraction Work?

Automated steel quantity extraction reads PDF construction drawings and identifies every structural steel member label (section sizes like W12X26, HSS6X4X1/4, or C10X15.3) along with its exact location on the page. The best implementations use a multi-stage pipeline because no single extraction method works reliably across all drawing types. Here is how the process actually works, using Steelflo's pipeline as a concrete example.

Stage 1: Text Extraction

The first stage pulls all readable text from every page of the PDF using a library like pymupdf. This is not OCR — it reads the text layer embedded in the PDF file itself, which is faster and more accurate than optical character recognition. For each text element, the extractor captures the string content and its bounding box coordinates (exact position on the page in PDF points).
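As a rough illustration of this stage, the sketch below reads word-level text and bounding boxes with PyMuPDF's `page.get_text("words")`. The `TextSpan` record and both function names are hypothetical and chosen for this example; this is not Steelflo's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextSpan:
    page: int    # zero-based page index
    text: str    # the extracted word
    bbox: tuple  # (x0, y0, x1, y1) in PDF points


def spans_from_words(page_index, words):
    """Convert PyMuPDF 'words' tuples into TextSpan records.

    Each tuple is (x0, y0, x1, y1, word, block_no, line_no, word_no).
    """
    return [TextSpan(page_index, w[4], tuple(w[:4])) for w in words]


def extract_spans(pdf_path):
    """Read the embedded text layer of every page (no OCR involved)."""
    import pymupdf  # imported lazily; older releases expose the module as `fitz`

    spans = []
    with pymupdf.open(pdf_path) as doc:
        for index, page in enumerate(doc):
            spans.extend(spans_from_words(index, page.get_text("words")))
    return spans
```

Word-level extraction splits on whitespace, so a label drawn with a space in it would arrive as two spans and need rejoining before pattern matching.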

Most modern structural drawings produced by Revit, AutoCAD, or Tekla have a clean text layer. On these pages, text extraction alone captures 80-90% of steel labels with high accuracy. The coordinates are precise because they come directly from the PDF structure, not from image processing.

Stage 2: Steel-Specific Pattern Matching

Raw text extraction returns everything on the page — title blocks, grid labels, notes, dimensions, revision clouds, door schedules. The system needs to separate steel member designations from all that noise.

This is where steel-specific regex pattern matching comes in. The pipeline applies a library of patterns tuned to recognize structural steel nomenclature: wide-flange W-shapes used as beams and columns (W12X26), HSS sections, pipes, channels, angles, plates, and specialty items like lintels. Each match is classified by category (beam, column, HSS, channel, angle, plate) and tagged with a confidence score.
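To make the idea concrete, here is a minimal sketch of regex classification for a handful of AISC-style designations. The four patterns and the category names are illustrative assumptions, far smaller than a production library, and confidence scoring is left out.

```python
import re

# Illustrative AISC-style patterns only; a production library would be far larger.
PATTERNS = [
    ("wide_flange", re.compile(r"W\d{1,2}X\d+(?:\.\d+)?")),
    ("hss", re.compile(r"HSS\d+(?:\.\d+)?(?:X\d+(?:\.\d+)?)?X(?:\d+/\d+|\d*\.\d+)")),
    ("channel", re.compile(r"C\d{1,2}X\d+(?:\.\d+)?")),
    ("angle", re.compile(r"L\d+X\d+X\d+/\d+")),
]


def classify(token):
    """Return (category, normalized_label) for a steel designation, else None."""
    label = token.strip().upper().replace(" ", "")
    for category, pattern in PATTERNS:
        # fullmatch rejects partial hits such as a grid label "C1"
        if pattern.fullmatch(label):
            return category, label
    return None
```

A W-shape label on its own does not say whether the member is a beam or a column, so this sketch reports it as `wide_flange`. Telling the two apart would need context from the drawing, such as which schedule or view the label sits in.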

Steelflo takes this further with multi-standard support. The pipeline first scans all pages to detect which naming standard the drawings use — AISC (US), BS/IS (UK/India/Middle East), AS/NZS (Australia/New Zealand), or EN (Europe). It then applies the correct pattern library: the full AISC set for American drawings, 10 dedicated regex patterns for BS/IS notation like UC305x305x158 or SHS220x220x6.0, and 11 patterns for AS/NZS notation like 310UB40.4 or 150PFC.
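One simple way to detect the standard is to count hits for a "signature" pattern per standard across all pages and pick the winner. The sketch below does that. The signature patterns are assumptions made for illustration (the EN one uses the common IPE and HEA/HEB prefixes) and are not the pattern sets described above.

```python
import re
from collections import Counter

# One illustrative signature per standard; real pattern libraries hold many more.
SIGNATURES = {
    "AISC": re.compile(r"\b(?:W|C|HSS)\d+(?:\.\d+)?X\d"),
    "BS/IS": re.compile(r"\b(?:UC|UB|SHS|RHS)\d+x\d", re.IGNORECASE),
    "AS/NZS": re.compile(r"\b\d{2,4}(?:UB|UC|PFC)"),
    "EN": re.compile(r"\b(?:IPE|HE[AB])\s?\d{2,4}"),
}


def detect_standard(page_texts):
    """Return the standard whose signature matches most often, or None."""
    hits = Counter()
    for text in page_texts:
        for name, pattern in SIGNATURES.items():
            hits[name] += len(pattern.findall(text))
    if not hits or max(hits.values()) == 0:
        return None  # no recognizable steel notation on any page
    return hits.most_common(1)[0][0]
```

Scanning every page before choosing matters because a single sheet can be misleading: a title page or general notes sheet may contain no member labels at all.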

Stage 3: Vision AI Fallback

Some pages defeat text extraction entirely. CAD drawings with vector fonts, hand-lettered annotations, or embedded raster images have no usable text layer — pymupdf returns little or nothing. These pages are common in older drawing sets and certain CAD workflows.

For these pages, the pipeline renders each one as a high-resolution PNG image and sends it to a vision AI model (Steelflo uses Gemini Vision AI). The model analyzes the image, identifies steel callouts visually, and returns bounding box coordinates for each detection. This is slower and slightly less precise than text extraction, but it catches labels that would otherwise be completely missed.
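The rendering step and the coordinate bookkeeping around it might look like the sketch below. It uses PyMuPDF's `get_pixmap(dpi=...)` and `Pixmap.tobytes("png")`. The function names are hypothetical, and the vision model call itself is omitted because its request and response format is not described here. The conversion assumes the model returns pixel coordinates on an unrotated, uncropped page.

```python
def render_page_png(pdf_path, page_index, dpi=300):
    """Rasterize one page to PNG bytes for a vision model."""
    import pymupdf  # imported lazily; older releases expose the module as `fitz`

    with pymupdf.open(pdf_path) as doc:
        pix = doc[page_index].get_pixmap(dpi=dpi)
        return pix.tobytes("png")


def pixel_box_to_pdf_points(box_px, dpi):
    """Scale an (x0, y0, x1, y1) pixel box to PDF points (1 pt = 1/72 inch)."""
    scale = 72.0 / dpi
    return tuple(round(v * scale, 2) for v in box_px)
```

Converting back to points keeps vision detections in the same coordinate space as text-layer detections, so both can be drawn on the same PDF overlay.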

The pipeline flags pages that need vision processing automatically: any page with zero text-based steel labels and a high path count (indicating vector graphics) gets routed to the vision fallback.
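The routing rule can be sketched in a few lines. The threshold of 500 paths is an arbitrary placeholder, not a figure from Steelflo, and the tuple layout is hypothetical. With PyMuPDF, the path count for a page could be taken as `len(page.get_drawings())`.

```python
def pages_for_vision(page_stats, path_threshold=500):
    """Pick the pages that should go to the vision fallback.

    page_stats: iterable of (page_index, steel_label_count, path_count).
    A page qualifies when text extraction found no steel labels and the
    page still holds many vector paths, i.e. it is clearly a drawing.
    """
    return [
        index
        for index, labels, paths in page_stats
        if labels == 0 and paths >= path_threshold
    ]
```

The path-count check keeps nearly blank pages, which also have zero steel labels, from being sent for expensive vision processing.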

What the Numbers Look Like

On a 7-page US commercial structural package, Steelflo's pipeline found 53 individual steel label occurrences across 18 distinct section types. A human estimator counted 17 types on the same drawings — the pipeline found all 17 plus one the estimator missed (a W10X12 tucked into a detail view).

On larger international projects, the label counts grow with the size of the drawing set: 1,047 labels from a convention center drawing set using BS/IS standards (India), and 237 labels from Australian commercial drawings using AS/NZS notation.

From Detections to a Bill of Materials

Each detection is an individual bounding box — one label on one page. The same member type might appear 12 times across plan views, sections, and details. The system aggregates these into a bill of materials: group by member type, sum the count, and look up unit weights from a validated database (Steelflo references 550+ AISC section profiles).
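The aggregation itself is a group-and-count followed by a lookup. In the sketch below, the three-entry weight table stands in for a real section database. For AISC W- and C-shapes, the number after the "X" is the nominal weight in pounds per foot.

```python
from collections import Counter

# Stand-in for a validated section database (nominal weights in lb/ft).
UNIT_WEIGHT_LB_PER_FT = {"W12X26": 26.0, "W10X12": 12.0, "C10X15.3": 15.3}


def build_bom(labels):
    """Group detected labels by member type and attach unit weights.

    labels: iterable of normalized designations, one per detection.
    Unknown sections get lb_per_ft=None instead of a guessed value.
    """
    counts = Counter(labels)
    return [
        {"label": label, "count": n, "lb_per_ft": UNIT_WEIGHT_LB_PER_FT.get(label)}
        for label, n in sorted(counts.items())
    ]
```

For example, `build_bom(["W12X26", "W12X26", "W10X12"])` yields one row per member type with its occurrence count. Turning those rows into tonnage would also need member lengths, which label detection alone does not provide.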

The critical step between detection and final BOM is human verification. Every detection is linked to its source page with a visual overlay on the actual PDF. The estimator reviews each one, confirms legitimate detections, and rejects false positives (such as a grid label that looked like a steel callout) and duplicates (the same member shown in both plan and elevation). Low-confidence extractions are flagged automatically so the estimator knows where to focus attention.
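Automatic flagging can be as simple as a threshold on the confidence score. In this sketch the 0.8 cutoff and the dictionary keys are assumptions made for illustration. The function returns detections ordered so that the least certain ones are reviewed first.

```python
def review_queue(detections, threshold=0.8):
    """Mark low-confidence detections and sort them to the front.

    detections: iterable of dicts with at least a 'confidence' key (0.0 to 1.0).
    Returns new dicts with an added boolean 'flagged' key; inputs are not mutated.
    """
    queue = [dict(d, flagged=d["confidence"] < threshold) for d in detections]
    return sorted(queue, key=lambda d: d["confidence"])
```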

Why Multi-Stage Matters

A system that only uses text extraction misses vector-font pages entirely. A system that only uses vision AI is slower, more expensive, and less precise on pages where text extraction works perfectly. The multi-stage approach (text first, pattern matching second, vision fallback third) balances speed, accuracy, and coverage across the full range of drawing quality that fabricators encounter in practice. For the accuracy data behind this pipeline, see the related article on how accurate AI is for steel takeoffs. To see it in action, try Steelflo on your own drawings.