
URL: https://dev.to/wolfof420street/building-tri-fort-why-we-abandoned-pure-machine-learning-and-built-a-construction-intelligence-2f6n

Building Tri-Fort: Why We Abandoned Pure Machine Learning and Built a Construction Intelligence Engine Instead - DEV Community


Introduction

Over the last several months, I've been building Tri-Fort, an AI-powered construction cost estimation platform designed for the Kenyan construction industry.

At first, the goal seemed straightforward:

Gather historical construction data, train a machine learning model, and let AI predict project costs.

Like many founders building AI products today, I assumed the machine learning model would be the product.

I was wrong.

The deeper I went into the construction industry, the more I realized that the biggest challenge wasn't model selection, neural networks, or feature engineering.

The challenge was data.

And that realization fundamentally changed the architecture of Tri-Fort.

This article documents the engineering journey, the mistakes, the discoveries, and how we evolved from an ML-first architecture into a hybrid construction intelligence platform.


The Original Vision

The first version of Tri-Fort was designed around a traditional machine learning pipeline.

Users would enter:

  • Location
  • Project type
  • Built-up area
  • Number of floors
  • Finish level
  • Material preferences

The system would then:

  1. Generate features
  2. Feed them into a regression model
  3. Return estimated construction costs

The architecture looked something like this:

User Input
 ↓
Feature Engineering
 ↓
ML Model
 ↓
Cost Prediction

Simple.

At least on paper.
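To make the "Feature Engineering" step concrete, here is a minimal sketch of how the six inputs above could be flattened into the numeric vector a regression model expects. The category vocabularies, the derived `area_per_floor` feature and all names (`ProjectInput`, `build_features`) are hypothetical illustrations, not Tri-Fort's actual code.

```python
from dataclasses import dataclass, field

# Hypothetical vocabularies: the article does not list Tri-Fort's real categories.
LOCATIONS = ["Nairobi", "Kiambu", "Mombasa"]
PROJECT_TYPES = ["residential", "commercial"]
FINISH_LEVELS = ["standard", "premium", "luxury"]


@dataclass
class ProjectInput:
    location: str
    project_type: str
    built_up_area_sqm: float
    floors: int
    finish_level: str
    material_preferences: list[str] = field(default_factory=list)


def one_hot(value: str, vocabulary: list[str], prefix: str) -> dict[str, float]:
    """Encode a categorical value as 0/1 indicator columns."""
    return {f"{prefix}={v}": 1.0 if value == v else 0.0 for v in vocabulary}


def build_features(p: ProjectInput) -> dict[str, float]:
    """Turn raw user input into a flat numeric feature dictionary."""
    features = {
        "built_up_area_sqm": p.built_up_area_sqm,
        "floors": float(p.floors),
        # Derived feature: guard against a zero floor count.
        "area_per_floor": p.built_up_area_sqm / max(p.floors, 1),
        "num_material_preferences": float(len(p.material_preferences)),
    }
    features.update(one_hot(p.location, LOCATIONS, "location"))
    features.update(one_hot(p.project_type, PROJECT_TYPES, "project_type"))
    features.update(one_hot(p.finish_level, FINISH_LEVELS, "finish"))
    return features
```

A regression model would then be fit on these vectors against historical project costs, which is exactly where the data problem described next shows up.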


The Data Problem Nobody Talks About

Most machine learning tutorials assume you already have clean data.

Construction doesn't work that way.

The data we had access to included:

  • Bills of Quantities (BoQs)
  • Work schedules
  • Cost books
  • Quantity Surveyor reports
  • Project specifications
  • Market research datasets
  • Historical pricing documents
  • Contractor estimates

At first glance, this looked like a goldmine.

In reality, it was chaos.

Files came in many forms:

  • PDFs
  • Scanned PDFs
  • Excel workbooks
  • OCR outputs
  • Cost schedules
  • Multiple revisions of the same project

The same project often existed in three or four versions.

For example:

Kiambu Mall BoQ
Kiambu Mall Revised BoQ
Kiambu Mall Perimeter Wall BoQ
Kiambu Mall 2nd Floor Provision BoQ

To a human, these are clearly related.

To a machine learning pipeline, they appear as entirely different projects.
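One way to close that gap is a grouping pass that maps related filenames to a shared project key. The sketch below is a deliberately crude heuristic, assumed for illustration: it lowercases the name, drops hypothetical "noise" words such as `boq` and `revised`, and keeps the first two remaining tokens.

```python
import re
from collections import defaultdict

# Hypothetical words that mark a revision or document type rather than a project.
NOISE_WORDS = {"boq", "revised", "rev", "final", "draft", "copy"}


def project_key(filename: str, key_tokens: int = 2) -> str:
    """Crude heuristic: lowercase, drop noise words, keep the first few tokens."""
    tokens = re.findall(r"[a-z0-9]+", filename.lower())
    meaningful = [t for t in tokens if t not in NOISE_WORDS]
    return " ".join(meaningful[:key_tokens])


def group_by_project(filenames: list[str]) -> dict[str, list[str]]:
    """Bucket filenames that share the same derived project key."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in filenames:
        groups[project_key(name)].append(name)
    return dict(groups)
```

Applied to the four Kiambu Mall files above, every name maps to the key `kiambu mall`. A real pipeline would also need to compare file contents, since unrelated projects can share a name prefix.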


The Audit

Rather than blindly train a model, we built a data discovery and audit pipeline.

The pipeline performed:

  • File inventory
  • Project grouping
  • Duplicate detection
  • OCR quality assessment
  • Cost recovery analysis
  • Dataset readiness scoring
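As an illustration of the duplicate-detection step, a minimal sketch might fingerprint each document's extracted rows after normalising case and whitespace, so that a PDF and an Excel export of the same BoQ collide. The row format (a list of cell strings) and the function names are assumptions, not Tri-Fort's implementation.

```python
import hashlib


def normalise_row(row: list[str]) -> str:
    """Collapse whitespace and case so cosmetic differences do not hide duplicates."""
    return "|".join(" ".join(cell.split()).lower() for cell in row)


def content_fingerprint(rows: list[list[str]]) -> str:
    """Order-insensitive SHA-256 fingerprint of a document's extracted rows."""
    digest = hashlib.sha256()
    for line in sorted(normalise_row(r) for r in rows):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def find_duplicates(documents: dict[str, list[list[str]]]) -> list[list[str]]:
    """Return groups of document names whose extracted content is identical."""
    by_hash: dict[str, list[str]] = {}
    for name, rows in documents.items():
        by_hash.setdefault(content_fingerprint(rows), []).append(name)
    return [names for names in by_hash.values() if len(names) > 1]
```

This only catches exact duplicates after normalisation; revised BoQs with changed quantities would still need fuzzy matching.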

What we found was surprising.

Out of dozens of documents and thousands of extracted rows:

  • Only 9 distinct projects were recoverable
  • Only 2 projects contained evidence of actual final costs
  • The remaining seven projects contained only estimates

This was a critical distinction.

Most datasets contained:

Estimated Cost

What we actually needed was:

Final Actual Cost

Those are not the same thing.

Training on estimates teaches a model to reproduce estimates.

It does not teach a model to predict reality.
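The audit's key question, how much ground truth a pile of rows actually holds, can be sketched in a few lines. The record schema below (`CostRecord` with a `CostKind` label) is a hypothetical illustration; the article does not describe how Tri-Fort tags estimates versus final costs.

```python
from dataclasses import dataclass
from enum import Enum


class CostKind(Enum):
    ESTIMATE = "estimate"
    FINAL_ACTUAL = "final_actual"


@dataclass(frozen=True)
class CostRecord:
    project: str
    kind: CostKind
    amount_kes: float


def audit(records: list[CostRecord]) -> dict[str, int]:
    """Summarise rows, distinct projects, and projects with a known final cost."""
    projects = {r.project for r in records}
    with_actuals = {r.project for r in records if r.kind is CostKind.FINAL_ACTUAL}
    return {
        "rows": len(records),
        "distinct_projects": len(projects),
        "projects_with_actual_cost": len(with_actuals),
    }


def training_rows(records: list[CostRecord]) -> list[CostRecord]:
    """Keep only final actual costs; estimates would teach the model to estimate."""
    return [r for r in records if r.kind is CostKind.FINAL_ACTUAL]
```

In a toy example, a thousand rows can collapse to two projects, only one of which has a usable training target.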


The Moment We Paused Deployment

At one point, the platform appeared production-ready.

The APIs worked.

Authentication worked.

Reporting worked.

Infrastructure passed testing.

Even the ML pipeline passed synthetic validation.

But the data audit exposed an uncomfortable truth.

The model wasn't learning from reality.

It was learning from other estimates.

Shipping at that point would have created an illusion of intelligence.

So deployment was paused.

The machine learning model was no longer the priority.

The data became the priority.


Discovering a Better Approach

While auditing the data, we acquired an official Quantity Surveying cost handbook.

This changed everything.

Instead of treating the handbook as a PDF, we treated it as a structured knowledge source.

The handbook contained:

  • Regional construction rates
  • Cost benchmarks
  • Building classifications
  • Measurement standards
  • Cost adjustment factors
  • Material pricing references

Suddenly we had something more valuable than a small ML dataset.

We had domain expertise.


Turning a Handbook into a Knowledge Graph

The next challenge was engineering.

How do you transform a static handbook into software?

We built an extraction pipeline that converts handbook data into structured rules.

The system identifies:

  • Regions
  • Rate schedules
  • Building classes
  • Construction categories
  • Cost multipliers

These are stored in a machine-readable rule graph.
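A minimal sketch of the extraction step, assuming the PDF-to-text stage yields table rows in a pipe-delimited "Region | Category | Class | Rate" layout. That layout, the `RateRule` type and the rate shown are assumptions for illustration; the handbook's real structure is not described in the article. Lines that do not parse are kept for manual review instead of being silently dropped.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RateRule:
    region: str
    category: str
    building_class: str
    rate_kes_per_sqm: float


# Assumed layout of one extracted table row: "Region | Category | Class | Rate".
ROW_PATTERN = re.compile(
    r"^\s*(?P<region>[A-Za-z ]+?)\s*\|\s*(?P<category>[A-Za-z ]+?)\s*\|"
    r"\s*(?P<cls>[A-Z])\s*\|\s*(?P<rate>[\d,]+(?:\.\d+)?)\s*$"
)


def parse_rate_rows(lines: list[str]) -> tuple[list[RateRule], list[str]]:
    """Turn extracted rows into rules; return unparseable lines separately."""
    rules: list[RateRule] = []
    rejected: list[str] = []
    for line in lines:
        m = ROW_PATTERN.match(line)
        if m is None:
            rejected.append(line)
            continue
        rules.append(RateRule(
            region=m["region"],
            category=m["category"].lower(),
            building_class=m["cls"],
            # Strip thousands separators before converting.
            rate_kes_per_sqm=float(m["rate"].replace(",", "")),
        ))
    return rules, rejected
```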

Conceptually:

Handbook PDF
 ↓
Extraction
 ↓
Rule Graph
 ↓
Cost Intelligence Engine

Instead of hardcoding numbers throughout the application, the cost engine can now reason from structured construction knowledge.
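One plausible shape for such a rule graph, sketched under assumptions: regions form a parent chain (a hypothetical sub-area inherits its city's multiplier), and base rates hang off (category, building class) pairs. The 54,000 KES/sqm rate and the Nairobi +20% figure are borrowed from the reasoning-trace example later in this article purely as illustrative values; `Westlands` and all class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RuleGraph:
    """Hypothetical rule store: regions form a parent chain, rates hang off categories."""
    region_parent: dict[str, str] = field(default_factory=dict)
    region_multiplier: dict[str, float] = field(default_factory=dict)
    base_rate: dict[tuple[str, str], float] = field(default_factory=dict)

    def multiplier_for(self, region: str) -> tuple[float, str]:
        """Walk up the region hierarchy (assumed acyclic) until a multiplier is found."""
        node = region
        while node is not None:
            if node in self.region_multiplier:
                return self.region_multiplier[node], node
            node = self.region_parent.get(node)
        return 1.0, "default"

    def rate_for(self, category: str, building_class: str) -> float:
        """Look up a handbook base rate, failing loudly when none exists."""
        try:
            return self.base_rate[(category, building_class)]
        except KeyError:
            raise LookupError(f"No handbook rate for {category} class {building_class}") from None
```

Returning the region the multiplier came from, not just the number, is what later lets the engine explain why an adjustment applied.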


The Hybrid Architecture

The current architecture no longer relies exclusively on machine learning.

Instead, it combines three intelligence sources:

1. Handbook Intelligence

Official QS benchmark rates.

2. Historical Project Intelligence

Recovered BoQs and project data.

3. User Feature Intelligence

Inputs collected through the estimator.

The architecture now looks like this:

User Inputs
 ↓
Feature Engine
 ↓
Handbook Intelligence
 ↓
Historical Cost Intelligence
 ↓
Cost Engine
 ↓
Explainable Estimate

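This approach is far more stable than pure ML on a dataset this small: the handbook rates anchor every estimate, and the scarce historical data only adjusts them instead of carrying the whole prediction.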
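As a sketch of how the historical layer could adjust a handbook figure, assume each project with a known final cost yields a pair (handbook estimate, final actual cost), and the correction is the median of the actual-to-estimate ratios. The median and the function names are assumptions; the article does not say how Tri-Fort derives its historical correction.

```python
from statistics import median


def historical_correction(pairs: list[tuple[float, float]]) -> float:
    """Median of actual/handbook ratios over projects whose final cost is known.

    Each pair is (handbook_estimate_kes, final_actual_kes). With no ground truth
    the factor is 1.0, so the handbook estimate passes through unchanged.
    """
    ratios = [actual / estimate for estimate, actual in pairs if estimate > 0]
    return median(ratios) if ratios else 1.0


def hybrid_estimate(handbook_estimate_kes: float, pairs: list[tuple[float, float]]) -> float:
    """Scale the handbook figure by what history says about its bias."""
    return handbook_estimate_kes * historical_correction(pairs)
```

With only two ground-truth projects, the median is simply their mean; it becomes more robust to outliers as more final accounts arrive.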


Why Explainability Matters

Construction projects involve large sums of money.

Users don't trust black boxes.

If a system says:

KES 18,400,000

the next question is:

Why?

Modern AI systems often struggle with this.

Tri-Fort now generates reasoning traces.

For example:

Base rate: 54,000 KES/sqm
Location adjustment: Nairobi +20%
Luxury finish adjustment: +15%
Two-storey adjustment: +8%
Historical correction: -2%

Users see not only the estimate but the rationale.

That transparency creates trust.
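A trace like the one above can be produced by recording each step as it is applied. The article does not say whether the percentages are added together or compounded; this sketch assumes they compound (each multiplies the running rate), and the floor area is a hypothetical input, so it does not claim to reproduce the KES 18,400,000 figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjustment:
    label: str
    percent: float  # e.g. 20.0 for +20%, -2.0 for -2%


def explain_estimate(base_rate_kes_per_sqm: float, area_sqm: float,
                     adjustments: list[Adjustment]) -> tuple[float, list[str]]:
    """Apply percentage adjustments in sequence (compounding) and log every step."""
    rate = base_rate_kes_per_sqm
    trace = [f"Base rate: {rate:,.0f} KES/sqm"]
    for adj in adjustments:
        rate *= 1 + adj.percent / 100
        trace.append(f"{adj.label}: {adj.percent:+.0f}% -> {rate:,.0f} KES/sqm")
    total = rate * area_sqm
    trace.append(f"Area: {area_sqm:,.0f} sqm -> total {total:,.0f} KES")
    return total, trace
```

Usage, with the adjustments from the example: `explain_estimate(54000, 100, [Adjustment("Location adjustment (Nairobi)", 20), Adjustment("Luxury finish adjustment", 15), Adjustment("Two-storey adjustment", 8), Adjustment("Historical correction", -2)])`. Under additive stacking the combined factor would be 1.41 instead of roughly 1.46, so the choice matters and should itself appear in the trace.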


Engineering the Infrastructure

Alongside the estimation engine, the platform required production-grade infrastructure.

The stack includes:

Backend

  • FastAPI
  • PostgreSQL
  • Domain-driven architecture
  • Background task processing

Frontend

  • Next.js
  • TypeScript
  • Responsive dashboard

Infrastructure

  • Docker Compose
  • Caddy
  • HTTPS automation
  • Environment-driven configuration

Everything is configured so a VPS deployment requires only:

git pull
docker compose up -d --build

No code changes.

No production-specific branches.

No manual edits.
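The "environment-driven configuration" that makes this possible can be as small as one loader that reads every deployment-specific value from environment variables and fails fast when one is missing. The variable names and the `Settings` fields below are hypothetical; with Docker Compose, such values typically come from an `.env` file or the service's `environment:` entries, so the same image runs locally and on the VPS.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    public_domain: str
    debug: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read deployment-specific values from the environment (hypothetical names)."""
    env = os.environ if env is None else env
    missing = [k for k in ("DATABASE_URL", "PUBLIC_DOMAIN") if not env.get(k)]
    if missing:
        # Fail at startup rather than at the first request.
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        public_domain=env["PUBLIC_DOMAIN"],
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
    )
```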


Lessons Learned

If I could restart this project tomorrow, I'd follow three rules.

Rule 1

Never trust dataset size.

Audit it.

A thousand rows can represent five projects.

Rule 2

Domain knowledge beats machine learning when data is scarce.

A handbook written by experienced Quantity Surveyors can outperform a poorly trained model.

Rule 3

Users care about answers, not algorithms.

Nobody hires a construction estimator because it uses AI.

They hire it because the estimate is accurate.


Where Tri-Fort Goes Next

The long-term vision remains machine learning.

But now the roadmap is grounded in reality.

The next stage focuses on collecting:

  • Final accounts
  • Completion certificates
  • Contractor invoices
  • Variation orders
  • Actual project costs

As the dataset grows, machine learning can become increasingly important.
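One way to let machine learning's influence grow with the data, sketched here as a credibility-style weighting (a technique named for illustration, not one the article commits to): blend the rule-based estimate with an ML prediction, giving the ML side weight n/(n + k), where n is the number of projects with final actual costs. The constant `k` is a hypothetical tuning value, the project count at which both sources get equal weight.

```python
def blend_weight(n_actual_projects: int, k: float = 20.0) -> float:
    """Share of trust given to the ML prediction; approaches 1 as ground truth grows."""
    if n_actual_projects < 0 or k <= 0:
        raise ValueError("n_actual_projects must be >= 0 and k must be > 0")
    return n_actual_projects / (n_actual_projects + k)


def blended_estimate(rule_estimate: float, ml_estimate: float,
                     n_actual_projects: int, k: float = 20.0) -> float:
    """Weighted average of the rule-based and ML estimates."""
    w = blend_weight(n_actual_projects, k)
    return (1 - w) * rule_estimate + w * ml_estimate
```

With the two projects that currently have final costs and the assumed k = 20, the ML side would get a weight of about 0.09, so the handbook-driven estimate still dominates.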

Eventually the platform will evolve into a true hybrid system:

Domain Knowledge
 +
Historical Projects
 +
Machine Learning
 +
Human Explainability

That's the future.

Not AI replacing expertise.

AI amplifying it.


Final Thoughts

The biggest lesson from building Tri-Fort is that successful AI products are rarely about the model.

They're about understanding the problem deeply enough to know when a model is not the answer.

For construction estimation, intelligence comes from a combination of:

  • Engineering
  • Quantity surveying
  • Historical data
  • Domain expertise
  • Software architecture

Machine learning is just one piece of that puzzle.

And sometimes the smartest engineering decision is knowing when not to rely on it.