Introduction

Over the last several months, I've been building Tri-Fort, an AI-powered construction cost estimation platform designed for the Kenyan construction industry.

At first, the goal seemed straightforward:

Gather historical construction data, train a machine learning model, and let AI predict project costs.

Like many founders building AI products today, I assumed the machine learning model would be the product.

I was wrong.

The deeper I went into the construction industry, the more I realized that the biggest challenge wasn't model selection, neural networks, or feature engineering.

The challenge was data.

And that realization fundamentally changed the architecture of Tri-Fort.

This article documents the engineering journey, the mistakes, the discoveries, and how we evolved from an ML-first architecture into a hybrid construction intelligence platform.

The Original Vision

The first version of Tri-Fort was designed around a traditional machine learning pipeline.

Users would enter:

Location
Project type
Built-up area
Number of floors
Finish level
Material preferences

The system would then:

Generate features
Feed them into a regression model
Return estimated construction costs

The architecture looked something like this:

User Input
 ↓
Feature Engineering
 ↓
ML Model
 ↓
Cost Prediction

Simple.

At least on paper.
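A minimal sketch of that ML-first pipeline follows. The category vocabularies, the field names, and the `model` callable are assumptions made for illustration; the article does not show the real feature set, and material preferences are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical vocabularies; the real category lists are not given in the article.
LOCATIONS = ["nairobi", "kiambu", "mombasa"]
PROJECT_TYPES = ["residential", "commercial", "industrial"]
FINISH_LEVELS = ["basic", "standard", "luxury"]


@dataclass(frozen=True)
class EstimateRequest:
    location: str
    project_type: str
    built_up_area_sqm: float
    floors: int
    finish_level: str


def one_hot(value: str, vocabulary: list[str]) -> list[float]:
    """Return a one-hot vector; unknown values encode as all zeros."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def build_features(req: EstimateRequest) -> list[float]:
    """Turn raw user input into a flat numeric feature vector."""
    return (
        [req.built_up_area_sqm, float(req.floors)]
        + one_hot(req.location, LOCATIONS)
        + one_hot(req.project_type, PROJECT_TYPES)
        + one_hot(req.finish_level, FINISH_LEVELS)
    )


def predict_cost(req: EstimateRequest, model: Callable[[list[float]], float]) -> float:
    """Run the pipeline: features in, a single cost figure out."""
    return model(build_features(req))
```

Nothing in this shape records where a prediction came from, which is part of why the design looked simpler than it was.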

The Data Problem Nobody Talks About

Most machine learning tutorials assume you already have clean data.

Construction doesn't work that way.

The data we had access to included:

Bills of Quantities (BoQs)
Work schedules
Cost books
Quantity Surveyor reports
Project specifications
Market research datasets
Historical pricing documents
Contractor estimates

At first glance, this looked like a goldmine.

In reality, it was chaos.

Files existed as:

PDFs
Scanned PDFs
Excel workbooks
OCR outputs
Cost schedules
Multiple revisions of the same project

The same project often existed in three or four versions.

For example:

Kiambu Mall BoQ
Kiambu Mall Revised BoQ
Kiambu Mall Perimeter Wall BoQ
Kiambu Mall 2nd Floor Provision BoQ

To a human, these are clearly related.

To a machine learning pipeline, they appear as entirely different projects.
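One way to recover that relationship is to group documents by a cleaned title prefix. The sketch below is a heuristic under stated assumptions, not the grouping logic Tri-Fort actually uses: the noise-word list is invented, and a group only forms when a document carrying the bare project name exists.

```python
import re
from collections import defaultdict

# Words that describe a document rather than a project (an assumed list).
NOISE_WORDS = {"boq", "revised", "revision", "final", "draft"}


def clean_tokens(title: str) -> tuple[str, ...]:
    """Lowercase a title, split on non-alphanumerics, and drop noise words."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return tuple(t for t in tokens if t not in NOISE_WORDS)


def group_by_project(titles: list[str]) -> dict[str, list[str]]:
    """Group titles whose cleaned tokens start with a shorter title's tokens."""
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    # Shortest titles first, so the bare project name becomes the group key.
    for title in sorted(titles, key=lambda t: len(clean_tokens(t))):
        tokens = clean_tokens(title)
        # Reuse an existing non-empty key that is a prefix of this title.
        key = next((k for k in groups if k and tokens[: len(k)] == k), tokens)
        groups[key].append(title)
    return {" ".join(key): members for key, members in groups.items()}
```

Given the four Kiambu Mall titles above, this returns a single group keyed `kiambu mall`. Scanned documents with inconsistent titles would need fuzzier matching than a strict prefix test.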

The Audit

Rather than blindly train a model, we built a data discovery and audit pipeline.

The pipeline performed:

File inventory
Project grouping
Duplicate detection
OCR quality assessment
Cost recovery analysis
Dataset readiness scoring
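As an illustration of the inventory and duplicate-detection steps, here is a minimal sketch that walks a folder and groups byte-identical files by SHA-256 digest. The function names are hypothetical, and this only catches exact copies; revised versions of a document differ in content and need the project-grouping step instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root that have identical bytes."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```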

What we found was surprising.

Out of dozens of documents and thousands of extracted rows:

Only 9 distinct projects were recoverable
Only 2 projects contained evidence of actual final costs
The remaining projects were estimates

This was a critical distinction.

Most datasets contained:

Estimated Cost

What we actually needed was:

Final Actual Cost

Those are not the same thing.

Training on estimates teaches a model to reproduce estimates.

It does not teach a model to predict reality.
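In code, the distinction amounts to treating the final actual cost as the training label and refusing to substitute the estimate for it. The record layout below is an assumption for illustration, not Tri-Fort's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProjectRecord:
    name: str
    estimated_cost_kes: Optional[float] = None
    final_actual_cost_kes: Optional[float] = None


def training_ready(records: list[ProjectRecord]) -> list[ProjectRecord]:
    """Keep only projects with a recorded final actual cost (the label)."""
    return [r for r in records if r.final_actual_cost_kes is not None]


def readiness_ratio(records: list[ProjectRecord]) -> float:
    """Share of projects usable as supervised training examples."""
    return len(training_ready(records)) / len(records) if records else 0.0
```

With the audit's figures (9 recoverable projects, 2 with final costs), this ratio is 2/9, roughly 0.22.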

The Moment We Paused Deployment

At one point, the platform appeared production-ready.

The APIs worked.

Authentication worked.

Reporting worked.

Infrastructure passed testing.

Even the ML pipeline passed synthetic validation.

But the data audit exposed an uncomfortable truth.

The model wasn't learning from reality.

It was learning from other estimates.

Shipping at that point would have created an illusion of intelligence.

So deployment was paused.

The machine learning model was no longer the priority.

The data became the priority.

Discovering a Better Approach

While auditing the data, we acquired an official Quantity Surveying cost handbook.

This changed everything.

Instead of treating the handbook as a PDF, we treated it as a structured knowledge source.

The handbook contained:

Regional construction rates
Cost benchmarks
Building classifications
Measurement standards
Cost adjustment factors
Material pricing references

Suddenly we had something more valuable than a small ML dataset.

We had domain expertise.

Turning a Handbook into a Knowledge Graph

The next challenge was engineering.

How do you transform a static handbook into software?

We built an extraction pipeline that converts handbook data into structured rules.

The system identifies:

Regions
Rate schedules
Building classes
Construction categories
Cost multipliers

These are stored in a machine-readable rule graph.

Conceptually:

Handbook PDF
 ↓
Extraction
 ↓
Rule Graph
 ↓
Cost Intelligence Engine

Instead of hardcoding numbers throughout the application, the cost engine can now reason from structured construction knowledge.
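A rule graph of this kind could be as simple as a keyed store of rules that each carry their handbook source. The sketch below assumes a (subject, predicate) lookup scheme; the schema, the identifiers, and the sample values are all placeholders rather than anything taken from the handbook or from Tri-Fort's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    subject: str    # node the rule attaches to, e.g. "region:nairobi"
    predicate: str  # kind of rule, e.g. "rate_multiplier"
    value: float    # numeric payload (a rate or a multiplier)
    source: str     # where in the handbook the rule came from


class RuleGraph:
    """Stores handbook rules and answers lookups by (subject, predicate)."""

    def __init__(self) -> None:
        self._rules: dict[tuple[str, str], Rule] = {}

    def add(self, rule: Rule) -> None:
        self._rules[(rule.subject, rule.predicate)] = rule

    def lookup(self, subject: str, predicate: str) -> Rule:
        try:
            return self._rules[(subject, predicate)]
        except KeyError:
            raise LookupError(f"no {predicate!r} rule for {subject!r}") from None


graph = RuleGraph()
# Placeholder values and citations, not figures from the actual handbook.
graph.add(Rule("class:residential", "base_rate_kes_per_sqm", 54_000.0, "hypothetical table A"))
graph.add(Rule("region:nairobi", "rate_multiplier", 1.20, "hypothetical table B"))
```

Keeping the `source` on every rule means an estimate can later cite the exact table it was derived from.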

The Hybrid Architecture

The current architecture no longer relies exclusively on machine learning.

Instead, it combines three intelligence sources.

1. Handbook Intelligence

Official QS benchmark rates.

2. Historical Project Intelligence

Recovered BoQs and project data.

3. User Feature Intelligence

Inputs collected through the estimator.

The architecture now looks like this:

User Inputs
 ↓
Feature Engine
 ↓
Handbook Intelligence
 ↓
Historical Cost Intelligence
 ↓
Cost Engine
 ↓
Explainable Estimate

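This approach is far more stable than pure ML when data is this scarce: handbook rates anchor every estimate, and the handful of recovered projects only adjusts it.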
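The article does not say how the historical layer adjusts the handbook rate. One plausible sketch, offered purely as an assumption, shrinks the correction toward zero when few comparable projects exist, so a thin history cannot override the handbook. The function name and the `prior_strength` parameter are hypothetical.

```python
from statistics import median


def historical_correction(
    handbook_rate: float,
    observed_rates: list[float],
    prior_strength: float = 5.0,  # hypothetical tuning knob
) -> float:
    """Fractional correction implied by comparable historical projects.

    The raw gap between the median observed rate and the handbook rate is
    scaled by n / (n + prior_strength), so small samples move the rate less.
    Returns 0.0 when there is no history at all.
    """
    if not observed_rates:
        return 0.0
    raw = median(observed_rates) / handbook_rate - 1.0
    weight = len(observed_rates) / (len(observed_rates) + prior_strength)
    return raw * weight


def corrected_rate(handbook_rate: float, observed_rates: list[float]) -> float:
    """Handbook rate nudged by whatever historical evidence exists."""
    return handbook_rate * (1.0 + historical_correction(handbook_rate, observed_rates))
```

With three comparable projects and the default `prior_strength`, only 3/8 of the observed gap is applied; the rest of the weight stays with the handbook.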

Why Explainability Matters

Construction projects involve large sums of money.

Users don't trust black boxes.

If a system says:

KES 18,400,000

the next question is:

Why?

Black-box models often struggle to answer it.

Tri-Fort now generates reasoning traces.

For example:

Base rate: 54,000 KES/sqm
Location adjustment: Nairobi +20%
Luxury finish adjustment: +15%
Two-storey adjustment: +8%
Historical correction: -2%

Users see not only the estimate but the rationale.

That transparency creates trust.
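A trace like the one above can be produced by recording each step as it is applied. This is a minimal sketch that assumes the adjustments compound multiplicatively and uses a hypothetical floor area; the article specifies neither.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Adjustment:
    label: str
    pct: float  # +20% is stored as 0.20


def estimate_with_trace(
    base_rate: float, area_sqm: float, adjustments: list[Adjustment]
) -> tuple[float, list[str]]:
    """Apply adjustments in order, recording one trace line per step."""
    rate = base_rate
    trace = [f"Base rate: {base_rate:,.0f} KES/sqm"]
    for adj in adjustments:
        rate *= 1.0 + adj.pct  # assumption: adjustments compound
        trace.append(f"{adj.label}: {adj.pct:+.0%} -> {rate:,.0f} KES/sqm")
    total = rate * area_sqm
    trace.append(f"Total for {area_sqm:,.0f} sqm: KES {total:,.0f}")
    return total, trace


total, trace = estimate_with_trace(
    54_000.0,
    200.0,  # hypothetical built-up area
    [
        Adjustment("Location adjustment (Nairobi)", 0.20),
        Adjustment("Luxury finish adjustment", 0.15),
        Adjustment("Two-storey adjustment", 0.08),
        Adjustment("Historical correction", -0.02),
    ],
)
print("\n".join(trace))
```

Compounding these four adjustments gives a combined factor of about 1.46 on the base rate, whereas simply adding the percentages gives 1.41, so whichever convention the engine uses should be stated in the trace itself.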

Engineering the Infrastructure

Alongside the estimation engine, the platform required production-grade infrastructure.

The stack includes:

Backend

FastAPI
PostgreSQL
Domain-driven architecture
Background task processing

Frontend

Next.js
TypeScript
Responsive dashboard

Infrastructure

Docker Compose
Caddy
HTTPS automation
Environment-driven configuration

Everything is configured so a VPS deployment requires only:

git pull
docker compose up -d --build

No code changes.

No production-specific branches.

No manual edits.
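Environment-driven configuration is what makes that possible: the same code reads its settings from the environment on every machine. A minimal sketch, assuming hypothetical variable names (`DATABASE_URL`, `PUBLIC_DOMAIN`, `DEBUG`) that are not taken from the Tri-Fort codebase:

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    public_domain: str
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from environment variables, failing fast if one is missing."""
    required = ("DATABASE_URL", "PUBLIC_DOMAIN")  # hypothetical names
    missing = [name for name in required if name not in env]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        public_domain=env["PUBLIC_DOMAIN"],
        debug=env.get("DEBUG", "false").lower() in {"1", "true", "yes"},
    )
```

Failing at startup when a variable is missing keeps a misconfigured server from coming up half-working.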

Lessons Learned

If I could restart this project tomorrow, I'd follow three rules.

Rule 1

Never trust dataset size.

Audit it.

A thousand rows can represent five projects.

Rule 2

Domain knowledge beats machine learning when data is scarce.

A handbook written by experienced Quantity Surveyors can outperform a poorly trained model.

Rule 3

Users care about answers, not algorithms.

Nobody hires a construction estimator because it uses AI.

They hire it because the estimate is accurate.

Where Tri-Fort Goes Next

The long-term vision remains machine learning.

But now the roadmap is grounded in reality.

The next stage focuses on collecting:

Final accounts
Completion certificates
Contractor invoices
Variation orders
Actual project costs

As the dataset grows, machine learning can become increasingly important.

Eventually the platform will evolve into a fuller hybrid system that adds a trained model to the layers it already has:

Domain Knowledge
 +
Historical Projects
 +
Machine Learning
 +
Human Explainability

That's the future.

Not AI replacing expertise.

AI amplifying it.

Final Thoughts

The biggest lesson from building Tri-Fort is that successful AI products are rarely about the model.

They're about understanding the problem deeply enough to know when a model is not the answer.

For construction estimation, intelligence comes from a combination of:

Engineering
Quantity surveying
Historical data
Domain expertise
Software architecture

Machine learning is just one piece of that puzzle.

And sometimes the smartest engineering decision is knowing when not to rely on it.

URL: https://dev.to/wolfof420street/building-tri-fort-why-we-abandoned-pure-machine-learning-and-built-a-construction-intelligence-2f6n

Building Tri-Fort: Why We Abandoned Pure Machine Learning and Built a Construction Intelligence Engine Instead - DEV Community