Healthcare
Python · PyTorch · scikit-learn · peer-reviewed validation
Predicting Disease Onset From Behavioural Data
Problem
A clinical research group needed to flag children at risk of Autism Spectrum Disorder months earlier than standard paediatric screening allowed. Routine screening relies on the Q-Chat-10 questionnaire interpreted by trained clinicians, but specialist capacity was limited and assessment waiting lists ran 3 to 6 months. Many at-risk children passed the early-intervention window before they reached a diagnosis.
Approach
We built a deep-learning classifier that ingests Q-Chat-10 responses and outputs a calibrated probability of ASD diagnosis, benchmarked against the same expert-clinician ground truth used in routine practice. Multiple architectures (an MLP, gradient-boosted trees, an attention-based model) were compared on held-out validation cohorts; the winner was packaged behind a clinician-facing tool with a confidence interval on every prediction and a one-click "I disagree" feedback path that flowed back into the training pipeline.
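One way to picture the per-prediction interval and the "I disagree" path is the minimal sketch below. It assumes the interval comes from the spread of an ensemble of models, which is a common choice but not necessarily the method used here; `TriagePrediction`, `summarise_ensemble`, and `FeedbackQueue` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class TriagePrediction:
    probability: float  # mean probability across ensemble members
    lower: float        # lower bound of the empirical interval
    upper: float        # upper bound of the empirical interval


def summarise_ensemble(member_probs, coverage=0.9):
    """Collapse per-member probabilities into a mean and an empirical interval.

    Assumption: each ensemble member has already scored the same set of
    questionnaire responses and returned a probability in [0, 1].
    """
    if not member_probs:
        raise ValueError("need at least one ensemble member")
    ordered = sorted(member_probs)
    tail = (1.0 - coverage) / 2.0
    lo_idx = int(tail * (len(ordered) - 1))  # index of the lower percentile
    hi_idx = len(ordered) - 1 - lo_idx       # mirrored index for the upper one
    return TriagePrediction(mean(ordered), ordered[lo_idx], ordered[hi_idx])


@dataclass
class FeedbackQueue:
    """Hypothetical holding area for clinician disagreements."""
    items: list = field(default_factory=list)

    def disagree(self, responses, prediction, clinician_label):
        # Keep the inputs, the model output, and the clinician's label together
        # so the record can later be reviewed and added to a training set.
        self.items.append({
            "responses": list(responses),
            "model_probability": prediction.probability,
            "clinician_label": clinician_label,
        })
```

A clinician-facing tool could call `summarise_ensemble` once per child and pass the result to `FeedbackQueue.disagree` when the clinician overrides it. With only a handful of ensemble members the interval is coarse, so it should be read as a rough spread rather than a formal statistical guarantee.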
Outcome
Earlier triage of at-risk children, validated against expert diagnoses and published in a peer-reviewed venue (DOI: 10.61643/c478960). The model is not a substitute for clinical diagnosis; it is a triage tool that gets the right children in front of a specialist sooner.
Manufacturing
Time-series fusion · self-supervised pre-training · CMMS integration
Predictive Maintenance on Legacy Equipment
Problem
A mid-size manufacturer was losing six-figure sums per quarter to unplanned downtime on production lines instrumented with patchwork sensor coverage that had grown over the years. Different machines reported to different historians, the data formats varied by vendor, and the maintenance team operated reactively: a fault occurred, then a technician was paged. Each unplanned stop disrupted upstream and downstream work centres.
Approach
We fused vibration, temperature, motor-current, and acoustic streams from heterogeneous sensors into a unified time-series store, then trained per-asset failure-prediction models on the resulting feature set. Where labels were sparse (true faults are rare events) we used self-supervised pre-training on healthy-operation data and fine-tuned on the small set of labelled failure windows the historian had captured. Predictions surfaced in a Slack channel and inside the maintenance team's existing CMMS, hours before the failure event.
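The fusion step can be sketched as aligning differently-timed sensor readings onto a shared time grid. The example below is a minimal illustration under assumptions: the function name `fuse_streams`, the sensor names, and the 60-second bucket are hypothetical, and a real historian integration would also handle clock skew, units, and vendor formats.

```python
from collections import defaultdict
from statistics import mean


def fuse_streams(streams, bucket_seconds=60):
    """Align heterogeneous sensor streams onto fixed time buckets.

    `streams` maps a sensor name to a list of (unix_timestamp, value) pairs.
    Each output row holds the bucket start time and the mean reading per
    sensor; a sensor with no reading in a bucket is reported as None.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for sensor, readings in streams.items():
        for ts, value in readings:
            bucket_start = int(ts // bucket_seconds) * bucket_seconds
            buckets[bucket_start][sensor].append(value)

    fused = []
    for bucket_start in sorted(buckets):
        row = {"t": bucket_start}
        for sensor in streams:
            values = buckets[bucket_start].get(sensor)
            row[sensor] = mean(values) if values else None
        fused.append(row)
    return fused
```

Rows produced this way give every asset the same columns regardless of which historian a sensor originally reported to, which is what lets one feature pipeline serve the per-asset models. Leaving gaps as `None` rather than interpolating is a deliberate choice in this sketch, so that missing coverage stays visible downstream.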
Outcome
Several hours of warning before a developing fault turned into a failure, long enough to fold maintenance into a planned changeover instead of paging a technician at 2 a.m. The team moved from reactive to predictive maintenance across the instrumented asset set; the next phase extends coverage to assets still on patchwork sensing.
Finance
Gradient-boosted trees · Kafka · SHAP explanations · sub-100ms p99
Real-Time Fraud Detection at Transaction Scale
Problem
A payments operator processing high-volume card-not-present transactions needed sub-100ms risk decisions on every transaction, plus a per-decision explanation that would satisfy a regulator and a chargeback dispute. Their existing rules engine had become a brittle accretion of patches, and false positives were eating into legitimate revenue.
Approach
We deployed a gradient-boosted tree model behind a low-latency streaming pipeline (Kafka into a Flink-flavoured inference service, sub-50ms p99 at peak load). Every decision is paired with a SHAP-based explanation showing the top contributing features, retained in an immutable audit log for the regulatory review trail. A kill-switch in the operator dashboard reverts to the prior rules engine in seconds if model behaviour ever looks wrong. Drift monitoring on input features and on decision distributions runs continuously.
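The decision path described above (model score, top contributing features, audit record, kill-switch fallback) can be sketched in a few lines. This is an illustration under stated assumptions, not the deployed service: per-feature contributions are taken as an input, as a SHAP explainer might produce them, the hash chain is one possible way to make a log tamper-evident, and `decide`, `AuditLog`, `top_contributions`, and the 0.8 threshold are hypothetical.

```python
import hashlib
import json


def top_contributions(contributions, k=3):
    """Return the k features with the largest absolute contribution."""
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [[name, value] for name, value in ranked[:k]]


class AuditLog:
    """Append-only log in which each entry hashes the previous one."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, record):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        self.entries.append({"record": record, "prev_hash": prev_hash, "hash": digest})

    def verify(self):
        """Recompute the chain; any edited record breaks it."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            payload = json.dumps(entry["record"], sort_keys=True)
            expected = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
            if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


def decide(txn_id, model_score, contributions, rules_decision, kill_switch, log,
           threshold=0.8):
    """Return 'approve' or 'decline' and write one audit record per decision."""
    if kill_switch:
        # Fallback path: trust the prior rules engine, no model explanation.
        decision, source, explanation = rules_decision, "rules_engine", []
    else:
        decision = "decline" if model_score >= threshold else "approve"
        source = "model"
        explanation = top_contributions(contributions)
    log.append({"txn_id": txn_id, "decision": decision, "source": source,
                "score": model_score, "explanation": explanation})
    return decision
```

Recording the decision source alongside the explanation means a reviewer can tell from the log alone whether a given transaction was decided by the model or by the fallback rules.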
Outcome
Sub-100ms decision latency at peak load with explainability per decision and a documented rollback path. The compliance team gets the audit log they need without operators writing one-off queries against production logs.
Defence
Multi-source NLP · semantic clustering · analyst-feedback learning
Open-Source Threat Intelligence at Scale
Problem
A defence customer was drowning in open-source signal: news wires, social posts, technical reports, and leaked dump sites, far more than an analyst team could read in a day. The decisions that mattered were buried in a torrent of routine noise, and analysts were spending their days on triage instead of analysis.
Approach
We built a multi-source NLP pipeline that ingests open-source feeds, deduplicates near-identical content, classifies by topic and threat type, and clusters semantically related items. A relevance ranking model trained on analyst feedback surfaces the handful of items that fit the customer's standing intelligence requirements; everything else flows into a searchable archive. Every surfaced item links back to its source document with the path the ranking decision took, so analysts can audit and correct the model.
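The near-duplicate step can be illustrated with word shingles and Jaccard similarity. This is a minimal sketch, assuming whitespace tokenisation and a hypothetical 0.6 threshold; the function names are introduced here for illustration, and the pairwise comparison is quadratic, so a pipeline at feed scale would more plausibly use an approximate scheme such as MinHash with locality-sensitive hashing.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams in a document."""
    tokens = text.lower().split()
    if len(tokens) < size:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(documents, threshold=0.6):
    """Keep the first document of each near-identical group, in input order."""
    kept = []  # list of (document, shingle set) pairs already accepted
    for doc in documents:
        signature = shingles(doc)
        if all(jaccard(signature, seen) < threshold for _, seen in kept):
            kept.append((doc, signature))
    return [doc for doc, _ in kept]
```

Keeping the first-seen copy preserves the earliest source for the audit trail; the dropped copies could instead be attached to the kept item as additional sources if provenance matters.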
Outcome
Analyst hours redirected from triage to analysis. The pipeline now ships routinely curated daily briefings while the analyst team refines the standing intelligence requirements and the ranking model retrains on their feedback.