Running Modern ML Models on Edge Devices

By: Metagne Charnelle

Picture of setup

Little milestone #2.

You don’t really understand your model until you try running it on constrained hardware.

In this post, I explore how two modern object detection models, YOLOv9 and RF-DETR, perform on the Raspberry Pi 4 and Raspberry Pi 5, using a small real-world test.

What is a Raspberry Pi?

It is a series of small, affordable, credit-card-sized single-board computers, originally designed by the Raspberry Pi Foundation in the UK to teach programming.

It is now used in:

Embedded systems
IoT devices
Edge AI applications

But the real question is:

Can we run large computations on limited hardware?

In today’s post, we will see results from running modern ML models, specifically YOLOv9 and RF-DETR, on the Raspberry Pi 4 and 5 with 6 test images.

The aim of this comparison is to find out whether a Raspberry Pi is capable of running a heavy model in real time.

We evaluate this using:

Inference time
Frames per second (FPS)
Resource Usage
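The post does not show the measurement script, so here is a minimal sketch of how these three metrics could be collected with the Python standard library alone. The `benchmark` helper and the `dummy_predict` stand-in are hypothetical names; in a real run `predict` would wrap the actual model call. FPS is derived here as 1000 divided by the mean latency, which is an assumption about how the metric is defined.

```python
import os
import resource
import statistics
import time


def benchmark(predict, images, warmup=1):
    """Time `predict` on each image and summarise latency, FPS, CPU and memory."""
    # Warm-up runs are excluded so one-off costs (lazy loading, caches)
    # do not skew the average.
    for image in images[:warmup]:
        predict(image)

    latencies_ms = []
    cpu_start = os.times()
    wall_start = time.perf_counter()
    for image in images:
        t0 = time.perf_counter()
        predict(image)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    wall_elapsed = time.perf_counter() - wall_start
    cpu_end = os.times()

    # User + system CPU seconds consumed by this process during the timed loop.
    cpu_seconds = (cpu_end.user - cpu_start.user) + (cpu_end.system - cpu_start.system)
    mean_ms = statistics.mean(latencies_ms)
    return {
        "mean_latency_ms": mean_ms,
        "fps": 1000.0 / mean_ms,
        # Can exceed 100% when several cores are busy at once.
        "cpu_percent": 100.0 * cpu_seconds / wall_elapsed,
        # ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
        "peak_memory_mb": resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0,
    }


def dummy_predict(image):
    """Hypothetical stand-in for a real model call."""
    return sum(x * x for x in image)


if __name__ == "__main__":
    fake_images = [list(range(50_000)) for _ in range(6)]
    print(benchmark(dummy_predict, fake_images))
```

Measuring CPU time against wall-clock time is what lets the CPU figure go above 100%: a process that keeps four cores busy for one second uses about four CPU-seconds.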

Benchmark on Raspberry Pi 4 and Raspberry Pi 5

We conducted benchmark tests on both the Raspberry Pi 4 and Raspberry Pi 5 to evaluate inference performance and FPS. We tested both model families (YOLOv9 and the four RF-DETR sizes: Nano, Small, Medium and Large) on each Pi and got some interesting results.

Here’s a comparison of model performance between the two Raspberry Pi boards.

SET UP

Pi 4

Pi 5

Raspberry Pi 4

Model	Average inference time (ms)	Average FPS	CPU usage	Average memory usage (MB)
YOLOv9	12939.95	0.08	12.64%	906.2
RF-DETR Nano	2240.47	0.33	365.5%	387.4
RF-DETR Small	3092.49	0.24	361.9%	205.7
RF-DETR Medium	4116.13	0.18	370.3%	168.5
RF-DETR Large	5512.57	0.14	371.4%	200.6

Raspberry Pi 5

Model	Average inference time (ms)	Average FPS	CPU usage	Average memory usage (MB)
YOLOv9	3685.10	0.27	11.71%	845.55
RF-DETR Nano	696.68	1.08	371.0%	374.4
RF-DETR Small	915.73	0.82	380.4%	171.5
RF-DETR Medium	1230.65	0.61	383.4%	163.5
RF-DETR Large	1707.29	0.44	384.9%	118.6

Below are a few test images from each model, to get a feel for detection quality in terms of object recognition and segmentation.

Images before and after segmentation.

Raspberry Pi 4

Input Image

YOLOv9 vs RF-DETR NANO

YOLOv9 vs RF-DETR MEDIUM

YOLOv9 vs RF-DETR SMALL

YOLOv9 vs RF-DETR LARGE

Raspberry Pi 5

YOLOv9 vs RF-DETR NANO

YOLOv9 vs RF-DETR MEDIUM

YOLOv9 vs RF-DETR SMALL

YOLOv9 vs RF-DETR LARGE

From the tables and the sample images, we can see that the benchmark wasn’t bluffing when it showed that the transformer model came for YOLO’s crown: every RF-DETR variant is more than twice as fast as YOLOv9 on both boards.

Pi 4 vs Pi 5

1. Performance Gain

Pi 5 is roughly 3.2× to 3.5× faster across all models
Inference time drops to less than a third (for example, RF-DETR Nano goes from 2240.47 ms to 696.68 ms)
FPS more than triples in every case
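The gain can be checked directly from the two tables by dividing each Pi 4 inference time by the matching Pi 5 time. This is a small sketch of that arithmetic; the dictionary and function names are made up for illustration, and the numbers are copied from the tables.

```python
# Average inference times in milliseconds, copied from the two tables.
PI4_MS = {
    "YOLOv9": 12939.95,
    "RF-DETR Nano": 2240.47,
    "RF-DETR Small": 3092.49,
    "RF-DETR Medium": 4116.13,
    "RF-DETR Large": 5512.57,
}
PI5_MS = {
    "YOLOv9": 3685.10,
    "RF-DETR Nano": 696.68,
    "RF-DETR Small": 915.73,
    "RF-DETR Medium": 1230.65,
    "RF-DETR Large": 1707.29,
}


def speedups(slow_ms, fast_ms):
    """Ratio of slow to fast latency for every model present in `slow_ms`."""
    return {name: slow_ms[name] / fast_ms[name] for name in slow_ms}


if __name__ == "__main__":
    for name, ratio in speedups(PI4_MS, PI5_MS).items():
        print(f"{name}: {ratio:.2f}x faster on the Pi 5")
```

With the table values, every ratio lands between roughly 3.2 and 3.5.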

2. Best Performing Model

RF-DETR Nano remains the best option on both devices
It offers the best balance between speed, detection quality and resource usage.

3. Real-Time Feasibility

Even on the Pi 5:

The best result is 1.08 FPS (RF-DETR Nano); every other model stays below 1 FPS
This is not sufficient for real-time segmentation
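To put a number on the gap, here is a rough sketch that turns a target frame rate into a per-frame time budget and compares it with a measured latency. The 30 FPS target is an assumption (a common video frame rate), not a figure from this test, and the function names are hypothetical.

```python
def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / target_fps


def required_speedup(latency_ms, target_fps):
    """How many times faster inference must get to fit the per-frame budget."""
    return latency_ms / frame_budget_ms(target_fps)


if __name__ == "__main__":
    TARGET_FPS = 30  # assumption: a common video frame rate, not from the post
    nano_pi5_ms = 696.68  # RF-DETR Nano on the Pi 5, from the table
    print(f"budget per frame: {frame_budget_ms(TARGET_FPS):.1f} ms")
    print(f"speedup needed:   {required_speedup(nano_pi5_ms, TARGET_FPS):.1f}x")
```

Under that assumption, even the fastest configuration would need to be about 21 times faster to keep up.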

4. CPU Bottleneck

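For the RF-DETR variants, CPU usage sits between roughly 360% and 385%, close to the 400% ceiling of a four-core board (multi-core saturation)
YOLOv9 is the exception, reporting only about 12% CPU on both boards
Indicates no hardware acceleration and heavy reliance on CPU inference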
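Percentages above 100% make sense if the CPU usage column is a per-process figure where 100% means one fully busy core, as tools like top report it. That reading is an assumption about how the numbers were collected. Both boards have four CPU cores, so the ceiling would be 400%. A small sketch of the conversion, with a hypothetical helper name:

```python
def core_utilization(process_cpu_percent, cores=4):
    """Fraction of total CPU capacity in use, given a per-process percentage
    where 100% means one fully busy core."""
    return process_cpu_percent / (100.0 * cores)


if __name__ == "__main__":
    # CPU usage values copied from the tables.
    samples = [
        ("Pi 4, RF-DETR Nano", 365.5),
        ("Pi 5, RF-DETR Large", 384.9),
        ("Pi 4, YOLOv9", 12.64),
    ]
    for label, percent in samples:
        print(f"{label}: {core_utilization(percent):.0%} of four cores")
```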

5. Memory Observations

YOLOv9 consumes the most memory (about 850 to 910 MB)
RF-DETR variants are more memory-efficient (about 120 to 390 MB)

Key Insights

From these tests, we see that:

Running modern ML models on Raspberry Pi is possible
RF-DETR Nano is edge-friendly (relatively)
Pi 5 provides a meaningful performance boost

No model achieves real-time inference
CPU-only inference is a major limitation
Larger models scale poorly

Conclusion

This experiment reveals an important truth about Edge AI:

Running a model is not the same as deploying a system.

While both YOLOv9 and RF-DETR can run on Raspberry Pi devices, their performance falls short of real-time requirements. The Raspberry Pi 5 significantly improves performance, more than tripling speed across all models, but still does not fully meet the demands of a responsive system.

Among all tested models, RF-DETR Nano stands out as the most practical choice, offering the best compromise between speed and efficiency. However, even this model requires further optimization and an NPU to be viable in production.

See you at the next bug. Bye

Good luck as we create magic.
