
Notes from Drone research April 9 2025

·777 words·4 mins·
Author
Mark Ogata
AI and Robotics Researcher at BAIR

Plan of action for how to proceed with the project:
#

Raspberry Pi with two CSI cameras, with the feed streamed via USB to the VOXL 2.

Backup: a USB extender and two USB ultrawide cameras.

What methods/algorithms do you have in mind?
#

Overview:
#

  • Detection
    • Optical flow (see the sketch after this list)
    • Object classifier
      • EfficientNet (2019), trained on the ILSVRC (ImageNet) dataset
  • Tracking
    • Downsampled feature-descriptor matching between frames
    • Timeout + optical flow with a stationarity assumption for occlusions
  • Movement tracking
    • Can orient towards the target and use the ToF sensor for distance
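
To make the detection idea concrete, here is a minimal sketch of the optical-flow motion cue. It assumes OpenCV's dense Farneback flow, a camera at index 0, and hand-picked thresholds, none of which are fixed by the plan above; it also assumes a mostly static camera, so on the drone ego-motion would have to be compensated first. The cropped candidate patches are what would then go to the EfficientNet classifier.

```python
import cv2
import numpy as np

# Sketch: flag "moving object" candidates from dense optical flow between frames.
# Camera index, flow parameters, and thresholds are placeholder guesses.
cap = cv2.VideoCapture(0)
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Dense Farneback optical flow (pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)

    # Pixels moving faster than the assumed background motion become candidates.
    mask = (mag > 2.0).astype(np.uint8) * 255
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 200:  # ignore tiny noise blobs
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            # frame[y:y+h, x:x+w] is the patch that would be sent to the classifier.

    cv2.imshow("moving-object candidates", frame)
    prev_gray = gray
    if cv2.waitKey(1) == 27:  # Esc quits
        break

cap.release()
cv2.destroyAllWindows()
```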

What goals/objectives you have
#

  1. Get the Raspberry Pi and cameras
  2. Get USB communication between the VOXL 2 and the Raspberry Pi working
  3. Get two camera streams displaying simultaneously on the Raspberry Pi at 30+ fps (see the sketch after this list)
  4. Get video streaming to the VOXL 2 working (if bandwidth is too low, we can send vectors and patches instead)
  5. On the VOXL 2, try out algorithms for moving-object or drone detection with a single camera (can test by walking around with the setup while moving other objects in the room)
  6. Mount the Raspberry Pi and cameras on the drone
  7. Adapt the algorithm to work omnidirectionally on the drone
  8. Test the detection algorithm outdoors by checking detections while the drone hovers and we walk around, throw objects, and fly drones
  9. Deal with occlusions (stationarity assumption?)
  10. Get the prioritization and tracking algorithms working
  11. Test it indoors
  12. Test it outdoors
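
For step 3 (the dual 30+ fps camera streams), a minimal capture-and-measure sketch, assuming the Picamera2 library on a Raspberry Pi with both CSI ports populated; the 1280×720 resolution and the preview window are placeholders, and the printed FPS is only a rough sanity check:

```python
import time
import cv2
from picamera2 import Picamera2

# Sketch: read both CSI cameras on a Raspberry Pi 5 and report effective FPS.
cams = []
for idx in (0, 1):
    cam = Picamera2(idx)
    cfg = cam.create_video_configuration(main={"size": (1280, 720), "format": "RGB888"})
    cam.configure(cfg)
    cam.start()
    cams.append(cam)

frames, t0 = 0, time.time()
while True:
    imgs = [cam.capture_array() for cam in cams]   # one frame per camera
    frames += 1
    if frames % 60 == 0:
        print(f"~{frames / (time.time() - t0):.1f} fps per stream")
    # Side-by-side preview; drop this on a headless Pi.
    cv2.imshow("cam0 | cam1", cv2.hconcat(imgs))
    if cv2.waitKey(1) == 27:  # Esc quits
        break
```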

What sort of vision system you plan to implement
#

Two Raspberry Pi ultrawide CSI cameras, plus the VOXL 2 front camera.

Why you need the hardware
#

  • Raspberry Pi 5 (two CSI camera ports are needed)
  • Power HAT for the Raspberry Pi 5 (to power the Raspberry Pi)
  • 2 ultrawide 220° FOV CSI cameras (at least two cameras are needed for omnidirectional vision)
  • A USB-A to USB-C cable (to communicate with the VOXL 2 via USB; UART is too slow)
  • Charger for the Raspberry Pi batteries (to power the Raspberry Pi)
  • Wall charger for the drone (to develop on the VOXL 2 without needing to recharge every hour)

How you are going to use it, and over what timeline
#

Start using the Raspberry Pi and USB ASAP to see if the current communication setup is compatible.

Then mount the Raspberry Pi setup on the drone and use the power hardware we ordered.

Todo
#

Calculate the bandwidth of the USB link to the VOXL 2.

If that isn't sufficient, check whether the Raspberry Pi can run the feature detection algorithms itself.

Look up the FLOPS of the Raspberry Pi; look up benchmarks for optical flow and EfficientNet (see the timing sketch below).

Weight of the power system for the Raspberry Pi, and how long it would last.

POWER, BANDWIDTH, WEIGHT, FLOPS PER SECOND

FOV of the VOXL 2 front camera.

DROID-SLAM, MegaSaM for lightweight moving-object detection.
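
For the EfficientNet benchmarking item, the simplest number to get is measured latency on the Pi rather than a FLOPS figure. A minimal timing sketch, assuming PyTorch and torchvision are installed on the Raspberry Pi; the model variant, input size, and iteration counts are arbitrary:

```python
import time
import torch
from torchvision.models import efficientnet_b0

# Rough CPU latency benchmark for EfficientNet-B0 inference on the Pi.
model = efficientnet_b0(weights=None).eval()   # random weights are fine for timing
x = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    for _ in range(5):                         # warm-up iterations
        model(x)
    n = 30
    t0 = time.perf_counter()
    for _ in range(n):
        model(x)
    dt = (time.perf_counter() - t0) / n

print(f"EfficientNet-B0: {dt * 1e3:.1f} ms/frame  (~{1 / dt:.1f} fps)")
```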

Bandwidth calculations (Mark)
#

The Qualcomm chip on the VOXL 2 supports USB 3.1 (5 Gbps); I assume a real-world bandwidth of about 3 Gbps.

Assumptions:

  1. Two independent camera streams share the 3 Gbps equally → 1.5 Gbps per stream.
  2. Uncompressed RGB24 (8 bits per channel × 3 channels = 24 bits/pixel).
  3. No additional protocol overhead (in practice you’d reserve ~5–10 % for USB framing, etc., but we’ll ignore that for simplicity).

1. Maximum pixels per second per stream
#

Each stream has:

$$ \frac{1.5\times10^9\ \text{bits/s}}{24\ \text{bits/pixel}} = 62.5\times10^6\ \text{pixels/s} $$

So each camera can send up to 62.5 megapixels per second.


2. Resolution & fps
#

For a given resolution $W \times H$, the maximum frames per second is:

$$ \text{fps}_{\max} = \frac{62.5\times10^6}{W \times H} $$

Or conversely, for a target fps:

$$ W \times H \le \frac{62.5\times10^6}{\text{fps}} $$


3. Supported configurations
#

| Resolution | Pixel count | Max fps per stream | Notes |
|---|---|---|---|
| 1920×1080 | 2.073 MP | ≈ 30.1 fps | Full HD @ 30 fps |
| 1280×720 | 0.922 MP | ≈ 67.8 fps | HD @ 60 fps |
| 1024×768 | 0.786 MP | ≈ 79.5 fps | XGA @ 75 fps |
| 800×600 | 0.480 MP | ≈ 130.2 fps | SVGA @ 120 fps |
| 640×480 | 0.307 MP | ≈ 203.5 fps | VGA @ 200 fps |
| 3840×2160 | 8.294 MP | ≈ 7.5 fps | 4K UHD @ 7 fps (very low) |
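
The table can be regenerated with a few lines of Python under the same assumptions (3 Gbps shared equally by two streams, 24 bits/pixel, no protocol overhead):

```python
# Reproduce the max-fps table: per-stream bandwidth / bits-per-pixel / pixel count.
BANDWIDTH_BPS = 3e9          # assumed real-world USB 3.x throughput, shared by two streams
BITS_PER_PIXEL = 24          # uncompressed RGB24
pixels_per_sec = (BANDWIDTH_BPS / 2) / BITS_PER_PIXEL   # 62.5e6 pixels/s per stream

for w, h in [(1920, 1080), (1280, 720), (1024, 768), (800, 600), (640, 480), (3840, 2160)]:
    max_fps = pixels_per_sec / (w * h)
    print(f"{w}x{h}: {w * h / 1e6:.3f} MP -> {max_fps:.1f} fps max per stream")
```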

4. Trade-offs
#

  • Chroma subsampling / compression (e.g. YUV4:2:0, MJPEG, H.264) can dramatically reduce bandwidth, allowing much higher fps or resolution—at the cost of latency and CPU/GPU load.
  • USB protocol overhead (~5–10 %) and other USB devices on the same bus reduce the ~1.5 Gbps available per stream.
  • Sensor bit-depth: If using 10-bit or 12-bit per channel, multiply the bits/pixel accordingly (e.g. 36 bits/pixel for RGB12).
  • Bus bursts & buffering: Real deployments require handling USB packetization, isochronous transfer guarantees, etc.

Bottom line
#

With uncompressed RGB24 on a dedicated USB 3.x link (~3 Gbps real-world total), the setup comfortably supports 1080p @ 30 fps on each of two cameras, or 720p @ 60 fps, or any combination along the following trade-off curve:

$$ \boxed{W \times H \times \text{fps} \;\le\; 62.5 \times 10^6\ \text{pixels/s per stream}} $$

