September 9, 2024
Improving inference speed of vision-language-action models for edge devices while preserving encoding power.