Frame Vision Maps integrates Brilliant Labs Frame smart glasses with the LLaVA vision-language model and OpenStreetMap to create an interactive tool for capturing and visualizing spatially contextualized photos. The system captures moments in real time, generates AI-driven descriptions with a locally hosted LLaVA model, retrieves location data from IP-based services, and plots the results on a map. Users can thus log spatially enriched memories and browse their captured moments on an interactive map.
The project used Python to orchestrate communication between the Brilliant Labs Frame smart glasses and the LLaVA model, hosted locally via Ollama's API. Captured images were sent to the model to generate textual descriptions, and geolocation retrieved from IP-based services was appended to each caption along with a timestamp. This metadata (photo path, caption, geolocation, and timestamp) was stored in a JSON file that drives a Leaflet-based map interface. Achieving reliable Bluetooth pairing with the Frame and optimizing API communication required significant debugging.
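The capture-to-map loop can be condensed into a short script. The sketch below is a minimal illustration, not the repositories' code: it assumes a local Ollama server on its default port (11434) with a `llava` model pulled, uses ip-api.com as a stand-in for whichever IP geolocation service the project queried, and the `captures.json` file name and prompt text are hypothetical.

```python
import base64
import json
from datetime import datetime, timezone
from pathlib import Path

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
LOG_FILE = Path("captures.json")  # hypothetical metadata log read by the map page


def describe_image(photo_path: Path) -> str:
    """Caption a photo with a locally hosted LLaVA model via Ollama's REST API."""
    image_b64 = base64.b64encode(photo_path.read_bytes()).decode("ascii")
    resp = requests.post(
        OLLAMA_URL,
        json={
            "model": "llava",
            "prompt": "Describe this photo in one concise sentence.",
            "images": [image_b64],  # Ollama accepts base64-encoded images
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip()


def locate() -> dict:
    """Approximate the capture location from the device's public IP address."""
    geo = requests.get("http://ip-api.com/json/", timeout=10).json()
    return {"lat": geo["lat"], "lon": geo["lon"], "city": geo.get("city", "")}


def log_capture(photo_path: Path) -> dict:
    """Caption, geolocate, and timestamp one capture, then append it to the JSON log."""
    entry = {
        "photo": str(photo_path),
        "caption": describe_image(photo_path),
        "location": locate(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    entries = json.loads(LOG_FILE.read_text()) if LOG_FILE.exists() else []
    entries.append(entry)
    LOG_FILE.write_text(json.dumps(entries, indent=2))
    return entry


if __name__ == "__main__":
    print(log_capture(Path("photo.jpg")))
```

Keeping the log as a flat JSON array makes the visualization step simple: the Leaflet page can fetch the file and drop one marker per entry, using the caption and timestamp as popup content.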
The project demonstrated that AI-driven visual descriptions can be combined with spatial data visualization, showing the feasibility of pairing AR hardware with AI to create spatially and semantically rich image logs. The interactive map provided a user-friendly way to explore captured moments. However, IP-based geolocation typically resolves only to the city or ISP level, which limited the accuracy and precision of the spatial metadata and points to GPS-based positioning as a priority for future iterations. Further development could also refine the generated captions and improve real-time synchronization between captures and map updates.
calluxpore/FrameMap-Photo-Capture-Description-and-Location-Visualization
calluxpore/FrameVision-Smart-Image-Capture-Description-with-Location