
◉–◉ Frame Vision Maps

AI-enhanced photo capture with geolocation, descriptions, and interactive map visualization using Brilliant Labs Frames.

The Idea

Frame Vision Maps integrates Brilliant Labs Frame smart glasses with the LLaVA vision-language model and OpenStreetMap to create an interactive tool for capturing and visualizing spatially contextualized photos. The system captures moments in real time, generates AI-driven descriptions using a locally hosted LLaVA model, retrieves location data from IP-based services, and combines these elements on a map. This lets users seamlessly log spatially enriched memories and explore their captured moments on an interactive map.

Development

The project utilized Python to orchestrate communication between the Brilliant Labs Frame smart glasses and the LLaVA model (hosted locally via Ollama's API). Captured images were processed to generate meaningful textual descriptions. Geolocation was retrieved using IP-based services and appended to captions along with timestamps. This metadata, including photo paths, captions, and geolocations, was stored in a JSON file for visualization on a Leaflet-based map interface. Achieving smooth Bluetooth pairing with the Frame and optimizing API communication required significant debugging.
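A minimal sketch of that pipeline is shown below, assuming Ollama's default local endpoint with the llava model tag, ipinfo.io as the IP-based geolocation service, and a captures.json log file; these names, the prompt text, and the record schema are illustrative stand-ins rather than the project's actual code, and the Frame's Bluetooth capture step is omitted (the photo is assumed to already be saved to disk).

```python
import base64
import json
import time
from pathlib import Path

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint (assumed)
LOG_FILE = Path("captures.json")                    # hypothetical metadata log


def describe_image(photo_path: Path) -> str:
    """Ask a locally hosted LLaVA model (via Ollama) for a short caption."""
    image_b64 = base64.b64encode(photo_path.read_bytes()).decode()
    resp = requests.post(OLLAMA_URL, json={
        "model": "llava",
        "prompt": "Describe this photo in one or two sentences.",
        "images": [image_b64],
        "stream": False,
    }, timeout=120)
    resp.raise_for_status()
    return resp.json()["response"].strip()


def locate_via_ip() -> dict:
    """Approximate location from the network address (ipinfo.io as one example service)."""
    info = requests.get("https://ipinfo.io/json", timeout=10).json()
    lat, lon = (float(x) for x in info["loc"].split(","))
    return {"lat": lat, "lon": lon, "city": info.get("city", "")}


def log_capture(photo_path: Path) -> dict:
    """Combine photo path, caption, location, and timestamp into one metadata record."""
    record = {
        "photo": str(photo_path),
        "caption": describe_image(photo_path),
        "location": locate_via_ip(),
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
    }
    records = json.loads(LOG_FILE.read_text()) if LOG_FILE.exists() else []
    records.append(record)
    LOG_FILE.write_text(json.dumps(records, indent=2))
    return record


if __name__ == "__main__":
    # Assumes the Frame has already saved a capture to disk; the Bluetooth
    # pairing and capture step is not shown here.
    print(log_capture(Path("photos/frame_capture.jpg")))
```

Appending each record to a single JSON file keeps the backend simple and lets the map frontend reload the full capture history in one request.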
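The project's frontend renders this log on a Leaflet map; purely as an illustrative stand-in for that interface, the sketch below uses folium, a Python library that generates Leaflet maps, to place one marker per record with a caption-and-timestamp popup. File names and field names follow the hypothetical schema above.

```python
import json
from pathlib import Path

import folium  # Python library that renders Leaflet maps

# Metadata written by the capture script above (hypothetical file name).
records = json.loads(Path("captures.json").read_text())

# Centre the map on the first capture, or a default location if the log is empty.
start = records[0]["location"] if records else {"lat": 0.0, "lon": 0.0}
fmap = folium.Map(location=[start["lat"], start["lon"]], zoom_start=13)

for rec in records:
    loc = rec["location"]
    popup_html = f"<b>{rec['timestamp']}</b><br>{rec['caption']}"
    folium.Marker(
        location=[loc["lat"], loc["lon"]],
        popup=folium.Popup(popup_html, max_width=250),
        tooltip=rec["photo"],
    ).add_to(fmap)

fmap.save("frame_map.html")  # open in a browser to explore the captured moments
```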

Reflection

The project successfully demonstrated the integration of AI-driven visual descriptions with spatial data visualization, showcasing the feasibility of combining AR hardware with AI for creating spatially and semantically rich image logs. The interactive map provided a user-friendly interface for exploring captured moments. However, the reliance on IP-based geolocation limited the accuracy and precision of spatial metadata, highlighting the need for GPS-based enhancements in future iterations. While this proof-of-concept showed promise, further development could also focus on refining generated captions and improving real-time synchronization between captures and map updates.

What Worked

  • Seamless integration of image capture, AI description generation, and location mapping.
  • Interactive map visualization of captured photos with descriptive popups.
  • Reliable image capture mechanism with Frame smart glasses.
  • Accurate generation of textual descriptions for captured images via LLaVA.
  • Efficient logging and robust backend for storing metadata (timestamps, descriptions, geolocations, photo paths) for frontend visualization.

What Did Not Work

  • IP-based geolocation lacked precision compared to GPS, limiting spatial accuracy.
  • Latency in API responses and occasional delays in synchronizing real-time photo captures with map updates, especially during high-frequency use.
  • Negligible augmented reality (AR) immersion from the hardware/software setup.
  • Image quality captured by the Frames was poor, exhibiting visible banding and noise artifacts.
  • Nighttime image capture quality suffered significantly from excessive color banding, rendering photos nearly unusable in low-light conditions.

GitHub

calluxpore/FrameMap-Photo-Capture-Description-and-Location-Visualization

calluxpore/FrameVision-Smart-Image-Capture-Description-with-Location
