Overview of Seeing AI
Seeing AI is a groundbreaking, free mobile app developed by Microsoft, designed specifically to assist individuals who are blind or have low vision. This innovative app leverages advanced AI technology to narrate and describe the visual world, enhancing users’ independence and interaction with their surroundings.
Key Features and Functionality
Channels for Various Tasks
Seeing AI is organized into several channels, each tailored to assist with different daily tasks:
- Short Text: Speaks small amounts of text, such as labels, package information, or text on a computer screen, without requiring an internet connection.
- Documents: Provides audio guidance to capture and read printed pages, including their original formatting. Users can also ask questions about the scanned document.
- Products: Scans barcodes and QR codes, using audio beeps to guide the user, and announces the product name and package information when available.
- People: Recognizes and describes people, including their age, gender, and facial expressions. It can also be trained to recognize specific faces.
- Currency: Identifies currency notes, which is particularly useful in regions where notes of different values do not have distinct tactile markings.
- Scenes: Describes the scene in front of the camera, with an initial brief description and the option to tap for more detailed information. Users can explore photos by moving their finger over the screen to hear the location of different objects.
Additional Features
- Colors: Identifies the colors of objects pointed at by the camera, without needing an internet connection.
- Handwriting: Reads handwritten text, such as in greeting cards, though this feature is available only in a subset of the supported languages.
- Light: Generates an audible tone corresponding to the brightness of the surroundings.
- Images in Other Apps: Allows users to share images from other apps like Mail, Photos, and Twitter to get descriptions of the images.
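The Light channel's behavior can be illustrated with a minimal sketch: a camera luminance reading is mapped to a tone frequency so brighter surroundings produce a higher pitch. The linear mapping and the 200–2000 Hz range here are assumptions for illustration, not Seeing AI's actual implementation.

```python
# Hypothetical sketch of a light-detector channel: map a normalized
# camera luminance reading (0.0 = dark, 1.0 = bright) to an audible
# tone frequency. The linear mapping and frequency range are assumed.

def brightness_to_tone(luminance: float,
                       low_hz: float = 200.0,
                       high_hz: float = 2000.0) -> float:
    """Linearly map luminance in [0, 1] to a tone frequency in Hz."""
    luminance = max(0.0, min(1.0, luminance))  # clamp out-of-range readings
    return low_hz + luminance * (high_hz - low_hz)
```

With these assumed defaults, a dark room (`luminance=0.0`) yields a 200 Hz tone and full brightness a 2000 Hz tone, giving the user a continuous audio cue as they sweep the camera around.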
Advanced Capabilities
- World Channel with Audio Augmented Reality: Provides an immersive experience using Spatial Audio (requiring a device with LiDAR and iOS 14), allowing users to explore unfamiliar environments by hearing objects announced around them.
- Indoor Navigation: Enables users to create routes through buildings and navigate using sound cues (requires a device with an A9 or later processor and iOS 14).
- Explore Photos by Touch: Leveraging Azure Cognitive Services, the app lets users tap anywhere on an image to hear a description of the object under their finger and its spatial relationship to other objects.
Multi-Platform Support and Languages
Seeing AI is available on both iOS and Android platforms, making it accessible to a wide range of users. The app currently supports 18 languages, with plans to expand to 36 in 2024.
User Experience and Feedback
The app is continuously evolving based on feedback from the blind and low vision community. Users can customize the order of channels, access face recognition more easily, and receive audio cues when the app is processing images. The app’s design and updates are driven by the principle of “nothing about us, without us,” ensuring that the needs and suggestions of the community are integral to its development.
Seeing AI represents a significant advancement in assistive technology, empowering individuals with visual impairments to navigate and interact with their world more independently and confidently.