Ian McD • Artist of Emerging Mediums with a Variety of Complex Projects
sas.png

Synthaisthesia

Synthaisthesia Handheld Device Prototype

 

Synthaisthesia :: (Synth+AI+Synesthesia)

Synthaisthesia is an open-source assistive technology suite I developed at the Smithsonian Institution. R&D for the suite was made possible by grants from Smithsonian Year of Music and the Smithsonian Accessibility Innovations Fund (SAIF). In its current form, the suite consists of two software applications, a handheld device and a hardware setup for touchless interfacing with the software using a Leap Motion controller.  

Synthaisthesia establishes a new method and medium for visual description and is intended for, but not limited to, use in a museum. Typically, when presented with an image on a website or a work of art in a museum, people with low vision or blindness are given a verbal description to help them comprehend it. These descriptions, whether delivered by a screen reader or a human voice, tend to be short and general, limiting the listener’s understanding of the image’s composition. Although this practice is better than nothing at all, it does not reflect the true impact of imagery. The phrase “a picture is worth a thousand words” does not mean that a picture can be described in a thousand words, but rather that it cannot be contained by words alone; imagery can describe things that cannot always be conveyed verbally, like sensation, emotion, rhythm, and illusion. Synthaisthesia was developed to address this problem by creating an audio experience parallel to the visual experience of art, so that blind and low-vision visitors can have a better understanding of the visuals in museums.

Project Philosophy & Background 

The initial idea for creating a device that would translate visuals into sound came from an interest in sensory substitution. Our senses allow us to interact with each other and the world around us, so if someone lacks one of their senses, are they unable to communicate with others or experience external, everyday reality? Of course not. They learn to adapt by swapping in other senses as substitutes. The sightless use touch to read, the soundless use sight to speak, and even those who have lost their sense of balance can find it again using their tongue. Though forms of sensory substitution have been around for the span of human history, it wasn’t until the middle of the 20th century that it became a studied subject. Recently, with the explosion of technology and contemporary topics like artificial intelligence, computer vision and wearable technology, a rich landscape of opportunities to explore sensory substitution has opened. Inspired by this young-yet-rich history, Synthaisthesia set out to add a new chapter to the field.

 

 

The .sas file

Central to the Synthaisthesia experience is an image file that allows the user to both see and hear its content, known as a .sas file. Sas is shorthand for Synthaisthesia, but I also thought it would be nice to invent the first image file that has sass as its central trait. The sas file is almost a combination of a vector and a raster image. The vector-like elements are coordinates that outline the image’s key regions, each of which contains audio files describing that region. These regions are invisible to the eye, but allow the computer’s cursor to be aware of their location in the image. The raster element is simply the visible pixel image, which makes the file appear to be a regular image at first glance. Once the user moves the cursor inside the image, it activates the audio within the vector elements. By navigating the image, feeling and hearing what its elements are and how they relate to each other, a blind user can better understand the composition, rhythm, and patterns within the work of art, a feat that is much harder to achieve with a general description. The audio descriptions can be verbal, to give the user a literal sense of the image details, or tonal/musical, to give the user a sense of elements that cannot be conveyed by words alone. The volume of each description can be adjusted by the user so they can mix to their taste. Demonstrations of the sas file are below.
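
To make the vector-plus-raster idea a little more concrete, here is a minimal Python sketch of how traced regions could be matched against the cursor. It is not the actual .sas format or the code in Synthaisthesia; the Region class, its field names, the file names and the ray-casting hit test are all assumptions made for illustration.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class Region:
    """One traced region of an image (hypothetical structure, not the real .sas layout)."""
    name: str
    outline: list[Point]   # polygon vertices in image pixel coordinates
    verbal_audio: str      # path to the spoken description (assumed field)
    tonal_audio: str       # path to the tonal/musical description (assumed field)


def contains(outline: list[Point], point: Point) -> bool:
    """Even-odd ray casting: count how many polygon edges a ray to the right crosses."""
    x, y = point
    inside = False
    n = len(outline)
    for i in range(n):
        x1, y1 = outline[i]
        x2, y2 = outline[(i + 1) % n]
        # Only edges that straddle the cursor's y position can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def regions_under_cursor(regions: list[Region], cursor: Point) -> list[Region]:
    """Return every region whose outline contains the cursor."""
    return [r for r in regions if contains(r.outline, cursor)]


# Two made-up rectangular regions standing in for traced shapes.
tail = Region("tail", [(0, 0), (40, 0), (40, 40), (0, 40)], "tail_verbal.wav", "tail_tonal.wav")
head = Region("head", [(60, 0), (100, 0), (100, 30), (60, 30)], "head_verbal.wav", "head_tonal.wav")
regions = [tail, head]
```

In this sketch, a cursor at (20, 20) falls inside the tail region, so a player could start that region's verbal and tonal files, while a cursor at (50, 50) falls inside nothing and would trigger no audio.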


 

Demonstration of a sas file’s verbal description mode. A mouse hovers over areas of Jim Sudduth’s painting, Untitled (Chicken), starting on the tail at the left side of the painting and working its way up toward the top left where the chicken’s head resides.


Demonstration of a sas file’s tonal description mode. A mouse hovers from the left side of Gene Davis’ painting, Hot Beat, to its right. Each colored stripe has a distinct instrument sound, reflecting its emotional nature.

Demonstration of the sas file’s verbal and tonal descriptions combined. A mouse hovers over the areas of a poster for the 1969 Woodstock Music and Art Fair. Red areas are illustrated by a drum beat to reflect the punchy color. A bass or guitar joins in the beat when the mouse hovers over the newsprint-yellow areas. The guitar also references the fingers and bird fingers clutching the guitar. As the mouse hovers, the underlying regions are described in the lyrics. Not all lyrics are heard in this demonstration.


 

“It gives me a spatial orientation…
I know there’s a border, and there is black,
and there’s the chicken and it’s facing to the right, and I understand the painting probably better than I’ve understood another painting.”

-Candice Jordan-

 

 

But How?!

The sas files, of course, are not auto-embedded through some AI-generated process. Such automation would lack a human element, at least by current AI capabilities (though arguably for any AI capabilities, given its nature). Though automated accessibility systems are feasible in some capacities, like the text-to-speech found in screen readers, these types of systems lack a human touch, and in a museum setting a cold, automated approach seems inappropriate. The solution was to design my own interface for creating sas files. The interface (pictured below) is a hybrid of a Digital Audio Workstation (DAW) and a graphics program. It allows the user to import an image and trace over its key regions. Every shape traced has two audio files attached, which are activated during the .sas file experience. Additional parameters can be modified in the application to affect things like the audio files’ play direction, panning and focus-area activations. More details about the interface are described in the two images below.


 
 

The Synthaisthesia Content Creation Application

The image below illustrates the software interface I made to create .sas files. It’s similar to Adobe Illustrator in its panel-based design. Descriptions of each panel are below the image.

 
sas-software-layout2-01.png
 
  1. The Canvas:
    The image to be described is loaded in and traced here. After tracing, the computer’s folder system prompt opens and the user can select two files for tonal and verbal descriptions.

  2. Track Properties Panel:
    Adjusts aspects of the selected shape/track’s audio. Files can be set to loop, to play forward, backward or ping-pong, and to reset to the beginning every time the shape is entered. The audio mode can also be switched between sampling and granular synthesis.

  3. Tracks/Shapes Panel:
    Every time a shape is drawn, it adds a new track here. An equivalent in a graphic design program would be the layers panel. The square on the left side indicates the track’s color for easier navigation. The buttons on the right and bottom are for soloing or muting the audio, hiding the shape, or loading a new audio file.

  4. The Sample Panel:
    Allows the user to load more samples onto a shape. By default each shape starts with two audio samples, for tonal and verbal description. Description types can be selected with the description buttons. Arbitrary numbers can be entered in the left and right text boxes to set the range of focus sizes that activates the sample. Each sample has a text box for a name, plus solo, mute and new file buttons.

  5. File Panel:
    Allows the user to save and load .sas files, the native file to Synthaisthesia. Master volume for tonal and verbal description can be set here.

  6. Audio Control Panel:
    Allows the user to adjust the volume and panning of the selected audio track, and provides another place to solo or mute all of the samples associated with the track.

  7. Shape Modifiers Panel:
    Allows the user to adjust the inner or outer radius, the difference of which controls how quickly or slowly the volume fades as the user enters and exits the shape. Circle shapes are really just polygons with a lot of sides; the number of sides can be reduced all the way down to three using the bottom slider. The panel also has buttons for hiding and viewing the radii, snapping to other shapes’ points while drawing, and converting a shape to a path (close shape).
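
As a rough illustration of the Shape Modifiers panel, the Python sketch below shows one way an inner and outer radius could control a volume fade, and how a "circle" can be built as a many-sided polygon. The function names, the linear fade curve and the choice to measure distance from the shape's center are assumptions for this sketch, not a description of how the application actually computes its audio.

```python
import math


def fade_gain(distance: float, inner_radius: float, outer_radius: float) -> float:
    """Volume multiplier between 0.0 and 1.0 for a cursor at `distance` from the shape center.

    Assumption: full volume inside the inner radius, silence beyond the outer
    radius, and a linear fade in between.
    """
    if outer_radius <= inner_radius:
        # No fade zone: the audio switches on and off at the inner radius.
        return 1.0 if distance <= inner_radius else 0.0
    t = (outer_radius - distance) / (outer_radius - inner_radius)
    return max(0.0, min(1.0, t))


def polygon_points(center: tuple[float, float], radius: float, sides: int) -> list[tuple[float, float]]:
    """Vertices of a regular polygon; with many sides it reads as a circle."""
    if sides < 3:
        raise ValueError("a polygon needs at least three sides")
    cx, cy = center
    return [
        (cx + radius * math.cos(2 * math.pi * k / sides),
         cy + radius * math.sin(2 * math.pi * k / sides))
        for k in range(sides)
    ]
```

With an inner radius of 50 and an outer radius of 100, a cursor 75 pixels from the center would play at half volume in this sketch. Bringing the two radii closer together makes the fade more abrupt; spreading them apart makes it more gradual.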


Focus-Areas

Each .sas image has the ability to be multi-dimensional. I know this is a strange idea for a 2D image, but I don’t mean 3D. Rather, every image can be considered multi-dimensional in terms of its detail. The eye has the ability to look at an image as a whole, but it can also refocus and see regions within the composition, and even minutiae like the texture of brushstrokes. To imitate this ability, every .sas file can play different audio files depending on the focus area, illustrated by the red square around the cursor. This focus area acts as a parallel to the eye’s ability to focus on different levels. Expanding the focus activates more general image descriptions; contracting it activates more detailed ones. An example is illustrated by the three images below with Jim Sudduth’s painting, Untitled (Chicken). More general descriptions are in the image on the left, medium-detail descriptions are in the image in the middle, and finer details are in the image on the right. The shapes superimposed on the image are traced regions, which would have audio files that read the descriptions below each image.

 

Focus Area: 67-100 - General Description

Untitled, parenthesis, Chicken, by Jimmy Lee Sudduth, circa 1995. A two-dimensional side profile of a rightward-facing chicken, painted on plywood with clay and syrup. The animal is colored white and red with hints of orange-brown in its tail. Against the deep-black background on which it is painted, the bird stands out. The painting does not make much of an attempt at realism; it’s expressive and messy, but in a good way. You can truly feel the artist’s hand and passion. Around the edge of the painting is a brown-dominant, orange-brown border, aesthetically fencing the creature in. In the bottom-left under the bird’s tail is Sudduth’s messy, cursive signature, loudly declaring who painted this work.

Sudduth Detail Example-02.png

Focus Area: 34-66 - Medium Detail Description

  1. “Jim Sudduth,” the artist’s signature.

  2. The chicken’s feet appear to have been painted in rapid downward-right strokes. They are painted in white clay, with some of the black background showing through. There is a small blotch of red clay at the center of the chicken’s left leg, perhaps accidentally dripped.

  3. The chicken’s breast is almost a perfect half circle, painted with white, almost anchoring the animal directly in the center of the square plywood used for a canvas. It feels whole and stable, acting as a base for the animal.

  4. The chicken’s tail is a blending of white and red clay, with subtle strokes of black bleeding through from the background. The brushstrokes are visible, long and form arches that flow up toward the chicken’s back, giving the painting a sense of energy.

  5. The chicken’s wing isn’t really apparent. There is a slim border of orange-brown at the top of the bird’s breast that hints at it, but there isn’t much distinction, aside from the color difference of the red and grey above the orange-brown and the white below.

  6. The chicken’s neck is thick at the bottom and thins out toward the head like a bottle neck.

  7. The chicken’s wattle is a thick red blob hanging from its beak. The wattle pairs with the comb on the top of its head to act as a frame for the bird’s eye, emphasizing it as the painting’s focal point.

  8. The chicken’s beak opens with its lower half jutting out horizontally and its upper half tilted at a forty-five degree angle, giving a sense of energy. Due to the painterly application, the emotion of this energy is a bit ambiguous. It could be angry, joyous, frightened or hopeful; it’s left up to the viewer, and one might argue the painting’s beauty lies in that ambiguity.

  9. There is a single eye smack-dab in the chicken’s head. Two white, broken, blobby strokes of clay form an outline of a circle, which although lopsided, reads effectively as the chicken’s eye. Both inside and outside the circle the black background of the painting shows through.

  10. The chicken’s comb, the red mohawk-like protrusion on its head, is painted in pastel-red clay with strokes that look like they were created in quick, upward motions.

  11. A brown border surrounds the chicken, almost like mud smeared across the floor. There was clearly no attempt to be exact here, but that makes it feel natural and satisfying.

Sudduth Detail Example-03.png

Focus Area: 0-33 - Fine Detail Description

  1. The area around the artist’s signature is lightly speckled with white paint, giving it a chalkboard-like texture.

  1. A stray red brush stroke, slanted slightly upward, echoing the brush motion of the paint surrounding it.

  2. The chicken’s talons are made of thin, wiry strokes, almost like twigs broken off a branch.

  3. Subtle orange-brown strokes, matching the color of the painting’s outer border, weave their way in between the messy, more dominant red and white.
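
The three focus ranges used in the example above (0-33, 34-66 and 67-100) suggest a simple lookup: each audio sample carries a focus range, and the current focus size decides which samples are active. The Python sketch below shows that idea under stated assumptions. The Sample class, its field names and the inclusive range check are hypothetical, not taken from the Synthaisthesia source.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """An audio sample with the focus-size range in which it plays (assumed structure)."""
    name: str
    focus_min: int   # smallest focus size that activates the sample
    focus_max: int   # largest focus size that activates the sample


def active_samples(samples: list[Sample], focus_size: int) -> list[Sample]:
    """Return the samples whose range includes the current focus size (bounds inclusive)."""
    return [s for s in samples if s.focus_min <= focus_size <= s.focus_max]


# Ranges copied from the example headings; the sample names are made up.
chicken_samples = [
    Sample("general description", 67, 100),
    Sample("medium detail: the chicken's tail", 34, 66),
    Sample("fine detail: stray red brush stroke", 0, 33),
]
```

In this sketch, turning the focus knob down from 80 to 50 to 10 would move a listener from the general description to the medium-detail one and then to the fine detail, mirroring an eye that moves in closer to the painting.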

 

The Handheld Device

I was trying to create an experience that was both cross-sensory and accessible, so I wanted to construct a handheld that felt as interesting as it looked. If a museum wanted to create its own Synthaisthesia device, the barrier had to be low. Because it was intended for open-source distribution, it had to be easy to assemble, so I whittled the design down to three main parts: a handle, a lid and a body. The parts snap together simply by friction, and are secured with a single bolt in the back of the handle. Three additional parts, small discus-shaped knobs, are included in the .stl (3D print) file to attach to the potentiometers for tonal and verbal volume control and focus area adjustment.

The handle contained a camera, a button and an audio jack for headphones. The body was carved out to fit a Raspberry Pi with a custom circuit board HAT. The HAT used a custom schematic to hold an accelerometer and an integrated circuit (IC) to process the potentiometer signals. The body also contained channels for wiring and the camera ribbon, holes for the potentiometers, and an area for the charging port.

 
The 5 parts, disassembled.

The circuit designed and soldered by me. The circuit allows the Raspberry Pi to hold the accelerometer and the IC as a HAT (Hardware Attached on Top). The board could snap directly onto the Pi, creating less hassle in assembly.

 
 

(More Soon)