I'm working on an exciting game called Our Story where players collaboratively create narrated stories with GIFs. The game runs in the browser and incorporates AI TTS (Text To Speech) to narrate the stories players create. To ship this game I needed to handle a lot of web development complexity, and one of the largest hurdles I found was handling audio.
The Problem
Web Audio is a challenging problem because of the vast quantity of devices you need to support. Fortunately, browsers have aligned more and more closely around a key specification provided by the W3C, which makes the job easier. However, mobile is still a challenge.
The way you might have integrated audio in the years leading up to today is to use the standard Audio constructor.
const audio = new Audio("https://example.com/audio.mp3")
Under the hood, this constructor creates an HTML <audio> element and gives you a reference to that element. Handling events and interactions then works the same way it does on a regular audio element, which makes the interface familiar to use.
The big issue with this approach is that when you're trying to handle multiple sources of audio you need to juggle multiple references. You also need to handle errors across all of them if audio encounters an issue on the client. This leads to especially complex code when you're using audio duration as a timer for an event in a game, or when you want to sync up audio sources.
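To make that concrete, here is a rough sketch of the kind of bookkeeping this forces on you (the URLs and track names are illustrative, not from the real game):
const narration = new Audio("https://example.com/narration.mp3")
const music = new Audio("https://example.com/music.mp3")
const tracks = [narration, music]

tracks.forEach((track) => {
  // Every element needs its own error handler...
  track.addEventListener("error", () => {
    // ...and a failure in one means manually stopping the others
    tracks.forEach((other) => other.pause())
  })
})

// Starting the sources together is also manual, and play() can reject on mobile
Promise.all(tracks.map((track) => track.play())).catch(() => {
  /* surface an "enable audio" prompt here */
})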
To make matters worse, mobile browsers (especially Safari) add special conditions to ensure that the user actually means for audio to play. They will automatically block new sources of audio unless the user specifically clicks an element that signals it is OK for audio to play. You can get creative to work around this, but now you're really having to write a lot of code.
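Here is a rough sketch of one such workaround with a plain Audio element, retrying playback from inside a user gesture (the element id is hypothetical):
const audio = new Audio("https://example.com/audio.mp3")

audio.play().catch(() => {
  // On mobile Safari, play() rejects until the user has interacted with the page,
  // so we have to re-trigger playback from an explicit click
  const button = document.querySelector<HTMLButtonElement>("#enable-audio")
  button?.addEventListener("click", () => audio.play(), { once: true })
})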
How the Web Audio API Helps
The newer Web Audio API standard steps in here to make things much easier to handle. An AudioContext helps to aggregate audio sources and define top-level error handling and source management. Each audio source can then customize a "pipeline" of configurations and listeners that enable granular tweaks. It feels like a great abstraction for working with audio.
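As a quick illustration of the pipeline idea before we dig into the details, here is a sketch of a single source routed through a GainNode for per-track volume (the volume tweak is just an example, not something Our Story necessarily does):
const context = new AudioContext()

// Each source gets its own chain of nodes before reaching the shared destination
const source = context.createBufferSource()
const gain = context.createGain()
gain.gain.value = 0.5 // per-track volume tweak

source.connect(gain)
gain.connect(context.destination)
// source.buffer still needs to be set with decoded audio, which we cover below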
Let's start by showing how the AudioContext is defined
const WindowAudioContext = window.AudioContext || window.webkitAudioContext
const context = new WindowAudioContext()
Above you can see the fallback check to enable Safari support via the webkit-prefixed constructor. It's one of two accommodations we need to make to get the new API working, since Safari still doesn't support the full spec.
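If you're writing TypeScript, the prefixed constructor isn't in the standard DOM typings, so a small declaration (an assumption about your build setup, not part of the API itself) keeps the fallback compiling:
declare global {
  interface Window {
    // Only actually present in Safari; declared so the fallback check type-checks
    webkitAudioContext: typeof AudioContext
  }
}

const WindowAudioContext = window.AudioContext || window.webkitAudioContext
const context = new WindowAudioContext()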
Next let's load an audio source into a format that can be used with the new API
const context = new WindowAudioContext()
// ...
const response = await fetch(requestUrl, { mode: "cors" })
const arrayBuffer = await response.arrayBuffer()
const audioBuffer: AudioBuffer = await new Promise((resolve, reject) =>
context.decodeAudioData(arrayBuffer, resolve, reject)
)
Here we make a fetch call to get our audio data, read it as an array buffer, and finally decode that audio data using our previously defined context. Make sure that CORS is enabled for the fetch request or you won't be able to read the array buffer. Now we have the audio in a shape that we can load into the AudioContext.
Note: we're wrapping the decodeAudioData call in a Promise because of our second unsupported Safari feature. In Safari this function does not return a promise and only offers the callback interface, so I converted it for convenience. You can read more about it in the MDN browser compatibility section for this function.
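If you load many clips, it can be convenient to fold the fetch and decode steps into one helper. Here is a sketch (the function name and signature are mine, not part of the Web Audio API):
async function loadAudioBuffer(context: AudioContext, requestUrl: string): Promise<AudioBuffer> {
  const response = await fetch(requestUrl, { mode: "cors" })
  const arrayBuffer = await response.arrayBuffer()

  // Callback form wrapped in a Promise so it also works in Safari
  return new Promise((resolve, reject) =>
    context.decodeAudioData(arrayBuffer, resolve, reject)
  )
}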
Next let's load this audio buffer into our context
const trackSource = context.createBufferSource()
trackSource.buffer = audioBuffer
trackSource.connect(context.destination)
trackSource.start()
Using the context we can create a buffer source compatible with that context. We assign the buffer we previously decoded and connect the track to the context's destination. In this case that destination is the default: the browser's audio output. Finally we start the track.
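One detail worth knowing: an AudioBufferSourceNode is one-shot, so start() can only be called once per node. To replay a sound you create a fresh source from the same decoded buffer, which a small helper like this sketch can hide (the helper name is mine):
function playBuffer(context: AudioContext, audioBuffer: AudioBuffer): AudioBufferSourceNode {
  // Source nodes are cheap to create; the decoded AudioBuffer is what we reuse
  const trackSource = context.createBufferSource()
  trackSource.buffer = audioBuffer
  trackSource.connect(context.destination)
  trackSource.start()
  return trackSource
}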
This looks like much more code to get the same result as the Audio constructor, but it is pretty straightforward. The real power shows up when we start to deal with some of the edge cases I mentioned earlier.
For example, let's start by taking a look at what we do when the audio fails to start
const [contextState, setContextState] = useState(context.state);

useEffect(() => {
  setContextState(context.state);
  const onStateChange = () => setContextState(context.state);
  context.addEventListener("statechange", onStateChange);
  // Remove the listener if the context changes or the component unmounts
  return () => context.removeEventListener("statechange", onStateChange);
}, [context]);

// ...

{contextState === "suspended"
  ? <Button
      label="Enable Audio"
      onClick={() => {
        context.resume();
      }}
    />
  : <p>Audio Playing</p>}
Here I used some React code to show how I handle the audio context and check for a suspended state. A suspended state happens when a mobile user hasn't interacted with a webpage yet and the browser has kept audio from playing. It's the default on mobile so that pages and ads can't annoy users with unwanted sound while they browse on their phones.
Looking at the above code, I simply show a button when this happens and use the context.resume() call from the API to start the audio. Because the user has interacted with the page at that point, the browser now allows audio to play. This resumes all tracks tied to the AudioContext.
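If you're not using React, the same idea works with a one-time listener on any user gesture; a minimal sketch:
const unlock = () => {
  if (context.state === "suspended") {
    context.resume()
  }
}

// Either gesture counts as user interaction; resume() is safe to call more than once
document.addEventListener("click", unlock, { once: true })
document.addEventListener("touchstart", unlock, { once: true })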
Now if we want to create a timer on one specific track, we can do that easily as well by listening for when that track ends.
trackSource.addEventListener("ended", () => {
timerTrigger()
})
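If the track length is driving a game timer, it can also be handy to wrap this event in a Promise; here is a sketch of that wrapper (the helper name is hypothetical):
function playUntilEnded(trackSource: AudioBufferSourceNode): Promise<void> {
  return new Promise((resolve) => {
    trackSource.addEventListener("ended", () => resolve(), { once: true })
    trackSource.start()
  })
}

// e.g. await the narration finishing, then advance the round
// await playUntilEnded(trackSource)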
When we want to clean up our work once we're finished, it's straightforward too
trackSource?.stop()
trackSource?.disconnect()
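When the whole session is over you can also release the context itself; a sketch of a fuller teardown (assuming you won't play more audio without creating a new context):
function teardown(trackSource: AudioBufferSourceNode | undefined, context: AudioContext) {
  trackSource?.stop()
  trackSource?.disconnect()
  // close() releases the audio hardware and returns a Promise
  return context.close()
}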
All in all, this is a much better API than working with a series of independent Audio elements. It enables a better experience for users across multiple browsers, and it handles errors and event tracking with a good level of granularity. Here's a link if you want to read more about the API. I look forward to listening to what you come up with as we work together to make the web a little bit more interactive!