32-bit Audio Fight Club: The Saturn, the PlayStation and the End of the Chiptune Era

The first rule of 32-bit Audio Fight Club is that you don’t talk about 32-bit Audio Fight Club. Actually, that’s not quite true: the first rule of 32-bit Audio Fight Club is that…well…you probably wouldn’t even think to talk about 32-bit Audio Fight Club. Or register it exists at all, really.

You see, by the time console hardware manufacturers were thinking about moving on to machines with fancy “32-bit” processors, other hardware developments were afoot. Most importantly for this article, manufacturers were steadily moving away from storing games on ROM chips housed in plastic cartridges and towards burning them onto cheap and relatively spacious Compact Discs. Suddenly, most consoles had the ability to simply stream “real” music straight from the disc, and the sound chips found in machines like the Sega Megadrive and Super Nintendo were rendered completely superfluous. Or were they?

As is the case with most technology, the new opportunities offered by Compact Disc were tempered by some new drawbacks. To quote veteran games composer Yasuhito Saito (interviewed in the excellent The Untold History of Japanese Game Developers: Volume One):

There was a great sense of freedom at the beginning using CDDA for the music. It provided me with unprecedented freedom as a composer, and because there was no longer the limitations of a sound chip, I was happy to start with. But then I realised they were using the CD for data, and when the data reading started then I had to stop the music. So later on I found it was not as free as I thought it would be.

The contended disc access identified by Saito wasn’t the only drawback – there were some even more fundamental limitations to deal with as well. Though sound chips tended to have limitations when it came to sound quality, one of their big advantages was that their performance was a live one. Whether they were playing tiny fragments of sampled audio (like the Super Nintendo) or synthesized sounds (like the Megadrive), their musical output was created from small fragments of note data sent on the fly – not unlike a musician reading a traditional score.

This not only kept the size of the data needed to play the music small (the console only needed to read the notes it had to play rather than a recording of the music itself), it also meant that the music could be altered as it was being read. For example, Sega of America’s in-house sound driver, GEMS, allowed composers to alter music for both creative and practical reasons: on the creative side, GEMS could change the music based on variables found in the game code (say, the player running out of time), while on a more practical level a prioritisation system allowed composers to choose which instrument tracks could be dropped from the mix if the game needed to play a sound effect instead.
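To make that prioritisation idea a little more concrete, here is a minimal sketch of the sort of channel-stealing logic such a driver could apply. This isn’t Sega’s actual code – the structures and names are invented for illustration – but the principle is the same: when a sound effect needs a channel, the least important music track gives way.

```c
/* Hypothetical channel-stealing logic, loosely in the spirit of a driver
 * like GEMS. None of this is Sega's actual code: the structures and names
 * are invented purely to illustrate the prioritisation idea. */
#include <stdio.h>

typedef struct {
    const char *name;   /* instrument track using this channel        */
    int priority;       /* higher = more important to keep in the mix */
    int in_use;         /* 1 if currently playing music               */
} Channel;

/* Find the least important active music channel and free it for a sound effect. */
static int steal_channel_for_sfx(Channel ch[], int count) {
    int victim = -1;
    for (int i = 0; i < count; i++) {
        if (ch[i].in_use && (victim < 0 || ch[i].priority < ch[victim].priority))
            victim = i;
    }
    if (victim >= 0)
        ch[victim].in_use = 0;  /* drop this track until the effect has finished */
    return victim;
}

int main(void) {
    Channel ch[3] = {
        { "bass",   5, 1 },
        { "melody", 9, 1 },
        { "shaker", 1, 1 },     /* marked least important by the composer */
    };
    int stolen = steal_channel_for_sfx(ch, 3);
    if (stolen >= 0)
        printf("Sound effect takes channel %d, muting the '%s' track\n",
               stolen, ch[stolen].name);
    return 0;
}
```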

You need to play Tommy Tallarico’s GEMS-based Cool Spot soundtrack across two Megadrive consoles if you fancy hearing every instrument in the mix.


If chiptunes were small, fast and live, CD-based audio was big, slow and dead: each minute of CD-quality PCM music used approximately ten megabytes of storage space and – aside from crudely speeding the entire recording up and down – there were no real changes that the game code could make to it on the fly. Perhaps the biggest problem, however, was the spinning of the disc itself: where data stored on cartridges had been (from the user’s perspective) instantaneous to access, CD drives had to physically find the correct sector of disc to read from. This added a delay that could be measured in seconds rather than milliseconds – often leading to a moment of audible silence if a track was changed mid-game. Compact Disc gifted composers a straightforward way of including “real” music in their games, but it did so at a higher price than it is generally given credit for.
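If you want to sanity-check that “ten megabytes a minute” figure, the arithmetic is straightforward – Red Book CD audio is 44,100 sixteen-bit samples per second across two channels:

```c
/* Back-of-envelope check on the "ten megabytes a minute" figure for Red Book
 * CD audio: 44,100 samples per second, sixteen bits per sample, two channels. */
#include <stdio.h>

int main(void) {
    long bytes_per_second = 44100L * 2 /* bytes per sample */ * 2 /* channels */;
    long bytes_per_minute = bytes_per_second * 60;
    printf("%ld bytes per minute (~%.1f MB)\n",
           bytes_per_minute, bytes_per_minute / (1024.0 * 1024.0));
    /* Prints 10584000 bytes per minute -- roughly 10.1 MB of data per minute of music. */
    return 0;
}
```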


To what extent was this really a problem? Usefully, if we wish to see this bottleneck in action, we need look no further than the Saturn port of classic gore-fest Mortal Kombat II. Though on the surface all appears well – with each battle starting in the way a Mortal Kombat veteran would expect – it only does this because the game stores each stage’s music and the most common sound effects (punches, kicks, grunts, screams and the iconic “FIGHT!”) in a readily-available cache. The second a non-cached sound is needed, the game pauses momentarily as the laser moves to the correct place on the disc in order to find the sound effect.

An example of the sound stutter found in the Saturn version of Mortal Kombat II

In the event, the Saturn has the ability to cache sound effects, so this flaw isn’t exactly game-breaking. However, if the Saturn had to read every sound effect from the disc then this delay would occur every single time a sound effect was called – creating a significantly larger headache for the player.
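As a toy illustration of why that cache matters: anything already sitting in audio RAM can start playing immediately, while anything else has to wait for the laser to seek. The sound names and the 300-millisecond delay below are purely illustrative, not measurements from the actual game.

```c
/* A toy model of the cache-or-seek trade-off. The sound names and the
 * 300 millisecond figure are illustrative only -- they are not measurements
 * taken from the actual game. */
#include <stdio.h>
#include <string.h>

#define CACHED_SOUNDS 5

static const char *cache[CACHED_SOUNDS] = {
    "punch", "kick", "grunt", "scream", "fight"
};

/* Returns an approximate delay (in milliseconds) before playback can begin. */
static int play_sound(const char *name) {
    for (int i = 0; i < CACHED_SOUNDS; i++) {
        if (strcmp(cache[i], name) == 0)
            return 0;       /* already in sound RAM: starts immediately   */
    }
    return 300;             /* has to wait for the laser to seek and load */
}

int main(void) {
    printf("'punch' delay: %d ms\n", play_sound("punch"));
    printf("'victory announcement' delay: %d ms\n", play_sound("victory announcement"));
    return 0;
}
```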

The Saturn and the PlayStation weren’t the first consoles to encounter this problem, of course. By 1994 the use of the medium in console hardware was increasingly common: the PC Engine’s CD-based add-on had been rushed to market as early as 1988, while in the years that followed the world had been introduced to CD-based devices from the likes of Sega, Atari, Commodore and Panasonic. How had previous CD-based devices got around this fundamental problem?

The answer lies in the fact that most existing pieces of CD-based hardware were add-ons for existing consoles that already had their own sound solution. With a sound chip already present, CD-based add-ons were only needed to stream linear music, with the existing sound chip always on hand to take care of any spontaneous sound effect-generation duties.

A great example of this is Sonic CD on Sega’s Mega CD. While the music underscoring the areas set in the “future” and “present” sections was made up of recorded audio streamed from the CD, the sound effects and music for the “past” areas were created on the fly via a combination of the Megadrive’s sound chip and a RICOH RF5C164 sampler-style PCM chip contained in the Mega CD. This was a musically ingenious combination, as the Ricoh’s 64KB of RAM meant that the arrangements of the “past” tracks naturally sounded more primitive when compared with the rich, processed musical arrangements streamed straight from the CD.


As a consequence, one of the great ironies of the 3D era is that, though Nintendo’s N64 was the only machine out of the big three to stick with the outmoded cartridge-based medium, it was also the only machine that didn’t feature a dedicated sound chip, as the PlayStation and Saturn both needed something to paper over the shortcomings of optical media. If both consoles made use of custom sound chips, however, what form did they take? How did they compare to what had come before? Can we tell what music is being played from the sound chips and what is being played from the disc? For the morally-dubious purpose of creating some grubby, unashamed clickbait, let’s have a look at these questions through the medium of a good old-fashioned face-off.

Round 1: Sampled Audio

When it came to integrating CD players into videogames consoles you’d probably think Sega and Sony should have been neck and neck. Sega, after all, had successfully released the Mega CD, while Sony had developed their original PlayStation add-on for the Super Nintendo to the point of building working prototypes. Sony, however, definitely had the upper hand when it came to sound chips: while Sega’s dry run for the Saturn – the 32X add-on for the Megadrive – featured a software-based sound solution, the sample-based sound system used in the Super Nintendo had been designed by none other than PlayStation father Ken Kutaragi himself.

Though the SNES wasn’t the first machine to deploy a sound chip based around sampled audio (Commodore’s Paula chip was doing this as early as 1985), the SNES design was an important model for what was to follow. Rather than a standalone chip, the design Kutaragi produced for the SNES was an entire subsystem, combining a CPU, dedicated RAM supply and signal processor. There had been some technical streamlining to keep the cost down – the unit could handle only eight samples simultaneously and featured sixty-four kilobytes of RAM – but these limitations weren’t as bad as they might seem thanks to the unit’s ability to handle compressed, thirty-two kilohertz samples.

For the PlayStation we might expect that Kutaragi and team took this basic model and expanded it. That assumption is completely correct: on the Sony PlayStation, we find the number of simultaneous samples jumped from eight to twenty-four, the compressed thirty-two kilohertz samples replaced with compressed sixteen-bit, forty-four kilohertz (CD-quality) samples, and the limited sixty-four kilobytes of RAM expanded to a roomier five hundred and twelve.
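For a rough feel of what five hundred and twelve kilobytes actually buys you, here’s a back-of-envelope calculation, assuming the commonly documented SPU ADPCM layout of twenty-eight samples per sixteen-byte block (treat that figure as an assumption for the sake of the sums):

```c
/* A rough feel for what 512KB of sound RAM buys, assuming an SPU ADPCM layout
 * of 28 decoded samples per 16-byte block -- an assumption made for the sake
 * of the arithmetic rather than an official figure from this article. */
#include <stdio.h>

int main(void) {
    long ram_bytes = 512L * 1024;
    long blocks    = ram_bytes / 16;      /* 16-byte ADPCM blocks           */
    long samples   = blocks * 28;         /* 28 decoded samples per block   */
    double seconds = samples / 44100.0;   /* at the CD-quality sample rate  */
    printf("~%.1f seconds of 44.1 kHz mono sample data\n", seconds);
    /* Around 20 seconds -- fine for instrument samples and effects, and a
     * reminder of why whole songs were streamed from disc instead. */
    return 0;
}
```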

How did the Saturn compare? Interestingly, if the Sega machine hadn’t released first and the Sony machine hadn’t been so clearly dependent on Kutaragi’s SNES design, you might think one had directly copied the other. As it stands, Sega finished up with a similar design from a slightly different origin.

Up to the release of the Saturn, Sega had always made sure that their new consoles were backwards compatible with the older model. For the sixteen-bit Megadrive this had been achieved through the audio system: combining a new FM synthesizer chip with the Master System’s aged Programmable Sound Generator and having the pair managed by the chip that acted as the Master System’s brain (the ever-reliable Zilog Z80). This worked well for everyone involved – the Master System PSG allowed the Megadrive to bolster its sound with four additional channels when it was running Megadrive software, while the presence of the Z80 allowed the Megadrive to boot directly into a Master System compatibility mode if required (the £30 Powerbase Converter wasn’t doing much more than changing the shape of the cartridge).

Though the setup on the Saturn looks similar to the PlayStation, it also has clear evolutionary links to the Megadrive. In the Saturn, sound was dealt with by a unit, rather than a single chip. As with the Megadrive, the main focus was on a new sound chip provided by Sega’s long-term partner Yamaha (the YMF292). Once again, this chip was managed by the brains of Sega’s previous console – in this case a cost-reduced version of Motorola’s 68000.

From there, the design is more similar to the PlayStation than to Sega’s previous consoles – swapping the previous console’s sound chips for 512KB of audio RAM and a signal processing unit.

In terms of how they compare head to head, it’s an interesting one. On the surface the Saturn has one very obvious advantage: while both machines could handle CD-quality sixteen-bit samples, the Saturn could handle thirty-two simultaneous sounds compared to the PlayStation’s twenty-four. Theoretically, Sega’s machine could thus produce a third more simultaneous sounds than the PlayStation at any given moment.

However, there is one important qualification here: both machines had the same amount of audio RAM. Though the Saturn was theoretically able to play more simultaneous sounds, it had to fit them into the same amount of space. On top of that, the PlayStation had one clear advantage over the Saturn: Sony’s machine was designed to work with a compressed ADPCM (adaptive PCM) sample format, while by default the Saturn worked only with larger, uncompressed samples.

Does that mean that the PlayStation was actually superior to the Saturn? Again, the situation isn’t quite as simple as it seems on the surface. Though the Saturn wasn’t designed to handle compression via hardware, the 68000 remained a relatively capable processor – more than able to take compressed samples and turn them into something the Yamaha chip could handle.

On top of this there was another dimension to creating music for these consoles. Though having the ability to compress large PCM samples was good, a number of developers (particularly Japanese developers) took things a bit further. In the early 90s, several companies were dabbling with an idea PC sound giant Creative dubbed ‘SoundFonts’. The idea was quite simple: when MIDI information was sent to a sound card (or, in our case, a games console), the MIDI command could be used to trigger a small instrument sample rather than a synthesised note – just as a controller keyboard can be used to drive a hardware MIDI sampler.
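A minimal sketch of that idea in action: a MIDI note-on arrives and, instead of synthesising a tone, the driver plays back a stored instrument sample sped up or slowed down to hit the requested pitch. The names and structures below are invented for illustration.

```c
/* A minimal sketch of the soundfont idea: a MIDI note-on triggers a stored
 * instrument sample, pitched up or down to hit the requested note. Names and
 * structures here are invented for illustration. */
#include <stdio.h>
#include <math.h>

typedef struct {
    const char *name;   /* instrument sample held in audio RAM         */
    int root_note;      /* MIDI note at which the sample was recorded  */
} Instrument;

/* Playback-rate multiplier needed to shift the sample to the target note. */
static double pitch_ratio(const Instrument *inst, int midi_note) {
    return pow(2.0, (midi_note - inst->root_note) / 12.0);
}

static void note_on(const Instrument *inst, int midi_note) {
    printf("Trigger '%s' at %.3fx playback speed for note %d\n",
           inst->name, pitch_ratio(inst, midi_note), midi_note);
}

int main(void) {
    Instrument piano = { "piano_C4", 60 };  /* sample recorded at middle C */
    note_on(&piano, 60);                    /* plays back at normal speed  */
    note_on(&piano, 67);                    /* a fifth up: ~1.498x speed   */
    return 0;
}
```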

For the PlayStation and Saturn, many developers took this idea and ran with it. Sega licensed a library called CyberSound that was built around similar principles, while many PlayStation developers came up with their own solutions for triggering tiny instrument samples via MIDI commands. If you’re wondering how well this practice worked in the real world, you’ve possibly already heard the results: because Sega licensed CyberSound for their official development kit, they ended up using it in a number of their own games, including Panzer Dragoon Zwei and Nights. On the PlayStation side, a soundfont-style solution was also deployed in a number of famous titles, including Final Fantasy 7.

The tracks for Nights were effectively written as chiptunes

Consequently, sample-playing ability ends up as a bit of a wash. Sega’s deployment of CyberSound negated any advantage offered by the PlayStation’s better handling of ADPCM, while the ability of the Saturn’s YMF292 to handle 32 simultaneous samples doesn’t seem to offer a terrific advantage when the PlayStation carried the same amount of audio RAM. We can argue about the relative paper benefits of each, but in the real world the strengths and weaknesses in this area seem to be broadly similar.

Round two: Signal Processing

The next big battleground we can identify is what happens to a sound once it has been played. If you are wondering why this is relevant, I ask you to consider this: the character of the “typical” Super Nintendo instrument sound wasn’t actually created by the samples that were fed to the chip – it was added after the fact by the sound processor’s incredibly distinctive reverb. Without that echo, even those well familiar with the console may fail to recognise the origin of a sample ripped directly from a SNES ROM.

When it comes to signal processing, we find the PlayStation sticks to the theme we’ve discovered, taking an existing feature found in the SNES and upgrading it: while Nintendo’s 16-bit machine had one distinctive echo it could apply over the entire sound, the Digital Signal Processor built into Sony’s machine had a number of delay and reverberation effects that could be configured by the composer and applied differently to different sample slots.

Aside from the options for musical creativity, this granular configurability had obvious benefits for sound design. With configurable levels of echo, the same footstep sound could be made to work in both a small cavern and a grand hall. To that extent it can definitely be said that the signal processing capabilities of the PlayStation were a step ahead of its predecessor.
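To show what “configurable echo” means in practice, here’s a toy feedback delay line: the same dry footstep can be given a short, quiet tail for a cavern or a long, loud one for a hall simply by changing two parameters. This illustrates the general technique rather than the PlayStation DSP’s actual algorithm.

```c
/* A toy feedback delay line (an IIR comb filter): y[n] = x[n] + g * y[n - D].
 * The delay length D and feedback gain g are the two "configurable" knobs;
 * this illustrates the general technique, not the PlayStation DSP's actual
 * reverb algorithm. */
#include <stdio.h>

#define MAX_DELAY 64

static void apply_echo(float *signal, int len, int delay, float feedback) {
    float history[MAX_DELAY] = { 0.0f };   /* circular buffer of past output */
    int pos = 0;
    for (int i = 0; i < len; i++) {
        float out = signal[i] + feedback * history[pos];
        history[pos] = out;                /* remember output for later echoes */
        signal[i] = out;
        pos = (pos + 1) % delay;
    }
}

int main(void) {
    float footstep[16] = { 1.0f };         /* a single impulse: the dry sound */
    apply_echo(footstep, 16, 4, 0.5f);     /* short, quiet tail: small cavern */
    for (int i = 0; i < 16; i++)
        printf("%.3f ", footstep[i]);      /* echoes at 4, 8 and 12, halving each time */
    printf("\n");
    return 0;
}
```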

However, though in our current era of permanent fan-rage it can be dangerous to fully favour one side of a debate, it’s difficult to argue that the Saturn’s signal processing abilities weren’t simply superior. Like the PlayStation, the Saturn offered configurable reverb and delay effects, but it also offered a host of other effects, including chorus, EQ and flange. The PlayStation was no slouch, but in this area the Saturn offered everything the competition did with a few useful extra bits on top. Though of course there were obvious workarounds (like recording things twice – once with chorus and once without), these extra effects had genuine real-world utility.

Round 3: Frequency Modulation

With the Saturn taking a slender 1-0 lead, it’s now time to look at one final area: Frequency Modulation.

As categories go, this is probably a little controversial. Frequency Modulation was the technique used to create sounds by both the Sega Megadrive and arcade boards dating back to the mid-eighties. Though by the time of the Saturn and PlayStation’s release its distinctive metallic twangs had definitely fallen out of favour, it was a technique both machines could deploy – and for a full comparison of the chips’ capabilities, we should consider it.

What exactly is Frequency Modulation? Discovered in the US in the late 60s, FM synthesis is basically a technique of using the frequency of one or more sounds (modulators) to alter the character of an original (the carrier). The complexity of the sound is directly related to the number of operators (carriers and modulators) you have available. The more you have, the more complex a sound you can produce.
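In code, the classic two-operator case looks something like this: one sine wave (the modulator) wobbles the phase of another (the carrier), and turning up the modulation depth adds the bright, metallic sidebands FM is known for. This is a floating-point illustration of the principle rather than how any real chip does it.

```c
/* Classic two-operator FM, after Chowning: out(t) = sin(2*pi*fc*t + I*sin(2*pi*fm*t)).
 * Raising the modulation index I adds more and brighter sidebands -- the source
 * of FM's metallic character. Floating point is used for clarity; real chips
 * like the YM2612 work with phase accumulators and sine lookup tables. */
#include <stdio.h>
#include <math.h>

#define TWO_PI 6.283185307179586

static double fm_sample(double t, double fc, double fm, double mod_index) {
    return sin(TWO_PI * fc * t + mod_index * sin(TWO_PI * fm * t));
}

int main(void) {
    const double sample_rate = 44100.0;
    const double fc = 440.0;        /* carrier: the pitch you hear         */
    const double fm = 880.0;        /* modulator: shapes the timbre        */
    const double mod_index = 3.0;   /* modulation depth: higher = brighter */
    for (int n = 0; n < 8; n++) {
        double t = n / sample_rate;
        printf("%8.5f\n", fm_sample(t, fc, fm, mod_index));
    }
    return 0;
}
```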

Though the PlayStation’s sound hardware was capable of FM synthesis, its implementation was quite rudimentary: it wasn’t supported at a hardware level, so the mixing required to create the effect had to be done in software, with samples being used for the carrier and modulators. Consequently, FM on the PlayStation wasn’t widely used outside of sound effects.

Unfortunately for the PlayStation, the Saturn sound chip was produced by Yamaha, who had been the exclusive licensee of the patent for FM synthesis since the mid-seventies. By the time the Saturn arrived on the market they had the best part of twenty years’ experience of working with the technology – and it shows.

To understand the beauty of the Saturn’s FM abilities, we should look at the setup of the FM sound chip used in its immediate predecessor. The YM2612 was capable of creating 6 different sounds simultaneously, each using four sine wave operators. When it came to the algorithms used for mixing the sine waves together, these were hard-coded. The choice was generous, but limited nonetheless. Similar restrictions were found on other FM chips of the era: though the chip used in the AdLib and Sound Blaster could play more simultaneous FM instrument sounds than the Megadrive, this came at the cost of simpler sounds that used only two operators.

The genius of the Saturn solution was that these restrictions were entirely removed. Each of the Saturn’s 32 sound slots could be used either for sample playback or as an FM operator, and it was up to the composer to decide how to divide and route them – be it sixteen two-op sounds, eight four-op sounds, five six-op sounds or any combination of the above that added up to thirty-two. The ability to create six-op sounds was particularly impressive: not only were these generally the preserve of Yamaha’s flagship FM synthesizers, but any patches created for those synthesizers would have been cross-compatible with the Saturn’s YMF292.
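Because every slot is interchangeable, budgeting becomes simple arithmetic – something like the hypothetical mix below, where the PCM voices and FM operators just have to add up to thirty-two or fewer.

```c
/* One hypothetical way a composer might budget the Saturn's 32 interchangeable
 * slots between PCM sample voices and FM operators: any mix is fine as long as
 * the total stays at or under 32. */
#include <stdio.h>

int main(void) {
    int sample_voices = 20;   /* slots used for instrument samples  */
    int fm_4op_voices = 2;    /* two richer four-operator FM voices */
    int fm_2op_voices = 2;    /* two simpler two-operator FM voices */
    int slots_used = sample_voices + fm_4op_voices * 4 + fm_2op_voices * 2;
    printf("Slots used: %d of 32 (%d spare)\n", slots_used, 32 - slots_used);
    return 0;
}
```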

Megaman 8 was one of the few games that took advantage of the extra capabilities of the Saturn’s sound chip

As impressive as the YMF292’s abilities are on paper, would they give the Saturn any real-world advantage? In this case I think the answer is definitely yes. If we think of a composition that filled 24 audio slots and maxed out the audio RAM on both machines, the Saturn would still have eight slots left for FM sound generation. These could be used either to keep the original composition intact and enrich it with extra FM sounds, or to replace existing instrument samples with an FM equivalent in order to free up RAM for more samples. Though Saturn-specific soundtracks were relatively rare, I think it’s reasonable to chalk this up as a real-world victory.

Saturn Wins…Friendship?

As dangerous as it can be to personally come down on one side of these sorts of debates, I don’t think there’s an argument against the Saturn having the stronger sound hardware – it simply pushed the envelope a little further than its SNES-derived competition. If you were going to give me the tracker software needed to produce music for one platform or the other, I’d definitely rather it was designed for the Sega machine.

The Sega Saturn’s YMF292 (licensed under Creative Commons; picture taken by Wikipedian Yaca2671)

Again, that’s not to say the PlayStation’s sound hardware was bad or incapable by any stretch of the imagination (and I suspect the relatively frugal improvements made to the original SNES design may have played a role in that famous “two hundred and ninety-nine US dollars” price tag), but the YMF292 used by the Saturn was a beast. Sega’s early machines had generally been based on weaker, cost-reduced versions of technology used in their arcade machines. When it came to the Saturn this flow went the other way: the Saturn may not have been advanced enough to run “arcade perfect” ports of titles based on Sega’s Model 2 hardware, but when it came to building the spec for the Model 3 the Saturn’s sound chip made the cut.

Whichever was superior, the PlayStation and Saturn definitely mark something of a last hurrah for hardware-based audio processors. Though Nintendo’s N64 released just a year or so later, it didn’t include a dedicated audio processing unit at all. Instead, audio was handled entirely as a function of the N64’s co-processor. This caused its own problems: though theoretically the N64 could handle more audio samples than the PlayStation and Saturn combined, this was only if it was used exclusively as an audio player. Once developers started running game code, the machine’s audio-playing abilities declined in proportion to the demands placed on the processor. In this way, the new dawn ushered in by the N64 was essentially a return to the days before dedicated sound chips: with composers and sound designers having to work with whatever scraps of processing power and memory were left over from running the main game code. That, however, is all a story for a different article.

Update 29/03/21: I erroneously labeled the Sonic CD past tracks as being generated via the YM2612 rather than the CD’s RICOH chip. Thank you Rowan Lloyd.