IF SPEAKERS ONLY COULD
Exploring The Outer Limits Of Reality Simulation In Audio
by Benny DeRose
A BRAIN-CANDY TUTORIAL FOR THE HOPELESS MUSIC LOVER
Try to imagine a time when the Victrola was actually the current state-of-the-art for hearing recorded music. At that time, listeners,
transfixed by the spectacle, insisted the play-back sounded 'just like' the original. Today it's a different world. Or is it?
The advent of digital audio has provided massive improvements for storage & retrieval and improved fidelity like never before.
We also enjoy the benefits of today's longer word-length and higher sample rates, plus a plethora of signal processing tools.
Even so, that sonic utopia of drop-dead realism continues to elude even the most well-intended and sincere recording
engineers and audiophiles. Still, when describing playback sound quality, terms such as "realistic" and "life-like" are just
as freely tossed about as they were a century ago.
Many consumers today, however, place more value on features such as quick downloads and the convenience of portable music players
than on supreme sound quality. Yet others have high hopes for the emerging super-resolution playback media to gain a foot-hold
as a sweeping panacea for hungry music lovers, by virtue of the new disc’s technical superiority to the standard compact disc -
supported by those clamoring to embrace this small but significant improvement to the listening experience.
Many of these quality-minded aficionados are delighted, certain that the ‘straight-wire-with-gain’ is finally about as straight
as it can be. Yet, many agree that today's state-of-the-art in music reproduction still leaves something to be desired compared to
witnessing an actual LIVE performance in progress.
If the straightest wire possible cannot provide the breakthrough of an unmistakable you-are-there experience, then what
exactly IS the hindrance that keeps a fine recording sounding like merely a fine "recording"?
We're going on a fact finding mission to get to the bottom of this seemingly unsolvable mystery with a fresh approach to
satisfy the 'need to know' curiosity of audio-sleuths and the recording industry itself - beginning right now.
The 'life' of the original musical performance gives up the ghost at the very point where it departs the acoustic world
and is transduced (by the microphone) into an electrical signal - a state where life ceases to exist. That signal merely
represents what once was, and remains dormant throughout the entire signal chain until it arrives at the point where it is to
become acoustic again (the loudspeakers).
Regardless how evolved the recording technologies, the electrical signal representation must endure the physics of transduction
back into acoustic energy so we can hear it. The problem is, the speaker can never be what the Performance is!
For the listener, this then becomes a function of listening to a stereo – and not of experiencing a music EVENT which
had once occurred. What remains is only a single waveform (per channel) for transmitting many different sounds
simultaneously, which are competing to individually be heard – though being crowded together within that signal. Now the
responsibility of audible rendering of that cluster is borne by the loudspeakers and evaluated by the listener.
Endemic to the process of reproduction itself, the co-occupation interference known as “masking” begins with as few as
2 instruments. So ends any further acclaim to the ‘straight wire’.
The only way for the recording to come to life again is to (within the signal path domain) enable each of the individual
sounds to carry its own sonic ‘identifier’ as it were – freeing the music from the destructive effects of masking. This ideal
would minimize the collapse at the final transfer (the speaker) where the electrical signal re-enters our world to promote a
re-born auditory genesis, as if original, again.
As a loudspeaker and electronics designer, audio-crazed musician, as well as a professional in the sound recording industry, I
have conducted a great deal of research and focus into the inner workings of this multi-layered subject – though what is presented
here is expressed as non-technically as possible. Nonetheless, this writing assumes a love of music and a basic understanding
of the sound recording process by the reader, as well as an inquisitive curiosity for the absolute.
Come along as we ask the tough questions that stump even the experts and get to the bottom of a long-standing perplexity.
As with any challenge, creating a solution demands first, understanding the problem precisely. My subjective findings
may differ from those of some readers, as my point of view carries a bias drawn from numerous personal observations. My story,
however, is not unique but remarkably consistent with that of my esteemed colleagues – who share parallel conclusions
from learning experiences such as mine.
While growing up in a fully musical family, complete with a superb-sounding playback system for listening to recorded music,
I was accustomed since birth to an above-average sonic standard of reference – though primarily taken for granted. Much to
my confusion, I was finding that this very necessary and enjoyable sensation was rare and unusual.
I wondered why other music systems at friends’ and acquaintances' sounded so awful by comparison - and they didn't seem to know
the difference, or even care! To me, music is meant to be listened to, not just heard. Well I guess, to some folks, just
the mental reminiscence of hearing their favorite song playing was good enough - unaware that the lackluster presentation
imposed a serious impediment to an even greater appreciation.
I was spoiled by great sound early on, not realizing that this kid was progressively becoming an incurable critic of sound
quality - extremely hard to impress by even the finest equipment on display at the HiFi stores. Good enough wasn't good enough.
However, the most impressive sound of music to me, by far, was not of a reproduction at all, but from occasionally attending
Live Symphonic Orchestra Concerts.
As most would agree, this type of musical experience is so Vibrant, Majestic and larger-than-life, that surely NO stereo
system, even today, could possibly replace it. None. This is a key reason we attend such events in the first place because
we become part of the collective happening of the moment - not just to hear its 'sound'. It would really be something if that
kind of Power and Influence over the audience could be re-created by a home playback system one day. “But how?”, I wondered.
Compared to Live music, it was puzzling how, while listening to even the finest recordings, the activity seems to be in one
room, while the listener is in another as if needlessly separated by a barrier with two cut-out holes (the speakers) as a passage
way for all the sound to funnel through. Or in other words, like porthole openings in a separating wall - instead of granting
the unrestricted realization of a complete musical happening.
This makes the speakers (the cut-outs) a false point of origin - delivering only a sonic skeleton for mental reminiscence and
depriving the listener of the full presence of the musical event as it originally occurred. (This is a glaring handicap to the
ultimate goal of high fidelity reproduction itself.)
Let's take a closer look at some of the inherent shortcomings of the final acoustic transducer (the loudspeaker) to understand
the difficulty of its responsibility of articulating a believable re-telling of a musical event.
Here's a little science experiment you can try.
Temporarily disconnect the wires of one of your stereo speakers from the back of your stereo amplifier or receiver. (Power OFF)
Next, get a 9 Volt Battery. Now, briefly touch the ends of the two wires leading to the speaker, to both of the 9V battery's
terminals. Do this several times, briefly, and listen. 
Do you hear anything?
Sure we hear something, but what is that?
As you know, the stereo's amplifier delivers 'music voltages' to the speakers. The battery, however, delivers sudden bursts
of 'pure voltage'. This abrupt aggravation causes a response similar to tapping on a drum with a drum stick. Both of these
items vibrate the air when provoked, but one is designed to Produce its Own sound (the drum, of course) but the speaker
is supposed to only Reproduce sounds originating from another source, not itself - like from your CD player, for example.
(Or the 'sound' of a battery in this case.)
Of course, a battery has no sound, but rather, provides a signal of two extremes - a Direct Current, delivered very quickly.
These two modes (DC and, essentially, RF) are beyond the service bandwidth of the speaker system. Since the speaker can't
reproduce either of these two extremes, it 'spills its guts' in error and reveals its particular sonic signature coloration instead
- and expects you to blindly accept this noble attempt as an accurate acoustic analogy of that signal - or at least, good
enough. This is definitely Not the sound of a battery but, instead, the sound of the speaker's mistakes!
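For the curious, the battery test can be imitated on paper. The sketch below is only a toy model, assuming a single driver behaves like a simple damped resonator; the function names and the values 60 Hz and Q = 5 are made up for illustration and are not measurements of any real speaker. The input is a plain step with no pitch of its own, yet the output rings at a frequency set entirely by the model 'speaker'.

```python
import math

def step_response(f0_hz, q, duration_s=0.2, dt=1e-5):
    """Cone displacement of a hypothetical driver after a sudden DC step.

    Toy model (an assumption): x'' + (w0/q) x' + w0^2 x = w0^2 * u(t),
    where u(t) jumps from 0 to 1 at t = 0, like touching the battery.
    """
    w0 = 2.0 * math.pi * f0_hz
    x, v = 0.0, 0.0
    samples = []
    for _ in range(int(duration_s / dt)):
        a = w0 * w0 * (1.0 - x) - (w0 / q) * v
        v += a * dt   # semi-implicit Euler: velocity first
        x += v * dt   # then position, using the updated velocity
        samples.append(x)
    return samples

def ringing_frequency(samples, dt=1e-5):
    """Estimate the ringing frequency from crossings of the resting value 1.0."""
    crossings = [i for i in range(1, len(samples))
                 if (samples[i - 1] - 1.0) * (samples[i] - 1.0) < 0.0]
    if len(crossings) < 3:
        return 0.0
    cycles = (len(crossings) - 1) / 2.0   # two crossings per cycle
    return cycles / ((crossings[-1] - crossings[0]) * dt)

response = step_response(f0_hz=60.0, q=5.0)   # illustrative values only
print("peak overshoot:", round(max(response) - 1.0, 2))
print("ringing frequency (Hz):", round(ringing_frequency(response), 1))
```

Change f0_hz or q and the 'sound' of the step changes with it, even though the step itself stays the same.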
Transient source signals like this (which are also an abundant part of the 'music voltages' from your amplifier) provoke your
speakers to produce that same outcome (the mistakes) as the battery test revealed - concurrent with the music content.
This is bad. This parasitic noise interferes with the music you are wishing to enjoy, and turning up the volume won't rise
above it because that causes even more 'speaker sound' to occur. This is like striking the drum head even harder and wondering
why it continues to exhibit the sound of a drum. Duh.
Another way of looking at this is to consider a speaker at rest as a sonic signature waiting to happen when imposed upon - and
the output of your amp as an AC amplitude modulator (you know, as each stick-hit modulates the drum's acoustic output).
So the result is... Amplitude Modulated Speakers.
You see, any unreconciled sounds which a speaker loses or adds during this modulation are Not conducive to sonic realism. (A definite
obstacle to what we're expressly seeking.) The liability of living with these persistent artifacts inhibits your enjoying
a more convincing listening experience.
In contrast, next time you're at a jazz club, or any recital hall where music is being played right in front of you - LIVE,
observe how this event sounds so much more ‘open’ than a Recording of the same material, because there are no speakers
honking at you during the performance! Even if you were blindfolded, you could easily decipher that you are not at home listening
to your stereo system.
Every time you listen to a recording, you are hearing the Music plus the influence of the speakers, not the pure music. In
a perfect world, speakers would behave better as if sonically invisible so we could hear more of that Pure Music.
Hmmm. If that were only possible.
 Although 9 volts is well within the safe operating range for most speaker systems, this is other than the original intended purpose and you may do so at your own risk. The author is not responsible for any resulting consequence arising from misuse or abuse. Under no circumstances should this experiment be attempted on headphones or with powered speakers such as computer speakers or a powered subwoofer.
 The presence of Radio Frequency Energy from the battery's momentary connection can be confirmed by placing an AM radio nearby (tuned between stations) and listening.
Last time, we discovered how easy it is for a loudspeaker to reveal its unwanted coloration by aggravating it with the sudden
burst of a transient DC voltage from a battery. The output of your amplifier also provokes the same response - with the
sound of the music competing to be heard over the speaker's own sonic signature. (Not exactly the reality we're seeking.)
But if the speaker/battery test was a bit more trouble than you wanted to get into, here's an easier illustration with an empty coffee can.
Sharply slap the bottom of an empty coffee can with your fingers. The Can represents the speaker and the Slap represents the
sudden impulse from the battery. No surprise here - that sounds like an empty can. If you were to speak into it, we would now
hear you, PLUS the can. We can recognize your voice of course, but you sure sound weird, and shouting into it is like turning up
the volume - increasing the amplitude of the modulation – which is distorting your voice even more. It's only when you eliminate
the can (or the speaker) that we are able to hear you as you really are. That's much better!
The coffee can example is a gross exaggeration of how speakers cloud the music and distort reality. But if there were speakers
that sound this bad, maybe they could get work as a megaphone since they're just not cut out for high fidelity work.
As good as some are though, there are NO perfect loudspeakers. If every speaker in the world Were perfect, they would, by
definition, all sound alike - regardless of their design, and we know that's not the case. They each add their own artificial
'coloring' to the music you're trying to enjoy.
Thought: An electro-acoustic transducer which is designed to produce a very distinct sonic signature, is your car horn.
The horn has a very high resonance quality, or high "Q" - oscillating at one particular frequency when voltage is applied.
If a car horn could be driven by music voltages instead, do you suppose you could still recognize the song?
Probably, but I doubt you could tolerate the listener-unfriendly resonance honking at you Louder than the music itself. Ouch.
In contrast, the more 'invisible-sounding' loudspeakers have much lower "Q", and any resonant frequencies should ideally
be out-of-band for each driver, but in fact, they exhibit multiple 'mini-resonances' throughout the spectrum.
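To put rough numbers on "Q", here is a small sketch assuming an ideal single resonance, whose free-ringing envelope falls off as exp(-pi * f0 * t / Q). The function name and the two Q values are illustrative assumptions, not data from any real horn or driver.

```python
import math

def cycles_to_decay(q, decay_db=60.0):
    """Periods of f0 that pass before the ringing envelope drops by decay_db.

    Assumes an ideal second-order resonance with envelope
    exp(-pi * f0 * t / q); f0 cancels when time is counted in periods.
    """
    ratio = 10.0 ** (decay_db / 20.0)   # 60 dB corresponds to a factor of 1000
    return q * math.log(ratio) / math.pi

# Hypothetical Q values, chosen only to contrast the two cases
for label, q in [("car-horn-like, high Q", 50.0),
                 ("well-damped driver, low Q", 0.7)]:
    print(f"{label}: Q = {q}, rings for roughly {cycles_to_decay(q):.1f} periods")
```

In this idealized picture the ring-down time scales directly with Q, which is why a high-Q device keeps 'honking' long after the signal has moved on.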
During the battery test, you may have noticed this yourself.
With a typical 3-way/3-driver configuration, for example, you may have observed one distinct sound coming from each of the
three drivers, simultaneously - something like, "thump", "knock" and "tick". (Sonic signatures all)
Add to that, the spurious resonance of the cabinet (box) itself. Even the crossover network and the physical orientation of
each driver influences the sound, as does the interactive behavior between a speaker system and the particular amplifier
driving it! (Complete topics of discussion on their own.)
It's obvious that speakers and Car Horns exhibit similar characteristics, the distinction being: one generates resonances
by design - the other by default. Think of your speakers as better behaved "horns" with lower 'Q', but inaccurate as a
Literal Translator of electrical analogies into acoustic.
'Good' sound is easy to find, but we're looking for something far more rare - a transcendent realism bordering on Magical.
So, since perfection is not to be found here, let's go another direction - into Fantasy Land.
Let's say we've just acquired a pair of the world's first, truly Perfect Loudspeakers - the "Mega-Tech 3000s". These babies
are great! And with the world-wide hype surrounding this audiophile's dream speaker with the big promise to make all
others obsolete, the anticipation is frightening!
Much to our dismay however, we discover that the best they can do is deliver a perfect portrayal of a "recording". That's right,
JUST A RECORDING, Not an actual re-creation of a musical performance. Of course not. Not even as a fantasy.
But why not? After all, the "Mega-Tech 3000s" are perfect!
The owner's manual says it's because this 'perfect' transducer is capable of accurately translating into acoustic energy
only the signal it is being asked to render. But even with the very finest recordings, we observe that we are still not
able to re-create breath-taking realism, but are consciously aware of a reproduction-in-progress - and not as if experiencing
an actual occurrence. What a disappointment. That must mean that perhaps the shortcomings can not be solely blamed on the
speaker, but also, the available signal itself.
If that's the case, what is it about the signal that won't allow the speakers to bring forth the music-genie-in-a-bottle?
The downfall begins immediately upon capturing the original sound source with the initial transducer (the microphone).
During a recent conversation, a Professional Microphone Rebuilder for the recording industry concurred that even the most expensive and
best engineered mics are inferior when compared to the human ear.
This means that your own ears are the ultimate arbiter being of superior hierarchy to any lesser form of pick-up. In other
words, the recording process is disadvantaged from the start, and your supreme acuity can only look down upon it. No wonder
a recording sounds only like a reproduction to us!
Any recording facility will proudly extol the special virtues of each of the superb microphones in their arsenal, such as,
"This one sounds bright and clean, this one sounds smooth, this one adds richness & warmth". In other words, each microphone
is best suited for certain applications. However, barring practical differences in pickup patterns, sensitivity, or maximum SPL,
what's apparent is that each mic sounds DIFFERENT.
This ought to provoke a thought, and that is: When listening to dissimilar instruments, we don't have to replace our ears
in order for each instrument to be heard as consistently Real.
As good as some are though, there are NO perfect microphones. If every mic in the world Were perfect, they would, by
definition, all sound alike, regardless of their design, and we know that's not the case at all.
To appreciate the microphone's job, let's visit a recording session of say, a sixty-four piece orchestra and see what happens
when a grand musical performance is downgraded to a mere electrical signal by the microphone, then amplified in hopes
of re-living that magnificent experience.
We'll do this by employing two completely different types of micing techniques for simultaneously capturing the same
performance: (1) Direct to 2-track, and (2) a 64-track Mix.
Direct to 2-track is regarded as perhaps the purest method of capturing a musical event as it happens, with no mixing involved
after-the-fact and no alteration to the sound captured by only two mics. This should work out fine because, after all, we
observe the entire sonorous world with only two 'ears' as well.
Even so, this clean signal will show its inadequacies when made audible during playback, in addition to the usual 'cut-out holes'
and Q resonances waiting down-stream. In fact, all the following real-world qualities of our orchestra are compromised,
or lost completely in this pure, straightforward transfer from 'authentic' to 'wire'...
The sense of proximity, size, texture, airiness, space, and immediacy-of-presence. (Confirming the 'you-ARE-there' certainty)
In short, the very "Life" of the sonority has expired.
Curious that all of these special qualities were present and plainly observed by our two ears in the first place - and
we assumed the microphones indeed 'heard' what we heard, but what actually happened?
These important but fragile subtleties have become embedded within the signal, mixed in with the sound of each instrument –
creating a convoluted jumble as one homogeneous cluster. The speakers then interpret and deliver this package as basically
one massive instrument with a very complex and confused waveform.
With all the dimensional cues occupying each other's 'space' as our electrical modulation signal, the speaker has no choice
but to place these elements all at the same location of auditory propagation (its baffle board) and expect the listener to accept
the resulting flat plane as Real(?) - when just a simple turn of your head (angle of the ears) confirms it is not.
So much for direct to 2-track.
When considering our other recording technique (the multi-track mixed to 2-channel approach) maybe this is a better solution.
With this method, let's say we've recorded our 64 piece orchestra this time with the sound of each instrument being captured by
its own individual microphone routed to respective inputs of a multi-track recorder. So far, so good. Except that it's
a pity we can't play this back with 64 channels of amplification driving 64 individual speakers to re-create a full-scale
re-enactment portraying the true size and space of the original orchestra. Wow!
Although I like the idea, such a playback system is impractical. Instead, our task is to mix-down these separate instruments
so that each will have a lateral position across a conventional 2-channel panorama so that our 'perfect' fantasy speakers can
somehow make the orchestra re-appear from that crowded line-up.
This isn't a good idea at all, because two baffle-boards (or two cut-out holes in a wall, as it were) are completely
inadequate to convincingly re-establish 64 individual instruments and their unique positions and proximity to the listener. All
that, through a conduit with ONE ambitious waveform per channel!!
Would you call that a High Fidelity Reproduction, or even an accurate replication of the original event?? Of course not.
Think of 64 lanes of traffic attempting to occupy a medium of only 2 lanes - all at the same time.
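The traffic-jam picture can be sketched in a few lines. This is a minimal illustration, assuming a constant-power pan law (a common mixing convention, not something specified in this article) and 64 made-up sine tones standing in for instruments; mix_to_stereo and every number in it are hypothetical.

```python
import math

def mix_to_stereo(tracks, pans):
    """Sum mono tracks into left/right using a constant-power pan law.

    tracks: equal-length lists of samples, one list per instrument
    pans:   one position per track, -1.0 (hard left) to +1.0 (hard right)
    """
    n = len(tracks[0])
    left, right = [0.0] * n, [0.0] * n
    for samples, pan in zip(tracks, pans):
        angle = (pan + 1.0) * math.pi / 4.0   # maps pan to 0 .. pi/2
        gain_l, gain_r = math.cos(angle), math.sin(angle)
        for i, s in enumerate(samples):
            left[i] += gain_l * s
            right[i] += gain_r * s
    return left, right

# 64 stand-in "instruments": sine tones at made-up frequencies
rate, n = 8000, 256
tracks = [[math.sin(2 * math.pi * (110 + 7 * k) * i / rate) for i in range(n)]
          for k in range(64)]
pans = [-1.0 + 2.0 * k / 63 for k in range(64)]   # spread left to right

left, right = mix_to_stereo(tracks, pans)
print(len(tracks), "tracks in, 2 waveforms out, each", len(left), "samples long")
```

Note that the sum is not reversible: one centered source and a suitably scaled pair of hard-left and hard-right sources produce the very same two waveforms, so nothing in the stereo signal says which of them was actually on stage.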
Wait a minute. This is the 21st century. We want orchestra in the round! So instead of two channels, suppose we were to
make a new 5.1 channel surround mix to accomplish reality.
What a great way to spread the acoustic action all over the room by providing more points of entry.
At least this is a step in the right direction and is certainly helpful, but it doesn't quite deliver what we're seeking because
the orchestra's total sonic architecture is still compromised this way, and exalting a mere perimeter of port-holes as reality is no
celebration. At a concert, there are certainly more than six points of origin. Who is this fooling?
Remember, we're not just seeking a sound which reminds us of an orchestra (that's easy to do) but of re-creating the
experience of its awesome grandeur as if we're actually present.
That's a tall order, and it seems that even using three pairs of our beloved Mega-Tech 3000 perfect speakers can't help us
now, regardless. If jaw-dropping realism is our goal, we're sure not finding it here.
The ear knows (and prefers) reality when it hears it, so since we're not about to lower that standard, let's see what
can be done to raise the quality to a point which can meet our intrinsic level of discernment.
 This simple turn of the head is a great way to see how few, if any, of the fragile, dimensional subtleties have endured. In our 64-piece orchestra example, this would reveal 64 individual points of origin, whereas a recording would reveal only as many as there are playback channels.
So far on our trek to Musical Mecca we've encountered a few snags along the way. While hoping to uncover Sonic Reality
Emulation, we've discovered another reality of sorts – the limitations with the input and output transducers which are
necessary for the reproduction process (microphone & speakers).
We had high hopes that with care we could coax our imagined fantasy speakers, the Mega-Tech 3000s to come through for us
with sensational realism as we oversaw a recording session ourselves. Still hopeful, let's take it to the next level.
Suppose the answer would be for this audio signal to be as close to a 'straight-wire-with-gain' as possible - relying on
the longer digital word-length and higher sampling rate of the best available recording technology. Seems logical, except
that this only extends resolution and doesn't provide correction of the primary cause of the signal's collapse.
Now, with the wire as straight as it can be, let's consider Resolution vs Playback Volume. Although this may be a new
concept for some, it's easier to understand in visual terms.
Imagine a color photo-copy of a tree leaf. We know that a single leaf exhibits a special texture and contains minute details
right down to the microscopic level. Amazingly though, the photo copy looks virtually indistinguishable from the real leaf
at arm's length. Amazing - even when the two are side by side for a direct A/B comparison! Everything seems fine with our
look-alike until we bring it closer to see more detail - making it appear larger to the eye. (Analogous to turning up the volume.)
With anticipation, we hope to discover more of what a leaf is all about and hail our replica as Real. However, the greater
the magnification (or amplification) the less real it becomes, not more - disproving its authenticity. The degree of
magnification has plainly exceeded the available details. Disheartened by the infidelity, we now prefer this wannabe
substitute only from afar - or perhaps, not at all.
What this means for Music Reproduction, is that beyond a certain playback volume, there is no further detail to be discovered.
This is the point of No Further Resolve - or point of Dissolution and Disillusion. Any greater increase in volume is of no further
benefit, not desired and no longer fun to listen to. Exceeding this level causes electronically amplified sound to become
irritating to the listener - even to the point of causing pain.
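The narrow, measurable part of the leaf analogy can be checked directly: gain scales what was captured but adds nothing new. The sketch below assumes a deliberately coarse 8-bit capture to exaggerate the effect; the helper names and all values are illustrative, and the listening irritation described above is a subjective matter this code does not model.

```python
import math

def quantize(x, bits):
    """Round x (in -1..1) to the nearest step of a signed grid of the given bit depth."""
    steps = 2 ** (bits - 1) - 1
    return round(x * steps) / steps

def distinct_levels(samples):
    """Count how many different sample values appear."""
    return len(set(samples))

rate = 48000
tone = [0.5 * math.sin(2 * math.pi * 440 * i / rate) for i in range(rate // 10)]

recorded = [quantize(s, 8) for s in tone]   # coarse capture, chosen to exaggerate
louder = [4.0 * s for s in recorded]        # "turning up the volume"

print("levels in the recording:", distinct_levels(recorded))
print("levels after 4x gain:   ", distinct_levels(louder))
```

Both counts come out the same: the steps are simply spread further apart, much like the photo-copy held closer to the eye.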
We observe that naturally occurring sounds can be tolerated much louder than this, so if we wanted recordings to be truly
life-like, all we would have to do is come up with a system which offers infinite resolution - and some perfect speakers
wouldn't hurt either.
You didn't really think it was that simple, did you? Besides, this isn't the whole picture.
For those who are optics experts, you already know that even if our photo copy had infinite resolution, that does nothing
for supporting the leaf’s light-source-influenced translucency, refraction, or specular highlights of its unique texture.
Resolution is only a starting point to realism.
The same applies to Natural sounds - which also contain features analogous to texture and illumination as well. But being devoid
of these qualities, a phono-copy sounds "starched" and irritating when magnified to high volume levels.
I remember some years ago, a full page ad in one of the popular stereo magazines for a high powered audio amplifier. The caption
read something like this: "When your wife says turn it down, she really means, 'Turn Down The Distortion'."
We know by now, that amplifier distortion is not the real issue here, but rather, the entire Process - which is made even more
upsetting, in this scenario, with an inferior amp deteriorating the signal first - therefore, guilty of being the weakest link.
To witness an example on a grand scale of the annoyance caused by provoking this state of dissolution with no further reward,
just go to the movies. Here we have many speakers, wide dynamic range, and lots of power. When the audience complains that
the movies are too loud, what they really mean is, too annoying to the ears and an insult to their sensibilities.
I thought you might find it interesting, that because of this resolution breakdown, some of these sound-intensive films are
mixed while the sound engineers are actually wearing ear plugs! Doing so makes perfect sense because of this occupational
hazard. However, unless the audience is provided with the courtesy of hearing protection as well, how could that practice
not be annoying??
In all fairness, the sound professionals and filmmakers really aren't the 'bad guys', nor is it only an issue of
overstepping the resolution barrier. The high volume of textureless sound forcing a sonic 'traffic jam' of convolution
over a mammoth Public Address System is what's preventing the experience from being more realistic and enjoyable. 
Well, at least the seats are comfortable.
In the theater, I have personally observed, as the opening sound logos appear, the audience actually snickering from
indifference. (As if to say, "Been there, heard that".)
Impressing the audience today demands something beyond celebrating sound for sound's sake, it's about better assimilation
- this demands being wiser than the audience’s own awareness.
Other competitive industries understand the importance of presenting a more inviting product to the market and will devote
considerable effort, carefully designing their products to provide a proper and pleasant interface with the customer.
This is true for everything from the cockpit of a sports car to something as simple as an ink pen. Why? Because better
ergonomics makes for better business! Duh.
So why does high-powered theater sound continue to be irritating and how is that expected to serve the story-telling process
or drive the box office?
It's simply because, in the world of amplified sound, little is known or understood about the art and science of waveform
propagation modeling, specifically crafted to appeal to the ear of the customer. Think of this as EARgonomics.
This condition of high sound pressure levels exceeding the available details from the signal, also exists with your home
or car playback system when pushed beyond the point of resolve - in addition to the loudspeaker's usual inadequacies.
With such obvious distortion of the original source being so prevalent, from both the signal structure and the speaker,
doesn't the industry know about this universal problem?
Well, the loudspeaker manufacturers certainly do – if they only knew what to do about it, other than to stylize the speaker's
sonic personality to obliterate the chronic deficiencies of the available signal. Here is precisely what I mean...
Over the years we've seen many novel, and sometimes bizarre loudspeaker designs developed as an attempt to break up the
signal convolution by simply spraying the sounds around the room in different directions - and asking the listeners to
'buy into' this as authentic to the original.
Some popular examples for home playback are Dipole/Bi-Polar, and Direct/Reflecting designs and some with drivers positioned
on the back of the cabinet. There are even designs where the HF and Mids of a stereo pair are conventionally front-mounted,
but the two LFs are facing each other from either cabinet.
Although these attempts are clever and the dispersions produced are novel to the ear of the listener, this is definitely not
the way sounds propagate from musical instruments. (Or any other source for that matter)
The problem now is the speaker's particular radiation pattern. This unfortunately contributes to its total sonic signature coloration
as well (which we're trying to avoid) compounding the issue even further - and never restoring authenticity.
In the business of loudspeaker manufacturing, the ultimate goal of course, is simply to produce a product line that will sell.
Design engineers realize that perfect accuracy may never be achieved and it's too difficult to attempt anyway. So in many
cases they go the other direction and intentionally 'craft' the idiosyncratic sonic behavior as a way to subjectively stylize the
performance of a particular speaker system in hopes of out-dazzling a blander-sounding competitor's product - and
hopefully make a sale to the non-discerning consumer.
You may have noticed how some speaker brands exhibit an audibly distinguishable 'family resemblance' throughout the line as
'their' sound - in other words, the ‘sound’ of their speakers. (Something again we're trying to avoid)
For satisfying the relentless pursuit of passionate music lovers and audiophiles seeking to upgrade to a more persuasive playback
experience, the task of selecting a new speaker is not just about finding the one you like the most, but rather, dislike
the least - while all your candidate choices are being driven by a deficient input signal. This is a rude awakening for those
who would dream of perfection, such as with our imaginary Mega-Tech 3000s.
But have you heard the new "Neapolitan 4000s"? - Very Colorful
 A telephoto lens can make a distant object appear larger to the eye, but flatter - making depth & texture nearly impossible to perceive. Likewise in the theater, the speakers are also far away, and being magnified (made louder) also appear larger to the ear, but still flatter - which also makes depth & texture nearly impossible to perceive.
Previously, we’ve observed that at both ends of the signal chain (where a live music event in the acoustic world enters the
electronic world, then back again) is where a musical performance becomes downgraded and deformed into an inanimate representation.
This is why recordings no longer sound alive to us.
We’ve noticed that both types of transducers face a substantial challenge when making this transfer because a microphone does
not act like a human ear in the first place, and no loudspeaker IS a musical instrument – only a messenger, at best.
In designing loudspeakers, the intended objectives vary widely from perfectly replicating a lifeless signal on one hand, to
the artificial stylings of kaleidoscope coloration on the other. Since our ears crave reality, which one would you dislike less?
One extreme causes listener disinterest to a sonic flat-line, while the other causes disinterest from a translator with its
own interpretation (not reliably accurate to the original).
Believe me, if your speakers truly could perfectly reproduce what the lifeless signal actually consists of, while adding no
impressionistic sound of their own (in other words, if they didn’t “hum” along with the music), you would not accept the cold,
dry, impersonal and distant harshness as Reality.
So until a microphone IS a human ear, and a loudspeaker IS an orchestra, the last possible hope to overcome this handicap
is to consider re-animating that dead signal. This could only be accomplished by effectively recovering those properties which
failed to endure the transduction. How tough could that be?
As our quest for phono-realism continues, let’s focus in on just one solo instrument and consider the difficulty involved
in capturing and convincingly replicating a “musical life-form” as observed in its natural state. We’re going to figure out
exactly what makes it sound alive in the first place.
Let’s say someone is playing an Acoustic Guitar right in front of us. Seeing it is not necessary. With your eyes closed there is no
mistake – your ears verify its identity as being absolutely genuine.
Now we’ll have the musician step aside because we have recorded the performance using the finest microphone and recording gear.
To play back our acquisition, let’s put a reference monitor exactly in place of the guitar and listen at the same volume level – and again
close your eyes. Is it alive? In other words, Really Happening?
You’ve just witnessed the stillborn delivery of a guitar. What remains is only a sound which reminds us of a guitar –
a very good facsimile. But something’s not quite right.
This becomes an interesting study - analyzing the perceptive differences to the ear between Authentic vs Replica.
Let's examine what's really going on. The pure sound of the real guitar being played by our musician can be observed as
having 3 main elements: "Point Source", "Body" and "Diffusion" (which helps establish its placement and significance within
an acoustic setting).
Let's define these terms.
1) The POINT SOURCE. This is the first early arrival of the pick striking the strings. The direct line-of-sight enables
the listener to identify exactly where the sound is coming from - the point of origin itself.
2) The BODY. As a reaction, the strings now vibrate the wood box and establish the sound of that particular guitar. This
also suggests the size and shape of the instrument being played.
3) The DIFFUSION. The 'after effect' of reflection vs absorption of remnant sound waves within the listening environment gives
the listener a sense of proximity and depth. Although mostly 'felt' rather than directly heard by the casual
observer, 'room signature’ greatly affects the perceived musical value of an instrument. Whenever this sound field is neglected
or altered, we are reminded of its important, supportive role.
To illustrate, which extreme do you suppose is a better setting for our guitar, an anechoic chamber, or a gymnasium?
Of course this is a silly question because one extreme 'swallows up' the sound - leaving it too 'dry', while in the other the sound
becomes de-localized, being widely pulled apart. You see, only the right acoustic setting (ambient diffusion surrounding the 'body')
is appropriate in providing proper sonic value and musical significance.
Please note the obvious, that these 3 facets (pick/strings, body/box, and room/surroundings) are of 3 different sizes and
appear in 3 different places - at slightly different times.
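To put rough numbers on "slightly different times", here is a minimal Python sketch. It is only an illustration under stated assumptions: the path lengths and the body's onset delay are hypothetical values chosen for the example, not measurements of any real guitar or room, and the only physics used is travel time at the approximate speed of sound in air.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air near 20 C


def arrival_ms(path_m, onset_ms=0.0):
    """Arrival time at the listener: onset delay at the source plus travel time."""
    return onset_ms + 1000.0 * path_m / SPEED_OF_SOUND_M_S


# Hypothetical values: (path length to the listener in meters, onset delay in ms)
components = {
    "point source (pick on strings)": (2.0, 0.0),   # direct line-of-sight
    "body (wood box responding)": (2.0, 1.0),       # assumed 1 ms build-up in the box
    "diffusion (first wall reflection)": (6.0, 0.0),  # longer path via a wall
}

arrivals = {
    name: arrival_ms(path_m, onset_ms)
    for name, (path_m, onset_ms) in components.items()
}

# Print the three facets in the order the ear would receive them
for name, t in sorted(arrivals.items(), key=lambda item: item[1]):
    print(f"{name}: {t:.1f} ms")
```

With these assumed numbers the three facets arrive a few milliseconds apart, each from a different path. Change the distances and the spacing changes with them, which is the kind of cue the ear uses to judge proximity and depth.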
These qualities of an original musical life-form are all easily verified by the ear as natural and authentic. However, because
the recording process has distorted size, position and timing - and collapsed these important components all on top of each
other, it's no wonder our play-back doesn't sound like a real guitar. The task of rejuvenation is now left up to the
loudspeaker itself, and we know that speakers are already preoccupied with their own dysfunction. Hmmm. What to do next.
Perhaps now is a good time to introduce some kind of creative compensation to our signal path - as a hero to save the day.
With that ideal in mind, and having just scrutinized Live vs Playback right in front of us, how would you go about correcting the
waveform of the signal to successfully recover this collapse and restore the natural architecture of just this one instrument?
It's okay. Take your time. In short, you can't. Just ask any Recording Engineer or Mastering Engineer who's ever tried - using any combination
of the countless tools of the trade.
Without succeeding in persuasively re-establishing these definitive life-form indicators to the ear's satisfaction,
there have been all sorts of attempts at signal alterations, hoping that the addition of further coloration might somehow
make up for the loss by adding even more chaos to confusion.
Some of the more popular devices are available as stand-alone processors, or as computer programs for digital audio
workstations - with a variety of schemes such as special EQ curves, phase rotation, artificially generated upper harmonics, speaker
emulation, and synthesized tube/valve overdrive emulation.
Raise your hand if I happen to mention even ONE which you feel might be capable of reconstructing the architecture and restoring
life to the original - instead of going the other direction!
I'm not against utilizing such effects as needed, but the result comes across to the listener as more like a recording -
and less like the real thing. But we're searching for stunning realism in reproduction, not just the next flavor of the month.
My particular position in the industry is in post-production. This affords me the luxury of a bird's eye view of many
collective talents and technologies - as productions come to my attention for final detailing before being ready for the world.
The universal desire of all production people is to release a more competitive and attention-getting product, and to compel
the listener to want to enjoy the material over and over again – though the specifics employed to accomplish that end may be
as varied as the diversity of the productions themselves.
While some Producers and Engineers maintain the less-is-more approach to signal processing to keep pollution to a minimum,
others tend to over-use such effects out of habit, making their productions sound less like a performance - which may actually
repel, rather than attract, the very audience they're wishing to impress.
My point of view maintains that the storage media is capable of much better and, in many cases, the productions are certainly
deserving of much better as well.
Regardless, no artificially added 'trick', nor lack thereof, makes the dream a reality unless the life-like qualities,
persistently lost in translation, could be effectively re-established. Plainly put, if the ear is deprived of the
natural properties, the output can not be lauded as real! (With "real" being the very aim of high fidelity itself)
In a casual, face to face conversation with an Audiophile Label’s famous jazz saxophone recording artist (whom you
all know), I asked him if, in the studio, the playback of any of his tracks had ever been mistaken for an actual saxophone.
He responded, "Of course not".
And this is the result? - utilizing exceedingly high quality reference monitors and professional-grade microphones??
Yet, convention repeatedly assures us that this is sonic realism at its finest and the best the planet has to offer -
so don't expect anything more - Period.
Please humor me while I pause for this personal interjection...
Assuming my fellow audio systems designers have not lost their passion, it seems strange that other technologies’
advancements have far surpassed those of Audio – especially since we’ve had over a hundred years to get it right.
If there is a better way, would it have to come from another planet – or is there something here we’ve overlooked?
Last time, we considered two different persuasions within the sound recording community - the noble, straight-wire minimalist,
and the habitual, special effects knob-turner.
We've identified three naturally occurring sonic 'life-form' properties of a single acoustic instrument, and how difficult
this is to maintain - if the goal is ‘authentic’ reproduction.
Furthermore, the challenge of convincingly re-creating a music event becomes even more overwhelming when we consider the
incredibly complicated waveform of many different instruments all playing at once, such as with an ensemble or full orchestra.
Even the notion of using the same number of playback loudspeakers as the total number of instruments is optimistic, because we've
witnessed right in front of us that a single speaker can not convincingly personify even one instrument.
Up to now, we've been avoiding the further challenge of reproducing a very special musical instrument indeed - The Vocalist.
The human voice is certainly the most natural and perhaps most emotionally expressive of all acoustic instruments,
yet in the recording world, it's often the most abused. (Or at least, "Doctored")
Beyond the 'Big Three' - (point source, body and diffusion), I've observed the human voice to exhibit at least five individual
aspects which are unique to vocals. These separate, but smaller details meld together to form an imprint we identify as an actual
human voice - which must be respected throughout the recording process if the intent is to have the performer appear to be
virtually present during playback.
This is practically impossible and may not even be desired from the outset by the production personnel at all.
How could this be in a world striving for even higher fidelity?
Regardless, it's a common practice to routinely over-process vocal recordings, staging the singer as more unnatural and less
real by obscuring these fragile and otherwise elegantly proportioned complementary components.
This is actually not a terrible offense, but the focus of this overview is exploring new approaches toward perceived aural
realities. So keep in mind that the real-world sonic elements we observe from vocalists, instruments or an ensemble appear
in different positions relative to the listener, with size and depth conveyed by micro-timing arrival cues,
as an active texture within a surrounding ambient environment.
Captured to the domain of electrons on a wire, this represents countless (and specific) micro-parameters to be responsible
for, and getting them right is crucial if the ear's scrutiny is to accept the artificial as no longer artificial.
This could be difficult! The ear/brain is quite certain when sound sources are genuine, and efforts at appealing to
psychoacoustic perception are an interesting attempt at audio's "virtual reality" - though they still leave something to be desired.
To push our sonic realism quest even further, bring your best ears with you as we take a quick look inside the playful
fun-house of 3-D Audio, and Binaural vs Transaural.
First, in contrast, Monaural playback pays no particular respect to the left/right, nor three-dimensional aspects of perception,
whereas Binaural and Transaural do - with some big differences.
Binaural recording techniques intend left channel information to be heard only by the left ear, and vice versa. The method
for recording binaural sound is generally accomplished by "dummy-head" recording - where small microphone modules are
placed within the ear canals of a life-size dummy head.
For playback, the discrete left/right information must remain separated for the effect to be properly realized by the listener.
That means that loudspeaker playback is incompatible in this case because both speakers are heard by both ears - nullifying
the effect. For binaural recordings to be fully appreciated, the listener must agree to wear some sort of listening apparatus
for keeping left and right sounds separate, one for each ear (stereo headphones). But turn your head and the whole perceived
room also turns, or when you take a few steps forward, you haven't moved any closer to the source. This ain't real!
You may wonder why reality simulation requires the audience to be dependent on a decoding device such as headphones, or
in the case of 3-D Movies, polarized or 2-color glasses. Good question, because experiencing true reality in nature
requires no such 'appliance' and is consistently absolute. Yet such simulations are so carelessly described as "realistic".
So let's consider Transaural. Assuming 2-speaker playback, small amounts of left and right signal co-occupy each other's
channel - out of phase - so as to cancel the portion of each speaker's sound that reaches the opposite ear. This technique is
known as crosstalk cancellation, or sometimes called cross-phasing. It is commonly employed to expand the perceived width of
the soundstage.
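For the curious, the out-of-phase cross-feed just described can be sketched in a few lines of Python. This is a bare-bones illustration, not how any particular commercial processor works: the function name and the `amount` parameter are hypothetical, and a practical crosstalk canceller would also delay and filter the cross-fed signal according to head and speaker geometry, which is omitted here.

```python
def out_of_phase_cross_feed(left, right, amount=0.3):
    """Mix a scaled, polarity-inverted copy of each channel into the other.

    left, right: equal-length sequences of samples.
    amount: hypothetical cross-feed gain between 0.0 (off) and 1.0.
    """
    out_left = [l - amount * r for l, r in zip(left, right)]
    out_right = [r - amount * l for l, r in zip(left, right)]
    return out_left, out_right


# A sound panned dead center (identical in both channels) is reduced...
center_l, center_r = out_of_phase_cross_feed([1.0], [1.0])

# ...while a sound that differs between channels (here, opposite polarity)
# is boosted, which is what pushes the perceived image wider.
side_l, side_r = out_of_phase_cross_feed([1.0], [-1.0])

print("center:", center_l, center_r)
print("side:  ", side_l, side_r)
```

Notice that the same operation that widens the image also thins out whatever sits in the center, one reason the effect can sound impressive at first and unnatural over time.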
Continuing beyond the simply widened panorama are the schemes which propose virtual surround from only two speakers.
This approach applies a considerable amount of fore/aft phase manipulation within a controlled frequency range to suggest
source positions residing within an O-shaped, or U-shaped configuration before the listener.
Although a certain amount of head rotation is permitted, the severe phase angles necessary to produce the orbital pattern
eventually play themselves out and become predictable and tiring (listening fatigue).
The simulation of phantom images displayed by this effect can sound quite amazing as long as the listener is willing to
remain fixed at a specific "sweet-spot" between the speakers. However, the illusion will be greatly compromised or lost
completely if the listener were to freely move about the room as one might do in a natural setting. Odd, this parallax
breakdown doesn't happen when listening to Real sounds. Besides, your own ears affirm that the exploited phase-angle
distortion of this mirage is not a product of nature.
The unfortunate downfall of these types of effects aimed at expanding the usual stereophonic experience, is obviously their
conditional playback requirements. Again, the naturally occurring 3-dimensional depth observed when listening to a live
event involves no such contingencies or restrictions in order to be fully enjoyed and recognized as authentic. In other words,
real is real because it behaves real - and it endures.
This is how difficult reality simulation is - and why only indisputable mastery of the art could truly bring it home.
But let's consider why even some audio-savvy listeners don't seem to be particularly reactive to disturbances caused by…
(1) Artificial signal tampering,
(2) Shortcomings of the microphone, or
(3) Honking Loudspeakers.
It's not that they lack the acuity, it's about pre-established acceptance - illustrated here by an extreme example:
Could you imagine someone answering the telephone, recognizing the caller’s voice, but saying "Ooo, you sound like a telephone”?
Of course not, because we have already learned that this is what a telephone sounds like. Never are we shocked by what
we're already familiar with.
We all possess a learned ‘reality curve' of expectation for each medium of conveyance such as the Telephone, AM Radio,
FM Radio and TV sound. I suspect none of us confuse a news broadcast on the radio with someone actually speaking to us.
It's not necessary for the broadcast to sound that real - and it's certainly not expected to.
But what if it did? That would surely grab our attention, because, "We weren't expecting that!"
Now we have a reaction!
The same applies to the high-end realm of both Compact Disc playback and the Home Theater experience - which only afford a
limited level of sonic expectation as well. So, never are such reproductions expected by the listener to replace the
fulfillment and satisfaction of Real Life. We already know this long before we push the Play button.
Being accustomed to less-than-real performance is universally accepted by many as perfectly normal. In other words, we
don't say, "Ooo, this CD sounds like some sort of facsimile". Even the obvious limitations of the finest recordings are
accepted as 'just the way it is' - having met expectations, they are absolved with no further thought.
Compared to Visual Arts where the slightest flaw is unacceptable, I must confess that many of us who work in the Sound Industry
are sometimes lulled into accepting the existing standards as good enough for us – so it must be good enough for the consumer.
On behalf of the industry, my apologies to all.
But those of us who symbolize the extreme outer limits of sonic reality ask why this should be tolerated. Only we audio-
crazies would dare demand something more of our playback experience – especially considering that other industries are insistent upon
continually raising the bar, shifting the paradigm, and thinking outside the box. Why? Because good enough isn’t good enough.
Can you imagine an artist (of oil paintings) being perfectly content if the Print Reproductions of their fine art appeared
slightly degraded from their original creation? Unthinkable! Yet artists of musical works seem accustomed to allowing
reproductions of their fine art to sound nowhere near as compelling or entertaining as they should. Playing a CD is not the same
as experiencing the artist performing right in front of you!
Plagued with problems, the process itself is its own worst enemy, which inhibits what presentations potentially have to offer.
Even though we're all desensitized to these shortcomings as an everyday occurrence, the demand still remains for satisfying
the insatiable cravings of the crazed, as the passionate insist that music is intended to be experienced - not just heard.
Here's what I'm referring to. Unique among all beings, Humans respond emotionally to music, whereas other creatures may respond
simply to its changing frequencies and patterns. Hmmm. Maybe that's why there are no lower life-forms studying this subject.
But we do know for sure that an herbivore would never accept a photo-copy of a leaf when the innate desire demands far more.
Being human gives us a special distinction of appreciating the endless variety of our own universal language, and yet it seems
odd that such a sophisticated beast would settle for a mere phono-copy of an otherwise sensational experience uniquely ours.
A Feature Film Director shared an interesting story with me regarding a particular movie-music recording session for one
of his pictures.
It’s customary for the director to sit alongside the Recording Personnel at the mixing console – looking through the glass of
the control room into the scoring stage (seeing the orchestra and conductor). The sound of the orchestra is blocked and
can be heard only through the studio monitors.
With the tone in the air being strictly business, the task at hand is to evaluate the movie’s various music cues as would be
heard from the theater audience’s perspective.
As a musician himself, this director’s deprivation of seeing the performance taking place, but ‘not’ hearing it, went on for some
time until something unusual happened.
In an unorthodox move, this director asked if he could suspend his duties and leave his post. He was compelled to
escape the control room and enter the scoring stage to personally enjoy a few moments of Full Immersion of the
orchestra itself. This could only be fulfilled of course, in the presence of the Real Thing – not the studio monitors.
This is how hungry passionate music lovers are for absolute Realism – and how no one with an affinity for the
sublime regards standard electronically amplified sound as a satisfactory substitute for pure, transcendent splendor -
unless an alternative presents itself. Such a solution was precisely what I had set out to accomplish.
We Creative Sound Fanatics™ maintain support of superb excellence to delight the senses of the sound-conscious
audience – as well as to ingratiate today’s music lover by providing a ticket to finally escape standardism and exceed
After all, if it weren’t for the never-satisfied historic audio visionaries of the past promoting “good enough isn’t good
enough”, there may have been no desire to relinquish our outdated standards and dispel the Victrolas of yesteryear
as our ‘ultimate’ playback experience.
Now, with the glass ceiling effectively removed, who says you can’t have it all?