So as not to lose the context: the previous post comes from the same thread as the rest of the extracted essence:
I’ve been thinking about something lately. What I’m proposing here is an explanation, based in science (for those who demand such things), for why a gently tailing top end response is perceptually more accurate than a flat on axis frequency response. Remember you read it here first!
There’s a school of thought that rigidly subscribes to the target of a flat frequency response as the ideal for any loudspeaker. Minor variations are sometimes accommodated: long high frequency reverb decay times are “allowed” some high end tailoring, to tilt down the top end. Voicing my own personal designs over the years, I’ve never been able to accept this overall approach.
For example, take your standard two-way with 4th order acoustic Linkwitz Riley xovers: you get a beautiful graph. However, for me, this approach never sounded completely accurate. Over time, in differing rooms, I tend to hear this as excess energy centered around 5 kHz, even with a flat on-axis response. I tend to tip the top end down a bit, sometimes allowing it back up after 10 kHz. I know I’m not the only one that favours this tailored on-axis target.
I can finally explain why this is a more accurate approach (in fidelity, not preference).
Imagine typical stereo creating a phantom center image with speakers set at some angle of incidence to the head. Well, the speaker playing at the right, into the right ear, projects more treble into the right ear than a real center sound source would. Ditto for the left ear.
The only place where the tonal balance is really correct is for an image right at one of the speakers.
To get an idea of the phantom center tonal error, see this head related transfer function (HRTF) graph (averaged over a small population), scanned from a 1966 JASA article by Edgar Shaw:
http://www3.sympatico.ca/dalfarra/HRTF.jpg
For an equilateral triangle setup, a speaker would be at 30 degrees trying to replicate a phantom in the center. The perceived artificial boost at 7 kHz is on the order of 3 to 4 dB, with a gently rising characteristic starting at 2 kHz, peaking at around 7 kHz, then reaching equality again above 10 kHz. There it is, the dreaded subjective high-end hotness with flat on-axis designs.
By compensating using a gentle high-end roll off, our center phantom image perceptually sounds tonally correct again. Of course 3 to 4 dB compensation at 7 kHz adds the inverse error for images at the speakers, so a compromise of, say, 2 dB at 7 kHz sounds completely reasonable. Using a tweeter with some rise after 10 kHz then brings the difference back, and everyone is happy.
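A minimal sketch of the compromise cut described above, in Python. The 2 dB depth, the 2 kHz start, the 7 kHz peak and the 10 kHz return to flat come from the text; the raised-cosine shape on a log-frequency axis and the name `phantom_center_cut_db` are assumptions made for illustration, not values read off Shaw's curves. The tweeter rise above 10 kHz is not modelled.

```python
import math


def phantom_center_cut_db(freq_hz, depth_db=2.0, start_hz=2000.0,
                          peak_hz=7000.0, end_hz=10000.0):
    """Return a hypothetical on-axis EQ offset in dB (zero or negative).

    The cut is 0 dB at and below start_hz, reaches -depth_db at peak_hz,
    and is back to 0 dB at and above end_hz. The smooth shape in between
    is an assumed raised cosine on a log-frequency axis.
    """
    if freq_hz <= start_hz or freq_hz >= end_hz:
        return 0.0
    if freq_hz <= peak_hz:
        # Fraction of the way from start_hz up to peak_hz (log scale).
        x = math.log(freq_hz / start_hz) / math.log(peak_hz / start_hz)
    else:
        # Fraction of the way from end_hz down to peak_hz (log scale).
        x = math.log(end_hz / freq_hz) / math.log(end_hz / peak_hz)
    weight = 0.5 * (1.0 - math.cos(math.pi * x))  # 0 at x=0, 1 at x=1
    return -depth_db * weight


if __name__ == "__main__":
    for f in (1000, 2000, 4000, 7000, 8500, 10000, 15000):
        print(f"{f:>6} Hz: {phantom_center_cut_db(f):+.2f} dB")
```

Changing `depth_db` between 0 and 4 trades tonal accuracy of the center phantom image against accuracy of images at the speakers, which is the "tailor to taste" knob.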
Of course this is tricky, as everyone’s HRTF is as unique as his or her fingerprint. However, it’s a very reasonable assumption that everyone will hear more treble from a source 30 degrees incident than 0 degrees. Tailor to taste.
The repercussions of this effect are wide ranging. For example, the HRTF at 30 degrees is also hotter from 200 Hz to 1 kHz than the response at 0 degrees (note: all curves converge below 100 Hz, as the head becomes small in relation to the wavelength and head diffraction effects minimize). Perhaps there should also be less baffle diffraction compensation than a flat measurement would indicate. Indeed, many voice to 4 or 5 dB, even for nominally flat acoustic power designs.
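To put rough numbers on the note about curves converging below 100 Hz, here is a small Python sketch comparing wavelength to head size. The speed of sound (343 m/s) and the head diameter (0.175 m) are assumed round figures, not values from the thread or from Shaw's paper.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly room temperature
HEAD_DIAMETER_M = 0.175     # assumed: rough adult head width


def wavelength_m(freq_hz):
    """Wavelength in metres of a sound at freq_hz."""
    return SPEED_OF_SOUND_M_S / freq_hz


def wavelength_in_head_diameters(freq_hz):
    """How many (assumed) head diameters fit into one wavelength."""
    return wavelength_m(freq_hz) / HEAD_DIAMETER_M


if __name__ == "__main__":
    for f in (100, 1000, 7000):
        print(f"{f:>5} Hz: wavelength {wavelength_m(f):.3f} m, "
              f"{wavelength_in_head_diameters(f):.1f} head diameters")
```

At 100 Hz the wavelength is many times the head size, so the head barely disturbs the field; around 7 kHz the wavelength is smaller than the head, which is where direction-dependent differences become large.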
It would also result in perceived tonal changes with changes in speaker placement geometry that are independent of room effect and speaker toe-in. Different angles, different perceived phantom image error.
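The dependence on placement geometry can be sketched with a little trigonometry. This assumes a listener centred between the speakers and looking straight ahead; the function name and the example distances are hypothetical.

```python
import math


def speaker_angle_deg(spacing_m, distance_m):
    """Angle of each speaker off straight ahead, in degrees.

    spacing_m:  distance between the two speakers.
    distance_m: perpendicular distance from the listener to the line
                joining the speakers (listener assumed centred).
    """
    return math.degrees(math.atan2(spacing_m / 2.0, distance_m))


if __name__ == "__main__":
    side = 2.5  # hypothetical speaker spacing in metres
    # Equilateral triangle: listener sits side * sqrt(3) / 2 back.
    print(speaker_angle_deg(side, side * math.sqrt(3.0) / 2.0))
    # Same spacing, listener further back: the angle narrows.
    print(speaker_angle_deg(side, 3.5))
```

The angle returned is the one to look up on an HRTF plot when comparing against the 0 degree curve, so moving the chair or the speakers changes which curve applies.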
The difference between stereo and single speaker mono was always ascribed to the picket fencing and crosstalk error inherent in stereo. I’d wager that the HRTF difference for phantom image generation is also a large factor.
Finally, and I know Lynn will love this last one as it’ll dovetail nicely with his far eastern philosophies. This effect explains why different people hear different tonal balances from the same speaker/room, and why there is objectively no one “right” frequency response across the population. Everyone needs a response tailored after their own HRTFs, if phantom images are to sound tonally correct to them.
To me, this is a “very big deal”. Almost makes me want to run out and buy Etymotic in-ear mics and get the HRTF characterized.
DDF
>Nah, if the 0 degree angle of incidence is used as a reference and normalized to a straight line, the peak around 2 kHz would practically disappear and the peak around 5~7 kHz would also be reduced. I agree that everyone's hearing is different, but that's beside the point. A clarinet wouldn't be equalized in real life.
And the other thing is: why should a phantom image be only in the centre? I agree that a lot of multimedia sounds are monophonic, but there's no 'standard' listening position, so there are other factors to consider such as the off-axis speaker response, reverberation of the room, and the polar patterns of the microphones used for the recordings.
>You are right in that stereo reproduction of a centered sound image requires a frequency response correction due to the HRTF. You are not the first to realize this, though... Correct me if I am wrong, but didn't the "BBC dip" have an explanation like that? SVANTE
>Interesting idea, Dave. The problem I see with it is both ears are hearing sounds from both speakers, not just right-speaker, right-ear. Your graph doesn't show 30 degrees so let's take 45 instead. What the ear would hear would be a sum of the 45 and 315 curves. And the power spectrum reaching the ear would be even less than a linear addition of the curves because the short wavelengths make the signals uncorrelated. Given all that, I think any boost would be MUCH smaller.
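The distinction drawn here between a linear addition and a power addition of two curves can be sketched in Python. The two levels in the example are hypothetical placeholders for the near-side and far-side curve values at one frequency; they are not read from the graph.

```python
import math


def coherent_sum_db(a_db, b_db):
    """Level of two in-phase (fully correlated) signals added as pressures."""
    return 20.0 * math.log10(10.0 ** (a_db / 20.0) + 10.0 ** (b_db / 20.0))


def power_sum_db(a_db, b_db):
    """Level of two uncorrelated signals added as powers."""
    return 10.0 * math.log10(10.0 ** (a_db / 10.0) + 10.0 ** (b_db / 10.0))


if __name__ == "__main__":
    near_db, far_db = 4.0, -6.0  # hypothetical curve values at one frequency
    print(f"coherent sum: {coherent_sum_db(near_db, far_db):.2f} dB")
    print(f"power sum:    {power_sum_db(near_db, far_db):.2f} dB")
```

Two equal levels add to about +6 dB when correlated but only about +3 dB when uncorrelated, so the power sum is always the smaller of the two. Neither function accounts for the arrival-time difference between the two paths.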
>
Sony once sold a pair of headphones with a gyro sensor, to detect head motion and change the HRTF according to head rotation angle. You first hit a “reset” button, then let the gyro do its thing. We bought a pair (rudely pricey), tried it at length, and it didn’t work well at all. The problem with HRTF and binaural encoding is that HRTFs are individualized, and the illusion can work anywhere from extremely well all the way to terribly poorly, depending upon your HRTF. I’ve sat down and listened to live feeds through KEMARs and HATS, played back through speakers with crosstalk cancellation algos. Great effects, but it destroys the music. My HRTF doesn’t match the standard curves the industry seems to favour, and I find most binaural recordings to generally sound cupped or honky, if not downright phasey.
Sqlkev: I’m glad someone tried it, thanks! Did you do it with an acoustic measurement: i.e. the absolute response hits this target? Or did the DEQ dial up just the relative tilt between the two angles? What was the original speaker on axis target?
CeramicMan/panomaniac: center was chosen as worst case to illustrate the concept but of course it varies over angle. Some compensation is better than none, and I personally would choose the center, as center fill images are the most typical. Of course the room, directionality, recording, etc. affect things, but don’t give up hope yet. See my response to AJ below, to understand how to best apply this concept.
Oshifis: many experienced builders will build in a tilt like this from day one.
Svante: I’m a voracious consumer of speaker design literature (for over 30 years) and I’ve never seen this concept in print before. I can’t claim to be the first to think of it, but it’s certainly original to me, and not by any means common knowledge, i.e. in 30 years, I’ve never heard of this.
The BBC dip is interesting. I’ve yet to see any reference describing exactly what it is. It’s the Loch Ness monster of speaker philosophies: everyone claims to know it, but no one can draw it. If you have a reference, please post it. My understanding of it is a depression in the mid band, whose purpose is to add some diffuse field equalization to the response. The idea being the mic picks up incident but also non-incident sound, and a more natural tonal balance has the playback chain apply some diffuse field response weighting.
Hi catapult, as rdf mentions, the “crosstalk” signal shows additional inherent delay, and the tonal summation isn’t the same as if it were non-delayed. I view this as a second order effect and one of the inherent errors in stereophony.
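A minimal sketch of why a delayed crosstalk signal does not sum tonally like a non-delayed one: adding a delayed, attenuated copy to the direct sound produces a comb-like response. The 0.25 ms delay and the crosstalk gain of 0.5 are hypothetical round numbers chosen for illustration, not measured values, and the model ignores the frequency dependence of head shadowing.

```python
import cmath
import math


def direct_plus_delayed_db(freq_hz, delay_s, crosstalk_gain):
    """Level in dB, relative to the direct sound alone, of the direct
    sound plus a delayed copy scaled by crosstalk_gain (assumed < 1)."""
    phase = -2.0 * math.pi * freq_hz * delay_s
    total = 1.0 + crosstalk_gain * cmath.exp(1j * phase)
    return 20.0 * math.log10(abs(total))


if __name__ == "__main__":
    delay_s = 0.00025      # hypothetical extra path delay to the far ear
    crosstalk_gain = 0.5   # hypothetical far-ear attenuation (linear)
    for f in (100, 1000, 2000, 3000, 4000):
        level = direct_plus_delayed_db(f, delay_s, crosstalk_gain)
        print(f"{f:>5} Hz: {level:+.2f} dB")
```

With these numbers the sum peaks near +3.5 dB at low frequencies and at 4 kHz, and dips to about -6 dB at 2 kHz, where the delayed copy arrives half a cycle late.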
AJ: Here, you’re repeating an argument I used to make on the MAD board. I guess I had that coming. However, the argument is misapplied in this case. The argument goes something like this: If we want to replicate the exact “message” the recording engineer provided, our speakers, speaker setup and room would have to be exactly like those he used in the final mastering, assuming he was aiming to have his system sound as close to “live” as possible (which isn’t that common a case, actually). We could look at this in despair and throw our hands up, knowing that our reference is unknown. However, this argument is a justification to reject the concept of “absolute sound” and allow your own personal experience to dictate what is accurate. Since each recording varies, we should rather target our home systems to sound as real to us as possible, with what we perceive as accuracy, and with the recordings we choose as reference, rather than some mythical absolute. We could apply this to each CD to decode the difference between each recording’s vision and ours, but isn’t that a bit impractical? I’d rather play with the kids.
But all is not lost. Here is how this concept should be applied: In the absence of a known reference, I look at it this way: if you voice in stereo, just appreciate the fact that images glued to the speakers will sound a tad hotter than those more towards the center phantom image, assuming you’re looking ahead. Try to strike a tonal balance that best trades off the tonal balance at differing phantom image locations. That’s it. IME, the resultant on-axis response which sounds the most “right” invariably has the 4 to 7 kHz region tipped down, then some rebound around 10 kHz. Some less, some more, depending on f3, dispersion, and which room it’s going to go in. It’s the beauty of DIY hifi, we all get to roll our own to taste. This effect is real; consider or discard at your leisure.
Tubee, unfortunately we bring our ears to the live gig, and into the stereo room. So it’s a wash, if you’re trying to make it sound like what you think of as being real. I’m sorry to hear about your situation, but it’s a bit unique; compensating for hearing damage is trying to be “better” than live. DDF