A speaker should produce a flat frequency response in an anechoic room.
The loudspeaker should also be designed so that its directivity (how sound is radiated in directions other than straight ahead) yields a slightly tilted frequency response when the same speaker is placed in a “normal” (slightly reverberant) room: about 4 dB more bass and about 2 dB less treble.

The debate about this is basically over: the question has been answered, and virtually all “good” speakers show this behavior (flat on-axis, controlled sound power output).

And since recording studios use good speakers (studio monitors) to record, monitor, and mix the music that consumers listen to later, it makes intuitive sense to listen to the music on similarly performing speakers, because that is what the music is supposed to sound like: it is what the artist and recording engineers decided “sounded good”. So the target for speakers is a flat on-axis response with controlled sound power output (smooth directivity).

Now, the same question can be asked for headphones: “What should a headphone sound like?” (in other words: “What is the ideal frequency response of a headphone?”), and the short answer is: “It’s not that simple”.

The answer is simple for speakers (not that simple really, but it has been answered), but for headphones, it is much more difficult.

The first difficulty is “how do you measure it?”. It’s easy with speakers: put a calibrated microphone at a standardized distance, in a standardized room (e.g. anechoic). With headphones, this isn’t possible, because much of the sound depends on the shape of the head. The general consensus is to measure headphones on artificial heads, with artificial ears and artificial ear canals. The problem is that head shape, ear shape, and ear canal have a significant influence on the acoustics, most prominently a 10-20 dB boost in the upper midrange / treble part of the spectrum, depending on the angle of incidence.

The important thing is: We “hear” this boost even when listening to speakers – because our ears are always there. When the artificial head measurement shows a high boost at 3 kHz, this sounds “flat, linear” to us, because this is what our ears hear. But how should this boost look exactly? What is the target frequency response?

Enter Sean Olive, a scientist at Harman. His hypothesis was that the best way to come up with a target response for headphones was to place a pair of good speakers in a “regular” listening room, similar to the control rooms of recording and mixing studios, and to measure the frequency response with an artificial head setup. Harman’s reference listening room is neither fully reverberant nor fully anechoic; it features a reverberation time of about 0.4 seconds, very similar to what professional recording and mixing studios use.

Now, if we measure a headphone using that same artificial head, and the headphone were to have the same frequency response that we previously measured in the room, then this frequency response would be ideal, or so Sean Olive proposed. Further research supported him: the majority of both trained and untrained listeners preferred this target curve over the alternatives tested.
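
The comparison described above can be sketched in a few lines. This is a minimal illustration, not Harman's or USound's actual procedure: the function name `deviation_from_target`, the four frequency bands, and all dB values are hypothetical and were chosen only to show the arithmetic. Both curves are assumed to come from the same artificial head, and the overall level difference is removed because only the shape of the response matters here.

```python
def deviation_from_target(measured_db, target_db):
    """Per-band difference (measured - target) after removing the overall level offset."""
    if len(measured_db) != len(target_db):
        raise ValueError("both curves must use the same frequency bands")
    diffs = [m - t for m, t in zip(measured_db, target_db)]
    offset = sum(diffs) / len(diffs)  # average level difference between the curves
    return [d - offset for d in diffs]


# Hypothetical example values, not real measurements.
freqs_hz = [100, 1000, 3000, 10000]
target_db = [4.0, 0.0, 10.0, 2.0]      # room response captured with the artificial head
measured_db = [8.0, 3.0, 11.0, 2.0]    # headphone captured on the same artificial head

deviation = deviation_from_target(measured_db, target_db)
# -> [2.0, 1.0, -1.0, -2.0]: too much bass, too little treble relative to the target
```

A headphone that matched the target exactly would give a deviation of zero in every band, regardless of how loud it was measured.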

This allows us to escape the circle of confusion, and get closer to the ideal of hearing the same thing the artists, producers and engineers heard when creating the music that we listen to. But the topic does not end here!

It’s important to understand that the result of this research, contrary to popular belief, is not a single fixed frequency response curve that needs to be followed to the letter. In fact, the research confirmed what listeners had suspected for quite some time: the preferred amount of bass varies from person to person.

Preferred bass-response on in-ear headphones

Additional research performed by USound and IEM showed that in the presence of background noise (such as distant traffic), listeners tend to prefer more bass and slightly less treble compared to the original Harman Target.

For this reason, USound’s USOUND1V1 reference target was tuned by expert listeners to contain more bass energy below 300 Hz, while offering a smoother and timbrally more accurate upper midrange and treble response. This results in a warm and punchy sound without sacrificing clarity, which maintains the original timbre of voices.

Comparison of target frequency response curves.

The predictable and highly precise performance of MEMS loudspeakers enables more accurate tuning with digital filters. The precision of the MEMS actuators makes them the ideal transducer for an acoustically optimized system and allows using more natural-sounding filters to achieve the target sound.
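
As a rough illustration of what tuning with digital filters can look like, the sketch below computes the coefficients of a single peaking equalizer (a second-order “biquad” section) using the widely published Audio EQ Cookbook formulas. The text does not say which filter types USound uses, so the choice of a peaking biquad is an assumption. The function names, the 48 kHz sample rate, and the example settings (+3 dB at 100 Hz with a Q of 0.7) are hypothetical.

```python
import cmath
import math


def peaking_biquad(f0_hz, gain_db, q, fs_hz):
    """Return normalized (b, a) coefficients of a peaking EQ biquad (Audio EQ Cookbook form)."""
    amp = 10.0 ** (gain_db / 40.0)           # square root of the linear gain at f0
    w0 = 2.0 * math.pi * f0_hz / fs_hz       # center frequency in radians per sample
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    b = [1.0 + alpha * amp, -2.0 * cos_w0, 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * cos_w0, 1.0 - alpha / amp]
    a0 = a[0]                                # normalize so that a[0] == 1
    return [x / a0 for x in b], [x / a0 for x in a]


def gain_db_at(b, a, f_hz, fs_hz):
    """Magnitude response of the biquad in dB at frequency f_hz."""
    z_inv = cmath.exp(-1j * 2.0 * math.pi * f_hz / fs_hz)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20.0 * math.log10(abs(num / den))


# Hypothetical settings: a gentle bass lift, checked at and far above the center frequency.
b, a = peaking_biquad(f0_hz=100.0, gain_db=3.0, q=0.7, fs_hz=48000.0)
boost_at_center = gain_db_at(b, a, 100.0, 48000.0)    # the requested +3 dB
gain_far_above = gain_db_at(b, a, 10000.0, 48000.0)   # very close to 0 dB
```

In practice several such sections would be chained, each nudging one region of the measured response toward the target. The more predictable the transducer, the fewer and gentler the corrections that are needed.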