Real sounds are composed of combinations of sine and cosine waves of multiple frequencies. Modern stereos often show those frequencies in octave or band form. A special frequency called the fundamental frequency determines the pitch of the sound. Harmonics are multiples of the fundamental frequency that determine the timbre or richness of the sound. Determination of the fundamental frequency is useful to tune instruments and voices. Automated algorithms for determination of a fundamental frequency and any associated harmonics are generally limited to specific contexts: music, voice, and number and magnitude of harmonics. Algorithms for determination of fundamental frequency are discussed at length. Music and voices cover a very different range of frequencies. Fundamental frequencies and harmonics of component sine waves and cosine waves are used to obtain very specific effects with both music and speaking.
These days, audiophiles can obtain equipment with all sorts of automated features. Most surround sound processors have a built-in ability to boost or cut dB at several frequency bands. Or, the processors perform automatic room correction, which includes compensating for speaker frequency response across the entire audible spectrum. The listener may not even know the details of the corrections that are being applied. For instance, I have an older amplifier that allows me to enable/disable low frequency and high frequency. When I set one of these, no physical display of the effects is provided. A listener just hears a difference.
Back in the day, I bought my first audiophile unit. This low cost unit possessed a number of important features for me. I had control over a low pass filter and a graphic equalizer. I could activate and control the level of the filter myself. I could set the frequency levels myself using the equalizer. A frequency band display would show me the effects of operating the controls instead of doing work automatically on my behalf. I prefer low frequencies over high frequencies, so having these controls and the frequency band display made sure that the settings had the actual effect that I desired and that sounded good to me.
Searching a bit on the web, I found only a single combined amplifier/equalizer/filter component available for purchase. Others may exist.
Figure 1: Technical Pro RX55UriBT Receiver, Integrated Equalizer
On the right appear the equalizer sliders, each slider being linked to a frequency band. At the left of these sliders, a display shows the "levels" of the frequencies in each band. As the listener moves the sliders, the related frequency band level displays the effect of that change — in addition to modifying the sound.
Usage of this receiver does not connote an endorsement of this equipment. I have not used this receiver. Given its low cost and its reliance on 1/3 octave, I would guess that many high end audiophiles would be unhappy with its sound quality. Its usage here is simply to illustrate the feature set.
This paper delves into the determination of actual component sine and cosine wave frequencies that provide the foundation for design and operation of frequency band displays, filters, and graphic equalizers, as illustrated above.
The sections that appear in this paper include the following:
- The Composition Of Sound
- Determining Frequencies and Coefficients
- Analysis Of A Real World Sound
- Fundamental Frequency And Harmonics
- Benefits Of Frequency And Harmonic Identification
- Fundamental Frequency Identification
- Harmonic/Overtone Identification
- Music vs Voice
- Demonstration Software Program
Along the way, a number of mathematical diversions discuss the underlying mathematical theory involved. These diversions are carefully isolated and may be skipped by the uninterested reader.
Analysis of physical phenomena as waves came from a mathematician/physicist, Jean-Baptiste Joseph Fourier. Fourier was attempting to describe the physics of heat transfer and vibrations. In 1822, Fourier published The Analytical Theory of Heat [1].
This work was translated into English in 1878. In this work, Fourier claimed that any function of a variable, whether continuous or discontinuous, can be expanded into a series of sines and cosines of multiples of the variable. In 1930, this whole concept of decomposition of a wave into sines and cosines that are multiples of each other was extended to any kind of wave phenomenon characterized by frequencies.
An MIT mathematician, Norbert Wiener, mathematically proved this extension to sound in his 1930 article Generalized Harmonic Analysis [2].
Wiener’s article proved specifically that Fourier’s expansion of an arbitrary function into sines and cosines applies when the variable is a frequency. This article became the foundation for frequency analysis of sounds. By 1964, generalized harmonic analysis had become fully disseminated into the mathematics and engineering communities.
Manual calculation of the coefficients of the component sine and cosine waves and the resulting magnitudes is extremely time consuming.
Attempts to obtain speed by performing fewer calculations when determining the sine wave and cosine wave coefficients first appeared in print in an article by Carl Runge in 1903 [3]. Efficiency of Runge’s algorithm was derived from his recognition of the periodicity of sines and cosines when the number of points in the sound wave is a power of 2.
A further advance in the performance of the algorithm occurred in an article by Gordon C. Danielson and Cornelius Lanczos in 1942 [4]. Using a concept called successive doubling, or continuously dividing points into half size groups, algorithm efficiency was further increased.
In 1965, James Cooley and John Tukey published the version of the Fast Fourier Transform recognized as the current version typically used today[5]. Algorithm extensions provided a generally efficient approach when the number of points in the sound wave is any number, not necessarily a power of 2.
Each of these advances typically extended the number of frequencies that could be more efficiently computed as a result of the significant increases in computational efficiency.
Real sounds, such as music and voice, are composed of combinations of sine waves and cosine waves. These waves are characterized by their frequencies and amplitudes and the relative contribution of each of these waves to the total sound.
A frequency band display in a graphic equalizer or a spectrum analyzer provides a summary of the frequencies of the sine and cosine waves in the sound. The summary is based on bands of frequencies.
The figure below shows a typical frequency band display.
Figure 9: A Typical Frequency Band Display
Along the bottom axis appears a set of frequencies in Hz and kHz. Each frequency represents the center of a frequency band.
Along the vertical axis, each band shows the "level" of the frequency in the sound. In the center of the sound level scale appears the value "0". This value represents the unchanged "level" of the sound at the center frequency assuming that no adjustments are made to the component frequency level.
A sound level of 5 indicates that the "level" of all the frequencies of the sine and cosine waves in the band has been adjusted by 5 dB. In other words, the vertical axis represents the relative "level" of the contribution of the component sine and cosine waves in the frequency band when compared to the contribution in the incoming actual sound.
As used in this context, dB does NOT represent the loudness of the sound. Instead, this dB level is used to represent the relative change to the contribution of the sine and cosine waves of the frequencies in the band.
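To make the relative nature of these band adjustments concrete, the sketch below converts a slider setting in dB into the factor applied to the amplitudes in that band. It assumes the common amplitude convention of 20·log10; the function names are hypothetical and are not taken from any particular equalizer.

```python
import math

def db_to_gain(db_change):
    """Convert a band adjustment in dB to a linear amplitude factor.

    Assumes the amplitude convention: dB = 20 * log10(gain).
    """
    return 10.0 ** (db_change / 20.0)

def gain_to_db(gain):
    """Convert a linear amplitude factor back to a dB adjustment."""
    return 20.0 * math.log10(gain)

if __name__ == "__main__":
    # A slider at 0 dB leaves the band unchanged (factor of 1.0).
    for slider_db in (-5.0, 0.0, 5.0):
        print(f"{slider_db:+.1f} dB -> amplitude x {db_to_gain(slider_db):.3f}")
```

Under this assumption, a setting of 0 corresponds to a factor of exactly 1.0, which matches the "unchanged" meaning of the center of the scale.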
Frequency bands can be presented because of a basic fundamental truth: all sounds are composed of combinations of sine waves and cosine waves across a broad range of frequencies.
A very simple multiple frequency sound along with its component frequencies is shown in the figure below.
Figure 10: A Simple Sound With Component Frequencies
Time in seconds appears along the bottom axis of this plot. Amplitude in the range -2 to +2 forms the values for the vertical axis. Careful evaluation shows a very specific repeating pattern – a double peak followed by a triple peak.
This sound is a special sound constructed to demonstrate how specific frequencies can contribute to a total sound. Each point on this sound plot is computed by the following formula:

$$x(t) = \sin(2\pi \cdot 200\,t) + \sin(2\pi \cdot 500\,t)$$
Two sine curves are used. One sine curve has a 200 Hz frequency. The other sine curve has a 500 Hz frequency. Both curves have a maximum amplitude of 1.0 and a minimum amplitude of -1.0. Not surprisingly, the combined curve amplitudes range from -2.0 to 2.0, the sum of both sine waves at any point in time.
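The construction described here (two unit-amplitude sine curves at 200 Hz and 500 Hz, added point by point) can be sketched in a few lines. The sampling rate of 8000 Hz is an assumption made for illustration; the rate used for the article's plot is not stated.

```python
import math

SAMPLE_RATE_HZ = 8000  # assumed; not stated in the article

def simple_sound(t):
    """Sum of a 200 Hz and a 500 Hz sine wave, each with amplitude 1.0."""
    return math.sin(2 * math.pi * 200 * t) + math.sin(2 * math.pi * 500 * t)

# Both frequencies are multiples of 100 Hz, so the pattern repeats every
# 1/100 s. At 8000 samples per second that is 80 samples.
one_period = [simple_sound(n / SAMPLE_RATE_HZ) for n in range(SAMPLE_RATE_HZ // 100)]

if __name__ == "__main__":
    print("largest sample :", max(one_period))
    print("smallest sample:", min(one_period))
```

The combined values always stay within -2.0 to 2.0, since each component is bounded by -1.0 and 1.0.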
This curve was submitted to a software program that determines frequencies of a sound waveform. The resultant frequency plot appears in the figure below.
Figure 11: Component Frequencies of the Simple Sound
Frequency in Hz is the dimension of the horizontal axis. Along the vertical axis is the value of the magnitude at that frequency. For practical purposes of this example, these magnitudes represent the amplitudes of the component sine curves. Mathematically, the magnitude represents something a bit different but for now, this simplification is justified because the component waves are known to be sine waves.
Heights in this frequency diagram differ from the heights of the frequency bands shown for the graphic equalizer. Equalizer heights are used as relative indicators to identify the relative contribution of component sine and cosine waves in that frequency band of an incoming sound wave. Frequency plot heights in this figure provide an absolute measure of the magnitude of a specific frequency. Or, in this simple example, the heights represent the absolute contribution of the component sine waves at the indicated frequency.
Two peaks appear in the frequency diagram. One peak is at about 200 Hz. Another peak is at about 500 Hz. These peaks should come as no surprise, since this sound was constructed from two sine waves at those frequencies. Magnitude at each of these peaks is about 0.44. According to this program, each component sine wave contributes approximately 0.44 of the sine wave amplitude to the total sound curve at any point in time.
A quick glance at the equation used to generate the combined sound in this example reveals what appears to be a shortcoming. The equation says that each sine wave component frequency contributes 1.0 towards the combined sound wave amplitude. However, the analysis program output states that the contribution of each sine wave is only about 0.5. In other words, the value predicted by the computer program is only about 1/2 of the actual contribution of the component sine wave. In this example, the program output needs to be doubled. The reason is that the computer program that predicts these contribution coefficients produces both positive frequencies and negative frequencies. Predicted negative frequencies are a mirror image of the positive frequencies. A positive frequency and its mirror negative frequency each contribute 1/2 of the sine wave contribution.
Thus, according to the computer program, the combined sound wave really is decomposed into the following equation:
A sine wave of 200 Hz contributes 0.5 of its amplitude to the total combined sound. And, a sine wave of -200 Hz also contributes 0.5 of its amplitude to the total combined sound. The same can be said of the 500 Hz and -500 Hz sine waves.
Since the combined sound wave was constructed from 200 Hz and 500 Hz component sine waves, introduction of the negative frequencies into the decomposition seems inconsistent with knowledge of the actual component sine wave frequencies.
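One way to see where the factor of 1/2 comes from is Euler's identity. This is a sketch that assumes the analysis program reports a two-sided spectrum built from complex exponentials, which is how most FFT implementations work but is not confirmed for the program used here.

```latex
\sin(2\pi f t) \;=\; \frac{1}{2i}\,e^{\,i 2\pi f t} \;-\; \frac{1}{2i}\,e^{-i 2\pi f t},
\qquad
\left|\frac{1}{2i}\right| = \frac{1}{2}
```

Under that assumption, a single sine wave of amplitude 1.0 shows up as two spectral lines, one at +f and one at -f, each with magnitude 0.5. The negative frequency is a bookkeeping device of the complex representation rather than a second physical wave, and doubling the positive-frequency magnitude recovers the amplitude of 1.0.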
Another issue with the results of the computer program is that the frequency location of each peak does not exactly match the known sine wave frequency location. Additionally, contribution factor (in this simple case, the magnitude) does not exactly match the percentage contribution of each known sine wave. Furthermore, these plots are continuous curves appearing to indicate that other sine wave frequencies near the known frequencies also contribute to the sound.
These inaccuracies result from the digital algorithms used to generate the frequency plots. Yes, as counterintuitive as this statement may seem, digital implementations of frequency analysis have a number of limitations. These limitations are one of the reasons that equalizers utilize bands. Combining frequencies into bands masks the inaccuracies introduced by digital implementations of frequency analysis algorithms.
Of course, a trade exists between the number of bands, the depth of control, and the amount of work that has to be performed by the audiophile. More bands gives an audiophile greater depth of control at the cost of more work to adjust the sound to his/her liking.
In the example above, a periodic sound was used. This sound repeats the same pattern over and over for an infinite amount of time. Careful evaluation shows a very specific repeating pattern – a double peak followed by a triple peak.
Real sounds are aperiodic. Simply stated, an aperiodic sound does not repeat the same pattern over and over for an infinite amount of time. A typical musical piece exhibits a structure consisting of phrases and is often broken into known segments, such as verses and a chorus between each verse. Generally speaking, however, the same exact pattern does not repeat over and over for an infinite amount of time. Even the chorus only repeats from time to time.
All sounds, both periodic and aperiodic, can be shown to be constructed from combinations of sine waves and cosine waves.
The mathematical description of this combination of component sine waves and cosine waves is as follows:

$$x(t) = \sum_{k=1}^{K} \left[ a_k \cos(2\pi f_k t) + b_k \sin(2\pi f_k t) \right]$$
where
- x(t) = any signal of a combined sound wave, periodic or non-periodic
- K = number of component cosine and sine waves
- fk = sine/cosine component frequency k, specified in Hz (cycles per second)
- ak = amplitude of the cosine wave component at frequency fk [REAL]
- bk = amplitude of the sine wave component at frequency fk [IMAG]
- fs = sampling frequency used to digitize the combined sound wave, in Hz
- t = time in seconds
- Σ (k = 1 to K) = summation across all terms with an index k, ranging from k = 1 to k = K
In the equations above, the ak and bk coefficients are the contribution factors associated with each component sine wave frequency in the previous discussion.
Furthermore, the magnitude of the combined sine wave and cosine wave at a specific frequency fk is computed as follows:

$$\text{magnitude}(f_k) = \sqrt{a_k^2 + b_k^2}$$

Many prediction algorithms report the magnitude in decibels. Decibels are calculated using the following formula:

$$\text{dB}(f_k) = 20 \log_{10}\big(\text{magnitude}(f_k)\big)$$

This version of decibel differs from the decibels associated with loudness. Loudness decibels express the sound pressure level relative to a fixed reference pressure near the threshold of human hearing. The decibels used here simply enable comparison of the relative power of one frequency with another.
Using this magnitude as a relative measure of a strength has an important benefit. Both sine and cosine impacts of the frequency are combined into a single measure. Moreover, all magnitudes are mapped into the range of values greater than zero. If this combination were not utilized, and just added together, negative and positive values would counteract each other and would not serve as an effective relative measure of strength of the frequency.
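The coefficients, magnitude, and decibel value can be computed directly for the simple 200 Hz plus 500 Hz example. The sketch below is not the program used for the article's figures. It assumes a sampling frequency of 8000 Hz, a segment of 800 samples (a whole number of cycles of both frequencies), and a 1/N normalization. FFT libraries differ in normalization and sign conventions, so their raw output may be scaled or negated relative to these values.

```python
import math

SAMPLE_RATE_HZ = 8000  # assumed sampling frequency fs
NUM_SAMPLES = 800      # 0.1 s: exactly 20 cycles of 200 Hz and 50 cycles of 500 Hz

signal = [
    math.sin(2 * math.pi * 200 * n / SAMPLE_RATE_HZ)
    + math.sin(2 * math.pi * 500 * n / SAMPLE_RATE_HZ)
    for n in range(NUM_SAMPLES)
]

def coefficients(samples, freq_hz, fs):
    """Return (ak, bk) at one frequency by correlating with cosine and sine."""
    count = len(samples)
    ak = sum(x * math.cos(2 * math.pi * freq_hz * n / fs)
             for n, x in enumerate(samples)) / count
    bk = sum(x * math.sin(2 * math.pi * freq_hz * n / fs)
             for n, x in enumerate(samples)) / count
    return ak, bk

def magnitude(ak, bk):
    """Combine cosine and sine coefficients into one non-negative value."""
    return math.hypot(ak, bk)

def to_decibels(mag):
    """Express a magnitude in dB (amplitude convention, 20 * log10)."""
    return 20.0 * math.log10(mag)

if __name__ == "__main__":
    for freq in (200, 500):
        ak, bk = coefficients(signal, freq, SAMPLE_RATE_HZ)
        mag = magnitude(ak, bk)
        print(f"{freq} Hz: ak={ak:.3f} bk={bk:.3f} "
              f"magnitude={mag:.3f} doubled={2 * mag:.3f} dB={to_decibels(mag):.2f}")
```

Because the segment holds a whole number of cycles, the cosine coefficients come out at essentially zero and the sine coefficients at one half, so doubling the magnitude recovers the amplitude of 1.0 used to build the sound.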
Returning to the simple combined sound wave example used above, the cosine wave coefficients and the sine wave coefficients are shown in the figures below.
Frequency is shown along the horizontal axis. Coefficient value appears across the vertical axis. Negative signs can be ignored. These signs are an artifact of the plotting/calculating technique used by the program performing the calculations.
Cosine wave coefficients show up at 200 Hz and 500 Hz. However, the cosine coefficient at 200 Hz is 0.007 and the cosine coefficient at 500 Hz is 0.018. These values are basically noise created by the calculation algorithm and may be safely ignored. A value of 0.0 is appropriate for the cosine wave coefficients at these two frequencies.
Since this example is contrived from two known sine waves, the appearance of small cosine coefficients below a threshold are clearly artifacts produced by either the digital implementation of the software or by choice of parameters input to that software. An extensive evaluation of the choice of software implementation and inputs into the software will appear in a future article.
Ignoring frequency contributions when the magnitude is below a threshold may or may not be appropriate, depending upon the situation. Often, when music waveforms are being analyzed, no frequencies are ignored regardless of small magnitudes. This strategy maximizes the quality of the reconstructed signal after modifications based on the frequencies and their magnitudes. However, the author helped to design and implement the F-14 radar software. The pattern of frequencies and magnitudes of the reflected radar waves (which are analyzed in the same way as sound signals) was used to identify the object which caused the reflection. In this situation, frequencies with magnitudes below a preset threshold had to be eliminated because the patterns were used to identify the source of the reflection. Failing to inform the pilot that he was facing a Russian MiG fighter diving through the clouds would have led to disastrous consequences.
Sine wave coefficients are far more revealing. Sine wave coefficients also appear at 200 Hz and 500 Hz. At both frequencies, the sine wave coefficients are effectively 0.50. These values indicate that each frequency contributes 0.5 of the amplitude of the associated sine wave to the value of the combined sound wave at any point in time. These values need to be doubled to accommodate the negative frequencies, which are not shown.
In the earlier discussion, a claim was made that the magnitude could be used as an estimate of the sine wave contribution percentage in this example. Since the cosine wave coefficients are shown to be effectively 0.0, only the sine wave coefficients contribute to the magnitude. Thus, assuming that the magnitude is an estimate for the sine wave coefficients in this example is clearly justified.
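$$\text{magnitude}_{200\,\text{Hz}} = \sqrt{a_{200\,\text{Hz}}^2 + b_{200\,\text{Hz}}^2} = \sqrt{0.0^2 + 0.50^2} = 0.50$$

where
- a200Hz = amplitude of cosine wave component frequency fk = 200 Hz [REAL]
- a200Hz = 0.0
- b200Hz = amplitude of sine wave component frequency fk = 200 Hz [IMAG]
- b200Hz = 0.50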
A similar analysis proves the same for the magnitude at 500 Hz.
The computer program used to generate these examples actually generated negative values for the contribution coefficients. Inappropriate negative signs are also a result of the mathematical algorithms. Depending on the FFT implementation, the signs of the coefficients can be effectively ignored.
Magnitudes, sine wave coefficients, and cosine wave coefficients that compose a sound are generated using an algorithm called the Fast Fourier Transform (FFT). Unfortunately, no single standard exists for this algorithm. Multiple published and implemented versions of this algorithm are available. Worse yet, the various implementations yield inconsistent results when used to analyze the same sound waves. Also, a number of parameters can impact the resultant calculations. The next article in this series will specifically address these issues.
A discussion of the internal operation of the FFT algorithm is really not appropriate here. Lots of online resources exist which describe the workings of the algorithm. Furthermore, extensive implementations exist with actual source code for study and usage in any program. A large number of commercial sound preparation programs exist that incorporate FFT capabilities.
Actual usage of the FFT during real time sound play follows a very specific process.
- Select an FFT algorithm implementation
- Select a section of the waveform to evaluate using windowing
- Apply the FFT algorithm
- Eliminate or modulate frequency coefficients using a filter, smoothing algorithm, or an effect, depending on the application
- Apply the inverse FFT algorithm
- Submit the inverted sound to the amplifier
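The steps above can be sketched end to end on a toy signal. This is a minimal illustration under stated assumptions, not a real-time implementation. A plain discrete Fourier transform stands in for the FFT (it produces the same result, only more slowly), the window is a periodic Hann window, the modification step is a simple low-pass filter, and all function names are hypothetical. The final step of sending the result to an amplifier is not shown.

```python
import cmath
import math

def dft(samples):
    """Plain discrete Fourier transform (stand-in for an FFT)."""
    count = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * n / count)
                for n, x in enumerate(samples))
            for k in range(count)]

def inverse_dft(spectrum):
    """Inverse transform; returns the real part of each sample."""
    count = len(spectrum)
    return [(sum(c * cmath.exp(2j * math.pi * k * n / count)
                 for k, c in enumerate(spectrum)) / count).real
            for n in range(count)]

def hann_window(count):
    """Periodic Hann window used to select a section of the waveform."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / count) for n in range(count)]

def low_pass(spectrum, cutoff_bin):
    """Zero every bin above cutoff_bin, including its negative-frequency mirror."""
    count = len(spectrum)
    return [c if (k <= cutoff_bin or k >= count - cutoff_bin) else 0j
            for k, c in enumerate(spectrum)]

N = 64  # segment length in samples (toy value)
# Toy waveform: a low component (4 cycles per segment) plus a high one (20 cycles).
segment = [math.sin(2 * math.pi * 4 * n / N) + math.sin(2 * math.pi * 20 * n / N)
           for n in range(N)]

window = hann_window(N)
windowed = [w * x for w, x in zip(window, segment)]  # select section via windowing
spectrum = dft(windowed)                             # apply the forward transform
filtered = low_pass(spectrum, cutoff_bin=10)         # eliminate unwanted coefficients
output = inverse_dft(filtered)                       # apply the inverse transform
```

In this toy case the high component is removed completely and the output equals the windowed low component, which makes the effect of the filtering step easy to verify.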
A number of issues and techniques are involved with each of these steps. These issues and techniques will be addressed and discussed in future articles. Additionally, the limitations described above will also be evaluated.
In real usage during audio processing, these inaccuracies actually do not have any impact on the sound, for reasons that will be discussed in the next article.
As sound plays while watching a frequency band display, the height of each band varies. Height variation in each band is the result of the changing contribution of the sine and cosine waves within the band.
A simple example clearly demonstrates the changing composition of the sine and cosine waves. Beethoven’s 5th Symphony contains a four note opening motif known to most people. The key note of the motif is C Minor which has a frequency of 277.18 Hz.
Using a frequency band display, the change in the magnitudes of the sine and cosine wave frequencies can easily be seen between the first 3 notes and the 4th note of this motif.
The frequency bands in these figures are 1/3 octave. In simple terms, that means each octave is divided into three bands, and the frequencies that fall within each third are grouped together. Frequency bands for Note 4 appear quite different from frequency bands for the first 3 notes. Note 4 is composed of a larger number of higher frequencies of greater magnitude. Moreover, the middle frequency bands for note 4 exhibit much higher frequency magnitudes. Note 4 is composed of a lot more, much stronger sine and cosine components than the first 3 notes of the motif.
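For readers who want to see what a 1/3 octave grouping means numerically, the sketch below computes the edges of one band and the spacing between band centers. It assumes the base-2 definition, in which three successive bands span exactly one octave. Commercial equalizers typically label bands with rounded nominal center frequencies, so displayed values may differ slightly.

```python
def third_octave_band(center_hz):
    """Return the (lower, upper) edge frequencies of a 1/3-octave band."""
    half_width = 2.0 ** (1.0 / 6.0)  # half of one third of an octave
    return center_hz / half_width, center_hz * half_width

def next_center(center_hz):
    """Center frequency of the adjacent higher 1/3-octave band."""
    return center_hz * 2.0 ** (1.0 / 3.0)

if __name__ == "__main__":
    center = 1000.0
    for _ in range(4):
        lower, upper = third_octave_band(center)
        print(f"center {center:8.1f} Hz  band {lower:8.1f} .. {upper:8.1f} Hz")
        center = next_center(center)
```

Stepping up three bands from any center doubles the frequency, which is one full octave.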
In the figure below, a segment of the first 3 notes of the motif has initially been selected for frequency analysis.
Figure 16: Beethoven’s 5th, Segment 1, Motif First 3 Notes
In the Waveform window, a transparent window appears above the opening 3 notes of the motif, indicating that this segment has been selected. Displayed in the Spectrum window are the individual frequency magnitudes. Actual frequencies and their magnitudes appear in the grid below the spectrum window. Frequencies in the previous examples were collected into 1/3 octave bands and thus were summaries. This display provides the actual frequencies and magnitudes computed for the selected data in the window.
According to the FFT implementation used to identify the sine and cosine frequencies, the maximum magnitude occurs at frequency 173.61 Hz. Clearly, the opening 3 notes do not determine the key note.
In the figure below, the final note of the 4 note motif is subjected to frequency analysis using the same FFT implementation.
Figure 17: Beethoven’s 5th, Segment 2, Motif 4th Note
Comparing the shapes of the frequency spectrum in these two figures clearly reveals that different frequency sine and cosine waves contribute to the sound in the 4th note. According to the FFT implementation used to identify the sine and cosine frequencies, the maximum magnitude occurs at frequency 292.04 Hz. Clearly, the 4th note of the motif does determine the key note.
Again, some of the inaccuracy in the FFT algorithm is exposed. In fact, the known key note for this motif is 277.18 Hz. This FFT implementation overpredicts with a result of 292.04 Hz. Since a number of factors impact the calculated frequency, some experimentation with those factors would likely result in a more accurate prediction. However, the 292 Hz frequency is closer to the actual value of 277.18 Hz than the 173.61 Hz frequency identified for the first 3 notes of the motif.
At first glance, usage of the term maximum magnitude may be misinterpreted. Most persons would interpret this criterion as the absolute maximum of the magnitudes across all the frequencies in the computed spectrum.
If this criterion were used, maximum magnitude would often be at 0 Hz. Many spectrum calculations result in the largest absolute maximum magnitude occurring at 0 Hz. Obviously, selecting 0 Hz as the maximum absolute magnitude makes no sense.
A review of any of the spectrum examples so far demonstrates that a calculated frequency spectrum generally exhibits a number of local maxima. Practically, then, the maximum magnitude is taken as the largest of all the local maxima. In general, using this criterion results in a maximum magnitude at a frequency greater than zero.
A local maximum magnitude occurs at frequency fk under the following conditions:

$$\text{magnitude}(f_{k-1}) < \text{magnitude}(f_k) \quad \text{and} \quad \text{magnitude}(f_k) > \text{magnitude}(f_{k+1})$$

All of the local maximum magnitudes are identified. Local maxima are then ranked in ascending order by magnitude. The frequency with the largest magnitude among all the local maxima is the frequency at which the maximum magnitude occurs.
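The selection procedure described here can be sketched as follows. The function names and the sample spectrum are hypothetical and chosen only to illustrate the idea. Because a local maximum needs a neighbor on each side, the first bin (0 Hz) can never be selected, even when it holds the largest value.

```python
def local_maxima(freqs, mags):
    """Return (frequency, magnitude) pairs that exceed both neighbors."""
    peaks = []
    for k in range(1, len(mags) - 1):
        if mags[k - 1] < mags[k] > mags[k + 1]:
            peaks.append((freqs[k], mags[k]))
    return peaks

def max_magnitude_frequency(freqs, mags):
    """Rank local maxima by magnitude and return the largest, or None."""
    ranked = sorted(local_maxima(freqs, mags), key=lambda peak: peak[1])
    return ranked[-1] if ranked else None

if __name__ == "__main__":
    # Hypothetical spectrum with a large value at 0 Hz and two real peaks.
    freqs = [0, 100, 200, 300, 400, 500, 600]
    mags = [0.90, 0.10, 0.44, 0.10, 0.05, 0.40, 0.02]
    print(local_maxima(freqs, mags))
    print(max_magnitude_frequency(freqs, mags))
```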
Since the technique used to calculate the frequency spectrum is digital, the identified maximum may be a bit off from the actual maximum magnitude. Using a digital approach, the actual frequency with the maximum magnitude may not be at one of the discrete frequencies generated by the calculation algorithms. In order to obtain a more accurate estimate, interpolation techniques are used. Typical interpolation techniques include 3-point parabolic interpolation and sinc convolution interpolation.[6]
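As an illustration of the 3-point parabolic technique, the sketch below fits a parabola through a peak bin and its two neighbors and returns the location and height of the vertex. This is the standard textbook form of the method. The function name and the `bin_spacing_hz` parameter are hypothetical, and the result is only an estimate because a real spectral peak is not exactly parabolic.

```python
def parabolic_peak(mags, k, bin_spacing_hz):
    """Refine a peak at bin k using its two neighbors.

    Returns (estimated_frequency_hz, estimated_magnitude).
    """
    alpha, beta, gamma = mags[k - 1], mags[k], mags[k + 1]
    curvature = alpha - 2.0 * beta + gamma
    if curvature == 0.0:
        # The three points lie on a straight line; no refinement is possible.
        return k * bin_spacing_hz, beta
    offset = 0.5 * (alpha - gamma) / curvature  # in bins, between -0.5 and 0.5 at a true peak
    refined_mag = beta - 0.25 * (alpha - gamma) * offset
    return (k + offset) * bin_spacing_hz, refined_mag

if __name__ == "__main__":
    # Samples of an exact parabola whose true peak sits between bins 10 and 11.
    mags = [5.0 - (x - 10.3) ** 2 for x in range(20)]
    print(parabolic_peak(mags, 10, bin_spacing_hz=1.0))
```

When the underlying shape really is a parabola, as in the usage example, the refinement recovers the true peak location and height.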
Every sound waveform possesses a fundamental frequency and harmonics. The fundamental frequency is the input that is the greatest determinant of pitch. Pitch itself is the perceptual human response to that fundamental frequency. Harmonics or overtones are dependent upon the fundamental frequency. The number and magnitude of the harmonics determine the timbre of the music. Timbre itself is also the perceptual human response to those harmonics/overtones. Thus, pitch and timbre may differ across individuals for the same fundamental frequency and attendant harmonics/overtones.
Written musical notation is organized around scales of notes. Each note represents a fundamental frequency. All of the musical notes that can be played are organized into octaves. Each octave contains seven (7) notes. As a result, each octave starts on an eighth note. The term octave derives from the Latin root octavus, which means "eighth". Starting each octave on an eighth note is actually based on the associated frequencies.
Notes and associated frequencies for two octaves side by side appear in the tables below:
Table 1: Notes and Frequencies, Octave 4; Table 2: Notes and Frequencies, Octave 5
Data Source for Table Entries: see [7], p. 29
Octave 4 is often known as the Middle C Octave. Recall that the key note for Beethoven’s 5th opening motif is C minor, with frequency 277.18 Hz. A little bit of math will show that this frequency lies roughly midway between notes C and D of octave 4 above.
By comparing the octaves side by side, a relationship between the starting notes is visible. Each octave starts with a note whose frequency is double that of the starting note of the previous octave. So, the selection of octaves that start eight notes apart possesses some scientific foundation.
Many sources refer to these notes as a pitch. Usage of this term is incorrect. These notes represent the fundamental frequencies of the sound and are correctly called the key note. Pitch is totally different. Pitch is not a purely objective physical property measured as a frequency. Pitch is a subjective psycho-acoustical attribute of sound. Historically, the study of pitch and pitch perception has been a central problem in psychoacoustics, and has been instrumental in forming and testing theories of sound representation, processing, and perception in the auditory system. In simple terms, pitch is the human perceptual response to a fundamental frequency[8](pp. 145, 284, 287).
A closer look reveals that the frequency of every note in an octave is double the frequency of the corresponding note in the previous octave and half the frequency of the corresponding note in the successive octave. Frequency of note A, octave 4 is 440 Hz. Note A, octave 5, is 880 Hz, double the frequency of note A in octave 4.
Simply playing the key note or a note with the fundamental frequency would lead to some pretty boring sounds. Playing a 277.18 Hz sound wave sounds like a hum. This frequency falls between the 240 Hz and 300 Hz multiples of a 60 Hz electrical line, so it resembles the hum you would hear from line interference. A 500 Hz sound wave sounds like a whine. Harmonics change the nature of the sound so that a listener perceives those sounds as musically entertaining. Distinguishing the sounds of these two frequencies as a hum and a whine is my own personal psychoacoustic response to these frequencies.
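The octave relationship reduces to repeated doubling or halving. The short sketch below applies it to note A, using the 440 Hz value for octave 4 given in the text; the function name is hypothetical.

```python
A4_HZ = 440.0  # note A in octave 4, as given in the text

def same_note_in_octave(freq_in_octave4_hz, octave):
    """Frequency of the same note in another octave (doubles per octave up)."""
    return freq_in_octave4_hz * 2.0 ** (octave - 4)

if __name__ == "__main__":
    for octave in (3, 4, 5, 6):
        print(f"A{octave}: {same_note_in_octave(A4_HZ, octave):.2f} Hz")
```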
Beethoven surely knew the concept of harmonics when he created his 5th Symphony motif!
Harmonics, sometimes called overtones, are easily identified once the fundamental frequency is identified. Each harmonic is an integral multiple of the fundamental frequency. The first harmonic is double the fundamental frequency. Harmonic two is triple the fundamental. In general, but not always, the magnitudes of the harmonics are smaller than the magnitude of the fundamental frequency.
Both the fundamental frequency and the harmonics of the 4 note motif from Beethoven’s 5th Symphony appear in the figure below. Since the program used to generate these displays was intended for providing examples, interpolation was not used in generating these results.
Figure 18: Beethoven’s 5th Motif Fundamental, Harmonics
In this analysis, the fundamental frequency is 292 Hz. Harmonics appear at 584 Hz and 876 Hz. The first harmonic is twice the fundamental frequency. The second harmonic is three times the fundamental frequency.
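The arithmetic behind these harmonic frequencies is simple enough to sketch. The function below follows the numbering used in this article, where the first harmonic is twice the fundamental. Other texts count the fundamental itself as the first harmonic, so the numbering here is a convention rather than a universal rule.

```python
def harmonic_frequencies(fundamental_hz, count):
    """Return the first `count` harmonics, using this article's numbering.

    Harmonic n sits at (n + 1) times the fundamental frequency.
    """
    return [fundamental_hz * (n + 1) for n in range(1, count + 1)]

if __name__ == "__main__":
    for number, freq in enumerate(harmonic_frequencies(292.0, 2), start=1):
        print(f"harmonic {number}: {freq:.0f} Hz")
```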
Magnitude of each of the harmonics is less than the magnitude of the fundamental frequency. Moreover, these magnitudes are relatively significant. Sine and cosine wave components at these three frequencies contribute at different levels to the distinctness of the sound of the motif.
Pitch of this motif is controlled by the fundamental frequency. Those harmonics control the timbre of the motif and are a major contributor to the individual personal response to the sound.
Fundamental frequencies in the tonal scale are used to create tools for tuning. Voices are tuned, and musical instruments and devices are tuned.
Tuning typically takes place in the middle octave, octave 4. For a string instrument, such as a piano, each vibrating element is tuned to the associated frequency of the note in the octave. Tools used include a tuning lever, rubber wedge mutes, and a chromatic tuner. The tuning process involves getting the frequency of the sound produced by the vibrating element to match the frequency of the matching note in the octave. Chromatic tuners compare the frequency of the sound produced with the desired frequency of the octave note. Tuning consists of successive modifications to the vibrating element of the musical instrument until the frequency of the vibrating element matches the frequency of the octave note.
In simple terms, a tuner matches the frequency of the vibrating element to the fundamental frequency of the octave note.
Perhaps not surprisingly, software exists for tuning the human voice to these fundamental frequencies. Some options include Melodyne, Waves Tune, Nectar, Canta, and GSnap. Many options are available on the Internet across all price ranges, including some that are free.
For instance, with Melodyne, your voice is drawn into a piano roll.
Figure 19: Melodyne Voice Tuning Assessment
Each shaded area represents the fundamental frequency of a specific note in octave 4, the middle C octave. A line on top of the note shows the performance of your voice relative to that note. The closer your voice performance line is to the center of the key/note, the closer you are to that note. Of course, you need a voice teacher to help you modify the sound you produce at each note.
The vibrating element that is the source of your sound at the particular fundamental frequency is your vocal cords. In simple terms, this software matches the frequency of the vibrating vocal cords to the fundamental frequency of the octave note.
Identification of the fundamental frequency of a sound has been an ongoing research topic for many years. Techniques employed are generally broken into two broad categories: frequency domain techniques and time domain techniques. Frequency domain techniques utilize frequency spectra produced by an implementation of the Fast Fourier Transform. Time domain techniques use the actual sound waveform amplitudes as the waveform varies over time.
A major problem exists with automated detection of a fundamental frequency. If the waveform has few higher harmonics, or the magnitudes of the higher harmonics are small relative to the magnitude of the fundamental, the fundamental frequency is easy to detect.
Analysis of the 4 note motif from Beethoven’s 5th Symphony clearly illustrates that this statement is correct. Harmonic analysis of the 4 note motif shown earlier yields the following overtones and magnitudes:
Table 3: Fundamental Frequency and Harmonics, 4 Note Motif
This motif appears to have only two harmonics. Relative magnitudes of these harmonics are well below the magnitudes of the fundamental frequency. This fundamental frequency with corresponding magnitudes was easily determined using the simplest peak detection frequency domain approach.
Unfortunately, this situation does not hold for most sounds. Moreover, the complexity and richness (context) of a sound make automated detection more difficult. Detectors that work well for voice do not work well for music. And detectors that work well for music do not seem to work well for voice sounds.
Initially, the problem of fundamental frequency identification was approached from the frequency domain. In 1979, Martin Piszczalski and Bernard Galler published an article in the Journal of the Acoustical Society of America. In this article, the frequency spectrum generated by an FFT algorithm was submitted to peak detection to identify the fundamental frequency.[9]
The goal of peak detection is to determine a list of local magnitude maxima over all of the frequencies. Local peaks were then paired based on integer multiples of frequency. The lowest paired frequencies were determined to be the fundamental frequency and the first harmonic. This approach is the one described in a mathematical diversion above.
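The peak detection and pairing steps can be illustrated with a short sketch. This is not the published algorithm from [9], only a minimal version of the idea under stated assumptions: the test tone is synthetic (a 312 Hz fundamental with a weaker 624 Hz harmonic), the sample rate and frame length are chosen so both frequencies fall exactly on FFT bins, and the noise floor and the 3% pairing tolerance are arbitrary. The helper names `fft`, `find_peaks`, and `fundamental_from_peaks` are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def find_peaks(mags, bin_hz, floor):
    """Frequencies (Hz) of local magnitude maxima above a noise floor."""
    return [k * bin_hz for k in range(1, len(mags) - 1)
            if mags[k] > floor and mags[k] > mags[k - 1] and mags[k] >= mags[k + 1]]


def fundamental_from_peaks(peaks, tol=0.03):
    """Lowest peak that has a partner near an integer multiple (2x, 3x, ...)."""
    for f0 in peaks:  # peaks arrive in ascending frequency order
        for f in peaks:
            ratio = f / f0
            if ratio > 1.5 and abs(ratio - round(ratio)) < tol:
                return f0, f
    # No pair found: fall back to the lowest peak, if there is one.
    return (peaks[0], None) if peaks else (None, None)


# Synthetic test tone (assumed values, not taken from the recordings).
RATE, N = 8192, 2048  # bin width = 8192 / 2048 = 4 Hz
signal = [math.sin(2 * math.pi * 312 * i / RATE)
          + 0.5 * math.sin(2 * math.pi * 624 * i / RATE) for i in range(N)]

mags = [abs(c) for c in fft(signal)[: N // 2]]  # keep the non-negative frequencies
peaks = find_peaks(mags, RATE / N, floor=0.1 * max(mags))
f0, first_harmonic = fundamental_from_peaks(peaks)
print("peaks:", peaks, "fundamental:", f0, "paired harmonic:", first_harmonic)
```

With real recordings the peaks rarely fall exactly on a bin, so a practical detector would also need windowing and interpolation between bins. Those steps are left out here to keep the pairing logic visible.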
Other frequency domain approaches include using an optimum comb filter, a tunable infinite impulse response (IIR) filter, cepstrum analysis, and multi-resolution methods. Multi-resolution methods run a specific algorithm at multiple resolution levels for confirmation. Because this approach recomputes the same algorithm several times, it costs more time and is generally not useful for real-time audio processing.
Time domain analysis also utilizes a wide range of approaches. Specific approaches include time-event rate detection, zero-crossing rate, peak rate, slope event rate, autocorrelation, and the YIN estimator. Since these approaches operate on the actual amplitudes of the same waveform, processing is done with greater speed but may not be as accurate.
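As one example of a time domain technique, autocorrelation can be sketched as follows. This is a simplified illustration rather than any published estimator (it is not YIN). It assumes a clean, steady tone, a search range of 80 Hz to 1000 Hz, and an arbitrary threshold of 90% of the zero-lag energy. The function name `autocorr_pitch` and the synthetic 200 Hz test tone are made up for the example.

```python
import math


def autocorr_pitch(samples, rate, fmin=80.0, fmax=1000.0, threshold=0.9):
    """Estimate the fundamental (Hz) from the first strong autocorrelation peak."""
    lag_min, lag_max = int(rate / fmax), int(rate / fmin)
    window = len(samples) - lag_max  # same number of terms for every lag
    if window <= 0 or lag_min < 1:
        raise ValueError("signal too short for the requested frequency range")

    def r(lag):
        # Average product of the waveform with a copy of itself shifted by `lag`.
        return sum(samples[i] * samples[i + lag] for i in range(window)) / window

    energy = r(0)
    scores = {lag: r(lag) for lag in range(lag_min - 1, lag_max + 1)}
    # Take the SHORTEST lag that is a local maximum and nearly as strong as lag 0.
    # Picking the global maximum instead risks locking onto 2x or 3x the period.
    for lag in range(lag_min, lag_max):
        if (scores[lag] >= threshold * energy
                and scores[lag] >= scores[lag - 1]
                and scores[lag] >= scores[lag + 1]):
            return rate / lag
    return None  # no sufficiently periodic lag found


# Synthetic test tone (assumed values): 200 Hz fundamental plus a 400 Hz harmonic.
RATE = 8000
tone = [math.sin(2 * math.pi * 200 * i / RATE)
        + 0.5 * math.sin(2 * math.pi * 400 * i / RATE) for i in range(900)]
estimate = autocorr_pitch(tone, RATE)
print("estimated fundamental:", estimate, "Hz")
```

The estimate can only take values of the form rate / lag, so its resolution gets coarser at higher frequencies. This is one reason why time domain methods can be fast but less accurate.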
An excellent overview of the various approaches can be found in [9].
People speak, sing, and play musical instruments. In performing these activities, sounds are being created. When people create these sounds, manipulation of component sine wave and cosine wave frequencies is being performed. A lot of data has been collected that characterizes the fundamental frequencies of each of these sound producing activities.
Under normal speaking conditions, human males speak using sine wave and cosine wave fundamental frequencies in the range 70 Hz – 200 Hz. Females speak under normal conditions using sine wave and cosine wave fundamental frequencies in the range 140 Hz – 400 Hz. Women generally speak with higher frequencies than men. When speaking with specific purposes, humans can and do exceed these fundamental frequency ranges.
Singers generally broaden these fundamental frequency ranges when singing. Singers exercise far more control over the vibrations of their vocal cords. Additional factors go into generating a song such as the shape of the mouth cavity and shape of the lips. But, the musical sound begins with the vibration patterns of the vocal cords.
Singers are divided into categories based on the notes used — and thus, the fundamental frequencies of the component sine waves and cosine waves produced in their songs.
Table 4: Categories Of Singers By Fundamental Frequency [10]
From this table, the component sine wave and cosine wave frequencies utilized by singers range from 87 Hz to 1047 Hz. While persons speaking normally utilize frequencies in the lower ranges, a much larger range can be reached by a person speaking. As will be demonstrated below, persons often do stretch voices beyond the normal range of frequencies for very specific purposes.
Musical instruments utilize component sine wave and cosine wave frequencies over a far greater range: 20 Hz to about 20,000 Hz. This range is the range of notes that appear from the beginning of octave 0 to the ending of octave 9 [7, p. 29].
Typical fundamental frequencies of component sine waves and cosine waves for a number of musical instruments can be found in the table below:
Table 5: Categories Of Musical Instruments By Fundamental Frequency [11]
These instruments pretty much cover the broad range of fundamental frequencies. But, musical instruments utilize component sine wave and cosine wave frequencies over a far greater range — 20 Hz to about 20,000 Hz. Higher frequencies produced by musical instruments appear as harmonics to these fundamental frequencies!
An analysis of a real piece of music with known instruments demonstrates these fundamental frequencies. On June 28, 1928, Louis Armstrong and his Hot Five band recorded "West End Blues." According to jazz historians, this recording gave jazz stature as a legitimate musical form. (Louis always gave his birth date as July 4, 1900, although baptismal records place it on August 4, 1901, so he had this major accomplishment at the ripe young age of 26.)
Several selections inside this piece of music help to demonstrate the fundamental frequencies at play and the usage of those frequencies to obtain a specific effect. At the opening, Louis himself plays a trumpet solo to build intensity. Later in the piece, Earl "Fatha" Hines performs a piano solo to calm the listener after the trumpet solos. Spread throughout the recording are a number of "scat" voice performances by Louis in combination with a specific instrument in the background. These voice performances were also used to calm the listener from the intensity of the other solos, especially the trumpet performances.
A harmonic analysis of the opening trumpet solo appears in the figure below:
Figure 21: Harmonic Analysis, West End Blues, Opening Trumpet Solo
[Segment Range: 00:02.60 — 00:14.13]
Fundamental frequency of the trumpet ranges from 165 Hz to 988 Hz. In this segment, Louis plays at a fairly intense fundamental frequency of 312 Hz with a really hot harmonic at frequency 624 Hz, about 2/3 of the way to the top end of the trumpet frequency range. Moreover, the magnitudes of those frequencies are really strong.
By comparison, Earl Hines plays a piano solo later in the recording. A harmonic analysis of the piano segment by Hines appears in the figure below:
Figure 22: Harmonic Analysis, West End Blues, Later Piano Solo
[Segment Range: 02:098 — 02:33.61]
Hines plays the piano with a fundamental frequency of 157 Hz. This frequency is well within the piano frequency range of 28 Hz to 4186 Hz. Even the first harmonic at 314 Hz is a reasonably slow pace for the piano relative to the broad range of frequencies that can be utilized. Hines's purpose is to relax the listener after the intense higher frequency playing of the trumpet by Armstrong.
Frequency is essentially speed. Listen to both segments with the sample program. That opening trumpet solo moves rapidly along (higher fundamental frequency and harmonics) while the piano solo seems to move lazily along at a much slower speed (lower fundamental frequency and harmonics).
From time to time, Louis intervenes with a "scat" solo. A harmonic analysis of one of these "scat" solos appears in the figure below:
Figure 23: Harmonic Analysis, West End Blues, "Scat" Solo
[Segment Range: 01:29.26 — 01:58.65]
With a fundamental frequency of 197 Hz, this sound is much slower and more calming than the sound of the intense trumpet at a fundamental frequency of 312 Hz and its higher harmonics. Based on a fundamental frequency of 197 Hz, Louis would be classed as a tenor. Tenors have a fundamental frequency range of 130 Hz to 523 Hz. While the fundamental frequency of his voice also falls within lower categories, the voice sounds lighter rather than deeper, suggesting that the classification of tenor is more appropriate.
In the previous example, fundamental frequencies were used to build intensity and to relax and soften after that intensity. Human speakers can also use fundamental frequencies of component sine waves and cosine waves to achieve specific effects.
John F. Kennedy was a public speaker who carefully used fundamental frequencies and harmonics to obtain very specific effects when speaking. His inauguration speech demonstrates several very specific examples of this.
At the beginning of his inauguration speech, Kennedy acknowledges the various members of the government in attendance at the speech. A harmonic analysis of the acknowledgement selection appears in the figure below:
Figure 24: Harmonic Analysis, JFK Inauguration, Acknowledgements
[Segment Range: 01:00.57 — 01:46.00]
Human males normally speak using sine wave and cosine wave fundamental frequencies in the range 70 Hz – 200 Hz. Here, in the opening acknowledgements, JFK is using a fundamental frequency of 549 Hz, well above that normal range. His objective here is to establish himself as a leader. The overtones are strong, having large magnitudes and giving him a sense of command.
Later in the speech, JFK uses that famous phrase that made this speech one of the most outstanding of the century: "Ask not what your country can do for you, ask what you can do for your country." In the figure below appears a harmonic analysis of this phrase:
Figure 25: Harmonic Analysis, JFK Inauguration, Famous Phrase
[Segment Range: 13:36.01 — 13:51.15]
Here, JFK uses a significant increase in the fundamental frequency of component sine and cosine waves with accompanying very high frequency harmonics. His fundamental frequency has moved from 549 Hz to 628 Hz, a 14.4% increase in frequency. His goal in this increase in frequency is very specific. This increase in fundamental frequency of component sine and cosine waves, along with the increases in the harmonics, is meant to give authenticity and passion to this phrase. Kennedy is saying that he identifies with a greater good and telling his listeners to do the same.
Usage of higher fundamental frequencies and harmonics of component sine waves and cosine waves to obtain these kinds of objectives is well documented by experts who analyze speeches [12].
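The size of this shift can be checked with simple arithmetic. The sketch below uses the two measured values from the text, 549 Hz and 628 Hz, and expresses the change both as a percentage and as a musical interval in semitones. The semitone conversion assumes the equal tempered scale and is an addition for illustration. The function names are hypothetical.

```python
import math


def percent_change(f_from, f_to):
    """Relative change in frequency, in percent."""
    return 100.0 * (f_to - f_from) / f_from


def semitones(f_from, f_to):
    """Interval between two frequencies in equal tempered semitones."""
    return 12.0 * math.log2(f_to / f_from)


pct = percent_change(549.0, 628.0)
interval = semitones(549.0, 628.0)
print(f"{pct:.1f}% higher, {interval:.2f} semitones")
```

The increase works out to about 14.4%, a little over two semitones, so the shift is roughly comparable to moving up one whole step on a piano.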
Conclusions
Real sounds are composed of combinations of sine and cosine waves of multiple frequencies. Modern stereos often show those frequencies in octave or band form. A special frequency called the fundamental frequency determines the pitch of the sound. Harmonics are multiples of the fundamental frequency that determine the timbre or richness of the sound. Determination of the fundamental frequency is useful to tune instruments and voices. Automated algorithms for determination of a fundamental frequency and any associated harmonics are generally limited to specific contexts: music, voice, and number and magnitude of harmonics. Music and voices cover a very different range of frequencies. Fundamental frequencies and harmonics of component sine waves and cosine waves are used to obtain very specific effects with both music and speaking.
A demonstration program was developed to illustrate principles explained in these articles. Normal user interface protections have not been programmed as would be the case if this program were production quality. Please follow the directions as indicated.
You can download a zip file containing the software and the sound files. See the download instructions in the appendix to this article. PLEASE READ THE DOWNLOAD INSTRUCTIONS CAREFULLY!
Note: SECRETS of Home Theater and High Fidelity disclaims any responsibility for the operation of this demonstration software, developed by Dr. Krell of SW Architects.
Figure 26 shows the user interface to the sound demonstration program.
Figure 26: Sound Demonstration Program User Interface
This user interface is organized around tab pages. Each tab page is used to focus on demonstrating specific sound principles. The first tab page is labeled "Frequency". This specific tab page is designed to illustrate pure tones and loudness associated with a single pure tone, as utilized by audiologists.
As a minor note, the loudness level is represented in the range 0.0 to 1.0. Obviously, this range is not dB (SPL) but a representation of the range of dB (SPL) values. This range was used for ease of programming. Actual dB (SPL) will depend on the settings for your sound card.
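For readers who want to relate the 0.0 to 1.0 scale to decibels, here is a hedged sketch. It assumes the slider value acts as a linear amplitude gain, which this article does not confirm for the demonstration program. It reports the level in dB relative to full scale, not dB (SPL). The function name `level_to_db` and the -60 dB floor are assumptions for illustration.

```python
import math


def level_to_db(level, floor_db=-60.0):
    """Convert a linear 0.0-1.0 amplitude level to dB relative to full scale.

    Assumes `level` is a linear amplitude gain. A level of 0.0 has no finite
    dB value, so it is clamped to `floor_db` (an arbitrary display floor).
    """
    if not 0.0 <= level <= 1.0:
        raise ValueError("level must be between 0.0 and 1.0")
    if level == 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(level))


for level in (1.0, 0.5, 0.1, 0.0):
    print(f"level {level:.1f} -> {level_to_db(level):.1f} dB re full scale")
```

Under this assumption, halving the slider value lowers the level by about 6 dB. The sound pressure level actually produced still depends on the sound card, amplifier, and speakers.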
- Open the Windows Explorer to the folder where the program was downloaded.
- Double click the program to start execution.
- Click on the Tab with the title "Sound".
- Select a file from the dropdown box.
- Click the button labeled "Load".
- As the waveform loads, the waveform appears in the waveform window.
- Once loaded, you can Start, Pause, and Stop playing the file.
- YOU MUST PUSH THE STOP BUTTON PRIOR TO CLOSING THE APPLICATION, OR YOU GET AN ERROR AT THE PRESENT TIME.
- You can adjust the loudness as you play using the loudness slider.
- As the music plays, the frequency bands in the Spectrum window will be updated.
- Move the Selection Start marker by clicking the left mouse button anywhere on the line above the waveform.
- Set the Selection End location by positioning the mouse pointer on the Selection Start marker, press the left mouse button, drag to an end location, and release the left mouse button.
- A transparent window appears over the selected section of the waveform.
- You can Start and Pause playing the selection.
- After the selection plays, the selection window will disappear.
- Click on the tab with the title "Harmonics".
- Select a file from the dropdown list.
- Click the button labeled "Load".
- Select a section to play or analyze. You must select a section.
- Use the same instructions as above to set the Selection Start and Selection End.
- Click the green button labeled "FFT".
- Depending on the length of the selection, the computations may take some time.
- Be patient. All the display elements will turn green when completed.
- Use the top dropdown to the left of the FFT button to switch between Frequencies and Harmonics.
- Use the bottom dropdown to the left of the FFT button to switch among Magnitudes, cosine wave coefficients, and sine wave coefficients.
- As you use these two dropdowns, all of the appropriate elements will update.
- Min and Max values may appear to not change but they are updated. The values may not have changed after the update.
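The three quantities offered by the bottom dropdown (magnitudes, cosine wave coefficients, and sine wave coefficients) are related in a simple way, sketched below. The scaling used by the demonstration program is not documented here, so the 2/N normalization is an assumption. The sketch also uses a direct sum rather than an FFT. It gives the same values but is slower, which does not matter for a short example. The function name `fourier_coefficients` and the test signal are hypothetical.

```python
import math


def fourier_coefficients(samples, k):
    """Cosine coefficient, sine coefficient, and magnitude for harmonic k.

    Valid for 0 < k < len(samples) / 2. The 2/N scaling makes each
    coefficient equal to the amplitude of the matching component wave.
    """
    n = len(samples)
    a = (2.0 / n) * sum(x * math.cos(2 * math.pi * k * i / n)
                        for i, x in enumerate(samples))
    b = (2.0 / n) * sum(x * math.sin(2 * math.pi * k * i / n)
                        for i, x in enumerate(samples))
    return a, b, math.hypot(a, b)


# Hypothetical test signal: a cosine wave (3 cycles, amplitude 0.8)
# plus a sine wave (5 cycles, amplitude 0.3) over 64 samples.
N = 64
signal = [0.8 * math.cos(2 * math.pi * 3 * i / N)
          + 0.3 * math.sin(2 * math.pi * 5 * i / N) for i in range(N)]

for k in (3, 5):
    a_k, b_k, mag_k = fourier_coefficients(signal, k)
    print(f"k={k}: cosine {a_k:.3f}, sine {b_k:.3f}, magnitude {mag_k:.3f}")
```

The magnitude combines the two coefficients as the square root of the sum of their squares, which is why a component shows the same magnitude whether it is a pure cosine wave, a pure sine wave, or a mix of the two.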
References:
[1]. Fourier, Joseph, "The Analytical Theory of Heat", translation by Alexander Freeman, Cambridge: At the University Press, London, 1878. English translation of French mathematician Joseph Fourier’s Théorie Analytique de la Chaleur, originally published in French in 1822.
[2]. Wiener, Norbert, "Generalized Harmonic Analysis", Acta Mathematica. 55 (1): 117–258, 1930.
[3]. Runge, C., Zeit. für Math. und Physik, vol 48, p. 443, 1903.
[4]. Danielson, G. C., and Lanczos, C, "Some Improvements in Practical Fourier Analysis and Their Application to X-ray Scattering from Liquids", J. Franklin Inst, vol 233, pp. 365-380 and pp 435-452, April 1942.
[5]. Cooley, J.W., and Tukey, J.W., "An Algorithm For the Machine Calculation of Complex Fourier Series", Mathematics of Computation, vol 19, pp. 297-301, April, 1965.
[6]. Arnost, Vladimir, Comparison of Algorithms in Real-Time Sound Processing, Department of Computer Science and Engineering, Brno University of Technology, Brno, Cz, https://www.maximalsound.com/mastering/interpolation%20methods.pdf
[7]. Olson, Harry F., Music, Physics, and Engineering, 2nd Edition, Dover Publications, Inc., 1967.
[8]. Hartmann, William Morris, Signals, Sound, and Sensation, Springer, 1997.
[9]. Piszczalski, Martin, and Galler, Bernard, "Predicting Musical Pitch From Component Frequency Ratios", Journal of the Acoustical Society of America, 66(3): 710-720, September, 1979.
[10]. Gerhard, David, Pitch Extraction and Fundamental Frequency: History and Current Techniques, University of Regina, TR-CS 2003-06, November, 2003.
[11]. Provided online at www.zytrax.com/tech/audio/audio.html. Last downloaded on 6/27/2017.
[12]. Greene, Richard, The 7 Reasons Why JFK Is One of the World’s Greatest Speakers, www.huffingtonpost.com/richard-greene/the-7-reasons-why-jfk-is-_b_6200546.html. Last downloaded on 6/27/2017.
Appendix
Sound Demonstration Software Program Download Instructions
THIS PROGRAM IS FOR WINDOWS 8 AND GREATER ONLY!!!
EVEN WITH THOSE VERSIONS, YOU MAY BE ASKED TO DOWNLOAD NEWER WINDOWS COMPONENTS.
- Open a browser of your choice.
- Enter the address below into the address bar of your browser
- www.swarchitects.com/SoundDemo1
- Right click on the file name below:
- SoundDemo.zip
- A popup menu will appear.
- Select the menu option that allows you to download to your hard drive.
- This option is different for every browser.
- Use Windows Explorer to navigate to the folder where the files were saved.
- Open the zip file.
- Extract the contents of the zip file to a folder.
- Use Windows Explorer to navigate to the folder where the zip file was extracted.
- Double click on the file SoundDemo.exe.
- Make sure that you chose the proper exe file.
- One file that has this extension is a configuration file.
- DO NOT MOVE OR REMOVE ANY OF THE FILES OR FOLDERS.
- If the loudness level is dim when the thumb of the slider is dragged, adjust the sound level of the sound card.
WARNING
Both the download instructions and the program have been reasonably tested. However, errors may occur. Please forward any problems/issues with the program to me at: Bruce Krell, [email protected]
About the Author: Bruce E. Krell, Ph.D
Bruce E. Krell holds a BS in Math and English from Tulane University, an MBA in Management and Economics from the University of New Orleans, and a PhD in Applied Math from the University of Houston. Over the years, Bruce has worked in a variety of areas as an Applied Mathematician, an Applied Physicist, and a Software Engineer. Areas in which Bruce has worked include communications satellites, spy satellites, missile guidance systems, radar manufacturing, 2-D and 3-D microscope engineering, video engineering and analysis, sound engineering, trajectory physics, structural engineering, civil engineering, and FDA Drug Testing, among others. Over the years, Bruce has written 6 books on system engineering, software engineering, and forensic science. During his career, he has given lectures and workshops all over the US and Europe, in many of the areas in which he has worked.
The author wishes to thank Dr. David A. Rich for his contributions to this article.