Music production is an art form distinct from the composition of music itself; it involves the study of how music is perceived by the listener, how it is represented in some medium, and how it is recorded to that medium. Digital technology has completely revolutionized the art of music production, allowing professional-quality music to be produced using inexpensive consumer devices, which in turn allows independent artists to more accurately realize their intentions, provided they are skilled in the art of music production.
In this article, I will provide a fairly in-depth introduction to the art of music production using Apple’s Logic Pro X software, but the principles I’ll introduce are general, and can be applied to the production of music in any professional or home recording studio.
Though you can follow along with this article using nothing other than a pair of ordinary headphones, these are the components I will be using:
These are hands down the best headphones that I’m aware of, and every recording studio has them available for artists. The clarity is unreal, and if you’ve never listened to music on professional audio equipment before, it is in my opinion a transformative experience. They typically cost about $100.
(2) Logic Pro X
There are plenty of digital audio programs out there, but this is what I use, and it’s wonderful. The interface is intuitive, aesthetically pleasing, and the built-in devices and libraries are awesome. Logic Pro X currently costs $199.
(3) Shure SM57 Mic
This is the workhorse of microphones, used for everything from live vocals to snare drums. It’s not the best microphone out there, but it’s arguably the most reliable and versatile, and it will work just fine for everything other than studio-quality vocals (though you can probably fake even those after reading this article). If you’re looking for a microphone to record studio-quality vocals, then what you want is a condenser microphone, and you won’t have any trouble finding a really high-quality one for around $100 to $200.
The Most Basic Levers of Recorded Sound
In this section, I’ll introduce two of the most basic tools that a producer can use to alter the sound of a mix, which are panning and volume. I’ll revisit each of these concepts in much more technical settings later on, but the purpose of this section is to demonstrate that even these two basic tools can completely change how a listener perceives a piece of music.
You’ve got two ears, and as a result, headphones and speakers generally have two channels, Left and Right, that correspond to your left and right ears, respectively. As a result, any recorded sound can be played back in the left channel, the right channel, or both channels. Moving a sound from one channel to the other is called panning, and it allows you to manipulate the perceived space in which a sound exists. Though there is no center channel, what panning allows you to do is create a perceived space for sound that goes from left, to center, to right, including everything in between, as you “turn the knob” from left to right.
As an example, listen to the opening percussive bells in this song by Emilie Nicolas, and try to focus on which ear you hear the sound in. You’ll notice that different notes actually sit in different positions, with some in the left channel, some in the right channel, and others roughly centered (i.e., distributed evenly between the left and right channels). At the 32-second mark, a man’s voice enters, almost entirely in the left channel. Then at the 40-second mark, the same voice appears again, this time mostly in the right channel. This is a very simple technique, requiring nothing more than turning a knob from left to right, yet it can create an apparent geography to a piece of music in the mind of the listener.
In Logic Pro X, each channel has its own pan knob, which you can simply click and rotate. As the knob sweeps from L to R, the channel in question moves from 100% left to 100% right, and the sound moves from the left ear to the right ear. By assigning sounds different positions in a mix, you can create a perceived location for each sound, and this simple technique works even when you have a large number of instruments and want each one to be clearly perceptible to the listener.
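Logic doesn’t publish the internals of its pan knob, but the standard way to implement this sweep is a constant-power pan law, which keeps the total acoustic power steady as the sound moves between the ears. Here’s a minimal sketch of that idea (the function name and the -1 to +1 pan range are my own conventions, not anything from Logic):

```python
import math

def constant_power_pan(sample: float, pan: float) -> tuple[float, float]:
    """Split a mono sample into (left, right) using a constant-power
    pan law. pan ranges from -1.0 (hard left) to +1.0 (hard right);
    0.0 places the sound in the perceived center."""
    # Map the pan position onto a quarter circle, so that
    # left^2 + right^2 stays constant as the knob sweeps.
    angle = (pan + 1.0) * math.pi / 4.0  # 0 .. pi/2
    return sample * math.cos(angle), sample * math.sin(angle)

# Hard left: all of the signal lands in the left channel.
left, right = constant_power_pan(1.0, -1.0)
# Center: equal power in both channels (about 0.707 each), which the
# listener perceives as a sound between the two ears.
cl, cr = constant_power_pan(1.0, 0.0)
```

The quarter-circle mapping is the reason a centered sound doesn’t get louder than a hard-panned one: the two channel gains always trade off along a circle rather than a straight line.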
Informally, the volume of a channel tells you how loud the channel is. Just like with panning, we can create an apparent geography using volume. So for example, if someone is shouting, but the shouting is played back at a low volume, this will create the sense that the shouting is happening far away from the listener. Similarly, if a typically quiet sound, like a whisper, is played back at a loud volume, then this will create the sense that the sound is happening near the listener (i.e., that someone is whispering in their ear).
Volume can also be used to create a sense of sparseness, or fullness, even if the individual volumes of the instruments don’t change. This is exactly what happens during a “drop”, when a song actually begins after an introduction or a bridge section – the number of instruments typically increases quickly, creating a rush of loudness. As an example, above is a song by Skrillex, featuring Ellie Goulding, which starts with a synth line that has a roughly constant volume. There are other things going on during this introduction, like reverb, that give the intro a sense of sparseness, as if it were being played in a large open space, but the volume alone, when the beat actually comes in, completely changes the perceived sound of the song. This is because a set of instruments playing together has a louder total volume than any of the individual instruments in isolation. So when the song actually comes in, the total volume of the mix is much higher than the isolated volume of the synthesizer line during the introduction, producing a much louder total sound. “Loudness” is actually a sport of sorts in pop music, and as a general matter, pop producers try to achieve the loudest total mix possible, a technical challenge we’ll discuss below.
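To see why layering instruments raises the overall level even when each part keeps its own volume, here’s a small sketch. The “synth” and “kick” parts are hypothetical sine waves of my own invention, not anything from the actual song, and RMS level stands in for perceived loudness:

```python
import math

def rms(signal):
    # Root-mean-square level, a standard proxy for loudness.
    return math.sqrt(sum(x * x for x in signal) / len(signal))

sample_rate = 44100
# Two hypothetical parts, each at the same individual level:
synth = [0.5 * math.sin(2 * math.pi * 220 * t / sample_rate)
         for t in range(sample_rate)]
kick = [0.5 * math.sin(2 * math.pi * 55 * t / sample_rate)
        for t in range(sample_rate)]

# During the drop, both parts play at once; their samples simply add.
drop = [a + b for a, b in zip(synth, kick)]

# The combined section is louder than either part alone, even though
# neither part's own fader moved.
```

Because the two parts sit at different frequencies, their RMS levels add roughly in quadrature, so the drop lands noticeably hotter than either track in isolation.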
In Logic Pro X, each channel has its own volume fader, which you can simply click and drag: moving it up takes the channel toward maximum volume, and moving it down takes it toward completely muted.
Even though Logic Pro X has only two outputs, Left and Right, it is capable of recording what is effectively an unlimited number of individual tracks. You can add new tracks by simply clicking a button, and this will cause a new channel strip to appear, with its own volume fader, pan knob, and other associated controls that you can adjust. What this allows you to do is record instruments separately, but nonetheless play them back simultaneously. So you could, for example, record a guitar track, and then, separately, sing over it, recording a vocal track, and then play both simultaneously. This allows one musician to compose for an entire orchestra, and actually record it, which is nothing short of revolutionary, especially for $199.
As an example, this is a viola septet that I wrote, which also makes use of rather drastic panning during the introduction. You can clearly hear the instruments come in, one by one, and each of these instruments was in fact scored separately, and then mixed together into a single piece of music. Though I used Logic Pro X to actually produce the audio for this piece, the instrument sounds themselves come from a library sold by Native Instruments called Kontakt, which costs about $600 to $1,300. This library can be added to Logic Pro X and accessed through the interface relatively easily, expanding the already sizable set of instruments included with Logic Pro X.
One of the most important tools a producer can make use of is compression, which squeezes the distribution of volumes a channel generates into a narrower band, making quiet sounds louder and loud sounds quieter, while leaving the middle relatively undisturbed. This causes a track to have a tightly bounded volume, preventing sudden jumps in volume, which is important for pop music, though it is arguably something you might want to minimize for other genres, where you actually want a wide range of volumes to be possible. Nonetheless, for pop music, you almost always want to make heavy use of compression, and by repeatedly compressing individual tracks, and then compressing the entire mix, you can achieve loudness, by generating a tighter average volume, and then ratcheting up that average.
But like everything else, if you overdo compression, it starts to sound unnatural, and not very good. Specifically, for vocals, using too much compression can produce a sound similar to a radio DJ’s on-air mic, which is generally not desirable. Nonetheless, if you’re producing pop music, compression is something you’ll want to use on basically every track, since you don’t want inaudible quiet sounds, nor do you want ultra loud, out of nowhere sounds – i.e., you don’t want highly random volumes that force the listener to get up every 15 seconds, and adjust the playback volume. Instead, you want music that is dynamic, that gets louder and softer, but that nonetheless has a playback volume within some manageable range.
Compression is the tool that lets you achieve this. In Logic Pro X, there are several built-in compressors, and above is the one that I typically use, because I think it produces the best results. There are a lot of options on this compressor, but I’m going to focus on the basic controls, which are the threshold, the ratio, and the makeup gain. You can do a lot with just these three controls, but if you’re interested, you can find plenty of materials online devoted solely to compression.
The threshold is the volume at which the compressor kicks in and starts to limit loudness. The lower the threshold, the less upside is permitted in terms of volume. As a practical matter, if you spin the threshold all the way to the left, you’ll end up reducing the volume of the track. The ratio controls how aggressively the compressor reduces any sounds that are above the threshold; if you spin the ratio all the way to the right, you will drastically reduce the volume of any sounds above the threshold. Finally, the makeup gain controls the amount of gain that you add back to the signal, post-compression. This is the finishing touch that lets you create a flat signal with a roughly constant volume. That is, the compression phase (determined by the threshold and the ratio) reduces the volume of any sounds that breach the threshold, which, as a practical matter, reduces the volume of the signal overall, since really loud sounds get diminished. The makeup gain lets you bring the net volume of the signal back up, adding gain to the already compressed signal. If you’re a mechanical thinker: the compression phase puts a ceiling on the volume, and then the makeup gain lifts the floor, forcing everything into a narrower band of volume.
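These three controls can be captured as a simple static gain curve. The sketch below is my own illustration, not Logic’s actual algorithm (real compressors also have attack and release times governing how quickly the gain changes), and the threshold, ratio, and makeup values are arbitrary:

```python
def compressor_gain_db(level_db, threshold_db=-20.0, ratio=4.0,
                       makeup_db=6.0):
    """Static curve of a downward compressor, working in dB.
    Below the threshold, the signal passes through unchanged (plus
    makeup gain); above it, every `ratio` dB of input above the
    threshold yields only 1 dB of output above it."""
    if level_db <= threshold_db:
        out = level_db
    else:
        out = threshold_db + (level_db - threshold_db) / ratio
    return out + makeup_db

# A quiet sound at -30 dB sits below the threshold, so it is simply
# lifted by the makeup gain: -30 + 6 = -24 dB.
quiet = compressor_gain_db(-30.0)
# A loud peak at 0 dB is 20 dB over the threshold; at 4:1 that becomes
# 5 dB over, then makeup is added: -20 + 5 + 6 = -9 dB.
loud = compressor_gain_db(0.0)
```

Notice the squeeze: a 30 dB input range (-30 to 0) comes out as a 15 dB range (-24 to -9). That narrowing is exactly the ceiling-plus-lifted-floor effect described above.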
Rather than tell you what settings to use, I think the best thing you can do for yourself as a producer of music is to try to imitate the sounds produced in the music you enjoy the most. Nonetheless, there are some benchmarks that I think you should try to understand and achieve. As an example of what I think of as ideal compression for vocals, I’d suggest a close listen to the vocals on this track by My Brightest Diamond (a.k.a. Shara Nova). This is a great example, because there’s almost no audible processing on her vocals, other than compression.
First off, notice that she’s not singing loud at all, but you can nonetheless hear every aspect of her voice, even her breath, and slight percussive elements of her pronouncing hard consonants, like C’s and T’s. In order to achieve this, you need to record a clean vocal signal in the first instance, and in this case, though I wasn’t in the studio, this vocal track is almost certainly the result of a close-mic technique, where the vocalist is positioned within inches of the microphone, separated from the mic only by a pop-screen. What this allows for is a lot of information about what’s going on, frankly, inside the singer’s mouth, which is what allows you to hear all the consonants being pronounced, and the breathing in between phrases. So overall, a close-mic technique produces a very intimate sound, as if she’s singing directly into your ear.
For a song like this, which is in large part a delicate (and quite depressing) ballad, this technique is perfect, because you want a sense of intimacy. For a stadium anthem, you do not, and so you’ll probably have to position your vocalist further back from the microphone, to allow for louder singing, since the particulars of diction and breathing aren’t as important. Compression will also play a different role in these two cases. For a close-mic setup, my personal opinion is that you’ll want to use the compression to bring out the breathing, and the diction of the singer, rather than create a flat signal. That is, you can use the compressor to actually accentuate variance in a signal, rather than crush it, by making heavy use of make up gain, and using the threshold to delicately control the overall signal. The net effect will be to produce a signal that is even more sensitive to changes in volume in the underlying vocal track, since you’ve amplified everything below the threshold, causing small bumps in volume to turn into significant jumps. This is the kind of mic placement and compression you’ll generally want for classical singing, jazz singing, and some types of hip-hop, where the vocals dominate.
In contrast, if your vocalist is going to be singing loudly, you probably need a bit of distance from the mic, and you probably don’t want to achieve intimacy, but instead, probably want to achieve a flat signal. This is the classic pop music vocal track, where the tones are clear, and the volume is roughly constant, and loud. This is achieved by bringing the compressor threshold down significantly, and then applying make up gain to recover the signal. If you’re trying to achieve a flat signal that is nonetheless loud, you’ll probably have to do this a few times, repeatedly applying compressors in series, in order to build up enough gain. In my mind, the archetypal pop vocal track is exemplified by this Britney Spears song, “Till the World Ends”, which was produced by the prolific Dr. Luke. You can hear a lot of diction, not much breathing, but most importantly, her vocals are roughly constant in volume. This is the result of repeated compression, bringing the vocal signal into an extremely narrow band, that is basically on or off, rather than dynamic. This is accentuated by the deliberate muting of the vocal track during the chorus, which is literally turned on and off by muting.
Equalization lets you take a sound and amplify, or reduce, particular frequencies, or ranges of frequencies. Like everything else, equalization is a tool that can serve many purposes, but typically, you would use equalization to amplify desirable sounds, reduce unwanted sounds, and in general, filter a track to produce the overall sound that you desire. As a practical matter, what you do with an EQ is try to get rid of all the unwanted augmentations to a signal that occur when you record a sound. So for example, if you record an acoustic guitar with a microphone, you’re going to pick up sounds that you simply don’t hear when someone is actually playing guitar in front of you. This is unavoidable, because a mic is not a human ear, and there are a lot of changes taking place between the performance, the electrical signal generated by the mic, and the digital signal recorded on your workstation.
The built-in equalizers in Logic Pro X are great, but I found what I think is an incredible equalizer program that you can download for free, called TDR Nova, which also has built-in compression. This program integrates into Logic Pro X rather seamlessly, and it also makes explaining an equalizer simple, because the frequencies are laid out from left to right, in an intuitive GUI. On the left-hand side are the low frequencies, like those generated by a bass drum, and on the right-hand side are the high frequencies, like those generated by a cymbal. In reality, a bass drum has a distribution of frequencies, which will certainly include some high frequencies, but the majority will be concentrated in the low end. Similarly, a cymbal could produce some low frequencies, but the majority of the signal will be concentrated in the mid to high frequencies.
The utility of an equalizer is that it can normalize a signal to your expectations for the sounds in question. So, for example, if you record a cymbal, and there’s a ton of low frequency in the signal, then you probably don’t want to leave that in, because that’s not normal for a cymbal. That is, if you hear someone play a cymbal, there won’t be a significant amount of low end, but instead, you’ll generally hear a crashing sound, concentrated mostly in high frequencies. You can then use an EQ to reduce the volume of only the low frequencies, thereby bringing the recorded signal closer to what you’d normally hear in real life. In this case, this is done by simply moving one of the blue dots in the interface above to the frequency range that you’d like to reduce, and dragging it downward until the sound you hear matches your expectations. Again, there’s no formula for this, and you learn by doing, and by listening to what sounds good. That said, the general rule is that if you’re using equalization as a tool for normalizing a signal, then the overall purpose is to shape the frequency distribution to match what you’d typically hear in real life.
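The cymbal cleanup above can be sketched in code. This is only a minimal illustration of the idea, using a first-order high-pass filter; the 60 Hz “rumble” and 6 kHz “crash” components are made-up stand-ins, and a real EQ like TDR Nova uses much steeper, tunable filters:

```python
import math

def high_pass(signal, cutoff_hz, sample_rate=44100):
    # First-order high-pass filter: attenuates content below
    # cutoff_hz while leaving higher frequencies mostly intact.
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a = rc / (rc + dt)
    out = [signal[0]]
    for i in range(1, len(signal)):
        out.append(a * (out[-1] + signal[i] - signal[i - 1]))
    return out

def rms(signal):
    return math.sqrt(sum(x * x for x in signal) / len(signal))

n = 44100  # one second of audio
# A hypothetical "cymbal" recording that also picked up low rumble:
rumble = [0.5 * math.sin(2 * math.pi * 60 * t / n) for t in range(n)]
crash = [0.2 * math.sin(2 * math.pi * 6000 * t / n) for t in range(n)]
recorded = [r + c for r, c in zip(rumble, crash)]

# Pulling down everything below ~500 Hz removes most of the rumble
# while leaving the high-frequency crash largely untouched.
cleaned = high_pass(recorded, cutoff_hz=500)
```

Dragging one of those blue dots downward in the low end is, conceptually, choosing the cutoff and depth of a filter like this one, with your ears as the judge of when the signal matches your expectations.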
When you’re working in a professional recording studio, you will typically record music in an acoustically insulated environment, where the floors, walls, and ceilings have all been treated to minimize the amount of echo. This is done deliberately, so that the only sounds recorded by a microphone in the space come from the source of the sound itself, and not any secondary echoes. This allows for great precision in recording the actual sound produced by an instrument or vocalist, but it also creates an incredibly unnatural-sounding environment, where there is no echo at all. As a result, signals recorded in these types of environments require processing, to create the impression of the echo that would otherwise have been produced in a normal room. Even if you’re recording at home, you’ll probably want to insulate nearby walls to reduce echo as much as possible, absent a conscious decision to make use of the sound of the room you’re in. Given the high quality of audio software like Logic Pro X, even seriously successful artists like Billie Eilish and Mø record at home, using makeshift insulation.
To compensate for the lack of a “room sound” due to insulation, you can use reverb, which is really just simulated echo. Logic Pro X comes with a great reverb program called Space Designer, which I use all the time, and it has a simply enormous number of preset room sounds that you can tweak to achieve whatever sound you’re looking for. The amount of reverb that you use will probably be determined by the sound of the track in question. So for a drum kit, which is extremely noisy, I would generally use only a modest amount of reverb, because it starts to sound awful very quickly, as cymbal hits sustain too long, and kick drums sound like war drums, etc. In contrast, if you’re mixing a single, less volatile instrument, you can probably be generous with the amount of reverb that you use, because the consequences won’t be as dramatic.
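Under the hood, reverb is built from delayed, decaying copies of the signal. The toy function below is nowhere near what Space Designer does (it convolves the signal with sampled responses of real spaces), but a single feedback delay is enough to show the basic mechanism:

```python
def feedback_echo(signal, delay_samples, decay=0.5):
    # Mix each sample with a decaying copy of the output from
    # delay_samples earlier, producing a repeating, fading echo.
    out = list(signal)
    for i in range(delay_samples, len(out)):
        out[i] += decay * out[i - delay_samples]
    return out

# A single impulse (a "clap") turns into a train of echoes that
# fades away, like a note sustaining in a hall.
dry = [1.0] + [0.0] * 7
wet = feedback_echo(dry, delay_samples=2, decay=0.5)
# wet is [1.0, 0.0, 0.5, 0.0, 0.25, 0.0, 0.125, 0.0]
```

The `decay` parameter here plays the role of the room: a larger value sustains longer, like a big hall, which is also why reverb piles up so quickly on noisy sources like a full drum kit.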
As an example, this is a viola sonata that I wrote, that is unaccompanied, and uses a fairly generous amount of reverb. You can clearly hear the notes sustain, and echo, long after they’ve ended, creating the impression that the piece is being performed in a large hall. However, because the actual viola signal is quite loud in the mix, you get the sense that you’re standing near the performer. So the net perceived result is being close to a solo viola performance in a large hall.
Mixing a Track
The ultimate goal of music production is to create a finished product that allows for all of the individual tracks to be perceived clearly by the listener, yet nonetheless conveys a sense of oneness, where the individual components contribute to a single, coherent work. As a practical matter, this will require constant revision, and balancing what you’d prefer an individual track to sound like in isolation, with what it actually sounds like when the rest of the instruments are included. There is of course no algorithm that I’m aware of for doing this, but there are some general guidelines that I use as a practical matter. As an example, I’ve chosen Radiohead’s “Paranoid Android”, since it makes use of a simply insane variety of individual track sounds. You can get an intuition for mixing by thinking about how you’d manage something like this, producing a finished product that has a balanced sound, with a dynamic, but bounded volume.
The most basic question you have to answer when mixing, is how loud you want each component track to be. As a practical matter, you should have some rough sketch of what you expect the mix to sound like, which will require answering basic questions about how loud, e.g., a particular guitar track should be. As a general matter, for pop music, you want to make sure that the focus is on the vocals, absent a conscious decision to relegate them to the background.
Positioning in EQ and Panning
Simply adjusting the volumes of the tracks probably won’t be enough to achieve the mix you imagine, because there will be too much overlap in frequencies between the individual tracks. Stated differently, two tracks that have roughly the same distribution of frequencies will be hard to tell apart, even if you make one louder than the other. This is because there’s a crowding-out effect that occurs when you have too much sound in a particular frequency range, and as a result, you need to try to apportion both the panning spectrum and the frequency spectrum of a mix among the individual tracks.
So if you think about panning, there’s everything from hard left to hard right, and you want to position instruments somewhere in that range. This is a tool that lets you partially isolate instruments that might otherwise crowd each other out. So if, for example, you have two very similar sounding guitars, you could pan one hard left, and the other hard right. As a matter of style, you probably don’t want to do this, because it’s super cheesy, but it highlights the principle, which is that you can use panning as a means to isolate instruments that would otherwise overlap in frequency.
Similarly, you can assign certain instruments bandwidth on the EQ spectrum, but this is better suited for creating distinctions between dissimilar instruments. For example, a tom drum is going to have a lot of low frequency, whereas a flute will have a lot of high frequency. As a result, if you want to isolate these two instruments, you can reduce the high frequencies in the tom drum, and reduce the low frequencies in the flute, thereby reducing the overlap between the frequency distributions of the two instruments.
Obviously, this becomes a very complicated balancing act when you have a large number of instruments, with overlapping frequencies that perhaps aren’t even really constant. But this is why mixing is an art, and producing a listenable final product from a large set of disparate tracks is a non-trivial undertaking, that requires great patience, and a willingness to compromise.
Mastering a Track
Once you’ve mixed a track, you will have a stereo audio file that has exactly two channels, a left channel and a right channel. In studio parlance, you’d say that you “bounced” a mix down to two tracks, meaning that you’ve taken some large number of tracks, each with its own processing, like compression and reverb, and effectively printed the net result to a file containing the sounds that you hear when you actually play your mix. This is, however, not the final product: the next step is called mastering, which involves achieving sufficient loudness, ensuring there’s no distortion, and ensuring that the overall frequency distribution is balanced.
Loudness and Distortion
The purpose of achieving loudness is to allow listeners to play your music at the loudest volume possible, with minimal amplification. This isn’t strictly necessary, so long as your mix is reasonably loud, and you can justifiably decide that you’re not terribly interested in loudness, and that you’d rather preserve the dynamism of your original mix. This is something that you’ll actually hear in some recordings of classical music, in many cases making the changes in volume a bit unmanageable. In contrast, most pop songs can be set to a particular playback volume that produces a roughly constant actual volume.
Loudness is achieved through compression and limiting. Limiting a signal is not terribly different from compressing it, and in fact, you can achieve limiting with a compressor, by turning down the threshold and turning up the makeup gain. However, the limiters within Logic Pro X are designed specifically for this purpose, making it much easier to achieve this type of compression, which is specifically geared towards making a mix loud. But just as compressing a single track too aggressively can degrade its quality, limiting an entire mix too aggressively will degrade the whole mix, producing an undesirable crunching sound.
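Here’s a crude illustration of the gain-then-limit relationship. Real limiters apply smooth, look-ahead gain reduction rather than clamping individual samples like this, which is exactly why pushing them too hard produces that crunch; the `mix` values are arbitrary sample data of my own:

```python
def limit(signal, ceiling=1.0):
    # Brick-wall limiting in its crudest form: clamp any sample
    # that would exceed the ceiling.
    return [max(-ceiling, min(ceiling, x)) for x in signal]

def with_gain(signal, gain):
    # Turn the whole signal up or down by a linear gain factor.
    return [gain * x for x in signal]

mix = [0.2, 0.9, -0.4, 0.65, -0.8, 0.1]
# Push the whole mix up for loudness, then limit so nothing
# exceeds digital full scale (1.0):
loud_mix = limit(with_gain(mix, 1.5), ceiling=1.0)
# The quiet samples get louder; the peaks are held at the ceiling.
```

The more gain you add before the limiter, the more samples pile up at the ceiling, and the more of the waveform’s shape is destroyed: that flattening is the “crunch” you have to listen for by ear.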
You can load audio files into Logic Pro X and use its tools to analyze loudness, and I did this with Miike Snow’s song, “Black and Blue”, which achieves a simply astonishing level of loudness, despite maintaining a very high quality overall sound. The main obstacles to achieving loudness are the crunchiness of a limiter, and actual distortion, where the signal exceeds the allotted capacity, possibly creating a cracking or popping sound. You can check for distortion analytically, by watching the meter on the output master, which should turn red if there’s any distortion. In contrast, the sound produced by a limiter can’t really be analyzed, so you just have to listen carefully, and make sure that you haven’t forsaken quality and clarity in exchange for loudness in your master.
Balancing the Frequency Distribution
Unlike altering the EQ of a single track, when you’re mastering, you’re altering the frequency distribution of the entire mix, which means that you need to be very cautious with whatever changes you make. You can be a bit more precise using a program like TDR Nova, which allows you to apply frequency-specific compression. That is, you can identify a frequency range, and then apply compression to only that range, rather than the entire mix. This can be useful if you’re mastering beat-heavy music, because it allows you to take the entire low end, for example, and normalize the volume, so that if you have a bass drum together with a bass synth in your mix, you’ll get a normalized, bounded volume for both, producing a more coherent sound. This is sometimes referred to as “glue compression”, because the components are compressed together in the master, whereas they’re treated separately in the mix.
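Here’s a sketch of the band-splitting idea behind frequency-specific compression. For simplicity, a plain gain reduction on the low band stands in for a real compressor, and the crossover is a one-pole low-pass filter rather than anything TDR Nova actually uses; the “bass” and “hat” signals are hypothetical:

```python
import math

def low_pass(signal, cutoff_hz, sample_rate=44100):
    # One-pole low-pass: keeps the low band, attenuates the highs.
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = dt / (rc + dt)
    out = [signal[0]]
    for x in signal[1:]:
        out.append(out[-1] + a * (x - out[-1]))
    return out

def band_compress(signal, crossover_hz, low_gain):
    # Split the mix at crossover_hz, process only the low band
    # (here, a fixed gain standing in for a compressor), and re-sum
    # with the untouched high band.
    low = low_pass(signal, crossover_hz)
    high = [x - l for x, l in zip(signal, low)]
    return [low_gain * l + h for l, h in zip(low, high)]

def rms(signal):
    return math.sqrt(sum(x * x for x in signal) / len(signal))

n = 44100
bass = [0.5 * math.sin(2 * math.pi * 55 * t / n) for t in range(n)]
hat = [0.5 * math.sin(2 * math.pi * 6000 * t / n) for t in range(n)]

# Taming the low band pulls the bass down substantially, while
# high-frequency content passes through mostly unchanged.
tamed_bass = band_compress(bass, crossover_hz=300, low_gain=0.5)
tamed_hat = band_compress(hat, crossover_hz=300, low_gain=0.5)
```

The key design point is that the two bands are re-summed after processing, so the bass drum and bass synth end up sharing one normalized low end while the rest of the master is untouched, which is where the “glue” comes from.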
Above is, in my opinion, a simply wonderful example of top-notch electronic music production, by Ronski Speed and Syntrobic, featuring singer Renee Stahl. It starts out as a somewhat cinematic house music track, with a fairly contained kick drum and a layered palette of synth sounds, but then opens up at 1:03, with a beautiful piano melody that completely transforms the piece. The vocals are heavily gated, causing them to sound partially muted, producing an overall work of art that is really astonishingly original. That, to me, is one of the joys of electronic music.
Thinking carefully about this piece, you can start to appreciate the types of opportunities offered by electronic music, which really has no boundaries in terms of timbre, making the production process a real challenge, and quite fun. Though music snobs sometimes shy away from electronic music, I absolutely love it, and think that the audio production in some of the more cerebral house music like this is just amazing. And in particular, Anjunabeats is a great label if you’re interested in getting to know the best of the genre.