Using an iPad as a second PC monitor for subjective tests involving video


After scouring the internet for a way of turning my iPad into a second screen for my laptop, I came across two applications which seemed to provide exactly what I needed:

  1. Air Display

  2. TwomonUSB

In order to make a choice between the two it was necessary to try them out.

1. Air Display


This app allows a computer to connect wirelessly to an iPad or Android device. I found it very easy to use once I had installed the app on the iPad and downloaded the “Display Host” software onto the laptop. This initial promise, however, was thwarted once I had the Pro Tools session up and running and had dragged the video window onto the iPad screen.

Unfortunately, the wireless connection could not transfer data quickly enough for seamless video playback. The lag I experienced was enough to make me decide against using Air Display for the subjective tests. Too often the video would drift in and out of sync with the audio, completely ruining the effects I had painstakingly created.

2. TwomonUSB


As the name suggests, TwomonUSB works through connecting your mobile device to your computer via a trusty old USB cable. Like Air Display, in order to work, TwomonUSB requires you to download an iOS app as well as software to turn your computer into a server.

Compared with Air Display, video playback on the iPad was smoother; however, it was not close to the HD playback achieved on the laptop. At times I also felt there was a slight lag.

In conclusion, both apps work well at turning an iPad into a secondary computer monitor, with TwomonUSB having the edge in terms of stability and data transfer speed. For video playback, however, Air Display is not really a viable option, whereas TwomonUSB performed well enough to enjoy a film, just about. Neither provides smooth and reliable enough playback for subjective testing, as both show instances where the video lags behind the audio, causing synchronisation problems.

In the next post I will outline the alternative method I have devised…

Posted in Apps, MSc Audio Production, Review, Subjective testing, University of Salford

Testing Equipment – Breakdown

The following equipment will be used to carry out the subjective tests comparing binaural with conventional stereo:

1 x Windows Laptop

This computer will run the test. The video and the finalised mixdowns of the binaural and stereo mixes will be imported into the same Pro Tools 10 session. Using the mute and solo buttons, subjects will be able to toggle between the different mixes as they watch the video.

This method ensures that the test is fair by using exactly the same visuals for each mix. Subjects will be able to listen intently and try to discern any noticeable differences between the two.

1 x VRM Box

The VRM box is a high quality audio interface. It has been chosen for this test for the following reasons:

  • Portability – This interface is very light and fits easily into the palm of your hand. It is USB-powered, which removes the extra hassle and bulk of accommodating a separate power supply.
  • Quality – The VRM Box is a great improvement over the on-board sound of the laptop that will run the test. With a dynamic range of 108 dBA and very low noise and distortion figures, the VRM Box delivers the sound quality the test requires without compromise.

1 x iPad

The video will be displayed on the screen of an iPad. The stimulus for this subjective test was a cultural observation: it is becoming increasingly common for people to watch films on their mobile devices, using headphones to listen to the soundtrack so as not to disturb others around them, for example while commuting to work on public transport. For this reason, displaying the video on an up-to-date mobile device was considered important, both to add relevance to the test and to help ensure that it is fair. An alternative would be to display the video on the laptop screen at a size comparable to a tablet or smartphone; however, the visual effect is not quite the same, and the extra screen area of the laptop, even though unused, may affect subject responses.

A method of connecting the iPad to the laptop as an extra display is currently being researched.

1 x AKG K240DF Headphones

These are the most suitable headphones I have access to. Features of the K240 DFs that suit them to this subjective test include:

  • Semi-open design – Combining the best of open and closed designs, these AKGs provide good isolation from the outside world (reducing distraction from outside noise as well as the chance of disturbing other people) while retaining the natural, ‘airy’ sound quality that makes open-back designs superior for accurate mixing and listening.
  • Flat frequency response – These headphones provide an uncoloured sound.
  • IRT specification – Created to fulfil the international IRT specification, the K240 DF establishes a uniform quality standard free from environmental variables.

Posted in Audio, MSc Audio Production, University of Salford

Sound localisation and achieving a sense of distance and depth within a film soundtrack

Using a selection of mixing tools it is possible to forge a sense of distance and depth within a Post-Production sound mix.

EQ

The position of a sound-emitting object on screen relative to the camera needs careful attention. In real life, the position of a sound source directly affects the frequency spectrum of the sound that reaches our ears. Wyatt and Amyes, in the book Audio Post Production for Television and Film, state that “As a performer walks away from a camera his voice will reduce in bass content.” (2005: 222). To emulate this phenomenon in post-production, equalisation is employed to create apparent distance and perspective.

Volume

Quite simply, as we have all experienced, sounds from sources that appear far away are quieter than those from a similar source nearby, assuming no obstacles are present to diffract or absorb the sound of the nearby source and significantly alter its characteristics.
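As a rough guide to how much quieter a distant source should be, the free-field inverse square law says that the level of a point source drops by about 6 dB for every doubling of distance. The sketch below assumes an idealised point source in open air and ignores reflections, air absorption and the obstacles mentioned above; the function names are hypothetical.

```python
import math

def level_change_db(reference_distance_m: float, distance_m: float) -> float:
    """Change in sound pressure level (dB) when a point source moves from
    reference_distance_m to distance_m, assuming free-field inverse square
    spreading (no reflections, no air absorption)."""
    if reference_distance_m <= 0 or distance_m <= 0:
        raise ValueError("distances must be positive")
    # Pressure falls as 1/r, so the level changes by 20*log10(r_ref / r).
    return 20.0 * math.log10(reference_distance_m / distance_m)

def db_to_gain(db: float) -> float:
    """Convert a level change in dB to a linear amplitude (fader) gain."""
    return 10.0 ** (db / 20.0)

if __name__ == "__main__":
    for d in (1.0, 2.0, 4.0, 8.0):
        change = level_change_db(1.0, d)
        print(f"{d:>4.1f} m: {change:6.2f} dB (gain {db_to_gain(change):.3f})")
```

Each doubling of distance halves the amplitude, so a fader move of roughly 6 dB is a sensible starting point before adjusting by ear.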

Panning

Effective only in the horizontal plane, amplitude panning can be used to suggest where a sound source is located relative to the camera. Extreme panning values can be used to suggest a distant sound source coming from “out of view” to the far left or far right. For more information on how panning was used in making the stereo soundtrack for Big Buck Bunny, click here.
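To make amplitude panning concrete, the sketch below converts a pan position on a Pro Tools-style scale (-100 = hard left, +100 = hard right) into left and right channel gains using a constant-power sine/cosine law. The -3 dB-centre constant-power law is an assumption for illustration; the pan law a DAW actually applies depends on its settings.

```python
import math

def pan_gains(pan: float) -> tuple[float, float]:
    """Return (left_gain, right_gain) for a pan value in [-100, 100]
    using a constant-power (sine/cosine, -3 dB at centre) pan law."""
    if not -100.0 <= pan <= 100.0:
        raise ValueError("pan must be between -100 and 100")
    # Map -100..100 onto an angle of 0..pi/2.
    theta = (pan + 100.0) / 200.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)

if __name__ == "__main__":
    for p in (-100, -60, 0, 60, 100):
        left, right = pan_gains(p)
        # left^2 + right^2 stays at 1, so total power is constant across the pan range.
        print(f"pan {p:>4}: L={left:.3f} R={right:.3f} power={left**2 + right**2:.3f}")
```

Keeping the summed power constant stops a sound from appearing to get quieter as it passes through the centre of the stereo field.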

Reverb

“Natural reverberation helps to tell the ears the direction that sound is coming from, the type of environment that the sound was recorded in, and approximately how far away it is.” (Wyatt and Amyes, 2005: 229).

Sounds with a more reverberant character are perceived as further away than relatively dry sounds. Additionally, the length of a reverb’s tail can affect how far away a sound source seems: longer tails increase the perceived distance, while shorter tails, by comparison, make a source seem more upfront.

In the real world, early reflections convey the position and distance of a sound source, as well as information such as the geometry of the room. Scientific audio

Kruszielski, Kamekawa and Marui, in their paper The Influence of Camera Focal Length in the Direct to Reverb Ratio Suitability and Its Effects in the Perception of Distance for a Motion Picture (Kruszielski et al., 2011), asked subjects to select the amount of reverberant sound they perceived to be suitable for a presented motion picture. The visual stimulus was a video of a saxophone player filmed at two different distances with two different lens focal lengths. The video was split into three key scenes. For Scene (“Frame”) A, the player was placed 3 metres from the camera and the lens focal length was set to 18 mm, producing a field of view of 76 degrees. For Scene B, the camera was moved 1.5 metres towards the player with the same focal length. For Scene C, the camera was positioned in the same location as in Scene A, but the focal length was changed to 55 mm, producing a field of view of 28.8 degrees. The area occupied by the saxophonist in the foreground of Scene C was the same as in Scene B, but with much less visible space in the background.
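The quoted fields of view are consistent with the standard angle-of-view formula, FOV = 2·atan(d / 2f), where f is the focal length and d is the sensor dimension. The sketch below assumes a sensor diagonal of about 28.3 mm, typical of an APS-C camera; the sensor size is not given in the summary above, so this value is an assumption chosen to show that the figures line up.

```python
import math

# Assumed sensor diagonal in mm (roughly APS-C); not stated in the study summary above.
ASSUMED_SENSOR_DIAGONAL_MM = 28.3

def diagonal_fov_deg(focal_length_mm: float,
                     sensor_diagonal_mm: float = ASSUMED_SENSOR_DIAGONAL_MM) -> float:
    """Diagonal angle of view in degrees for a rectilinear lens focused at infinity."""
    return math.degrees(2.0 * math.atan(sensor_diagonal_mm / (2.0 * focal_length_mm)))

if __name__ == "__main__":
    for f in (18.0, 55.0):
        print(f"{f:.0f} mm lens: {diagonal_fov_deg(f):.1f} degrees")
```

With this assumed sensor size, the 18 mm and 55 mm lenses come out at roughly 76 and 29 degrees, in line with the figures reported.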

Analysing the results of two tests (a direct-to-reverberant sound ratio (D/R) test and a pairwise comparison between the three frames using three reverb patterns), they found that visual stimuli paired with sound recorded further away were perceived as more distant than those paired with sound recorded closer. They also found that as the perceived distance in the image increases, sounds recorded further away tend to be judged more suitable. They concluded that “Perceived distance caused by the image plays a major role in the desired amount of reverberation”.
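The direct-to-reverberant ratio (D/R) compares the energy of the direct sound with the energy of everything that arrives later. A minimal sketch of estimating it from an impulse response is below; treating the direct sound as a ±2.5 ms window around the strongest peak is an assumed convention, and the function name is hypothetical.

```python
import numpy as np

def direct_to_reverb_db(ir: np.ndarray, fs: int, direct_window_ms: float = 2.5) -> float:
    """Estimate the direct-to-reverberant energy ratio (dB) of an impulse response.

    The direct sound is taken as +/- direct_window_ms around the largest
    absolute peak (an assumed convention); everything after it counts as reverb.
    """
    peak = int(np.argmax(np.abs(ir)))
    half = int(round(direct_window_ms * 1e-3 * fs))
    start, end = max(peak - half, 0), peak + half + 1
    direct_energy = float(np.sum(ir[start:end] ** 2))
    reverb_energy = float(np.sum(ir[end:] ** 2))
    if reverb_energy == 0.0:
        return float("inf")
    return float(10.0 * np.log10(direct_energy / reverb_energy))

if __name__ == "__main__":
    fs = 48_000
    rng = np.random.default_rng(0)
    t = np.arange(fs) / fs
    # Toy IR: a unit direct impulse followed, 10 ms later, by a decaying noise tail.
    ir = np.zeros(fs)
    ir[0] = 1.0
    tail = 0.05 * rng.standard_normal(fs) * np.exp(-t / 0.3)
    ir[480:] += tail[480:]
    print(f"D/R = {direct_to_reverb_db(ir, fs):.1f} dB")
```

Lowering the D/R (more reverb relative to direct sound) is the mixing equivalent of the "more distant recorded sound" condition in the study.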

In another study, investigating auditory-visual perception of room width, Larsson et al. found that a combined auditory-visual stimulus increased the accuracy of room-width perception compared with either sense alone (Larsson et al., 2002).

MSc Project examples

Bunny Snoring Clip

EQ


A high-pass filter was added for the duration of this clip to remove unwanted sub/low-frequency noise such as rumble from the mic stand.


At the start of the clip, where the bunny is seen sleeping in the distance, further low-frequency information was removed along with some top end. This helped create the impression of the bunny snoring quite far away from the camera. In the Pro Tools project, this EQ is bypassed when the camera shifts to a closer shot of the bunny, to contrast the two viewpoints. Close up, the snore sounds richer, mimicking what might happen in the real world (not that you would ever find a giant rabbit snoring under a tree… a human maybe), as there are fewer obstacles to alter the sound.

Volume

The following automation curve was created to synchronise volume with camera position whilst the bunny is snoring:


As you can see, the snore is automated to become louder as the camera moves closer to Big Buck Bunny.
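A volume automation curve like this is essentially a set of breakpoints with the level interpolated between them. The sketch below evaluates such a curve; the breakpoint times and levels are hypothetical placeholders rather than the values in the BBB session, and interpolating linearly in dB is an assumption about how the curve is drawn.

```python
import numpy as np

# Hypothetical breakpoints: (time in seconds, fader level in dB).
BREAKPOINTS = [(0.0, -24.0), (4.0, -18.0), (8.0, -10.0), (10.0, -6.0)]

def automation_level_db(t: float, breakpoints=BREAKPOINTS) -> float:
    """Level in dB at time t, linearly interpolated between breakpoints
    and held at the first/last value outside the automated range."""
    times = [bp[0] for bp in breakpoints]
    levels = [bp[1] for bp in breakpoints]
    return float(np.interp(t, times, levels))

def db_to_gain(db: float) -> float:
    """Convert a dB level to a linear amplitude gain."""
    return 10.0 ** (db / 20.0)

if __name__ == "__main__":
    for t in (0.0, 2.0, 6.0, 9.0, 12.0):
        level = automation_level_db(t)
        print(f"t={t:4.1f}s  level={level:6.1f} dB  gain={db_to_gain(level):.3f}")
```

Interpolating in dB rather than in linear gain gives an even change in level (dB per second) along each segment of the curve.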

Reverb

Listening to the video clip above, you will notice that when Big Buck Bunny is seen from afar his snore is much more reverberant than when he is close up. Two different reverbs were used to further imply the character’s distance from the camera. For the distant shots I used the “Field, Wide, Small Hills and Forest” stereo convolution reverb with a tail of 2.7 seconds. The level of the reverb send on the ‘snore’ track was automated to decrease slightly as the camera slowly zooms in. For the close-ups of the bunny, the level of this reverb was significantly reduced to give an overall drier feel.

At this point another reverb was employed to capture the reflections the snoring would create within the cave the bunny is sleeping in. This was kept as a mono reverb because the camera is set outside the cave/burrow, so an immersive stereo reverb would not have been suitable. As the rabbit ventures out of the cave, the level of this reverb is decreased to zero for dramatic effect. Click here for pictures of the reverb settings as well as more information on how reverb was used in making the ‘BBB’ soundtrack.

Flying Gimera

Similar techniques were used to create distance in this section of the film. A large amount of the “Field, Wide, Small Hills and Forest” reverb (5.3-second tail) was applied as Gimera is flung to his demise in the distance. This gave the impression of his yell echoing across the landscape, becoming more reverberant and quieter as he disappeared into the distance.

Panning was more important in this clip, to synchronise Gimera’s yelling in the stereo field with the position of his body on screen. As the picture cuts back to Big Buck Bunny, Gimera can still be heard to the very far left of the stereo field, yelling and (presumably) crashing into a tree. Below is a screenshot of the panning automation for Gimera’s voice during this sequence.


References

Kruszielski, L. F., Kamekawa, T. and Marui, A. (2011) The Influence of Camera Focal Length in the Direct to Reverb Ratio Suitability and Its Effects in the Perception of Distance for a Motion Picture. Tokyo University of the Arts, Japan. 131st AES Convention, Convention Paper 8580.

Larsson, P., Västfjäll, D. and Kleiner, M. (2002) Auditory-visual interaction in real and virtual rooms. Proceedings of Forum Acusticum, Seville, Spain, PSY05-004-IP.

Posted in Audio, MSc Audio Production, Post Production, Sound Design, Subjective testing, University of Salford

Stereo Panning Method

When it came to panning the sound effects created for the Big Buck Bunny soundtrack, I felt it was important for the soundfield to appear relatively expansive, so the full amplitude-panning range was used within Pro Tools. Also, because of the rural environments on screen, I felt that localisation needed to match the on-screen action as closely as possible, i.e. natural, as things might sound in real life in terms of positioning. With this in mind, the following method was devised:

Illustration of the panning values used within Pro Tools


Although the full panning range was used, only 0–60 Left and Right was used for objects and characters actually seen on screen. Values of 61–100 Left and Right were reserved for things present in a scene but out of shot. This method gave the impression that the on-screen environments were much wider than the screen, while also providing adequate and pleasing left–right separation for objects in view.
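The on-screen/off-screen scheme can be expressed as a simple mapping. In this sketch, a source’s horizontal position is measured so that -1 and +1 are the left and right edges of the frame; positions inside the frame map to 0–60, and positions beyond the edge map into 61–100, reaching 100 at an assumed maximum off-screen distance. The coordinate convention, the linear mapping and the function name are hypothetical; in practice the values were set by eye, frame by frame.

```python
def position_to_pan(x: float, max_offscreen: float = 1.0) -> int:
    """Map a horizontal position to a Pro Tools-style pan value (-100..100).

    x is measured in half-frame widths from the centre of the frame:
    -1.0 and +1.0 are the left and right edges of the visible picture.
    On-screen positions (|x| <= 1) use 0..60; out-of-shot positions use
    61..100, reaching 100 at |x| = 1 + max_offscreen (hypothetical scale).
    """
    sign = 1 if x >= 0 else -1
    distance = abs(x)
    if distance <= 1.0:
        pan = 60.0 * distance
    else:
        beyond = min((distance - 1.0) / max_offscreen, 1.0)
        pan = 61.0 + 39.0 * beyond
    return sign * round(pan)

if __name__ == "__main__":
    for x in (0.0, 0.5, -1.0, 1.2, -2.5):
        print(f"x={x:5.2f} -> pan {position_to_pan(x):+d}")
```

Reserving the outer 40 points of the pan range for off-screen sources is what makes the world feel wider than the frame.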

Automating the panning to mimic the on-screen action accurately was a long and time-consuming process. Endless hours were spent synchronising sounds with animated movements: zooming in and aligning panning markers to individual frames at the perceived positions on the panning scale.

Panning Automation of sounds for Big Buck Bunny


Overall I am pleased with the results.


Video Clips Chosen for Subjective Testing

In the time since my last post much progress has been made. The stereo mix is finished and the design for the subjective testing phase is well on the way. In this post I will reveal the video clips from Big Buck Bunny that I have chosen for testing, and the reasons behind certain decisions that have been made with regards to them and the test itself.

The Clips

Four clips in total will be used for testing. Each clip differs considerably in terms of action, and it is hoped that this broad range of scene types will effectively showcase the 3D audio capabilities of the Wave Arts Panorama plug-in. The clips will provide data for comparison at the analysis stage, from which it is hoped conclusions can be drawn about the types of scene in which Panorama/binaural audio provides the most noticeable enhancement to the film soundtrack (if any), e.g. fast action or slower-paced scenes. This comparison is in addition to the main aim, which is essentially conventional stereo vs. binaural.

Recommendation ITU-R BS.1534 – a technical standard for the subjective assessment of intermediate quality level of coding systems – states that test sequences should not exceed 20 seconds, to guard against listener fatigue and reduce the overall length of the test. For this reason, each clip was kept short (between 20 and 25 seconds). My initial intention was to keep strictly to the 20-second recommendation; however, this proved difficult, as cutting some sequences to 20 seconds spoiled the flow of the action, often ending scenes at points that seemed unnatural. It is believed that 25 seconds is still short enough not to cause listeners many problems. ITU-R BS.1534 is relevant to this study because, like the differences in quality between audio coding systems, the differences between conventional stereo and binaural here are subtle and in some cases very hard to perceive.

Below you will find each video clip that will be used for subjective testing. I had reservations about posting them, as I do not want to jeopardise my test by revealing material that could void the results. However, I do not think there is any risk here, as the audio has been completely removed. Also, Big Buck Bunny is a widely available and well-known film that many people have seen before. Posting some muted clips shouldn’t harm the experiment, and there is the added comfort that no one actually reads this blog; at least, the population from which I will be drawing test subjects doesn’t!


Posted in Audio, MSc Audio Production, Post Production, Sound Design, Subjective testing, University of Salford, youtube

Which reverb?!

This was a question that for a while completely stumped me while mixing my soundtrack for BBB. I had never worked on a film set outdoors, and with only one previous post-production project to my name (set entirely indoors), I had no idea of the acoustic properties of fields or forests, having shockingly never really paid attention while out on walks. Forests I could guess as being fairly reverberant, with objects such as tree trunks, branches, shrubs and leaves to reflect and absorb sound, but what about a large open field surrounded by forest and hills? With hardly any reflective surfaces, would there be any tail at all?

I searched through my plug-in collection for one that might give me some inkling of the settings required to recreate the acoustics of these two environments, only to find one Ableton preset entitled Forest, which confirmed that a forest reverb is potentially darn big. After trying it, though, I wasn’t all that happy with it. I found myself yearning for something more natural-sounding, and after scouring the internet for some much-needed advice I began to notice two words surfacing time and again: convolution reverb.

Not having come into contact with convolution reverb many times before (apart from the excellent Altiverb from time to time), I felt the need to research further to better understand this method of simulating reverberant spaces. In rather basic terms, to create a convolution reverb setting, an impulse response (“IR”) of a particular room or area is captured. This is done by recording how the space responds to a sound with enough amplitude, covering the full audible frequency range, to fully excite its acoustics. The two most common types of sound source are sine sweeps and sounds with sharp transients, e.g. starter pistols, popping balloons or a snare drum crack. As explained by designingsound.org, sine sweeps covering the entire audible frequency spectrum tend to be the preferred method of creating IRs because they provide greater accuracy: “Depending on the length of the sweep, they usually provides the best signal-to-noise ratio” (longer sweeps give a better signal-to-noise ratio and a greater chance of capturing resonances). Transient sounds have one advantage: they require no post-processing once recorded, whereas a sweep recording must first be deconvolved to obtain the IR.
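The sweep method can be sketched in a few lines. This example uses an exponential (logarithmic) sine sweep and its amplitude-compensated, time-reversed inverse filter, a widely used approach associated with Angelo Farina; the text does not say which kind of sweep was used for any particular IR, so the sweep type, frequency range and function names are assumptions. Convolving the recorded response with the inverse filter recovers the impulse response.

```python
import numpy as np

def exponential_sweep(f1: float, f2: float, duration: float, fs: int):
    """Exponential sine sweep from f1 to f2 Hz; returns (sweep, time_axis)."""
    t = np.arange(int(duration * fs)) / fs
    rate = np.log(f2 / f1)
    phase = 2.0 * np.pi * f1 * duration / rate * (np.exp(t * rate / duration) - 1.0)
    return np.sin(phase), t

def inverse_filter(sweep: np.ndarray, t: np.ndarray, f1: float, f2: float,
                   duration: float) -> np.ndarray:
    """Time-reversed sweep whose amplitude falls by 6 dB per octave as it descends,
    compensating for the extra energy the sweep spends at low frequencies."""
    rate = np.log(f2 / f1)
    return sweep[::-1] * np.exp(-t * rate / duration)

def fft_convolve(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Linear convolution via zero-padded FFTs."""
    n = len(a) + len(b) - 1
    nfft = 1 << (n - 1).bit_length()
    return np.fft.irfft(np.fft.rfft(a, nfft) * np.fft.rfft(b, nfft), nfft)[:n]

def deconvolve_ir(recording: np.ndarray, inverse: np.ndarray) -> np.ndarray:
    """Recover a peak-normalised impulse response from a recorded sweep."""
    ir = fft_convolve(recording, inverse)
    return ir / np.max(np.abs(ir))

if __name__ == "__main__":
    fs, f1, f2, duration = 8_000, 50.0, 3_000.0, 0.5
    sweep, t = exponential_sweep(f1, f2, duration, fs)
    inverse = inverse_filter(sweep, t, f1, f2, duration)
    # Simulated "space": direct sound plus one echo 100 samples later at half level.
    room = np.zeros(101)
    room[0], room[100] = 1.0, 0.5
    ir = deconvolve_ir(np.convolve(sweep, room), inverse)
    start = len(sweep) - 1  # the direct sound lands here after deconvolution
    print("direct sound at sample", int(np.argmax(np.abs(ir))), "expected", start)
```

A longer sweep puts more energy into each frequency, which is why sweep length trades off directly against signal-to-noise ratio.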

Put simply, convolution reverbs recreate the reverberant behaviour of a real-life acoustic space. Curtis Roads (The Computer Music Tutorial) describes convolution in audio terms as:

“Convolution of two audio signals is equivalent to filtering the spectrum of one sound by the spectrum of another sound. Convolution of spectra means that each point in the discrete frequency spectrum of input a is convolved with every point in the spectrum b.”

In the case of convolution reverb, within dedicated software such as Altiverb or Avid’s TL Space, each sample of the incoming audio to be reverberated is multiplied by every sample of the impulse response of the acoustic space you wish to reproduce, and the resulting scaled, time-shifted copies of the impulse response are summed. This multiply-and-add operation is convolution.
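The multiply-and-add operation described above is the discrete convolution y[n] = Σ x[k]·h[n - k]. A minimal offline sketch of a convolution reverb with a wet/dry mix is below, assuming mono signals at the same sample rate; the IR is a toy array of decaying noise rather than a real outdoor recording, and the function name and mix parameter are hypothetical. Real-time plug-ins typically use partitioned convolution to keep latency low, but the result is the same.

```python
import numpy as np

def convolution_reverb(dry: np.ndarray, ir: np.ndarray, wet_mix: float = 0.3) -> np.ndarray:
    """Convolve a mono dry signal with an impulse response and blend wet/dry.

    Uses FFT-based linear convolution, which gives the same result as
    summing a time-shifted, scaled copy of the IR for every input sample.
    The output is len(dry) + len(ir) - 1 samples long so the tail is kept.
    """
    n = len(dry) + len(ir) - 1
    nfft = 1 << (n - 1).bit_length()
    wet = np.fft.irfft(np.fft.rfft(dry, nfft) * np.fft.rfft(ir, nfft), nfft)[:n]
    dry_padded = np.concatenate([dry, np.zeros(len(ir) - 1)])
    return (1.0 - wet_mix) * dry_padded + wet_mix * wet

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    fs = 8_000
    t = np.arange(fs) / fs
    toy_ir = rng.standard_normal(fs) * np.exp(-t / 0.4)  # 1 s of decaying noise
    toy_ir /= np.max(np.abs(toy_ir))
    click = np.zeros(fs // 2)
    click[0] = 1.0
    out = convolution_reverb(click, toy_ir, wet_mix=1.0)
    # A fully wet unit impulse returns the impulse response itself.
    print(np.allclose(out[: len(toy_ir)], toy_ir))
```

Feeding a single click through the reverb returns the IR, which is a handy sanity check that an impulse response file has loaded correctly.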

With this in mind, I decided to try convolution reverb for BBB. My plug-in collection already included good convolution reverb software in the form of Waves IR-1 / IR-L. The pre-loaded IRs, however, didn’t really suit forest/field imagery, so I decided to purchase the “Outdoor Impulse Response” bundle by Boom Library. From its 68 locations I chose the two that best complemented the animation: for the field scenes I used the “Field, Wide, Small Hills and Forest” IR, and for the forested scenes I chose “Forest, Plane”. A nice addition was the inclusion of pictures of the locations where the IRs were recorded:

Boom Library Outdoor Impulse Response – “Field, Wide, Small Hills and Forest” (50.351109, 6.640162)

Boom Library Outdoor Impulse Response – “Forest, Plane” (50.941718, 6.729813)

How reverb was used

On the whole, reverb was used quite conservatively. For close-up shots, reverb was added sparingly in an attempt to imitate how things would sound naturally: in forests and fields, reverberation often goes unnoticed when walking and talking alone or close to other people, and in fields the acoustics close up actually sound almost dead. More reverb was added to loud sounds such as bangs and crashes, and when a sound source appeared further from the camera (further = more reverb). For the field reverb I ran two versions: the original, with a decay of close to 6 seconds, and a tweaked version with a decay of 2.7 seconds. The tweaked version was used mostly for action close to the camera, whereas the original was saved for louder and more distant noises.

As the film is an animation, there was scope to use reverb creatively to achieve effects that would not occur naturally. Large doses of reverb were applied for dramatic effect at various points, such as when Frank brutally murders a second butterfly. Here the largest field reverb was used, with its tail exaggerated so that the sound echoed for a long time, bouncing off distant hills and trees. Combined with muting all ambience, this emphasised the wickedness of Frank’s action and the shock of Big Buck Bunny and the surrounding birds, who are not on camera but all stop their chattering at that moment.
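The decay figures above are reverb times: how long the reverberation takes to fall by 60 dB. To illustrate what a 2.7-second setting means, the sketch below builds a toy impulse response from exponentially decaying noise with a chosen RT60, then measures it back using Schroeder backward integration, fitting the -5 to -25 dB part of the decay and extrapolating to 60 dB (a T20-style estimate). The synthetic IR and function names are assumptions for illustration, not the Boom Library responses.

```python
import numpy as np

def synthetic_ir(rt60: float, fs: int, length_s: float, seed: int = 0) -> np.ndarray:
    """Gaussian noise shaped so its amplitude falls by 60 dB every rt60 seconds."""
    t = np.arange(int(length_s * fs)) / fs
    envelope = 10.0 ** (-3.0 * t / rt60)  # amplitude is -60 dB at t = rt60
    return np.random.default_rng(seed).standard_normal(len(t)) * envelope

def estimate_rt60(ir: np.ndarray, fs: int) -> float:
    """Estimate RT60 from the Schroeder energy decay curve (T20-style fit)."""
    energy = ir ** 2
    edc = np.cumsum(energy[::-1])[::-1]  # energy remaining after each sample
    edc_db = 10.0 * np.log10(edc / edc[0])
    idx = np.where((edc_db <= -5.0) & (edc_db >= -25.0))[0]
    slope_db_per_s, _ = np.polyfit(idx / fs, edc_db[idx], 1)
    return float(-60.0 / slope_db_per_s)

if __name__ == "__main__":
    fs = 8_000
    for rt60 in (2.7, 6.0):
        ir = synthetic_ir(rt60, fs, length_s=2.0 * rt60)
        print(f"target {rt60:.1f} s -> estimated {estimate_rt60(ir, fs):.2f} s")
```

Shortening the decay from about 6 s to 2.7 s more than doubles the rate at which the tail dies away (in dB per second), which is why the tweaked version suits action close to the camera.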

As well as the main reverbs already described, there were others which were used for specific moments during the film. For example:

  • Bunny snoring in burrow – A large, cavernous reverb created in Waves RVerb. This reverb was kept in mono because the camera is set outside the burrow, so an immersive stereo reverb would not have been suitable.


  • Gimera walking through hollow tree trunk – A short, small reverb created in Pro Tools D-Verb to capture the crampedness of the tree trunk. Again, this sounded best and more realistic in mono.


  • Slow motion flying butterfly – Another large reverb, this time in stereo to portray the sheer downward force created by the butterfly’s wings. A large tail seemed to match the unrealistic image of a butterfly appearing so large and flying so slowly.



Posted in Audio, MSc Audio Production, Post Production, Sound Design, University of Salford

The creation of BBB background ambience part 2

As touched upon in the previous post, the ambience track needed several adjustments to make sure it functioned correctly.

The main issues to contend with were:

  1. Unwanted noises, including clips and pops
  2. Relatively high noise floor

Here I will demonstrate some methods I used to reduce the effect of undesirable artefacts.

  1. Unwanted noises, including clips and pops 

    This was done using fades in Pro Tools. Each time an undesirable noise emerged, I split the audio clip, cut back some of the audio either side of the noise, and then dragged and joined the resulting clips together until no trace of it remained. Crossfades were then used to prevent any potential popping and to ensure smooth transitions. Both “equal-power” and “equal-gain” crossfades were used at various points, although “equal-power” more often than not provided the most level-consistent fades. The most probable reason is that many of the joined clips were not phase-coherent with each other. “Equal-power” fades maintain the overall level through a crossfade between such clips, avoiding sudden dips in volume. “Equal-gain” crossfades seemingly work best for linking phase-coherent clips, such as recordings of the same sound source; “equal-power” fades in that situation would cause a sudden increase in volume at the centre of the crossfade. More insightful information on how best to use crossfades in Pro Tools can be found in this SoundOnSound article.
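The behaviour of the two crossfade types follows from the shape of their curves. The sketch below uses the textbook versions (linear for equal-gain, sine/cosine for equal-power); Pro Tools’ own curve shapes may differ in detail, so treat these as illustrative, and the function names are hypothetical. Coherent (identical) signals add in amplitude, while uncorrelated signals add in power.

```python
import math

def crossfade_gains(position: float, shape: str) -> tuple[float, float]:
    """Return (fade_out_gain, fade_in_gain) at position 0..1 through a crossfade."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    if shape == "equal_gain":
        return 1.0 - position, position
    if shape == "equal_power":
        angle = position * math.pi / 2.0
        return math.cos(angle), math.sin(angle)
    raise ValueError(f"unknown crossfade shape: {shape}")

def midpoint_level_db(shape: str, coherent: bool) -> float:
    """Level change (dB) at the centre of the fade for two equal-level clips.

    Coherent signals add in amplitude; uncorrelated signals add in power.
    """
    out_gain, in_gain = crossfade_gains(0.5, shape)
    if coherent:
        return 20.0 * math.log10(out_gain + in_gain)
    return 10.0 * math.log10(out_gain ** 2 + in_gain ** 2)

if __name__ == "__main__":
    for shape in ("equal_gain", "equal_power"):
        for coherent in (True, False):
            kind = "coherent" if coherent else "uncorrelated"
            print(f"{shape:>11}, {kind:>12}: {midpoint_level_db(shape, coherent):+.1f} dB")
```

With these curves, equal-gain fades keep coherent material level but dip by about 3 dB with uncorrelated material, while equal-power fades keep uncorrelated material level but add about 3 dB to coherent material, matching the behaviour heard in the ambience edits.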

    This method was sufficient for removing most unwanted noises of short duration. For low-frequency rumble caused by wind, however, another technique could be used. Unfortunately, the days on which I visited the forest were fairly windy, so even with windshields fitted to the microphones, wind still made its way onto some recordings, tarnishing them for periods of up to 5 minutes. Rather than removing large sections of otherwise usable audio, I turned to high-pass filters to improve them. Through careful listening, I found that rolling off frequencies below 90–150 Hz adequately removed rumble and other low-frequency sounds without harming the natural character of the recordings. FabFilter Pro-Q was my go-to EQ plug-in for this project because of its built-in spectrum analyser with Pre-EQ, Post-EQ and SC modes. These graphical aids enabled me to determine problematic frequency bands accurately and treat them effectively.
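The rumble-removing high-pass filter can be sketched as a second-order (12 dB/octave) Butterworth-style biquad using the widely published RBJ “Audio EQ Cookbook” coefficients. The 120 Hz cutoff sits inside the 90–150 Hz range mentioned; the slope, Q and function names are assumptions, and EQ plug-ins commonly offer steeper slopes than this.

```python
import numpy as np

def highpass_biquad(x: np.ndarray, fs: float, cutoff_hz: float, q: float = 0.7071) -> np.ndarray:
    """Second-order high-pass filter (RBJ Audio EQ Cookbook), Direct Form I."""
    w0 = 2.0 * np.pi * cutoff_hz / fs
    alpha = np.sin(w0) / (2.0 * q)
    cos_w0 = np.cos(w0)
    b0 = (1.0 + cos_w0) / 2.0
    b1 = -(1.0 + cos_w0)
    b2 = (1.0 + cos_w0) / 2.0
    a0 = 1.0 + alpha
    a1 = -2.0 * cos_w0
    a2 = 1.0 - alpha
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    for n, xn in enumerate(x):
        yn = (b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y[n] = yn
    return y

def rms(signal: np.ndarray) -> float:
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(signal ** 2)))

if __name__ == "__main__":
    fs = 48_000
    t = np.arange(fs) / fs
    rumble = np.sin(2 * np.pi * 30 * t)        # wind-like low-frequency content
    wanted = np.sin(2 * np.pi * 1_000 * t)     # stand-in for birdsong
    half = fs // 2  # skip the filter's start-up transient when measuring
    for name, sig in (("30 Hz", rumble), ("1 kHz", wanted)):
        out = highpass_biquad(sig, fs, cutoff_hz=120.0)
        ratio = rms(out[half:]) / rms(sig[half:])
        print(f"{name}: {20 * np.log10(ratio):.1f} dB")
```

At this slope a 30 Hz component two octaves below the cutoff is reduced by roughly 24 dB, while content in the kilohertz range passes essentially untouched.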

  2. Relatively high noise floor

As mentioned previously, hiss was a little too audible on these location recordings, to the point where it became a distraction during first attempts to integrate the ambience tracks with the other sound effects. EQ alone could not reduce the hiss enough, because hiss contains energy across the whole audible frequency spectrum, so it was decided that “heavier duty” measures were needed in the form of tools found within Adobe Audition 1.5.

Although over 10 years old, Adobe Audition 1.5 contains some excellent tools for removing unwanted audio artefacts. After experimenting with a few of these, I found that the “Hiss Reduction” tool produced the most convincing and natural-sounding results.

The following procedure, which required a significant amount of time spent tweaking, eventually proved successful:

  1. Capture Noise Floor – To create a graph that most accurately reflected the noise floor of a particular clip, I selected audio containing only hiss (or as little wanted audio as possible) and then clicked Get Noise Floor.
  2. Reduce hiss – Moving the Noise Floor Adjust slider further to the right removed more hiss; however, care had to be taken to find a point at which distortion and odd bubbling effects were not also created. The Hiss Reduction tool lowers the amplitude of a frequency range if it falls below the captured noise floor; audio in frequency ranges louder than this threshold remains untouched. Because the tool works on the higher frequencies where hiss generally occurs, removing more hiss also removes high-frequency content from other sounds, causing almost metallic-sounding artefacts. This needed particular care because the most common sounds in the location recordings were bird calls, which naturally contain a lot of high-frequency information. The Reduce By setting determined how much reduction was applied to audio below the noise floor. Higher values (above 20 dB) resulted in more dramatic hiss reduction. The Reduce By level varied quite a lot between clips, as some contained less hiss than others.
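The principle described in these steps, comparing each frequency band with a captured noise floor and attenuating only what falls below it, resembles a simple spectral gate. The sketch below illustrates that general idea and is not Adobe Audition’s actual algorithm: the FFT size, the 50%-overlap square-root Hann windows, the per-bin mean used as the noise floor and the parameter names (loosely modelled on Noise Floor Adjust and Reduce By) are all assumptions.

```python
import numpy as np

N_FFT = 1024
HOP = N_FFT // 2
# Square-root periodic Hann, applied at analysis and synthesis: the two windows
# multiply to a Hann window, and 50%-overlapped Hann windows sum to exactly 1.
WINDOW = np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(N_FFT) / N_FFT))

def _frames(x: np.ndarray):
    """Yield (start, spectrum) for 50%-overlapping windowed frames of x."""
    for start in range(0, len(x) - N_FFT + 1, HOP):
        yield start, np.fft.rfft(x[start:start + N_FFT] * WINDOW)

def capture_noise_floor(noise_only: np.ndarray) -> np.ndarray:
    """Average magnitude per frequency bin of a hiss-only selection."""
    mags = [np.abs(spec) for _, spec in _frames(noise_only)]
    return np.mean(mags, axis=0)

def reduce_hiss(x: np.ndarray, noise_floor: np.ndarray,
                floor_adjust_db: float = 0.0, reduce_by_db: float = 20.0) -> np.ndarray:
    """Attenuate bins whose magnitude falls below the (adjusted) noise floor."""
    threshold = noise_floor * 10.0 ** (floor_adjust_db / 20.0)
    below_gain = 10.0 ** (-reduce_by_db / 20.0)
    # Pad so every input sample is covered by two overlapping frames.
    padded = np.concatenate([np.zeros(HOP), x, np.zeros(N_FFT)])
    out = np.zeros(len(padded))
    for start, spec in _frames(padded):
        gains = np.where(np.abs(spec) < threshold, below_gain, 1.0)
        out[start:start + N_FFT] += np.fft.irfft(spec * gains, N_FFT) * WINDOW
    return out[HOP:HOP + len(x)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 44_100
    t = np.arange(fs) / fs
    hiss_only = 0.01 * rng.standard_normal(fs)          # "Get Noise Floor" selection
    call = 0.3 * np.sin(2 * np.pi * 4_000 * t)          # stand-in for a bird call
    noisy = call + 0.01 * rng.standard_normal(fs)
    floor = capture_noise_floor(hiss_only)
    cleaned = reduce_hiss(noisy, floor, floor_adjust_db=6.0, reduce_by_db=20.0)
    before = np.sqrt(np.mean((noisy - call) ** 2))
    after = np.sqrt(np.mean((cleaned - call) ** 2))
    print(f"residual noise RMS: {before:.4f} -> {after:.4f}")
```

Raising the threshold gates more hiss, but it also starts to catch quiet high-frequency detail in wanted sounds, which is the trade-off described in step 2.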

I always erred on the side of caution whilst removing hiss or noise. Overall I am happy with the results.


Posted in Audio, MSc Audio Production, Post Production, Sound Design, University of Salford