dNm: data cleaning test: Mad Scientist Club of Forest Grove

29th July 2019 at 6:20pm

We were worried that this was cheating (if we let ourselves just "fix" the data so that it "shows what it should", we aren't doing science anymore). After a somewhat heated debate we decided to have a sub-team try the idea, with a single preregistered hypothesis & method, as follows:

We believe the oscillations are caused by a resonant frequency (bounciness) in the beam.
We expect it to have a single fundamental frequency, and we should be able to measure that frequency by various techniques and get a consistent answer.
Taking a rolling average of the data over one full period of that oscillation should give us a more accurate picture of what really happened.
If we see any reason to doubt any of the above, we abort the project and report back.

Step one: we expect the beam has a resonant frequency around 4 Hz.

We confirmed this by whacking it and watching it wobble up and down for a few seconds and counting the bounces and counting the seconds. This matched the frequency we expected from looking at the graphs.

Step two: We should be able to measure the frequency by different techniques and get a consistent answer

We zoomed in on the graphs and measured the peak-to-peak distances.
We wrote a small Ruby program to compute the distance between the minima in the decaying average value. We used the decaying average to smooth out high-frequency false minima and plateaus (flat spots), because it doesn't have a frequency of its own like a rolling average would.
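
The second technique can be sketched roughly as follows. This is not the program we actually ran, just a minimal illustration of the idea: the smoothing factor `alpha`, the method names, and the synthetic 8-frame test signal are all assumptions made for the example.

```ruby
# Exponentially decaying average: each new sample pulls the running
# value a fraction `alpha` of the way toward itself.
# `alpha` is an assumed smoothing factor, not a value from our runs.
def decaying_average(samples, alpha = 0.5)
  smoothed = []
  samples.each do |y|
    previous = smoothed.last || y
    smoothed << previous + alpha * (y - previous)
  end
  smoothed
end

# Indices of strict local minima (lower than both neighbours).
def local_minima(values)
  (1...values.length - 1).select do |i|
    values[i] < values[i - 1] && values[i] < values[i + 1]
  end
end

# Distances, in frames, between consecutive minima.
def minima_spacing(values)
  local_minima(values).each_cons(2).map { |a, b| b - a }
end

# Synthetic stand-in for the measured heights: a pure oscillation
# with a period of 8 frames (made-up data, for illustration only).
frames = (0...40).map { |i| Math.cos(2 * Math::PI * i / 8.0) }
spacings = minima_spacing(decaying_average(frames))
puts spacings.inspect
```

The smoothing delays the minima by a fraction of a period, but it delays all of them equally, so the spacing between them still reflects the period of the oscillation.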

We got consistent answers within each data set, and the data sets were consistent with each other once we adjusted for the fact that one was shot in slow motion and so has a lot more frames per second. The result was also consistent with our expectation from the physical test.
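
The frame-rate adjustment is just a unit conversion from frames per bounce to bounces per second. A small sketch: the 7 frames and the 24 frames per second are the values used for test12_wet.dat in the plot, while the 120 fps slow-motion figures are hypothetical numbers chosen only to show the conversion.

```ruby
# Convert a measured period (in frames) into a frequency in Hz.
def frequency_hz(frames_per_period, frames_per_second)
  frames_per_second.to_f / frames_per_period
end

# 7 frames per bounce at 24 frames per second.
normal = frequency_hz(7, 24)

# Hypothetical slow-motion clip: 120 fps and 35 frames per bounce
# are assumed values, not measurements.
slow = frequency_hz(35, 120)

puts format("normal: %.2f Hz, slow motion: %.2f Hz", normal, slow)
```

Two clips of the same beam should give the same number of Hz even though the frame counts per bounce are very different.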

Step three: taking a rolling average

We decided to do the averaging in the plot (this may have been a bad idea because it's pretty cryptic) by using a trick we found on the internet:

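# smaller of two values
min(a,b) = a >= b ? b : a
# how many samples we can average so far ($0 is the zero-based row number, so count the current row too)
samples(n) = min(int($0) + 1, n)
# space-separated history of the y-values seen so far
avg_data = ""

# sum of the last n words of data, including the newest one
sum_n(data, n) = ( n <= 0 ? 0 : word(data, words(data) - n + 1) + sum_n(data, n - 1))

# append x to the history (starting over on the first row), then average the last n samples
avg(x, n) = ( avg_data = sprintf("%s %f", (int($0)==0)?"":avg_data, x), sum_n(avg_data, samples(n))/samples(n))

plot 'test12_wet.dat' using ($1/24):($2/32-12.5) title "raw" with lines, \
     'test12_wet.dat' using (($1-3)/24):(avg(column(2), 7)/32-12.5) title "de-bounced" linecolor rgbcolor 'blue' with lines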

In the last line, the avg(column(2), 7) in the y-value takes the average of 7 frames (which is the length of one oscillation that we found in this data set) and the -3 in the x-value re-centers it. Think of the seven samples as being three samples, one sample, and then three more samples: subtracting 3 from the x-value moves it so that it uses the x-value of the center sample instead of the end sample.
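
For anyone who finds the plot-script version too cryptic, here is the same idea as a minimal Ruby sketch. It is an illustration rather than what we ran: the method name and the synthetic data (a steady rise of 0.1 per frame plus a 7-frame bounce) are assumptions, and unlike the plot script it simply skips the ends where a full window does not fit.

```ruby
# Centred rolling average over an odd `window` of samples.
# Returns [centre_index, mean] pairs, only where the full window fits.
def centred_rolling_average(values, window)
  raise ArgumentError, "window must be odd" unless window.odd?

  half = window / 2 # 3 for a 7-sample window, the same shift as the -3 in the plot
  values.each_cons(window).each_with_index.map do |chunk, start|
    [start + half, chunk.sum / window.to_f]
  end
end

# Synthetic data (made up for illustration): a steady rise of 0.1 per
# frame with a bounce of period 7 frames added on top.
period = 7
frames = (0...28).map { |i| 0.1 * i + Math.sin(2 * Math::PI * i / period) }
smoothed = centred_rolling_average(frames, period)

smoothed.first(3).each do |centre, mean|
  puts format("frame %d: %.3f", centre, mean)
end
```

Because the window is exactly one bounce long, every window holds one full up-and-down cycle, which averages to zero and leaves only the underlying trend. A window of any other length would leave some of the bounce behind.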

Conclusion: Sweet!

It worked even better than we expected. We lost some off the peaks but that's probably more accurate than the raw data since the momentum of the beam would have carried it past the actual peak – that's how the oscillations got started!