Validation: A Historical Narrative Review of PUSH Band 1.0 Research (PART 1)
By Chris Chapman
May 08, 2019
NOTE: This is a dynamic document and will be updated as new research is published.
Whenever a coach or scientist is interested in using a piece of technology in the daily training or testing environment, at its simplest they need to know if it does what the company says it does. This is known as construct validity, or "the degree to which a test measures what it claims, or purports, to be measuring". In this series we are going to explore the science of weight-room technology validation and the current and historical state of the literature, in addition to the processes we use here at PUSH.
This three-part installment will focus on the PUSH Band 1.0, including the literature available to date, how this data is used to improve the product, and how PUSH worked with researchers to dial in methodologies based on the results of these external studies. The variables used to assess validity and reliability will be introduced as they appear in the literature, as you will see a very clear progression in the statistics and selected criteria used to assess the device.
Ideally, any company building technology wants external assessment of their products to be as favourable as possible. However, as a scientist I like to take a different approach. Instead of trying to affirm the positive (confirmation bias, anyone?), I want researchers to poke holes in our product and attempt to refute the hypotheses we have around the things we are trying to measure. This gives us an unbiased assessment of our current state and highlights gaps that could be occurring in real-world use cases. If published data don't look favourable, then we reflect and assess potential causes. We try to replicate internally what was done in these studies to see if we can reproduce the same results. In cases where we can reproduce the issue and it is the hardware or software itself, we have just found a problem and can immediately work to create a solution. In other cases we have seen it can be a methodology issue within the study itself; there we keep the lines of communication open with the external researchers to ensure best practices are used and the comparison is as close to apples-to-apples as possible.

In working with 50+ research groups worldwide over the last three years, I've been surprised at the lack of consistency in the methodologies used across the board. If I were to speculate why: kinesiology and exercise science is influenced by the silos of biomechanics-, physiology- and strength-focused labs, each looking at the questions with a different lens, toolbox and skill set. We will expand on that topic in a later post.

With that being said, we want to ensure that external (third-party) research is as unbiased as possible, which is why we don't pay for research to be published and don't try to control or manipulate unfavourable results. While in the short term this may not be ideal for us, in the long term this process will serve us well on the quest for continuous improvement (KAIZEN).
We often forget that no technology is perfect and everything had to start somewhere. As coaches we trust our eyes every day, yet the literature shows these biological sensors are worse than almost all of the technology we use on a regular basis, a consideration that should not be taken lightly. Those pushing the bounds of innovation and trying to disrupt the current space will have to iterate many times to get it right. There is always a nexus of tool accuracy/precision, cost, usability and skills required for operation. Not every tool can serve every purpose.
One more thing to note before diving in is that the Band 1.0 version of the PUSH device being discussed has been off the market since March 2018, when we released the next-generation Band 2.0 hardware, yet papers are still being published on it, with two coming out in the last few months. This highlights one of the biggest current issues in the sport technology space: the peer-reviewed publication model cannot keep up with the speed of technological progression. For practitioners it is good practice not only to do a literature review, but also to reach out to companies directly to see what internal scientific processes and data are available. Any company pulling its weight will, in most cases, be ahead of or at least on top of the literature with its internal research and development.
The following is a chronological summary table of the third party (external to PUSH), peer reviewed published literature regarding the PUSH 1.0 device.
The following is a discussion of that third-party (external to PUSH), peer-reviewed published literature regarding the PUSH 1.0 device. This post only covers articles 1-3; the rest will be covered in subsequent posts.
1. VALIDITY OF WIRELESS DEVICE MEASURING VELOCITY OF RESISTANCE EXERCISES
Sato, K., Beckham, G.K., Carroll, K., Bazyler, C., Zhanxin, S., and Haff, G. (2015) Validity of wireless device measuring velocity of resistance exercises. Journal of Trainology. 4:15-18.
This was the first study to be published on the Band 1.0, completed by Dr. Kimitake Sato and the group at East Tennessee State University. At the time, the tech-validation literature in this space wasn't nearly as evolved as it is today. As a first assessment of PUSH this paper was well done. It used three-dimensional (3D) motion capture, which is the gold standard as far as the quantification of human motion (kinematics) is concerned, as most systems can get the standard error of measurement to under 1 mm for 3D displacement of a discrete point in the capture space. One thing you will see is that there are still studies being conducted and published comparing tools against other, non-gold-standard criterion tools. These studies definitely have value, especially for the competing technology companies, but should not be considered true validation, as the errors of these tools can be much larger and less consistent than the gold standard. These errors typically aren't discussed or considered in the comparisons, and the assumption of the criterion measures as "de facto" gold standard is rather common.
With Sato et al. (2015), the first thing people tend to question is the choice of exercises: a DB shoulder press and a DB bicep curl. However, it reflects a rather elegant thought process: talking to Dr. Sato, the goal was to assess the linear and angular abilities of the Band 1.0 given the nature of the sensors in the device. This is important given it's a 3D device, and most predecessors were 1D or 2D and were typically only ever validated in a linear 1D fashion. Inertial measurement units (IMUs) like the PUSH Band contain a tri-axial accelerometer, which measures linear acceleration in three planes of motion, and a tri-axial gyroscope, which measures angular velocities around three axes of motion. Combined, this is known as six degrees of freedom (6DOF), a term/acronym you may see some devices use. Having these sensors gives us much more information about the nature of the movement measured, and this is why, over time, they will be able to answer questions and provide data far beyond what previous-generation hardware could provide.
The other thing Sato et al. (2015) did, which is not often repeated even today, is examine the outputs of two separate devices (one on each limb) to assess inter-unit reliability. Given the Band 1.0 is a forearm wearable only (bar mode wasn't available until Band 2.0), this question was a great one to ask and answer immediately, since the predecessor devices were all barbell-based and not body-worn. The two PUSH devices showed a 'nearly perfect' correlation in their data outputs (r > 0.90) for both exercise tasks, and the outputs were not statistically different from each other (cue the MBI camp for effect sizes!). Assessing inter-unit reliability is important for practitioners who are using multiple devices and want to compare athletes and outputs against each other. If the devices are spitting out divergent data this can't be done, as we don't know if changes are due to the tool or the user. If the coach or scientist is only looking at an individual and assessing changes from a baseline, this is not as concerning and intra-unit reliability is more of interest. Finally, the two-band, two-arm assessment is also important to see if the arm of choice affects the output of the device, as some individuals will be opposite-handed and some exercises will be unilateral. Again, the data did not differ statistically with the model used, so this is not a concern for practitioners.
Since the left and right sides were not statistically different for both the PUSH Band and motion capture, data were pooled for further analysis (Tables 1 & 2). The PUSH Band showed very large correlations to the motion capture in both exercises (r > 0.80). A few interesting observations arise from the pooled data. The angular exercise (6.5-7.2% error) has about half the error of the linear exercise (12.6-14.0%), and the data trend much better when compared to the motion capture (Figure 1). This makes sense given the incorporation of extra signals from the gyroscope in the IMU. Angular and non-linear exercise is one area where IMUs have an advantage over linear position transducers (LPTs).
With the linear exercise you can also notice a horizontal cut point on the graph's Y axis, at around 1.5 m/s for average velocity and 2.0 m/s for peak velocity, where the data become much less accurate and don't trend along the line. Take note for now, as we will discuss this in a later study.
If we are to critique the study for areas of improvement, the statistical model used isn't as comprehensive as those published today. For an initial assessment of the device this was above and beyond what PUSH needed, but other (now standard) methods of analysis such as Bland-Altman plots, effect sizes, intraclass correlations and coefficients of variation could be used to further assess the tool's output versus the gold standard and within itself. Further to that, pre-set (a priori) thresholds weren't used in the determination of validity or reliability. The study is exploratory and descriptive in nature, and again there is nothing wrong with that given it was the first of its kind to assess the tool in question: the PUSH Band 1.0.
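To make one of those (now standard) methods concrete, here is a minimal sketch of a Bland-Altman calculation: the mean difference (bias) between a device and its criterion, plus the 95% limits of agreement. The paired velocity values below are hypothetical, purely for illustration:

```python
import numpy as np

def bland_altman(device, criterion):
    """Compute Bland-Altman bias and 95% limits of agreement.

    Both inputs are paired measurements (e.g. mean velocity in m/s)
    from the device under test and the gold-standard criterion.
    """
    device = np.asarray(device, dtype=float)
    criterion = np.asarray(criterion, dtype=float)
    diffs = device - criterion                   # per-trial differences
    bias = diffs.mean()                          # systematic offset
    sd = diffs.std(ddof=1)                       # spread of differences
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)   # 95% limits of agreement
    return bias, loa

# Hypothetical paired velocity data (m/s), for illustration only
device_vals    = [1.02, 0.95, 1.10, 0.88, 1.05]
criterion_vals = [0.98, 0.93, 1.04, 0.86, 1.00]
bias, (lo, hi) = bland_altman(device_vals, criterion_vals)
print(f"bias = {bias:.3f} m/s, LOA = [{lo:.3f}, {hi:.3f}]")
```

Plotting each difference against each pair's mean, with horizontal lines at the bias and LOA, gives the familiar Bland-Altman figure.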
Used gold standard motion capture as criterion measurement
Angular and linear exercises examined
Two devices used for inter-unit reliability assessment
Not a comprehensive statistical model for validity/reliability assessment
No ‘a priori’ thresholds for validity/reliability determination - descriptive / exploratory
PUSH Band 1.0 data were statistically similar between two different units when each was worn on a different arm.
Very large correlations in peak and average velocity were observed to gold-standard 3D motion capture for both linear and angular movement.
The angular exercise has less error in all variables than the linear exercise when measured with the PUSH Band 1.0.
This first external assessment of the PUSH Band 1.0 was a big win for the PUSH team, as the device wasn't far off the mark on a first attempt at next-generation hardware.
2. VALIDITY AND RELIABILITY OF THE PUSH WEARABLE DEVICE TO MEASURE MOVEMENT VELOCITY DURING THE BACK SQUAT
Balsalobre-Fernandez, C., Kuzdub, M., Poveda-Ortiz, P., and Del Campo-Vecino, J. (2015) Validity and reliability of the PUSH wearable device to measure movement velocity during the back squat exercise. Journal of Strength & Conditioning Research. 30(7): 1968-1974.
The second study on the Band 1.0 came out shortly after, from Spanish researcher Carlos Balsalobre-Fernandez. You might recognize the name, as he has created a host of phone/tablet applications for strength coaches, including the widely used MyJump app, that are accessible and provide great utility for testing and monitoring. His study was the first to assess the device in a full-body exercise: a Smith machine back squat. The Smith machine is used quite often in these types of studies to constrain the movement to the vertical axis only. In theory this increases reliability, while allowing for better comparison versus linear position transducers, which are good criterion measures for linear exercise. In this study the T-Force LPT device was used as the criterion measurement.
As far as PUSH was concerned, the results in this study looked even better than the previous study, which was encouraging since the squat was arguably the most common strength lift used with velocity-based training (VBT) methodology at the time. For a start-up company trying to create a new solution to an existing problem, any external validation is a win in itself, and good results are very motivating to keep pushing forward with innovation and improving the product.
Both peak and average velocity trend strongly with the criterion LPT, having very large and near-perfect correlations respectively (Table 1; Figure 1). However, a bias was observed in the PUSH Band, and it was larger in the average velocity data than in the peak data. This means that while the data trend the same, the actual values reported are, on average, either higher or lower (in this case higher) than the criterion measurement. For the scientist this matters, as the reported value differs from the true value. Knowing this bias allows one to use a correction factor to transform values into the criterion measurement. For the practitioner in the trenches, however (speaking as a strength coach now), the bias may or may not make much of a difference in the daily training environment. As long as the tool is very reliable, changes from measure to measure will be the same for either tool, even if the absolute value is off. This means that from test to test you will get the same increase or decrease in values, as long as the bias is consistent. Where it can matter to the practitioner is if they are trying to compare against normative data that has been collected historically using another tool. Without knowing the bias values, you cannot make that comparison.
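The correction-factor idea can be sketched simply: fit a line mapping device readings onto criterion readings, then apply it to transform new values. The paired data below are hypothetical, not values from the study:

```python
import numpy as np

# Hypothetical paired mean-velocity readings (m/s); illustrative only,
# not data from Balsalobre-Fernandez et al. (2015).
device    = np.array([0.60, 0.75, 0.90, 1.05, 1.20])
criterion = np.array([0.55, 0.70, 0.83, 0.97, 1.10])

# Least-squares fit: criterion = slope * device + intercept
slope, intercept = np.polyfit(device, criterion, 1)

def correct(v):
    """Transform a device reading into criterion-equivalent units."""
    return slope * v + intercept

print(f"criterion ~= {slope:.3f} * device + {intercept:.3f}")
print(f"device 1.00 m/s -> ~{correct(1.00):.3f} m/s criterion")
```

A slope below 1 (or a positive intercept with shifted values) captures the kind of systematic overestimation discussed above; applying `correct()` removes it on average but does not change rep-to-rep reliability.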
Looking at the results further, and being overly critical, the spread of the errors in the Bland-Altman plot isn't ideal: even though the points are within the limits of agreement (LOA) for the most part, they are spread out just as much vertically as they are horizontally. Ideally, what you want to see here is a flatter, narrower spread with more data points close to the average, as opposed to the square, box-shaped distribution of the data.

In contrast, looking at the positive, the reliability data (Table 2) are very similar between the two tools. The coefficient of variation (CV) is essentially a ratio of the standard deviation (SD) to the mean. Where the SD gives you a sense of the absolute variability in the data, the CV transforms it into relative variability expressed as a percentage. This allows multiple datasets to be compared on their relative variability. Typically, for tools and data of this nature, a CV <5% is desirable and <10% is acceptable; anything >10% is not considered acceptable for a measurement tool.

Intraclass correlations (ICC) and test-retest reliability measures are both ways to assess the rep-to-rep measures within a set. There will be some variability in the signal, since I would never expect any human to move at exactly the same speed for every rep at a given load. Even if you tried (we have), this is near impossible to do. In the ideal lab environment, a programmable robotic device could be used to move at the same speed every repetition, as this would remove any noise due to human error. However, these types of machines aren't easy to find; we haven't been able to locate one anywhere in our province. Regardless, over three repetitions of a human lifting a constant load, the values within the set should be very close. More importantly, reliability computed this way should be compared to the reliability values of the criterion measure.
Neither will be perfect given the human noise factor, but ideally you should see similar values. In this case, the PUSH values are very close to the LPT values and all are within acceptable ranges for deeming the tools reliable (ICC/r > 0.95).
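As a worked illustration of the CV calculation described above (the rep velocities are hypothetical, not study data):

```python
import numpy as np

def cv_percent(reps):
    """Coefficient of variation: SD relative to the mean, as a percent."""
    reps = np.asarray(reps, dtype=float)
    return reps.std(ddof=1) / reps.mean() * 100.0

# Hypothetical mean-velocity values (m/s) for 3 reps at a fixed load
push_set = [0.82, 0.80, 0.84]
lpt_set  = [0.80, 0.79, 0.82]

print(f"PUSH CV = {cv_percent(push_set):.1f}%")  # <5% desirable, <10% acceptable
print(f"LPT  CV = {cv_percent(lpt_set):.1f}%")
```

Because the CV is unitless, the two tools' rep-to-rep variability can be compared directly, which is exactly the check being described here.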
A final critique of this study is the fact that a PUSH science advisor was involved in the paper at the time. This is declared in the acknowledgements, and this person wasn't involved with data collection or processing, in order to avoid any conflict of interest. While I personally have not been a contributing author on any of the third-party research during my time at PUSH, as I want external assessment to have zero bias, I am definitely not against the idea of companies doing their own research and publishing their own data. While this is typically done via white papers (non-peer-reviewed, self-published research papers) or by directly funding an external lab, I do think the peer-review process would help ensure the quality of these internal studies is as high as possible. As long as conflicts of interest are declared, as was the case here, I think the additional measure of making datasets available would help mitigate cause for concern moving forward. Eventually an external lab will do the work (someone always does) and repeat or conduct a similar study, which will either confirm the company's results or show there was some sort of bias involved to ensure favourable outcomes.
First study to look at the squat exercise using PUSH
Use of Bland-Altman plots to look at error/bias in velocity measurement
Multiple loads across a wide range of relative strength (25%-85%)
Smith machine used to control vertical axis and increase reliability
Smith machine potentially less ecologically valid
Linear position transducer used as criterion
PUSH sport science advisor was involved
Very high association between the PUSH Band and criterion LPT for both validity and reliability measures
A systematic overestimation bias in both peak and average velocity values observed for the PUSH Band 1.0
The results were better than the first study so the key takeaway for PUSH was to keep moving forward and keep improving the product algorithms to get the data even closer to criterion measurement.
3. VALIDITY AND RELIABILITY OF THE PUSH WEARABLE DEVICE TO MEASURE VELOCITY AND POWER DURING LOADED COUNTERMOVEMENT JUMPS
Ripley, N., and McMahon, J. (2016) Validity and reliability of the PUSH wearable device to measure velocity and power during loaded countermovement jumps. In Proceedings of the National Strength and Conditioning Association National Conference, New Orleans, LA, USA, 6-9 July.
This study, coming from the University of Salford in the UK, assessed the validity and reliability of the PUSH Band to measure velocity and power in a loaded barbell jump squat. Participants performed three maximal-effort jump squats with a 20 kg barbell. Data were collected using the Band 1.0 in body mode, worn on the forearm, and a Kistler force plate. A summary of results (Table 1) shows the PUSH Band is very reliable within session, with high intraclass correlations (ICC > 0.90) and low coefficients of variation (CV < 6.5%). The values are very similar to the criterion force plate's reliability measures. On the validity side, while the correlations are very large and similar to the previous studies (r > 0.91) and the variation of the data is similar (SD), a bias is once again observed, with overestimation of both velocity and power by the PUSH Band. The authors do the heavy lifting and provide correction equations if users wish to transform their data into force plate equivalents.
There are some caveats to consider that might be contributing to the bias observed. First, at PUSH we use an effective mass model, as opposed to a total system mass model, in our force (and therefore power) equations. Since force is mass times acceleration, PUSH measures acceleration directly, and the mass is input and scaled to the effective portion based on what is being measured. We will discuss this in more detail in a future post, as we currently have two papers on this topic accepted into ISBS and ISB; for now, understand that it may not be an apples-to-apples comparison. The second caveat is the model utilized in each case. Both devices are using what's called a point mass model, that is, summing up all of the human mass and motion into a single dot in space. Without going into more detail, the force plate is using a centre-of-mass (COM) point mass model, as that is how velocity is typically computed from force. The PUSH Band in this case is estimating the barbell as the point mass in the model, since it is being worn on the forearm. The potential disparity in the points in space being measured could definitely account for some of the bias observed here. It's also possible that peak velocity and power do not occur at the same time for the two different points, which could also cause a difference in the observed values.
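To illustrate the point-mass idea in general terms (this is a generic sketch, not PUSH's proprietary algorithm): force and power for a single point can be derived from acceleration samples, with the mass assigned to the point being the free choice that separates a total-system-mass model from an effective-mass model:

```python
import numpy as np

def force_and_power(acc, mass, dt, g=9.81):
    """Force and power time series for a vertical point-mass model.

    acc  : net vertical acceleration samples of the point (m/s^2)
    mass : mass assigned to the point (kg). A force plate COM model
           uses total system mass; a wearable might use a scaled
           "effective" mass. (The exact PUSH scaling is not public;
           this is an illustration, not their algorithm.)
    dt   : sample period (s)
    """
    acc = np.asarray(acc, dtype=float)
    force = mass * (acc + g)       # F = m * (a + g), newtons
    vel = np.cumsum(acc) * dt      # crude numerical integration, m/s
    power = force * vel            # instantaneous power, watts
    return force, power

# Toy example: constant 2 m/s^2 upward acceleration for 0.3 s at 100 Hz
force, power = force_and_power([2.0] * 30, mass=80.0, dt=0.01)
print(f"peak power ~ {power.max():.0f} W")
```

Re-running the same acceleration trace with a different `mass` shifts force and power proportionally, which is one way the effective-mass versus total-mass choice can show up as a systematic bias between tools.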
A more accurate way to assess velocity in this case would be to model the barbell with motion capture and compute the velocity from displacement. It would be the same point in space with respect to the point mass model, so any error in velocity mentioned above would be accounted for. Power, however, is where this gets tricky. Since you need force to compute power, the cruder way to measure this is to use the force plate data combined with the barbell motion-capture displacement data to compute work, since work is the integral of force over displacement. Power can then be computed, since it is the rate of work done. One might argue that the force being measured is still a ground reaction force, which is measured at the distal contact point relative to the COM, so there is still an issue with the spatial location of our point mass models (the kinetics and kinematics are using different points in space). A more complex way to do this is to use a linked-segment model (via a full-body marker set utilizing motion capture); then, using inverse dynamics combined with the segmental kinematics, one could compute segmental power up the chain. Regardless, doing these types of studies isn't trivial nor easy, and even writing about the methods here takes us down a rabbit hole of technical biomechanics that loses the layperson.
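The work-to-power step can be sketched numerically: incremental work is force times incremental displacement, and power is the rate at which that work is done. A toy illustration (not the study's processing pipeline):

```python
import numpy as np

def power_from_work(force, disp, dt):
    """Estimate power from synced force and displacement series.

    Work is the integral of force over displacement, so each sample's
    incremental work is force * incremental displacement; dividing by
    the sample period gives the rate of work done, i.e. power.
    """
    force = np.asarray(force, dtype=float)
    disp = np.asarray(disp, dtype=float)
    dW = force[1:] * np.diff(disp)   # incremental work per sample, joules
    return dW / dt                   # power, watts

# Toy example: constant 1000 N force while the bar rises at 0.5 m/s
force = [1000.0] * 21
disp = [0.005 * i for i in range(21)]      # 0.5 m/s at dt = 0.01 s
p = power_from_work(force, disp, dt=0.01)
print(f"constant power ~ {p.mean():.0f} W")
```

Note that dW/dt reduces to force times velocity, which is why mismatched measurement points for the kinetics and kinematics (ground reaction force versus barbell displacement) propagate directly into the power estimate.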
While constructive criticism is how we as humans improve what we do, it is always easier to critique others' work (armchair quarterback, keyboard warrior, etc.). It is also human nature to critique before praising. In praise of this study, it was the first to assess a loaded jump squat, let alone any jump, using the PUSH Band. It was also the first time anyone assessed power measures from the device. For us, any external assessment of a new movement and/or a new variable measured with the Band 1.0 is a big win. Any time a novel project is done there will always be lots of learning for everyone involved, as well as lots of risk undertaken by the authors. One thing the authors did here, which had not been seen to date, was to provide correction equations for users to transform their data if they wanted to compare to a force plate.
In the end, this study gave us confidence at PUSH that measuring jumps was within the abilities of the technology and that further time should be focused on improving the accuracy of this feature.
First study to look at a barbell jump squat using the PUSH band.
First study to assess power measures using the PUSH band.
Correction equations provided for bias observed.
Different methods of computation - potentially not apples-to-apples.
Only a single load was used - may not be able to extrapolate to various loading schemes.
Could benefit from error reporting, effect sizes and Bland-Altman plots for additional assessment of validity.
The PUSH Band 1.0 is highly reliable for measuring both peak velocity and peak power in a barbell jump squat.
The PUSH Band 1.0 overestimates values compared to the criterion force plate.
Correction equations provided for practitioners who want to transform their data to relate to force plate data.
For PUSH, this was another study showing similar data trends for the device in a new exercise. High reliability with an observed bias: we considered this another win! Nothing major to fix; keep focus on improving data accuracy.
CONCLUSIONS TO DATE…
Looking across the first three studies on the PUSH Band 1.0, they provided external validation of the work and effort put in by the team since the product's inception to successfully create a disruptive technology in the weight-room measurement space using IMU hardware. Four different exercises showed similar, yet improving, trends. There were no red flags, and the general message to the team was that we were on the right track. The biggest area for improvement was accuracy: the data could get closer to criterion measures, increasing validity, so the goal moving forward was to focus on providing even more accurate measures in the weight room.
Next post we will look at studies 4-6. Stay tuned!