Note: The work in this study was submitted for peer review to the journal Footwear Science, and preliminary data were presented at the 15th biennial Footwear Biomechanics Symposium. Since the work is still under review, results will not be presented here; however, preliminary data can be viewed in the “Can We Predict Cushioning Perception from the Mechanical Properties of Shoes?” slides in the “Featured Conference Presentations” section. Code used in this project can be found under the “Footwear-Predictive-Model” repo on my GitHub.
Overview: Footwear companies often conduct research into how running biomechanics and comfort are affected by the mechanical properties of a shoe. However, little research has examined how runner satisfaction, i.e., how well a shoe is perceived, is influenced by footwear mechanical properties. We therefore sought to build a model that could predict runner satisfaction from mechanical properties, which would be a useful tool for footwear companies.

Data in this study came from a database compiled across multiple studies in the Brooks Running Research Lab, in which subjects ran in shoes on a treadmill and then answered questions about how much they enjoyed the shoe. The database contained information on 87 subjects, mechanical testing data on 61 unique shoes, and satisfaction ratings from 615 subject-shoe pairings. Satisfaction with a shoe was defined in three ways: degree of satisfaction on a 7-point Likert scale, overall satisfaction on a 3-point Likert scale, and willingness to purchase the shoe (yes/no response). Overall satisfaction was derived from the 7-point Likert scale by collapsing all dissatisfied scores (1-3) into a single score and all satisfied scores (5-7) into a single score.

Random forest and elastic net logistic regression models were built using each definition of satisfaction as the outcome variable, for a total of six models. Predictors were the mechanical properties measured for each shoe, along with subject age, gender, and body mass. For each model, the data were split 80/20 into training and validation sets; accuracy on the validation set was then compared against the no-information rate (the proportion of data belonging to the largest class). All analyses were performed in R, with models built using the ‘caret’ package.
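The workflow above (collapsing the 7-point scale into the 3-point outcome, an 80/20 stratified split, and model training with caret) can be sketched as follows. This is a minimal illustration on simulated data: the variable names, mechanical properties, and model settings are assumptions for demonstration, not the study's actual database or tuning choices.

```r
library(caret)

# Hypothetical stand-in for the subject-shoe database (615 pairings).
# Column names and distributions are invented for illustration only.
set.seed(42)
n <- 615
dat <- data.frame(
  likert7        = sample(1:7, n, replace = TRUE),  # 7-point degree of satisfaction
  heel_stiffness = rnorm(n, 200, 30),               # example mechanical property
  energy_return  = rnorm(n, 60, 8),                 # example mechanical property
  age            = sample(18:60, n, replace = TRUE),
  body_mass      = rnorm(n, 70, 10)
)

# Collapse the 7-point scale into the 3-point overall-satisfaction outcome:
# 1-3 -> dissatisfied, 4 -> neutral, 5-7 -> satisfied.
dat$overall <- cut(dat$likert7, breaks = c(0, 3, 4, 7),
                   labels = c("dissatisfied", "neutral", "satisfied"))

# 80/20 split into training and validation sets, stratified on the outcome.
idx   <- createDataPartition(dat$overall, p = 0.8, list = FALSE)
train_set <- dat[idx, ]
valid_set <- dat[-idx, ]

# Random forest via caret; method = "glmnet" would give the elastic net instead.
fit <- train(overall ~ heel_stiffness + energy_return + age + body_mass,
             data = train_set, method = "rf",
             trControl = trainControl(method = "cv", number = 5))

# Validation accuracy alongside the no-information rate for comparison.
cm <- confusionMatrix(predict(fit, valid_set), valid_set$overall)
cm$overall[c("Accuracy", "AccuracyNull")]
```

Here `confusionMatrix()` reports the no-information rate directly (`AccuracyNull`), so the model's validation accuracy can be checked against the baseline of always predicting the largest class.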