Ok, I have tried to quote the original post but for some reason my post just hangs. What follows was inspired by post #2378 on page 119 of this thread. That post shows how to get JSON from Strava, and this post builds on that information.
If you click the link supplied by FartKing69 in the Firefox browser, it loads a JSON file. You need to be logged into Strava when you click the link. You can then save the file to a location of your choice. My understanding of the available data is:
- grade_adjusted_speed (in meters per second - trying to estimate what speed would be if running on level surface)
- surface (running surface)
- outlier
- resting
- watts
- moving
- velocity_smooth (in meters per second)
- grade_smooth (percent up and down)
- cadence
- distance (in meters)
- heartrate
- altitude
- total_elevation (seems more like total climbing)
- grade_adjusted_distance
- timer_time (time running - if a device is paused it will be paused)
- time (Total time)
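Since the speed streams are in meters per second, converting to pace can make them easier to read for runners. A minimal sketch; the helper name mps_to_pace is my own invention and not part of Strava or any package:

```r
# Convert speed in meters per second to pace in minutes per kilometer.
# mps_to_pace is a hypothetical helper name, not from any package.
mps_to_pace <- function(mps) {
  # Zero or missing speed has no meaningful pace, so return NA there.
  ifelse(is.na(mps) | mps <= 0, NA_real_, (1000 / mps) / 60)
}

mps_to_pace(c(2.5, 3.0, 0))
```

The same function could be applied to velocity_smooth or grade_adjusted_speed once the data frame is built.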
If you save the stream in, say,
c:\strava json
with the default file name streams.json then this R code will load the data and make it ready to be analyzed.
library("rjson")
setwd(r"(C:\strava json)")
myDataFile <- "streams.json"
myData <- fromJSON(file=myDataFile)
# watts can contain null values, which must be converted or they
# cause problems when creating the data frame (here they become 0;
# use NA instead if those rows should be excluded from a model)
myData$watts <- sapply(myData$watts, function(x) if (is.null(x)) 0 else x)
# we don't need the lat and long data, and it is a list that doesn't
# feed into the data frame well, so let's drop it (this form is also safe
# when there is no latlng stream, unlike negative indexing with which())
myData <- myData[names(myData) != "latlng"]
jsonDF <- as.data.frame(myData)
head(jsonDF)
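If other streams turn out to contain nulls as well, the same idea can be applied to every stream at once. This is only a sketch: the small list below is made-up data standing in for what fromJSON returns, and null_to_na is a name I picked. It uses NA rather than 0 so that lm() drops those rows instead of treating them as real zero readings.

```r
# Made-up stand-in for the list returned by fromJSON(); a JSON null
# shows up as NULL inside a list, which as.data.frame() cannot handle.
streams <- list(
  heartrate = list(120, NULL, 125),
  watts     = list(200, 210, NULL),
  time      = c(0, 1, 2)
)

# Replace NULL with NA in a stream and flatten it to a plain vector.
# null_to_na is a hypothetical helper name.
null_to_na <- function(stream) {
  unlist(lapply(stream, function(x) if (is.null(x)) NA else x))
}
clean <- lapply(streams, null_to_na)

streamsDF <- as.data.frame(clean)
streamsDF
```

Note that latlng would still need to be dropped first, since each of its entries is a pair and flattening would double its length.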
To run, say, a regression predicting heart rate from velocity and watts, the code is:
reg1 <- lm(heartrate ~ velocity_smooth + watts, data = jsonDF)
summary(reg1)
To predict heart rate from velocity and grade, including an interaction term, the code would be:
reg2 <- lm(heartrate ~ velocity_smooth + grade_smooth + grade_smooth:velocity_smooth, data = jsonDF)
summary(reg2)
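As a side note, R's formula syntax has a shorthand for this: a * b expands to a + b + a:b. A small sketch on simulated data (the numbers are invented and have nothing to do with the Strava run) showing that the two forms fit the same model:

```r
# Simulated data only; unrelated to the actual Strava streams.
set.seed(1)
n <- 200
sim <- data.frame(
  velocity_smooth = runif(n, 2, 4),
  grade_smooth    = runif(n, -5, 5)
)
sim$heartrate <- 100 + 5 * sim$velocity_smooth +
  0.3 * sim$grade_smooth + rnorm(n, sd = 10)

# Long form: main effects plus an explicit interaction term.
long <- lm(heartrate ~ velocity_smooth + grade_smooth +
             grade_smooth:velocity_smooth, data = sim)
# Short form: a * b expands to a + b + a:b.
short <- lm(heartrate ~ velocity_smooth * grade_smooth, data = sim)

# TRUE when both fits produce the same coefficients.
all.equal(unname(coef(long)), unname(coef(short)))
```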
For the day linked on page 119 the results for reg2 are
Call:
lm(formula = heartrate ~ velocity_smooth + grade_smooth + grade_smooth:velocity_smooth,
data = jsonDF)
Residuals:
    Min      1Q  Median      3Q     Max
-42.082  -9.754  -3.288   7.014  70.043
Coefficients:
                              Estimate Std. Error t value Pr(>|t|)
(Intercept)                   99.04824    0.20168 491.125   <2e-16 ***
velocity_smooth                5.51671    0.09050  60.958   <2e-16 ***
grade_smooth                   0.30088    0.02949  10.202   <2e-16 ***
velocity_smooth:grade_smooth  -0.01373    0.01479  -0.928    0.353
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 14.67 on 53240 degrees of freedom
Multiple R-squared: 0.06918, Adjusted R-squared: 0.06913
F-statistic: 1319 on 3 and 53240 DF, p-value: < 2.2e-16
Both velocity and grade are significant predictors. The interaction is not significant.
For each 1 meter/second increase in speed (velocity), predicted heart rate goes up about 5.5 beats per minute on level ground.
Together these predictors account for only about 7% of the observed variability in heart rate (R-squared of 0.069), which strikes me as low, but I don't have anything to compare it to.
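To make the coefficients concrete, here is a sketch that plugs them into the model equation by hand. The speed and grade values are arbitrary examples I picked, and predict_hr is just a name for this illustration:

```r
# Coefficients copied from the reg2 summary output.
b0   <- 99.04824   # (Intercept)
b_v  <- 5.51671    # velocity_smooth
b_g  <- 0.30088    # grade_smooth
b_vg <- -0.01373   # velocity_smooth:grade_smooth

# Predicted heart rate for a speed v (m/s) and grade g (percent).
# predict_hr is a hypothetical helper name.
predict_hr <- function(v, g) b0 + b_v * v + b_g * g + b_vg * v * g

predict_hr(3, 0)   # level ground at 3 m/s
predict_hr(3, 5)   # 5% climb at the same speed
```

With the fitted model still in memory, predict(reg2, newdata = data.frame(velocity_smooth = 3, grade_smooth = 5)) should give the same kind of answer without copying numbers by hand.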
The overall correlation between HR and speed is
cor(jsonDF$heartrate,jsonDF$velocity_smooth)
[1] 0.2384432
We can also use the grade adjusted speed, which should be a better indicator of effort:
cor(jsonDF$heartrate,jsonDF$grade_adjusted_speed)
[1] 0.2642804
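Squaring a correlation gives the share of variance that a single predictor would explain in a simple linear regression, which puts these numbers on the same scale as the R-squared reported for reg2. A quick sketch using the two correlations as printed:

```r
# Correlations copied from the output shown in this post.
r_speed <- 0.2384432   # heartrate vs velocity_smooth
r_gas   <- 0.2642804   # heartrate vs grade_adjusted_speed

# Squared correlation = R-squared of a one-predictor linear regression.
r_speed^2
r_gas^2
```

That works out to roughly 0.057 and 0.070, so grade adjusted speed on its own does about as well as velocity and grade together in reg2.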
All of these results are just for the data that comes from the link in the original post. I don't know what this really tells us about this run across America, but I had some time today and wanted to see if I could get it working and make it available in case it is useful.
I would be interested in any links to comparable runs by other folks. I don't really know anything about Strava.