Sunday, 31 January 2016

Fitbit API Data Analysis Using Raspberry Pi, Python and R

I'm still feeling vaguely bad about the machine I made to cheat on pedometer step counting so I felt I had to pay penance by doing more Fitbit API data analysis.

I wanted to try and find some more interesting ways to visualise the data I get from the Fitbit API.  My inspiration was “Information is Beautiful”, a book I bought from a well-known South American river book company just before Christmas.  Apart from my foray into creating a Sleep infographic I’ve always been a bit conservative in how I visualise data, relying on bog-standard, boring bar charts and scatter graphs.  “Information is Beautiful” is full of infographics that make analysing data accessible, intuitive and just, well, beautiful!  That was my inspiration; here’s the journey I went on…

Here's what I produced; I'll then tell you how I did it!


I've had my Fitbit Charge HR for just over a year now so I thought I'd "celebrate" by analysing a whole year's worth of data from the Fitbit API!  To do this I used the OAUTH2.0 method I wrote about here.

To get a year's worth of data I simply had to use the following URL for the API call:

https://api.fitbit.com/1/user/-/activities/steps/date/2016-01-31/1y.json

So this is asking for my step data (activities/steps) for the one year period up to and including 2016-01-31.  The command I ran was:

sudo python fitbit_oauth_request_v1.py > 2016-01-31.json

...meaning the output was redirected to the file 2016-01-31.json.  The content of the file looked like this (after trimming off some initial text that came from the print statements in the Python script):

more 2016-01-31.json
{"activities-steps":[{"dateTime":"2015-02-01"
,"value":"21803"},{"dateTime":"2015-02-02","value":"7324"},{"dateTime":"2015-02-03","value":"10293"},{"dateTime":"2015-02-04","value":"12714"},{"dateTime":"2015-02-05",
"value":"10383"},{"dateTime":"2015-02-06","value":"11496"},{"dateTime":"2015-02-07","value":"17795"},{"dateTime":"2015-02-08","value":"19735"},{"dateTime":"2015-02-09",
"value":"10808"},{"dateTime":"2015-02-10","value":"8897"},{"dateTime":"2015-02-11","value":"10106"},{"dateTime":"2015-02-12","value":"9779"},{"dateTime":"2015-02-13","v
alue":"9850"},{"dateTime":"2015-02-14","value":"12108"},{"dateTime":"2015-02-15","value":"27393"},{"dateTime":"2015-02-16","value":"12992"} 

So it's a simple JSON structure with one element per day of the year, each holding a step count.  I then transferred the JSON file to my PC to process it with R.
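If you'd rather inspect the file in Python before switching to R, the trim-and-parse step can be sketched like this.  The raw string below is a made-up stand-in for the file contents, with some print output before the JSON just like the real file had:

```python
import json

# Hypothetical raw file contents: the script's print statements come first,
# so find the first '{' and parse the JSON from there
raw = ('Requesting data from the Fitbit API...\n'
       '{"activities-steps":[{"dateTime":"2015-02-01","value":"21803"},'
       '{"dateTime":"2015-02-02","value":"7324"}]}')
data = json.loads(raw[raw.index("{"):])

# Each element is a day with a dateTime and a (string) step count
for day in data["activities-steps"]:
    print(day["dateTime"], int(day["value"]))
```

Note the step counts come back as strings, which is exactly why the R work below needs an as.integer() conversion later on.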

I loaded up the JSON structure in R using:

> library(jsonlite)
> stepdata2015 <- fromJSON(file.choose(),flatten=TRUE)

Where file.choose() opens the Windows file chooser to let you select the JSON file.  The data looked like this (abridged):

> stepdata2015
$`activities-steps`
      dateTime value
1  2015-02-01 21803
2  2015-02-02  7324
3  2015-02-03 10293
4  2015-02-04 12714
5  2015-02-05 10383

Looking at the type of data I saw:
> stepdata2015[0]
named list()

So not the "data frame" I've worked with in the past, which explained why I couldn't manipulate the data in the ways I was used to.  So I turned it into a data frame by doing this:

> stepdata2015_df <- as.data.frame(stepdata2015)

...which made the data look like this (abridged):

> stepdata2015_df
   activities.steps.dateTime activities.steps.value
1                 2015-02-01                  21803
2                 2015-02-02                   7324
3                 2015-02-03                  10293
4                 2015-02-04                  12714
5                 2015-02-05                  10383

Then I graphed the data using these commands:
> library(ggplot2)
> graphval <- qplot(activities.steps.dateTime, activities.steps.value, data=stepdata2015_df)
> graphval + labs(title="Fitbit Step Data - 2015",x = "Day",y = "Steps")

...which yielded this graph:


This is definitely a graph, but every single X value and Y value has a corresponding axis label, most likely because both are treated as text fields.  To make the X axis values of type date/time I did:

> stepdata2015_df$TimePosix <- as.POSIXct(stepdata2015_df$activities.steps.dateTime)

Then to turn the Y axis values into numbers I did:

> stepdata2015_df$StepsInt <- as.integer(stepdata2015_df$activities.steps.value)

...yielding:

> stepdata2015_df
    activities.steps.dateTime activities.steps.value  TimePosix StepsInt
1                  2015-02-01                  21803 2015-02-01    21803
2                  2015-02-02                   7324 2015-02-02     7324
3                  2015-02-03                  10293 2015-02-03    10293
4                  2015-02-04                  12714 2015-02-04    12714
5                  2015-02-05                  10383 2015-02-05    10383

This gives a much nicer looking graph that understands the X axis as a date and the Y axis as a number, and intelligently provides fewer labels:


A nicer graph but really just a random collection of points to my eye.  A bit of reading showed me you could add a smoother trendline to the chart by using a "geom" parameter and doing this:

> graphval <- qplot(TimePosix, StepsInt, data=stepdata2015_df,geom = c("point", "smooth"))
> graphval + labs(title="Fitbit Step Data - 2015",x = "Day",y = "Steps")

Yielding:



...which actually tells the story of my year quite nicely and shows how my step totals are really influenced by how much running I do.  I started 2015 doing a little bit of running, did lots of running up to May/June, then cut back over the summer as I got injured and then did more towards the end of the year and into 2016 as I came back from injury.  In fact, I've been really careful coming back from injury, increasing my weekly KM by no more than 10% and this is reflected in the gradient of the trendline.

I then decided the data needed aggregating into monthly totals and so did this:

> stepdata_2015_agg_sum <- aggregate(list(Steps = stepdata2015_df$StepsInt), list(month = cut(stepdata2015_df$TimePosix, "month")), sum)

Yielding (abridged):

> stepdata_2015_agg_sum
        month  Steps
1  2015-02-01 350767
2  2015-03-01 385209
3  2015-04-01 385578
4  2015-05-01 477423
5  2015-06-01 391484

I also decided to create my own infographic to visualise my month-on-month step count.

To calculate how many footsteps I needed to show on my visualisation I added some summaries:

> stepdata_2015_agg_sum$tenthoublocks <- stepdata_2015_agg_sum$Steps / 10000
> stepdata_2015_agg_sum$footsteps <- round(stepdata_2015_agg_sum$tenthoublocks, digits=0)

...yielding (abridged):

> stepdata_2015_agg_sum
        month  Steps tenthoublocks footsteps
1  2015-02-01 350767       35.0767        35
2  2015-03-01 385209       38.5209        39
3  2015-04-01 385578       38.5578        39
4  2015-05-01 477423       47.7423        48
5  2015-06-01 391484       39.1484        39
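For comparison, the same monthly-total and 10,000-step rounding logic can be sketched in plain Python.  The handful of daily values below are made up to keep the sketch self-contained:

```python
# Hypothetical (date, steps) pairs standing in for the daily API data
daily = [("2015-02-01", 21803), ("2015-02-02", 7324), ("2015-02-15", 27393),
         ("2015-03-01", 12992), ("2015-03-14", 9850)]

# Aggregate into monthly totals keyed by "YYYY-MM"
monthly = {}
for date, steps in daily:
    month = date[:7]
    monthly[month] = monthly.get(month, 0) + steps

# Round each total to the nearest 10,000-step "footstep" block
for month in sorted(monthly):
    footsteps = int(round(monthly[month] / 10000.0))
    print(month, monthly[month], footsteps)
```

Same idea as the R aggregate() plus the tenthoublocks/footsteps columns, just done with a dictionary.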

I then opened the data in Excel to graph it (or create a pictograph to use the proper lingo).  Using this website to tell me how to create charts with images instead of boring bars I came up with the chart below.  Each foot represents 10,000 steps:



I then thought I'd create my own!  Each step on the “path” below represents 10,000 steps and I did it by manually copying, pasting and formatting in Excel:


Notwithstanding that months are of different lengths, the infographic does nicely tally with my 2015 running profile of running a bit (Feb to April), running a lot (May - too much really), getting injured (June to October), getting back into running (October to Jan).  It’s not the most beautiful infographic in the world and Mrs Geek thinks the footsteps look like butterflies but I’m happy with it!!

I think the standard Excel generated one was just fine!

Sunday, 24 January 2016

Google API Access Using OAUTH2.0, Python and Raspberry Pi

This is a post about using Python to retrieve data from a Google API.  For a long time I've been aware that there were lots of Google APIs that gave access to lots of delicious data but I've not got around to playing with them.  Finally I found the time.

I decided to try to get access to the blogger API (i.e. to give information about this blog!) using OAUTH2.0 and Python.  The process of accessing the API is similar to that for the Fitbit API I blogged about recently.  So in simple terms the procedure is:

  1. Register your app, specify access to the Blogger API and get your credentials.
  2. Use OAUTH2.0 to get permission from the user to access their data (and get an authorisation code).
  3. Swap your authorisation code for access and refresh tokens and use them to access the API.
  4. Periodically get a new access token using the refresh token.

On their overview pages, Google recommend the use of pre-defined libraries to authenticate and access the APIs.  Who am I to argue?  So this is what I did!

Part 1 - Register Your App
To do this go to Google Developer Console and login (I assume you've got a Google account).  You see a screen that looks like the image below.  Click on the project list and select "Create a project":


From the Developer Console select "Enable and manage APIs".  Select the API you want (in my case Blogger V3) and then "Enable API".

You'll then be prompted to create credentials for the project (these are the OAUTH2.0 credentials).  To do this I basically followed the Wizard that Google takes you through to specify what you want.  In summary this was:
  • Select "Go to Credentials"
  • When asked "Where will you be calling the API from?" specified "Other UI".
  • When asked "What data will you be accessing?" specified "User data".
  • This told me I needed OAUTH2.0 credentials and so I clicked "Create client ID"
This took me to a place where I could define what details the user (in this case always me) would see when they're asked to authenticate access from my app.  I specified "Blogger API Application" as the name of the application.

Click "Continue" and your OAUTH2.0 credentials are created.  At this point there's the option to download a file containing your credentials.  Do this and rename the file to "client_secrets.json" (you'll need it later).

Part 2- Everything Else!
As stated before, the nice people from Google recommend that you use pre-defined software modules to authenticate and access the API.  Sounds like a cunning plan so, as a Python man, the first thing I did was to run this command on my Raspberry Pi to download and install the Python module for Google API access:

sudo pip install --upgrade google-api-python-client

(I'm pretty sure pip was either already installed on my Pi or I installed it in the dim and distant past).

I then created a directory to hold my Python script to authenticate and use the API.  For me this was:

/home/pi/google/blogger

In this directory I placed the client_secrets.json file I downloaded in the earlier step.

I then set about writing the Python script required to authenticate and access the API.  However to my resounding joy I found that there were loads of pre-written scripts on the interweb, including one for the blogger API.  Never one to look a gift horse in the mouth* I set about cribbing** a pre-written script.

(*-An English saying meaning if someone offers you something then take it.  **-Another English saying meaning flagrant copying).

There are stacks of documentation here telling you how to use the Google Python module for API access.  After a quick look at this I selected "Samples" then "Sample Applications" which took me to a Github page.  On here there is a stack of pre-written Python scripts for Google API access including one for Blogger API use.  It's written by Joe Gregorio (jcgregorio@google.com) so all credit to him and zero credit to me.

I copied the script and pasted it into a Nano editor.  To create this file I did:

sudo nano blogger_2.py

Then I ran the script using the command:

sudo python blogger_2.py --noauth_local_webserver

(The --noauth_local_webserver switch was because I was running the script from a remote SSH session).

Here's a screenshot of what I saw in the SSH session:


So I've redacted some sensitive stuff on the image above, but you copy the URL that the script presents and paste it into a browser.  You're shown a page like the one below and you click "Allow" to permit access from your application to your data:


You're then served a page with an authorisation code on it (redacted below).


Copy this and paste it back onto the command line for the Python script.  The script then continues, accesses the API and prints the result!


So there you go.  The script looks at my Blogger account: for my user object it gets my blogs object, then for each blog object it prints out all the post objects.

If you look in the directory from which you ran the script you can see that a file called blogger.dat is created.  This contains your current access and refresh tokens and so is used and overwritten when new tokens are needed.

To show I'm not a complete cribber and to learn a bit more I set about adding extra lines to the script to get stats relating to my blog.

For this you need the Python API module reference which is here.  Here's a screen shot:

In the script you can see an object called "service" is created using this line:

service, flags = sample_tools.init(
      argv, 'blogger', 'v3', __doc__, __file__,
      scope='https://www.googleapis.com/auth/blogger')

This can then be used to create objects to access different methods within the API.  Such as:

blogs = service.blogs()

...which is then used to get the list of blogs on the user's Blogger account using:

# Retrieve the list of Blogs this user has write privileges on
thisusersblogs = blogs.listByUser(userId='self').execute()

Then it iterates through each blog using:

# List the posts for each blog this user has
for blog in thisusersblogs['items']:

So before it does this you can create an object for page views using:

pageviews = service.pageViews()

(See the Python module documentation referenced above.)  Then within the "for blog" loop you can do:

print('The stats for %s:' % blog['name'])
request = pageviews.get(blogId=blog['id'],range='all')
views_doc = request.execute()
print (views_doc)

The range='all' parameter specifies that you want stats for the lifetime of the blog.  The options are:

      30DAYS - Page view counts from the last thirty days.
      7DAYS - Page view counts from the last seven days.
      all - Total page view counts from all time.

...and overall you get output like this:

The stats for Paul's Geek Dad Blog:
{u'kind': u'blogger#page_views', u'counts': [{u'count': u'94521', u'timeRange': u'ALL_TIME'}], u'blogId': u'123456678902'}

The stats for Paul's Blog:
{u'kind': u'blogger#page_views', u'counts': [{u'count': u'22', u'timeRange': u'ALL_TIME'}], u'blogId': u'123456678902'}
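Since the pageViews response is just a Python dictionary once request.execute() returns, pulling out the all-time count is straightforward.  Here's a sketch using the sample response above:

```python
# Sample response, copied from the output above
views_doc = {u'kind': u'blogger#page_views',
             u'counts': [{u'count': u'94521', u'timeRange': u'ALL_TIME'}],
             u'blogId': u'123456678902'}

# Pick out the ALL_TIME entry from the counts list
all_time = [c['count'] for c in views_doc['counts']
            if c['timeRange'] == 'ALL_TIME'][0]
print("All-time page views: " + all_time)
```

If you request range='30DAYS' or range='7DAYS' instead, you'd filter on the corresponding timeRange value in the same way.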

Enjoy!





Friday, 15 January 2016

Code.org Hour of Code Exercises are Awesome!

Here's a word -> Awesome.  What's awesome you ask?  The Hour of Code exercises on the code.org website are awesome I tell you!!

Prompted by an excellent teacher at her school, my youngest daughter did a few of these exercises at home before Christmas.  I helped her with a few odds and ends but by-and-large she did it all herself.

I'll give you an overview as to what it's all about and as I go along, I'll tell you why it's awesome!

What is It?
Based upon a set of well-known movies, characters or toys, you choose a theme to do some coding with.  Here are a few examples:


Among other things, my daughter chose to do the Minecraft activities.  I think because it's based upon famous things that kids will have heard of, it's more engaging than "Hello World" or some other more abstract topic.

How Does it Work?
You step through a set of tasks associated with the theme.  First you are shown a video to give you some background (you also get videos as you go along as new concepts are introduced):


Your challenge is then presented to you:


You then get presented with a Scratch IDE that allows you to drag and drop code blocks and run the resulting code:



You drag and drop your code, press "Run" and see what happens.  If your code is correct you get a nice "well done" message.  So really short, snappy and interactive with loads of sound effects, feedback and prompts to make you want to do more.


What Coding Concepts do you Learn?
Different blocks for different actions:


Increasing the complexity of the steps:


Introducing loops, showing that you don't have to create endless code to do repeated steps:


You get a nice error message if you get something wrong.  You can then modify your code to get it right:



You get tips that your code could be more efficient and can even see the detail of the Javascript underlying your blocks:


You learn about IF statements:


You even get a certificate to print out and stick on the fridge!



Conclusion
So here you've learned about the main building blocks of writing code (sequence, selection and repetition) in a fun and engaging way.  I wrote an email to the people behind it just to say thanks and how ace I thought it was.

Two quotes from my daughter:

  • "It's really cool" 
  • "I like how it lets you use the skills you've learned"


Go do it.  Here's the link again!


Monday, 11 January 2016

Fitbit API Access Using OAUTH2.0 and Raspberry Pi

Previously I've blogged on accessing the Fitbit API to do sleep analysis, a sleep infographic and to get per minute data.

This used the OAUTH1.0 authentication method, but that is deprecated as of March 2016.  The requirement is to move to using OAUTH2.0 instead, so this post is about how I rolled up my sleeves and did that!

The Fitbit developer site has excellent documentation on using OAUTH2.0 to access the API.  I basically followed this step by step in order to gain access.  I'll describe what I actually did and give code examples below.

Note this is for the general hobbyist, not for someone trying to do this professionally!

Step 1 - Register App
Go to step 1 of my original Fitbit API post and carry out the steps to register your application.  Then come back here.

To do OAUTH2.0 you also need to go onto the Fitbit developer site and:

  • Select "MANAGE MY APPS", select your app and log your "OAuth 2.0 Client ID".  (I assume you've already got your "Client (Consumer) Secret" logged).
  • Then select "Edit Application Settings" and set your "OAuth 2.0 Application Type" to "Personal".

...you're all set to go!

Step 2 - Get an Authorisation Code
Carry out the steps under the "Authorization Page" section of the OAUTH2.0 documentation.  This simply means forming a URL, pasting it into a browser and then following the steps on the resulting web page to authorise the app to use your data.

https://www.fitbit.com/oauth2/authorize?response_type=code&client_id=22942C&redirect_uri=http%3A%2F%2Fexample.com%2Fcallback&scope=activity%20nutrition%20heartrate%20location%20nutrition%20profile%20settings%20sleep%20social%20weight

The example URL above is from the Fitbit documentation.  Simply change the client_id to the one you logged above to make it work for you.  "response_type=code" means your request type is "Authorization Code Flow".  From reading the documentation this means you get both an access token and a refresh token for using the API; more on this later.
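If you'd rather not hand-encode that query string, a sketch like this builds the same URL.  I'm using Python 3's urllib.parse here (unlike the Python 2 scripts later in this post), and the client_id and redirect_uri are the placeholder values from the documentation example:

```python
from urllib.parse import urlencode, quote

# Placeholder values from the documentation example - substitute your own
params = {
    "response_type": "code",
    "client_id": "22942C",
    "redirect_uri": "http://example.com/callback",
    "scope": "activity nutrition heartrate location nutrition profile "
             "settings sleep social weight",
}

# quote_via=quote percent-encodes spaces as %20, matching the example URL
url = ("https://www.fitbit.com/oauth2/authorize?"
       + urlencode(params, quote_via=quote))
print(url)
```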

This will result in Fitbit redirecting to the callback URL you specified when registering your application and appending an "Authorization Code" to the end of the URL.  Log this authorisation code for later use.
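Rather than eyeballing the address bar, the code can also be picked out of the callback URL programmatically.  A sketch with a made-up URL and authorisation code (again using Python 3's urllib.parse):

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical redirect: Fitbit appends ?code=<authorisation code> to the
# callback URL you registered
callback = "http://example.com/callback?code=abcd1234efgh5678"
auth_code = parse_qs(urlparse(callback).query)["code"][0]
print(auth_code)
```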

Step 3 - Get Access and Refresh Tokens
You now need to use your authorisation code to obtain your first access and refresh tokens.  You need to do this within 10 mins of step 2 above.

How to do this is specified in the section "Access Token Request" of the Fitbit OAUTH2.0 page.  To follow these steps I wrote some Python script on my Raspberry Pi to put the parameters together and execute the HTTP POST that you need to do.  Simply take the code below and enter your client ID, your consumer secret, your authorisation code and your redirect URL.

If it works the result will be a JSON response printed to screen that shows your first access token and refresh token.  Log these for use in the next step!

import base64
import urllib2
import urllib

#These are the secrets etc from Fitbit developer
OAuthTwoClientID = "Your_ID_Here"
ClientOrConsumerSecret = "Your_Secret_Here"

#This is the Fitbit URL
TokenURL = "https://api.fitbit.com/oauth2/token"

#I got this from the first verifier part when authorising my application
AuthorisationCode = "Your_Code_Here"

#Form the data payload
BodyText = {'code' : AuthorisationCode,
            'redirect_uri' : 'http://pdwhomeautomation.blogspot.co.uk/',
            'client_id' : OAuthTwoClientID,
            'grant_type' : 'authorization_code'}

BodyURLEncoded = urllib.urlencode(BodyText)
print BodyURLEncoded

#Start the request
req = urllib2.Request(TokenURL,BodyURLEncoded)

#Add the headers, first we base64 encode the client id and client secret with a : inbetween and create the authorisation header
req.add_header('Authorization', 'Basic ' + base64.b64encode(OAuthTwoClientID + ":" + ClientOrConsumerSecret))
req.add_header('Content-Type', 'application/x-www-form-urlencoded')

#Fire off the request
try:
  response = urllib2.urlopen(req)

  FullResponse = response.read()

  print "Output >>> " + FullResponse
except urllib2.URLError as e:
  print e.code
  print e.read()

Step 4 - Make an API Call and Refresh Tokens
This follows the steps described under the "Making Requests" and "Refreshing Tokens" section of the Fitbit OAUTH2.0 document.

In simple terms, you make a request using the access token.  This has a limited lifetime (one hour) so when it runs out you use the refresh token to get a new access token (to use now) and a new refresh token (to get the next access token).

To get data from the API and get new tokens (if required) I used the code pasted in below.  In simple terms this:

  • Reads the current tokens from a text file.  I created this text file on my Raspberry Pi, pasted in my access token, pressed return then pasted in the refresh token.  (Both tokens from step 3 above).
  • Makes an HTTP GET to the API URL using the access token.  If this works, happy days.
  • If the HTTP GET doesn't work, it does an HTTP POST using the refresh token and logs the new tokens to file ready for next time.

To make it work:

  • Edit the IniFile variable to specify where you have your file stored.
  • Enter your client ID and client secret.

import base64
import urllib2
import urllib
import sys
import json
import os

#This is the Fitbit URL to use for the API call
FitbitURL = "https://api.fitbit.com/1/user/-/profile.json"

#Use this URL to refresh the access token
TokenURL = "https://api.fitbit.com/oauth2/token"

#Get and write the tokens from here
IniFile = "/home/pi/fitbit/tokens.txt"

#From the developer site
OAuthTwoClientID = "Your_ID_Here"
ClientOrConsumerSecret = "Your_Secret_Here"

#Some constants defining API error handling responses
TokenRefreshedOK = "Token refreshed OK"
ErrorInAPI = "Error when making API call that I couldn't handle"

#Get the config from the config file.  This is the access and refresh tokens
def GetConfig():
  print "Reading from the config file"

  #Open the file
  FileObj = open(IniFile,'r')

  #Read first two lines - first is the access token, second is the refresh token
  AccToken = FileObj.readline()
  RefToken = FileObj.readline()

  #Close the file
  FileObj.close()

  #See if the strings have newline characters on the end.  If so, strip them
  if (AccToken.find("\n") > 0):
    AccToken = AccToken[:-1]
  if (RefToken.find("\n") > 0):
    RefToken = RefToken[:-1]

  #Return values
  return AccToken, RefToken

def WriteConfig(AccToken,RefToken):
  print "Writing new token to the config file"
  print "Writing this: " + AccToken + " and " + RefToken

  #Delete the old config file
  os.remove(IniFile)

  #Open and write to the file
  FileObj = open(IniFile,'w')
  FileObj.write(AccToken + "\n")
  FileObj.write(RefToken + "\n")
  FileObj.close()

#Make an HTTP POST to get a new access token
def GetNewAccessToken(RefToken):
  print "Getting a new access token"

  #Form the data payload
  BodyText = {'grant_type' : 'refresh_token',
              'refresh_token' : RefToken}
  #URL Encode it
  BodyURLEncoded = urllib.urlencode(BodyText)
  print "Using this as the body when getting access token >>" + BodyURLEncoded

  #Start the request
  tokenreq = urllib2.Request(TokenURL,BodyURLEncoded)

  #Add the headers, first we base64 encode the client id and client secret with a : inbetween and create the authorisation header
  tokenreq.add_header('Authorization', 'Basic ' + base64.b64encode(OAuthTwoClientID + ":" + ClientOrConsumerSecret))
  tokenreq.add_header('Content-Type', 'application/x-www-form-urlencoded')

  #Fire off the request
  try:
    tokenresponse = urllib2.urlopen(tokenreq)

    #See what we got back.  If we got to this part of the code it was OK
    FullResponse = tokenresponse.read()

    #Need to pick out the access token and write it to the config file.  Use a JSON manipulation module
    ResponseJSON = json.loads(FullResponse)

    #Read the access token as a string
    NewAccessToken = str(ResponseJSON['access_token'])
    NewRefreshToken = str(ResponseJSON['refresh_token'])
    #Write the access token to the ini file
    WriteConfig(NewAccessToken,NewRefreshToken)

    print "New access token output >>> " + FullResponse
  except urllib2.URLError as e:
    #Getting to this part of the code means we got an error
    print "An error was raised when getting the access token.  Need to stop here"
    print e.code
    print e.read()
    sys.exit()

#This makes an API call.  It also catches errors and tries to deal with them
def MakeAPICall(InURL,AccToken,RefToken):
  #Start the request
  req = urllib2.Request(InURL)

  #Add the access token in the header
  req.add_header('Authorization', 'Bearer ' + AccToken)

  print "I used this access token " + AccToken
  #Fire off the request
  try:
    #Do the request
    response = urllib2.urlopen(req)
    #Read the response
    FullResponse = response.read()

    #Return values
    return True, FullResponse
  #Catch errors, e.g. A 401 error that signifies the need for a new access token
  except urllib2.URLError as e:
    print "Got this HTTP error: " + str(e.code)
    HTTPErrorMessage = e.read()
    print "This was in the HTTP error message: " + HTTPErrorMessage
    #See what the error was
    if (e.code == 401) and (HTTPErrorMessage.find("Access token invalid or expired") >= 0):
      GetNewAccessToken(RefToken)
      return False, TokenRefreshedOK
    #Return that this didn't work, allowing the calling function to handle it
    return False, ErrorInAPI

#Main part of the code
#Declare these global variables that we'll use for the access and refresh tokens
AccessToken = ""
RefreshToken = ""

print "Fitbit API Test Code"

#Get the config
AccessToken, RefreshToken = GetConfig()

#Make the API call
APICallOK, APIResponse = MakeAPICall(FitbitURL, AccessToken, RefreshToken)

if APICallOK:
  print APIResponse
else:
  if (APIResponse == TokenRefreshedOK):
    print "Refreshed the access token.  Can go again"
  else:
    print ErrorInAPI

Thursday, 7 January 2016

Strava API Analysis Using R

In a couple of previous posts I've covered how I have used the Strava API to analyse exercise data I've captured with my Garmin Forerunner 910 watch.

Analysing this data has always been a bit laborious but my new-found discovery of R (see here for how I used it for Fitbit data) means it's now easy.

Getting data from the Strava API is pretty easy: you just have to register, get a key and then you can use simple HTTP GET requests to get your data (the first link above shows how I did it).  So no HTTP POST, no forming payloads, no hashing or anything.
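As a sketch, the whole request is just a URL with three query parameters.  The access token below is a placeholder, and the 'after' value is the Unix epoch time for the start of 2015:

```python
import calendar
import time
from urllib.parse import urlencode

# 'after' is a Unix epoch time; 2015-01-01 00:00:00 UTC is 1420070400
after = calendar.timegm(time.strptime("2015-01-01", "%Y-%m-%d"))

# Placeholder access token - substitute your own
params = urlencode({"access_token": "<your key here>",
                    "per_page": 200,
                    "after": after})
print("https://www.strava.com/api/v3/activities?" + params)
```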

This makes it easy to import into R using the jsonlite library.  Here's an example (assuming you've installed jsonlite):

library(jsonlite)
stravadata <- fromJSON('https://www.strava.com/api/v3/activities?access_token=7<your key here>&per_page=200&after=1420070400',flatten=TRUE)

This then yields an R data frame that you can manipulate.  First have a quick look at the first row of the data frame:

> stravadata[c(1),]
         id resource_state external_id upload_id               name distance moving_time elapsed_time total_elevation_gain type           start_date
1 236833349              2        <NA>        NA First swim of 2015     1300        2700         2700                    0 Swim 2015-01-04T20:45:00Z
      start_date_local                  timezone start_latlng end_latlng location_city location_state location_country start_latitude start_longitude
1 2015-01-04T20:45:00Z (GMT+00:00) Europe/London         NULL       NULL          <NA>           <NA>   United Kingdom             NA              NA
  achievement_count kudos_count comment_count athlete_count photo_count trainer commute manual private flagged gear_id average_speed max_speed total_photo_count
1                 0           0             0             1           0   FALSE   FALSE   TRUE   FALSE   FALSE    <NA>         0.481         0                 0
  has_kudoed average_cadence average_watts device_watts average_heartrate max_heartrate elev_high elev_low workout_type kilojoules athlete.id athlete.resource_state
1      FALSE              NA            NA           NA                NA            NA        NA       NA           NA         NA    4309532                      1
      map.id map.summary_polyline map.resource_state
1 a236833349                 <NA>                  2

This gives you a good idea of interesting fields to further analyse:

  • name = column 5
  • distance = column 6
  • type = column 10

..which lets you look at the data in a more refined format. So first row and the three columns listed above:

> stravadata[c(1),c(5,6,10)]
                name distance type
1 First swim of 2015     1300 Swim

Before going much further I needed to filter the results to just show those for 2015 as my Strava API call would have included everything from 2016 to date as well.  Do this by:

strava2015 <- stravadata[grep("2015-", stravadata$start_date), ]

...which yields this (just first 3 rows shown):

> strava2015[c(1:3),c(5,6)]
                name distance
1 First swim of 2015   1300.0
2      HIIT 20150106   4716.1
3      HIIT 20140108   4709.2
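The same year filter can be sketched in Python for anyone working outside R; the two activity rows below are made up:

```python
# Hypothetical activity rows as returned by the API
activities = [
    {"name": "First swim of 2015", "start_date": "2015-01-04T20:45:00Z"},
    {"name": "New year run",       "start_date": "2016-01-02T09:00:00Z"},
]

# Keep only activities whose ISO start_date begins "2015-"
acts_2015 = [a for a in activities if a["start_date"].startswith("2015-")]
print([a["name"] for a in acts_2015])
```

This is slightly stricter than the R grep("2015-", ...) (which matches anywhere in the string), but for ISO dates it comes to the same thing.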

Then picking out just the type and the distance:

> strava2015simple <- strava2015[,c(10,6)]

...and looking at first 3 rows of this:

> strava2015simple[c(1:3),]
  type distance
1 Swim   1300.0
2 Ride   4716.1
3 Ride   4709.2

Making it very easy to compute some aggregated stats for distances for 2015:

First averages:

> stravaagg <- aggregate(list(Distance = strava2015simple$distance), list(Type = strava2015simple$type), mean)
> stravaagg
  Type  Distance
1 Ride 17765.398
2  Run  5487.856
3 Swim  1067.619

...then totals:

> stravaagg <- aggregate(list(Distance = strava2015simple$distance), list(Type = strava2015simple$type), sum)

> stravaagg
  Type  Distance
1 Ride 1030393.1
2  Run  334759.2
3 Swim   50178.1
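The same per-type aggregation can be sketched in Python with a couple of dictionaries; the handful of (type, distance) rows below are made up:

```python
# Hypothetical (type, distance) rows standing in for the Strava data
rows = [("Swim", 1300.0), ("Ride", 4716.1), ("Ride", 4709.2), ("Run", 5000.0)]

# Accumulate totals and counts per activity type
totals, counts = {}, {}
for sport, dist in rows:
    totals[sport] = totals.get(sport, 0.0) + dist
    counts[sport] = counts.get(sport, 0) + 1

# Print total and mean distance per type, like the two R aggregate() calls
for sport in sorted(totals):
    print(sport, totals[sport], totals[sport] / counts[sport])
```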

So easy!  (I'm not going to trouble the Brownlee brothers with these figures!)

Monday, 4 January 2016

Using R to Analyse Fitbit API Data

I felt vaguely bad about my last post, which was about creating a machine to cheat a pedometer/step counter.  Hence I thought I'd make amends by doing some analysis of my step data obtained from the Fitbit API.

Previously I've written blog posts on accessing the Fitbit API and creating a Fitbit based infographic.  For this post I wanted to explore using R to ease the process of analysing Fitbit data.  And wow!  Just wow!  Once you get your head around R it's so easy!

Why R you may ask?  Simply because usually for this sort of analysis I end up manipulating data in Python or Excel then graphing it in Excel.  I'd read that R has all sorts of capabilities to make this easier and so wanted to give it a go.

I obtained a day's worth of Fitbit API data for 2015-12-31 using the method I documented here.  The data looked like this (abridged):

{"activities-steps":[{"dateTime":"2015-12-31","value":"21446"}],
"activities-steps-intraday":{"dataset":[
{"time":"00:00:00","value":0},{"time":"00:01:00","value":7},{"time":"00:02:00","value":11},
{"time":"00:03:00","value":0},{"time":"00:04:00","value":0},{"time":"00:05:00","value":0},
{"time":"00:06:00","value":0},{"time":"00:07:00","value":0},{"time":"00:08:00","value":0},
{"time":"00:09:00","value":0},{"time":"00:10:00","value":0},{"time":"00:11:00","value":0},
{"time":"00:12:00","value":0},{"time":"00:13:00","value":0},{"time":"00:14:00","value":0},
{"time":"00:15:00","value":0},{"time":"00:16:00","value":0},{"time":"00:17:00","value":0},
{"time":"00:18:00","value":0},{"time":"00:19:00","value":0},{"time":"00:20:00","value":0},
{"time":"00:21:00","value":0},{"time":"00:22:00","value":0},{"time":"00:23:00","value":0},
{"time":"00:24:00","value":0},{"time":"00:25:00","value":0},{"time":"00:26:00","value":0},
{"time":"00:27:00","value":0},{"time":"00:28:00","value":0},{"time":"00:29:00","value":0},
{"time":"00:30:00","value":0},{"time":"00:31:00","value":0},

...so basically one long JSON object that was represented as a single string in a text file.
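As an aside, the same JSON can be unpicked in a few lines of Python with the standard json module; here's a sketch using a trimmed-down sample with the same shape as the Fitbit response:

```python
import json

# A trimmed-down sample with the same structure as the Fitbit API response
raw = '''{"activities-steps":[{"dateTime":"2015-12-31","value":"21446"}],
"activities-steps-intraday":{"dataset":[
{"time":"00:00:00","value":0},{"time":"00:01:00","value":7},{"time":"00:02:00","value":11}]}}'''

stepdata = json.loads(raw)

# The daily total and the per-minute intraday dataset
dataset = stepdata["activities-steps-intraday"]["dataset"]
print(stepdata["activities-steps"][0]["value"])  # "21446"
print(dataset[1])  # {'time': '00:01:00', 'value': 7}
```

Note the daily total comes back as a string while the intraday values are numbers; R's jsonlite makes the same distinction when it builds the data frames.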

To do the analysis, first you need to download and install R.  I did this from here.

To load the JSON into R for analysis I simply did this from the R console:

> install.packages("jsonlite")
> library(jsonlite)
> stepdata <- fromJSON(file.choose(),flatten=TRUE)

So here I've installed a package called jsonlite that can be used to pull in JSON objects to manipulate in R.  When you install an R package for the first time you have to choose a site to download it from.  I simply chose my nearest site geographically.

The library(jsonlite) statement just makes the library available to use.

The stepdata <- fromJSON(file.choose(),flatten=TRUE) statement then pulls the data into R.  The file.choose() component of this causes the Windows file chooser form to be loaded, allowing you to choose a file.

You can then look at the resulting R variable by doing this (output abridged):

> stepdata
$`activities-steps`
    dateTime value
1 2015-12-31 21446

$`activities-steps-intraday`
$`activities-steps-intraday`$dataset
         time value
1    00:00:00     0
2    00:01:00     7
3    00:02:00    11
4    00:03:00     0
5    00:04:00     0
6    00:05:00     0

So each sub-component of the JSON string becomes an R sub-component that you can easily reference.  To see just the step data and not any of the other related data you can do this (output abridged):

> juststepdata <- stepdata$`activities-steps-intraday`$dataset
> juststepdata
         time value
1    00:00:00     0
2    00:01:00     7
3    00:02:00    11
4    00:03:00     0
5    00:04:00     0
6    00:05:00     0

So now I had a variable (actually an R "data frame") that I could use for graphing.

To produce a nice graph I first had to get the time fields into a proper date and time format that R would understand as such.  To do this I did (output abridged):

> juststepdata$DateAndTime <- paste("2015-12-31",juststepdata$time,sep=" ")
> juststepdata$TimePosix <- as.POSIXct(juststepdata$DateAndTime)
> juststepdata

      time value         DateAndTime           TimePosix
1 00:00:00     0 2015-12-31 00:00:00 2015-12-31 00:00:00
2 00:01:00     7 2015-12-31 00:01:00 2015-12-31 00:01:00
3 00:02:00    11 2015-12-31 00:02:00 2015-12-31 00:02:00
4 00:03:00     0 2015-12-31 00:03:00 2015-12-31 00:03:00
5 00:04:00     0 2015-12-31 00:04:00 2015-12-31 00:04:00
6 00:05:00     0 2015-12-31 00:05:00 2015-12-31 00:05:00

First I added a date-and-time string column, using the paste() function to concatenate a fixed date string onto each time string.  Then the as.POSIXct() call turns each date + time string into a full POSIX-style date-time value that R can understand as such.
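The same paste-then-parse step could be sketched in Python with datetime.strptime, assuming the same fixed date string:

```python
from datetime import datetime

# Per-minute time strings as they come back from the Fitbit API
times = ["00:00:00", "00:01:00", "00:02:00"]

# Concatenate a fixed date onto each time string, then parse into datetimes
stamps = [datetime.strptime("2015-12-31 " + t, "%Y-%m-%d %H:%M:%S") for t in times]
print(stamps[1])  # 2015-12-31 00:01:00
```

Either way, the point is the same: once the times are proper date-time values rather than strings, the plotting library can lay out the x-axis for you.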

Then to graph, simply do:

> install.packages("ggplot2")
> library(ggplot2)

...to install and load the library and then to draw the graph:

> graphval <- qplot(TimePosix, value, data=juststepdata)
> graphval + labs(title="Fitbit Step Data- 31/12/2015",x = "Time of Day",y = "Steps")

...which results in:

Simples!  I just love the way you can draw a decent-looking graph with just two lines of R.  No mucking about with rows and columns like in Excel and no jiggery-pokery with axis label positions and the like.

On the graph I observe:

  • Lots of 0s when I'm asleep or sitting around during the day.
  • A general "buzz" of ~25 steps per minute during the daytime.
  • An afternoon walk at ~13:00 with a walking cadence of ~100 steps per minute, plus a few higher values when I ran.
  • A few extra peaks during the evening.  It was New Year's Eve and we spent the evening visiting different people's houses.

I'm sure there are more interesting ways to represent the data and gain more insight.  Something for a future post perhaps...