
Gaining Device Statistics from W3C Server Logs

Engineering

8/14/2012 4:02 PM

C Development

First, if you haven't checked out the latest release of the 51Degrees.mobi C API, you can grab it here and check out the documentation guide here. Read through the whole guide before continuing with this post, as I'll be picking up from where I left off with the implementation of the isMobile function for parsing result strings. Make sure you can duplicate that before continuing.

There are a few challenges to address when extracting the required information from a W3C log file:

  1. Identifying the location of the UserAgent column within the log file.
  2. Extracting the UserAgent from the previously identified location.
  3. Compensating for the log file format by modifying the obtained UserAgent.

A working draft of the W3C log file format can be found here. Broadly, the log file begins with a header of directive lines, each starting with a '#' symbol, followed by the bulk of the data. The data itself is separated into columns by white space, which makes it easy to read and parse, but the entries need some pre-processing before they can be used.

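First, a quick overview of the preprocessor directives for this implementation. This should go in your own .c source file or replace the text in TestFunctions.c. The code in this post also calls functions from the standard <stdio.h> and <string.h> headers, so include those as well if your build does not already pull them in: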

#include "51Degrees.mobi.h"

To recap from the user guide, here again is the isMobile string parsing function. It returns the string "True" or "False" depending on the "IsMobile" property of the detected device, or "null" if that property is not found.

char *isMobile(char *result) {
    char *token;
    token = strtok(result, "|");
    while (token != NULL) {
        if (strcmp(token, "IsMobile") == 0) {
            token = strtok(NULL, "|");
            token = strtok(NULL, "\n");
            return token;
        }
        else
            token = strtok(NULL, "\n");
        if (token != NULL)
            token = strtok(NULL, "|");
    }
    return "null";
}

The solution implemented here uses the "strtok" function from the string.h header file. There is some controversy around using this function, mainly because it modifies the source string. For the purpose of this implementation, however, it is more than adequate, and it can easily be replaced with your own version if required. If you have already gone through the user guide, you will be familiar with its use. If not, here again is the link to a description.
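To illustrate the point about strtok modifying its input, here is a minimal sketch that tokenizes a private copy so the caller's string is left untouched. The helper name countTokens and the 1500-byte scratch buffer are assumptions made for this example; they are not part of the 51Degrees.mobi API.

```c
#include <string.h>

/* Hypothetical helper (not part of the 51Degrees.mobi API): counts the
 * delimiter-separated tokens in `line` without altering it, by running
 * strtok on a local copy. Returns -1 if the line does not fit in the
 * scratch buffer. */
int countTokens(const char *line, const char *delims) {
    char copy[1500];
    char *token;
    int count = 0;

    if (strlen(line) >= sizeof(copy))
        return -1;
    strcpy(copy, line);

    /* strtok overwrites each delimiter it finds with '\0', but only in `copy`. */
    for (token = strtok(copy, delims); token != NULL; token = strtok(NULL, delims))
        count++;
    return count;
}
```

The cost is one extra copy per line, which is why the main program in this post simply lets strtok work on the read buffer directly.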

After implementing isMobile, we can take a look at the main function for our program:

int main(int argc, char* argv[]) {
    char *check, *result, subject[1500];
    FILE *input = NULL;
    int mobileCount = 0, nonMobileCount = 0, index = 0, i = 0;
    Workset *ws = createWorkset(); /*Create the 51Degrees.mobi workset*/

    /*Open file*/
    input = fopen("test.log", "r");
    if (input == NULL) {
        /*Exit program if file open failed*/
        printf("File open failed.\n");
        getchar();
        return 0;
    }

    /*Identify the location of the user agent field.*/
    do {
        if (fgets(subject, sizeof(subject), input) == NULL) {
            /*Reached the end of the file without finding the header.*/
            printf("No #Fields line found.\n");
            fclose(input);
            return 0;
        }
        check = strtok(subject, " \t\n");
    } while (check == NULL || strcmp(check, "#Fields:") != 0);
    do {
        check = strtok(NULL, " \t\n");
        index++;
    } while (check != NULL && strcmp(check, "cs(User-Agent)") != 0);
    if (check == NULL) {
        printf("No cs(User-Agent) column found.\n");
        fclose(input);
        return 0;
    }

    /*Perform the log check on each line of the log file.*/
    while (fgets(subject, sizeof(subject), input) != NULL) {
        /*Skip any further header lines, which start with '#'.*/
        if (subject[0] == '#')
            continue;

        /*Obtain the user agent from the subject line using the index identified earlier.*/
        check = NULL;
        for (i = 0; i < index; i++) {
            check = strtok((i == 0 ? subject : NULL), " \t\n");
            if (check == NULL)
                break;
        }
        /*Skip lines that have fewer columns than expected.*/
        if (check == NULL)
            continue;

        /*Replace "+" symbols with " " from the userAgent.*/
        changePlus(check);

        /*Populates the userAgent field of the Workset structure*/
        strncpy(ws->userAgent, check, MAXLENGTH);

        /*Get the device data from the user agent and scan it for the "IsMobile" identifier.*/
        result = detectDeviceFormat(ws);
        check = isMobile(result);

        /*Increment a counter based on the outcome of the "IsMobile" check.*/
        if (check != NULL && strcmp(check, "True") == 0) {
            mobileCount++;
        } else {
            nonMobileCount++;
        }
        fflush(stdout);
    }
    fclose(input);

    /*Output the total number of mobile and non-mobile devices.*/
    printf("Mobile results: %d\nNon Mobile results: %d\n", mobileCount, nonMobileCount);
    getchar();
    return 0;
}

Due to the flexible nature of the log file, the number and order of columns may change between servers. Because of this, it is necessary to scan for the header line that begins with “#Fields:”. Once located, the line should be scanned for the identifier “cs(User-Agent)”. By incrementing an index counter for each column name we pass while looking for this one, we can easily find the location of the user agent in the body of the data. It's worth noting that this technique works because empty columns in a log file must be indicated with a “-”, meaning there is no chance of accidentally skipping a column, assuming the file has been formatted correctly. This takes care of issue one on our list.
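The column search described above can be isolated into a small function. This is only a sketch: the name findFieldIndex and its return convention (a 1-based position, or 0 when nothing is found) are assumptions made for illustration, not part of the 51Degrees.mobi API.

```c
#include <string.h>

/* Hypothetical helper (not part of the 51Degrees.mobi API): returns the
 * 1-based column position of `name` in a "#Fields:" header line, or 0 if
 * the line is not a #Fields line or the column is absent.
 * Note that strtok modifies `fieldsLine`. */
int findFieldIndex(char *fieldsLine, const char *name) {
    int index = 0;
    char *token = strtok(fieldsLine, " \t\r\n");

    if (token == NULL || strcmp(token, "#Fields:") != 0)
        return 0;

    /* Count each column name until the requested one is reached. */
    while ((token = strtok(NULL, " \t\r\n")) != NULL) {
        index++;
        if (strcmp(token, name) == 0)
            return index;
    }
    return 0;
}
```

For a header such as "#Fields: date time cs-method cs(User-Agent) sc-status", a call with "cs(User-Agent)" would return 4, meaning the user agent is the fourth whitespace-separated entry on every data line.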

Issue two is now a lot easier, since we already used strtok to find the column name. By calling strtok repeatedly without submitting a new string, we can skip along a line until we reach the correct column. Note the conditional (ternary) expression inside the function call: the first time we call strtok on a line we must pass the string itself, and subsequent calls for the same line must pass NULL. Also, by including the newline character “\n” among the delimiters, we deal with a quirk of the fgets function: it keeps the trailing newline in the string it returns, which would otherwise stay attached to the last column and may skew our results.
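As a standalone illustration of this step, the sketch below pulls a single column out of a data line. The name getColumn is hypothetical, and the function assumes the 1-based index convention used by the counter in the main program.

```c
#include <string.h>

/* Hypothetical helper (not part of the 51Degrees.mobi API): returns the
 * token at 1-based position `index` in a whitespace-separated log line,
 * or NULL if the line has fewer columns. strtok modifies `line`. */
char *getColumn(char *line, int index) {
    char *token = NULL;
    int i;

    for (i = 0; i < index; i++) {
        /* Pass the string on the first call only, NULL afterwards. */
        token = strtok(i == 0 ? line : NULL, " \t\r\n");
        if (token == NULL)
            return NULL;
    }
    return token;
}
```

Because "\n" is one of the delimiters, the newline that fgets leaves at the end of the buffer never ends up inside the returned token, even when the requested column is the last one on the line.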

With issues one and two taken care of, we come to the final step of pre-processing that must occur before we send our UserAgent through the device detection algorithm. As mentioned previously, the log file format uses white space to separate entries, so any spaces inside an entry are replaced with “+” symbols when the log is written. A simple string parser that swaps these “+” symbols back to spaces is included here. It treats a backslash as an escape character, so that an actual plus symbol in the User Agent string, written as “\+”, is left alone.

void changePlus(char* subject) {
    char *checker = subject;
    if (checker == NULL)
        return;
    while (*checker != '\0') {
        if (*checker == '+') {
            *checker = ' ';
        }
        if (*checker == '\\') {
            checker++;
            if (*checker == '+')
                checker++;
        }
        else
            checker++;
    }
}
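Note that changePlus works in place and leaves the escaping backslash in the string. If you would rather keep the original line intact and drop the backslash as well, one possible variant is sketched below. The name decodeUserAgent, its output-buffer parameters, and the decision to remove the backslash are all assumptions made for this example, not behaviour of the code above or of the 51Degrees.mobi API.

```c
#include <stddef.h>

/* Hypothetical variant of changePlus (not part of the 51Degrees.mobi API):
 * copies `src` into `dst`, turning '+' into ' ' and the escaped sequence
 * "\+" into a literal '+'. Output is truncated to fit `dstSize` and is
 * always NUL-terminated. */
void decodeUserAgent(const char *src, char *dst, size_t dstSize) {
    size_t out = 0;

    if (src == NULL || dst == NULL || dstSize == 0)
        return;

    while (*src != '\0' && out + 1 < dstSize) {
        if (src[0] == '\\' && src[1] == '+') {
            /* Escaped plus: keep the plus, drop the backslash. */
            dst[out++] = '+';
            src += 2;
        } else {
            dst[out++] = (*src == '+') ? ' ' : *src;
            src++;
        }
    }
    dst[out] = '\0';
}
```

Writing into a separate buffer also means the result could be decoded straight into the workset's user agent field, provided the size passed in matches that field's capacity.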

After performing these operations on the log file, we are now ready to copy the resulting userAgent to the ws->userAgent field of the workset and submit it to device detection, then finally to our string parser which will return “True” or “False” depending on the IsMobile flag. All that is left to do is increment the correct counters and repeat for each line of the log file.

All of the code presented here can be copied and pasted into the “TestFunctions.c” file provided with our Lite distribution, then recompiled either with your development environment of choice or with the 51Degrees batch file.