Remote Data Access
pandas users can easily access thousands of panel data series from the World Bank's World Development Indicators by using the wb I/O functions.
Every World Bank indicator is accessible, whether you find it by exploring the World Bank site or by using the included search function.
For example, to compare Gross Domestic Product per capita in constant dollars across North America, you would first use the search function to find the relevant indicator.
Then you would use the download function to retrieve the data from the World Bank's servers.
The resulting dataset is a DataFrame with a hierarchical index, so it is easy to apply .groupby transformations to it.
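A minimal sketch of what such a search does, assuming it matches a case-insensitive regular expression against indicator names. The three-row table is a hypothetical stand-in for the cached World Bank indicator list (the IDs are real World Bank codes, the names are abbreviated); with the real module the call would look roughly like `wb.search('gdp.*capita.*const')`.

```python
import pandas as pd

# Hypothetical stand-in for the indicator list that the search function
# scans; the real list is fetched from the World Bank and is far longer.
indicators = pd.DataFrame({
    "id": ["NY.GDP.PCAP.KD", "NY.GDP.PCAP.CD", "SP.POP.TOTL"],
    "name": [
        "GDP per capita (constant US$)",
        "GDP per capita (current US$)",
        "Population, total",
    ],
})

# Apply the regular expression to the 'name' column, ignoring case.
pattern = "gdp.*capita.*const"
matches = indicators[indicators["name"].str.contains(pattern, case=False, regex=True)]
print(matches)
```

Only the constant-dollar series matches, which gives the indicator ID to pass to the download function.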
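A sketch of the download call, assuming the keyword arguments `indicator`, `country`, `start` and `end` of `download` in pandas 0.15's `pandas.io.wb` (the same functions now live in the separate pandas-datareader package as `pandas_datareader.wb`). The helper name `download_gdp_per_capita` is hypothetical, and the module is passed in as a parameter so the call can be exercised without network access.

```python
def download_gdp_per_capita(wb, countries=("US", "CA", "MX"), start=2005, end=2008):
    """Fetch GDP per capita in constant US$ for a few countries (hypothetical helper).

    `wb` is the World Bank I/O module, e.g. ``from pandas.io import wb`` on
    pandas 0.15; it is a parameter so the function can be tested offline.
    """
    return wb.download(
        indicator="NY.GDP.PCAP.KD",  # GDP per capita, constant US$
        country=list(countries),     # ISO country codes
        start=start,                 # first year requested
        end=end,                     # last year requested
    )
```

With network access and the real module, `dat = download_gdp_per_capita(wb)` would return the DataFrame discussed next.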
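The groupby step can be tried offline. The numbers below are made-up placeholders, not World Bank data; the two-level (country, year) index with one column per indicator is an assumption about the layout of a download result.

```python
import pandas as pd

# Placeholder values (NOT real World Bank figures) on a (country, year)
# MultiIndex, with one column named after the indicator ID.
index = pd.MultiIndex.from_product(
    [["Canada", "Mexico", "United States"], ["2008", "2007"]],
    names=["country", "year"],
)
dat = pd.DataFrame({"NY.GDP.PCAP.KD": [10.0, 12.0, 4.0, 6.0, 20.0, 22.0]}, index=index)

# Group on the first index level ("country") and average over the years.
country_means = dat["NY.GDP.PCAP.KD"].groupby(level=0).mean()
print(country_means)
```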
Now imagine you want to compare GDP to the share of people with cellphone contracts around the world.
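After a cellphone-related indicator has been located with the search function, both series can be requested in a single download call. In this sketch `IT.MOB.COV.ZS` is assumed to be a mobile-coverage indicator (confirm it with the search function before relying on it), `country="all"` is assumed to request every country, and the helper name is hypothetical. Rows missing either value are dropped, and the columns are renamed by indicator ID rather than by position.

```python
import pandas as pd

GDP_ID = "NY.GDP.PCAP.KD"   # GDP per capita, constant US$
CELL_ID = "IT.MOB.COV.ZS"   # assumed: share of population with mobile coverage

def download_gdp_and_cellphone(wb, year=2011):
    """Download both indicators for all countries in one year (hypothetical helper).

    `wb` is the World Bank I/O module, passed in so this can be tested offline.
    """
    dat = wb.download(indicator=[GDP_ID, CELL_ID], country="all", start=year, end=year)
    # Keep only countries that report both indicators.
    dat = dat.dropna()
    # Short names make the later regression formula easier to read.
    return dat.rename(columns={GDP_ID: "gdp", CELL_ID: "cellphone"})
```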
Notice that this second search was much faster than the first one because pandas now has a cached list of available data series.
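The speed-up comes from caching. The sketch below shows the general idea with `functools.lru_cache`; it is not pandas' implementation, and `indicator_list` and `search_indicators` are hypothetical names, with a two-row tuple standing in for the slow request that lists every series.

```python
import re
from functools import lru_cache

FETCH_COUNT = [0]  # counts how often the simulated slow request runs

@lru_cache(maxsize=None)
def indicator_list():
    """Hypothetical stand-in for the slow request listing every data series.

    lru_cache keeps the first result, so later calls skip the "request".
    """
    FETCH_COUNT[0] += 1
    return (
        ("NY.GDP.PCAP.KD", "GDP per capita (constant US$)"),
        ("SP.POP.TOTL", "Population, total"),
    )

def search_indicators(pattern):
    """Return (id, name) pairs whose name matches `pattern`, ignoring case."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [row for row in indicator_list() if regex.search(row[1])]
```

The first call to `search_indicators` pays for the fetch; every later call reuses the cached tuple.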
Finally, we use the statsmodels package to assess the relationship between our two variables using ordinary least squares regression. Unsurprisingly, populations in rich countries tend to use cellphones at a higher rate.
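The sketch below performs an ordinary least squares fit with NumPy's least-squares solver instead of statsmodels, so it runs without that dependency. Regressing `cellphone` on the logarithm of `gdp` is an assumed specification; with statsmodels the equivalent would be along the lines of `smf.ols("cellphone ~ np.log(gdp)", data=dat).fit()`, where `smf` is `statsmodels.formula.api`. The data are synthetic, built so that the true intercept is 10 and the true slope is 5.

```python
import numpy as np
import pandas as pd

# Synthetic placeholder data (NOT World Bank figures):
# cellphone = 10 + 5 * log(gdp), with no noise.
gdp = np.array([1_000.0, 5_000.0, 20_000.0, 60_000.0])
dat = pd.DataFrame({"gdp": gdp, "cellphone": 10 + 5 * np.log(gdp)})

# Design matrix: a column of ones for the intercept, then log(gdp).
X = np.column_stack([np.ones(len(dat)), np.log(dat["gdp"].to_numpy())])
y = dat["cellphone"].to_numpy()

# OLS: choose beta to minimize the squared residuals ||X @ beta - y||^2.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

A positive slope is what the claim about richer countries would look like in this specification; statsmodels additionally reports standard errors and a full summary table.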
New in version 0.15.1.
The country argument accepts a string or a list of mixed two- or three-character ISO country codes, as well as dynamic World Bank exceptions to the ISO standard.
For a list of the hard-coded country codes (used solely for error-handling logic), see pandas.io.wb.country_codes.
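Because the hard-coded list is used only for error handling, an unrecognized code produces a warning but is still sent. A minimal sketch of that idea, assuming a tiny hypothetical subset of ISO codes (the real list is `pandas.io.wb.country_codes`) and a hypothetical helper name:

```python
import warnings

# Tiny hypothetical subset of a hard-coded ISO list.
KNOWN_CODES = {"US", "USA", "CA", "CAN", "MX", "MEX"}

def check_country_codes(codes, known=KNOWN_CODES):
    """Warn about codes missing from the hard-coded list, but keep them all.

    Non-standard World Bank codes are passed through unchanged, so a valid
    World Bank exception still returns data even though it triggers a warning.
    """
    if isinstance(codes, str):
        codes = [codes]
    unknown = [c for c in codes if c.upper() not in known]
    for code in unknown:
        warnings.warn(f"Non-standard ISO country code: {code}")
    return list(codes), unknown
```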
Problematic Country Codes and Indicators
The World Bank's country list and indicators are dynamic. As of 0.15.1, wb.download() is more flexible; to achieve this, the warning and exception logic changed.
The World Bank converts some country codes in its responses, which makes error checking by pandas difficult. Retired indicators also persist in the search results.
Given the added flexibility in 0.15.1, users may need to improve their own error handling for edge cases.
To help identify issues:
There are at least 4 kinds of country codes:
- Standard (2- or 3-character ISO) – returns data; warns and raises errors correctly.
- Non-standard (WB Exceptions) – returns data, but will falsely warn.
- Blank – silently missing from the response.
- Bad – causes the entire response from WB to fail; always raises an exception.
There are at least 3 kinds of indicators:
- Current – returns data.
- Retired – appears in search results, yet won't return data.
- Bad – will not return data.
Use the errors argument to control warnings and exceptions. Setting errors to 'ignore' or 'warn' won't prevent failed responses (i.e., when 100% of the indicators are bad, or when there is a single Bad country code, #4 above).
See the docstrings for more information.
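Since errors='ignore' does not stop a response that fails outright, callers may want to catch the failure themselves. A minimal sketch with a hypothetical helper name; the exact exception type raised on a failed response is not assumed here, hence the broad `except` clause.

```python
import warnings

def download_or_none(wb, **kwargs):
    """Call wb.download with errors='ignore'; return None if the request fails.

    errors='ignore' only silences warnings about suspect codes. A response
    that fails outright (all indicators bad, or one Bad country code) still
    raises, so the exception is caught and reported as a warning instead.
    """
    params = dict(kwargs, errors="ignore")  # force errors='ignore'
    try:
        return wb.download(**params)
    except Exception as exc:  # broad on purpose: exception type is not assumed
        warnings.warn(f"World Bank request failed: {exc}")
        return None
```

A `None` result lets a batch job skip one bad request instead of aborting the whole run.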
The ga module provides a wrapper for the Google Analytics API to simplify retrieving traffic data. Result sets are parsed into a pandas DataFrame whose shape and data types are derived from the source table.
Configuring Access to Google Analytics
The first thing you need to do is set up access to the Google Analytics API. Follow the steps below:
- In the Google Developers Console:
  - enable the Analytics API
  - create a new project
  - create a new Client ID for an Installed Application (in the APIs & auth / Credentials section of the newly created project)
  - download it (JSON file)
- On your machine:
  - rename it to client_secrets.json
  - move it to the pandas/io module directory
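The last two steps can be scripted. A minimal sketch with a hypothetical helper name; it assumes the pandas/io module directory can be located from `pandas.io.__file__`, and the download path in the usage comment is hypothetical.

```python
import shutil
from pathlib import Path

def install_client_secrets(downloaded_json, io_dir):
    """Rename and move the downloaded credentials into io_dir as client_secrets.json."""
    target = Path(io_dir) / "client_secrets.json"
    shutil.move(str(downloaded_json), str(target))  # rename and move in one step
    return target

# Usage, assuming pandas is installed (the downloaded file path is hypothetical):
# import pandas.io
# io_dir = Path(pandas.io.__file__).parent
# install_client_secrets(Path.home() / "Downloads" / "client_secret.json", io_dir)
```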
The first time you use the read_ga() function, a browser window will open asking you to authenticate with the Google API. Proceed with the authentication.
Using the Google Analytics API
Suppose you want to fetch users and pageviews (metrics) per day of the week for the first half of 2014 from a particular property.
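A sketch of such a query. It assumes `read_ga` from pandas' ga module (`pandas.io.ga` in the pandas versions this page describes) accepts the keyword arguments named in this section; the account, profile and property IDs are placeholders, and the helper name is hypothetical. `read_ga` is passed in so the call can be checked without Google credentials. The filter string is written in Core Reporting API syntax with explicit `ga:` prefixes, since whether the module adds them to filters is not assumed.

```python
def fetch_weekday_traffic(read_ga):
    """Users and pageviews per day of the week, January to June 2014 (hypothetical helper)."""
    return read_ga(
        account_id="YOUR_ACCOUNT_ID",    # placeholder: your GA account
        profile_id="YOUR_PROFILE_ID",    # placeholder: the view (profile)
        property_id="UA-XXXXXXX-X",      # placeholder: the web property
        metrics=["users", "pageviews"],
        dimensions=["dayOfWeek"],
        start_date="2014-01-01",
        end_date="2014-06-30",
        index_col=0,                     # use the first dimension (dayOfWeek) as the index
        filters="ga:pagePath=~aboutus;ga:country==France",
    )
```

With credentials configured, `df = fetch_weekday_traffic(ga.read_ga)` (after `import pandas.io.ga as ga`) would run the query.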
The only mandatory arguments are metrics, dimensions and start_date. We strongly recommend that you always specify the account_id, profile_id and property_id to avoid accessing the wrong data bucket in Google Analytics.
The index_col argument specifies which dimension(s) to use as the index.
The filters argument specifies the filtering to apply to the query. For example, a filter can require that the page URL contain aboutus AND that the visitor's country be France.
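In the Google Analytics Core Reporting API filter syntax, `;` combines conditions with AND, `,` combines them with OR, `=~` is a regular-expression match and `==` is an exact match. A small sketch (the helper name is hypothetical) that builds the AND filter for the example:

```python
def and_filters(*conditions):
    """Join Google Analytics filter expressions with AND (';')."""
    return ";".join(conditions)

page_filter = "ga:pagePath=~aboutus"   # page path matches the regex 'aboutus'
country_filter = "ga:country==France"  # visitor country is exactly 'France'
filters = and_filters(page_filter, country_filter)
print(filters)
```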
Detailed information is available in the following: