Scraping (tabular data) from the internet has become fun with Power Query. The new add-in of Microsoft Excel facilitates data extraction from multiple, various data sources and enables data transformation of the extracted data. The transformed data can be loaded into an Excel data model (Power Pivot) and can be presented by Power View or Power Map in Microsoft Excel.
In my quest of scraping the internet with Power Query I decided to extract the biggest municipalities from the Statline website (Statistics Netherlands) and to extract the weather forecast from the Buienradar website for each extracted municipality. The gathered data is cleansed, transformed and presented through Power View reports.
Before informing you about the Power Query ETL process you find the end results below.
Translations to English: temperatuur = temperature, kans op = chance of, zon = sunshine, en = and, buien = rainfall, per = per, gemeente = municipality and tijd = time.
This report shows the average temperature, average chance of sunshine and rainfall for the biggest municipalities (population >= 150.000, date 31-12-2013) in the coming five days.
Translations to English: neerslag = rainfall, per = per, gemeente = municipality, mm = millimeters, in = a and datum = date.
This report shows the expected rainfall in millimeters for the biggest municipalities (population >= 150.000, date 31-12-2013) in the coming five days.
The Power View reports are refreshed by the underlying Power Query ETL process. This process consists of the steps shown below.
Each step has his own Power Query ETL script. I will explain the Power Query script, shown below, of the step ‘Extracting the Weather Forecast’.
In the Power Query script the function Nederland.Plaatsen is executed for each extracted municipality. The function generates a different url for each municipality and extracts the weather forecast in tabular form from the Buienradar website. The tables with the forecast information are combined into one data set. The final data set produced at the end of the last step is shown below.
The final data set is loaded into Power Pivot and used in the creation of the Power View reports.