    Test-driving AzureML: Machine Learning in the Cloud

    By Koos van Strien

    On June 16th, Microsoft announced its cloud-based Machine Learning solution AzureML as “a fully-managed cloud service for building predictive analytics solutions”. Last Friday, just before the Worldwide Partner Conference, AzureML went into public beta, which means it showed up on my Azure dashboard, leaving me no choice but to try it :-).

    [Screenshot: Create]

    After trying some examples and building a few experiments (‘experiment’ is the AzureML name for a Machine Learning solution) using the tutorials Microsoft has provided on the Azure documentation sites, my enthusiasm is growing. To provide an overview of the product, I’ve written a small review on the following topics:

    • User interaction (Ease of use, GUI)
    • Comprehensiveness of available modules
    • Integration of R
    • Opening up your experiments to the world

    User interaction (Ease of use, GUI)

    Machine Learning is an Azure Data Service like HDInsight, SQL Databases and Blob Storage. That means you can add it as a new service from within the Azure portal. After adding an ML Data Service (yes, you can add multiple), further development happens from within Azure ML Studio, which is a separate website running at https://studio.azureml.net. No local installs are needed and all development is done from within the browser.

    [Screenshot: ML Studio]

    This works out pretty well. In ML Studio, a Data Flow is constructed not unlike the way we’re used to in SQL Server Integration Services (SSIS). Adding uploaded Data Sets, ML Models or transformations is just a matter of dragging boxes (‘Modules’ in ML Studio terms) from the toolbox on the left to the working area. By connecting those boxes, the data flow is defined. It’s an intuitive way of handling data: without further training, I was immediately able to upload a dataset, split it into a training and testing set, and do some transformations on the data.

    For me, the main things missing so far were the ability to rename modules and the ability to attach always-visible labels to inputs, outputs or data flows. In the screenshot above, it’s hard to tell what the ‘Split’ module is splitting, and whether it’s a randomized split or not. Even knowing that the split is a randomized split of test and training data with a ratio of 3 : 1, I cannot see whether the left output of the Split module is putting out 75% or 25%.

    Comprehensiveness of available modules

    Adding new datasets is as easy as uploading a CSV file (or another format, if you prefer). The possibilities for defining how the data is structured inside the file are limited, however: besides some default implementations like ‘comma-separated’ or ‘tab-separated’, more advanced structures need to be pre-processed into simpler, supported structures before they can be used inside ML Studio.

    [Screenshot: Uploading]

    Once the data is loaded, a comprehensive toolbox is at hand to transform the data. To sample (pun intended) a few tools:

    • Convert between datasets (to TSV, CSV, ARFF, etc.)
    • Apply filters (median filters, moving average filters, threshold filters)
    • Manipulate datasets (join, adding / removing columns, scrub missing values)
    • Split and partition datasets
    • Clip and quantize numeric values

    On the analytic side, various functions are offered:

    • Text Analytics
    • Statistical functions
    • Feature Selection
    • Machine Learning
    • R Language Modules

    Skipping Text Analytics, Statistical functions and Feature Selection, we zoom in on Machine Learning. Three types of models are offered here: Classification, Clustering and Regression. Besides a collection of default implementations (each with parameters to tune), it’s also possible to create neural networks using a custom definition script, making it possible to process and analyse all kinds of non-default data like images.

    [Screenshot: Neural Network regression]

    Compared to the way Data Mining algorithms are offered in SQL Server Analysis Services, this really feels like a giant leap forward: the way algorithms can be implemented, tested and compared is not only much easier to use, but also leaves far more room to tune and customize them.

    Integration of R

    Particularly interesting is the possibility to integrate R scripts into your ML Experiment. When dragging the ‘Execute R Script’ module to the working canvas, you are provided with a module accepting two data inputs and an input for a zipped script bundle. You can also directly enter R scripts into the properties of the R module. On the output side, two outputs are offered. You can bind the input and output ports from within the R script. The number of inputs / outputs is fixed, but you don’t have to use them all.
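
    To give an idea of how the port binding works, here is a minimal sketch of what a script in the ‘Execute R Script’ module can look like. It assumes the maml helper functions that ML Studio makes available inside the module; the transformation itself is purely illustrative.

    ```r
    # Minimal sketch of an 'Execute R Script' body. The maml.* helpers are
    # what ML Studio provides inside this module for binding its ports.
    dataset1 <- maml.mapInputPort(1)      # first data input, as a data.frame
    # dataset2 <- maml.mapInputPort(2)    # the second data input is optional

    # Purely illustrative transformation: drop rows with missing values
    data.set <- dataset1[complete.cases(dataset1), ]

    # Bind the resulting data.frame to the module's first output port
    maml.mapOutputPort("data.set")
    ```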

    Curious to see which packages were installed, I tried to list all installed packages using the R command installed.packages(). It turns out there are around 14000 packages installed, so I will not include the complete list here. You can be pretty sure that the majority of your desired packages will be available, though. Funny detail: in the sandbox where R is executed, Microsoft has installed it into the directory ‘C:/Argghh’ :-).

    [Screenshot: C:/Argghh]

    Opening up your experiments to the world

    While developing a Machine Learning experiment, you can run the experiment from within the editor. This works neatly and gives fast feedback about your experiment. Several modules to compare Machine Learning methods are also available and quite easy to use. But all of this is still within the development stage of your Machine Learning experiment.

    In order to open up your experiments for use from within other solutions, Azure ML offers the possibility to publish any experiment as a Web Service, which can be called both on a per-case basis and in bulk.

    Preparing your experiment for deployment is quite easy: in an experiment where the model has been trained with development data, you can save the trained model. This generates a new module in your toolbox, which can directly be used in both existing and new experiments.

    [Screenshot: Trained Model]

    As a second step, you can define where in the data flow the data from the API flows in and out. When calling the experiment via the API, these entry and exit points are used instead of the manually uploaded datasets. Doing this is as easy as right-clicking an input and selecting ‘Set as Publish Input’.

    [Screenshot: Inputs and outputs]

    By using the ‘Publish Web Service’ button inside the editor, the solution is published to a ‘test’ environment. You can then easily test the API from your Azure environment. Administrators can make this web service publicly available.

    [Screenshot: Publish Web Service]
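
    As a rough sketch of what calling such a published web service from R could then look like: the endpoint URL, API key and request body below are placeholders and assumptions on my part; after publishing, ML Studio shows the exact request format and sample code on the web service’s API help page.

    ```r
    library(httr)

    # Placeholder values: ML Studio shows the real endpoint URL and API key
    # once the web service has been published.
    endpoint <- "https://<region>.services.azureml.net/<workspace>/<service>/execute?api-version=2.0"
    api_key  <- "<your-api-key>"

    # Hypothetical single-record scoring request; the column names follow from
    # the input that was marked as Publish Input. Depending on the service, the
    # generated sample code may include extra fields (e.g. GlobalParameters).
    request_body <- list(
      Inputs = list(
        input1 = list(
          ColumnNames = list("feature1", "feature2"),
          Values      = list(list(3.14, "some category"))
        )
      )
    )

    response <- POST(endpoint,
                     add_headers(Authorization = paste("Bearer", api_key)),
                     body = request_body,
                     encode = "json")

    content(response)   # the scored result, parsed from JSON
    ```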

    Summary and Conclusion

    Being a Business Intelligence professional, I have some experience with Data Mining, but I’m certainly not a Machine Learning expert. The way Machine Learning can be used inside data solutions is impressive. Particularly impressive is the ease of training models and re-using already-trained models.

    The way R is integrated in AzureML really stands out for me. I think this opens up a range of new applications for R. Where R is nowadays widely used in an experimental or ad-hoc setting, this paves the way for deployed, managed and re-usable solutions that can easily be integrated for both small and larger datasets.

    IMHO Microsoft has done a good job with Azure Machine Learning. The result is an easy-to-use solution that makes the operations performed on the data for analysis highly visible. The way in which these models can be deployed is a huge benefit, opening up ML solutions for use within enterprise solutions. Although there is surely room for improvement, for now this seems to be a solid platform which can easily be extended.
