Hello again! It is time for another blog post! This time I am not going to blog about something that I have learned on the job (although I did learn a lot that I could use in my job). Instead, I am going to blog about a little hobby project I did.
I am a bit of a gadget freak. I think every geek will recognize this somewhat. A little while ago I ran out of new gadgets, so I decided to buy an IP camera. These are really cool cameras, because they can be connected to a TCP/IP network and usually provide an HTTP API to control the camera and read images from it. The camera I bought is a FOSCAM I8910W, although the brand is not that important. I believe that the HTTP APIs may differ per brand, but once you know how to get an image stream, they mostly work the same.
After a bit of searching on the internet, I came across the following FOSCAM developer PDF: http://www.foscam.es/descarga/ipcam_cgi_sdk.pdf. This PDF tells me the different URLs and parameters to make a camera pan, and it tells me that the URL to capture a snapshot is something like http://<cameraadress>/snapshot.cgi with optional parameters for a username and password. This means that when I browse to that URL with an IP camera in my network, my browser will show me an image.
With this information we can basically create a video stream. We could use this URL to read images in a loop and display them to the user. This method is really simple and it allows the client application to control the refresh rate. The disadvantage is that every single frame costs a full HTTP request and response, which adds a lot of overhead. So browsing a little further in our developer PDF I came across the following API.
This tells me that there is a way to do one HTTP request and get an image stream back! This is really what we want. As you can see, there are also parameters for the resolution and the rate at which the camera sends us back images. For a video of 640 * 480 at 30 fps (my max rate) we would use the URL http://<cameraadress>/videostream.cgi?resolution=32&rate=0.
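To make the parameters a bit more tangible, here is a minimal sketch of how such a URL could be put together. It is written in Python just to keep it short (it is not the .Net code from this post), and the function name and the camera address in the usage line are made up for illustration. The values 32 and 0 are simply the ones mentioned above.

```python
from urllib.parse import urlencode, urlunsplit


def video_stream_url(camera_address: str, resolution: int = 32, rate: int = 0) -> str:
    """Build the videostream.cgi URL for a camera.

    The defaults are the values from the text: resolution=32 (640 * 480)
    and rate=0 (the maximum rate of my camera). Other values are listed
    in the developer PDF.
    """
    query = urlencode({"resolution": resolution, "rate": rate})
    # scheme, host, path, query, fragment
    return urlunsplit(("http", camera_address, "/videostream.cgi", query, ""))


if __name__ == "__main__":
    # 192.168.1.50 is a hypothetical address; use the one of your own camera.
    print(video_stream_url("192.168.1.50"))
```

Building the query with urlencode instead of string concatenation means that extra parameters get escaped properly.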
The only problem is: in which format do we get our response? Well, after a bit of research and monitoring with Fiddler I came to the conclusion that a lot of IP cameras use the same format. It is called an MJPEG (Motion JPEG) stream. Wikipedia has a great article about it: http://en.wikipedia.org/wiki/Motion_JPEG. After reading this you realize it is quite simple! MJPEG over HTTP is really an HTTP multipart response where each part is an image. According to the spec http://www.w3.org/Protocols/rfc1341/7_2_Multipart.html the HTTP response will look something like this (it is just HTTP multipart, only the parts are images):
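Since the exact bytes depend on the camera, here is a small made-up illustration rather than a capture from my camera. The boundary name myboundary, the header values and the fake image bytes are all assumptions. The sketch (in Python, to keep it short) only builds and prints the general shape of such a multipart body.

```python
# Build a tiny, made-up multipart body to show its general shape.
# BOUNDARY and the payloads are hypothetical; a real camera picks its own.
CRLF = b"\r\n"
BOUNDARY = b"myboundary"


def build_part(payload: bytes) -> bytes:
    """Return one part: boundary line, part headers, blank line, data."""
    header_lines = [
        b"--" + BOUNDARY,
        b"Content-Type: image/jpeg",
        b"Content-Length: " + str(len(payload)).encode("ascii"),
    ]
    # Header lines are separated by CR LF; CR LF CR LF ends the header section.
    return CRLF.join(header_lines) + CRLF + CRLF + payload + CRLF


def build_body(payloads) -> bytes:
    """Concatenate the parts the way a stream would deliver them."""
    return b"".join(build_part(payload) for payload in payloads)


if __name__ == "__main__":
    body = build_body([b"<jpeg bytes 1>", b"<jpeg bytes 2>"])
    print(body.decode("ascii"))
```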
As you can see, the body of the HTTP response will contain: one or more part headers -> part data -> boundary -> next part headers -> part data -> boundary, and this repeats. When I ask my camera for a video stream, it will repeat endlessly. Every header is separated by a carriage return character followed by a linefeed character. The part data is separated from the part headers by cr lf cr lf. Normally one would read the body of the HTTP stream till the boundary, parse the headers, read the part data till the next boundary, and repeat this in a loop. This means you would need to check constantly for the boundary while you are reading the part’s body. Luckily, my camera provides the length of each part in a Content-Length part header. So my algorithm becomes clear:
-Look for the Content-Length header
-Read the length of the image from the header
-Read exactly the number of bytes specified in the header from the part’s body data.
-Construct image from data
-Repeat this in a loop.
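The steps above can be sketched in a few lines. This is a Python sketch of the idea, not the .Net code of this post, and it makes a few assumptions: the stream is buffered (so read(n) returns n bytes unless the stream ends), every part carries a Content-Length header, and the function name read_parts is made up. Turning the bytes into an image is left to the caller.

```python
import io
import re

# Matches the Content-Length header of a part, case-insensitively.
CONTENT_LENGTH = re.compile(rb"Content-Length:\s*(\d+)", re.IGNORECASE)


def read_parts(stream):
    """Yield the body of each part, using its Content-Length header."""
    while True:
        header = b""
        # Collect bytes until the blank line (CR LF CR LF) that ends the headers.
        while not header.endswith(b"\r\n\r\n"):
            byte = stream.read(1)
            if not byte:
                return  # the stream ended between two parts
            header += byte
        match = CONTENT_LENGTH.search(header)
        if match is None:
            raise ValueError("part has no Content-Length header")
        length = int(match.group(1))
        body = stream.read(length)
        if len(body) < length:
            return  # the stream ended in the middle of a part
        yield body


if __name__ == "__main__":
    demo = io.BytesIO(
        b"--b\r\nContent-Length: 3\r\n\r\nAAA\r\n"
        b"--b\r\nContent-Length: 2\r\n\r\nBB\r\n"
    )
    for part in read_parts(demo):
        print(len(part), part)
```

Note that the boundary itself is never inspected here. The CR LF and boundary line that follow a body are simply swallowed as the start of the next header section.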
We can use this to write some .Net code. The first class I have defined is a MultiPartStream class. This class represents a multipart stream and has methods like NextPart(), which returns the bytes that belong to the next part in the stream. It has no notion of images; it is just an object-oriented multipart stream. Here is the code:
3://Specs say that the body of each part and its header are separated by two crlf's
68:// let's parse the header section for the content-length
69:int length = GetPartLength(headerSection);
70:// now let's get the image
On lines 4 to 7 you can see the private fields. First are the cr lf cr lf bytes that separate the headers of a part from the body of the part. Then there are the headerBytes. This is a buffer of 100 bytes into which I keep reading a portion of a part while looking for the cr lf cr lf bytes. If I see these bytes, the buffer contains the header of a part and I can use that header to extract the part’s length. The regex on line 6 is used to parse the header and extract the length. Maybe someone knows a better way to parse a string? The constructor wraps a BinaryReader around a BufferedStream to minimize the number of IO reads on the underlying transport. The BinaryReader also keeps reading until it has the number of bytes I ask for (or the stream ends), whereas a single read on the stream itself may return fewer bytes than requested.
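The point about short reads is easy to overlook, so here is a minimal sketch of it in Python (again, not the .Net code of this post). The helper read_exactly and the TrickleStream class are made up for the illustration. TrickleStream imitates a network stream that hands out only a few bytes per call.

```python
import io


def read_exactly(stream, count: int) -> bytes:
    """Keep reading until `count` bytes have arrived, or fail if the stream ends."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError(f"stream ended with {remaining} bytes still missing")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


class TrickleStream:
    """Hypothetical stream that returns at most `max_chunk` bytes per read."""

    def __init__(self, data: bytes, max_chunk: int = 3):
        self._inner = io.BytesIO(data)
        self._max_chunk = max_chunk

    def read(self, size: int) -> bytes:
        return self._inner.read(min(size, self._max_chunk))


if __name__ == "__main__":
    # A single read comes back short, the helper does not.
    print(TrickleStream(b"abcdefgh").read(8))
    print(read_exactly(TrickleStream(b"abcdefgh"), 8))
```

A truncated image would fail to decode or show up corrupted, which is why reading the full part length matters.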
Now let’s have a look at some of the methods. It all starts on line 62. This method returns a Task that will extract the next part in the stream on a background thread. It first reads the header section by calling ReadContentHeaderSection. This method is defined on line 35 and is the one that fills headerBytes until it comes across the cr lf cr lf. It does this by reading one byte at a time, and after each read it checks whether it has found the separator characters. After it finds those 4 characters, it converts the bytes in the buffer to a string. From this string, I can extract the Content-Length header with a regular expression, which tells me how many bytes to read for the rest of the part. I do this on line 15. After I get the part length, I can read the body of the part with the BinaryReader and return the bytes to the caller.
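As one possible alternative to the regular expression, the header section can also be split into name and value pairs. The following is a Python sketch of that idea, not what my .Net code does; the function names and the sample header values are made up.

```python
def parse_part_headers(header_section: bytes) -> dict:
    """Split a part's header section into a dict of lower-cased names to values."""
    headers = {}
    text = header_section.decode("ascii", errors="replace")
    for line in text.split("\r\n"):
        name, separator, value = line.partition(":")
        if separator:
            # Lines without a colon (blank lines and, normally, the boundary
            # line) are skipped.
            headers[name.strip().lower()] = value.strip()
    return headers


def part_length(header_section: bytes) -> int:
    """Return the Content-Length of a part; raises KeyError if it is missing."""
    return int(parse_part_headers(header_section)["content-length"])


if __name__ == "__main__":
    section = b"--myboundary\r\nContent-Type: image/jpeg\r\nContent-Length: 1234\r\n\r\n"
    print(parse_part_headers(section))
    print(part_length(section))
```

Lower-casing the names is a deliberate choice, because header names are case-insensitive and cameras do not all spell them the same way.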
The next class I am going to discuss is the AutomaticMultiPartReader. This class accepts our previously defined MultiPartStream and starts reading from it in a loop. Every time it gets a part, it raises an event containing the last part it got from our MultiPartStream. Here is the code for this class:
The most interesting method is the StartProcessing method, which starts on line 23. It starts a loop and then on line 28 calls the asynchronous NextPartAsync method. By using the await keyword and ConfigureAwait(false) I make sure that the part gets read without blocking the UI, and that the event also gets raised from a background thread. This means that when I use this class from a presentation technology like WPF or WinForms, I need to marshal the event back to the UI thread myself. I could also raise this event on the UI thread by not using ConfigureAwait, but I want my UI thread to stay as responsive as possible.
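The shape of such a read loop can be sketched with Python’s asyncio. This is only an analogy and not a translation of my .Net class: the class and method names are made up, next_part stands for any blocking function that returns the next part (or None when the stream ends), and, unlike my ConfigureAwait(false) version, the handlers here run on the event loop thread.

```python
import asyncio


class AutomaticPartReader:
    """Reads parts in a loop and hands each one to the subscribed handlers."""

    def __init__(self, next_part):
        # next_part: blocking callable returning bytes, or None at end of stream.
        self._next_part = next_part
        self._handlers = []

    def subscribe(self, handler):
        """Register a callable that receives the bytes of every part."""
        self._handlers.append(handler)

    async def start_processing(self):
        loop = asyncio.get_running_loop()
        while True:
            # The blocking read runs on a worker thread, so the event loop
            # stays free for other work while we wait for the camera.
            part = await loop.run_in_executor(None, self._next_part)
            if part is None:
                break
            for handler in self._handlers:
                handler(part)


if __name__ == "__main__":
    fake_parts = iter([b"frame 1", b"frame 2"])
    reader = AutomaticPartReader(lambda: next(fake_parts, None))
    reader.subscribe(lambda part: print("part ready:", part))
    asyncio.run(reader.start_processing())
```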
This takes us to the IpCamController class. This class is a bit UI specific, as it contains the logic to take the PartReady event and translate it to a BitmapImage on the UI thread. This class also has the logic to move the camera, etc. With a little bit of effort you could adjust it to also work with WinForms or Silverlight, for example. Below is its code.
First up is the constructor. I create a WebRequestHandler to give the HttpClient class the credentials of the user. Most IP cameras support Basic authentication, which is not really secure because the credentials are only base64 encoded, not encrypted. We can pass credentials for Basic authentication via the NetworkCredential class in .Net. I assign the base address so I can use relative addresses later on for the different camera commands. On line 19 I set the Timeout property to infinite. This has to be done because the response of the IP camera never ends: it keeps sending images. Without this, the HttpClient would time out because the response takes too long to arrive.
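To show why Basic authentication is not really secure, here is a small Python sketch of what ends up in the Authorization header. The user name admin and the password secret are made up, and the helper names are mine. The encoding is plain base64, so anyone who can see the request can turn it back into the password.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an Authorization header for Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth(header_value: str):
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


if __name__ == "__main__":
    header = basic_auth_header("admin", "secret")  # hypothetical credentials
    print(header)
    print(decode_basic_auth(header))
```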
The most interesting bit happens on line 81. Here it all starts: I make a request to the camera video URL and tell the HttpClient to return as soon as the headers are received, so I can use the headers to check for a multipart response type. If this is OK, I create a new AutomaticMultiPartReader, subscribe to its PartReady event, and tell it to start processing. This does not block the UI thread, because it mostly calls NextPartAsync, which does its work on a background thread. On line 29 we can see our event handler. All it does is wrap the bytes of the received part in a MemoryStream and use this stream to create a BitmapImage on the UI thread. It then raises an event to tell a subscriber that an image is ready. In WPF, BitmapImages are just like controls: they need to be used by the same thread that created them.
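The check on the response headers could look roughly like the following Python sketch, which leans on the standard email package to parse the Content-Type value. The function name is made up, and the header value in the usage line is an assumption: MJPEG streams are commonly served as multipart/x-mixed-replace, but the boundary name ipcamera is invented, so check what your own camera sends.

```python
from email.message import Message


def multipart_boundary(content_type: str):
    """Return the boundary of a multipart Content-Type value, or None.

    None is returned when the value is not a multipart type, or when it is
    multipart but carries no boundary parameter.
    """
    message = Message()
    message["Content-Type"] = content_type
    if message.get_content_maintype() != "multipart":
        return None
    return message.get_boundary()


if __name__ == "__main__":
    print(multipart_boundary("multipart/x-mixed-replace;boundary=ipcamera"))
    print(multipart_boundary("image/jpeg"))
```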
Now it is time to look at a simple client application.
When you have all the above plumbing ready, a client is really simple. The above client basically consists of an Image control and four buttons. Below is the code.
In the loading event I create a new IpCamController and pass it a URL and my credentials. I subscribe to its ImageReady event, and in that event I just assign the new image to the Image control. That’s it! All the pan methods are simple and are easily explained with the developer PDF. There is still a lot to improve in this app. I could make use of the MVVM pattern, for example, but for learning and demo purposes this will do. Also keep in mind that this multipart parser only works if a camera provides a Content-Length header per part. If a camera doesn’t, we need to keep reading each part until we come across the boundary. So this little demo app still has plenty of things to do to make it more robust.
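For cameras without a Content-Length header, scanning for the boundary could look roughly like this. It is a Python sketch that I have not tried against a real camera: the function name and the boundary b in the usage lines are made up, and it assumes that the boundary never occurs inside the image data (which is what a boundary is supposed to guarantee).

```python
def split_on_boundary(buffer: bytes, boundary: bytes):
    """Split buffered stream data into complete part bodies.

    Returns (bodies, tail). The tail starts at the last boundary seen and
    must be kept and prepended to the next chunk read from the stream.
    """
    delimiter = b"--" + boundary
    bodies = []
    start = buffer.find(delimiter)
    while start != -1:
        end = buffer.find(delimiter, start + len(delimiter))
        if end == -1:
            break  # the part after `start` has not arrived completely yet
        part = buffer[start + len(delimiter):end]
        header_end = part.find(b"\r\n\r\n")
        if header_end != -1:
            body = part[header_end + 4:]
            # The CR LF in front of the next boundary is not part of the data.
            if body.endswith(b"\r\n"):
                body = body[:-2]
            bodies.append(body)
        start = end
    tail = buffer[start:] if start != -1 else buffer
    return bodies, tail


if __name__ == "__main__":
    data = (
        b"--b\r\nContent-Type: image/jpeg\r\n\r\nAAA\r\n"
        b"--b\r\nContent-Type: image/jpeg\r\n\r\nBBB\r\n"
        b"--b\r\nCont"
    )
    print(split_on_boundary(data, b"b"))
```

This has to search every byte of every image, which is exactly the extra work that the Content-Length header saves us.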
That’s it for today! I hope this blog post inspires you to try some of these things for yourself and learn even more. Feel free to leave comments and suggestions!