Monday, May 23, 2011

Mercurial integration with SVN.

I've recently been evaluating how well SVN integrates with distributed source control systems, mainly Git and Mercurial. The idea is to work in a distributed system that integrates seamlessly with a central SVN repository (ours is very slow and painful to access over the internet). That brings a number of benefits, among them fast access to the commit history and cheap branches to experiment with.

At some point I had both Mercurial and Git installed, and decided it was time to do a bit of cleanup. I have a bias towards Mercurial because it integrates better with Windows Explorer through TortoiseHg (you can call that laziness).

After uninstalling everything I started clean and installed the latest TortoiseHg version (which, by the way, has a very nice GUI), as well as hgsubversion as explained here: Getting started with HGSubversion

Mercurial was working fine but I was running into the following error each time I tried to clone our repo:

hg clone svn+ssh://
destination directory: someotherpath
abort: Can't create tunnel: The system cannot find the file specified.

That error message drove me nuts for a while, and after trying a lot of things (uninstalling everything, installing Git again, checking library versions...), I stumbled upon the solution:

set SVN_SSH="C:\\Program Files\\PuTTY\\plink.exe"

As simple as that... Just define the SVN_SSH variable, which tells Subversion which SSH client to use for the svn+ssh tunnel, and it works flawlessly. After that I just defined it in my user environment to make it persist.

Hopefully this will help someone else.

Saturday, May 7, 2011

K-means clustering using R

I'm trying to learn R, and I'm a firm believer that there's no better way to learn than by getting your hands dirty. An excellent post on the Intelligent Trading blog got me thinking about how you would do a clustering analysis with R, using K-means.

In the rest of this post I will try to detail the different steps that I followed, in the hope that it can be useful to others. I will be using a couple of R packages, namely quantmod, fpc and a few others. The most crucial is quantmod. You can install it with:
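A minimal sketch of that step, assuming you install from CRAN. The other packages are added here for convenience; graphics already ships with R, so it needs no installation:

```r
# Install quantmod from CRAN.
install.packages("quantmod")

# The remaining packages used in this post can be installed the same way.
install.packages(c("fpc", "scatterplot3d", "gplots", "RColorBrewer"))

# Load quantmod for the current session.
library(quantmod)
```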

They are used for:

  • quantmod: this is your bread and butter to ease time series analysis
  • graphics, scatterplot3d, gplots and RColorBrewer: these are used for plotting
  • fpc: this package is dedicated to clustering

First, this article assumes that you already have your data handy in an xts object, as used by quantmod. If that's not the case, have a look at this article.

In the details below, "x" is the name of the object that contains my time series. Now let's dig into it.

The first thing you need to do is create a matrix with the different criteria you want to use for the clustering. In my case I'm going to use three normalized ratios, Close / Open, High / Open and Low / Open, and then stuff them into a matrix. Then you want to run the actual clustering and display it:
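Here is a rough sketch of those two steps, not necessarily the exact code used for this post. Several things are assumptions to adapt: the matrix name m, the choice of 5 clusters, the SPY ticker used as a fallback when x does not exist yet, and scatterplot3d for the display.

```r
library(quantmod)
library(scatterplot3d)

# Fallback only: fetch some OHLC data if x is not defined yet (ticker is an assumption).
if (!exists("x")) x <- getSymbols("SPY", auto.assign = FALSE)

# One row per bar, one column per ratio to the open price.
m <- cbind(
  CloseOpen = as.numeric(Cl(x) / Op(x)),
  HighOpen  = as.numeric(Hi(x) / Op(x)),
  LowOpen   = as.numeric(Lo(x) / Op(x))
)
m <- na.omit(m)  # drop rows with missing prices

set.seed(42)     # k-means starts from random centers, so fix the seed
fit <- kmeans(m, centers = 5, nstart = 10)  # 5 clusters is an arbitrary choice

# 3D scatter plot of the three ratios, one color per cluster.
scatterplot3d(m, color = fit$cluster, pch = 16, main = "K-means clusters")
```

The cluster assigned to each row is in fit$cluster and the cluster centers are in fit$centers.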

The result looks like:
This works pretty well, except that you have to specify the number of clusters you're looking for. Another way to do this is through the pamk function from the fpc package: you don't specify the actual number of clusters, only a range of values, and the function picks the best number within that range:
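A minimal sketch of the pamk variant. The matrix name m, the SPY fallback ticker and the 2:10 range are assumptions; only the zpamk name comes from this post.

```r
library(quantmod)
library(fpc)

# Fallback only: fetch some OHLC data if x is not defined yet (ticker is an assumption).
if (!exists("x")) x <- getSymbols("SPY", auto.assign = FALSE)

# Same three ratios as for kmeans, one row per bar.
m <- na.omit(cbind(
  CloseOpen = as.numeric(Cl(x) / Op(x)),
  HighOpen  = as.numeric(Hi(x) / Op(x)),
  LowOpen   = as.numeric(Lo(x) / Op(x))
))

# Let pamk choose the number of clusters between 2 and 10.
zpamk <- pamk(m, krange = 2:10)

zpamk$nc                          # number of clusters that was selected
zpamk$pamobject$medoids           # representative row of each cluster
head(zpamk$pamobject$clustering)  # cluster assigned to the first rows
```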

Note the difference with the kmeans method: here the cluster information is packed into a pam object. That's why you access the cluster details through zpamk$pamobject.


Friday, May 6, 2011

Thumbnails for Videos

I recently installed Zenphoto, a web gallery, to host photos and videos. It works perfectly to my taste except for a small detail. Video galleries load a Flash player named Flowplayer, which does the job nicely, but until you click on the actual video to make the player load, you can't see a thumbnail of the video.

The solution to have thumbnails for videos in Zenphoto, the same way you have for photos, is to place a JPEG with the same name next to the video. For instance, if you have a video named a.mp4, create a JPEG named a.jpg in the same folder as the video. This photo will be used as the thumbnail for your video.

It's fine to do a few manually... but call me lazy, I don't fancy the extra manual step each time I need to add a new video. So I created a little script to generate the thumbnails.

Before digging into the details, note that the script assumes a few things. First, it only works under Linux, so you must not be afraid of the command line. Secondly, it relies on ffmpeg to extract a frame automatically.

If you don't have ffmpeg and you're on Ubuntu or Debian, type:
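Assuming the package is simply called ffmpeg in your distribution's repositories (it may differ on older releases), that would be something like:

```shell
# Install ffmpeg only if it is not already available on the PATH.
command -v ffmpeg >/dev/null 2>&1 || sudo apt-get install ffmpeg
```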

Then go into the folder that contains the videos and type:

And paste the following content into the file:
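What follows is a minimal sketch of such a script rather than a definitive version. The function names, the list of video extensions and the choice of grabbing the frame one second into the video are all assumptions to adjust to your own gallery.

```shell
#!/bin/bash
# Create a JPEG thumbnail next to every video that does not have one yet.

# Map a video file name to its thumbnail name: a.mp4 -> a.jpg
thumb_path() {
    printf '%s.jpg\n' "${1%.*}"
}

# Generate the missing thumbnails for the videos in the given folder.
make_thumbnails() {
    local dir="${1:-.}" video thumb
    # The extension list is an assumption: adjust it to the formats you use.
    for video in "$dir"/*.mp4 "$dir"/*.flv "$dir"/*.mov; do
        [ -f "$video" ] || continue     # the pattern matched nothing
        thumb="$(thumb_path "$video")"
        [ -e "$thumb" ] && continue     # never overwrite an existing thumbnail
        # Extract a single frame, taken 1 second into the video.
        ffmpeg -nostdin -loglevel error -ss 1 -i "$video" -frames:v 1 "$thumb"
    done
}

# Process the folder given as first argument, or the current folder.
make_thumbnails "${1:-.}"
```

Saved under a hypothetical name such as make_thumbs.sh, it can be run with "bash make_thumbs.sh" from the video folder. Existing thumbnails are left alone, so it is safe to run again each time you add videos.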

Run the script and you're done.