Working on a remote server

Since figuring out how to install R on my Dreamhost VPS, I’ve started sending some computations to the server to run. My workflow will surely get honed as I do this more, but I wanted to share how it works for me so far.

The key element of my approach is SVN. If you don’t know what SVN is: it’s kind of like a cross between Time Machine and Google Docs for geeks, a way to back up your work and collaborate with others. If you do know what SVN is, you probably think I should use Git; I know, I know. At least I’m not using CVS.

I write code on my laptop, test it on data as I go, and check it in to my SVN repository (hosted, incidentally, at Assembla). When I’m ready to do something on the server, I create a directory on my VPS and, from that directory, do an SVN “checkout” of the project I’m working on, which grabs the code from my SVN repository and makes a copy of it on the server. I may also need to FTP some data to the server. I could check the data into SVN as well, and I may do that going forward, but because my SVN repository is somewhat limited I haven’t done so yet.
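If you want to script this setup step, here is a minimal Python sketch that only assembles the shell commands for the checkout and the data copy; it does not run anything. The ssh login, repository URL, directory and file names are hypothetical placeholders, and it uses `scp` in place of FTP as an assumption (scp reuses the same ssh login you already have on the server).

```python
import shlex


def checkout_commands(host, repo_url, remote_dir, data_files):
    """Build (but do not run) shell commands that set up a project on the server.

    The first command creates remote_dir over ssh, checks the SVN project out
    into it and makes a data/ folder; then there is one scp command per local
    data file. All arguments are hypothetical placeholders for your own setup.
    """
    d = shlex.quote(remote_dir)
    remote_setup = (
        f"mkdir -p {d} && cd {d} && "
        f"svn checkout {shlex.quote(repo_url)} . && mkdir -p data"
    )
    commands = [shlex.join(["ssh", host, remote_setup])]
    for path in data_files:
        # Copy each local data file into the checkout's data/ folder.
        commands.append(shlex.join(["scp", path, f"{host}:{remote_dir}/data/"]))
    return commands


for cmd in checkout_commands(
    "me@example.com",                           # hypothetical ssh login
    "https://svn.example.com/myproject/trunk",  # hypothetical repository URL
    "projects/myproject",                       # relative to the remote home dir
    ["mydata.csv"],                             # hypothetical local data file
):
    print(cmd)
```

Printing the commands instead of executing them keeps the sketch safe to try; once they look right you can paste them into a terminal or pass them to `subprocess.run(cmd, shell=True)`.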

At this point I have replicated a chunk of my laptop on the server: the code and data I need for my computations. Now I ssh into the server and run the code there as I would locally. When it’s done, I fetch the results back to my laptop for further processing.
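The run-and-fetch round trip can also be scripted. Below is a hypothetical Python helper that builds, without executing, an ssh command that starts an R script under `nohup` so the job keeps running after you log out, plus an `scp -r` command that pulls a results folder back afterwards. The script name `analysis.R`, the `results` folder, the `run.log` file and the use of `Rscript` are illustrative assumptions, not a description of my exact setup.

```python
import shlex


def run_and_fetch_commands(host, remote_dir, script, results_dir, local_dest):
    """Build (but do not run) two shell commands for a hypothetical setup.

    start: ssh in, cd to the project and launch an R script in the background.
    fetch: copy the results directory back to the local machine with scp.
    """
    # nohup + '&' keeps the job alive after the ssh session closes; redirecting
    # stdin, stdout and stderr means ssh is not left waiting on the job's output.
    remote_job = (
        f"cd {shlex.quote(remote_dir)} && "
        f"nohup Rscript {shlex.quote(script)} > run.log 2>&1 < /dev/null &"
    )
    start = shlex.join(["ssh", host, remote_job])
    # -r copies the whole results directory recursively.
    fetch = shlex.join(["scp", "-r", f"{host}:{remote_dir}/{results_dir}", local_dest])
    return start, fetch


start, fetch = run_and_fetch_commands(
    "me@example.com", "projects/myproject", "analysis.R", "results", "."
)
print(start)  # run this first, then check run.log on the server
print(fetch)  # run this once the job has finished
```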

In my current project I hadn’t planned to work this way, so I had written a lot of absolute paths (e.g. “~/data/X”) in my code. Because my server isn’t laid out the same way, I had to change many of those paths to relative ones, which is fine. In the future, though, I could set up my laptop and server directories to look more alike, so the transition would be more seamless. In the extreme, I guess you could check a whole directory of your hard drive (code, data, etc.) into SVN and thus have a complete copy of it both on your local machine and on the server. The only issue is how to deal with stuff you don’t want checked in: huge datasets, images and other output from code, and so on. I’ll keep honing it.
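One way to get that seamlessness without mirroring a whole hard drive is to build every path from a single project root. The minimal Python sketch below assumes a hypothetical environment variable `PROJECT_ROOT`, set to a different location on the laptop and on the server; if it is unset, paths fall back to the current working directory, which matches the relative-path fix described above.

```python
import os
from pathlib import Path


def project_path(*parts, env_var="PROJECT_ROOT"):
    """Resolve a path inside the project.

    The root comes from the (hypothetical) environment variable named by
    env_var, so the laptop and the server can keep the project in different
    places; if the variable is unset, the current directory is used instead.
    """
    root = os.environ.get(env_var)
    base = Path(root).expanduser() if root else Path.cwd()
    return base.joinpath(*parts)


# Instead of hard-coding "~/data/X", ask for the file relative to the project.
print(project_path("data", "X"))
```

The same idea carries over to R code with `Sys.getenv()` and `file.path()`.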

Until we’re all seamlessly in the cloud, I think this setup will make me more productive. I had had enough of trying to work while CPU-intensive jobs ran in the background, or of leaving my computer chugging along overnight.