working interactively on a remote computer

In my software/hardware setup post, I talked a little bit about working on a remote machine.  As promised, here are the details about how I make interactive coding easy for me.

Let’s start from the very very beginning.  Our department has a pretty sweet set of really powerful computers (“the cluster”) available for us to use.  Because the computers are so awesome, they have to be kept in a room that is specially cooled and maintained, and they don’t have desktops that we can sit down and interact with.  As such, you need to use a different computer (e.g., a laptop) to remotely log in to the cluster and either (a) start an interactive session, in which you can type commands into the Linux shell, or open up an interactive version of (say) R or python and type commands there, or (b) submit a batch job or shell script to run without user interaction.
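For the curious, here is a rough sketch of what options (a) and (b) look like from a terminal.  It assumes plain ssh access and that R is installed on the cluster; the username, hostname, and script name are made up, and many clusters put a job scheduler in front of both steps, which isn't shown here.

```shell
#!/bin/sh
# Hypothetical login: replace with your own username and cluster hostname.
REMOTE="yourname@cluster.example.edu"

# (a) Interactive: log in and get a shell prompt on the cluster.
#     Once logged in, typing `R` starts an interactive R session there.
interactive_login() {
    ssh "$REMOTE"
}

# (b) Batch: run a script that already lives on the cluster, with no user
#     interaction. For a script named analysis.R, R CMD BATCH writes the
#     output to analysis.Rout on the cluster.
batch_run() {
    ssh "$REMOTE" "R CMD BATCH $1"
}
```

Nothing happens until you call one of the functions: interactive_login drops you at the cluster's prompt, while batch_run analysis.R runs the whole script and returns when it finishes.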

Batch jobs and scripts are pretty straightforward, so I’m not going to yammer on about that in this post.  But working interactively is a little trickier, mostly because it’s good practice (in the name of reproducible research, scientific integrity, and organization) to keep a record of the commands you run to get your results.  If you get results interactively on the remote machine, there’s not a built-in way to do this.  But never fear!  Software and shortcuts exist that allow you to save a script on your local computer, but run each line of that script interactively on the remote machine.  Since statisticians like me usually do most interactive work in R, I’ll describe here how I run a local R script interactively on the cluster.

I’m currently a Mac user, so my main tool for this purpose is Aquamacs.  This is basically a version of the Emacs text editor built for the Mac.  My opinion on Emacs is that it’s a really powerful tool, but requires a lot of customization to access all that power, and it has pretty funky keyboard shortcuts.  Aquamacs allows you to use either Emacs keyboard shortcuts OR common Mac keyboard shortcuts (e.g., command-Z for undo) in an Emacs session, which I find really useful.  Aquamacs makes use of ESS (Emacs Speaks Statistics) when interacting with R.

So let’s get to the point: here are the steps!

(1)  Install Aquamacs.

(2)  Open your local R script inside Aquamacs.

(3)  Type M-x shell (M is Emacs’s Meta key; pressing the escape key and then x works), which will basically open up a Terminal window inside Aquamacs.  (Once you hit M-x, you won’t be typing in the R script anymore; what you type will appear at the bottom of the window.)

(4)  In the Terminal window that just opened up, log in to the remote machine. (I’m making the assumption here that the login process to the remote machine involves some variant of an “ssh” command in the terminal.)

(5)  Click Window > Move tab to new frame.  The terminal window will slide over to the other side of your screen, so you’re now seeing the R script and the prompt of the remote machine simultaneously.

(6)  Start R on the remote machine.

(7)  Staying in the remote-machine-R window, type M-x ess-remote.  You’ll then be prompted for a dialect: type R and hit enter.  Once that’s done, the line “options(STERM='iESS')” will have been run inside your R session.

(8)  Move back to the local R script.  You can now run it line-by-line on the remote machine, using either control-c control-n (C-c C-n in Emacs notation) to run just the current line, or control-c control-r (C-c C-r) to run a block of highlighted lines.

This has worked pretty well for me, but I am definitely interested in hearing others’ ideas if someone knows of a more efficient way to do this from a Mac.  I use Aquamacs almost exclusively for this, which feels a little like using a sledgehammer for a tiny little nail, since I’m not really harnessing all the power of Aquamacs/Emacs or using it for any of its other intended purposes, and I haven’t put a lot of time into customizing it.  But it does get the job done, and it’s definitely better than the ol’ copy-paste trick.

I’m incredibly happy with my MacBook (it’s a delightfully fast, beautiful, efficient computer), but I really really miss Notepad++, the best text editor I’ve ever known – it’s Windows-only.  Running interactively on a PC is smoother than the Mac workflow I described above: it basically involves (1) logging into the remote machine using something like PuTTY and opening R, (2) opening up your local R script in Notepad++, and (3) hitting F9 to run a line or highlighted set of lines.  So much simpler and more elegant!  Something for Mac software developers to aspire to, I suppose…

my software/hardware setup

Inspired by the awesome Hilary Parker and the dawn of a new academic year, I’ve put together a rundown of tools I find essential in my day-to-day as a biostatistics graduate student.  None of this was formally taught to me – much has been recommended, learned on the fly, or found via the “just Google it” method – but I hope to inject some sense of coherence into the whole situation with this post.  We thought something like this would be especially useful for incoming students or anybody looking to change or optimize their setup.  So let’s begin!

Hardware

My personal computer is a 15″ MacBook Pro, which I got in October 2011.  I was hesitant to make the switch over to the Mac (I had owned only PCs before then), but I’ve never been happier with a laptop.  The work I do on a daily basis is much better streamlined on the Mac.  However, either platform works in our field, so I’ll be sure to note when a piece of software I discuss is Mac- or PC-specific.  The laptop is my main piece of hardware (not counting our departmental computing cluster, which I’ll mention later) – the only other thing I’d mention is my 300GB external hard drive, which I use to back up my computer with Time Machine.  Backups are absolutely essential – I choose to use an external drive, but backing things up in the cloud has become common practice.  I use Dropbox (you get 2GB for free) for backing up my most important files and for creating shared folders.  Other common cloud storage solutions are Amazon S3 and SugarSync.

Software

By far, my favorite piece of software is R – every statistician’s best friend.  It rocks.  In the genomics world, most R packages are published on Bioconductor.  The R GUI on the Mac is pretty awesome, so working with R and Bioconductor locally required almost no setup for me.

What was a bit more challenging was figuring out my R situation when working on our departmental computing cluster – i.e., when working on a remote machine that I’ve logged into from my laptop via ssh.  There are two pieces of software I’ve found really useful when working remotely: Cyberduck (for file transfers) and Aquamacs (for running code interactively from my machine to the cluster – Mac-specific).  I’m not fully convinced that Aquamacs is the best way to go for the interactive code – in fact, the thing I miss most about having a PC is the text editor Notepad++.  Notepad++ is a PC-specific editor that connects beautifully to R (with NppToR – just hit F8 to run a line in R locally!) or to an ssh client (just hit F9 to run a line remotely!).  However, I have a pretty good system worked out using Aquamacs and ESS – I’ll post the specifics in another post.  And, speaking of text editors – I’ve come to like TextWrangler (Mac-specific) quite a bit.

For typesetting anything with more than one equation in it, I (and most of the mathematical/statistical community) use LaTeX.  I use TeXShop as my frontend and MacTeX as my TeX distribution. This setup works like a dream on my Mac – it’s incredibly fast, and it took NO customization to get the two features that are really important to me: (1) automatic PDF refresh when you change your TeX code and (2) a backward search feature where I can click on the PDF and be taken directly to that point in the TeX code.  When I used a PC, I used TeXnicCenter as my frontend and MiKTeX as my TeX distribution, but I also found that I needed Sumatra (an alternative to Adobe for reading PDFs) and some extra customization to get my two required features.

I use PowerPoint for presentations containing zero or one equation(s), and I use Beamer (a LaTeX class) for anything with two or more equations.  I have PowerPoint 2008, which is pretty slow on a Mac, so I’ve been considering trying Keynote.  (Thoughts, anyone?).  I’ve also tried to get the best of both the PowerPoint and Beamer worlds (WYSIWYG + nice equations) by using LaTeXiT.  There’s a PC-equivalent called Aurora, which I used once for 30 days until my free trial expired.

Anything else?

That’s pretty much all I use on a daily basis.  I’ll mention a couple of other miscellaneous things:  I’m just starting to use GitHub to manage and share my code – git has great mechanisms for keeping track of all the craziness that comes with doing a shared project.  Lots of people in my department use Sweave, a cool way to integrate R and LaTeX.  I am not one of those people.  Sweave is especially good for putting together manuals, but not so good for working with analyses that take a while to run or that need to be very specifically formatted.
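If you haven’t used git before, here is a minimal sketch of putting a script under version control from the terminal.  The file name, commit message, and identity are all placeholders, and it works in a throwaway directory so nothing real gets touched.  Sharing the result on GitHub would additionally need a remote repository, which isn’t shown.

```shell
#!/bin/sh
# Sketch: record a (hypothetical) analysis script in a new git repository.
set -e

# Work in a scratch directory so this example does not touch real projects.
project_dir="$(mktemp -d)"
cd "$project_dir"

# A stand-in for a real analysis script.
printf 'x <- rnorm(10)\nsummary(x)\n' > analysis.R

# Create the repository and stage the script.
git init --quiet
git add analysis.R

# The identity is passed per-command only so the example runs on a fresh
# machine; normally you would configure user.name and user.email once.
git -c user.name="Example Student" -c user.email="student@example.edu" \
    commit --quiet -m "Add first version of analysis script"

# Show the recorded history, one commit per line.
git log --oneline
```

Each later change to analysis.R gets its own add-and-commit, which is what builds up the history that makes a shared project easier to keep track of.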

I’d love to hear about any setup tips that you find useful – do share!