alyssafrazee.com is live!

My academic website (biostat.jhsph.edu/~afrazee) and this blog have been merged into one fantastic nerd site, alyssafrazee.com, as of today!  

If you subscribe to this blog with RSS, you can just delete that feed and add alyssafrazee.com to your reader instead (it has RSS enabled, and all the wordpress posts were ported over).  New posts will go on the new site from now on, but I’ll keep this site live so I won’t break old links. 

Enjoy! 

links/cheatsheets galore

Here are some great pages I’ve found myself referring to a lot in the past few weeks, or pages I want to remember (without bookmarking, since I never look at my bookmarks).  I’m collecting them here so I can have a somewhat reasonable number of tabs open in my browser at one time.  (Things are getting insane at the moment).  They’ve helped me, so maybe they’ll help you too.

  • Markdown basics – useful for writing README.md files for github repos, among many other things
  • Mac keyboard shortcut symbols – I never know what those crazy characters mean.  This is what they mean.
  • Chrome keyboard shortcuts – I have now successfully been able to navigate between Sublime, Terminal, and two separate Chrome tabs without using my mouse.  AWESOME. (The above link is for Mac; this is the link for Windows).
  • A good git workflow – I haven’t read this through but would really like to, since I need a more structured way of coding, committing, and pushing to github
  • Homebrew – easy software installation for OS X!  yeah!  I finally got wget properly installed thanks to this magical world.  And if something fails, you can start with “brew doctor” as a way to find the problem.  
  • I installed Sublime Text 2 the other day (a really beautiful/powerful text editor), and then I decided to try out SublimeREPL for running code right there in Sublime.  I didn’t want to lose the link to the SublimeREPL documentation, so it’s here.  So far I think SublimeREPL is okay, but there are a few things I don’t like, or maybe just haven’t figured out yet: (1) I’ve had problems importing my libraries (i.e., if I’m in a folder with a script thelibrary.py, and I type import thelibrary.py, it complains that it can’t find thelibrary).  (2) I haven’t been able to ctrl-C or ctrl-D to escape functions while they’re being evaluated.  (3) sometimes the syntax highlighting in the REPL is annoying – e.g., there will be big strips of bright pink at the end of certain lines.
  • An intro to HTTP – I’d love to read this soon, since I’ve started to do a bit of web development
  • Flask mega tutorial – I’m working through this right now
  • statuscode – news for programmers!
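
On the SublimeREPL import problem in the list above, here is a minimal sketch of one possible cause. It assumes (a guess, not something confirmed) that the REPL was started in a different working directory than the script. Python imports a module by its name without the .py suffix, and it only searches the directories listed in sys.path; writing "import thelibrary.py" asks for a submodule called "py" inside a package called "thelibrary". The helper import_from_dir and the demo file below are hypothetical names for illustration.

```python
import importlib
import os
import sys
import tempfile


def import_from_dir(module_name, directory):
    """Import `module_name` (no .py suffix) from `directory`.

    Hypothetical helper: it puts the directory on sys.path so the
    normal import machinery can find the file.
    """
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    # Make sure the import system notices files created after startup.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Demo: create a throwaway thelibrary.py in a temporary folder.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "thelibrary.py"), "w") as f:
    f.write("def greet():\n    return 'hello from thelibrary'\n")

# Import by module name, not by file name.
thelibrary = import_from_dir("thelibrary", demo_dir)
print(thelibrary.greet())
```

In an interactive session the same idea is two lines: sys.path.insert(0, "/path/to/folder") followed by import thelibrary, where the path is whatever folder holds the script.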

more to come!

hacker school: day 11

Today is the 11th official day of my summer at Hacker School.  It. is. awesome.  By day, I get to surround myself with great people and work on whatever project I want.  Seriously, no rules.  It’s been so easy to get in the zone: I start coding and lose track of time, in the best way.  And if I get sick of coding, there are other great things to do, like help other people with their code, go to little student-organized working groups, get a book from the Hacker School library and do some reading, or pair with another person on his or her project.  Or I can take my code to one of the eight gazillion experts here and have them review it – I’ve done this a couple of times and I already feel like I know how to write neater code than I wrote a month ago.  I really love it here – it’s so awesome to go to “work” in the morning and be able to work on whatever whims and fancies occur to me, as long as they are sort of programming-related.  (And honestly, a lot of my whims and fancies are programming-related.  Nerd alert.)

And by night, I get to explore New York City!!!!!!  More on that later.

For these first few weeks, I’ve been having a blast coding up my favorite game, 500, in both R and python, which is a surprisingly great exercise.  It lends itself well to object-oriented stuff, and the logic is totally sensible but complex enough to make the implementation a fun project.  I’ve also been working through the Matasano Crypto Challenges, because cryptography is awesome, and because it’s nice to have little self-contained problems to work on, and because it’s another fun way to improve my python skillz.  Also, they’re based on real-world security problems, so the application is interesting and relevant.  I’m almost through number 6 (of 48…).  

I’m storing all of my Hacker School code and notes in my hackerschool github repo.  You can source and play my 500 game, but sadly I can’t post my solutions to the crypto challenges (it’s against their rules).  To make up for it, I’ve been keeping a little journal of sorts about my experiences with them in the README.md file :) 

It’s already day 11 and I don’t have a Big Summer Project With A Deliverable that I’m working on at the moment.  However, I think I would like to study a bit of web development – I’m excited about making software that non-programmers/non-statisticians can use, and that definitely involves making some kind of user interface.  I’ve dipped my pinky toe in these waters with things like shiny (for R) and d3 (for JavaScript, which I don’t know), but I’ve never actually jumped in and made anything with these, and I’d like to figure out how to make things that aren’t necessarily focused on data analysis.  To be clear, I think data analysis is awesome and I want to be some kind of data scientist someday, but there are so many other cool tools to be made.  People casually throw out software ideas all the time:  “I wish I could block certain websites during my study times”, “I wish there were a better to-do-list app than Google Tasks, I want to categorize my to-do list”,  “I wish I could automate the process of checking whether my dissertation committee follows the school’s rules”, “I wish there were software that would turn people’s crappy slides into good slides”, “I wish there were a bot that would search Craigslist for me and email me with promising ads” – people have mentioned all these ideas to me in the past few months.  The thing about being at Hacker School is that you start to believe anything’s possible…all I’d need is a little bit of ______, and I could definitely make that with a [language I already know] backbone.  I’ve been feeling like “all I’d need is some practice doing web development, and I could definitely make some of this software with R/python/HTML/CSS.”  So we’ll see what happens!  Stay tuned for Alyssa’s Awesome Web App.  (Now I’m accountable.)

Also, New York.  NEW YORK IS SO COOL.  For reference, I’m living in central Brooklyn and working in lower Manhattan.  I’ve been here for just under 3 weeks, and I haven’t been above 34th Street (!) so I have a LOT left to explore.  But based on these three weeks, here are my New York thoughts.

THINGS I LIKE ABOUT NYC:

  • amazing food
  • bagel sandwiches
  • parks everywhere
  • you can totally eat on the subway
  • sublime coffee and espresso
  • I can ride a subway “home” from “work” that goes over the East River (above ground) and watch Manhattan twinkle in the twilight
  • sweet concerts/events/secret science club meetings happen all the time
  • you can get vanilla malted ice cream in a pretzel cone.  no biggie.
  • etc. (to be continued)

THINGS I DO NOT LIKE ABOUT NYC

  • the Yankees
  • smells vaguely of trash sometimes
  • a guy looked me in the eye and threw his nasty garbage in my shopping bag in the subway station last week.  what is that about?!

And finally, some pictures! 

One time we went to Washington Square Park after work to eat ice cream, and it looked like this:

Image

 

Another time, my friend Mandy came to visit – it was a beautiful day, and we went to the Hudson River Park!

Image

Image

 

The tallest shiniest building in that last picture is the new World Trade Center – one of four planned.

Mandy and I also went to the High Line, which we really enjoyed – the High Line is this park that’s built on an old railroad track that runs about a story above the streets of NYC, over neighborhoods like Chelsea and the Meatpacking District.  She insisted on taking this picture of me while we were having dinner there:

Image

And here is a picture of the street I live on.  I don’t live in one of these (I’m a few blocks over, where the brownstones are less brown), but I enjoyed the picturesque-ness of this area.  Plus it was a great opportunity to use instagram’s vintage-y filters.  So artsy ;) 

Image

stay tuned for updates!

 

summer 2013: Hacker School!

Hello from New York!

Tomorrow morning, my summer adventure begins: I’m a student in the summer 2013 batch of Hacker School!  Basically, Hacker School is three months of learning about programming, in a bring-your-own-project-ideas-or-enthusiasm, collaborative, friendly, flexible, awesome environment.  I haven’t actually been yet, so that’s my one-sentence interpretation of what it is based on the website, and from my three Skype sessions with facilitators (so definitely check out the website if you want to know more).

I’m really excited about this, for many reasons.

  1.  I’m guessing I’ll meet some amazing, smart, friendly, enthusiastic people who like learning stuff, which I look forward to because I’m pretty passionate about learning stuff too, and I really like meeting people who share that passion.  
  2. I really like coding/programming, but rarely have the time during research to think very hard about the best way to program my methods – it seems like getting the results quickly is always more important than getting the results in the optimal, most organized, most efficient way.  I figure that practicing programming will make it easier to write good code faster.  This is becoming a huge challenge for statisticians, since new statistical methods are being deemed useless without good software – so I’m excited to really dig in to coding!
  3. Some pretty famous people are going to be at Hacker School…I’m a little star struck.  I’m beyond excited to meet them, and also to meet my fellow students and the facilitators.  (I just like meeting people!!!)
  4. I get to live in New York for three months.  So pumped.  I know, I know, there’s no air conditioning, washers/dryers, or space in any of the apartments I can afford (I mean seriously, the shower in my current place is shaped like a triangle, and my roommates tell me it’s totally normal if the curtain just falls down after you shower…) but the food is delicious, the public transit is so good, and it’s very leafy.  

I’ve been interested in learning JavaScript for a while, mainly because of a pretty neat tool that some of my friends/colleagues developed for interactive data visualization, using d3. So, one of the Hacker School facilitators recommended starting by reading the book Eloquent JavaScript.  It’s free and it’s actually delightful to read (I’ve chuckled out loud several times in the first few chapters).  So that’s where I’ll be starting tomorrow!

Keep an eye out for more Hacker School/New York musings this summer!  AND ALSO, if you have ideas for useful software projects, I (and I suspect my fellow Hacker Schoolers) would love to hear them.

Ideas for Super Awesome Conferences

At this time last week, I was experiencing my first ENAR!  Overall I had a great time – met some cool people, went to some cool sessions, hung out by the pool and went to Epcot, etc.  But as we were hanging out over the course of the conference, I found myself in more than one discussion with friends about how stat conferences could be so much more awesome.  I thought to myself last Monday: “Alyssa!  You haven’t written in your blog since Election Day.  ‘Ideas for Super Awesome Conferences’ would be a fabulous post!”  And then I was beaten to the punch by Yihui’s post on conferences.  Read it, it’s good.  But I have some different (and sometimes conflicting) ideas, so I decided to write “Ideas for Super Awesome Conferences” anyway.

DISCLAIMER:  I’m no expert.  I’ve only been to two big stat conferences (ENAR 2013 and JSM 2012 in San Diego).  I’m a TOTAL CONFERENCE NEWBIE (since I’m a third-year student).  Also, I have never planned a conference, so I most likely have no idea what goes into such a huge endeavor.  Bearing this in mind…

Ideas for Conference Organizers:

(1) Choose a good venue.  This is probably the first thing that happens after the city is chosen, so I’m sure a lot of thought goes into this, but the venue at ENAR left a lot to be desired.  For one thing, the conference was held in a giant conference-center/hotel megacomplex with pretty much nothing outside the complex.  This meant:

  • there weren’t really any options for food except for the megacomplex restaurants, which all served the same food and charged $14 for chicken fingers (which is either a totally ridiculous thing to submit for reimbursement, if you’re lucky enough to have that, or a totally outlandish way to spend your graduate student stipend)
  • There was one convenient hotel choice, so if you missed the February 15 deadline for booking a room, wanted a cheaper option, or wanted to rack up your hotel rewards points for some other chain, or anything like that, you had to stay in a hotel at least a few non-walkable miles from the conference center.  The solution to this was I guess to just stay in the conference hotel (which is where I was, and it was convenient and nice), but then I realized on Tuesday that I hadn’t been outside in three days, and that made me sad.
  • Evening activities were limited to whatever was in the megacomplex, unless you wanted to spring for a several-mile cab ride.  (We did get bused to Epcot, which totally rocked except for the fact that we only got to stay for a couple of hours, most of which were spent eating dinner.)

I also found it somewhat hilarious that the conference material plugged the venue by saying that this conference center has the “largest pillar-free resort ballroom” in the country (which is oddly specific, but whatever) – and then saying that ENAR wouldn’t actually be using this ballroom but that we were free to go take a look at it!  Okay…

Anyway, my ideal venue would be located within walking distance of several hotels, restaurants, bars, convenience or grocery stores, etc – the key to me is having options.  I think this is where Yihui and I diverge a little bit, because I don’t think a college campus like Iowa State would be a good place to have a conference at all.  For one thing, most college dorms are occupied with college kids during the school year, so the whole dorm option seems unlikely, and if you’re in a small-ish town with limited hotel space (like Ames, or like the small town I went to college in), it just won’t be possible to fit all the participants in. For another, I don’t think forced interaction between participants because there’s literally nothing else to do is very healthy – I’ve found that having somewhere to go (an interesting restaurant or bar) is a fun way to socialize/network with people you’ve just met.  It takes away some of the tension.  Also, Ames is 41 miles from the nearest major airport – I know that’s specific to Iowa State, but it’s true for a lot of more rural universities, and I think it would be a major inconvenience.  For an example of a venue I really liked, take last summer’s JSM in San Diego – nice conference center, many hotel choices, good food around, the beach was accessible by public transit…

Also, the venue should have good wifi.  Accessible within the entire conference center, including the session rooms (in case presenters need to upload their slides from Dropbox because their favorite flashdrives mysteriously went missing the day before they needed to leave and their backup flashdrives suddenly decide to be “malformed” or whatever.  Not that this happened to anyone I know…).  The cost of such wifi could probably be included in the registration fee.

(2)  Have a conference app!  It’s 2013.  We work in a tech-y field.  Lots of people have smartphones and would use the app.  It would include most of what is in the printed program (which would solve the excessive program-printing problem) in addition to a scheduler – so you could plan out the talks/sessions you want to go to that day (thanks to my friend John for this idea, which I LOVE).  It could also include things like a live twitter feed, an “announcement area” for room changes and lost-and-found issues and whatnot, and some of the social networking features that Yihui was talking about (e.g., search for participants by name or by university, so you could see whether anyone you know from years ago will be coming).  This app could probably get made cheaply and quickly if it were a contest (in a word: crowdsource!)

(3) Like Yihui said, have the nametags be printed front and back.  Such a good, simple idea.  Also, potentially ask for first and last names in separate boxes on the registration form…my friend got a printed badge that said her last name in giant letters instead of her first name, which was kind of hilarious and also kind of unfortunate.

(4)  Think carefully about placement of the poster session, because it’s often hard to get people to go to that part of the conference.  I had a couple discussions about this.  I tend to agree with Karl, who suggested having it as the only thing happening during an afternoon session in the middle of the conference.  This way: everybody’s around.  They’ve all arrived and they aren’t leaving yet.  (The one at this conference was Sunday night – the first day – so a lot of people weren’t there yet).  Nobody has another session they really wanted to be at that conflicts with it.  I talked a little more about it with Tom, a conference vet who says that people still don’t really go if it’s scheduled like that.  All I’ll say is this:  even if it’s in the afternoon, have food and drinks.  Kind of a happy hour situation.  Maybe not open bar, but cash bar would be awesome (though a beer should cost less than $11…).  And advertise the happy hour nature of the session.

 

Now, the responsibility for making a conference awesome does not fall only on the shoulders of the organizers….

Ideas for Conference Participants

I really only have one, and it’s this.  Give a great talk.  I know not everyone likes giving talks, but either someone thought you’d do a good job and that you have interesting work (invited sessions) or you thought your work was worth sharing (contributed sessions), so in either case, you kind of owe it to the attendees to at least try to engage them in your research.  The ability to clearly communicate your research is an integral part of being a good scientist, I think, so it’s part of the job, and conferences are a good way to practice this.  ENAR actually puts out a list of guidelines for giving an effective presentation!  The most engaging talks I’ve seen have these qualities:

  • They stay in the time limit comfortably, so the presenter isn’t completely rushed.  The 1-slide-per-minute guideline seems to work surprisingly well for me.  (This means that if I see that “1/40” footer on Beamer slides at the beginning of your 15-minute talk, I will inwardly shed a tiny tear)
  • They explain the big picture well, but might leave out some details.  I think this is awesome because then if people are intrigued by the general idea, they’ll ask the speaker afterward about the details.  If people are confused about the concept, they kind of just forget about it and move on to listening to the next talk.  Leaving out details is a great way to stay within the time limit.
  • They practice their talk at least once.  Maybe they even invite someone to listen – someone who knows stuff, but doesn’t know about their research.  Then they’ll be able to tell whether their talk is appropriately timed, or whether they’ve left out too many details. (This is actually just what I try to do – but I like to imagine that I’m not a crazy person and that my favorite speakers also practice their talks.)
  • They use slides to remind them what to talk about rather than to tell them.  Which means they don’t read slides full of text as they point at the words they are reading with a laser pointer.  The “reminder slides” are usually diagrams or pictures.
  • They tell a joke or two (Yihui mentioned this in his post and actually told jokes at his talk, which I really appreciated.  My favorite quote of his was something along the lines of “I included some funny stuff in here.  I’m sorry if it offends you.  You may think this is not ENAR.”  Ha!  But the larger point is that having a joke in your presentation absolutely should not make it un-ENAR-like!  That’s just sad!  Statistics is fun!  We should be able to laugh about it!)

I’d be super pumped to go to a stat conference in a cool city with a sweet app, a happy hour poster session, and some really awesome talks – keeping my fingers crossed!

 

Merry Statistician’s Christmas!

My advisor told me today, in a fit of excitement and enthusiasm, that “the real winners in this election were statisticians.”  (He later went on to joyfully declare, “Man, I LOVE being a statistician!”  The guy is living the dream.)  According to him, today is statistician’s Christmas.  What a great metaphor!  Check it out: all year long, we predict and model and analyze and see if we can figure out what will happen.  Everything goes down on Tuesday night, and we wake up Wednesday morning to the best gift of all: the real data.  And this lets us figure out how well our models and predictions and analyses did.

This year, the stat nerds did awesome.  Nate Silver (of fivethirtyeight) is Mr. Popular in geek circles right now.  Should Florida officially go Obama, he will have predicted every state’s vote correctly.  That’s really hard to do, since so many factors influence how people vote and not all of them are measurable, but man, he and some other awesome people rocked it.  Check out this amazing Twitter hashtag it spawned. 

It’s also kinda cool to look at the data visualization stuff that comes after elections.  I “watched” a lot of the election coverage on Huffpost – their state-by-state graphics were really nice.  I liked the interactive parts – e.g., how hovering over a dot on the county scatterplot told you which county it was.  On another note, the main way we visualize US election data is with the results map, where states are colored red if they went Republican and blue if they went Democrat.  This caused my roommate (who isn’t from the US) to comment “That map is looking pretty red…Romney must be winning?”  And that’s the problem with the election map – big, unpopulated, 3-vote states are usually red.  NPR did some fun stuff that addressed this issue by making the sizes and colors of states on the map a bit more informative.
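
To put numbers on the map problem, here is a tiny sketch. The three "states" and all of their areas and vote counts are made up for illustration (they are not real election data). It compares each color's share of the map area with its share of the electoral votes.

```python
# Hypothetical states: areas (square miles) and electoral votes are invented.
states = [
    {"name": "Big Empty", "area": 100_000, "votes": 3, "color": "red"},
    {"name": "Wide Open", "area": 80_000, "votes": 3, "color": "red"},
    {"name": "Small Dense", "area": 10_000, "votes": 14, "color": "blue"},
]


def share_by(states, key):
    """Return each color's fraction of the total for `key` (area or votes)."""
    totals = {}
    for state in states:
        totals[state["color"]] = totals.get(state["color"], 0) + state[key]
    grand_total = sum(totals.values())
    return {color: value / grand_total for color, value in totals.items()}


area_share = share_by(states, "area")   # what the eye sees on the map
vote_share = share_by(states, "votes")  # what actually decides the election

for color in ("red", "blue"):
    print(f"{color}: {area_share[color]:.0%} of the map, "
          f"{vote_share[color]:.0%} of the electoral votes")
```

With these invented numbers the map is almost entirely red while blue holds most of the votes, which is the mismatch that resized or reweighted maps try to fix.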

So, merry stat-mas to all!  Enjoy the peace and quiet of the election being over.

working interactively on a remote computer

In my software/hardware setup post, I talked a little bit about working on a remote machine.  As promised, here are the details about how I make interactive coding easy for me.

Let’s start from the very very beginning.  Our department has a pretty sweet set of really powerful computers (“the cluster”) available for us to use.  Because the computers are so awesome, they have to be kept in a room that is specially cooled and maintained, and they don’t have desktops that we can sit down and interact with.  As such, you need to use a different computer (i.e., a laptop) to remotely log in to the cluster and either (a) start an interactive session, in which you can type commands into the Linux shell, or open up an interactive version of (say) R or python and type commands there, or (b) submit a batch job or shell script to run without user interaction.

Batch jobs and scripts are pretty straightforward, so I’m not going to yammer on about that in this post.  But working interactively is a little trickier, mostly because it’s good practice (in the name of reproducible research, scientific integrity, and organization) to keep a record of the commands you run to get your results.  If you get results interactively on the remote machine, there’s not a built-in way to do this.  But never fear!  Software and shortcuts exist that allow you to save a script on your local computer, but run each line of that script interactively on the remote machine.  Since statisticians like me usually do most interactive work in R, I’ll describe here how I run a local R script interactively on the cluster.

I’m currently a Mac user, so my main tool for this purpose is Aquamacs.  This is basically a version of an Emacs text editor.  My opinion on Emacs is that it’s a really powerful tool, but requires a lot of customization to access all that power, and it has pretty funky keyboard shortcuts.  Aquamacs allows you to use either Emacs keyboard shortcuts OR common Mac keyboard shortcuts (e.g., command-Z for undo) in an Emacs session, which I find really useful.  Aquamacs makes use of ESS (Emacs Speaks Statistics) when interacting with R.

So let’s get to the point: here are the steps!

(1)  Install Aquamacs.

(2)  Open your local R script inside Aquamacs.

(3)  Type M-x shell (M is the Meta key; pressing and releasing the escape key works as Meta), which will basically open up a Terminal window inside Aquamacs.  (Once you hit M-x, you won’t be typing in the R script anymore, but will see your stuff appear at the bottom of the window).

(4)  In the Terminal window that just opened up, log in to the remote machine. (I’m making the assumption here that the login process to the remote machine involves some variant of an “ssh” command in the terminal.)

(5)  Click Window > Move tab to new frame.  The terminal window will slide over to the other side of your computer, so you’re now seeing the R script and the prompt of the remote machine simultaneously.

(6)  Start R on the remote machine.

(7)  Staying in the remote-machine-R window, type M-x ess-remote.  You’ll then be prompted for a dialect – type R.  Afterward, the line “options(STERM=’iESS’)” will have been run inside your R session.

(8)  Move back to the local R script.  You can now run line-by-line on the remote machine either using control-n to run just the current line, or using control-r to run a block of highlighted lines.
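
For comparison, here is a rough sketch of a different, non-interactive technique that also keeps the script on the local machine: piping the whole file to R on the remote machine over ssh. This is not the ESS workflow in the steps above, and it runs the script in one go rather than line by line. The host name and file name are placeholders, the function names are hypothetical, and it assumes ssh login already works and that R is on the remote PATH.

```python
import subprocess


def remote_r_command(host):
    """Build the ssh command that starts a non-interactive R on `host`."""
    # --no-save and --quiet are standard R command-line options; R needs
    # --no-save (or similar) when it reads commands from stdin.
    return ["ssh", host, "R", "--no-save", "--quiet"]


def run_local_script_remotely(script_path, host, runner=subprocess.run):
    """Send a local R script to R on the remote machine via ssh's stdin.

    `runner` is injectable so the function can be exercised without a
    real cluster; by default it is subprocess.run.
    """
    with open(script_path, "rb") as script:
        return runner(remote_r_command(host), stdin=script, check=True)


# Example call (placeholders, not run here):
# run_local_script_remotely("analysis.R", "user@cluster.example.org")
```

The trade-off is that nothing is interactive: you get the record-keeping benefit of a local script, but not the ability to poke at results between lines.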

This has worked pretty well for me, but I am definitely interested in hearing others’ ideas if someone knows of a more efficient way to do this from a Mac.  I use Aquamacs almost exclusively for this, which feels a little like using a sledgehammer for a tiny little nail, since I’m not really harnessing all the power of Aquamacs/Emacs or using it for any of its other intended purposes, and I haven’t put a lot of time into customizing it.  But it does get the job done, and it’s definitely better than the ol’ copy-paste trick.

I’m incredibly happy with my Macbook (it’s a delightfully fast, beautiful, efficient computer), but I really really miss Notepad++, the best text editor I’ve ever known – it’s Windows-only.  Running interactively on a PC is smoother than the Mac workflow I described above: it basically involves (1) logging into the remote machine using something like PuTTY and opening R, (2) opening up your local R script in Notepad++, and (3) hitting F9 to run a line or highlighted set of lines.  So much more elegantly simple!  Something for Mac software developers to aspire to, I suppose…