Month: March 2014

R365: Day 33 – Plotting polynomials with {ggplot2}

Plotting polynomials is really easy using {ggplot2}. I am still a novice with {ggplot2}, but the advantage of the package is that it lets you add code as you go to specify different aspects of the graph, such as the title, whether to display lines or points, or which model to use for a fitted line. Let’s use the dataset and code that I used as an example in Day 32. We were able to easily make a graph there using the plot() function. However, modifying that graph with the base {graphics} package requires some knowledge of the options in par(), which are many (“we are legion”) and, *importantly*, need to be stated in the correct way. {ggplot2} lets you tag extra pieces onto a graph just by adding a “+” sign. This can even become a burden if you re-use code a lot, since you can end up with tons of useless code cluttering up the lines. But if you’re careful, it should not be too much of a problem. Once you get used to the new syntax, {ggplot2} is way easier and more flexible, and lets you make some really nice graphs.

##Let's try to plot out the curve for the graph
##(spore and spore.date are the objects created in the Day 32 code)
library(ggplot2)
##so let's look at just the scatterplot
qplot(x=spore.date, y=spore$spore.number, geom=c("point"))

Rplot1

 

##Now let's look at the graph with a smoothed average added
qplot(x=spore.date, y=spore$spore.number, geom=c("point","smooth"))

Rplot2

## now let's look at the graph and treat it as a quadratic
qplot(x=spore.date, y=spore$spore.number, geom=c("point","smooth"), method="lm",
 formula = y ~ poly(x, 2))

Rplot3

##And as a cubic (x^3) function
qplot(x=spore.date, y=spore$spore.number, geom=c("point","smooth"), method="lm",
 formula = y ~ poly(x, 3))

Rplot4

##And just to be ridiculous, let's look at an x^10 function
qplot(x=spore.date, y=spore$spore.number, geom=c("point","smooth"), method="lm",
 formula = y ~ poly(x, 10))
###......really ugly....

Rplot5
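The snippets above all use qplot(), which has been deprecated in more recent releases of {ggplot2}, and they never actually show the “+” syntax described at the top of the post. Here is a rough sketch of the quadratic plot built up in layers instead. The data frame spore_df and its values are made up so the code runs on its own; they are not the Day 32 data.

```r
library(ggplot2)

# Made-up stand-in for the Day 32 spore data (hypothetical values)
set.seed(33)
spore_df <- data.frame(
  date = seq(as.Date("2014-01-01"), by = "week", length.out = 20)
)
spore_df$spore.number <- 50 + 3 * (1:20) - 0.2 * (1:20)^2 + rnorm(20, sd = 4)

# Each "+" tags another piece onto the plot: points, a fitted curve, a title
p <- ggplot(spore_df, aes(x = date, y = spore.number)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2)) +
  ggtitle("Spore counts with a quadratic fit")
print(p)
```

Changing poly(x, 2) to poly(x, 3) or poly(x, 10) should give the cubic and tenth-degree versions.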


R365: Day 32 – Formatting dates


I recently had to work with a dataset that I had put together in Excel that was poorly formatted. The dataset had date values associated with it but Excel had formatted them in an odd way. Luckily, R is really flexible in what it accepts as dates and lets you tell it what format everything is in.

#set your working directory (change this to whatever you need)
setwd('L:/Robin/R')
## open up your datasheet
## I just made a random datasheet with dates and some random data
spore=read.csv(file="R365.csv", header=TRUE, sep=',')
##look at your data
spore
##tell as.Date() how the dates are written: month/day/4-digit year
spore.date=as.Date(spore$date,format='%m/%d/%Y')
spore.date
##now coerce it into a time series with {zoo}
library(zoo)
sporeproblem=zoo(spore$spore.number,spore.date)
##now you can work with it like it's a time series dataset
plot(sporeproblem)

You can use the as.Date() function to format your dates pretty easily. The codes used to represent days, months, years, days of the week, etc. are available here and are reprinted below for convenience:

Symbol   Meaning                     Example
%d       day as a number (01-31)     01-31
%a       abbreviated weekday         Mon
%A       unabbreviated weekday       Monday
%m       month as a number (01-12)   01-12
%b       abbreviated month           Jan
%B       unabbreviated month         January
%y       2-digit year                07
%Y       4-digit year                2007
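To see how these codes fit together, here is a small sketch using a few made-up date strings (not the spore data). The format string has to describe the text exactly as it is written, separators included; if it does not match, as.Date() returns NA.

```r
# Made-up date strings in month/day/4-digit-year form
raw <- c("03/14/2014", "12/01/2013")
d <- as.Date(raw, format = "%m/%d/%Y")

# format() goes the other way: Date back to text, in whatever layout you ask for
iso <- format(d, "%Y-%m-%d")

# A different layout: day.month.2-digit year
d2 <- as.Date("14.03.14", format = "%d.%m.%y")

# The wrong format string gives NA
bad <- as.Date("2014-03-14", format = "%m/%d/%Y")
```

I stuck to the numeric codes here because the month and weekday names produced by %b, %B, %a and %A depend on your system’s language settings.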

Getting Rid of Characters

Excel is an awesome tool if you do not have time to learn programming. Excel is also unbelievably frustrating and evil. However, having even some rudimentary skills with Excel can make sure that your life is less stressful.

My friend recently had a problem: she had a data sheet with lots of values that had “>=” in front. Excel does not read cells like these as numbers, which makes it impossible to do any kind of calculation with them. Excel has a number of powerful and flexible functions that might help, but they tend to rely on snipping out set portions of text (e.g. the first 7 characters of a cell). One of my other friends suggested the solution of just using Replace.

Replace (“Ctrl+H”) is a useful tool in this case as it finds only the specified characters and replaces them. *NOTE* you cannot replace with nothing, you have to replace with a zero. Here is an example of Replace at work:

1 2 3
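For comparison, the same cleanup could probably be done in R once the sheet has been read in. This is only a sketch with made-up values; the vector raw stands in for whatever column holds the “>=” entries.

```r
# Hypothetical values as they might arrive from the spreadsheet
raw <- c(">=12", ">= 3.5", "7", ">=100")

# fixed = TRUE treats ">=" as literal text rather than a regular expression
cleaned <- as.numeric(trimws(gsub(">=", "", raw, fixed = TRUE)))
cleaned
```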

 

 

Audiobook Review: I’ll Never Get Out of this World Alive


I really enjoyed the audiobook version of “I’ll Never Get Out of this World Alive”; it was well written and had an interesting story from a time and place that I rarely read about (early-1960s San Antonio). The story follows Doc, a dope-addicted ex-physician who earns money to support his habit by providing health care to the underworld of San Antonio. The main body of his practice focuses on providing illegal abortions to the prostitutes of the city. He is haunted by the memory and ghost of Hank Williams, the famous country music singer who died from an overdose. As the story progresses, Hank follows Doc as he navigates his illicit trade.

I really enjoyed the reading of the audiobook, and I was pleasantly surprised to find out that the author himself performed the reading. His country drawl put a nice spin on the novel and made the story feel rich, and his pronunciation of the Spanish words also seemed to be on point. Overall, a very enjoyable audiobook that I would highly recommend.

R365: Day 31 – RgoogleMaps

While looking around at different mapping packages, I found a cool package that pulls map data off of Google Maps, allowing you to make really nice looking figures. Some of the cool things you can do include making bubble maps, like this one of cadmium levels around the river Meuse in the Netherlands:

cadmium

library(sp)
##bubbleMap() comes from {RgoogleMaps}, so load that too
library(RgoogleMaps)
data("meuse", package = "sp", envir = environment())
m<-bubbleMap(meuse,zcol='cadmium')

Or color-coded maps, like this one showing leukemia levels in upstate NY in the early 1980s.

Rplot08

library(RgoogleMaps)
 data("NYleukemia", envir = environment())
 population <- NYleukemia$data$population
 cases <- NYleukemia$data$cases
 mapNY <- GetMap(center=c(lat=42.67456,lon=-76.00365), destfile = "NYstate.png",
 maptype = "hybrid", zoom=8)
 ColorMap(100*cases/population, mapNY, NYleukemia$spatial.polygon, nclr=9,add = FALSE,
 alpha = 1, log = TRUE, location = "bottomleft")

This blog post was super useful to explore beyond the basics of RgoogleMaps.

 


R365: Day 30 – Set working directory and importing data from CSV files

I am helping my wife with more survival analysis, but one of the problems that I always run into is that I suck at opening up data sets from external files. What I’ve done in the past is create vectors for each column of data; however I have not had to do this with more than 3 columns. Even still, that alone is very obnoxious and it would be much easier to just directly import the data from the CSV file.

A lot of the help files for importing data suggest that you use the function read.csv(), which is fine and does work. However, when you go to specify your file, you can either correctly include the full directory name or you can set up a pre-determined directory. Or just fail, which is what I’ve done mostly in the past.

To set a working directory, you use the setwd() function. I was reading data off of an external drive, so I used:

# set up your working directory using setwd
# the working directory is just wherever the file you're interested in is stored
setwd('E:/R')
# remember to use a '/' not a '\' because '\' is the escape character in R strings
# (a doubled '\\' also works)
# this is especially pertinent if you are just copying the location shown in the file's properties
## you should have saved your file as a .csv beforehand
## check out ?read.csv if you have other separators like tab or semicolon
survival<-read.csv(file='KM.csv',header=TRUE,sep=",")
#call the head of the file to make sure that the object is being read correctly
head(survival) ##It works!

Which worked! Kinda surprisingly. Now let’s see if I can change the working directory, then call stuff from either directory (that way, if you have files on an external drive and on your hard drive, you don’t have to switch directories back and forth).

##set a new working directory
setwd('L:/Robin/R365')
##Try reading stuff out of the new directory...
packages<-read.csv(file='Rpackages.csv',header=TRUE,sep=",")
##...and the old E:/ directory
poster<-read.csv(file='postersummary.csv',header=TRUE,sep=",")
###...and it did not work. R's response is...
##In file(file, "rt") :
##cannot open file 'postersummary.csv': No such file or directory

This did not work out so well for me. I might be missing something or screwing up something else, but I suspect that R only looks in one directory at a time (the current working directory) if you do not include the whole file path (e.g. L:/Robin/R365/Rpackages.csv). Hope this helped!

EDIT**** – if setwd() doesn’t work, use file.choose(); it lets you pick the file from your system.

 

EDIT2**** – once you have set a working directory, you can ask R what files are in it using ‘list.files()’ (‘ls()’ only lists the objects in your R workspace, not the files on the drive).
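One way around the one-directory-at-a-time problem is to skip setwd() and hand read.csv() a full path built with file.path(). Here is a minimal sketch; the two folders are created under tempdir() purely so the example runs anywhere, and they stand in for the E:/ and L:/ locations above. The file contents are made up.

```r
# Two stand-in folders (hypothetical, created in R's temporary directory)
dir_a <- file.path(tempdir(), "drive_a")
dir_b <- file.path(tempdir(), "drive_b")
dir.create(dir_a, showWarnings = FALSE)
dir.create(dir_b, showWarnings = FALSE)

# Write one small made-up CSV into each folder
write.csv(data.frame(id = 1:3, time = c(5, 8, 12)),
          file.path(dir_a, "KM.csv"), row.names = FALSE)
write.csv(data.frame(pkg = c("maps", "zoo")),
          file.path(dir_b, "Rpackages.csv"), row.names = FALSE)

# Read from both folders without ever changing the working directory
survival <- read.csv(file.path(dir_a, "KM.csv"))
packages <- read.csv(file.path(dir_b, "Rpackages.csv"))

# list.files() shows what is actually sitting in a folder
list.files(dir_a)
```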

R365: Day 29 – each, times, and length.out argument

When you are making dataframes, you need all of your columns to be the same length. But sometimes you don’t want to bother typing out every last value in a vector. We actually used the length.out argument during the Day 27 – dataframes post to specify how many individuals we needed. The each argument lets you specify how many times to repeat each element, the times argument indicates how many times to repeat the group as a whole, and the length.out argument indicates the total length of the desired vector. You can use the arguments individually or together in one rep(), with one catch: if you give both times and length.out, length.out wins and times is ignored.

## make a vector with repeating numbers of length 20
rep(c(3,5,6),length.out=20)
## you can do the same thing with the argument 'len' instead of 'length.out'
rep(c(3,5,6),len=20)
## make a vector with each number repeated three times, cut off at length 20
rep(c(3,5,6),each=3, len=20)
##Let's try to break R: each=3 and times=3 together ask for a vector of length 27,
##but len=20 asks for length 20
## the 'times' argument tells R to repeat the whole group a set number of times, whereas
##the 'each' argument tells R to repeat each element a set number of times
##R applies 'each' first, then 'len' takes priority and 'times' is ignored, so it stops at length 20.
rep(c(3,5,6),each=3, times=3,len=20)
##now let's try this with seq
seq(1, 9, length.out=20)
##but seq has no 'each' or 'times' arguments
#seq(1, 9, length.out=20, each=3)
#seq(1, 9, length.out=20, times=3)
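The length-20 vectors above are a bit long to check by eye, so here is the same idea with shorter outputs. The variable names are just for illustration.

```r
x <- c(3, 5, 6)

by_length <- rep(x, length.out = 7)        # 3 5 6 3 5 6 3
by_each   <- rep(x, each = 2)              # 3 3 5 5 6 6
by_times  <- rep(x, times = 2)             # 3 5 6 3 5 6
both      <- rep(x, each = 2, times = 2)   # 3 3 5 5 6 6 3 3 5 5 6 6

# 'each' is applied first, then length.out cuts the result off; 'times' is ignored
capped <- rep(x, each = 2, times = 3, length.out = 5)   # 3 3 5 5 6

# Handy for filling a grouping column so every column has the same length
df <- data.frame(id = 1:6, group = rep(c("a", "b"), each = 3))
```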

R365: Day 28 – {maps} again

I offered to help a friend make a map of the US with some states filled in. I thought it would be a quick and easy job, and it is, but it took me a while to figure out how to do it. At first I tried plotting the US and then making a second map() call with the regions argument. This just created a new plot consisting of the states that I had specified! Which was pretty unfortunate and not what I wanted to do. After looking through some blogs and finding out about some awesome sounding packages (heads up, eventually I will tackle {RgoogleMaps}), I eventually figured out that within the map() function there is an argument called ‘add’, which lets you draw on top of a previously plotted map. This way, you build the map in different layers. It’s kind of cumbersome and I am reasonably sure there is an easier way around this, but I’m going to go with “If it’s dumb and it works…”, and say it’s good enough for me.

I didn’t want to reveal her map, so I created a map of the places I have been and the places I want to go to soon. It was kind of tricky, because you have to kind of guess what the countries are called. I entered ‘us’ thinking that it would highlight the US, but it highlighted both the US and the former USSR. Eventually, after several permutations, I figured out that it matched both because it matches on the front end of each name (so if you put ‘south’ it would highlight South Korea, South Africa, etc.), and entered ‘usa’ instead. The map also does not recognize the term ‘russia’, but will recognize ‘ussr’ (which is a bit ominous). You can find a whole list of the countries by using:

 map("world", namesonly=TRUE, plot=FALSE)

But they seem to be in no order. Maybe there is a list somewhere out there?
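Wrapping that call in sort() should at least put the names in alphabetical order. The ‘us’ surprise can also be mimicked in plain R. The sketch below is only my guess at the rule: it assumes that each region is matched against the start of the name, ignoring case. The helper match_regions() and the short name list are made up for illustration and are not part of {maps}.

```r
# Hypothetical handful of names, standing in for the real list from
# map("world", namesonly = TRUE, plot = FALSE)
region_names <- c("USSR", "South Korea", "USA", "Uruguay", "South Africa")

# Sorting makes a long list of names much easier to scan
sorted_names <- sort(region_names)

# Assumed matching rule: the pattern is anchored to the start of the name
# and letter case is ignored
match_regions <- function(pattern, names) {
  grep(paste0("^", pattern), names, ignore.case = TRUE, value = TRUE)
}

us_hits    <- match_regions("us", region_names)     # "USSR" "USA"
usa_hits   <- match_regions("usa", region_names)    # "USA"
south_hits <- match_regions("south", region_names)  # "South Korea" "South Africa"
```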

Rplot04

##install and load the maps package into R
install.packages("maps")
library(maps)
##if you need help with maps, remove the hashtag on the line below
#library(help="maps")
## or check out http://cran.r-project.org/web/packages/maps/maps.pdf

##Map out World
map('world', interior = FALSE)
map('world', boundary = FALSE, lty = 5, add = TRUE)
##places I've been: add=TRUE draws on top of the map already plotted
map('world',regions=c('belize','canada','italy','australia','USA'),fill=TRUE,col='blue',add=TRUE)
##places I want to go
map('world',regions=c('argentina','brazil','chile','peru','new zealand','UK','ireland',
 'iceland','south korea','vietnam','nepal','ussr','madagascar','switzerland',
 'netherlands','belgium','france','spain','germany','denmark','norway','finland',
 'sweden','mexico','ghana','india'),fill=TRUE,col='green',add=TRUE)
## add axes to your map
map.axes()
##add a scale bar to your map
map.scale()
##find out about plotting legends http://stat.ethz.ch/R-manual/R-patched/library/graphics/html/legend.html
legend(50,-50, c("Places I've Been","Places I want to Go"),fill=c("blue","green"))

Audiobook Review: Let’s Explore Diabetes with Owls by David Sedaris


I really and truly love David Sedaris’s books. Witty, self-abasing, and laugh-until-my-guts-hurt funny, his essays explore his life at all stages: his childhood growing up in newly-integrated North Carolina, his misadventures in college, and his recognition and acceptance of his homosexuality. His stories are often self-critical and circular, with seemingly unconnected events rolling back around to bite him in the ass. Many of his audiobooks are read by himself, often in front of a live audience.

In his most recent collection, David Sedaris focuses on his present-day life with his husband Hugh. While the book does have a chapter that features owls, the title draws from an odd inscription he gave to a fan at a book signing. While I really like “Let’s Explore Diabetes with Owls”, I felt like he has had better collections. As he has become richer from his works, his stories have become less relatable. At one point, he decides on a whim to move from his summer home in the Brittany region of France to a cottage in the Somerset region of England. However, despite his fame and fortune, he still faces many of the same problems as ordinary people: mean customs officials, deterioration of his health as he ages, preoccupation with strange hobbies, and theft of a computer.

His books often focus heavily on his family. While his parents always came across as stingy, this book made them seem unnecessarily cruel. He describes never being good enough in his father’s eyes, and it really felt like it hurt. While I really liked “Owls”, if you are new to David Sedaris, I would suggest checking out another one of his collections first. I think my favorite collection was “Me Talk Pretty One Day”, which did a nice job of sampling stories from all stages of his life.

Audiobook Review: The Cuckoo’s Calling by Robert Galbraith


A week ago I listened to the audiobook for J.K. Rowling’s newest book, “The Cuckoo’s Calling”. She wrote the book under the alias “Robert Galbraith”, an ex-military man. She has stated that she found the use of the pseudonym to be useful to her writing process, to separate her writing from her fame. I will admit that I was drawn to the book specifically because it was written by J.K.R., but I appreciate the need for anonymity and the usefulness of a pen name. So while I realize that she was the one who wrote the book, I want to honor “Robert Galbraith” as the persona who drew out the fine writing.

Robert Glenister did a really nice job with the voices for the audiobook. I will not pretend to know much about regional differences between British accents, but it did seem like he hit every character spot on. His voicework meshed really well with the text, making the audiobook really enjoyable.

The story follows a down-on-his-luck ex-military private investigator, Cormoran Strike, as he investigates the 3-month-old suicide death of a famous model. It’s kind of a classic locked-door mystery, with the woman falling to her death from a 3rd floor balcony, with no witnesses beyond a cocaine-addled neighbor who claims to have heard voices arguing moments before the fatal fall. With few remaining physical clues, Strike has to rely on interviews with unreliable sources, stonewalling police, disinterested family, and drug-addicted friends. To his benefit, he has one solid ally in the chase: his new secretary, Robin Ellacott. As Strike investigates, he finds that the case is hardly closed, and that the killer may be close at hand.

I really liked the writing in the book. Galbraith did a good job emphasizing the small details that make any book great. Probably my favorite moment was the detective walking near a grizzled homeless man on the street who slowly opens his mouth and extends his tongue. It was such a strange non-sequitur of an image, but it makes perfect sense given the odd things that happen in cities. The event was completely unrelated to anything else in the plot, but it added to the richness of the environment and made the novel feel realistic. While I loved the attention to detail, I did not like the structure of the book. The ending was a surprise, but I felt like it was written so that the killer could have been any of a number of possible suspects. I also did not like some details of the book that seemed cliché and out of place in such rich writing. I felt like the name ‘Cormoran Strike’ itself is a really cheap attempt at making the character feel more interesting from the beginning. Overall, I really enjoyed the audiobook and I am looking forward to the sequel, ‘The Silkworm’.