R365: Day 38 – If Else statements in R

One way to logically organize code is to perform if-then-else statements. You can do these sorts of statements in most coding languages and even in Excel (!), which is a really easy format if you just need to do something quickly. The basic gist of if-then-else statements goes something like this:

If (some event or situation is true)

Then (do something)

Else (in all other cases, do this other thing)

These sorts of statements are useful for organizing and quickly analyzing data. In R, you can use these to summarize data or to help add stochasticity to a model. This is probably not the most efficient way to add stochaticity, but it is effective.

Lets pretend that it rains about three days a week. We could use historic data to back this up, but those data only go so far; eventually you need to start forecasting. You can use a fitted model to previous data to try to predict when rain will occur, but these models don’t end up working so well. Depending on your goal, it can be more effective just to model hypothetical weather.

if ( rbinom(1,1,(4/7))>=1 ){ 
 "Sunny!" 
} else { 
 "Rainy!" 
}

The output from this will give you back either “Sunny!” or “Rainy!” depending on the results of the rbinom() statement.

If (haha!) that code up there looks really cluttered and you don’t feel like dealing with it, then you can use ifelse() as one statement to do the same thing

ifelse((rbinom(1,1,(4/7))>=1),"Sunny!","Rainy!")

I have no idea why people seem to prefer doing the if statement up there, it seems like way more work and less intuitive. It might be that ifelse is less efficient, so if you have a big code it will take longer, but that doesnt really affect me now. Another cool thing is that you can nest if-then-else statements to create some reasonably complex tasks.

 ifelse((rbinom(1,1,(4/7))>=1),ifelse((rbinom(1,1,(13/14))>=1),"Sunny!","Thunderstorms!"),"Rainy!")

Here we nested function within our other ifelse() to say that some of the time there will be thunderstorms instead of sun.

Advertisements

R365: Day 37 – Random numbers

One of my friends is working on stochastic and deterministic modeling, so I thought that I would go a bit into stochastic modeling for the next few posts.

What is stochastic/chaotic/deterministic? Stochastic means truly random, where the events at one time point do not affect events in the future. There are lots of ways to generate random numbers, including some really awesome books, or using atmospheric noise. Chaotic systems are not random! Chaotic systems are deterministic, but you need to know the initial conditions to make accurate predictions. Weather is considered a chaotic system. Deterministic models means that the model progresses in a predictable fashion. That does not necessarily mean linear, but these models tend to be stable and (over a very long time period) tend to come to some sort of equilibrium, even if that means oscillating into infinity.

I first ran across these terms while taking a population biology class. We would analyze populations of endangered species and determine if their populations were likely to be deterministic (which in that context meant doomed to extinction). (bummer). After entering grad school I took a class on dynamic models in biology using this book as a textbook and working our way through life tables, matrices, ODEs and attractors.

The heart of any stochastic modeling is the ability to generate randomness. There are lots of different ways to create random numbers, but some of the simplest are the use of runif() or sample(). Both of them will give you a random number from a uniform distribution. If you are interested in another distribution, there are lots of probability distributions, one of them might suit you better. For instance, if you just want to generate a coin tossing random integer, you can use

rbinom(100,1,prob=0.5)

Which will generate 100 random digits of either 0 or 1. If you are interested in an uneven split, you can easily adjust the prob= portion to suit your needs. If what you need is an actual number, then sample() is a good tool. Sample() can be used with or without replacement, so that if you wanted to sample something from a group only once, you could. Sample() also lets you sample from either a vector or a positive integer. So if you had a vector of things like names (like a list of states), then you could sample randomly from those as well. For instance,

a=sample(state.name, 10, replace=FALSE)
 [1] "New Mexico" "Nevada" "Colorado" "Nebraska" "Hawaii" "California" "Tennessee" "Delaware" 
 [9] "West Virginia" "Connecticut"

or you could combine this with {maps} to create

Rplot08

a=sample(state.name, 10, replace=FALSE)
map('state', interior = FALSE)
map('state', boundary = FALSE, lty = 5, add = TRUE)
map('state',region=a,fill=TRUE,col=heat.colors(10),add=TRUE)

 This post was pretty useful for looking into random numbers.

R365: Day 35 – {zoo}

Most people in science have to deal with time series at some point or another: you measured glucose levels in blood over a period of weeks, you recorded the number of birds in a habitat over a summer, you monitored the efficiency of a an enzyme as it was overwhelmed with substrate. Beyond science, people like economists monitor for trends in markets over time (e.g.-the number of new home buyers in the US from 2006-2012). Whatever your poison, time series are inescapable. We have already covered some of the tools that you can use to view and analyse time series, like ts() and acf() and detrend(). While these tools are useful, they don’t always work if your time series is irregular (irregular data intervals, time data in different units, etc). {zoo} is a useful package if you do anything with time series.

Something that {zoo} does not like to do is handle duplicate time data. So if you have two data points from the same date in a {zoo} time series, many of the zoo functions will not work. One way to handle that is to average the replicated time data points, using aggregate()

> z <- suppressWarnings(zoo(1:8, c(1, 2, 2, 2, 3, 4, 5, 5)))
> z
1 2 2 2 3 4 5 5 
1 2 3 4 5 6 7 8 
> aggregate(z, identity, mean)
 1 2 3 4 5 
1.0 3.0 5.0 6.0 7.5

or just taking the last data point in a series of repeated data points:

> aggregate(z, identity, tail, 1)
1 2 3 4 5 
1 4 5 6 8

and you can even interpolate between time points if that is useful to you

> time(z) <- na.approx(ifelse(duplicated(time(z)), NA, time(z)), na.rm = FALSE)
> z[!is.na(time(z))]
 1 2 2.3333 2.6667 3 4 5 
 1 2 3 4 5 6 7

{zoo} also does not like to display log scales for the x axis, although you can circumvent that using the suppressWarnings() function. You can also create your own script to get around that

> z <- zoo(1:100)
> plot(z, log = “y”, panel = function(…, log) lines(…))

Rplot08

{zoo} can also read directly from .txt and .csv files, but you have to have them formatted correctly. I had an issue when I had multiple different time series (eg-site A, B, C,D, and E) all next to each other in one file. {zoo} read the first column as a date, but it did not interpret other columns as dates.

 

I will talk more about {zoo} as things come up.

 

R365: Day 34 – Pretty Syntax

While looking through documents to find out more about a package called {biwavelet}, I noticed that the website Inside-R (ha! Inside-R = Insider! I didnt get that pun until just now!) had a feature called ‘Pretty R‘. I was intrigued, a lot of what I like to do in this blog is post pretty pictures (thats been a large justification of learning so much about plotting maps in recent posts…). Intrigued, I followed the link and found that their website turns R script into a pretty looking format for websites with HTML. I entered some jibberish code just to see what it looked like (turns out I completely forgot the format for using seq()…) and voila! It turns it into pretty looking code like

R=seq(from=1,to=10,by=2)
T=c(5,2,5,7,24)
X=plot(R~T)

Unfortunately it automatically creates a note at the end that says:

Created by Pretty R at inside-R.org

but it is still a cool way to automatically have your R code formatted

Inside R1

 

The opening screen showing you where to put all your code…

Inside R2

 

and the output! Copy and paste the HTML into your document (most websites and blogs have somewhere to paste HTML, look for <> signs).

 

R365: Day 33 – Plotting polynomials with {ggplot2}

Plotting polynomials is really easy using {ggplot2}. I am still a novice with {ggplot2}, but the advantage of the package is that it lets you add on code as you go to specify different aspects about the graph, such as the title, whether to display lines or points, or how any linear model should be assessed. Lets use the dataset and code that I used as an example in Day 32. We were able to easily make a graph using the plot() function. However, to modify the graph using the basic {graphics} package requires some knowledge of all of the options in {par}, which are many (“we are legion”), and *importantly* need to be stated in the correct way. {ggplot2} lets you tag on extra portions of a graph just by adding a “+” sign. This can even end up as a burden if you re-use code a lot, as you can end up with tons of useless code that just clutters up the lines. But if you’re careful, it should not be too much of a problem. Once you get used to the new syntax, {ggplot2} is way easier and more flexible, and lets you make some really nice graphs.

##Lets try to plot out the curve for the graph
library(ggplot2) 
##so lets just look at just the scatterplot
qplot(x=spore.date, y=spore$spore.number, geom=c("point"))

Rplot1

 

##Now lets look at the graph with some smoothed averages
qplot(x=spore.date, y=spore$spore.number, geom=c("point","smooth"))

Rplot2

## now lets look at the graph and treat it as a quadratic
qplot(x=spore.date, y=spore$spore.number, geom=c("point","smooth"), method="lm", formula = y ~ 
 poly(x, 2))

Rplot3

##And as a X^3 function
qplot(x=spore.date, y=spore$spore.number, geom=c("point","smooth"), method="lm", formula = y ~ 
 poly(x, 3))

Rplot4

#And just to be rediculous, lets look at a x^10 function
qplot(x=spore.date, y=spore$spore.number, geom=c("point","smooth"), method="lm", formula = y ~ 
 poly(x, 10))
###......really ugly....

Rplot5

R365: Day 32 – Formatting dates

R365: Day 32 – Formatting Dates

I recently had to work with a dataset that I had put together in Excel that was poorly formatted. The dataset had date values associated with it but Excel had formatted them in an odd way. Luckily, R is really flexible in what it accepts as dates and lets you tell it what format everything is in.

#set your working drive (change this to whatever you need)
setwd('L:/Robin/R')
## open up your datasheet
## I just made a random datasheet with dates and some random data
spore=read.csv(file="R365.csv", head=TRUE, sep=',')
##look at your data
spore
spore.date=as.Date(spore$date,format='%m/%d/%Y')
spore.date
##now coerce it into a time series with {zoo}
require(zoo)
sporeproblem=zoo(spore$spore.number,spore.date)
##now you can work with it like its a time series dataset
plot(sporeproblem)

You can use the as.Date() function to format your dates pretty easily. The codes for what to use to represent days, months, years, days of the week, etc are available here and are reprinted below for convenience:

Symbol Meaning Example
%d day as a number (0-31) 01-31
%a
%A
abbreviated weekday
unabbreviated weekday
Mon
Monday
%m month (00-12) 00-12
%b
%B
abbreviated month
unabbreviated month
Jan
January
%y
%Y
2-digit year
4-digit year
07
2007

Getting Rid of Characters

Excel is an awesome tool if you do not have time to learn programming. Excel is also unbelievably frustrating and evil. However, having even some rudimentary skills with Excel can make sure that your life is less stressful.

My friend recently had a problem in that she had a data sheet with lots of values that had “>=” in front. These characters are not recognized in Excel, and makes it impossible for you to do any kind of calculations. Excel has a number of powerful and flexible functions that might help you, but they tend to rely on snipping out set portions of text (eg – the first 7 letters of a cell). One of my other friends suggested the solution of just using replace.

Replace (“ctrl+H”) is a useful  tool in this case as it finds only specified characters and replaces them. *NOTE* you cannot replace with nothing, you have to replace with a zero. Here is an example of Replace at work:

1 2 3

 

 

Audiobook Review: I’ll Never Get Out of this World Alive

img011

I really enjoyed the audiobook version of “I’ll Never Get Out of this World Alive”; it was well written and had an interesting story from a period-location that I rarely read about (Early-1960’s San Antonio). The story follows the path of Doc, a dope addict ex-physician who earns money to support his habit by providing health care to the underworld of San Antonio. The main body of his practice focuses on providing illegal abortions to the prostitutes of the city. He is haunted by the memory and ghost of Hank Williams, the famous country music singer who died from an overdose. As the story progresses, Hank follows Doc as he navigates his illicit trade.

I really enjoyed the reading of the audiobook, and I was pleasantly surprised to  find out that the author himself performed the reading. His country drawl put a nice spin to the novel and made the story feel rich, and his pronunciations of the spanish words also seemed to be on point. Overall a very enjoyable audiobook that Iwould highly recommend.

R365: Day 31 – RgoogleMaps

While looking around at different mapping packages, I found a cool package that pulls map data off of Google Maps, allowing you to make really nice looking figures. Some of the cool things you can do include making bubble maps, like this one of cadmium levels around the river Meuse in the Netherlands:

cadmium

library(sp)
 data("meuse", package = "sp", envir = environment())
 m<-bubbleMap(meuse,zcol='cadmium');

Or color-coded maps, like this one showing leukemia levels in upstate NY in the early 1980’s.

Rplot08

library(RgoogleMaps)
 data("NYleukemia", envir = environment())
 population <- NYleukemia$data$population
 cases <- NYleukemia$data$cases
 mapNY <- GetMap(center=c(lat=42.67456,lon=-76.00365), destfile = "NYstate.png",
 maptype = "hybrid", zoom=8)
 ColorMap(100*cases/population, mapNY, NYleukemia$spatial.polygon, nclr=9,add = FALSE,
 alpha = 1, log = TRUE, location = "bottomleft")

This blog post was super useful to explore beyond the basics of RgoogleMaps.

 


R365: Day 30 – Set working directory and importing data from CSV files

I am helping my wife with more survival analysis, but one of the problems that I always run into is that I suck at opening up data sets from external files. What I’ve done in the past is create vectors for each column of data; however I have not had to do this with more than 3 columns. Even still, that alone is very obnoxious and it would be much easier to just directly import the data from the CSV file.

A lot of the help files for importing data suggest that you use the function read.csv(), which is fine and does work. However, when you go to specify your file, you can either correctly include the full directory name or you can set up a pre-determined directory. Or just fail, which is what I’ve done mostly in the past.

To set a working directory, you use the setwd() function. I was reading data off of an external drive, so I used:

# set up your working directory using setwd
#working directories are just wherever the file your interested in is stored
setwd('E:/R')
# remember to use a '/' not a '\' because '\' apparently means to escape
# this is especially pertinent if your are just copying whatever is at the bottom of the file description in properties
## you should have saved your file as a .csv before hand
## check out ??read.csv if you have other separators like tab or semicolon
survival<-read.csv(file='KM.csv',head=TRUE,sep=",")
#call the head of the file to make sure that the object is being read correctly
head(survival) ##It works!

Which worked! Kinda surprisingly. Now lets see if I can change the working directory, then call stuff from either directory (so that way if you have files in an external drive and your hard drive you don’t have to switch directories back and forth).

##set a new working directory
setwd('L:/Robin/R365')
##Try reading stuff out of the new directory...
packages<-read.csv(file='Rpackages.csv',head=TRUE,sep=",")
##...and the old E:/ directory
poster<-read.csv(file='postersummary.csv',head=TRUE,sep=",")
###...and it did not work. R's response is...
##In file(file, "rt") :
##cannot open file 'postersummary.csv': No such file or directory

This did not work out so well for me. I might be missing something or screwing up something else, but I suspect that R only understands one directory at a time if you are not including the whole file name (eg – L:/Robin/R365/rpackages.csv). Hope this helped!

EDIT**** – if setwd() doesnt work, use file.choose() it lets you pick the file from your system.

 

EDIT2****- once you have set a working drive, you can ask R what documents are in the drive using ‘ls()’