R365: Day 27 – dataframes

When you have a lot of data, it's sometimes worth organizing it all into a dataframe so that you can call different rows and columns without having to refer back to separate data vectors. A dataframe is just a bunch of vectors of the same length strung together into a matrix-like object; unlike a matrix, each column can hold a different type of data (numbers, text, factors). Making dataframes is fairly straightforward, as long as all of the vectors are the same length. While you can make your own dataframes, many of the objects in the {datasets} package are already formatted as dataframes. Once you have made a dataframe, you can attach() it, which lets you call the columns by name without typing the dataframe's name each time. Because the object is laid out like a matrix, you can call individual datapoints as well.
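As a quick sketch of those ideas, here is one of the built-in dataframes, mtcars from the {datasets} package (my choice of example, any built-in dataframe would do). It shows column access with $, matrix-style indexing, and with(), which is one alternative to attach() that leaves the search path alone.

```r
# mtcars ships with base R in the {datasets} package
cars_df <- mtcars

is.data.frame(cars_df)       # check that it really is a dataframe
is.list(cars_df)             # under the hood, a dataframe is a list of equal-length columns

first_by_name  <- cars_df$mpg[1:5]       # column by name, then the first five values
first_by_index <- cars_df[1:5, "mpg"]    # the same values via [row, column] notation

# with() evaluates an expression using the columns as variables,
# so there is no need to attach() the dataframe first
mean_mpg <- with(cars_df, mean(mpg))
```

If you do use attach(), remember to detach() the dataframe when you are done, otherwise its column names stay on the search path and can shadow other objects.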

##create data sets from different vectors
##we'll pretend we're selecting something like a random sample from a student body
##I like to switch up runif() and rnorm(), do as you please
weight=runif(150, min=100, max=250)
height=rnorm(150,mean=65, sd=3)
##if we want the sex of the participant to be random, this gets tricky (I think)
## this is probably not the most efficient way to do stuff, but I am going to make a vector of length 150
sex=rep(c("male","female"),length.out=150)
##then sample without replacement from the vector to create a randomized vector
SEX<- sample(sex, 150,replace=FALSE)
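A shorter route to a random sex vector, as a sketch: sample() can draw straight from the two labels if you allow replacement. The name sex_random is just mine for illustration. Note that this is not quite the same thing as shuffling a pre-built vector: the shuffle gives exactly 75 of each, while this gives roughly half and half.

```r
set.seed(27)  # only so the example is reproducible

# draw 150 labels, each one independently "male" or "female"
sex_random <- sample(c("male", "female"), size = 150, replace = TRUE)

# counts will be close to 75/75 but usually not exact
table(sex_random)
```

If you want uneven odds, sample() also takes a prob argument, for example prob = c(0.4, 0.6).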
##I want to give them grades based on a normal curve, but I don't know how to do that yet
##I tried "rnorm(150,c("A","B","C","D","F"))" but it didn't work, since rnorm() only generates numbers
##we can fudge a normal curve by setting up the amounts ahead of time in a different vector
##but I need to learn how to do it properly in the future (note to self)
grade=c(rep("A",15),rep("B",35),rep("C",50),rep("D",35),rep("F",15))
##these numbers are not a perfect normal curve, but it's close enough for my purposes here
GRADE=sample(grade,150, replace=FALSE)
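One possible way to get letter grades off an actual normal curve, as a sketch: generate numeric scores with rnorm() and then bin them into letters with cut(). The mean of 75, the sd of 10, and the 60/70/80/90 cutoffs are all made-up assumptions for illustration.

```r
set.seed(27)  # only so the example is reproducible

# hypothetical numeric scores, normally distributed around 75
score <- rnorm(150, mean = 75, sd = 10)

# bin the scores into letters; right = FALSE makes each bin [low, high),
# so a score of exactly 90 counts as an "A"
grade_curve <- cut(score,
                   breaks = c(-Inf, 60, 70, 80, 90, Inf),
                   labels = c("F", "D", "C", "B", "A"),
                   right = FALSE)

table(grade_curve)
```

cut() returns a factor, which data.frame() accepts as a column just like a character vector.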
##rnorm() has no min or max arguments, so normally distributed SAT values could fall outside the real score range
##we'll use runif() instead
SAT=runif(150, min=1200,max=2100)
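If you would rather keep a bell shape for the SAT scores, one workaround is rejection sampling: draw from rnorm() and throw away anything outside the allowed range until you have enough values. This is only a sketch; draw_bounded() is a hypothetical helper I wrote for this example, not a base R function, and the mean of 1650 and sd of 200 are assumptions.

```r
# hypothetical helper: n normal draws restricted to [lo, hi]
draw_bounded <- function(n, mu, sigma, lo, hi) {
  kept <- numeric(0)
  while (length(kept) < n) {
    x <- rnorm(n, mean = mu, sd = sigma)
    kept <- c(kept, x[x >= lo & x <= hi])  # keep only the in-range draws
  }
  kept[seq_len(n)]  # trim any extras
}

set.seed(27)  # only so the example is reproducible
sat_bounded <- draw_bounded(150, mu = 1650, sigma = 200, lo = 1200, hi = 2100)
range(sat_bounded)
```

A cruder alternative is to clamp the draws with pmin() and pmax(), but that piles values up at exactly 1200 and 2100.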
##now we'll stick all these vectors together into a dataframe
class=data.frame(weight,height,SEX,GRADE,SAT)
##let's look at this just to make sure it's okay
head(class)
##now let's play with it a bit
##call using matrix notation
##remember matrix notation in R is ROW, COLUMN (RC), like the remote controlled cars. a stupid mnemonic, but if it works...
class[130,5]
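To make the ROW, COLUMN idea concrete, here is a tiny sketch with a three-row dataframe. The name toy and its values are made up for illustration so the results are easy to check by eye.

```r
toy <- data.frame(height = c(64, 67, 70),
                  SAT    = c(1500, 1800, 2000))

one_value  <- toy[2, 2]       # row 2, column 2: a single datapoint
whole_row  <- toy[2, ]        # leave the column blank to get the whole row (still a dataframe)
whole_col  <- toy[, 2]        # leave the row blank to get the whole column (a plain vector)
by_name    <- toy[2, "SAT"]   # columns can also be picked by name
by_dollar  <- toy$SAT[2]      # or pull the column with $ and then index into it
```

The last three single-value forms all return the same number; which one you use is mostly a matter of taste.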
plot(weight~height)
##[figure: scatterplot of weight against height]
##which looks random, which is good because the data is random
##I tried to make a mosaic or association plot with the other data, but it didn't work
plot(weight~SAT)
plot(height~SAT)
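For the mosaic plot, one approach that may work is to cross-tabulate the two categorical variables with table() first and hand the result to mosaicplot() from base graphics. This is a sketch on freshly generated stand-in data; sex_demo and grade_demo are names I made up for the example.

```r
set.seed(27)  # only so the example is reproducible

# stand-in categorical data, 150 participants
sex_demo   <- sample(c("male", "female"), 150, replace = TRUE)
grade_demo <- sample(c("A", "B", "C", "D", "F"), 150, replace = TRUE)

# mosaicplot() wants counts, so build a two-way contingency table first
counts <- table(sex_demo, grade_demo)

mosaicplot(counts, main = "Sex by grade", xlab = "Sex", ylab = "Grade")
```

The same pattern should apply to columns of a dataframe: pass the two columns to table() and plot the result.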