For a full list of data sets included in the base datasets
R package, use: data()
or library(help="datasets")
.
You can load a built-in data set like this:
data("HairEyeColor")
There is documentation available for each one.
?HairEyeColor
Now let’s examine the data.
HairEyeColor
## , , Sex = Male
##
## Eye
## Hair Brown Blue Hazel Green
## Black 32 11 10 3
## Brown 53 50 25 15
## Red 10 10 7 7
## Blond 3 30 5 8
##
## , , Sex = Female
##
## Eye
## Hair Brown Blue Hazel Green
## Black 36 9 5 2
## Brown 66 34 29 14
## Red 16 7 7 7
## Blond 4 64 5 8
Whoa, this isn’t a data frame. What is it?
class(HairEyeColor)
## [1] "table"
It’s a three-way contingency table. This makes it easier to look at and is suitable for some analyses. But we can coerce it to a data frame, which will make it easier for us to work with:
hairEyeColor = as.data.frame(HairEyeColor)
hairEyeColor
## Hair Eye Sex Freq
## 1 Black Brown Male 32
## 2 Brown Brown Male 53
## 3 Red Brown Male 10
## 4 Blond Brown Male 3
## 5 Black Blue Male 11
## 6 Brown Blue Male 50
## 7 Red Blue Male 10
## 8 Blond Blue Male 30
## 9 Black Hazel Male 10
## 10 Brown Hazel Male 25
## 11 Red Hazel Male 7
## 12 Blond Hazel Male 5
## 13 Black Green Male 3
## 14 Brown Green Male 15
## 15 Red Green Male 7
## 16 Blond Green Male 8
## 17 Black Brown Female 36
## 18 Brown Brown Female 66
## 19 Red Brown Female 16
## 20 Blond Brown Female 4
## 21 Black Blue Female 9
## 22 Brown Blue Female 34
## 23 Red Blue Female 7
## 24 Blond Blue Female 64
## 25 Black Hazel Female 5
## 26 Brown Hazel Female 29
## 27 Red Hazel Female 7
## 28 Blond Hazel Female 5
## 29 Black Green Female 2
## 30 Brown Green Female 14
## 31 Red Green Female 7
## 32 Blond Green Female 8
Much better. And since we renamed the data frame (to the “mixedCase” style), we can remove the old object:
rm(HairEyeColor)
Visit: https://osf.io/s7d9d/
Read the description of the data set. The source of the data is:
You can also click “Codebook for Gombe data personality variables.pdf” in the “Files” box of the OSF page to learn about how the data are coded.
In the “Files” box, click “gombe_128.csv” and then the “Download” button on the top right of the following page to download it. Place it in your class /data
directory (or wherever you are placing your raw data for the course).
You can open the file (in Excel or a text editor) to see how it is formatted.
gombe = read.csv(file="./data/gombe_128.csv", header=TRUE)
head(gombe)
## chimpcode sex kasekela dom sol impl symp stbl
## 1 E131 0 0.1428571 2.428571 3.857143 3.000000 5.571429 4.285714
## 2 P70 1 1.0000000 4.666667 3.333333 4.333333 4.666667 4.000000
## 3 G74 1 0.0000000 3.333333 3.166667 3.500000 5.500000 5.166667
## 4 A364 0 0.0000000 1.666667 1.333333 2.000000 2.666667 4.666667
## 5 B89 0 1.0000000 3.000000 4.666667 3.000000 4.333333 2.666667
## 6 G19 1 1.0000000 4.000000 2.666667 2.666667 3.333333 4.000000
## invt depd soc thotl help exct inqs decs
## 1 4.142857 4.285714 4.571429 1.857143 5.000000 3.714286 3.285714 4.571429
## 2 2.666667 4.666667 4.333333 2.333333 6.333333 4.000000 3.666667 6.666667
## 3 4.166667 5.666667 5.666667 2.833333 5.500000 3.666667 3.666667 4.833333
## 4 3.333333 2.666667 5.333333 2.000000 3.666667 3.333333 4.000000 4.333333
## 5 3.000000 5.000000 6.000000 3.000000 4.666667 3.000000 3.333333 4.000000
## 6 2.333333 5.000000 6.333333 3.000000 5.666667 2.666667 3.333333 4.644833
## indv reckl sens unem cur vuln actv pred
## 1 3.142857 2.000000 4.571429 2.714286 3.142857 3.000000 5.000000 3.428571
## 2 4.000000 4.333333 6.000000 2.666667 3.333333 4.666667 4.333333 5.333333
## 3 4.000000 3.000000 4.666667 3.500000 3.000000 3.666667 4.500000 3.166667
## 4 3.666667 2.333333 4.666667 2.666667 3.000000 3.000000 4.333333 4.000000
## 5 3.000000 3.000000 3.333333 2.666667 3.666667 4.000000 2.666667 3.666667
## 6 4.000000 4.000000 2.000000 3.333333 4.666667 5.000000 4.333333 4.000000
## conv cool innov dominance extraversion conscientiousness
## 1 4.285714 5.285714 4.000000 3.571429 4.642857 4.809524
## 2 5.333333 3.666667 4.333333 4.888889 4.333333 4.222222
## 3 3.333333 4.833333 4.666667 3.500000 4.750000 4.222222
## 4 3.000000 4.333333 4.666667 3.777778 5.166667 5.222222
## 5 3.333333 5.333333 5.333333 3.333333 4.250000 4.555556
## 6 3.666667 4.333333 4.000000 3.881611 5.000000 4.444444
## agreeableness neuroticism openness
## 1 5.047619 3.714286 3.642857
## 2 5.666667 4.000000 3.500000
## 3 5.222222 3.250000 3.875000
## 4 3.666667 3.333333 3.750000
## 5 4.111111 4.166667 3.833333
## 6 3.666667 3.333333 3.583333
Visit: http://www.randomservices.org/random/data/HorseKicks.html
Read the description of the data set.
At the bottom of the page, click the highlighted text “Horse-kick data” to download it. Place it in your class /data
directory (or wherever you are placing your raw data for the course).
You can open the file to see how it is formatted.
horseKicks = read.table(file="./data/HorseKicks.txt", header=TRUE, sep="\t")
horseKicks
## Year GC C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C14 C15
## 1 1875 0 0 0 0 0 0 0 1 1 0 0 0 1 0
## 2 1876 2 0 0 0 1 0 0 0 0 0 0 0 1 1
## 3 1877 2 0 0 0 0 0 1 1 0 0 1 0 2 0
## 4 1878 1 2 2 1 1 0 0 0 0 0 1 0 1 0
## 5 1879 0 0 0 1 1 2 2 0 1 0 0 2 1 0
## 6 1880 0 3 2 1 1 1 0 0 0 2 1 4 3 0
## 7 1881 1 0 0 2 1 0 0 1 0 1 0 0 0 0
## 8 1882 1 2 0 0 0 0 1 0 1 1 2 1 4 1
## 9 1883 0 0 1 2 0 1 2 1 0 1 0 3 0 0
## 10 1884 3 0 1 0 0 0 0 1 0 0 2 0 1 1
## 11 1885 0 0 0 0 0 0 1 0 0 2 0 1 0 1
## 12 1886 2 1 0 0 1 1 1 0 0 1 0 1 3 0
## 13 1887 1 1 2 1 0 0 3 2 1 1 0 1 2 0
## 14 1888 0 1 1 0 0 1 1 0 0 0 0 1 1 0
## 15 1889 0 0 1 1 0 1 1 0 0 1 2 2 0 2
## 16 1890 1 2 0 2 0 1 1 2 0 2 1 1 2 2
## 17 1891 0 0 0 1 1 1 0 1 1 0 3 3 1 0
## 18 1892 1 3 2 0 1 1 3 0 1 1 0 1 1 0
## 19 1893 0 1 0 0 0 1 0 2 0 0 1 3 0 0
## 20 1894 1 0 0 0 0 0 0 0 1 0 1 1 0 0
Visit: https://royalsocietypublishing.org/doi/suppl/10.1098/rsos.150645
You can click “View Full Text” on the left side to read the article (or just the abstract) to learn about the data set.
Under the “Supplemental Material” heading, click the highlighted text “rsos150645supp1.xlsx” to download it. Place it in your class /data
directory.
You can open the file to see how it is formatted.
install.packages("tidyverse")
# OR
install.packages("readxl")
library(readxl)
Documentation is available at: https://readxl.tidyverse.org/
folktales = read_xlsx(path="./data/rsos150645supp1.xlsx",
sheet=1, range="A2:JP52")
folktales
## # A tibble: 50 x 276
## X__1 `300` `300A` `301` `301D` `302` `302B` `302C*` `303` `303A` `304`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Ital~ 1 0 1 0 1 0 0 1 0 0
## 2 Ladin 1 0 1 0 1 0 0 1 0 1
## 3 Sard~ 1 0 1 0 1 0 0 1 0 0
## 4 Wall~ 1 0 1 0 0 0 0 1 0 0
## 5 Fren~ 1 0 1 0 1 0 0 1 1 1
## 6 Span~ 1 0 1 0 1 0 0 1 0 1
## 7 Port~ 1 0 1 0 1 0 0 1 0 1
## 8 Cata~ 1 0 1 0 1 0 0 1 0 0
## 9 Roma~ 1 1 1 0 1 0 1 1 1 1
## 10 Welsh 0 0 0 0 0 0 0 0 0 0
## # ... with 40 more rows, and 265 more variables: `305` <dbl>, `306` <dbl>,
## # `307` <dbl>, `310` <dbl>, `311` <dbl>, `311B*` <dbl>, `312` <dbl>,
## # `312A` <dbl>, `312 C` <dbl>, `312D` <dbl>, `313` <dbl>, `313E*` <dbl>,
## # `314` <dbl>, `314A` <dbl>, `314A*` <dbl>, `315` <dbl>, `315 A` <dbl>,
## # `316` <dbl>, `317` <dbl>, `318` <dbl>, `321` <dbl>, `322*` <dbl>,
## # `325` <dbl>, `325*` <dbl>, `325**` <dbl>, `326` <dbl>, `326A*` <dbl>,
## # `326B*` <dbl>, `327` <dbl>, `327 A` <dbl>, `327B` <dbl>, `327C` <dbl>,
## # `327D` <dbl>, `327F` <dbl>, `327G` <dbl>, `328` <dbl>, `328A` <dbl>,
## # `328*` <dbl>, `328A*` <dbl>, `329` <dbl>, `330` <dbl>, `331` <dbl>,
## # `332` <dbl>, `332C*` <dbl>, `333` <dbl>, `334` <dbl>, `335` <dbl>,
## # `360` <dbl>, `361` <dbl>, `361*` <dbl>, `362*` <dbl>, `363` <dbl>,
## # `365` <dbl>, `366` <dbl>, `368C*` <dbl>, `369` <dbl>, `400` <dbl>,
## # `401A*` <dbl>, `402` <dbl>, `402*` <dbl>, `402A*` <dbl>, `403` <dbl>,
## # `403C` <dbl>, `404` <dbl>, `405` <dbl>, `406` <dbl>, `407` <dbl>,
## # `408` <dbl>, `409` <dbl>, `409A` <dbl>, `409A*` <dbl>, `409B*` <dbl>,
## # `410` <dbl>, `410*` <dbl>, `411` <dbl>, `412` <dbl>, `413` <dbl>,
## # `425` <dbl>, `425A` <dbl>, `425B` <dbl>, `425C` <dbl>, `425D` <dbl>,
## # `425E` <dbl>, `425M` <dbl>, `425*` <dbl>, `426` <dbl>, `430` <dbl>,
## # `431` <dbl>, `432` <dbl>, `433B` <dbl>, `434` <dbl>, `434*` <dbl>,
## # `440` <dbl>, `441` <dbl>, `442` <dbl>, `444*` <dbl>, `449` <dbl>,
## # `450` <dbl>, `451` <dbl>, `452B*` <dbl>, ...
folktales = as.data.frame(folktales)
folktales[1:5,1:15]
## X__1 300 300A 301 301D 302 302B 302C* 303 303A 304 305 306 307 310
## 1 Italian 1 0 1 0 1 0 0 1 0 0 0 1 1 1
## 2 Ladin 1 0 1 0 1 0 0 1 0 1 0 1 1 0
## 3 Sardinian 1 0 1 0 1 0 0 1 0 0 0 0 1 1
## 4 Walloon 1 0 1 0 0 0 0 1 0 0 0 0 1 0
## 5 French 1 0 1 0 1 0 0 1 1 1 0 1 1 1
colnames(folktales)[1] = "society"
folktales[1:5,1:15]
## society 300 300A 301 301D 302 302B 302C* 303 303A 304 305 306 307 310
## 1 Italian 1 0 1 0 1 0 0 1 0 0 0 1 1 1
## 2 Ladin 1 0 1 0 1 0 0 1 0 1 0 1 1 0
## 3 Sardinian 1 0 1 0 1 0 0 1 0 0 0 0 1 1
## 4 Walloon 1 0 1 0 0 0 0 1 0 0 0 0 1 0
## 5 French 1 0 1 0 1 0 0 1 1 1 0 1 1 1
folktales$society
## [1] "Italian" "Ladin" "Sardinian" "Walloon"
## [5] "French" "Spanish" "Portuguese" "Catalan"
## [9] "Romanian" "Welsh" "Irish" "Scottish"
## [13] "Luxembourgish" "German" "Austrian" "Flemish"
## [17] "Dutch" "Frisian" "English" "Swedish"
## [21] "Norwegian" "Danish" "Faroese" "Icelandic"
## [25] "Czech" "Slovak" "Lusatian" "Polish"
## [29] "Byelorussian" "Ukrainian" "Russian" "Bulgarian"
## [33] "Macedonian" "Serbian" "Croation" "Slovenenian"
## [37] "Latvian" "Lithuanian" "Pakistani" "Indian"
## [41] "Nepali" "Gypsy" "Tadzhik" "Iranian"
## [45] "Kurdish" "Afghan" "Ossetian" "Albanian"
## [49] "Greek" "Armenian"