4 Reading in data

R can import data from (but not only):
- Excel
- SAS
- SPSS
- STATA
- …
Typically, is easier to install a package to read in data. For instance, the package Hmisc contains functions that handles import from SPSS.

4.1 Reading data from user input

Frequently, data is collected in white space separated columns, where the first line indicate the variable name (try to use the scan() function):

> # Read 7 items for each x and y:
> # x <- scan()
> # y <- scan()

4.2 The R working directory

R has a search path, the R working directory, where it stores its workspace and look for files. You can locate the working directory with the “get working directory” command:

> getwd()
> # will show your own working directory

The working directory can be changed with the “set working directory” command:

> setwd("c:/otherdir")

For files stored in the working directory or subfolders, you can just specify the path from the working directory when reading them. Example: if the data is located in the “Data” folder in your working directory, write:

> mydat<-read.table("Data/filename.xyz")

4.3 Reading data from a text file (*.dat)

The function read.table() is designed to read this format:

> mydat <- read.table("InFiles/filename.dat", header = TRUE)
> # mydat <- read.table("use your own working directory", header = TRUE)
> mydat
##   x1  x2   x3
## 1  2 0.3 0.01
## 2  2 1.0 0.11
## 3  3 2.1 0.04
## 4  3 2.2 0.02
## 5  1 0.1 0.10
## 6  1 0.2 0.60
> str(mydat)
## 'data.frame':    6 obs. of  3 variables:
##  $ x1: int  2 2 3 3 1 1
##  $ x2: num  0.3 1 2.1 2.2 0.1 0.2
##  $ x3: num  0.01 0.11 0.04 0.02 0.1 0.6

The read.table() function has a lot of optional arguments:

> args(read.table)
## function (file, header = FALSE, sep = "", quote = "\"'", dec = ".", 
##     numerals = c("allow.loss", "warn.loss", "no.loss"), row.names, 
##     col.names, as.is = !stringsAsFactors, tryLogical = TRUE, 
##     na.strings = "NA", colClasses = NA, nrows = -1, skip = 0, 
##     check.names = TRUE, fill = !blank.lines.skip, strip.white = FALSE, 
##     blank.lines.skip = TRUE, comment.char = "#", allowEscapes = FALSE, 
##     flush = FALSE, stringsAsFactors = FALSE, fileEncoding = "", 
##     encoding = "unknown", text, skipNul = FALSE) 
## NULL

Some of the important ones are:
- header: is the first line variable names or not?
- sep: what character is used to separate the columns?
- dec: what character is used as decimal separator?
- nrows: how many rows do we want to read?
- na.strings: what string represent a missing value?
- skip: how many lines to skip before start reading?
- comment.char: what character in the beginning of a line should indicate that the line should be skipped?
Consider the data file testdat1.dat and open it with Excel. Read the file with attention.
Consider now the same file imported to R:

> dat<-read.table("InFiles/testdat1.dat", header=TRUE, skip=5, nrow=2)
> dat
##   a b c
## 1 1 2 3
## 2 4 5 6

Consider the data file testdat2.dat and open it with Excel. Read the file with attention.
Consider now the same file imported to R:

> dat2<-read.table("InFiles/testdat2.dat", header=TRUE, na.strings=".", comment.char=";", dec=",")
> dat2
##   A   B  C
## 1 1 2.0  3
## 2 4 3.2  2
## 3 1 5.0 NA
## 4 5 4.0  6

4.4 Variants of `read.table()`

Other functions which are useful for reading data frames from files are:
- read.csv(): comma separated, dot as decimal point
- read.csv2(): semicolon separated, comma as decimal point
- read.fwf(): fixed width format
read.csv() and read.csv2() are adapted to Excel tables saved as csv files. Which one you need to use depends on your system’s regional settings.
Additional arguments are similar to those of read.table().
Eg:

> read.csv("file", header=TRUE, sep=";", dec=",")

4.5 Reading text files from Excel

Install package readxl and use the read_excel function.

> library(readxl)
> my_excel_data <- read_excel("InFiles/Babies.xlsx", sheet= "Babies")
> my_excel_data <- as.data.frame(my_excel_data)
> my_excel_data
##     Age Male babies Female babies
## 1  1841       40.15         42.25
## 2  1842       40.07         41.92
## 3  1843       40.59         42.62
## 4  1844       40.20         42.28
## 5  1845       41.12         43.22
## 6  1846       39.08         41.22
## 7  1847       37.46         39.57
## 8  1848       38.81         40.97
## 9  1849       36.82         38.54
## 10 1850       41.83         43.71
## 11 1851       39.97         41.93
## 12 1852       39.35         41.47
## 13 1853       38.86         41.11
## 14 1854       38.52         40.44
## 15 1855       39.62         41.86
## 16 1856       41.36         43.58
## 17 1857       39.99         41.86
## 18 1858       38.59         40.50
## 19 1859       39.44         41.37
## 20 1860       40.76         43.13
## 21 1861       40.45         42.79
## 22 1862       40.94         43.32
## 23 1863       39.19         41.55
## 24 1864       38.37         40.80
## 25 1865       38.40         41.10
## 26 1866       38.77         41.40
## 27 1867       40.54         43.45
## 28 1868       40.37         43.03
## 29 1869       39.94         42.74
## 30 1870       39.16         42.05
## 31 1871       39.62         42.65
## 32 1872       41.17         44.27
## 33 1873       41.66         44.92
## 34 1874       40.62         43.59
## 35 1875       39.91         42.99
## 36 1876       40.96         44.37
## 37 1877       41.93         45.45
## 38 1878       40.45         43.63
## 39 1879       41.88         45.13

Other possibility:

> my_excel_data <- read_excel(file.choose())
> 
> # Try this one using the babies file!

4 Reading in data

4.1 Reading data from user input

4.2 The R working directory

4.3 Reading data from a text file (*.dat)

4.4 Variants of read.table()

4.5 Reading text files from Excel

4.4 Variants of `read.table()`