R read csv 函数 - 编程之门

R 的 read.csv 函数非常方便，可以从文件系统和 URL 导入 CSV 文件，并将数据存储在数据框（Data Frame）中。在本文中，我们将向您展示如何使用此 read CSV 函数，以及如何在编程中处理 CSV 数据，并附带一个示例。

R 读取 CSV 语法

在 R 编程中读取 CSV 文件数据的基本语法如下所示。

read.csv(file, header = , sep = , quote = )

read.csv 支持许多参数。以下是在此编程语言函数中实际使用 read CSV 时一些最有用的参数

file: 您必须指定文件名或完整路径以及文件名。您还可以使用外部（在线）文件的 URL。例如，sample.csv 或“C:/Users/ Suresh/ Documents/sample.csv”
header true: 如果文本文件包含列名作为第一行，请将 header 参数指定为 TRUE，否则指定为 FALSE
sep: 它是分隔符的缩写。您必须指定分隔字段的字符。“,” 表示数据由逗号分隔
quote: 如果您的字符值（如 FirstName、Education 列等）用引号括起来，那么您必须指定引号类型。对于双引号，我们在 R read.csv 函数中使用 quote = “\””
as.is: 请指定一个布尔向量，其长度与列数相同。此参数将根据布尔值将字符值转换为因子。例如，如果我们有两个列（FirstName、Sales），那么我们可以使用 as.is = c(TRUE, FALSE)，这将使字符 FirstName 保持为字符（而不是隐式因子）
nrows: 这是一个整数值。您可以使用此参数限制行数。例如，如果您想要前 5 条记录，请使用 nrows = 5
skip: 请指定在开始读取之前要从文件中跳过的行数。例如，如果您想跳过前 2 条记录，请使用 skip = 2
strip.white: 当 sep 参数不等于“”时，您可以使用此布尔值来修剪字符字段中多余的前导和尾随空格。
comment.char: 如果您的文件中存在任何注释行，则可以使用此 R Read CSV 参数来忽略这些行。您已经描述了用于注释行的单个特殊字符。例如，如果您的数据包含以 $ 开头的注释，则使用 comment.char = “$” 来跳过此注释行进行读取。
stringsAsFactors: 一个布尔值，指示 CSV 文件中存在的文本字段是否应转换为因子。

下面的屏幕截图将显示我们 employee 文件中的数据，我们将使用这些文本来演示 R read.csv 函数。如您所见，它有列名，14 行和 7 列。

如果您想使用相同的数据，请复制以下数据粘贴到记事本中，并将其保存为 employee

FirstName,LastName,Education,Occupation,YearlyIncome,Sales,HireDate
John,Yang,Bachelors,Professional,90000,3578.27,28-01-06
Rob,Johnson,Bachelors,Management,80000,3399.99,29-12-10
Ruben,Torres,Partial College,Skilled Manual,50000,699.0982,29-12-11
Christy,Zhu,Bachelors,Professional,80000,3078.27,28-12-12
Rob,Huang,High School,Skilled Manual,60000,2319.99,22-09-08
John,Ruiz,Bachelors,Professional,70000,539.99,06-07-09
John,Miller,Masters Degree,Management,80000,2320.49,12-08-09
Christy,Mehta,Partial High School,Clerical,50000,24.99,05-07-07
Rob,Verhoff,Partial High School,Clerical,45000,24.99,15-09-13
Christy,Carlson,Graduate Degree,Management,70000,2234.99,25-01-14
Gail,Erickson,Education,Professional,90000,4319.99,02-10-06
Barry,Johnson,Education,Management,80000,4968.59,15-05-14
Peter,Krebs,Graduate Degree,Clerical,50000,59.53,14-01-13
Greg,Alderson,Partial High School,Clerical,45000,23.5,05-07-13

从当前工作目录读取 R CSV 文件

在此示例中，我们将向您展示如何在当前工作目录中读取 CSV（逗号分隔值）文件中的数据，在该编程语言中。

# From Current Working Directory

# Locate the Current Working Directory
getwd()

employee <- read.csv("Employee.csv", TRUE, sep = ",")

print(employee)

从自定义目录读取 R CSV 文件

在此示例中，我们将向您展示如何读取位于自定义目录中的文件中的数据。

getwd(): 此方法将返回当前工作目录。通常，它是您的“Documents”文件夹
setwd(“系统地址”): setwd 函数可以帮助我们根据您的要求更改当前目录
list.files(): 它将显示该目录中存在的文件列表

# From Optional Working Directory

# Locate the Current Working Directory
getwd()

setwd("R Programs") # Or use Full path C:/Users/Suresh/Documents 
list.files()
getwd()

employee <- read.csv("Employee.csv", TRUE, sep = ",")
print(employee)

访问 CSV 文件数据

在 R 编程中，read.csv 函数会自动将数据转换为数据框。因此，数据框支持的所有函数都可以用于 CSV 数据。请参阅数据框文章以了解函数的说明。

# Accessing Data
 
# Locate the Current Working Directory
getwd()
employee <- read.csv("Employee.csv", TRUE, sep = ",")
print(employee)

# Accessing all the Elements (Rows) Present in the 3rd Column (i.e., Occupation)
Index Values: 1 = FirstNmae, 2 = LastName, 3 = Education, 4 = Occupation, 4 = Yearly Income 5 = Salary, and 6 = HireDate
employee[[5]] 

# Accessing all the Elements (Rows) Present in the Occupation Item (Column)
employee$Occupation

# Accessing Element at 4th Row and 3rd Column 
employee[4, 3] 

# Accessing Item at 1st, 2nd 4th Rows and 4th, 5th, 6th, 7th Columns 
employee[c(1, 2, 4), c(4:7)]

常用函数

当我们在 R 编程中处理或读取 CSV 文件数据时，以下函数是常用函数。

max 方法将返回列中的最大值
min 方法将返回列中的最小值
subset(data, condition): 此方法将返回数据的子集，数据取决于条件

# Common Functions 
# Locate the Current Working Directory
getwd()
employee <- read.csv("Employee.csv", TRUE, sep = ",")
print(employee)

# It returns the Maximum Value within the Yearly Income Column
maximum.salary <- max(employee$YearlyIncome)
print(maximum.salary)

# It returns the Minimum Value within the Sales Column
minimum.sales <- min(employee$Sales)
print(minimum.sales)

# It will calculate and returns the Sales Column Mean Value
mean.sales <- mean(employee$Sales)
print(mean.sales)

# It returns all the records, whose Education is equal to Bachelors
subdata <- subset(employee, Education == "Bachelors")
print(subdata)

# It returns all the records, whose Education is equal to Bachelors and Yearly Income > 70000
partialdata <- subset(employee, Education == "Bachelors" & YearlyIncome > 70000)
print(partialdata)

重要的 R 读取 CSV 函数

以下函数是在此编程中读取 CSV 文件时一些最有用的函数。

typeof 方法将告诉您变量的类型。由于数据框是一种列表，此函数将返回一个列表
class 方法将告诉您 CSV 文件中数据的类
length 方法将计算项目（列）的数量
nrow 方法将返回存在的总行数。
ncol 方法将返回可用的总列数。
dim 方法将返回总行数和列数。

# Important Functions

# Locate the Current Working Directory
getwd()

employee <- read.csv("Employee.csv", TRUE, sep = ",")
print(employee)

typeof(employee)
class(employee)
names(employee)

length(employee)
nrow(employee)
ncol(employee)
dim(employee)

Head 和 Tail 函数

在 R 编程中，以下函数是处理外部数据（读取 CSV 文件）非常有用的函数。如果您的 CSV 文件很大，并且您想提取表现最佳的记录（前 20 条记录），那么您可以使用这些函数

head(Data, limit): 此方法将返回前六个元素（如果您省略 limit）。如果您将 limit 指定为 3，则它将返回前三条记录。这有点像选择前 20 条记录。
tail(Data, limit): 此方法将返回最后六个元素（如果您省略 limit）。如果您将 limit 指定为 4，则它将返回最后四条记录。这有点像选择最后 10 条记录。

# head and Tail

# Locate the Current Working Directory
getwd()

employee <- read.csv("Employee.csv", TRUE, sep = ",")
print(employee)

# No limit - It will Display Top Six Records 
head(employee)

# Limit is 4 - It will Display Top Four Records
head(employee, 4)

# No limit - It will Display Bottom Six Records 
tail(employee)

# Limit is 3 - It will Display Bottom Three Records
tail(employee, 3)

R 读取 CSV 的特殊函数

以下两个函数是编程在读取 CSV 文件时支持的非常实用的函数。在开始操作或插入新记录之前，检查外部数据的结构总是一个好主意

str(Data): 此方法将返回 CSV 文件中存在的记录的结构。
summary(Data Frame): 此方法将返回外部数据源的性质以及统计摘要，例如最小值、中位数、平均值、中位数等。

# str and summary functions

# Locate the Current Working Directory
getwd()

employee <- read.csv("Employee.csv", TRUE, sep = ",")
print(employee)

print(str(employee))
print(summary(employee))

Read CSV 函数中的 StringsAsFactor

如果您的 CSV 文件包含字符和数字变量，则字符变量会自动转换为因子类型。为了防止这种自动转换，我们必须显式指定 stringsAsFactors = FALSE。

# Factors to String

# Locate the Current Working Directory
getwd()

employee <- read.csv("Employee.csv", TRUE, sep = ",", stringsAsFactors = FALSE)
print(employee)

str(employee)

如果您观察下面的屏幕截图，它返回 FirstName 为 char，而不是 Factor 类型。