Data Frames in R

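A data frame in R is a table, that is, a two-dimensional data structure. Records are stored in rows and columns, and elements are accessed by row index and column index. Its main properties are: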

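  • A data frame is a list of variables (columns) that must all have the same number of rows, and its row names must be unique.
  • Column names should not be empty.
  • A data frame can hold duplicate column names when it is created with `check.names = FALSE`, but unique column names are always recommended.
  • The stored data can be of character, numeric, or factor type.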
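
The naming and row-count rules above can be checked with a small sketch. The variable names `dup`, `fixed`, and `result` are illustrative only and do not come from the original examples.

```r
# Duplicate column names are kept only when check.names = FALSE
dup <- data.frame(a = 1:3, a = 4:6, check.names = FALSE)
names(dup)

# With the default check.names = TRUE, R rewrites the names to make them unique
fixed <- data.frame(a = 1:3, a = 4:6)
names(fixed)

# Columns whose lengths cannot be recycled to a common row count are rejected
result <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) conditionMessage(e)
)
print(result)
```

Here `tryCatch` only captures the error message so that the script keeps running; in interactive use the call would simply stop with an error.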

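This article shows how to create a data frame in R and how to access its columns and rows. It also covers how to manipulate individual elements, whole rows, and whole columns, how to create a named data frame, and some important functions that data frames support.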

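How to Create a Data Frame in R

The most common approach is to build the data frame from vectors. This example creates one that holds elements of different types: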

Id <- c(1:10)
Name <- c("John", "Rob", "Ruben", "Christy","Johnson", "Miller", "Carlson", "Ruiz", "Yang","Zhu")
Occupation <- c("Professional", "Programmer","Management", "Clerical", 
                "Developer", "Programmer", "Management", "Clerical", "Developer","Programmer")
Salary <- c(80000, 70000, 90000, 50000, 60000, 75000, 92000, 68000, 55000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

First, we created four vectors (integer, character, and numeric), then we used those four vectors to create the data frame.

Creating a Named Data Frame in R

These are the steps to create a data frame with named columns. The syntax is:

DataFrame_Name <- data.frame("index_Name1" = Item1, "index_Name2" = Item2, …, "index_NameN" = ItemN)

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

# We are assigning new names to the Columns
employee <- data.frame("Empid" = Id, "Full_Name" = Name, "Profession" = Occupation, "income" = Salary)

print(employee)

# Names function will display the Index Names of each Item 
print(names(employee))

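Accessing R Data Frame Elements

There are several ways to access data frame elements. We start with index positions. Index values start at 1 and end at n, where n is the number of elements.

For example, a data frame that stores ten elements (10 columns) has indexes running from 1 to 10. To access the first column, use `DataFrame_Name[1]`; to access the tenth, use `DataFrame_Name[10]`.

Single brackets `[` also accept a column name in place of a position. The example below uses both forms, and each returns a one-column data frame.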

# Accessing Elements

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# Accessing all the Elements (Rows) Present in the Name Items (Column)
employee["Name"]

# Accessing all the Elements (Rows) Present in the 3rd Column (i.e., Occupation)
employee[3] # Index Values: 1 = Id, 2 = Name, 3 = Occupation, 4 = Salary

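Accessing Elements Using `[[`

We can also use double brackets `[[` to access data frame elements. This example shows how to access data frame items using `[[`. The result is returned as a vector. Level information is shown only when the column is a factor, which was the default for character data before R 4.0.0.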

# Accessing Elements

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

employee[["Name"]]
employee[[3]]

It returns the same values as the previous example, but as a vector rather than a data frame.
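
The difference between the two bracket forms can be seen by checking the class of each result. This is a minimal sketch; the names `one_col` and `name_vec` are illustrative, and `stringsAsFactors = FALSE` is set explicitly so the outcome does not depend on the R version.

```r
employee <- data.frame(
  Id = 1:3,
  Name = c("John", "Rob", "Christy"),
  stringsAsFactors = FALSE
)

one_col <- employee["Name"]     # single brackets: a one-column data frame
name_vec <- employee[["Name"]]  # double brackets: the column itself

class(one_col)
class(name_vec)
```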

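Accessing R Data Frame Items Using `$`

We can also use the `$` symbol to access elements. This example shows how to access the elements of a data frame using `$`. The result is a vector, with level information when the column is a factor. The syntax is: `DataFrame_Name$Column_Name`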

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# Get all the Elements (Rows) Present in the Name Item (Column)
employee$Name

# Get all the Elements (Rows) Present in the Salary Item (Column)
employee$Salary
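
Whether level information appears depends on whether the column is a factor. The sketch below forces factor columns with `stringsAsFactors = TRUE`, an assumption made here only to show the levels; recent R versions keep character columns as plain character vectors by default.

```r
employee <- data.frame(
  Name = c("John", "Rob", "Christy"),
  Occupation = c("Developer", "Admin", "Developer"),
  stringsAsFactors = TRUE
)

# A factor column prints its values followed by a "Levels:" line
employee$Occupation

# The distinct levels, and the values converted back to plain text
levels(employee$Occupation)
as.character(employee$Occupation)
```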

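Accessing Low-Level Elements

In R, we can use index positions to access the low-level elements (individual cells) of a data frame. Index values start at 1 and end at n, where n is the number of rows or columns. The syntax is `DataFrame_Name[Row_Number, Column_Number]`.

For example, suppose we declare a data frame with six rows and four columns. To access or modify the first value, use `DataFrame_Name[1, 1]`; for the value in the second row and third column, use `DataFrame_Name[2, 3]`; for the sixth row and fourth column, use `DataFrame_Name[6, 4]`.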

# Accessing Low level elements

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)
employee <- data.frame(Id, Name, Occupation, Salary)

print(employee)
# Accessing Element at 1st Row and 2nd Column 
employee[1, 2]

# Get Element at 4th Row and 3rd Column 
employee[4, 3] 

# Get All Elements at 5th Row 
employee[5, ] 
         
# Get All Item of the 4th Column 
employee[, 4]
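
Selecting a single column with `[row, column]` indexing returns a plain vector by default. If a data frame is wanted instead, the `drop = FALSE` argument keeps the structure. A minimal sketch, with illustrative names `as_vector` and `as_frame`:

```r
employee <- data.frame(Id = 1:3, Salary = c(80000, 90000, 75000))

as_vector <- employee[, 2]                # simplified to a numeric vector
as_frame <- employee[, 2, drop = FALSE]   # stays a one-column data frame

class(as_vector)
class(as_frame)
```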

Accessing Multiple Values

This shows how to access multiple items of a data frame. To do so, we pass R vectors of row and column indexes.

# Accessing Subset of elements

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)
employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# Accessing Item at 1st, 2nd Rows and 3rd, 4th Columns 
employee[c(1, 2), c(3, 4)]

# Accessing Item at 2nd, 3rd, 4th Rows and 2nd, 4th Columns 
employee[2:4, c(2, 4)] 

# getting All Item at 2nd, 3rd, 4th, 5th Rows 
employee[2:5, ] 
         
# Printing All Item of 2nd and 4th Column 
employee[c(2, 4)]
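
The index vectors do not have to be numeric. As a sketch of a closely related technique, a logical vector can select rows by condition and a character vector can select columns by name. The threshold of 80000 and the name `high_paid` are illustrative choices, not part of the original examples.

```r
employee <- data.frame(
  Name = c("John", "Rob", "Christy", "Johnson"),
  Salary = c(80000, 90000, 75000, 92000)
)

# Rows where the condition is TRUE, columns picked by name
high_paid <- employee[employee$Salary > 80000, c("Name", "Salary")]
print(high_paid)
```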

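Accessing Low-Level Elements Using `$`

The `$` symbol can also be combined with an index to reach individual cells of a column. Let us see how to access single cells using `$`. The result is a vector, with level information when the column is a factor.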

# Accessing elements

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", 
                "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)
employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# Accessing Item at 2nd, 4th Rows of Name Columns 
employee$Name[c(2, 4)] 

# getting Item at 2nd, 3rd, 4th, 5th Rows of Occupation Column 
employee$Occupation[2:5] 

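Modifying R Data Frame Elements

The same index positions that access and extract data can also be used to modify it. Here, we change the value of one specific cell and then replace an entire column.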

# Modifying elements

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Salary)
print(employee)

# Modifying Item at 2nd Row and 3rd Column 
employee[2, 3] <- 100000
print(employee)

#  Modifying All Item of 1st Column 
employee[, 1] <- c(10:15)
print(employee)
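
Positions are not the only way to target a value. As a hedged sketch, the same kind of update can be written with a column name and a condition, which avoids hard-coding row and column numbers. The 1000 increase is an arbitrary illustrative amount.

```r
employee <- data.frame(
  Id = 1:3,
  Name = c("John", "Rob", "Christy"),
  Salary = c(80000, 90000, 75000)
)

# Change one cell, located by name instead of by position
employee$Salary[employee$Name == "Rob"] <- 100000

# Change a whole column at once
employee$Salary <- employee$Salary + 1000
print(employee)
```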

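Adding Elements

This example adds new elements to an existing data frame. Both functions return a new data frame and leave the original unchanged unless the result is assigned back.

  • `cbind(DataFrame, Values)`: the `cbind` function adds extra columns holding the values. A vector is usually the preferred `Values` argument.
  • `rbind(DataFrame, Values)`: the `rbind` function adds extra rows holding the values.

# Adding elements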

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Salary, stringsAsFactors=FALSE)
print(employee)

# Adding Extra Row 
rbind(employee, list(7, "Gateway", 105505))

# Adding Extra Column 
Occupation <- c("Management", "Developer", "User", "Programmer", "Clerical", "Admin")
cbind(employee, Occupation)
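
Calling `rbind` or `cbind` without assignment only prints the combined result. A short sketch of keeping the change, using a smaller illustrative data frame:

```r
employee <- data.frame(
  Id = 1:2,
  Name = c("John", "Rob"),
  stringsAsFactors = FALSE
)

# Without assignment the original still has two rows
rbind(employee, list(3, "Christy"))
nrow(employee)

# Assign the result back to keep the new row and the new column
employee <- rbind(employee, list(3, "Christy"))
employee <- cbind(employee, Salary = c(80000, 90000, 75000))
print(employee)
```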

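Important Functions for R Data Frames

The following functions are among the most useful for data frames.

  • `typeof(DataFrame)`: returns the data type. Because a data frame is a kind of list, it returns `list`.
  • `class(DataFrame)`: returns its class, `data.frame`.
  • `length(DataFrame)`: counts the number of items (columns) in it.
  • `nrow(DataFrame)`: returns the total number of rows.
  • `ncol(DataFrame)`: returns the total number of columns.
  • `dim(DataFrame)`: returns the total number of rows and columns, in that order.

# Important Functions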

Id <- c(1:10)
Name <- c("John", "Rob", "Ruben", "Christy","Johnson", "Miller", "Carlson", "Ruiz", "Yang","Zhu")
Occupation <- c("Professional", "Programmer","Management", "Clerical", 
                "Developer", "Programmer", "Management", "Clerical", "Developer","Programmer")
Salary <- c(80000, 70000, 90000, 50000, 60000, 75000, 92000, 68000, 55000, 82000)

#employee <- data.frame("empid" = Id, "name" = Name, "Profession" = Occupation, "income" = Salary)
employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

typeof(employee)
class(employee)
names(employee)

# Number of Rows and Columns
length(employee)
ncol(employee)
nrow(employee)
dim(employee)
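
For a quick check of how these functions relate to each other, here is a small sketch on a four-row, three-column data frame. The data is a shortened, illustrative version of the example above.

```r
employee <- data.frame(
  Id = 1:4,
  Name = c("John", "Rob", "Ruben", "Christy"),
  Salary = c(80000, 70000, 90000, 50000)
)

typeof(employee)   # "list"
class(employee)    # "data.frame"
length(employee)   # 3, the same value as ncol(employee)
dim(employee)      # 4 3, rows first and then columns
```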

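Head and Tail Functions for R Data Frames

If a data frame has too many records and you only want the first or last few, use these functions. In R the `limit` argument is named `n`.

  • `head(DataFrame, limit)`: returns the first six records if you omit `limit`. For example, with `limit` set to 2 it returns the first two records. This is similar to selecting the top N rows of a table.
  • `tail(DataFrame, limit)`: returns the last six records if you omit `limit`. For example, with `limit` set to 4 it returns the last four records.

# Head and Tail Function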

Id <- c(1:10)
Name <- c("John", "Rob", "Ruben", "Christy", "Johnson", "Miller", "Carlson", "Ruiz", "Yang","Zhu")
Occupation <- c("Professional", "Programmer","Management", "Clerical", "Developer", "Programmer", 
                "Management", "Clerical", "Developer","Programmer")
Salary <- c(80000, 70000, 90000, 50000, 60000, 75000, 92000, 68000, 55000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# No limit - It means Displaying First Six Records 
head(employee)

# Limit is 4 - It means Displaying First Four Records 
head(employee, 4)

# No limit - It means Displaying Last Six Records 
tail(employee)

# Limit is 4 - It means Displaying Last Four Records 
tail(employee, 4)
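
The limit can also be passed by name, and a negative value excludes records from the other end instead of counting them. A minimal sketch with illustrative data and variable names:

```r
employee <- data.frame(Id = 1:10, Salary = seq(50000, 95000, by = 5000))

first_two <- head(employee, n = 2)          # rows 1 and 2
last_three <- tail(employee, n = 3)         # rows 8, 9, and 10
all_but_last_two <- head(employee, n = -2)  # rows 1 through 8

print(all_but_last_two)
```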

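Special Functions for R Data Frames

The following two functions are very useful. It is good practice to check the structure before manipulating the data or inserting new records.

  • `str(DataFrame)`: displays its structure.
  • `summary(DataFrame)`: returns the nature of the data along with summary statistics such as the minimum, first quartile, median, mean, third quartile, and maximum.

# Important Functions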

Id <- c(1:10)
Name <- c("John", "Rob", "Ruben", "Christy","Johnson", "Miller", "Carlson", "Ruiz", "Yang","Zhu")
Occupation <- c("Professional", "Programmer","Management", "Clerical", 
                "Developer", "Programmer", "Management", "Clerical", "Developer","Programmer")
Salary <- c(80000, 70000, 90000, 50000, 60000, 75000, 92000, 68000, 55000, 82000)
#employee <- data.frame("empid" = Id, "name" = Name, "Profession" = Occupation, "income" = Salary)
employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# str() prints the structure itself; wrapping it in print() would add a stray NULL
str(employee)
print(summary(employee))
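
The statistics that `summary` reports can also be read back individually. The sketch below summarizes a single numeric column and looks up two entries by name. The data is illustrative, and `salary_stats` is a name introduced only for this example.

```r
employee <- data.frame(
  Id = 1:5,
  Salary = c(80000, 70000, 90000, 50000, 60000)
)

# Structure: column names, types, and the first few values
str(employee)

# Summary of one numeric column, then individual statistics by name
salary_stats <- summary(employee$Salary)
print(salary_stats)
salary_stats[["Median"]]
salary_stats[["Mean"]]
```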