How to apply a function to a matrix/tibble
Scenario: we have an id-value mapping table and a matrix/tibble that contains ids, and we need the corresponding labels.
This is useful when a classification model (for example, in Keras) predicts keys (ids) and the final output needs to be the labels.
There are two interesting things:
- Using apply over rows and columns at the same time.
- Creating an empty tibble and filling it (appending columns).
library(tidyverse)
# mapping table (id-value)
map_table=tibble(id=c(1,2,3),
value=c("a", "b", "c")
)
map_table
## # A tibble: 3 x 2
## id value
## <dbl> <chr>
## 1 1 a
## 2 2 b
## 3 3 c
# given a single key, return the label
get_label <- function(x)
{
res=filter(map_table, id==x)$value
return(res)
}
# the data to get the label
X_data=tibble(v1=c(1,2,3),
v2=c(2,2,2),
v3=c(3,2,1)
)
X_data
## # A tibble: 3 x 3
## v1 v2 v3
## <dbl> <dbl> <dbl>
## 1 1 2 3
## 2 2 2 2
## 3 3 2 1
Option 1: as matrix
mat_res=apply(X_data, 1:2, get_label)
## Checking...
mat_res
## v1 v2 v3
## [1,] "a" "b" "c"
## [2,] "b" "b" "b"
## [3,] "c" "b" "a"
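The `1:2` in the call above is the MARGIN argument of apply: 1 means "per row", 2 means "per column", and `1:2` (or `c(1, 2)`) means "per cell". Here is a minimal base R sketch of the difference. The matrix `m` is a made-up example, not part of the data above.

```r
# hypothetical 2 x 3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

by_row  <- apply(m, 1, sum)                        # one result per row
by_col  <- apply(m, 2, sum)                        # one result per column
by_cell <- apply(m, 1:2, function(x) x * 10)       # one result per cell, shape kept

by_row   # 9 12
by_col   # 3 7 11
dim(by_cell)  # 2 3
```

With `1:2` the function receives one value at a time, which is why a lookup written for a single id works there.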
Option 2: as tibble (using 'for')
# create a one-column tibble of NAs with the same number of rows as X_data
tib_res=tibble(V1=rep(NA, nrow(X_data)))
for(i in seq_len(ncol(X_data)))
{
vec=X_data[,i]
vec_lbl=sapply(t(vec), get_label) # if X_data is a matrix, there is no need to transpose with t()
tib_res[[paste0("V", i)]]=vec_lbl # assign by name: replaces the dummy V1, then appends V2, V3
}
## Checking...
tib_res
## # A tibble: 3 x 3
## V1 V2 V3
## <chr> <chr> <chr>
## 1 a b c
## 2 b b b
## 3 c b a
Option 3: as tibble (using 'mutate_all')
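# mutate_all passes each whole column to the function, and get_label expects a single id,
# so wrap it in sapply to look up one id at a time
tib_res_2=mutate_all(X_data, .funs = function(col) sapply(col, get_label))
tib_res_2
## # A tibble: 3 x 3
## v1 v2 v3
## <chr> <chr> <chr>
## 1 a b c
## 2 b b b
## 3 c b a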
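One thing to keep in mind with this option: mutate_all hands each column to the function as a whole vector, so the lookup has to work on vectors, not on a single id. Also, in dplyr 1.0 and later mutate_all is superseded by across(). Here is a minimal sketch of both points. The helper name `get_label_vec` and the result name `tib_res_3` are made up for this example.

```r
library(tibble)
library(dplyr)

map_table <- tibble(id = c(1, 2, 3), value = c("a", "b", "c"))
X_data <- tibble(v1 = c(1, 2, 3), v2 = c(2, 2, 2), v3 = c(3, 2, 1))

# hypothetical vectorized lookup: match() returns, for every element of x,
# its position in map_table$id, which then indexes the labels
get_label_vec <- function(x) map_table$value[match(x, map_table$id)]

# apply the lookup to every column
tib_res_3 <- mutate(X_data, across(everything(), get_label_vec))
tib_res_3
```

An id that is missing from the mapping table comes back as NA here, which may or may not be what you want.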
Finally...
Option 2, to my surprise, is faster than option 1.
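The timing will depend on the data size and the package versions, so it is worth measuring on your own data. Here is a minimal sketch with system.time. The input `big` is a made-up, slightly larger table, and the loop is written in the spirit of option 2 rather than copied from it.

```r
library(tibble)
library(dplyr)

map_table <- tibble(id = c(1, 2, 3), value = c("a", "b", "c"))
get_label <- function(x) filter(map_table, id == x)$value

# hypothetical input: 100 rows x 3 columns of random ids
set.seed(1)
n <- 100
big <- tibble(v1 = sample(1:3, n, replace = TRUE),
              v2 = sample(1:3, n, replace = TRUE),
              v3 = sample(1:3, n, replace = TRUE))

# cell-by-cell with apply (like option 1)
t_apply <- system.time(res1 <- apply(big, 1:2, get_label))

# column-by-column loop (like option 2)
t_loop <- system.time({
  cols <- list()
  for (nm in names(big)) cols[[nm]] <- sapply(big[[nm]], get_label)
  res2 <- as_tibble(cols)
})

t_apply["elapsed"]
t_loop["elapsed"]
```

Both versions call get_label once per cell, so any difference comes from the overhead around those calls.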
I didn't use add_column because the first dummy NA column would still need to be replaced.
Other approaches may include dictionaries.
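As a rough idea of what a dictionary could look like in base R: a named character vector, where the names are the ids (as text) and the values are the labels. This is only a sketch. The names `dict`, `ids` and `m` are made up for the example.

```r
# hypothetical dictionary: names are the ids as strings, values are the labels
dict <- setNames(c("a", "b", "c"), c(1, 2, 3))

# look up a vector of ids in one go
ids <- c(3, 2, 1, 2)
labels <- unname(dict[as.character(ids)])
labels  # "c" "b" "a" "b"

# same idea for a whole matrix, keeping its shape
m <- matrix(c(1, 2, 3, 2, 2, 2, 3, 2, 1), nrow = 3)
m_lbl <- matrix(dict[as.character(m)], nrow = nrow(m))
m_lbl
```

The lookup is vectorized, so there is no need for apply or a loop. Ids that are not in the dictionary come back as NA.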
Any improvement in the code is welcome.
Thanks for reading 🚀