map a custom function with two input columns within summarize or mutate #189
Hi all, I have a question as to whether this operation in R is possible using tidypolars / polars. I will preface this by saying I may have misunderstood how to use polars expressions in tidypolars. Below I have created a reproducible dataset (in R) and part of the function I am looking to translate into Python. I already have the function written in Python, and it evaluates properly if the input is a two-column tibble. My issue is that I can't see a way to pass two columns as input to one function within .mutate() or .summarize(). Can one do this?
library(tidyverse)
#function definition
get_spec = function(x){
x$original = resid(lm(x$original~x$time))
x$original = x$original-median(x$original)
original = x$original/mad(x$original,center=0)
return(original)
}
#representative data
time=rep(seq(1, 50, by = 1/10),2)
series=c(rep(1,length(time)/2),rep(0.5,length(time)/2))
original=(sin(time*series)+(time*series))
data=tibble(series=factor(series),time,original)
#apply function
data=data%>%
group_by(series)%>%
mutate(residual=get_spec(tibble(time,original))) %>%
pivot_longer(c(original,residual)) %>%
arrange(series,name,time)
#plot
ggplot(data,aes(x=time,y=value,col=series,group=series)) +
geom_line() +
facet_grid(~ name)
Replies: 2 comments 2 replies
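In a general sense, yes: you can create a custom function that takes two columns as input. Some of the specific functions you use (like lm()) aren't implemented in polars or tidypolars, which limits recreating your exact example. You're also passing a tibble/data frame into your function in your example, which isn't how you would do it in polars/tidypolars. You would actually want to pass in columns. So in R your function would look like this:
library(tidyverse)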
#function definition
get_spec = function(time, original){
original = resid(lm(original~time))
original = original-median(original)
original = original/mad(original,center=0)
return(original)
}
#apply function
data%>%
group_by(series)%>%
mutate(residual=get_spec(time,original))
Here's a simple example that shows what you're going for:
import tidypolars as tp
from tidypolars import col
def multiply_cols(col1, col2):
    out = col1 * col2
    return out
df = tp.Tibble(x = range(3), y = range(3))
df.mutate(new = multiply_cols(col('x'), col('y')))
Hope this helps - if you have any questions let me know.
@JP-Solomon Just to double check, if my R code were:
library(tidyverse)
# define a user defined function
# that is reducing (length 1 output)
reducing_f = function(x,y){
cor(x,y)
}
# define a user defined function
# that is non-reducing (same length
# output as input)
non_reducing_f = function(x,y){
resid(lm(x~y))
}
#make fake data
dat = tibble(
a = rep(c(1,2),each=10)
, b = rnorm(20)
, c = rnorm(20)
)
(
dat
#group by one of the columns
%>% group_by(a)
# use the non-reducing user-defined function
%>% mutate(
d = non_reducing_f(b,c)
)
# use the reducing user-defined function
%>% summarise(
e = reducing_f(d,b)
)
)
is my understanding correct that there's no possible translation to tidypolars that can achieve either the non-reducing or the reducing operation when they take multiple columns as input?