@bnm3k
Last active March 3, 2024 21:25

How do I convert an expression back into a Series?

I'd like to evaluate an expression and get its output, specifically as a Series. I couldn't find anything in the docs for doing so. I've got the following:

import polars as pl
from polars_general_plugin import get_len_keep_odd


df = pl.DataFrame(
    {
        # "1234" "123" was implicitly concatenated by Python; written out explicitly here
        "strs": ["foo", "1234123", "12345"],
    }
)
lengths = get_len_keep_odd(
    df["strs"],
).alias("lens")

On printing lengths I get:

Series[strs]./polars_general_plugin/polars_general_plugin/polars_general_plugin.abi3.so:get_len().alias("lens")

I expected something like 3 7 5 instead.

Context

A little more context: I'm currently learning how to write Polars plugins. This plugin takes in a String series and returns the lengths of the strings, dropping those whose length is even.

On the Rust side, I've got the following:

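// Imports assumed from the usual pyo3-polars plugin setup.
use polars::prelude::*;
use pyo3_polars::derive::polars_expr;
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Kwargs {}

#[polars_expr(output_type=UInt32)]
fn get_len(input: &[Series], _kwargs: Kwargs) -> PolarsResult<Series> {
    let ca = input[0].str()?;

    // Map each string to its byte length, then keep only the odd lengths.
    // Nulls and even lengths are dropped, so the output can be shorter than the input.
    let out: UInt32Chunked = ca
        .into_iter()
        .map(|v: Option<&str>| v.map(|s| s.len() as u32))
        .filter(|v: &Option<u32>| matches!(v, Some(l) if l % 2 != 0))
        .collect();
    Ok(out.into_series())
}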
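
For reference, here is a minimal plain-Python sketch of what I intend the Rust function above to compute. The name get_len_keep_odd_reference is hypothetical and only used for illustration. It assumes lengths are counted in UTF-8 bytes, since that is what Rust's str::len returns:

```python
from typing import Optional


def get_len_keep_odd_reference(values: list[Optional[str]]) -> list[int]:
    """Hypothetical reference for the plugin: byte lengths, odd ones only."""
    # Rust's `str::len` counts UTF-8 bytes, so encode before measuring.
    lengths = (len(s.encode("utf-8")) for s in values if s is not None)
    # Keep odd lengths; nulls were already skipped above.
    return [n for n in lengths if n % 2 != 0]


print(get_len_keep_odd_reference(["foo", "1234123", "12345"]))  # [3, 7, 5]
```

Nulls and even-length strings are dropped entirely, so the result can be shorter than the input.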

On the Python side, I've got the following:

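# Import paths below assume polars 0.20.x; they may differ in other versions.
import polars as pl
from polars.type_aliases import IntoExpr
from polars.utils.udfs import _get_shared_lib_location

from polars_general_plugin.util import parse_into_expr

# Path to the compiled shared library that sits next to this file.
lib = _get_shared_lib_location(__file__)


def get_len_keep_odd(
    expr: IntoExpr,
) -> pl.Expr:
    # Coerce the input (column name, literal, Series, ...) into a pl.Expr.
    expr = parse_into_expr(expr)
    # This returns a lazy expression; nothing is computed here.
    return expr.register_plugin(
        lib=lib,
        symbol="get_len",
        is_elementwise=True,
        kwargs={},
    )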

For the sake of completeness, parse_into_expr is as follows:

def parse_into_expr(
    expr: IntoExpr,
    *,
    str_as_lit: bool = False,
    list_as_lit: bool = True,
    dtype: PolarsDataType | None = None,
) -> pl.Expr:
    if isinstance(expr, pl.Expr):
        pass
    elif isinstance(expr, str) and not str_as_lit:
        expr = pl.col(expr)
    elif isinstance(expr, list) and not list_as_lit:
        expr = pl.lit(pl.Series(expr), dtype=dtype)
    else:
        expr = pl.lit(expr, dtype=dtype)
    return expr