94 points by paddy_m 21 hours ago | 8 comments
Auto Cleaning looks at columns and heuristically suggests common cleaning operations. The operations are added to the lowcode UI, where they can be edited. Multiple cleaning strategies can be applied and the best fit retained. Autocleaning without a UI or a choice of strategies is very opaque. Since this runs heuristically (not with an LLM), it’s fast and data stays local.
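To give a rough idea of what "heuristically suggests cleaning operations" can mean, here is a minimal sketch in plain Python. It is not Buckaroo's implementation: the function name `suggest_cleaning`, the operation names, and the 0.8 threshold are all made up for illustration, and columns are modelled as plain lists rather than dataframe columns.

```python
def _is_number(value):
    """True if the value can be parsed as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def suggest_cleaning(columns, threshold=0.8):
    """Return {column name: [suggested operation names]}.

    Hypothetical heuristics, chosen only for illustration:
    - a column with no values at all         -> "drop_column"
    - strings with surrounding whitespace    -> "strip_whitespace"
    - strings that are mostly number-like    -> "to_numeric"
    - some values missing (None)             -> "fill_missing"
    """
    suggestions = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if not present:
            suggestions[name] = ["drop_column"]
            continue

        ops = []
        strings = [v for v in present if isinstance(v, str)]
        if any(s != s.strip() for s in strings):
            ops.append("strip_whitespace")

        # Share of non-missing values that parse as numbers.
        numeric_ratio = sum(_is_number(v) for v in present) / len(present)
        if strings and numeric_ratio >= threshold:
            ops.append("to_numeric")

        if len(present) < len(values):
            ops.append("fill_missing")

        suggestions[name] = ops
    return suggestions


# Made-up sample data: column name -> list of raw values.
SAMPLE = {
    "age": ["31", "42", " 27", "n/a", "55", "61"],
    "city": ["Dublin", None, "Cork "],
    "notes": [None, None],
}

SUGGESTIONS = suggest_cleaning(SAMPLE)
```

In this sketch, a different "strategy" would simply be a different threshold or rule set. Because the result is a list of named operations rather than an already-cleaned column, a UI can show the suggestions and let the user edit or reject them.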
I'm eager to hear feedback from data scientists and other users of dataframes/notebooks.
ZeroCool2u 20 hours ago
1: https://marketplace.visualstudio.com/items?itemName=ms-tools...
paddy_m 19 hours ago
The Buckaroo lowcode UI is capable of working with Polars, but I don't currently have any commands plumbed in. I will work on that.
I'm aware of Data Wrangler and they did nice work, but it's closed source and, from what I can tell, non-extensible. What features do you like in Data Wrangler, and what do you wish it did differently?
paddy_m 19 hours ago
I need to make some updates to the Polars functionality. I just completed some extensive refactoring of the lowcode UI focused on pandas; time to clean that up for Polars too.
Also, the Python codegen for Polars is non-idiomatic: it emits multiple re-assignments to a dataframe rather than one big select block. I have some ideas for how to fix that, but they'll take time.
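For readers unfamiliar with the distinction, here is a small sketch of the two codegen styles. It is not Buckaroo's code generator: the `(column, call)` operation format and both function names are hypothetical. It only builds source text, so it runs without Polars installed. The emitted code uses `with_columns` rather than `select` so that untouched columns are kept, which is my own choice for the example.

```python
# Hypothetical operation format: (column name, Polars expression call as source text).
OPS = [
    ("age", "fill_null(0)"),
    ("age", "cast(pl.Int64)"),
    ("name", "str.strip_chars()"),
]


def emit_reassignments(ops):
    """One statement per operation: simple to generate, but non-idiomatic Polars."""
    return "\n".join(
        f'df = df.with_columns(pl.col("{col}").{call})' for col, call in ops
    )


def emit_single_block(ops):
    """One statement in total, with the operations on each column chained.

    Expressions inside a single with_columns call all read the original
    frame, so successive operations on the same column have to be chained
    into one expression instead of being listed separately.
    """
    chains = {}
    for col, call in ops:
        chains.setdefault(col, f'pl.col("{col}")')
        chains[col] += f".{call}"
    body = "\n".join(f"    {expr}," for expr in chains.values())
    return f"df = df.with_columns(\n{body}\n)"


REASSIGNED = emit_reassignments(OPS)
SINGLE = emit_single_block(OPS)
print(REASSIGNED)
print(SINGLE)
```

The per-column chaining is the part that makes the single-block form harder to generate: the generator has to group operations by column before emitting anything, instead of writing one line per UI command.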
epistasis 8 hours ago
Currently I use a mix of quak (preferred) and itable (if starting from a colab notebook). It will be interesting to compare for my use cases, which mostly consist of checking the distribution of data in a new file, or verifying that a transform I did resulted in the right sort of stuff.
hodder 8 hours ago
franky47 18 hours ago
trsohmers 15 hours ago
RyanHamilton 18 hours ago
leelou2 14 hours ago
gitroom 9 hours ago