200 points by chenglong-hn 1 day ago | 34 comments
Data Formulator blends UI interaction with natural language so that you can create visualizations with AI much more effectively!
You can:
* create rich visualizations beyond your initial datasets, with AI helping to transform and visualize the data along the way
* iterate on your designs and dive deeper using data threads, a new way to manage your conversation with AI.
Here is a demo video: https://github.com/microsoft/data-formulator/releases/tag/0....
Give it a shot and let us know what you think!
d_watt 13 hours ago
I quickly ran into a wall trying to do interesting things like "forecast a dataset using ARIMA." On the surface it just does a linear prediction, seeming to ignore me, but under the hood you can see the model tried importing a library not actually in my environment, failed, and fell back to linear.
Given that you're approaching this in a Pythonic way, not SQL, my default way of working with it is to think about what Python stuff I'd want to do. How do you see handling these things in the future? Go the route of assuming Anaconda and prompt the model with a well-known set of libraries to expect? Or maybe prompt the user to install libraries that are missing?
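Both options start from the same step: finding out which packages are importable before the model writes code. A minimal sketch, assuming a hypothetical `CANDIDATE_PACKAGES` shortlist and hypothetical helper names (this is not how Data Formulator works today), uses only `importlib.util.find_spec` from the standard library:

```python
import importlib.util

# Hypothetical shortlist of optional packages a model might reach for.
CANDIDATE_PACKAGES = ["pandas", "numpy", "statsmodels", "sklearn", "prophet"]


def available_packages(candidates):
    """Return the top-level packages from `candidates` that can be imported here."""
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


def missing_packages(candidates):
    """Return the candidates that would fail to import."""
    present = set(available_packages(candidates))
    return [name for name in candidates if name not in present]


def library_hint(candidates):
    """Build a prompt fragment telling the model which libraries it may use."""
    present = available_packages(candidates)
    absent = missing_packages(candidates)
    lines = ["You may import only these third-party libraries: "
             + (", ".join(present) or "none") + "."]
    if absent:
        lines.append("Do NOT import: " + ", ".join(absent) + ".")
    return "\n".join(lines)
```

The same `missing_packages` result could instead be shown to the user as an install suggestion (e.g. "ARIMA needs statsmodels"), which covers the second option without silently falling back to a linear fit.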
chenglong-hn 13 hours ago
While we designed it for end-user analyst scenarios (hence the much simpler UI and function support), we see big value in "freeing" GPT-4o for advanced users who want AI to do complex things. A starting point could be an "interactive terminal" where AI and the user communicate directly about these out-of-the-box concepts, perhaps even with the user instructing AI to dynamically generate new UI that adapts to their workflow.
paddy_m 16 hours ago
I had some core views that shaped what I built.
1. When doing data manipulation, especially initial exploration and cleaning, we type the same things over and over. Being proficient with pandas involves recognizing a lot of patterns and, hopefully, remembering well-written code for each one (like you would read in Effective Pandas).
2. pandas/polars has a huge API surface, but rarely are all of those calls relevant. There are distinct operations you would want on a datetime column, a string column, or an int column. The traditional IDE paradigm is a bit lacking for this type of use (Python typing doesn't take a column's dtype into account, so you see 400 methods for every column).
3. It is less important for a tool to have the right answer out of the box than to let you cycle through different views and transforms quickly.
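Point 2 can be illustrated with a dtype-aware menu: instead of listing every method, a tool looks at the column's dtype and offers only the operations that fit. A minimal sketch, assuming pandas-style dtype names passed as strings (with a real DataFrame you would pass `str(df[col].dtype)`); the `OPS_BY_FAMILY` lists and function names are hypothetical:

```python
# Hypothetical operation menus keyed by dtype family.
OPS_BY_FAMILY = {
    "datetime": ["extract_year", "extract_month", "resample", "diff"],
    "string": ["strip", "lower", "split", "contains"],
    "integer": ["bin", "cumsum", "rank", "astype_float"],
    "float": ["round", "bin", "fillna_mean", "rank"],
}


def dtype_family(dtype_name: str) -> str:
    """Map a pandas-style dtype name (e.g. 'int64', 'datetime64[ns]') to a family."""
    name = dtype_name.lower()
    if name.startswith("datetime64"):
        return "datetime"
    if name.startswith(("int", "uint")):
        return "integer"
    if name.startswith("float"):
        return "float"
    # Assumption: 'object' columns usually hold strings.
    if name in ("object", "string", "str"):
        return "string"
    return "other"


def relevant_ops(dtype_name: str) -> list:
    """Return only the operations that make sense for a column of this dtype."""
    return OPS_BY_FAMILY.get(dtype_family(dtype_name), [])
```

Unknown families (e.g. `bool`) get an empty menu here; a real tool would curate those lists rather than fall back to all 400 methods.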
------
I built a low-code UI for Buckaroo [1] that has a DSL (JSON Lisp) that mostly specifies the transform, column name, and other arguments. These operations are then applied to a dataframe, and separately the Python code is generated from templates for each command.
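The split described above (one command list, interpreted once against data and rendered once as code) can be sketched in a few lines. This is not Buckaroo's actual command format: the `[operation, column, *args]` shape, the three commands, and the column-oriented dict standing in for a dataframe are all assumptions made to keep the example dependency-free.

```python
import copy
import json


def _dropcol(df, col):
    del df[col]


def _fillna(df, col, value):
    df[col] = [value if v is None else v for v in df[col]]


def _rename(df, col, new_name):
    # Rebuild the dict so the renamed column keeps its position.
    items = [(new_name if k == col else k, v) for k, v in df.items()]
    df.clear()
    df.update(items)


# Interpreter: operation name -> function that mutates the working copy.
APPLY = {"dropcol": _dropcol, "fillna": _fillna, "rename": _rename}

# Code generator: one pandas template per operation, independent of APPLY.
TEMPLATES = {
    "dropcol": "df = df.drop(columns=[{col!r}])",
    "fillna": "df[{col!r}] = df[{col!r}].fillna({arg0!r})",
    "rename": "df = df.rename(columns={{{col!r}: {arg0!r}}})",
}


def parse(program: str):
    """Parse a JSON program into a list of [operation, column, *args] commands."""
    return json.loads(program)


def run(commands, df):
    """Apply commands to a dict-of-lists 'dataframe' without mutating the input."""
    out = copy.deepcopy(df)
    for op, col, *args in commands:
        APPLY[op](out, col, *args)
    return out


def to_python(commands):
    """Render the same commands as pandas source code via the templates."""
    lines = []
    for op, col, *args in commands:
        params = {"col": col}
        params.update({f"arg{i}": a for i, a in enumerate(args)})
        lines.append(TEMPLATES[op].format(**params))
    return "\n".join(lines)
```

For example, `parse('[["fillna", "x", 0], ["rename", "x", "y"], ["dropcol", "z"]]')` run on `{"x": [1, None, 3], "z": ["a", "b", "c"]}` yields `{"y": [1, 0, 3]}`, and `to_python` on the same program emits three lines of pandas code. Keeping the templates separate from the interpreter is what lets a UI show both the result and the code the user could paste elsewhere.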
I also have a facility for auto-cleaning that heuristically inspects columns and outputs the same operations. So if a column is 95% numbers and 1% blank strings, it should probably be treated as a numeric column. These operations are then visible in the low-code UI. Multiple cleaning methods can be tried out (with different thresholds).
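A heuristic like this can be written so that it only proposes commands rather than changing data, which is what makes the suggestions reviewable in a UI. A minimal sketch, assuming a dict-of-lists table, a hypothetical `["to_numeric", column]` command, and a default `threshold` of 0.9 chosen purely for illustration:

```python
def _is_number(value) -> bool:
    """True if value is an int/float (not bool) or a string that parses as a float."""
    if isinstance(value, bool):
        return False
    if isinstance(value, (int, float)):
        return True
    if isinstance(value, str):
        try:
            float(value)
        except ValueError:
            return False
        return True
    return False


def _is_blank(value) -> bool:
    return value is None or (isinstance(value, str) and value.strip() == "")


def suggest_cleaning(df, threshold=0.9):
    """Propose ["to_numeric", col] for columns that are mostly numeric strings.

    Blanks are ignored when computing the share; columns that contain no
    strings at all are already numeric and are skipped.
    """
    commands = []
    for col, values in df.items():
        non_blank = [v for v in values if not _is_blank(v)]
        if not non_blank or not any(isinstance(v, str) for v in values):
            continue
        share = sum(_is_number(v) for v in non_blank) / len(non_blank)
        if share >= threshold:
            commands.append(["to_numeric", col])
    return commands
```

Trying "multiple cleaning methods" then amounts to calling `suggest_cleaning` with different thresholds and comparing the proposed command lists.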
[1] https://github.com/paddymul/buckaroo
[2] https://youtu.be/GPl6_9n31NE?si=YNZkpDBvov1lUYe4&t=603 (demonstrates the low-code UI and auto-cleaning in about 3 minutes)
[3] There are other related tools in this space, specifically visidata and dtale. They take different approaches which are worth learning from.
PS: I love this product space and I'm eager to talk to anyone building products in it.
chenglong-hn 13 hours ago
I hope multiple ways of interacting with data can coexist seamlessly in some future tool (ideally without overwhelming users) :)
paddy_m 13 hours ago
Initially, a tool like Buckaroo requires some investment in learning where to click and how to interpret the output.
zurfer 20 hours ago
As a builder of something similar [2], I believe the future is a mix: chat (because it's easy to go deep and refine) AND generated UIs that are still manually configurable. It's interesting to see that you also use Plotly for rendering charts. So far, I have found it non-trivial to make these highly configurable via a UI.
Thank you for open sourcing so we can all learn from it.
[1] https://news.ycombinator.com/item?id=41885231
[2] https://getdot.ai
flessner 16 hours ago
Microsoft Office, VS Code, Adobe Photoshop and most other large software platforms have all embraced this.
I have genuinely not seen an AI product that works standalone (without a preexisting platform) besides chat-based LLMs.
zurfer 20 hours ago
Some of these "agents" are used for surprising things like sorting: https://github.com/microsoft/data-formulator/blob/main/py-sr... [this seems a bit lazy, but I guess it works :D]
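The linked path is truncated, so what the sorting agent does exactly is an assumption here; one plausible job is inferring a natural order for categorical labels such as month names, which a model handles well but which also has a cheap deterministic path for common cases. A minimal sketch of that fallback, with a hypothetical `KNOWN_ORDERS` table and `smart_sort` function (not Data Formulator code):

```python
# Hypothetical table of well-known ordinal vocabularies (lowercase).
KNOWN_ORDERS = [
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"],
    ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"],
    ["low", "medium", "high"],
]


def smart_sort(values):
    """Sort by a known ordinal vocabulary if every value matches one; else None.

    None tells the caller to fall back to something else, e.g. asking a model.
    """
    normalized = [str(v).strip().lower() for v in values]
    for order in KNOWN_ORDERS:
        rank = {name: i for i, name in enumerate(order)}
        if all(n in rank for n in normalized):
            # Sort pairs by rank only, keeping the original spelling of each value.
            pairs = sorted(zip(normalized, values), key=lambda p: rank[p[0]])
            return [original for _, original in pairs]
    return None
```

A lookup like this handles the obvious cases instantly; anything it returns `None` for is where an LLM-backed agent earns its cost.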
chenglong-hn 12 hours ago
Hence the sorting agent, which now runs by default in the background!
chenglong-hn 13 hours ago
Sometimes reading charts helps, sometimes looking at the data helps, and other times only the code can serve the verification purpose...