Show HN: WASM-powered codespaces for Python notebooks on GitHub

202 points by mscolnick 3 days ago | 27 comments

Hi HN!

Last year, we shared marimo [1], an open-source reactive notebook for Python with support for execution through WebAssembly [2].

We wanted to share something new: you can now run marimo and Jupyter notebooks directly from GitHub in a Wasm-powered, codespace-like environment. What makes this powerful is that we mount the GitHub repository's contents as a filesystem in the notebook, making it really easy to share notebooks with data.

All you need to do is prepend 'marimo.app' to any Python notebook on GitHub. Some examples:

- Jupyter Notebook: https://marimo.app/github.com/jakevdp/PythonDataScienceHandb...

- marimo notebook: https://marimo.app/github.com/marimo-team/marimo/blob/07e8d1...
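The URL rewrite above can be sketched in a few lines of Python. This is only an illustration of the "prepend marimo.app" rule as described; `to_marimo_url` is a hypothetical helper name, and the org, repo, and notebook path in the usage line are placeholders.

```python
from urllib.parse import urlsplit


def to_marimo_url(github_url: str) -> str:
    """Prefix a github.com notebook URL with marimo.app (hypothetical helper)."""
    parts = urlsplit(github_url)
    if parts.netloc != "github.com":
        raise ValueError(f"expected a github.com URL, got: {github_url!r}")
    # Keep the host and path, and put marimo.app in front of them.
    return f"https://marimo.app/{parts.netloc}{parts.path}"


# Placeholder org/repo/path, for illustration only.
print(to_marimo_url("https://github.com/example-org/example-repo/blob/main/nb.ipynb"))
```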

Jupyter notebooks are automatically converted into marimo notebooks using basic static analysis and source code transformations. Our conversion logic assumes the notebook was meant to be run top-down, which is usually but not always true [3]. It can convert many notebooks, but there are still some edge cases.
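To give a feel for the kind of static analysis involved, here is a rough sketch using Python's `ast` module. It is not marimo's actual converter: `cell_defs` and `redefined_names` are hypothetical helpers, and the sketch ignores scoping (names assigned inside functions count as cell-level definitions). It finds names assigned in more than one cell, which matter because a reactive notebook needs to know which single cell owns each global.

```python
import ast


def cell_defs(source: str) -> set[str]:
    """Names a cell binds: assignments, defs, classes, imports (scope-unaware)."""
    defs: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            defs.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defs.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import a as x` binds `x`.
                defs.add((alias.asname or alias.name).split(".")[0])
    return defs


def redefined_names(cells: list[str]) -> dict[str, list[int]]:
    """Map each name defined in more than one cell to the cell indices, top-down."""
    seen: dict[str, list[int]] = {}
    for index, source in enumerate(cells):
        for name in cell_defs(source):
            seen.setdefault(name, []).append(index)
    return {name: idxs for name, idxs in seen.items() if len(idxs) > 1}


cells = [
    "import pandas as pd\ndf = pd.read_csv('a.csv')",
    "df = df.dropna()",
    "print(df.shape)",
]
print(redefined_names(cells))
```

A converter that assumes top-down execution could then rename or merge the flagged cells; how marimo actually resolves these cases is not covered here.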

We implemented the filesystem mount using our own FUSE-like adapter that links the GitHub repository’s contents to the Python filesystem, leveraging Emscripten’s filesystem API. The file tree is loaded on startup to avoid waterfall requests when reading many directories deep, but loading the file contents is lazy. For example, when you write Python that looks like

```python
with open("./data/cars.csv") as f:
    print(f.read())

# or

import pandas as pd

pd.read_csv("./data/cars.csv")
```

behind the scenes, you make a request [4] to https://raw.githubusercontent.com/<org>/<repo>/main/data/car....
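The "eager tree, lazy contents" idea can be sketched in plain Python. This is a simplified illustration, not the Emscripten-based adapter: `LazyRepoFiles`, `raw_url`, and the injected `fetch` function are all hypothetical names, the org and repo are placeholders, and the sketch skips the marimo.app proxy mentioned in [4].

```python
import posixpath
from typing import Callable


def raw_url(org: str, repo: str, ref: str, path: str) -> str:
    """Build a raw.githubusercontent.com URL for a file in a repo."""
    clean = posixpath.normpath(path).lstrip("/")
    return f"https://raw.githubusercontent.com/{org}/{repo}/{ref}/{clean}"


class LazyRepoFiles:
    """File listing is known up front; contents are fetched on first read."""

    def __init__(self, org: str, repo: str, ref: str,
                 tree: list[str], fetch: Callable[[str], bytes]) -> None:
        self._org, self._repo, self._ref = org, repo, ref
        self._paths = {posixpath.normpath(p) for p in tree}  # loaded eagerly
        self._cache: dict[str, bytes] = {}                   # filled lazily
        self._fetch = fetch

    def exists(self, path: str) -> bool:
        # Answered from the preloaded tree, so no network request is needed.
        return posixpath.normpath(path) in self._paths

    def read(self, path: str) -> bytes:
        key = posixpath.normpath(path)
        if key not in self._paths:
            raise FileNotFoundError(path)
        if key not in self._cache:
            url = raw_url(self._org, self._repo, self._ref, key)
            self._cache[key] = self._fetch(url)  # one request per file
        return self._cache[key]


# Stand-in for a real HTTP client so the sketch runs offline.
requested: list[str] = []

def fake_fetch(url: str) -> bytes:
    requested.append(url)
    return b"make,mpg\n"

fs = LazyRepoFiles("example-org", "example-repo", "main",
                   ["data/cars.csv"], fake_fetch)
fs.read("./data/cars.csv")
fs.read("data/cars.csv")  # served from the cache, no second request
print(requested)
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client; any function that maps a URL to bytes would do.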

Docs: https://docs.marimo.io/guides/publishing/playground/#open-no...

[1] https://github.com/marimo-team/marimo

[2] https://news.ycombinator.com/item?id=39552882

[3] https://blog.jetbrains.com/datalore/2020/12/17/we-downloaded...

[4] We technically proxy it through the playground https://marimo.app to fix CORS issues and GitHub rate-limiting.

westurner 3 days ago

> CORS and GitHub

The Godot docs mention coi-serviceworker; https://github.com/orgs/community/discussions/13309 :

gzuidhof/coi-serviceworker: https://github.com/gzuidhof/coi-serviceworker :

> Cross-origin isolation (COOP and COEP) through a service worker for situations in which you can't control the headers (e.g. GH pages)

CF Pages' free unlimited bandwidth and gitops-style deploys might help for apps that need more than the 100GB soft cap on free bandwidth that GH has for open source projects.

mscolnick 3 days ago

Thanks for sharing these resources

HanClinto 3 days ago

I absolutely love that this can be hosted on GitHub Pages. Am I correct in understanding that these notebooks will run independently and will not need to proxy through marimo.app (in case the app goes down)? Or is that what the CORS thing in note 4 is about, and it will still need to go through this domain?

mscolnick 3 days ago

Yeah, this can be hosted on GitHub Pages without any vendor infra (no marimo.app).

These are two separate features:

1) marimo.app + github.com/path/to/nb.ipynb does run on marimo.app infra. This is what the Show HN was about.

2) Separately, you can use the marimo CLI to export assets to deploy to GitHub Pages: `marimo export html-wasm notebook.py -o output_dir --mode run`. The output can then be uploaded to GH Pages. This does not find all the data in your repo, so you would need to put any data you want to access in a /public folder for your site. More docs here: https://docs.marimo.io/guides/exporting/?h=marimo+export+htm...

wolfgangK 3 days ago

Nice ! Is it possible to connect to an in browser DB like WASM DuckDB https://duckdb.org/docs/api/wasm/overview.html or https://github.com/babycommando/entity-db ?

That would be most useful imho !

akshayka 3 days ago

duckdb works — just import duckdb. We also have built-in SQL cells, powered by duckdb, which should also work.

unrealhoang 3 days ago

Can I configure SQL cells to use a different data source (remote DB/APIs) instead of local duckdb?

hzuo 3 days ago

Super cool to see a real use-case of WASM outside of just game dev and nerding out.

pjmlp 3 days ago

We also have Flash, Java Applets, ActiveX and Silverlight back, running on top of WebAssembly.

mring33621 2 days ago

i hope this is true

pjmlp 2 days ago

It is, and it is quite easy to find where those implementations are available, not going to do the search engine work.

PKop 3 days ago

Blazor is another example