Show HN: Decentralized robots (and things) orchestration system

51 points by hannesfur 4 days ago | 24 comments

Hi HN, we've built an open-source operating system extension for orchestrating robot swarms in a fully decentralized way.

This first beta version allows you to create fully decentralized robot swarms. The system will set up a wireless mesh network and run a p2p networking stack on top of it, such that nodes can interact with each other through various abstractions using our SDKs (Rust, Python, TypeScript) or a CLI.

We hope this is a step toward better inter-robot communication (and a fun project if you have a few Raspberry Pis lying around).

Our mesh network is created by B.A.T.M.A.N.-adv, and we've combined this with optimized decentralized algorithms. Since we've abstracted away much of the complexity, it becomes very easy for a user to write decentralized applications involving several peers. Our system currently offers several orchestration primitives (Key-Value Store, Pub-Sub, Discovery, Request-Response, Mesh Inspection, Debug Services, etc.).
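As a rough sketch of what this looks like from the Python SDK (the module and method names below are hypothetical placeholders, not our actual API), publishing to a topic and reading from the key-value store could look roughly like this:

```python
# Hypothetical sketch only: the module and API names (p2p_sdk, connect,
# pubsub, kv) are invented placeholders, not the real SDK surface.
import asyncio

from p2p_sdk import connect  # hypothetical entry point to the local daemon


async def main():
    node = await connect()  # the SDK talks to the local daemon over gRPC

    # Announce this robot's position to every peer in the mesh.
    await node.pubsub.publish("swarm/positions", b"12.4,7.9")

    # Read a value another robot stored in the shared key-value store.
    waypoint = await node.kv.get("mission/next-waypoint")
    print("next waypoint:", waypoint)


asyncio.run(main())
```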

Internally, everything except the SDKs is written in Rust, building on top of libp2p. We use gRPC to communicate between the SDKs and the CLI, so libraries for other languages are possible, and we welcome contributions (or feedback).

The C++ SDK and a ROS package that should feel natural to roboticists are in the works. Soon we also want to support collaborative SLAM and a distributed task queue.

We’d love to hear your thoughts! :)

accurrent 14 minutes ago

This looks super cool. I've been working in a similar space for some time. I work with open-rmf, which is semi-decentralized and provides tools for task and traffic deconfliction (we don't handle the network layer at all). Excited to see more similar software coming up.

wngr 5 hours ago

Great idea combining batman with libp2p! You guys have your hearts in the right place :-).

Currently, your project seems to be an opinionated wrapper on top of libp2p. For this to become a proper distributed toolkit, you lack an abstraction for apps to collaborate over shared state (incl. convergence after partition). Come up with a good abstraction for that, and make it work p2p (e.g. delta-state-based CRDTs, or op-based CRDTs on top of a replicated log; event sourcing, ...). Tangentially related, a consensus abstraction might also be handy for some applications.
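To illustrate what I mean by shared state that converges after a partition, here is a minimal sketch of a state-based CRDT (a grow-only counter); the important property is that merge is commutative, associative, and idempotent, so replicas can sync in any order:

```python
# Minimal state-based CRDT sketch: a grow-only counter (G-Counter).
# Each replica only increments its own slot; merging takes the pointwise
# maximum, so concurrent updates made during a partition converge.
from dataclasses import dataclass, field


@dataclass
class GCounter:
    node_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Pointwise max is commutative, associative, and idempotent,
        # which is what makes convergence after a partition work.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


# Two robots count completed tasks while partitioned, then sync both ways.
a, b = GCounter("robot-a"), GCounter("robot-b")
a.increment(); a.increment()
b.increment()
a.merge(b); b.merge(a)
assert a.value() == b.value() == 3
```

Real shared state for robots will obviously need richer types than a counter (sets, maps, logs), but the merge discipline is the same.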

Also check out [iroh](https://github.com/n0-computer/iroh) as a potentially awesome replacement for libp2p, as well as [Actyx](https://github.com/Actyx/Actyx) as inspiration from a similar (sadly failed) project using rust-libp2p.

Oh, and you might want to give your docs a grammar review.

Kudos for showing!

hannesfur 4 hours ago

You are right. At the moment, we are an opinionated wrapper, but we take a different approach to discovery than other libp2p-based networks with our custom batman-adv-based neighbor discovery.

Abstractions for collaboration are currently in the works, and we hope to release them soon. The work on consensus has already started. Your suggestions all seem very interesting, and we'll definitely consider them. We are also currently talking to potential users to build handy and approachable abstractions for them.

I saw that [freenet](https://docs.freenet.org/components/contracts.html) went with CRDTs, but I think they made it too complicated. We were thinking about a graph (or wide-column) store with an engine similar to Cassandra and a frontend like (or ideally just) SurrealDB.

I remember that iroh moved away from libp2p when they dropped IPFS compatibility and moved to a self-built stack: https://www.iroh.computer/blog/a-new-direction-for-iroh When we got started, iroh's capabilities didn't really fit the bill, but it seems like it's time to reevaluate that. As a former contributor to rust-libp2p, I never quite got the frustration with libp2p that many people (iroh included) have, especially since many of the described problems seemed fixable. I would have preferred that they fix those instead, so that libp2p remained the shared base people build these things on.

I remember Actyx being a rust-libp2p user, but I wasn't aware that they failed. Do you have more info? How and why? It would be great if we could learn from them.

Grammar will be reviewed ;) thank you!

Animats 3 hours ago

Read the architecture document here.[1]

The usual problems with these things are discovery and security. Discovery is done via local WiFi broadcast. Not clear how security is done. How do you allow ad-hoc networking yet disallow hostile actors from connecting?

[1] https://docs.p2p.industries/concepts/architecture/

hannesfur 3 hours ago

We do discovery via the mesh, yes, but instead of broadcasting (like mDNS), we query batman-adv for the currently visible neighbors. If a new neighbor is discovered, we contact them directly (via WiFi) to exchange addresses in the P2P network and then dial them. From that, we populate the local Kademlia routing table with the contents of the neighbor's table.
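In spirit, the discovery loop looks something like the following sketch (the real implementation is in Rust and does not shell out to batctl; the parsing here is deliberately loose since the `batctl n` output format differs between versions):

```python
# Rough sketch of the discovery idea: periodically diff the set of visible
# batman-adv neighbors and react to newly appeared ones. Parsing is a loose
# MAC-address regex because the batctl output format varies between versions.
import re
import subprocess
import time

MAC_RE = re.compile(r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}", re.IGNORECASE)


def visible_neighbors() -> set[str]:
    # "batctl n" prints the neighbor table of the local batman-adv interface.
    out = subprocess.run(["batctl", "n"], capture_output=True, text=True).stdout
    return set(MAC_RE.findall(out))


def on_new_neighbor(mac: str) -> None:
    # Placeholder: contact the neighbor directly, exchange P2P addresses,
    # dial it, and fold its entries into the local Kademlia routing table.
    print(f"new neighbor {mac}: exchanging addresses and dialing")


known: set[str] = set()
while True:
    current = visible_neighbors()
    for mac in sorted(current - known):
        on_new_neighbor(mac)
    known = current
    time.sleep(2)
```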

Security is still a big issue. In the current state, there is no security other than application-layer encryption (QUIC & TLS v1.3). That is fine for pilot projects, but it should not be used for anything sensitive. Maybe we should point this out more clearly in the docs. However, some Wi-Fi chips (not the ones on the Raspberry Pi, sadly) also allow setting a password in ad-hoc (IBSS) mode, and 802.11s has native support for encryption. In general, the problem here is a lack of adoption of these standards by the Wi-Fi chip manufacturers and, in the case of Broadcom (the chip on the Raspberry Pi), a lack of support in the Linux kernel driver.

We are planning to implement authentication and encryption in the upcoming release, but this might be a paid feature.

Typically, enterprise networks are encrypted via 802.1X (since a leak of the key wouldn't compromise the whole network), and we might be able to build a decentralised RADIUS server, but I'm not very fond of that idea.

Ideally, the damage one can do by joining the network unauthorized should be very limited anyway, and authentication and encryption should happen on Layer 5.

Would love feedback / inspiration / suggestions

jazzyjackson 47 minutes ago

Might consider good old X.509 certificates and mTLS authentication. You can query and find peers but don't exchange any data with them unless they can present a certificate signed by whatever issuer you trust. Agree it's probably an enterprise upsell because the openssl tooling is a PITA if you've never done it before, but somebody pointed me to KeyStore Explorer [0] and I'm going to give that a try to be my own certificate authority.

I wish it could be a more mainstream, hobbyist auth solution though; it's completely free, open, self-sovereign, etc., and makes strong security guarantees, there's just a steep learning curve to grok what's happening. I think it would be a big achievement if somebody slapped a friendly API / wizard over configuring a CA and creating certs to install on each of your robots / IoT sensors / what have you. Corsha [1] is one provider in this space, and Yubico is contributing too [2], allowing you to sign cert requests with your YubiKey.

[0] https://keystore-explorer.org/features.html

[1] https://corsha.com/

[2] https://www.yubico.com/resources/glossary/what-is-certificat...
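For a feel of how small the enforcement side is, here's a minimal mTLS sketch using Python's standard library (the cert/key file names are placeholders for certs you'd issue with your own CA): a peer simply refuses the handshake unless the client presents a certificate signed by that CA.

```python
# Minimal mTLS sketch: only clients holding a certificate signed by our own
# CA can complete the handshake. File names below are placeholders for
# certs you would issue yourself (openssl, KeyStore Explorer, ...).
import socket
import ssl

ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
ctx.load_cert_chain(certfile="robot.pem", keyfile="robot.key")  # this node's identity
ctx.load_verify_locations(cafile="swarm-ca.pem")  # trust only our own CA
ctx.verify_mode = ssl.CERT_REQUIRED               # client cert is mandatory

with socket.create_server(("0.0.0.0", 9000)) as srv:
    with ctx.wrap_socket(srv, server_side=True) as tls_srv:
        # The TLS handshake (and thus client-cert verification) happens on
        # accept; unauthenticated peers are rejected before any data flows.
        conn, addr = tls_srv.accept()
        print("authenticated peer:", conn.getpeercert().get("subject"))
        conn.sendall(b"welcome to the swarm\n")
```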

NotAnOtter 7 hours ago

Very fun. Is this primarily a passion project or are you hoping to get corporate sponsorship & adoption?

Can you provide some insight as to why this would be preferred over an orchestration server? In this context, would a 'mothership'/Wheel-and-spoke drone responsible for controlling the rest of the hive be considered an orchestration server?

This isn't my area of expertise but I think "Hive mind drones" tickles every engineer.

lmeierhoefer 6 hours ago

> Is this primarily a passion project or are you hoping to get corporate sponsorship & adoption?

We are in the current YC W25 batch and our vision is to build a developer framework for autonomous robotics systems from the system we already have.

> Can you provide some insight as to why this would be preferred over an orchestration server?

It heavily depends on your application; there are applications where it makes sense and others where it doesn't. The main advantages are that you don't need an internet connection, the system is more resilient against network outages, and, most importantly, the resources on the robots, which are idle otherwise, are used. I think for hobbyists the main upside is that it's quick to set up: you only have to turn on the machines and it should work, without having to care about networking or setting up a cloud connection.

> Would a 'mothership'/Wheel-and-spoke drone responsible for controlling the rest of the hive be considered an orchestration server?

If the mothership is static, in the sense that it doesn't change over time, we would consider it an orchestration server. Our core services don't need that, and we envision that most of the decentralized algorithms running on our system also don't rely on such a central point of failure. However, there are some applications where it makes sense to have a "temporary mothership". We are just currently working on a "group" abstraction, which continuously runs a leader election to determine a "mothership" among the group (which is fault-tolerant, however, as the leader can fail anytime and the system will instantly determine another one).
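As a toy sketch of the idea (not our actual election algorithm), a group can converge on a leader with nothing more than heartbeats and a deterministic rule such as "lowest live peer ID wins":

```python
# Toy leader-election sketch: track heartbeats from peers and deterministically
# pick the lowest-ID live peer as the temporary "mothership". When the leader's
# heartbeats stop, the next call to leader() returns a new one.
import time

HEARTBEAT_TIMEOUT = 3.0  # seconds of silence before a peer counts as dead


class Group:
    def __init__(self, my_id: str):
        self.my_id = my_id
        self.last_seen: dict[str, float] = {}

    def on_heartbeat(self, peer_id: str) -> None:
        self.last_seen[peer_id] = time.monotonic()

    def live_peers(self) -> set[str]:
        now = time.monotonic()
        self.last_seen[self.my_id] = now  # we are always live to ourselves
        return {p for p, t in self.last_seen.items() if now - t < HEARTBEAT_TIMEOUT}

    def leader(self) -> str:
        # Every node applies the same rule, so nodes that share the same view
        # of who is alive also agree on who the leader is.
        return min(self.live_peers())


g = Group("drone-07")
g.on_heartbeat("drone-03")
g.on_heartbeat("drone-12")
print(g.leader())  # "drone-03" until its heartbeats stop arriving
```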

NotAnOtter 6 hours ago

> The main advantages are that you don’t need an internet connection

To that end, I'm not clear on the benefit of this model. To solve that problem I would just take a centralized framework and stick it inside an oversized drone/vehicle capable of carrying the added weight (in CPU, battery, etc.). There are several centralized models that don't require an external data connection.

> the resources on the robots, which are idle otherwise, are used

But what's the benefit of this? I don't see a use case where the swarm needs to perform lots of calculations beyond the ones required for its own navigation and communication with others. I suppose I could imagine a chain of these 'idle' drones acting as a communication relay between two separate, active hives. But the benefit there seems marginal.

> our system also don't rely on such a central point of failure

This seems like the primary upside, and it's a big one. I'm imagining a disaster or military situation where natural or human forces could be trying to disable the hive. Now instead of knocking out a single mothership ATV, each and every drone needs to be removed to fully disable it. Big advantage.

> We are just currently working on a “group” abstraction

Makes sense to me. That's the 'value add', might as well really spec that out

> leader election to determine a “mothership” among the group

This seems perfectly reasonable to me and doesn't remove the advantages of the disconnected "hive". But I do find it funny that the solution to decentralization seems to be simply having the centralization move around easily / flexibly. It's not a hive of peers, it's a hive of temporary kings.

lmeierhoefer 5 hours ago

Thanks for the feedback!

> I would just take a centralized framework and stick it inside an oversized drone/vehicle capable of carrying the added weight

Makes sense. I think there are scenarios where such “base stations” are a priori available and “shielded,” so in this case, it might make more sense to just go with a centralized system. This could also be built on top of our system, though.

> But what’s the benefit of this?

I agree that, in many cases, the cost savings might be marginal. However, say you have a cluster of drones equipped with computing hardware capable enough to run all the algorithms themselves: why spin up a cloud instance to run a centralized version of that algorithm? It is more of an engineering-ideological point, though ;)

> But I do find it funny that the solution to decentralization seems to be simply having the centralization move around easily / flexibly. It’s not a hive of peers, it’s a hive of temporary kings.

Most of our applications will not need this group leader. For example, the pubsub system does not work by aggregating and dispatching the messages at a central point (like MQTT) but employs a gossip mechanism (https://docs.libp2p.io/concepts/pubsub/overview/).

What I meant is that, in some situations, it might be more efficient (and easier to reason about) to elect a leader. For example, say you have an algorithm that needs to do a matching between neighboring nodes, i.e., each node has some data point, and the algorithm wants to compute a pairwise similarity metric and share all computed metrics back to all nodes. You could do some kind of "ring-structure" algorithm, where you have an ordering among the nodes, and each node receives data points from its predecessor, computes its own similarity against the incoming data point, and forwards the received data point to its successor. If one node fails, its neighbors in the ring simply skip to the next node. This would be truly decentralized, and there is no single point of failure. However, in most cases, this approach will have a higher computation latency than just electing a temporary leader (by letting the leader compute the matchings and send them back to everyone). So someone who cares about efficiency (rather than resiliency) will probably want such a leader mechanism.
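A toy simulation of just the data flow of that ring pass (failure handling and shipping the metrics back to everyone are omitted):

```python
# Toy simulation of the ring approach: every node's data point travels around
# the ring, and each node computes a similarity against every point that
# passes through it, so after n-1 hops all pairwise metrics exist without a
# central coordinator. Failure handling and result distribution are omitted.
def similarity(a: float, b: float) -> float:
    return -abs(a - b)  # placeholder metric; any pairwise function works


def ring_all_pairs(data: dict[str, float]) -> dict[tuple[str, str], float]:
    nodes = sorted(data)  # the agreed-upon ring ordering
    n = len(nodes)
    metrics: dict[tuple[str, str], float] = {}
    # Each node starts out holding (the origin of) its own data point.
    holding = {node: (node, data[node]) for node in nodes}
    for _ in range(n - 1):  # after n-1 hops, every point has visited every node
        incoming = {}
        for i, node in enumerate(nodes):
            successor = nodes[(i + 1) % n]
            incoming[successor] = holding[node]  # forward held point downstream
        for node, (origin, value) in incoming.items():
            metrics[(node, origin)] = similarity(data[node], value)
        holding = incoming
    return metrics


print(ring_all_pairs({"drone-a": 1.0, "drone-b": 4.0, "drone-c": 2.5}))
```

The leader variant collapses those n-1 rounds into one gather and one broadcast, which is why it usually wins on latency.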