Hacker Remix

Proof of location for online polls

108 points by c-riq 4 days ago | 86 comments

jawiggins 4 days ago

> Latency-based geolocation can help protect poll integrity by:

> Detecting when poll responses originate from outside the intended geographic region

> Identifying attempts to manipulate polls through elevated VPN/proxy usage

Unless the user also needs to complete a reaction-time test, couldn't this be defeated by using a remote desktop connection to a machine that is physically located in the other geography?

It just shifts which functions need to run on the proxy, from network routing to the browser itself.

polon 4 days ago

I think this is covered on the page

"Successfully manipulating a poll which employs this method would require following efforts and resources:

Gaining control over a large number of devices in the target geographic region for submitting votes through those devices"

So yes, it seems like it can be defeated via a remote desktop (or any proxy in the allowed area)

comex 4 days ago

You don’t even need to gain control over a large number of devices in the region.

You just need _one_ device in the region, which can connect to the VPN or proxy service you were already using (the assumption seems to be that the attacker has a large number of IPs they can access through such a service). That device will get some added latency from going through the VPN/proxy, but because it’s physically close, the added latency will be small, probably not enough to reliably detect.

85392_school 4 days ago

If you're using a proxy, I don't think whether or not the source device is in the region changes anything. The only variance is in the time from where traffic exits the proxy to servers.

banana_giraffe 4 days ago

> Gaining control over a large number of devices in the target geographic region for submitting votes through those devices

Does AWS Lambda count as a machine for these purposes? If so, you can get a nearly infinite number of them just by cycling a config param and casting another vote.

gavinsyancey 4 days ago

I assume they'd just ban the entire AWS IP block. And similarly for other cloud providers.

ghayes 4 days ago

Couldn't the "test" add some variety of math challenge, making a simple proxy insufficient? Obviously, this would add more noise to the final calculation, but if the proxy had to forward its data to the end-user machine to perform the math, a simple proxy wouldn't be enough.

dheera 4 days ago

Yes, and also, I'd argue that anonymizing your location is a sacred feature of the internet: anytime someone builds a better mousetrap, we WILL build a better mouse. The internet is not a place where requiring proof of location is welcome.

For online polls, it should never be necessary, either: My rights to vote somewhere should depend only on my membership status to that somewhere, and not my current physical location.

Larrikin 4 days ago

This is similar to the argument that 4chan, a failed experiment, put to the test for the internet. With full anonymity, the best arguments don't rise to the top: bad actors lie and lie, and when confronted with their lies, they just pretend to be someone else and lie some more. All completely anonymous online polls are effectively useless. It's nice to have some research into making them a little less useless.

dheera 4 days ago

Anonymity should still be a choice. Especially location anonymity.

While I don't mind 7 billion people knowing what I intentionally said publicly, I don't want 7 billion people knowing where I sleep or where I am at this exact moment.

frotty 3 days ago

I'd love to see your documentation on where it was ever claimed that 4chan was an experiment in anonymity creating a usable filter for quality?

Completely anonymous online polls are impossible; I'm thinking the goal is to have effectively non-publicly identifiable polling with the ability to disallow double voting. Seems absolutely trivial if Every Relevant Citizen was set up with their own API / digi-thumbprint.

Larrikin 3 days ago

It was one of the main selling points of 2chan and 4chan 20 years ago. I'm sure Moot is on record somewhere discussing it.

TrainedMonkey 4 days ago

Only a small subset of the IPs has proxies on them, so it would be detectable if a disproportionate amount of traffic is coming from them.

jagged-chisel 4 days ago

My state lottery app doesn’t let you play outside the state. It detects screen sharing and VPN configuration and refuses to run if it sees these things.

Depending on the importance of the poll, one could definitely apply these other requirements.

frotty 3 days ago

well yeah, that's against the point of "anonymity" ... you are feeding the app all the data it needs to fence you in.

By this logic every government gives a uniquely IDable device to its citizenry for engaging polls.

Besides ... if it was "important enough" to break, getting around geofencing etc. is a trivial/already solved part of this.

mac3n 4 days ago

having worked on IP geolocation in the past, I don't think this works. Though it can do a pretty good job of getting you in the right continent.

* Not all traffic goes through fiber - there are microwave links operating closer to the speed of light, though these are mostly reserved for high-speed trading. There's also satellite connections, but as long as they don't do satellite-satellite, they're slower.

* There are middleboxes messing with traffic, especially TCP, which add delay.

* If you rent servers in datacenters, you might not really know where they are. We had VMs relocated without our knowledge.

* Fiber links aren't direct, they tend to follow public right-of-ways. In much of the US, that's a rectangular grid along the highway system (look at a road map of the midwest sometime), increasing the delay by √2.

* Internet routing isn't shortest-path. It's get-this-crap-off-my-infrastructure, aka hot-potato.

* Anycast prefixes have IPs in multiple locations.

My experience was that with a lot of observation points, you could get within 10ms, 1000km in most places.
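The "10ms, 1000km" figure lines up with simple propagation arithmetic. A minimal sketch, assuming signals travel through fiber at roughly 2/3 of the speed of light (that fraction and the function names are assumptions for illustration, not something stated in the thread):

```python
import math

C_KM_PER_S = 299_792.458   # speed of light in vacuum
FIBER_FRACTION = 2 / 3     # assumed propagation speed in fiber, as a fraction of c


def max_path_km(rtt_ms: float, fraction_of_c: float = FIBER_FRACTION) -> float:
    """Longest one-way path a signal could have covered within the given RTT."""
    one_way_s = (rtt_ms / 1000.0) / 2.0
    return one_way_s * C_KM_PER_S * fraction_of_c


def straight_line_km(rtt_ms: float, detour_factor: float = 1.0) -> float:
    """Straight-line distance if the real cable path is detour_factor times longer."""
    return max_path_km(rtt_ms) / detour_factor


print(round(max_path_km(10)))                     # 999
print(round(straight_line_km(10, math.sqrt(2))))  # 707
```

The first number is only an upper bound: middleboxes, hot-potato routing and grid-shaped fiber routes all add delay, so the real distance behind a 10 ms round trip is usually smaller.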

reincoder 3 days ago

I work for IPinfo, and we run active measurements through hundreds of servers. Allow me to highlight the fantastic points you made.

> Not all traffic goes through fiber - there are microwave links operating closer to the speed of light, though these are mostly reserved for high-speed trading. There's also satellite connections, but as long as they don't do satellite-satellite, they're slower.

That is a valid point. In that case, we have to fall back on the geofeed, WHOIS, or other methods of geolocation information. We are actively researching this area, though.

> There are middleboxes messing with traffic, especially TCP, which add delay.

This accounts for some problems but not all of them, as we have multiple servers running active measurements on individual IP addresses.

> If you rent servers in datacenters, you might not really know where they are. We had VMs relocated without our knowledge.

That is normal. At the scale at which we operate, server location validity is extremely important. We run daily checks and actively flag these issues. If things don't add up, we communicate with our vendors and try to understand whether it is a network-related issue or something else.

> Anycast prefixes have IPs in multiple locations.

Yes. With anycast IPs, we have hints of all their available locations, but when it comes to picking one, we default to the ASN-reported location.

> My experience was that with a lot of observation points, you could get within 10ms, 1000km in most places.

We have been significantly reducing the number of ASNs where we have high RTT by getting a server "networkly" close to them.

---

I understand that this is not the absolute best location system possible, but within the scope of our industry, we are miles ahead of everyone else. We are continuously investing in research and infrastructure to improve our data even further.

jampekka 4 days ago

> there are microwave links operating closer to the speed of light, though these are mostly reserved for high-speed trading

This is so sad.

mrguyorama 4 days ago

It's really not. The microwave links got decommissioned everywhere because nobody NEEDS that higher fraction of lightspeed. High speed trading is the only field where saving a singular millisecond is economically rewarded. The links used by high speed trading are the only ones left.

xethos 4 days ago

Sure, but if it becomes ubiquitous, web devs will assume lower latency. That wouldn't make it less sad, just makes different people sad - my first guesses are those at crowded areas with overloaded cellular connections, and Australians.

reocha 4 days ago

I think routing not being shortest path (nor being consistent) is the biggest issue with this method.

falcor84 3 days ago

Are you sure it'd only be accurate up to 1000km? I'm not as experienced but would have assumed that by sampling a dozen times, you would have at least 95% likelihood of a 100km radius

skaushik92 4 days ago

> Key Advantages: [...] Can provide supportive evidence for VPN/proxy usage, when the latency is too high for all server locations

I'm reading through the description, but I'm having trouble understanding the difference between a client having a higher overall latency due to bandwidth/connectivity concerns (e.g. a 3G phone) versus using a VPN. Both would have increased timings and the clock skew would be similar. Would both be considered too high for proof of location?

c-riq 2 days ago

For slow connections you can still make use of Geo IP data (such as maxmind.com) to infer location, which should be quite reliable in most cases. You just cannot meet the stricter hardware location proof criteria based on latency. You may still submit a poll answer but it may not be included in the analyses, which require a higher degree of confidence for the location. For the objective of obtaining a hard-to-manipulate sample of popular opinion, this would only be an issue if people with slow connections give systematically different responses for a given poll. But this should then also become apparent when analysing the data and can be considered for any decisions derived from the poll.

matthewdgreen 4 days ago

Why is clock skew being used here at all? I'm confused why the client's clock is being trusted or consulted in any way for a measurement like this. I should probably click through and read the details.

ETA Ok, reading the code turned up not a lot of comments. But it did produce the following line. I hope that's for testing and not the actual nonce generation process:

nonce = 'ieoskirlyzauuv6ehdug8lift65fkrddeuu6f5z6ka'

c-riq 4 days ago

> Why is clock skew being used here at all?

You're right, it's not actually necessary to use the client clock at all. It was easier to implement it that way initially and I kept it in the description and didn't think about it again. Thanks for pointing that out. Since all timestamps are measured, the calculations can actually also be made afterwards without using the client clock timestamps at all. However this may add a bit more noise.

> not the actual nonce

The nonce can only be used once so it's ok to share it afterwards.
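Both points in this exchange (timing without the client clock, and a nonce that is worthless after one use) can be illustrated together. This is a minimal sketch under stated assumptions, not the project's actual code: `NonceTimer` and its methods are hypothetical names, and the server is assumed to keep issued nonces in memory.

```python
import secrets
import time


class NonceTimer:
    """Issues single-use nonces and measures round trips on the server clock only."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the timing can be tested
        self._issued = {}     # nonce -> server time at which it was issued

    def issue(self) -> str:
        # A fresh random nonce per challenge, never a hardcoded string.
        nonce = secrets.token_urlsafe(32)
        self._issued[nonce] = self._clock()
        return nonce

    def complete(self, nonce: str):
        # pop() removes the nonce, so a replayed or unknown nonce yields None.
        started = self._issued.pop(nonce, None)
        if started is None:
            return None
        return self._clock() - started   # elapsed seconds, server clock only
```

Because the nonce is removed on first use, publishing it afterwards gives an attacker nothing to replay, and the client's clock never enters the measurement.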

38 4 days ago

No you read it right. The proposal is idiotic and will result in rural voters being detected as foreign residents.

croshan 4 days ago

A bit aggressive. No, wouldn't connecting to a slow 3g tower affect ping times to all global servers proportionately?

The proposal has other flaws, but phone to tower latency isn't one.

vitus 4 days ago

> No, wouldn't connecting to a slow 3g tower affect ping times to all global servers proportionately?

Yep. Per the article (last point under "How it works"):

> Users with a high latency to all servers can be excluded from polls, as this is a strong indicator of a VPN/proxy usage

Something seems off about how they're measuring latency (which seems to be "fetch various AWS Lambda endpoints"), since their system seems to think that I have hundreds of milliseconds of latency even to the nearest AWS region (even though in practice it should be an order of magnitude lower), and multiple seconds to the other side of the world.

edit: well, if the slowness is just on last-mile delivery, then it should be a fixed amount of overhead added to each connection (rather than a multiplier). For instance, I have about 8ms of latency added by my ISP just by the first hop into their network. But it's that same 8ms overhead whether I'm connecting to a server on the other side of town, or on the other side of the world.
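The fixed-overhead point can be checked with a few lines. A minimal sketch, with made-up server names and RTT values purely for illustration: adding the same last-mile delay to every measurement leaves the differences between servers untouched.

```python
def add_last_mile(rtts_ms: dict, overhead_ms: int) -> dict:
    """Model a last-mile link that adds the same delay to every connection."""
    return {server: rtt + overhead_ms for server, rtt in rtts_ms.items()}


def relative_profile(rtts_ms: dict) -> dict:
    """RTT to each server relative to the fastest one."""
    fastest = min(rtts_ms.values())
    return {server: rtt - fastest for server, rtt in rtts_ms.items()}


# Hypothetical measurements in milliseconds.
baseline = {"near": 12, "mid": 95, "far": 240}
slow_link = add_last_mile(baseline, 8)

print(relative_profile(baseline))   # {'near': 0, 'mid': 83, 'far': 228}
print(relative_profile(slow_link))  # {'near': 0, 'mid': 83, 'far': 228}
```

Subtracting the minimum throws away the absolute values, so this view alone cannot bound distance; it only shows that a constant overhead does not change the shape of the profile.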

jknoepfler 4 days ago

If eliminating signal from malicious, remote actors is more valuable than preserving signal from rural areas, which may very well be the case depending on the application, then adopting this might solve a real problem for you.

I don't see anything terribly idiotic in that.

edit: to be clear I think this is likely one of those solutions that creates more problems than it solves. There's a gulf of sympathy separating that from "idiocy," however.

kvdveer 4 days ago

There are a lot of valid critiques already, but here's another:

This technique will have to allow for over-all slow connections. This connection latency could be caused by over-provisioned office connections, torrents, bad gsm reception, cheap internet or a cheap device.

What prevents a client from strategically delaying specific requests, to simulate a slow device in the target geography? AFAICT, this would be indistinguishable from the scenarios mentioned above.
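The delay attack described here can be made concrete. A minimal sketch, assuming a client can add delay to individual requests but never remove it; the server names, RTT values and the `padding_plan` helper are hypothetical:

```python
def padding_plan(true_rtts_ms: dict, target_rtts_ms: dict):
    """Per-server delays that make a client look like the target profile
    plus a uniform slowdown, given that delay can only be added."""
    # Smallest uniform slowdown that keeps every required delay non-negative.
    slack = max(0, max(true_rtts_ms[s] - target_rtts_ms[s] for s in target_rtts_ms))
    delays = {s: target_rtts_ms[s] + slack - true_rtts_ms[s] for s in target_rtts_ms}
    return slack, delays


# Hypothetical numbers: a client near "eu-west" imitating one near "us-east".
true_rtts = {"us-east": 90, "eu-west": 15, "ap-south": 130}
target = {"us-east": 10, "eu-west": 85, "ap-south": 200}

slack, delays = padding_plan(true_rtts, target)
observed = {s: true_rtts[s] + delays[s] for s in true_rtts}

print(slack)     # 80
print(delays)    # {'us-east': 0, 'eu-west': 150, 'ap-south': 150}
print(observed)  # {'us-east': 90, 'eu-west': 165, 'ap-south': 280}
```

In this example the padded client looks like a "us-east" user with 80 ms of extra latency on every connection. Whether that passes depends on how much uniform slowness the poll tolerates before excluding a response.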

ranger_danger 4 days ago

> This technique will have to allow for over-all slow connections

I don't think the technique can accommodate that. But I would love to be proven wrong.

kvdveer 3 days ago

If only people with fast internet can vote, this becomes a curious tool for voter suppression.

tony-allan 4 days ago

For example very slow ADSL connections or somewhere with poor phone reception.