220 points by olup 1 week ago | 89 comments
sashank_1509 5 days ago
My example was an image segmentation model. I managed to create a dataset of 100,000+ images and was training UNets and other advanced models on it. I always reached a good validation loss, but my data simply wasn't diverse enough and I faced a lot of issues in actual deployment, where the data distribution kept changing on a day-to-day basis. Then I tried DINOv2 from Meta, fine-tuned it on 4 images, and it solved the problem, handling all the variations in lighting etc. with far higher accuracy than I ever achieved. It makes sense: DINO was trained on 100M+ images, and I would never be able to compete with that.
In this case, the company still needed my expertise, because Meta just released the weights and so someone had to set up the fine-tuning pipeline. But I can imagine a fine-tuning API like OpenAI's requiring no expertise beyond simple coding. If AI results depend on scale, it naturally follows that only a few well-funded companies will build AI that actually works, and everyone else will just use their models. The only way this trend reverses is if compute becomes so cheap and ubiquitous that everyone can achieve the necessary scale.
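For context, a minimal sketch of what "fine-tuning DINOv2 on a handful of images" can look like in practice (not necessarily this commenter's pipeline): freeze the backbone and train a small segmentation head on its patch features. The model name, output keys, and the binary-segmentation setup below are assumptions based on the public facebookresearch/dinov2 torch.hub release.

```python
# Hedged sketch: linear segmentation head on frozen DINOv2 patch features.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
backbone.eval()  # backbone stays frozen; only the head is trained
for p in backbone.parameters():
    p.requires_grad = False

NUM_CLASSES = 2   # assumption: binary segmentation (background / object)
EMBED_DIM = 384   # ViT-S/14 embedding size
head = nn.Conv2d(EMBED_DIM, NUM_CLASSES, kernel_size=1)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

def segment_logits(images):
    """images: (B, 3, H, W), normalized, with H and W divisible by 14."""
    B, _, H, W = images.shape
    with torch.no_grad():
        feats = backbone.forward_features(images)["x_norm_patchtokens"]  # (B, N, C)
    h, w = H // 14, W // 14
    feats = feats.permute(0, 2, 1).reshape(B, EMBED_DIM, h, w)
    logits = head(feats)  # coarse per-patch logits
    return F.interpolate(logits, size=(H, W), mode="bilinear", align_corners=False)

def train_step(images, masks):
    """masks: (B, H, W) long tensor of class indices."""
    optimizer.zero_grad()
    loss = F.cross_entropy(segment_logits(images), masks)
    loss.backward()
    optimizer.step()
    return loss.item()
```

With only the 1x1 conv head being trained, a few labeled images can be enough because the heavy lifting already happened during DINOv2's pretraining.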
pmontra 5 days ago
We would still need the 100M+ images with accurate labels. That work can be performed collectively and open sourced, but it must be maintained etc. I don't think it will be easy.
goldemerald 5 days ago
EGreg 5 days ago
How did Chinese companies do it? Is it a fabricated claim? https://slashdot.org/story/24/12/27/0420235/chinese-firm-tra...
NegatioN 5 days ago
This change of not needing ML engineers is not so much about the models as it is about easy API access for fine-tuning them, it seems to me?
Of course it's great that the models have advanced and become better and more robust, though.
isoprophlex 5 days ago
Simply yeeting every "object of interest" into DINOv2 and running any cheap classifier on that was a game changer.
ac2u 5 days ago
isoprophlex 5 days ago
Not using it to create segmentations (there are YOLO models that do that, so if you need a segmentation you can get it in one pass), no, just to get a single vector representing each crop.
Our goal was not only to know "this is a traffic sign", but also to do multilabel classification like "has graffiti", "has deformations", "shows discoloration", etc. If you store those vectors it becomes pretty trivial (and hella fast) to pass them off to a bunch of data scientists so they can let loose all the classifiers in sklearn on them. See [1] for a substantially similar example.
[1] https://blog.roboflow.com/how-to-classify-images-with-dinov2
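A hedged sketch of that "embed crops with DINOv2, run cheap sklearn classifiers" recipe; the model name, attribute labels, and dummy data are illustrative, not the poster's actual code.

```python
# DINOv2 as a frozen feature extractor + one lightweight classifier per attribute.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()

@torch.no_grad()
def embed(crops: torch.Tensor) -> np.ndarray:
    """crops: (B, 3, H, W), H and W divisible by 14 -> (B, 384) embeddings."""
    return backbone(crops).cpu().numpy()  # CLS-token vector per crop

# Placeholder data: in practice these are preprocessed crops of traffic signs
# plus 0/1 labels for one attribute such as "has graffiti".
train_crops = torch.randn(16, 3, 224, 224)
y_graffiti = np.array([0, 1] * 8)
new_crops = torch.randn(4, 3, 224, 224)

# Repeat with separate classifiers for "has deformations", "shows discoloration",
# etc. to get multilabel predictions from the same stored embeddings.
clf = LogisticRegression(max_iter=1000).fit(embed(train_crops), y_graffiti)
print(clf.predict(embed(new_crops)))
```

Because the embeddings are computed once and stored, adding a new attribute later only costs a quick sklearn fit, not another pass over the images.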
ac2u 5 days ago
IanCal 5 days ago
I was able to turn around a segmentation and classifier demo in almost no time, because they gave me fast segmentation from a text description and then I trained a YOLO model on the results.
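A rough sketch of that second step, assuming the text-prompted segmentations have already been exported to the Ultralytics YOLO dataset format; "dataset.yaml", the model size, and the image path are placeholders.

```python
# Hedged sketch: train a YOLO segmentation model on auto-generated labels.
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")                      # small pretrained seg checkpoint
model.train(data="dataset.yaml", epochs=50, imgsz=640)
results = model("street_scene.jpg")                 # run the trained model on a new image
```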
Imnimo 6 days ago
Morizero 6 days ago
miki123211 5 days ago
Many LLM use cases could be solved by a much smaller, specialized model and/or a bunch of if statements or regexes, but training the specialized model and coming up with the if statements requires programmer time, an ML engineer, human labelers, an eval pipeline, MLOps expertise to set up the GPUs, etc.
With an LLM, you spend 10 minutes integrating with the OpenAI API, which any programmer can do, and get results that are "good enough".
If you're extremely cash-poor, time-rich and have the right expertise, making your own model makes sense. Otherwise, human time is more valuable than computer time.
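The "10 minutes of integration" path, as a hedged sketch: a classification task that could otherwise be a small model or a pile of regexes, done with a hosted LLM. The model name, labels, and prompt are illustrative; it assumes an OPENAI_API_KEY in the environment.

```python
# Classify support tickets with a hosted LLM instead of a bespoke model.
from openai import OpenAI

client = OpenAI()

def classify_ticket(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any cheap hosted model works for this pattern
        messages=[
            {"role": "system",
             "content": "Label the ticket as one of: billing, bug, feature_request. "
                        "Reply with the label only."},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content.strip()

print(classify_ticket("I was charged twice this month."))
```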
kjkjadksj 5 days ago
Terr_ 6 days ago
mattnewton 6 days ago
Terr_ 5 days ago
relativ575 5 days ago
Morizero 5 days ago
relativ575 5 days ago
> LLMs and the platforms powering them are quickly becoming one-stop shops for any ML-related tasks. From my perspective, the real revolution is not the chat ability or the knowledge embedded in these models, but rather the versatility they bring in a single system.
Why use another piece of software if LLM is good enough?
comex 5 days ago
Also privacy. Do museum visitors know their camera data is being sent to the United States? Is that even legal (without consent) where the museum is located? Yes, visitors are supposed to be pointing their phone at a wall, but I suspect there will often be other people in view.
titzer 5 days ago
msp26 5 days ago
If a project has a low enough volume of lifetime inputs, I'm not wasting my time labelling data and training a model. That time could be better spent working on something else. As long as the evaluation is thorough, it doesn't matter. But I still like doing some labelling manually to get a feel for the problem space.
JayShower 6 days ago
A con of this approach would be that it requires maintenance if they ever decide to change the illustration positions.
armchairhacker 5 days ago
arkh 5 days ago
Embed a QR code or simply a barcode somewhere and you're done. Maybe hide it like a watermark so it doesn't show to the naked eye; doing some Fourier transform in the app won't require a network connection nor a lot of processing power.
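For the plain (non-watermarked) variant, decoding a QR code entirely on-device is a few lines with OpenCV and involves no network call; the image path is a placeholder, and the hidden-watermark idea would need extra frequency-domain preprocessing before this step.

```python
# Hedged sketch: decode a QR code locally with OpenCV.
import cv2

img = cv2.imread("wall_photo.jpg")
data, points, _ = cv2.QRCodeDetector().detectAndDecode(img)
if data:
    print("Decoded payload:", data)   # e.g. an illustration/anchor ID
else:
    print("No QR code found")
```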
ndileas 5 days ago
jaffa2 5 days ago
suriya-ganesh 5 days ago
We ran a benchmark of our system against an LLM call and the LLM performed much better for so much cheaper, in terms of dev time, complexity, and compute. Incredible time to be working in the space, seeing traditional problems eaten away by new paradigms.