Hacker News — vinext + Cloudflare Workers


▲Netflix Simplified Batch Compute with Kueue (netflixtechblog.com)

56 points by dalvrosa 5 days ago | 13 comments

lukax 2 days ago [-]

It's refreshing to see a tech article that isn't about AI. It feels like 5 years ago.

__turbobrew__ 2 days ago [-]

Anyone know if Netflix does anything for the k8s storage layer? I imagine they are at the scale where etcd starts to go kaboom? Or maybe they have enough cells where that isn’t a problem?

Given Amazon and Google have their own secret sauce for replacing etcd, I am wondering if Netflix does anything special?

scripni 2 days ago [-]

This runs on AWS managed EKS these days. This talk goes into more detail about Netflix's special sauce around the k8s control plane: https://www.youtube.com/watch?v=vaTOiXR2KSM

Netflix actually has far fewer cells than you'd expect btw, their special sauce IMO is federation and using a small subset of k8s APIs.

__turbobrew__ 2 days ago [-]

I am surprised a company at that scale is running on managed EKS, maybe I underestimate how large the clusters are.

zbentley 1 day ago [-]

EKS can get pretty damn big, well into the thousands of nodes without much special tuning, and beyond that with some care and control plane monitoring. Expensive, though.

__turbobrew__ 10 hours ago [-]

> Expensive, though.

That is my point. I work at a large multinational and we run tens of thousands of Kubernetes nodes on-prem, and I'm pretty sure that would be in the hundreds of millions of dollars per year to run in EKS. We run on-prem nodes about equivalent to c6a.32xlarge, and even with 2 year reserved pricing you are looking at $17k/year/node. At 20000 nodes you are looking at $340 million/year, not including egress fees or any other AWS service charges (such as EBS).

I can tell you with certainty that the all-in costs to run Kubernetes on-prem (including staffing costs) are a lot less than $340 million/year AND we don't have vendor lock-in. In total we have 7 full-time engineers building and running on-prem Kubernetes. The more nodes you have, the more it makes sense, as the team size is mostly independent of the number of nodes, so that team of 7 could also run 40000 nodes without issues. The cost becomes dominated by the capex to purchase hardware. I would say team size scales like log(nodes).

For a company the scale of Netflix, I would assume the math is similar, especially since they already have in-house expertise to run their own hardware, but maybe they get a very steep discount from AWS.
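
The back-of-the-envelope EKS figure in the comment above can be reproduced with a few lines of Python. This is only a sketch of that arithmetic: the $17k/year/node and 20000-node figures are the commenter's own estimates, the function name is made up for illustration, and egress, EBS and other AWS charges are left out, as they are in the comment.

```python
def annual_compute_cost(nodes: int, cost_per_node_per_year: float) -> float:
    """Yearly compute spend: node count times per-node yearly price.

    Hypothetical helper for illustration; it ignores egress, storage and
    any negotiated discount, just like the estimate in the thread.
    """
    return nodes * cost_per_node_per_year


# Figures quoted in the comment (the commenter's estimates, not AWS list prices).
COST_PER_NODE_PER_YEAR = 17_000  # USD, roughly c6a.32xlarge with 2 year reserved pricing
NODES = 20_000

total = annual_compute_cost(NODES, COST_PER_NODE_PER_YEAR)
print(f"${total / 1e6:.0f} million/year")  # prints "$340 million/year"
```

Under the same assumptions, doubling to 40000 nodes doubles the bill to $680 million/year, which is the linear scaling the comment contrasts with a team size that stays roughly flat.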

stackskipton 2 days ago [-]

It's possible they are using kine: https://github.com/k3s-io/kine

whinvik 2 days ago [-]

I see Netflix pumping out tech articles but can't help but notice how much worse the UI experience is getting. Video erroring out, general slowness etc.

Did they just give up?

pjmlp 22 hours ago [-]

Easy answer, not the same team.

jamesblonde 2 days ago [-]

It certainly feels like Netflix is now a k8s shop. And it's probably only a matter of time until they start repatriating workloads to optimize for costs. Then the world will sit up and notice.

beng-nl 2 days ago [-]

I don’t get what you’re implying. What is repatriating? Do you think they will move their workloads to on-prem?

Has something changed in the world that shifts the cloud vs on-prem trade-off calculus compared to how it was over the last 15 years?

(I’m as anti-cloud-overspend as the next guy on hn btw. Just trying to make sense of your comment’s worldview.)

jamesblonde 1 days ago [-]

Yes, coding agents have reduced the skills/knowledge required to operate workloads on virtualized hardware. K8s and its ecosystem have changed so that they now provide 90% of what you need from the public cloud providers. These are big changes that enable 8-15X savings from running your own workloads. I think it will be the big players who move first, as they have the most to save and the resources to make it happen.

scripni 2 days ago [-]

Congrats, this is awesome!
