It's interesting that they don't break the problem apart geographically. It's inherent in Uber that you're local. But their infrastructure isn't organized that way. Facebook originally tried to do that, then discovered that, as they grew, friends weren't local. Uber doesn't need to have one giant worldwide system.
Most of their load is presumably positional updates. Uber wants both customers and drivers to keep their app open, reporting position to Master Control. There have to be a lot more of those pings than transactions. Of course, they don't have to do much with the data, although they presumably log it and analyze it to death.
The complicated part of the system has to be matching of drivers and rides. Not much on that yet. Yet that's what has to work well to beat the competition, which is taxi dispatchers with paper maps, phones, and radios.
Uber is pretty formidable in building and growing a two-sided market. I suspect it's tuned continuously, at high resolution (in space and time), with levers I mostly don't know. And that's got to be a big contribution to the complexity of the stack.
Think of it this way. A standard e-commerce site, or SaaS with low-touch marketing... there are a crazy number of KPIs to monitor, loads of levers (e.g. what's the right discount to fix basket abandonment?). The instrumentation to track all these conversion funnels (and to run A/B tests to see what works) is half the job.
But at least we have a common understanding of the metrics and the levers -- for SaaS, say, there are tons of similar services, and the knowledge is shared in the community.
Uber? How do you grow while maintaining market liquidity every evening of every week? If you artificially hike demand from passengers in a particular neighbourhood (say with coupons), does word of mouth amongst potential drivers work to increase the driver pool before the passengers get frustrated and move to Lyft?
All of this is new. So how do you create the tech to track not just all the data you need, but all the data you might need, plus the capabilities to do tests to figure out what levers to pull? Hard.
I have no particular insight into this. But my guess is that Uber isn't flying blind - their growth has been no accident - and the complexity of their tech is due to instrumentation, not operations.
Create a global Cassandra cluster with regional datacenters.
Use one keyspace per region
Use per-keyspace replication to only replicate that region's data locally, and to one or more additional datacenters
Have stateless app servers colocated with Cassandra in each DC handling all local traffic
Run Spark on top of Cassandra to do analytics, or to do the ETL to a dedicated analytics system
Optionally have a single "master" DC, with replicas of all data from all locations, that doesn't serve end-user traffic but exists to allow efficient cross-region analytics.
Profit (optional step)
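To sketch steps 1-3 concretely (the DC names, replication factors, and table below are all made up for illustration):

```python
# Minimal sketch of the per-region keyspace layout above, using the
# DataStax Python driver (pip install cassandra-driver). DC names,
# replication factors, and the table are invented for illustration.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1"])  # any contact point in the local DC
session = cluster.connect()

# One keyspace per region: NetworkTopologyStrategy keeps 3 replicas in
# the region's own DC and ships 1 copy to an optional analytics "master" DC.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS region_us_west
    WITH replication = {'class': 'NetworkTopologyStrategy',
                        'us_west': 3, 'analytics': 1}
""")

# Colocated app servers use LOCAL_QUORUM so reads and writes never
# leave the local datacenter.
read_trip = SimpleStatement(
    "SELECT * FROM region_us_west.trips WHERE trip_id = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM)
```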
And yes, the company I work for (Datastax) has a product and services to help make it simple.
That's an interesting approach. One question though: in our use case users often travel from city to city and country to country. How do you model that if you are only using local DCs and local replication?
I think the regional keyspaces would have to be caches -- denormalize it, basically. Pop/refresh people into the geo-based caches as they moved around. Truth sits behind it, centralized (perhaps partitioned in some way that makes sense globally but is sub-optimal from a regional cache perspective). Might not be worth it -- hard to know from here. :)
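A toy sketch of that read path, with hypothetical stand-ins for the regional cache and the central store:

```python
# Toy sketch of the read path described above: a per-region cache in
# front of a centralized source of truth. `regional_cache` and
# `central_store` are hypothetical stand-ins for whatever backs each layer.
def get_profile(user_id, region, regional_cache, central_store):
    profile = regional_cache.get((region, user_id))
    if profile is None:
        # Miss: the user just arrived in this region (or the entry expired).
        # Fall back to the source of truth and re-warm the regional cache.
        profile = central_store.get(user_id)
        regional_cache.set((region, user_id), profile, ttl=3600)
    return profile
```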
afaik, this is how Facebook does it, but with regional sources of truth.
If you signed up for FB in Paris and move to San Francisco, your master profile lives in Europe in perpetuity and you'll use your regional cache forever in the USA.
The number of people moving far away from their home DCs should be a reasonably small fraction of the total for it not to matter.
As far as I'm aware, Hailo (>3 backend devs, not quite 100s) did exactly this as well, and the ex-Hailo devs I've spoken to considered it a pretty bad move. It took them ages to refactor into a global system if I remember rightly.
All it takes is for one server in one datacenter to be slightly different. Or perhaps you had a bugfix that needed to go out for users in one area, but you couldn't take the risk of a flaky deploy for the areas that didn't need it; now you've got a deploy that will be a lot more complicated and error-prone than simply looping the same deployment over every location.
> All it takes is for one server in one datacenter to be slightly different
Don't do that. No one is allowed to ssh to boxes. If you need to enforce it by blowing up and rebuilding all servers once per week, do that.
> perhaps you had a bugfix that needed to go out for users in one area, but you couldn't take the risk of a flaky deploy for the areas that didn't need it
Feature flags. Default off, but flip on a new path of code for a set of users.
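A minimal sketch of the pattern; the in-memory dict stands in for a real config service, and the dispatch functions are hypothetical:

```python
# Minimal feature-flag sketch: every new code path ships dark (default
# off) and gets flipped on for a set of users, with no redeploy needed
# once the dict is backed by a config service.
FLAGS = {
    "new_dispatch_path": {"enabled": False, "allow_users": {42, 1337}},
}

def flag_on(name, user_id):
    flag = FLAGS.get(name)
    if flag is None:
        return False  # unknown flags default to off
    return flag["enabled"] or user_id in flag["allow_users"]

def dispatch_v1(user_id): ...  # existing path
def dispatch_v2(user_id): ...  # new path, dark until the flag flips

def dispatch(user_id):
    if flag_on("new_dispatch_path", user_id):
        return dispatch_v2(user_id)
    return dispatch_v1(user_id)
```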
You should deploy so often that it is routine. Deploy 50 times a day. You will find bugs at first, but eventually you should get to the point that you could deploy for every single commit and no one would notice. (Now they may not be a good idea depending on your risk tolerance and other things, but you should be ABLE to deploy every single commit).
Not saying these are simple things to do, but if you are approaching servers with a devops mindsets, you literally should not care about number of servers or datacenters.
> Don't do that. No one is allowed to ssh to boxes. If you need to enforce it by blowing up and rebuilding all servers once per week, do that.
Yep, this is change control 101. I used to have a boss who would go in and edit sprocs on the production server and never get them into source control. I finally just encrypted everything on the server so the only way to push out new code was to commit to source control. A sledgehammer, yes, but sometimes it is required.
Use Terraform or some other tool to codify your entire infrastructure.
>You should deploy so often that it is routine. Deploy 50 times a day. You will find bugs at first, but eventually you should get to the point that you could deploy for every single commit and no one would notice. (Now they may not be a good idea depending on your risk tolerance and other things, but you should be ABLE to deploy every single commit).
This is a great point. At first it is terrifying, but then you realize that a deployment is so easy that bugs are generally not a big deal. Bugs happen, so the goal should be to shorten the time between bug found and fix deployed.
> Don't do that. No one is allowed to ssh to boxes. If you need to enforce it by blowing up and rebuilding all servers once per week, do that.
I'd love to do that. I'd love to have no access to production servers, but ultimately that requires far more work to get right than Ansible configuring the same machines again and again. It also means you can't use dedicated hardware as easily, which restricts performance. It's a great situation to be in, but difficult to get to and requires a non-trivial amount of overhead.
At my place of work we deploy somewhere between 5 and 40 times on any given work day (on a team of 5 engineers). That's because we've managed to engineer a reliable and fast deployment process, but that took a long time to get right. It's powerful, but the overhead, particularly on a small team who are under pressure in a startup environment, can be quite large.
I'm not saying you're wrong, in terms of best practice I completely agree, but when the tradeoff is between sales/acquisition/product market fit/etc, and having a 'smooth' devops process, in many cases, the latter must come second.
You can devops dedicated hardware. It's a little bit different but not that much. Heck there are boot2docker and such that let you just run docker on bare metal.
This is so weird. I hear this all the time. At my place of employment, we all have the ssh keys into our EC2 instances, but no one configures them. Ever. Period. Those ssh keys are purely for either validating changes in a test environment (like to .ebextensions) or diagnosing production issues (why did Puma fall over this time? why isn't syslog output making it to loggly?).
Of course, we lean heavily on Elastic Beanstalk; autoscaling kills old instances daily since we scale from 2 to 18 and back to 2 instances in a 24-hour period across about 9 microservices.
So, if this ssh into boxes and change things is common, it means people aren't doing auto-scaling? THAT is scary.
I'm very very pro devops. But I am cautious about auto scale.
It adds a lot of complexity. If your app has predictable load it may simplify things a lot to not autoscale. Think a B2B app with manual account creation. You know your user levels. Zero reason to turn on autoscaling.
...so you run the servers you need for peak load 24/7? Ew. So much money just thrown away because "it's too complicated". We just scale based off of Network Out and only really spent a week finding the right threshold.
How long does your stack take to add a server? 3 minutes?
I have created a lot of services that couldn't have crappy perf for 3 minutes every time the load scales. Especially since in many apps the most expensive part is storage and DBs, which cannot autoscale.
So say you spend 1k a month on your DBs and $100 for peak web load. You could spend a week fixing autoscaling bugs and try to save $25 by scaling web nodes down... But then you still have bad perf every time load spikes. I would not do that to save $25. I would question a company I work for that did that.
You need to be scaling up and down for hundreds of dollars per swing and have big spike loads before you add the complexity of auto scale.
We're sensitive enough that we preemptively scale as traffic "appears" to be increasing. So we add 5 EC2 instances, not at crisis levels of traffic, but "Hmmm...I feel a tingling in my extremities". We then remove one instance at a time if traffic falls below "Not doing anything" levels. The time between scaling actions is 15 minutes. Since ASG's go to remove older instances first, we don't end up getting charged a full hour for instances that are up for less than an hour that often.
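For the curious, roughly what that looks like with boto3 (the group name, policy names, and threshold are all invented):

```python
# Rough boto3 sketch of the policy above: scale out by 5 on an early
# NetworkOut signal, scale in by 1, with 15 minutes between actions.
import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

scale_out = autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="tingling-in-extremities",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=5,   # add 5 instances before crisis levels
    Cooldown=900)          # 15 minutes between scaling actions

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-asg",
    PolicyName="drain-slowly",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=-1,  # remove one instance at a time
    Cooldown=900)

cloudwatch.put_metric_alarm(
    AlarmName="traffic-appears-to-be-increasing",
    Namespace="AWS/EC2",
    MetricName="NetworkOut",  # the metric we scale on
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "web-asg"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=1,
    Threshold=50_000_000,     # bytes per period; invented, needs tuning
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[scale_out["PolicyARN"]])
```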
Admittedly, our web traffic is very US Business Hours centric and peaks predictably between 3 and 4 in the afternoon.
Also, we're operating more at a scale of $40,000/month for peak traffic capacity 24/7 and $25,000/month once I got autoscaling worked out. So...yeah. I guess the scale for savings matters. :-)
> All it takes is for one server in one datacenter to be slightly different
But doesn't a global system still run on multiple DCs, at least for redundancy?
> perhaps you had a bugfix that needed to go out for users in one area, but you couldn't take the risk of a flaky deploy for the areas that didn't need it
But if you have a single global system, you can't even make that decision.
To be clear, I'm not arguing against it, I'm just trying to understand why, since my first instinct would be to divide geographically as well.
> But doesn't a global system still run on multiple DCs, at least for redundancy?
If you've got 2 levels of separation - servers and 'groups' (whether they are datacenters, or whatever) - you've got 2 levels at which that special casing needs to happen. If you only have 1 level - servers - i.e. one deployment, even if that's across multiple datacenters, you only have 1 place to special case. I'd say that's easier.
> But if you have a single global system, you can't even make that decision.
Good point, but my point was that it will be simpler and less error prone in general. You might not be able to push the bugfix, or you might have to risk the deploy globally, but I think either would be better in the long run for a simpler deployment. It is a trade-off though.
"Yet that's what has to work well to beat the competition, which is taxi dispatchers with paper maps, phones, and radios."
In some part (maybe even a large part) of the world, yes. But markets where taxis come from many different companies (or where ordering one in advance is more common for other reasons) already have fairly sophisticated technology. Let's not promote the myth of startup exceptionalism. Uber has modern (but not futuristic) technology; the real difference is in the business model.
You know the saying: "I didn't have time to write a short letter, so I wrote a long one."
This is the kind of technology stack you end up with when you try to move fast. I'm sure that if more time and thought had been put into it, it would have been more elegant and simple. But who has time these days?
I had similar thoughts. "Wow. All those moving parts. Each one of which could fail." Each new piece of unique technology added means that its probability of failure gets multiplied against what you already have.
Well, to some extent, they do. Their geofencing [0] service takes in coordinates and returns the geofences these coordinates fall into. They make this faster by pruning irrelevant geofences:
>> Instead of indexing the geofences using R-tree or the complicated S2, we chose a simpler route based on the observation that Uber’s business model is city-centric; the business rules and the geofences used to define them are typically associated with a city. This allows us to organize the geofences into a two-level hierarchy where the first level is the city geofences (geofences defining city boundaries), and the second level is the geofences within each city.
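That pruning is easy to sketch. Here's a toy two-level lookup using shapely; the coordinates and names are invented, and Uber's actual service is written in Go:

```python
# Toy version of the two-level lookup from the quote: resolve the city
# geofence first, then test only that city's inner geofences.
# Requires shapely (pip install shapely).
from shapely.geometry import Point, Polygon

# Level 1: city-boundary geofences.
CITIES = {
    "sf": Polygon([(-122.55, 37.55), (-122.30, 37.55),
                   (-122.30, 37.85), (-122.55, 37.85)]),
}
# Level 2: geofences within each city (airports, event zones, ...).
CITY_GEOFENCES = {
    "sf": {"sfo_airport": Polygon([(-122.41, 37.60), (-122.35, 37.60),
                                   (-122.35, 37.64), (-122.41, 37.64)])},
}

def matching_geofences(lng, lat):
    p = Point(lng, lat)
    for city, boundary in CITIES.items():
        if boundary.contains(p):  # level 1: which city is this point in?
            return [name for name, fence in CITY_GEOFENCES[city].items()
                    if fence.contains(p)]  # level 2: that city's fences only
    return []

print(matching_geofences(-122.38, 37.62))  # ['sfo_airport']
```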
Sharding your application geographically adds quite a bit of complexity and requires a lot of work developing support infrastructure to manage load balancing, failover, and placement. One of the advantages of SOA is that different services can have different architectures.
To be precise, we do do geographic sharding in the services that benefit from it, but avoid it in the services that don't.
Also note that the assumption of region based partitioning doesn't extend to all applications. Analytics, for example, may want to dice and slice the data along different dimensions. Partitioning is a convenient abstraction for managing marketplace scale, as you mentioned, but inconvenient elsewhere :).
Riders and drivers are also not local. I travel a lot, and yet my star rating, profile picture and payment details work regardless of whether I'm in the Bay Area, Berlin or DC. Further, I've heard of Uber drivers giving rides to other regions, e.g. SFO airport to Sacramento (apparently fairly common as the Sacramento airport has limited service and is expensive).
I'd love to know how many people are responsible for devops/operations/app at various stages of any company's journey. Wikipedia says Uber employs 6,500 people so if even 15% of that is on the tech side of the business that's still 1,000+ people allocated to tech. I think this metric would be a useful reality check for a "modern" SaaS project with 3-10 people that's trying to emulate a backend structure similar to the big league.
There are 20+ complex tools listed in the stack, and running a high-visibility production system would require a high level of expertise with most of them. Docker, Cassandra, React, ELK, WebGL are not related in required skills/knowledge at all (as, for example, Go and C are). Is it 5 bright guys and girls managing everything, like the React team within Facebook? Or a team dedicated just to log analytics?
I don't know the numbers, but at least some of Uber's tech employees are working on things that aren't directly connected to the app and rides, like mapping and self-driving cars.
One of their recruiters contacted me a while back, and it sounds like they're working on some really neat stuff, but I don't agree with all their business practices, so I didn't pursue it :-/ In any case, he pointed to their website: https://www.uberatc.com/
That's all bloat. Pure and simple. At the end of the day Uber just does routing and basic allocation. It's a simple operations problem that has been solved since the 70s and no one back then needed ELK, Docker, Cassandra, etc.
I've seen this bloat everywhere. It is usually a result of internal politics and posturing by management types. The kinds of people Steve Jobs would have called B and C players. Now the actual people operations is another matter entirely but the tech stack definitely doesn't need to be that complicated.
I have to admit, I could see this running for a city the size of SF on a desktop machine under the table at the taxi depot. Uber has 11,000 drivers in SF, but probably only a few thousand are on at any one time. A ride takes a few minutes, so if you figure 3,000 active drivers and 4 rides per hour, that's only about 3 ride transactions per second. You have a transaction at ordering, one at ride start, and one at ride end. Plus you have tracking of where all the active drivers are, pinging maybe once a minute. That adds up to only 10-20 TPS. You can offload the routine web and app stuff to some front end machines. And you want to do some analysis every minute or two to see where there are "surges".
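Spelling out that arithmetic, with the same assumptions:

```python
# Back-of-envelope math from the paragraph above, same assumptions.
active_drivers = 3_000
rides_per_driver_per_hour = 4
txns_per_ride = 3  # one at ordering, one at ride start, one at ride end

rides_per_sec = active_drivers * rides_per_driver_per_hour / 3600
ride_tps = rides_per_sec * txns_per_ride
pings_per_sec = active_drivers / 60  # drivers pinging about once a minute

print(f"{rides_per_sec:.1f} rides/s -> {ride_tps:.0f} ride TPS, "
      f"plus ~{pings_per_sec:.0f} cheap position writes/s")
# 3.3 rides/s -> 10 ride TPS, plus ~50 cheap position writes/s
```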
The only non-trivial part of this is assigning drivers to rides.
It's easy to imagine the simplest stack that can serve the core features of any service, and that is well served by a single box. What's missing from the picture is the infrastructure to replicate this 500 times by separate teams, monitoring all of it, backup, auditing, aggregating customer and business metrics, back-office systems, and more. Plus the fact that these things always grow organically and embed a host of imperfect decisions - the imaginary system will always be better designed.
Couldn't agree more. It's all the invisible details that cause the load.
It's a much more trivial example, but I think it highlights the point well: we have pages in the app I work on that would respond in ~100ms, but might have a single sentence on them that takes another 100ms to generate because of the complex data relationships involved in figuring out what that sentence needs to say. The 'request handler' might be 20 lines of code, with a 50 line util function to generate that line of text. No armchair architect will ever take into account things like that, but the end result is a page that is just a bit more personalised to the user and therefore improves their experience.
In an app of any real size, I imagine there are anywhere from hundreds to many thousands of tiny little details like this that all together drastically increase the amount of power needed to run a service.
> No armchair architect will ever take into account things like that
An armchair architect would say it's not needed. They would question whether spending 50% of your response time generating a single sentence is in any way worth it, and wonder what kind of architectural mistakes led to that.
The problem with this line of reasoning is that it implies the business exists to serve the software. Unless you work at a tech-focused non-profit, the software actually exists to serve the business.
> the software actually exists to serve the business.
Sure it does, but the business also wouldn't exist without the tech in Uber's case (and a lot of other cases). And it's going to be your head on the line when you keep adding these 100ms sentences because the business wants them for no good reason, your page takes 3 seconds to load, and nobody buys anything from the site.
You're making the assumption that the additional features slowing down the service aren't adding value.
More common is a "death by 1000 cuts" scenario where the various causes of slowness are apparent to the developers, but quite difficult to remove because they've become necessary to the continued success of the business.
> You're making the assumption that the additional features slowing down the service aren't adding value.
No, I'm questioning whether the value added is greater than the value lost, and in this hypothetical example clearly not. So it's your job to point that out to whoever and not silently obey.
We saw this in the 90s, people with a whole rack of machines running Java or Perl CGIs to serve a site with less traffic than we were doing with a single, ordinary box running NSAPI. You need loads of scaffolding that you mention, only if you are trying to fit a square peg into a round hole.
When I use these services, it doesn't always give me the closest car of the ones shown. It's all about which driver accepted my request first. Isn't it a fairly easy calculation of "all cars within X km or the nearest Y cars", and then whichever driver taps first gets matched with you?
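The naive version really is just a distance sort. A sketch, assuming the "nearest Y cars within X km, first tap wins" model described above:

```python
# Offer the request to the nearest Y drivers within X km; whichever
# driver taps accept first gets matched. All parameters are invented.
import math

def haversine_km(lat1, lng1, lat2, lng2):
    # Great-circle distance between two (lat, lng) points, in km.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 6371 * 2 * math.asin(math.sqrt(a))

def candidate_drivers(rider, drivers, max_km=3.0, top_y=8):
    # rider: (lat, lng); drivers: list of (driver_id, lat, lng).
    nearby = sorted(
        (haversine_km(rider[0], rider[1], lat, lng), did)
        for did, lat, lng in drivers)
    return [did for dist, did in nearby if dist <= max_km][:top_y]
```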
> At the end of the day Uber just does routing and basic allocation.
The thing is, though, that that algorithm is easily copied, as evidenced by Lyft etc. So really Uber's business model needs to be all about differentiation, marketing, analytics and prediction, otherwise they won't survive.
I think analytics and other fuzzy avenues are where those technologies shine.
It sounds like you have no idea how credit card transactions were or are processed. They are almost exclusively written to files and batched once a day, even today. Back in the 1970s it was even worse because there was no real-time authorization.
This is helpful information if accurate, but please edit incivility like "It sounds like you have no idea" out of your comments here. The site guidelines ask you to omit this sort of thing, so please post civil and substantive comments only.
He or she is correct that the cash settlements were and are batched, but authorizations, reservations, etc. were online; to the mainframe that's just another transaction.
The reason settlements were batched was to save on wire fees.
I love how any armchair quarterback on HN can sit back and dismiss the work of thousands of engineers as bloat, with no actual qualifications of their own.
Build a multi-billion dollar company that has satisfied customers around the world, and then let's hear what you have to say.
Let me guess, you also came up with the idea for Google Adsense and the iPhone in high school, right?
The business idea is a separate concern. A solid business model can withstand all sorts of abuse and incompetence at all levels. Several eBay CEOs, despite their best efforts to destroy the company, have been unable to do so. Similarly for PayPal and a few other companies that have excellent product/market fit.
Each of those companies survives despite the best efforts of 1000s of engineers and managers to over-engineer and justify their salaries. So the logic of "1000s of people have worked on this so it must be valuable" is incorrect reasoning. The more pertinent question is how do these companies survive despite all the over-engineering that is happening? Once you ask that question you are almost surely led to the conclusion that the technology is not as relevant as people would like to think and inefficiencies at the technology level have very little effect on actual business outcomes when the product itself provides value people are willing to pay for.
Like bureaucracy, the complexity of your tech stack grows to accommodate the number of people available to work on it. Go crazy on the hiring, and what do you expect all those people to do all day?
That's a very uncharitable view, and it could be argued the other way around, that the hiring occurs to support the need for more people in engineering. I'm not certain which way around it goes with Uber, but I've seen both.
Exactly. I saw a presentation by one of their tech guys and was very surprised by it. First, by the number of IT people they have; second, by the work they do. It looked like inventing problems and solving them for the sake of problems and solutions, i.e. no business value in it.
I disagree. Even on Hacker News, people rarely express such absurd things with so much confidence. You fail to take into account many of the following:
* Extremely high volume. Uber has indicated elsewhere that they receive upwards of a few hundred thousand requests per second on just one service. Please show me the logistics stack that did this in the 70s.
* Yes, building the first version of something is extremely cheap and easy. But being able to improve it becomes harder and harder. Especially given high volume, modern companies need sophisticated analytical tools that provide reliable data to both technical and non-technical staff. Please show me the analytics stack that was able to ingest, store, and analyze terabytes of business data in realtime from the 70s.
* Reliability. Modern web applications need to fail gracefully and be debugged quickly. Please show me the logistics and routing stack that was capable of extremely high uptime while being deployed constantly and serving hundreds of thousands of requests per second from the 70s.
* Extensibility. Businesses need to extend to new markets. Moves like this often invalidate past assumptions. In order to support business flexibility, modern engineers deliberately invest considerable time into building decoupled components that can be reused as platforms instead of stuffed into a monolithic codebase. Please show me the operations and routing stack that could easily be reconfigured to enable such products as Amazon Web Services, Uber's external API, the Google Maps API, or Uber EATS—from the 70s.
To make this more concrete, I worked on a routing stack at another company which probably works similarly to Uber's ETA systems. When considering these things, it's important to keep in mind the dependency tree of each new problem set and the work required to make those dependencies work reliably at big scale.
To give you an idea of what this area alone entails:
1. Machine learning.
- wiring together and improving algorithms: linear regression to begin with, then random forests, then neural networks.
- ensuring data required for learning is reliably available and correctly computed.
- tools to launch, deploy, test these models.
2. Working with map data in memory many times larger than what fits onto the smallest consumer laptop.
- how do you handle updates of data?
- what if you want to use different data sets in different places, because they're more accurate?
- how do you debug errors in the data without visual tools (hint: it's really hard and time consuming)?
- how do you optimize loading this data into memory without requiring hours to deploy your application?
- where do you even store this data?
3. Requests per second in the hundreds of thousands and latency requirements (in order to ensure the app responds quickly) hovering around 10ms.
- how do you profile complex distributed applications?
- what optimizations are available to make graph search faster (hint: A* isn't fast enough; a baseline sketch follows this list)?
- how hard is it to implement these optimizations?
4. Data science and data science tools
- Visualizations!
- again, reliable data pipelines
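For reference, here's the baseline that item 3 calls too slow: textbook A* with a straight-line heuristic over a toy road graph (the graph/coords format is invented). Production systems layer heavy precomputation, e.g. contraction hierarchies, on top of something like this:

```python
# Textbook A* over a road graph: the baseline that isn't fast enough
# at six-figure requests per second.
import heapq
import math

def a_star(graph, coords, start, goal):
    """graph: {node: [(neighbor, edge_km), ...]}; coords: {node: (lat, lng)}."""
    def h(n):  # crude straight-line distance, ~111 km per degree
        (la1, ln1), (la2, ln2) = coords[n], coords[goal]
        return math.hypot(la1 - la2, ln1 - ln2) * 111
    frontier = [(h(start), 0.0, start, [start])]
    settled = {}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if settled.get(node, float("inf")) <= g:
            continue  # already expanded via a route at least as cheap
        settled[node] = g
        for nbr, edge_km in graph.get(node, []):
            heapq.heappush(frontier, (g + edge_km + h(nbr),
                                      g + edge_km, nbr, path + [nbr]))
    return None
```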
That's about what one team works on over the course of a year. Note the dependencies we have here:
1. We assume access to cloud infrastructure that doesn't require us to do all of our own devops.
2. We assume mature and automatically scaling data infrastructure: that Kafka and Storm have been set up and tuned to a degree that we don't have to worry about it. In reality, Kafka alone requires a team of at least a dozen at LinkedIn to keep up with the maintenance, operations, and optimization burden of keeping up with scale.
3. We assume mature and scalable service-oriented architecture tooling—if a call to another service is slow, I should be able to see on a dashboard which service is slow, how frequently it's slow, why it's slow (if it depends on another service), etc.
and countless other things I could spend days enumerating for you but I guess it'd be wasted on you because you're pretty convinced you already solved these problems in the 70s, so why am i wasting my breath
It's important to counter the trivializing sort of dismissal that people often post to HN (the old "I could build Twitter in a weekend" and whatnot). We want the culture to move more toward thoughtful, substantive critique. So your detailed argument here, based on experience, is valuable. Please don't spoil it by becoming uncivil like this:
> I guess it'd be wasted on you because you're pretty convinced you already solved these problems in the 70s, so why am i wasting my breath
With that your comment does more harm than good: it poisons the atmosphere and detracts from your substantive contribution.
You're definitely not "wasting your breath" even if you fail to persuade the other person not to be snarkily dismissive, because the real audience for a comment like yours is everybody else: i.e. the rest of us who are curious about how (in this case) Uber operates and why things might be the way they are. That audience needs to see both good information about the challenges involved (as opposed to this-has-been-trivial-since-the-70s) and a good example of how to patiently respond to a trivializing comment with a thoughtful one. It's bad if, instead, you give us a reason to wince and an example of replying to a dismissive comment with a rude one.
How are you going to discuss all that in a substantive manner? All anyone can do in the limited time frame of a HN discussion is draw parallels to previous job experiences or previous user experiences.
My own experiences are more aligned with the sentiment expressed in the "dismissal" comment.
Mine too, but that doesn't make it a good comment. In fact its first paragraph is almost a parody of the know-it-all internet comment.
There are a zillion ways to make the same kind of argument thoughtfully. Talking about one's own concrete experiences helps. So does not acting like you know everything about somebody else's situation.
Bloat is a problem, and so (in my view) is the kitchen-sink software culture of hauling in libraries and frameworks without thought for overall complexity. But we need to be able to talk about this at a higher level than other-people-are-idiots-compared-to-me. A much higher level.
Point taken. I'll try to do better next time, but it does get old after a while of seeing the same set of mistakes and articles parroted over and over again. Trivial problems blown out of proportion because people don't know the proper science, theory, and history and have opted to re-invent things badly. Uber is especially known for this since they re-invented/re-wrote basic geospatial algorithms in Go and hailed it as innovation.
The dismissal comes from years of reading such articles and then chipping away at the veneer to see what's really underneath and being disappointed every time and then working on such things and experiencing first hand how the bloat comes about.
I understand, but we need you to give us the experience and omit the dismissal. The former can dramatically improve the quality of this site; the latter only degrades it. And the former will actually be persuasive while the latter merely gets people's backs up (or makes them cheer if they happen to hate the same thing) without teaching the reader.
I get irritated at having to repeat the same things over and over, too, but the internet is basically stateless and so (sadly) is the software business. And like everyone, I get peevish when people say/do wrong things and act like they know what they don't. The longer one has been around, the more occasions one has to secrete bile. But it's a humor one must metabolize internally and not release into the community—hard work and not fun at first, but far more rewarding in its effects, and maybe our only chance at creating an actually functional culture.
I've built things that handle less than 1 TPS and things that handle more than a few thousand without any significant memory or CPU load. All of these things have had uptime that has been unmatched by the other systems they have had to interface with, all the while degrading gracefully and handling everything else in between. So let's just say I understand a thing or two about designing fault-tolerant systems that need to operate under high loads and degrade gracefully.
* Re: workload in the 70s. You are missing the point about logistics stacks that handle 1000s of transactions per second. The point is that Uber's problem is self-imposed. Stepping back and thinking about the problem a little will let them handle the same amount of work with 1/10 the hardware costs.
* Re: first version. The first version and the n-th version, when properly designed, require the same set of gradual steps. If you build the first version to throw away, then whose problem is it that you built it that way and need 10x the hardware to handle the workload because of shitty architecture? Again, stepping back and taking a holistic view and thinking a little bit is the trick.
* Re: extensibility. Same deal. Design your architecture properly and you can extend it as far as any business requirement forces it without spending 10x on hardware and software. How do you do this? Same as above. Thinking.
* Re: reliability. See above. Thousands of transactions a second with unmatched uptime. It is more likely the systems I interface with will go down or even for AWS to have an outage than for a properly designed system to fail.
1. Machine learning - already doing it wrong. You've failed to learn from history and instead are following fads and trends. When properly framed, routing/allocation is a linear program, and there are solvers that will solve such problems with millions of variables (a toy example of the allocation view follows this list). Instead you have opted to complicate the problems with the latest fads and trends that are not even suited to the problem you are solving. In essence you've made my point.
2. Consumer laptop? I'd hope the software runs on server grade hardware. Bringing up a consumer laptop as a restriction on memory is a non-sequitur.
3. Hundreds of thousands. Great. I can handle several thousand connections per second on a dinky c4.2xlarge instance with 10-20ms guarantee with a ruby stack. There are plenty of ways to optimize it further but I've never needed to. The literature is full of optimized and distributed graph search algorithms. Operationalizing any one of them wouldn't be much work. How do I know? Because I've done it before.
4. Reliable data pipelines have been a solved problem since Hadoop and friends. Again, this makes my point about bloat.
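To make the allocation point concrete, here's the toy example promised above: batch driver/rider matching framed as a classic assignment problem, solved with the Hungarian algorithm in SciPy (the ETA matrix is invented):

```python
# Driver/rider matching as an assignment problem, the canonical
# operations-research formulation. cost[i][j] = pickup ETA (minutes)
# if driver i is assigned to rider j; the values are invented.
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.array([[4.0, 9.0, 7.0],
                 [3.0, 2.0, 8.0],
                 [6.0, 5.0, 1.0]])

drivers, riders = linear_sum_assignment(cost)  # minimizes total ETA
for d, r in zip(drivers, riders):
    print(f"driver {d} -> rider {r}")          # 0->0, 1->1, 2->2
print("total ETA:", cost[drivers, riders].sum())  # 7.0
```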
Re: one team over a year. Seems like you need better engineers or better-designed systems. If you're developing software with more than 100 engineers and the boundaries between teams are so ill-defined that you need more than 10 per team, then that's an organizational problem and a highly inefficient way to do things. How do I know? Worked on teams that gelled and those that didn't. The determining factor was always reducing communication overhead by proper architectural design. The amount of communication overhead was almost directly correlated with software bloat and sprawl.
1. Devops: Solved problem. Chef, Ansible, Puppet. Pick one; they're all the same.
2. Kafka is not good software. Pick something else for your event management pipeline. Heck, build it from scratch. Neither Kafka nor Storm are novel or required. Chances are you've over-engineered it if you are reaching for those and need to step back and think.
3. Simplify your call graph. There is no magic bullet here. No amount of dashboards, logs, and metrics will let you get around an ill-designed and bloated service architecture. Again you've made my point.
Same reason any other software is not good software. Chances are you don't need it and are reaching for a shiny tool. Kafka requires zookeeper and in my experience zookeeper is an operational nightmare. If you need an event bus then there are many out there that are much simpler and easier to maintain operationally with much simpler failure modes.
Don't just reach for something because it has been the most common thing posted on programming forums. The behavioral psychologists and economists consider this a well known cognitive bug.
RabbitMQ is a perfectly fine message bus in pretty much all use cases. Easier to operate and maintain without any extra dependencies, and with much simpler failure modes. A few more: ZeroMQ, SQS, HornetQ, NATS, NSQ, etc. Any one of those will most certainly fulfill whatever use case you have.
The point being, Kafka has a very heavy operational overhead, and you'd better understand what you are getting into and what bargain you're making for the scalability you mention.
I've used RabbitMQ; it most certainly does not fulfill the volume requirements I have.
The fact that you are comparing zeromq to Kafka is pretty good evidence that you have no idea what you are talking about, and are just tossing out names from google. I'm a little disappointed, honestly, I hoped you were aware of something I hadn't heard of.
There are two ways to solve problems in engineering. You either bring the problem closer to your existing solutions by redefining the problem or you keep the problem the same and bring your solutions closer to the problem.
Sounds like you are unwilling to redefine your problem so that it is amenable to solutions that are not kafka.
Yeah, you recommended a sockets library as an alternative to a distributed durable circular buffer. Not obviously clueful. Might as well recommend Nginx as an alternative to JavaScript.
I need to durably handle billions of events per day. No amount of redefining changes the underlying business problem.
Kafka, on the other hand, has been helping me solve that problem for years.
Let's see. I can handle a few million on a single instance and I have yet to hit any memory or CPU limits indicating I can handle 10x of what I'm currently handling. Oh and it's about 100k or more transactions per hour at peak load. Just from basic operational observation and logs. Also, have yet to see any durability issues and I've managed to do it without kafka. So pretty basic math says the entire thing can be scaled to a "few billion" transactions in a pretty straightforward way. Then again I'm more willing to redefine my problems to come up with simpler solutions.
But this discussion has devolved into personal insults at this point. We have nothing to teach each other it seems.
If nothing else, I could teach you that ZeroMQ has nothing to do with queueing or durability.
It's certainly possible that Rabbit has improved in the years since I used it, if it works for your use cases, great. But don't assume that everyone using a popular technology is doing so because of a fad or without understanding the tradeoffs.
You certainly have an interesting argument but it doesn't sound convincing. Can you explain more about how 1970s technology can solve the problems Uber is facing in 2016?
Here you go: https://en.wikipedia.org/wiki/Operations_research. Go to town. The history section and the problems-addressed section are more than sufficient. You can expand further if necessary and decide for yourself whether what Uber does is in any way novel and whether it requires all the bloat.
True, I think their CTO must be confused and easily misled by techies who just want to get the latest buzzwords onto their CVs.
Silly comments such as "we gain true insight from pretty graphics rather than tedious SQL queries" say it all for me.
I suppose they have to burn the insane amount of capital they raised ($50 billion) somehow?
I recommend reading "The Visual Display of Quantitative Information" by Tufte. I would have partially agreed with you before, but I really do think that correct visualisation of data can make it vastly more useful, and as a few other commenters have noted, Uber has a big challenge to differentiate themselves from Lyft and others, and effective use of data could well be one of their differentiators.
Uber is really strapped for engineering talent, especially when it comes to SRE. Many friends and I, working SRE at various Bay Area companies, consistently get hit up for free lunches and interviews. It's really weird considering that their stack doesn't NEED to be this complex...
It probably could be simpler. It seems like with enough engineers, every company I've ever worked at eventually ends up using every technology they can because of the one thing it does well.
This "one thing it does well" business is then presented as : "using the right tool for the right job" and it's difficult to argue against that because the counterpart can easily deride you as a fanatic of some technology, someone not objective enough, etc...
It is however interesting that we used relational databases for virtually everything for decades even though SQL is suboptimal at most things if we take them in isolation. Some will argue that people are now realizing their mistake, but the truth is these companies were successful and we were all getting our paychecks. (PS: I choose to use NoSQL for virtually all my projects)
The real driver shouldn't be the one thing it does well. Many times - if not most of the time - it's preferable to use a tool optimal for the most important parts and suboptimal for the rest. I personally prefer to provision two more instances, than to add two more technology stacks.
> It is however interesting that we used relational databases for virtually everything for decades even though SQL is suboptimal at most things
You have no clue what SQL or ACIDity is. For 99% of the cases SQL/RDBMS is the right choice. You probably think you belong in that 1%, but from your comment, I suspect you do not.
> I choose to use NoSQL for virtually all my projects
That's because you have no important data to store.
When you get to store data that are important to your customers you're gonna have a big revelation.
"NOSQL" doesn't mean "no ACID". There are plenty of NOSQL DBs that are ACID compliant.
And SQL is not the only way to write your queries. There are a lot more QLs.
Even though what you say is true, my comment is still correct and relevant to the OP.
Also, of the NoSQL DBs that support ACID, I wouldn't touch them for any serious work or primary data at least. None of them are battle-tested in the same way Postgres is for example.
And again, people who really need these type of DBs fall into the 1%, and I'm being very generous.
That's quite an attack. I trained for an Expert SQL certification from Microsoft back then, when I was writing 3,000+ line stored procedures to migrate an Access application at a Fortune 40 company. So I know what it is, and I know quite a good deal about RDBMS. I'm not among those who criticize what they don't know.
Regarding the gist of your comment on NoSQL, I haven't been able to convince people coming from where you are with two days of meetings in a row, so I'm fairly confident I'm not going to change your mind on HN.
If you have a clue, as you say, and still believe that it's a good idea to store critical data with NoSQL then I don't know what to say.
Obviously you don't care enough that almost every NoSQL solution out there has been found to make false claims about their guarantees. The billions that have been sunk into the black hole called NoSQL in the last decade are unprecedented.
You don't have to change my mind. I have (and still use) both. And I still maintain that people who use NoSQL for 99% of their projects are making the wrong choice.
I think that goes far beyond database choice. I think many people build SPAs that end up hurting the product over a traditional setup. I think many people use microservices where a monolith would have much better performance and reliability.
As for the basic premise
> that people who use NoSQL for 99% of their projects are making the wrong choice.
It is perhaps kinda right? Some people may really only touch giant data sets, so for them always using NoSQL is smart. The people that write webapps with 12 users? More questionable.
In most cases you can decide whether you need to leave an RDBMS with something like:
1) Do you need to store in the next year > 100GB of data that you need to access in realtime?
2) Do you need in the next year to store > 1TB of data that you need to access in semi-realtime?
3) Do you need in the next year to handle > 1000 writes per second?
4) Do you need in the next year to handle > 1000 reads per second?
Not a perfect guide, and I am sure you can think of edge cases that can still be dealt with in a RDBMS.. but it is a decent starting place. One tricky part is that if you are optimistic, almost any app can check off #3 or #4 (Like Uber but for Baby Strollers). Knowing how to realistically estimate demand for a possibly viral startup is hard.
Another one that I'd add is:
- "Are the records in each table in the hundred of millions? Then most probably you'll do fine with an RDBMS".
If you go above that, or you have operations that will push that number into the billions, then you can offload them into whatever non-RDBMS storage you want and do your thing. But that's the thing with RDBMS: you can always move (or offload part of) your data to a non-RDBMS solution afterwards.
But doing the inverse? I wouldn't want to be in that person's shoes ;)
Does row count matter that much compared to data size? I.e. if I have a billion rows but they are 2 32-bit ints, that isn't a lot of data (8 GB + index). I guess the index starts to get pretty big... but I always just think of raw data size vs. # of rows.
Remember, it's just a rule of thumb. Now... tables with 2 32-bit ints as columns are not exactly typical RDBMS data.
Also, data in RDBMS are... well relational :) Meaning, the rows of just one table are not that important. The data are going to be queried and combined with data from other tables. And I know that typical relational data that consist of hundreds of millions of entries in each table is something that most DBs can handle.
Because in the 2 decades I'm in the industry I see RDBMS make the world spin and NoSQL DBs destroying companies and families.
MongoDB and CouchDB eat data for breakfast, I know that from first-hand experience. And all the other DBs that claim they don't keep cropping up in Aphyr's blog.
I ain't saying that all NoSQL dbs are useless. I'm just saying that proposing and choosing an RDBMS solution is going to be the right choice for 99% of the projects.
Yes, most people think that they belong in that 1% where they have the infrastructure problems and big data of Google, FB and Twitter but.... they don't.
In the last 2 decades in the industry as well I've never lost data with MongoDB, Riak or Cassandra but have with Oracle, DB2 and PostgreSQL. After all databases are just software and there will always be bugs. Some people just get tripped up by different ones.
And you are woefully ignorant to think the RDBMS is the right choice for 99% of projects. Especially since you think that the 1% of remaining users are purely worried about scalability. Hint: think about the schema problems associated with storing auto generated features from deep learning models.
>In the last 2 decades in the industry as well I've never lost data with MongoDB, Riak or Cassandra but have with Oracle, DB2 and PostgreSQL
Yet every test proves otherwise. Also, use Google to see how people have lost data with MongoDB. Mongo is not considered a serious piece of technology by any scientist or engineer I know. Postgres though is universally considered an engineering marvel.
>Hint: think about the schema problems associated with storing auto generated features from deep learning models.
Hint: The problem you mentioned? Even less than 1%
Calling me ignorant doesn't change reality you know.
NoSQL DBs usually target distributed environments.
So... enter CAP theorem. There's no free lunch. People think we can simply throw away half a century's worth of science because JSON and schemaless are teh awesome derp derp.
Implementation is surely an issue, if you take into account that the mongodb guys had to acquire another company [1] in order to overcome their abysmal write performance. And yet there were people, and benchmarks that were trying to tell us that mongo was faster than RDBMS alternatives. All this circa 2009-2012.
You know what's faster than everything? Writing to /dev/null ;)
Anyways, depending on your use case there might be a NoSQL out there that might fill your needs, and it might actually deliver what it claims it can deliver. But it's hard to sift through all the ad-driven, buzzword-ridden infomercials that get thrown around by start-up companies in the DB domain.
Also, DBs are like filesystems; even if the math/science is correct, it needs at least a decade of proven track record before you can say that it works as advertised.
> NoSQL DBs usually target distributed environments. So... enter CAP theorem.
Surely FB is not running MySQL on a single machine. Perhaps I am misunderstanding what you are saying, but saying SQL DBs don't face the issues of distribution seems a little strange.
Distribution comes into the picture from the shape and size of the data, not the data saving/retrieval techniques, yeah?
FB and all big companies are a very bad example. They have a ton of resources and usually they don't use vanilla products, since they have the engineering capacity to support their own forked versions; e.g. see their own version of PHP.
Also distributing reads is easy, writes... not so much. NoSQL systems usually offer distributed writes with the caveat of eventual consistency. RDBMS have referential integrity and other constraints which by definition cannot migrate into a distributed environment. Or at least there's not a one size fits all solution.
> Distribution comes into the picture from the shape and size of the data, not the data saving/retrieval techniques, yeah?
Most definitely not. It has nothing to do with the shape and size of data. Also... there's no such thing as "distribution" in our context, only "distributed", from "distributed computing" [1], and it has everything to do with data saving and retrieval :)
>RDBMS have referential integrity and other constraints which by definition cannot migrate into a distributed environment.
So, use an RDBMS if your data can be handled by a single machine (or you have the resources of FB)? The "99% of people need RDBMS" argument boils down to: 99% of people have data that can be handled by a single-machine RDBMS.
The single machine shouldn't be the deciding factor.
If your application is like most apps (far more reads than writes) then you can easily distribute the load across multiple machines. If you have more writes than reads (quite rare, but still) then scaling an RDBMS will be challenging.
In this case, if eventual consistency is something you can live with, a NoSQL store might be best for you.
Like, what's gonna happen if they have a couple of corrupt records? A minor inconvenience at worst?
Is anyone gonna lose millions? Nah. Anyone gonna die? Nah. Anyone gonna get sued? Naaaaaah
Also, Facebook uses MySQL for their primary data. Pretty sure it's the same for eBay. Don't know about Adobe; I bet it's the same deal there too.
People get so excited when they hear about some big company using X, but they have no clue in what capacity it's used. I can guarantee you that all the data that matters, that needs to be consistent and whole, is in some kind of RDBMS.
MongoDB is used in Facebook for Parse, eBay for analytics and Adobe for Experience Manager.
All are pretty important parts of their business. In particular the latter, which, if there were data loss, would cause the biggest shockwave in the web community.
But no point discussing it with you since you think: Sony Playstation Network, Apple iCloud, Office 365 etc aren't important data to these companies.
Have you actually used Parse? Obviously not, because you wouldn't dare mention that POC in this discussion. Hint: search around about experiences.
There's no point discussing with me, because you can't have a coherent debate. Analytics data is neither critical nor primary. You really have to reread what I said.
what are some of the skills/experience needed to be an SRE?
I've been having a really hard time finding a job due to being a 'jack of all trades' and having no specialty. Just an assumption. I have over a decade of experience building webapps.
I've spent over 500 hrs on interviews over the past 3 months doing countless coding tests/exercises, whiteboard interviews.
I just seem to never get past on-site interviews.
> I have over a decade of experience building web apps.
Were you also running those apps? SRE means you understand the intricacies of running an app too.
When I hire SREs, I look for people who have the following skills, in this order:
1. Leadership under pressure. What I mean is: can you stay cool and calm, and keep everyone around you cool and calm, when everything is melting down?
2. Experience operating a platform. Do you know basics like networking, system startup, system and OS tuning, etc.? Can you diagnose a problem on a running instance?
3. Coding. Can you write decent code, and can you understand good code?
The reason it is in that order is because staying cool under pressure is something I can't really teach you, it's just sort of innate for the most part.
If they made it only as simple as it needs to be, then they couldn't patent very much. Investors want exclusivity, lots of convoluted tech-speak, big grants, etc. :P
Most major apps phone home for a big config object (likely JSON) at the start of a run. This would contain things like car icons, etc. You can see an example of this in the 3p Uber API which has a call to get which car types are available at a given geolocation. This API returns not only the vehicle types (Uber X, Uber Black, etc) but also a jpeg icon representing the car. In this way Uber can roll out new car types in locations without a client side update.
I'm biased for this next part (since I work on the product) but if you're interested in making your app have abilities like this check out Firebase Remote Config (https://firebase.google.com/docs/remote-config/). While setting up your own config service is not rocket science, having a free one with a web UI is pretty nice.
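A generic sketch of the "phone home for config" pattern; the endpoint and payload shape here are hypothetical, not Uber's or Firebase's actual API:

```python
# Client-side config fetch: the app asks the backend what to show for
# the current location. The URL and response fields are hypothetical.
import requests

def fetch_app_config(lat, lng):
    resp = requests.get(
        "https://api.example.com/v1/config",
        params={"latitude": lat, "longitude": lng},
        timeout=5)
    resp.raise_for_status()
    return resp.json()

config = fetch_app_config(37.77, -122.42)
# e.g. config["products"] -> [{"display_name": "uberX",
#                              "image": "https://.../uberx.jpg"}, ...]
# New car types appear in new cities with zero client-side changes.
```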
Thanks for your answer. I suppose the interesting code is on the app side rather than the server side (which basically returns a bunch of JSON).
How do you architect a view layer that's so malleable? For example, even the routes in Uber were shown in rainbow colors during Pride.
Last time I checked, their iOS app pinged the backend around every 10 seconds for a big ol' payload of JSON config etc., which contained most of the A/B stuff and UI config (which car types to display, e.g., since where you are they might not have all of X/taxi/Black/Lux etc).
Most likely just a simple alerts or announcements model that they fetch from the server and display in a pre-allocated section of the iOS view (whatever they call it in iOS), if any. It's actually really good thinking.
They likely already preset views and controllers to react to backend events. If they add an event like "pride" to the backend, the app just has to render the view associated with that event.
I don't think it's that simple. The distinction between "code" and "data" is somewhat arbitrary. I'm sure Uber could get away with a rules engine that supports the cases the parent comment is talking about.
Quite an intricate architecture. I can't help but wonder if all of the complexity and different moving parts are worth it. Does it really make more sense than throwing more resources at a monolithic web service? Clearly the folks at Uber think it does, and they've obviously thought about the problem more than me, but I'd love to understand the reasoning.
"We use Docker containers on Mesos to run our microservices with consistent configurations scalably, with help from Aurora for long-running services and cron jobs."
So much technology, yet I still had to load the site 3 times and fiddle with uMatrix to get the page to scroll. Now, lots of people do silly things with javascript, but on a blog article on your tech stack it doesn't speak well of things.
When I saw the story, it stood distinctly apart on the HN homepage as the only story title with ALL CAPS LOOK AT ME. It was definitely a HN culture faux-pas. Alone by itself this is not a serious indictment, but coming from a company with a reputation for arrogance it seemed to be in particularly poor taste.
The title of the story on HN has since been corrected to normal case.
I'm not an Uber hater, if anything I'm inclined to defend the company. But posting ALL CAPS to HN is either arrogance or carelessness or (most likely) some combination of both. I would not normally pay attention except this is a company which already has a reputation, so maybe it's actually part of the corporate culture? Or maybe I'm overthinking it.
I think you're massively overthinking it. The poster probably just copy pasted the headline directly from the blog when submitting the link. "All caps" is sorta the Uber aesthetic and looks totally normal in the context of the blog post itself.
Sometimes this happens because people use the HN bookmarklet to submit a post and the original article uses all-caps typography for its title (as this one does). But it's so rare for such cases to make the front page that I don't think we need to worry much about it. If it ever becomes a problem it shouldn't be hard to deal with.
All: it's fixed now, so please let's talk about the article rather than title mishaps.
For those of you complaining about the title being all caps, it was done so for aesthetic purposes. Which means somehow the submitter went through the time to uppercase each character of the HN title before submitting.
Sounds like a very solid foundation! I'm glad to see they have a sufficient system in place to continue spamming the heck out of people who never opted into their advertisements in the first place.
/sarcasm
I only wish LE would treat CAN-SPAM seriously and put more resources into criminal enforcement.
I just got rejected from them. I applied for a SE position, but they didn't like me, I guess. They send you this really condescending rejection letter. I showed them my programming language that I built in C from scratch, and also my data structure library where I implement all the common data structures found in high-level languages, built from scratch in C, among many other projects.
I have. It must have been my state school that turned them off. I know I could keep up there, but maybe they also turned me down because I'm 5 states away and they thought I wasn't worth the recruiter's time.
edit: downvoter, if you could provide your rationale that would be great.
When I first started interviewing, I got turned down at a lot of companies because they were concerned about my self taught background, often in spite of strong project work and interviews. In spite of so many rejections (2 offers after 17+ interviews), I've been wildly successful in my current job—I received a promotion in the first 6 months and have since held down tech lead roles.
Look, the bottom line is that companies optimize for false negatives. In order to achieve a high accuracy rate, tests must have exceptionally low false positive rates (https://www.math.hmc.edu/funfacts/ffiles/30002.6.shtml), just based on stats. I won't work out the math for you—you sound like you're perfectly capable of plugging numbers into Bayes' theorem—but that implies that even very good engineers are likely to get false-negative rejections at many companies. It does not mean, however, that those companies are necessarily judging you based on unfair criteria, and I don't think it's fair, thoughtful, or mature to indicate otherwise (especially because some of my strongest coworkers at Uber are from less prestigious schools in the Midwest...).
I've interviewed at Uber as well. The truth is, failing or passing an interview is a really bad indicator of how good a coder you are. My Uber interview was like any other interview. Don't take it personally. Keep calm and move on.