Airbnb Senior Program Yuan: Gossip Database

Lei Feng Network (search "Lei Feng Network" public concern) Note: The author Zhu Hao, Silicon Valley Airbnb senior beauty program Yuan.

Topic: Mark Bird

The previous two days Uber post had an article saying that they had switched from Postgres SQL to MySQL. The blog is well written and popular science basic knowledge of DB indexing and replication. At that time, it was transferred to a friend, and friends said that in fact, Uber only published an article in 2013 that said they transferred from MySQL to Postgres SQL. Oh, look for it. After reading this question, in fact, the circumstances behind the two articles and the reasons behind the two articles in transition can also be guessed.

As two mainstream open source databases, the "war" of MySQL and Postgres has never stopped, although the smoke is not as strong as the language of the programming language. You can go to Quora or Stack Overflow to search for related MySQL vs Postgres posts. My feeling is that each has its own advantages and practical scenarios. There is no overwhelming advantage over the other.

For most programmers, which DB the company uses, it's basically not your turn to make a decision. When you join a company, unless it is a startup company, or if you are a CTO, VP, or Director, most of the technical options should already be settled. In particular, once the DB is selected, the cost of the migration is great. Therefore, unless there is a subversive advantage or problem, very few companies will take time and effort to make such a large migration. Regardless of the selection or transformation , one factor that cannot be ignored is which technology the engineer you hired is more able to control, or which technology the decision-makers who have the right to speak tend to favor. This is actually similar to the selection of a programming language.

In fact, Uber has undergone two high-profile transitions. Similar events have also occurred at Square. Although Square was the first to use MySQL, in about 2012, due to various advantages of Postgres (such as support for geospatial data and search, etc.), and the encouragement of several senior engineers at the time, many new services tried. Has opted for using Postgres. So the company's architecture was MySQL and Postgres coexist. For me at that time, it was just an opportunity to learn, understand, and compare two different technologies. Both have their own characteristics, some things seem to be more convenient in MySQL, and others are the other way around. There are always plans to achieve, and do not think that you have to choose which one.

If DB is safe, it is sunny.

A company, if the DB never fails. It must be because there is no business or traffic. All technologies have their own application scenarios. In addition to some happy cases, there must be pits. Being able to avoid these pits as much as possible, or being able to fix them as soon as possible, is a crucial factor. During the coexistence of Square's two types of DBs, there were few people in Postgres. However, MySQL's experts have several extremely reliable ones. Most of the engineers, not database experts, have the comparative advantage of Postgres and MySQL. For us, it's more important than having someone solve doubts or save fires. Moreover, it is a burden for a company to maintain two sets of similar database systems. As a result, these services using Postgres have slowly turned to MySQL.

Because we need to pay for the transactional support, we will not use the NoSQL type, mainly MySQL or Postgres . Although the database-related knowledge and technology, because of the use of a lot more work, and slowly understand a lot, but if you really have a problem online, he is still not very smooth. Fortunately, every company will have some big databases, and some companies are called DBAs. Many small and medium-sized companies do not have full-time DBAs and are supervised by system administrators. There are a few personal relationships. In addition, you often need to ask questions related to your own system, so you come to know better things.

I always admire the database Daniel. As long as the company is a little scaled, the database is not doing well and the basics are not fun either. This piece does not have a problem. If something goes wrong, the basic problem is to see the blood seal throat. The website is directly linked to the problem. So what are the most common problems?

The first is the selection.

Each company's business is different, and the application scenarios of database systems are different. Which one is most appropriate is not the same. No system is necessarily the best. For example, to do payment must be strong transactional, consistent support, and many social platforms actually need high availability; some business writes special heavy, some business more important is reads; some businesses can only care about the recent few Days of data, therefore, can trade off the inefficiency of reading and writing old data, and some have to access historical data frequently; some businesses can solve query efficiency by adding index, and some can only be added through caching and so on. . This is why many companies have multiple database systems coexisting to optimize support for each scenario and business.

The selection was wrong, basically fell in the pit, and did not frequently step on the pit.

The other is related architecture.

What does that mean? This includes the design of the upper cache system of the database, the processing of DB connection by the program language, the function of the proxy layer (if any), and the construction of the data pipeline related to binlog. Of course, it also includes the specific design of partitioning and backup of the database system. Many companies' early tables are all in one DB. Because of the limitations of various connection pools and throughput, this is basically impossible to scale. Can reasonably arrange the separation of different tables, so that data-related stay together, unrelated or not relevant in another DB. Similar to these very simple truths, many times can alleviate the problem of scalability to a large extent.

The most common problem that we encountered in the past was human mistake.

No matter how good the system is, using a wrong posture is also a loss. Not to mention that not all engineers are database experts.

Human errors fall into two categories.

One is an error made in DB operation.

This probability is relatively low, but it is usually the greatest harm. Almost all companies will have similar legends. Common versions include:

Some engineer has either inadvertently or intentionally deleted all the data in a database core table by accident. It's not a joke. This happened to Facebook and it was a friend. Fortunately, it recovered later. This matter also became a glorious achievement in the history of his engineers.

When an engineer made an online schema change, he accidentally misstepped. The resulting database is locked for several hours. The companyâ€™s website also hangs for hours.

The last version of this version was heard by a friend of a large company in China. The details are really not remembered. Just remember when the two servers, master slave switch or something, pull the wrong power plug, and then ... there is no then.

The other is an error made in a programmer's program or in a script.

This is very common. Give a very simple example. We know that Ruby on Rails access to the database is basically done through Active Record. Active Record can limit the connection from application to DB through a connection pool. If the query in a program or script is a query without index data, which results in a full table scan, plus some parallel implementation of the web server, it is very often that all connections to the entire DB are occupied, even the kill query Can not be implemented. Only human beings can do some similar restart operations. More common is the N+1 query in the program. These are very silent, but if you have never met before, it may be... Hehe.

The last kind is the database access bottleneck caused by natural traffic growth or sudden increase in traffic.

As long as it is a database, there is a limit to throughput. As long as your business is growing, database access will one day reach a ceiling. Therefore, before this warning comes, we must do a variety of horizontal or vertical scales to continuously increase this limit. Or the traffic is offloaded by other mechanisms such as caching. There are more things to do here. I think I can write a separate article.

The other is a sudden increase in various types of traffic, such as DDoS, Marketing, etc. If it is a planned marketing, you need to prepare for various battles in advance. If it is a malicious attack, it can only rely on various defense projects (such as IP blocking, etc.) to block these visits to ensure that the database is working properly.

Finally, recommend a blog site written by DBA. The content is very exciting and there are technical feelings. The article was written much better than me.

Lei Fengwang Note: This article is authored by the author authorized Lei Feng network, reproduced, please contact the authorize and retain the source and the author, not to delete the content.