Lessons learned from two decades of Site Reliability Engineering
9 minutes by Adrienne Walcer, Kavita Guliani, Mikel Ward, Sunny Hsiao, and Vrai Stacey
Two decades ago, Google had a pair of small datacenters, each housing a few thousand servers, connected in a ring by a pair of 2.4G network links. We ran our private cloud using Python scripts such as "Assigner" and "Autoreplacer" and "Babysitter" which operated on config files full of individual server names. We had a small database of the machines which helped keep information about individual servers organized and durable. Our small team of engineers used scripts and configs to solve some common problems automatically, and to reduce the manual labor required to manage our little fleet of servers.
The story when Elon Musk decided to rip servers out of a data center
10 minutes by Mike Masnick
Back on Christmas Eve of last year there were some reports that Elon Musk was in the process of shutting down Twitter's Sacramento data center. In that article, a number of ex-Twitter employees were quoted about how much work it would be to do that cleanly.
Advice to a novice programmer
6 minutes by Mark Dominus
After you fix something significant, or add significant new functionality, make a checkpoint copy of the entire source code. This can be as simple as copying it all into a separate folder. Plus more advice from a father to his daughter.
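The checkpoint habit described above can be sketched in a few lines of Python. This is only an illustration, not something taken from the article: the function name `checkpoint`, the timestamped folder naming, and the idea of a dedicated checkpoints folder are all assumptions made for the example. Keep the checkpoints folder outside the source tree so that a copy never ends up containing earlier copies.

```python
import shutil
import time
from pathlib import Path


def checkpoint(source_dir: str, checkpoints_dir: str) -> Path:
    """Copy the whole source tree into a new timestamped folder.

    Hypothetical helper: the name and the folder layout are assumptions
    for this sketch, not part of the original advice.
    """
    source = Path(source_dir).resolve()
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(checkpoints_dir) / f"{source.name}-{stamp}"
    # copytree creates missing parent directories and raises
    # FileExistsError if the target folder already exists.
    shutil.copytree(source, target)
    return target
```

Calling `checkpoint("my_project", "../checkpoints")` after each significant fix leaves a folder per checkpoint that can be compared against, or copied back from, when a later change goes wrong.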
Executing Cron Scripts Reliably At Scale
6 minutes by Claire Adams
Cron scripts are responsible for critical Slack functionality. They ensure reminders execute on time, email notifications are sent, and databases are cleaned up, among other things. Over the years, both the number of cron scripts and the amount of data these scripts process have increased.
The Case of a Curious SQL Query
7 minutes by Justin Jaffray
SQL is a great example of a language built on very solid foundations: it comes from the idea that we should define an algebra for data retrieval, and then we can formally define how that algebra should behave, and then we can have a common tongue between humans who want to query databases and databases who want to execute CPU instructions.