How I got a job at Google as an SRE
Introduction
I want to describe my hiring path for the people who know me, and it may also be interesting for those who are trying to get hired as an SRE (Site Reliability Engineer).
The hiring process lowers the uncertainty
There is a nice article about SRE hiring (https://www.usenix.org/system/files/login/articles/login_june_07_jones.pdf). It is nice because it contains a beautiful graph, and engineers love graphs:
In my opinion the article is interesting because it gives good insight into how the hiring process is performed and why it can fail even if you are a good candidate. But I think it is a bit scary, especially because of this sentence:
“Google’s practice is to hire candidates we believe to be better than our average current employee”
Another interesting point (from an engineer's point of view) is that the hiring process is meant to place the candidate on the graph above. The more interviews you do, the higher the confidence in the score. In other words, the hiring process is about lowering the uncertainty around the real score of a candidate.
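To make the idea concrete, here is a minimal sketch of how uncertainty shrinks as interviews accumulate. It assumes that each interview is an independent, equally noisy measurement of the same underlying score, which is my own simplification and not a model taken from the article. The function name and the noise value are hypothetical.

```python
import math


def standard_error(per_interview_sd: float, interviews: int) -> float:
    """Uncertainty of the averaged score after a number of interviews.

    Assumption: interviews are independent and share the same noise
    (standard deviation), so the error of the mean is sd / sqrt(n).
    """
    if interviews < 1:
        raise ValueError("at least one interview is required")
    return per_interview_sd / math.sqrt(interviews)


if __name__ == "__main__":
    # Hypothetical noise of a single interview, in arbitrary score units.
    noise = 1.0
    for n in (1, 2, 5, 7):
        print(f"{n} interview(s): uncertainty ~ {standard_error(noise, n):.2f}")
```

Under this assumption the uncertainty falls with the square root of the number of interviews, so each extra interview helps, but less than the previous one.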
Why SRE?
I worked as a Unix System Administrator for 5 years. During this period I noticed a big change in the approach to services. The causes were the rapid pace of software change and the constant demand for new resources and servers. Therefore, we were slowly pushed to manage cattle instead of pets:
While my mindset was changing, I was reading the first SRE book by Google. While reading it I found a lot of similarities with my work, and scenarios where an SRE approach could reduce uncertainty and misunderstandings. For example, the Service Level Objectives (SLOs) were often not defined, whereas in the SRE strategy defining them is the first thing to do with a service. This creates conflicts between the users of the service and those who take care of it: they have different ideas of what “it works” means.
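As an illustration of how an SLO turns “it works” into something both sides can check, here is a minimal sketch. It assumes an availability SLO measured as the fraction of successful requests; the function names, the 99.9% target and the request counts are hypothetical and not taken from the book.

```python
def meets_slo(good_requests: int, total_requests: int, target: float) -> bool:
    """Return True if the measured success ratio reaches the SLO target."""
    if total_requests == 0:
        # Assumption: with no traffic there is nothing to violate.
        return True
    return good_requests / total_requests >= target


def error_budget_remaining(good_requests: int, total_requests: int,
                           target: float) -> float:
    """Failures still allowed before the SLO is missed (negative if overspent)."""
    allowed_failures = (1 - target) * total_requests
    actual_failures = total_requests - good_requests
    return allowed_failures - actual_failures


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 requests, 5 failures, 99.9% target.
    print(meets_slo(9_995, 10_000, 0.999))
    print(round(error_budget_remaining(9_995, 10_000, 0.999), 2))
```

With a number like this agreed in advance, the user and the service owner argue about a measurement instead of about impressions.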
“Does the SRE role fit me?”
The book wasn’t the only source that I used to understand the SRE role. As the book describes it, the role seems too good to be true. Therefore I looked for someone who could tell me about the experience first-hand.
I was lucky, because a former colleague was working for Google as an SRE at that time (and still is), and I noticed her post on LinkedIn. I started a conversation to get to know the role better, how to get it and what her opinion about it was. She was a treasure trove of valuable information, and we had more than one chat.
But my exploration wasn’t just browsing the LinkedIn feed. I normally attend tech events if I can, and in that period I preferred events where SREs gave talks, and I got in touch with some of them. Here too I wanted to clarify several aspects of the job.
In the end I found kind people, so it was impossible not to get along with them and add them to my circle of friends. The SRE role doesn’t exist only at Google, but I think that since they created it, Google is the place where it is applied best. And these friends convinced me of that.
In this section I wanted to emphasize that you should know what kind of job it is before applying. In my opinion some people just think “it’s Google, so it’s OK”. Maybe this explains the low average tenure of 3.2 years (https://www.businessinsider.com/average-employ).
How I prepared
In the long term
Before applying for the Google SRE position, I clarified with my girlfriend what we wanted from our lives. We are in our thirties and society expects that it is time to make some choices: buying a house, having a child… We concluded that we wanted to develop our careers. For me, career progress meant becoming a Site Reliability Engineer, probably in a foreign country. Therefore I started to target interesting companies and attended an English course twice a week for 7 months in order to obtain the B2 certificate.
In the short term
I am an organized guy: the kind who, before traveling, has already checked the whole itinerary, where to sleep and what to see. For each objective that I want to achieve (a holiday, a big party, a home project...) I make a plan.
So I organized myself with a plan, but before making it I had to gather information on how to prepare. Here my recruiter helped by sending some useful links to study. Besides that, I spent almost a day searching the web for other resources.
My fear from the beginning was twofold: spending too much time on one specific topic and neglecting the others, or jumping from one topic to another trying to cover everything. For this reason I preferred to create a daily plan (actually only for days off, like weekends) where each day I focused on just one or two topics.
The useful links that I collected are divided by interview topic at the end of the article, in the Appendix.
Mock Interviews
One of the most important things to do during the preparation is to simulate an interview with an interviewer. A friend can help with this. In my case I was lucky because the people I had got in touch with were occasionally interviewers. We set up some online interviews that worked like real ones. The interviews were about coding, so in front of a Google Document I tried to develop an algorithm in Python (the language that I chose).
In my opinion the mock interview has two main benefits.
Firstly, you can feel stress similar to that of a real interview. It is very important to manage stress: it can undermine your performance and block you. Learning what effects stress has on the body (accelerated heartbeat, sweating...) can help you manage it. On this subject, I suggest this talk by Kelly McGonigal.
Secondly, every interviewer has a different approach. That forces you not to take anything for granted; it is better to say your thoughts out loud.
Besides the mock interviews, I did a lot of coding exercises with a timer set to 45 minutes. It is important to be aware of the time and use it in the best possible way. For example, don’t dive into an easy brute-force solution; take the time to develop a better algorithm.
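To show what I mean by brute force versus a better algorithm, here is a small sketch on a classic practice exercise: checking whether two numbers in a list add up to a target. It is only an illustration chosen for this article, not a question from my interviews, and the function names are my own.

```python
def has_pair_brute_force(numbers: list[int], target: int) -> bool:
    """Check every pair of distinct positions: simple, but quadratic time."""
    for i in range(len(numbers)):
        for j in range(i + 1, len(numbers)):
            if numbers[i] + numbers[j] == target:
                return True
    return False


def has_pair_with_set(numbers: list[int], target: int) -> bool:
    """Single pass, remembering the values seen so far: linear time on average."""
    seen: set[int] = set()
    for number in numbers:
        # If the complement was seen earlier, the two values form a pair.
        if target - number in seen:
            return True
        seen.add(number)
    return False


if __name__ == "__main__":
    sample = [3, 8, 1, 6]
    print(has_pair_brute_force(sample, 9), has_pair_with_set(sample, 9))
```

Both versions give the same answers; the second trades a little extra memory for far fewer comparisons on long lists, which is the kind of trade-off worth saying out loud during an interview.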
I didn’t only practice the technical part: for Googliness and leadership I recorded my own voice and listened to it in order to improve my clarity and fluency.
The whole process: 1+5+1 interviews
I decided to apply as an SRE Systems Engineer, which means that I am more focused on Unix systems than on coding (see the graph above, where Unix systems = System Engineering skills and coding = Software Engineering skills). This was an advantage because I could use my experience as a Unix System Administrator.
I started by getting to know my recruiter. My former colleague had referred me for the job (thanks!), so I skipped the initial selection step. The first chat covered a basic screening (education, former jobs...) and what kind of job SRE is. Finally, he asked some questions to understand whether I was ready for the first interview. I found those questions quite difficult because I was not prepared, and this warned me about the level of knowledge I should have. It was January, and by then I had already started studying coding and had done two mock interviews, both about coding (my weak point).
My first interview was “by phone”, actually over Hangouts. The two topics covered were Unix System Internals and Coding. It lasted 1 hour: half an hour of questions on how a Linux machine works under the hood, and the other half implementing an algorithm to perform operations on files. As you can understand, I cannot reveal the actual questions. What I appreciated about the coding exercise was its practical meaning: it was not an abstract algorithm.
After one week I had the response: I had passed. One thing that sounded strange to me was being asked for feedback about my interview. I would understand later that there is a lot of mutual feedback in the interview process. They also gave me feedback: my system knowledge was good, while I could improve the coding part.
After this first success I felt encouraged. OK, the path was still long, but it was a first step and I had a fair idea of what they were expecting. In my opinion, from the candidate's point of view, the hard part is understanding what the company is looking for. Unfortunately, the time in an interview is very short to show your real potential, so you have to select what is necessary.
For the next step, the on-site interview, I asked for 6 weeks to study. It may seem a lot, but if you are working, it amounts to a few days carved out of the weekends. The on-site interview was supposed to take place in Dublin, the office I was applying to. It consisted of 5 interviews, with some breaks in between, covering the following topics:
- Unix System Internals: how a Linux box works, from the system calls to the processes.
- Coding: the ability to implement an algorithm to achieve some result. As in the other interviews, but especially here, it is important to say your thoughts out loud. If the interviewer knows what you are thinking, they can help you during the interview. Some questions are deliberately incomplete: make assumptions, but state them. The interview is done on Google Documents, so no spell-check.
- Troubleshooting: my favorite. You are presented with a scenario in which something is not working. Using only words, by questioning your interviewer, you have to understand what is wrong with the system.
- Googliness and Leadership: more or less the classic interview about how you manage projects and your time, and how you interact with colleagues. It is meant to assess the “human” part of you.
- NALSD (Non-Abstract Large System Design): you find few examples of it on the internet. It is not like a general systems design interview. There is a chapter in the second book, but it goes beyond what they can ask you in an interview. Here again, the most important thing is the way you think. You have to work out whether the system is feasible. I like the suggestion from Danrl, who says to produce a Bill of Materials (BOM) of the servers. It implies some questions about bottlenecks.
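To give an idea of the back-of-the-envelope reasoning behind a Bill of Materials, here is a rough sketch. It is my own interpretation, not Danrl's exact method and not an interview question: every number, resource name and the `servers_needed` helper are hypothetical. The idea is to size the fleet for each resource separately and see which one demands the most machines, because that resource is the bottleneck.

```python
import math


def servers_needed(demand: dict[str, float],
                   per_server: dict[str, float]) -> dict[str, int]:
    """Servers required to satisfy each resource on its own, rounded up."""
    return {
        resource: math.ceil(demand[resource] / per_server[resource])
        for resource in demand
    }


if __name__ == "__main__":
    # Hypothetical requirements of the whole system.
    demand = {"storage_tb": 500, "network_gbps": 40, "disk_iops": 200_000}
    # Hypothetical capacity of one server.
    per_server = {"storage_tb": 48, "network_gbps": 10, "disk_iops": 5_000}

    counts = servers_needed(demand, per_server)
    # The resource that needs the most servers drives the size of the fleet.
    bottleneck = max(counts, key=counts.get)
    print(counts)
    print(f"bottleneck: {bottleneck}, servers to buy: {counts[bottleneck]}")
```

In this made-up case disk IOPS, not storage space, decides the number of servers, which is exactly the kind of observation that leads to the follow-up questions about bottlenecks.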
I was supposed to have the on-site in Dublin, but it didn’t happen. The Coronavirus was spreading in those days and Italy (my country) was preparing to lock down all its citizens. In agreement with the recruiting office we postponed the interviews and rearranged them over Hangouts a few weeks later. As with the on-site format, I had 5 Hangouts interviews on the same day. Oddly, I slept well the night before; normally I feel the stress.
It was very difficult to judge my performance because a lot of questions were open-ended. I felt the coding had not gone too well, because later I tried the same question again and came up with a better solution.
After 3 weeks I had a response from the hiring committee, or rather, a request for another coding interview. It was explained that I had not seemed confident enough in the coding part and they wanted a follow-up interview. For me it was the last chance. We arranged a new Hangouts meeting 4 weeks later. In the meantime I focused on practicing Python every day.
This last coding interview was centered more on code style. It was quite abstract and the interviewer seemed interested in the clarity of the code. This time I hardly slept at all the night before because I was so nervous. I had enough lucidity to do the interview and then I fell asleep.
After 10 days I had the offer! It was unbelievable. I was so happy, all my hard work paid off in the end.
It came as an email from my recruiter in the form of a summary. Before scheduling the on-site interview I had been asked what I expected as an offer, and this offer matched my proposal.
Conclusion
The hiring process lasted 5 months. That is not so long if you take into account that you do it at your own pace, and that there is a certain complexity behind the scenes.
From this adventure I learned that perseverance is important if you want to reach a goal. I think this result came from holding on. It was difficult to give up little happy moments with my girlfriend or friends in order to study. But in the end I made up for the lost time.
By the way, I also think I was lucky, because it is normal not to pass at the first attempt. But that probably depends on gaps in the knowledge you are being asked about.
I hope you enjoyed this adventure. Cheers!
Appendix
Useful links for studying.
NALSD
- http://www.aosabook.org/en/distsys.html
- http://highscalability.com/numbers-everyone-should-know
- https://danrl.com/blog/2019/path-to-srm-nalsd/
- https://danrl.com/blog/2018/srecon18asia-day-3/
- https://storage.googleapis.com/pub-tools-public-publication-data/pdf/9b0aa90de33d2a5f6a5575f71e772f74c0f4b945.pdf
- http://highscalability.com/blog/2011/1/26/google-pro-tip-use-back-of-the-envelope-calculations-to-choo.html
- https://landing.google.com/sre/workbook/chapters/non-abstract-design/
- https://www.youtube.com/watch?v=modXC5IWTJI
- https://youtu.be/Gg318hR5JY0
Troubleshooting
- https://danrl.com/blog/2019/path-to-srm-troubleshooting-unix/
- https://tanelpoder.com/2013/02/21/peeking-into-linux-kernel-land-using-proc-filesystem-for-quickndirty-troubleshooting/
- https://groups.google.com/forum/?fromgroups=#!msg/google-appengine/6SN_x7CqffU/ecHIgNnelboJ
- https://www.youtube.com/watch?v=_8cH-QPVXsw&list=PLllx_3tLoo4c_aR8RKOOnizL5LiUH02YF&index=9
Unix system internals
- https://landley.net/writing/memory-faq.txt
- https://eli.thegreenplace.net/2018/measuring-context-switching-and-memory-overheads-for-linux-threads/
- The book “The Linux Programming Interface: A Linux and UNIX System Programming Handbook”
Googliness
- https://www.betterteam.com/situational-interview-questions
- https://www.betterteam.com/behavioral-interview-questions
- https://danrl.com/blog/2019/path-to-srm-management-leadership/
Coding
- The book “Cracking the coding interview”
- https://blog.tecladocode.com/30-days-of-python/
As you can see, the blog danrl.com recurs often. I have to thank him for his valuable resources: he described what is expected of the candidate for each topic.