Providing production support for an application is one of the most difficult aspects of software development. Developers are assigned to the maintenance team and work on troubleshooting the application. However, they are also on call in case of a production outage, and when one occurs, their job is to get the application running again as quickly as possible.
This article contains a number of curated recommendations to help you avoid mistakes in production and find problems much faster. Supporting applications in production is a complicated task: often no documentation is available, the application was written in a legacy technology stack, or both. There is very little training, and it is common to be called on to support an application you know little about.
Many developers have no experience running an application in production. A number of issues arise in production environments that cause errors and failures, and these can cost the company thousands and sometimes millions of dollars in lost revenue. Since most developers are never exposed to that environment, they make mistakes that in turn cause these problems. This list of tips should make your work less painful by passing on lessons from production experience.
Tip 1: Remove or automate any configurations that are required to run the application
How much configuration is required to install the software on a new server? In the past, this could take as long as three days whenever a new developer joined the team. Installing the application required many steps that had to be done manually. Over time, the software evolves into new versions that are no longer compatible with these instructions, and of course the instructions are usually not updated. Suddenly you spend far more time than necessary getting the application up and running.
Containerization has made it much easier to get an application up and running quickly, without manual configuration. An added benefit is that, since a Docker image is self-contained, you run a much lower risk of encountering problems caused by different versions of the operating system, languages and frameworks.
Simplify developer setup as well, so that onboarding, including IDE setup, doesn’t take much time. A developer should be able to go from zero to hero in less than 30 minutes.
When a production problem arises, your best experts are sometimes unavailable (e.g. on vacation or ill), and you want whoever deals with the problem to be able to solve it quickly.
Tip 2: Don’t fall into the tech stack soup trap
The fewer technologies used, the better. Of course, sometimes you have to use the right tool for the job. However, be careful not to overdo the “right tools”. Even drinking water can cause serious health problems if you drink too much. Every new language and every new framework added to the tech stack should go through a clearly defined decision-making process that carefully weighs the effects.
- Don’t add a new framework dependency just because you need one small utility from it.
- Don’t add a completely new language just because you need to write a quick script to move files.
A large pile of dependencies can make your life miserable when libraries become incompatible or when security vulnerabilities are identified, either in the frameworks themselves or in their transitive dependencies.
Also keep in mind that additional stack complexity makes it harder to find and train new developers for the team. People leave for roles at other companies and you need to replace them. Turnover in engineering teams is very high, even at companies known for their excellent benefits and work-life balance. You want to find the new team member as soon as possible. Any new technology added to the stack extends the time it takes to find a suitable candidate and has the potential to make new hires more expensive.
Tip 3: Logging must help you find the problem, not drown you in useless details
Logging is very similar to commenting. You need to record all critical decisions as well as any information you will rely on when debugging. It is not easy, but with a little experience you can map out some likely production failure scenarios and add the logging needed to at least diagnose those. Of course, logging evolves along with the code base, depending on what problems are encountered. In general, you should have 80% of your logging in the most important 20% of your code, the part that is used most often. Important information includes the values of arguments passed to a method, the runtime types of subclasses, and important decisions made by the software, that is, the moments when it stood at a crossroads and chose either left or right.
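As a minimal sketch of that advice, the following Python example logs argument values, the runtime type of a subclass, and each branch decision. The Order and RushOrder classes, the choose_shipping function, and the threshold value are hypothetical names invented for illustration, not part of any real application.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


class Order:
    def __init__(self, order_id, total):
        self.order_id = order_id
        self.total = total


class RushOrder(Order):
    """Hypothetical subclass, used to show logging of the runtime type."""


def choose_shipping(order, free_shipping_threshold=100.0):
    # Log the argument values and the runtime type of the order.
    logger.info(
        "choose_shipping: order_id=%s total=%.2f type=%s threshold=%.2f",
        order.order_id, order.total, type(order).__name__, free_shipping_threshold,
    )
    # Log every decision point: which branch was taken and why.
    if isinstance(order, RushOrder):
        logger.info("order %s: rush order, using express shipping", order.order_id)
        return "express"
    if order.total >= free_shipping_threshold:
        logger.info("order %s: total reaches threshold, shipping is free", order.order_id)
        return "free"
    logger.info("order %s: total below threshold, using standard shipping", order.order_id)
    return "standard"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    choose_shipping(RushOrder(42, 19.99))
```

With log lines like these, someone reading the production log can reconstruct which path a given order took without attaching a debugger.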
Tip 4: Handle unexpected situations
Make the assumptions of the code very clear. If a particular variable should always have the values 2, 5, or 7, make sure that it is an enumeration type, not an int. Production problems often occur when a certain assumption fails: everyone looks for the problem in the wrong place because they take some things for granted.
Assumptions should be documented explicitly, and any violation of these assumptions should raise enough alarms that the production support team can quickly correct the situation. There should also be code to prevent data from becoming invalid, or at least to raise some kind of warning when it does. If a piece of information is supposed to be stored in a single record and suddenly two records exist, a warning should be triggered.
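The enumeration idea could look roughly like this in Python. The Priority name and its member names are assumptions made for this sketch; only the values 2, 5, and 7 come from the text.

```python
from enum import IntEnum


class Priority(IntEnum):  # hypothetical name; 2, 5, 7 are the values from the text
    LOW = 2
    MEDIUM = 5
    HIGH = 7


def parse_priority(raw):
    """Convert a raw value into a Priority, failing loudly on anything else."""
    try:
        return Priority(raw)
    except ValueError:
        allowed = [p.value for p in Priority]
        raise ValueError(
            f"unexpected priority {raw!r}; expected one of {allowed}"
        ) from None
```

Converting at the boundary means an invalid value such as 3 is rejected where it enters the system, rather than surfacing much later in unrelated code.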
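One possible way to turn the "exactly one record" assumption into an explicit check is sketched below. The single_record helper and the logger name are hypothetical; a real application would route the warning to whatever alerting the support team actually watches.

```python
import logging

logger = logging.getLogger("data_integrity")  # hypothetical logger name


def single_record(records, description):
    """Return the only record, warning loudly if the 'exactly one' assumption fails."""
    if not records:
        raise LookupError(f"assumption violated: no record found for {description}")
    if len(records) > 1:
        # Processing can continue, but the support team needs to hear about this.
        logger.warning(
            "assumption violated: expected 1 record for %s, found %d; using the first",
            description, len(records),
        )
    return records[0]
```

A call such as single_record(rows, "billing address of customer 17") both documents the assumption in the code and leaves a trace in the log the moment it stops holding.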
Tip 5: It should be easy to replicate a customer’s problem
One of the most difficult steps is always reproducing the customer’s problem. Often, you spend 95% of the time replicating the problem. Once you can replicate it, patching, testing and deploying is a matter of minutes. Therefore, the application architect should ensure that problem replication is extremely easy and fast. Much of the time is lost because the developer has to configure a significant amount of the application to get into the same state the customer is in. That state is usually the combination of many stored records. The problem is that, as a developer, you have to guess exactly what the customer did. And sometimes they have carried out a series of steps, of which they only remember the last one.
The customer also explains the problem in business terms, which the developer must then translate into technical terms. And if the developer has little experience with the application, they cannot ask for the missing details because they do not even know what is missing. It is usually not possible to copy the entire production database to your machine. So there should be a tool that quickly imports only the few records required to simulate the situation from the production database.
Suppose the customer has a problem with the order screen. You may need to import some of their orders, their customer record, some order detail records, order configuration records, etc. You can then load them into a database running in a Docker container, start that container, and see the same thing the customer sees. All of this, of course, should be done with care to ensure that no developer gains access to sensitive data.
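Such an import tool might be sketched as follows, using SQLite from the Python standard library in place of a real production database. The table layout, the column names, and the choice of which columns count as sensitive are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical two-table schema, shared by the source and the debugging copy.
SCHEMA = """
CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT, email TEXT, phone TEXT);
CREATE TABLE IF NOT EXISTS orders (id INTEGER, customer_id INTEGER, total REAL);
"""

MASK = "***"  # placeholder written in place of sensitive values


def copy_customer_slice(source, target, customer_id):
    """Copy one customer and their orders into target, masking email and phone."""
    target.executescript(SCHEMA)
    customer = source.execute(
        "SELECT id, name, email, phone FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    if customer is None:
        raise LookupError(f"no customer with id {customer_id}")
    cid, name, _email, _phone = customer
    # Sensitive columns never leave the source database.
    target.execute("INSERT INTO customers VALUES (?, ?, ?, ?)", (cid, name, MASK, MASK))
    orders = source.execute(
        "SELECT id, customer_id, total FROM orders WHERE customer_id = ?", (customer_id,)
    ).fetchall()
    target.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    target.commit()
    return len(orders)  # number of order rows copied
```

The target connection here would point at the database inside the developer's container, so only the masked slice for one customer ever reaches a development machine.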
Tip 6: It should be obvious where the breakpoints should be placed in the application
If you have a customer screen, there should be a customer object where you can place breakpoints to debug a problem on that screen. Sometimes developers catch abstraction fever and develop incredibly clever concepts for dealing with events in the user interface. Instead, we should always rely on the KISS principle (Keep It Simple, Stupid) and have one easy-to-locate method per UI event. Likewise, for batch jobs and scheduled tasks, it should be easy to see where to place breakpoints to assess whether the code works or not.
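To illustrate what "one easy-to-locate method per UI event" could mean, here is a small framework-free sketch. The CustomerScreen class and its method names are hypothetical and stand in for whatever UI toolkit the application actually uses.

```python
class CustomerScreen:
    """Hypothetical screen class with one plainly named method per UI event."""

    def __init__(self):
        self.customer = {"name": "", "saved": False}

    def on_name_changed(self, new_name):
        # Put a breakpoint here to debug anything about editing the name field.
        self.customer["name"] = new_name.strip()
        self.customer["saved"] = False

    def on_save_clicked(self):
        # Put a breakpoint here to debug the Save button.
        if not self.customer["name"]:
            raise ValueError("customer name is required")
        self.customer["saved"] = True
        return self.customer
```

Someone who has never seen the code base can search for the screen name and the event name and land directly on the method to inspect.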
Tip 7: Make sure that all external dependencies are explicitly documented
Ideally, you do this in the README file within the source code management system so that the documentation is not lost. Document any external systems, databases, or resources that must be available for the application to run properly. Also note which of these are optional, and add instructions for what to do when an optional one is unavailable.
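As a complement to the README, the same list could also be kept in code so it can be checked at startup. The sketch below is one possible shape for that, not a prescribed approach; the Dependency class, its fields, and the check_dependencies function are hypothetical, and the availability probes are supplied by the application.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Dependency:
    name: str
    required: bool
    is_available: Callable[[], bool]  # probe supplied by the application
    fallback_note: str = ""           # what to expect when an optional one is down


def check_dependencies(dependencies):
    """Return (blocking, degraded) messages for unavailable dependencies."""
    blocking, degraded = [], []
    for dep in dependencies:
        if dep.is_available():
            continue
        if dep.required:
            blocking.append(f"{dep.name}: required, application cannot run")
        else:
            degraded.append(f"{dep.name}: optional, {dep.fallback_note}")
    return blocking, degraded
```

Logging both lists at startup gives the person on call an immediate picture of which external systems are missing and what the consequences are.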
Beyond debugging techniques
Once these recommendations are followed while new features are being created or a system is being maintained, production support will be much easier and your company will spend a lot less time (and money). As you already know, time is crucial when troubleshooting production errors and crashes. Every minute that can be saved makes a big difference. Have fun coding!
The original piece is by Flavio Pezzini and was published on the Toptal engineering blog.