Almost everyone knows that working with third-party APIs can be challenging. Sometimes the errors happen unexpectedly. Sometimes the error information that you receive is inaccurate. While most people feel these pains acutely, I’d like to share how we answer these challenges at FireHydrant and how it’s helped us avoid headaches and stress.
You need to invest in having robust CI/CD infrastructure early because it’ll continue to pay compounding dividends for your entire engineering organization for the lifetime of the software.
A story of an error message that didn’t make sense
I’m going to start off with a bit of a story. In our code, we send a payload to a third-party API, and sometimes that payload can be large. In this case, we received this error message: “Payload Too Large > 10KB”. My initial thought was, “oh, this will be an easy one to figure out!” However, when I looked at the source code where the error was happening, I saw that we were already skipping the request if the payload was over 10KB: we compared the payload size against a limit of 10,240 bytes. Even so, we still saw the error.
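To make the check concrete, here is a minimal Python sketch of the kind of guard described above. It is not FireHydrant’s code: the names `MAX_PAYLOAD_BYTES`, `encoded_size`, and `should_send` are hypothetical, and measuring the payload as UTF-8 encoded JSON is an assumption. The only detail taken from the story is the 10,240-byte limit.

```python
import json

# Limit from the story: "10KB" read as 10 * 1,024 bytes.
MAX_PAYLOAD_BYTES = 10 * 1024  # 10,240


def encoded_size(payload: dict) -> int:
    """Return the payload size in bytes once serialized (assumed: UTF-8 JSON)."""
    return len(json.dumps(payload).encode("utf-8"))


def should_send(payload: dict) -> bool:
    """Hypothetical guard: skip the request when the payload is over the limit."""
    return encoded_size(payload) <= MAX_PAYLOAD_BYTES
```

The size is measured on the encoded bytes rather than on the character count, since the two differ as soon as the payload contains non-ASCII text.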
The process that was erroring retried every 15 seconds and wasn’t logging the information that I needed to troubleshoot the error. Not having all of the information that you need to debug is the worst. I knew that I needed to add some logging and get it into production to be able to correct the issue.
Developing with CI/CD at FireHydrant
Being able to add more logging and get it into production quickly is where the continuous integration (CI) and continuous delivery (CD) setup we have at FireHydrant shines. Within minutes, I had added the code to log the information that we needed, specifically the payload size, and had it ready for pull request review and deployed to our staging server.
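The extra logging can be as small as one line. The sketch below uses Python’s standard `logging` module; the logger name `third_party_client` and the helper `log_payload_size` are made up for illustration and are not the code that was actually deployed.

```python
import logging

# Hypothetical logger name, used only for this example.
logger = logging.getLogger("third_party_client")


def log_payload_size(body: bytes, limit_bytes: int) -> int:
    """Log the size of the outgoing request body and return it."""
    size = len(body)
    logger.info("outgoing payload size=%d bytes (limit=%d bytes)", size, limit_bytes)
    return size
```

Logging the limit next to the measured size means a single log line is enough to see how close a request came to being rejected.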
At FireHydrant, we have a straightforward development process that relies on our CI/CD pipeline to give us confidence that we can deploy soon after making changes. The CI/CD steps all run once a branch is pushed to our GitHub repo.
Develop: Make the changes in a feature branch and this triggers the test suite execution on that branch
Staging: Merge changes to our “staging” branch and this triggers the test suite execution on the staging branch and deploys the updated code to the staging environment
Code review: All code must be reviewed by at least one other developer
Production: Merge the changes into the “master” branch and this triggers the test suite execution on the master branch and deploys the updated code to the production environment
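The branch rules above can be summarized as a small lookup. This is only an illustrative Python sketch of the flow, not our actual CI configuration; `DEPLOY_TARGETS` and `pipeline_steps` are hypothetical names.

```python
from typing import List

# Hypothetical mapping of long-lived branches to the environment they deploy to.
DEPLOY_TARGETS = {"staging": "staging", "master": "production"}


def pipeline_steps(branch: str) -> List[str]:
    """Return the steps triggered by a push to the given branch."""
    steps = ["run test suite"]  # every pushed branch runs the tests
    target = DEPLOY_TARGETS.get(branch)
    if target is not None:
        steps.append(f"deploy to {target}")
    return steps
```

A push to a feature branch only runs the tests, while a push to “staging” or “master” also deploys to the matching environment.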
Our test suite at FireHydrant takes roughly 4 minutes to run, depending on the current load at our CI provider. Our entire build process, which includes various other checks and deploying to the appropriate environment, can take up to 10 minutes in total. This means that you don’t have to wait long to validate that your code change is working properly. If the process takes too long, the engineers on your team may not feel comfortable deploying to production just to add some necessary logging, and the time to final resolution will grow as a result.
Leveraging FireHydrant’s process to fix the bug
With the changes deployed to production so quickly, I could review the new information in our logging system and promptly figure out what the issue was. In this case, the payload size was 10,235 bytes. Wait! That’s less than 10KB (1KB = 1,024 bytes, so 10KB = 10,240 bytes), so why did I see this error? Unfortunately, I don’t have access to the third-party API’s servers or code, so I can’t know for sure why I was getting this error. The best I could tell is that 10KB actually meant 10,000 bytes. I changed the code accordingly and went through the same process of code review, testing, and deployment again.
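The arithmetic behind that guess is easy to check. In the Python sketch below, the constant and function names are hypothetical; the three numbers come from the story, and the 10,000-byte reading of “10KB” remains an inference about the third-party API rather than documented behavior.

```python
OBSERVED_SIZE = 10_235     # bytes, taken from the new log line
BINARY_LIMIT = 10 * 1024   # 10,240 bytes: "10KB" as we first read it
DECIMAL_LIMIT = 10 * 1000  # 10,000 bytes: "10KB" as the API appears to read it


def within_limit(size_bytes: int, limit_bytes: int) -> bool:
    """Return True when a payload of this size would be sent."""
    return size_bytes <= limit_bytes


passes_old_check = within_limit(OBSERVED_SIZE, BINARY_LIMIT)   # True: we sent it
passes_new_check = within_limit(OBSERVED_SIZE, DECIMAL_LIMIT)  # False: now skipped
```

Any payload between 10,001 and 10,240 bytes slipped past the original check, which explains why the error only showed up occasionally.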
By having a test suite that I can rely on and an automated deployment process, I was able to quickly gather new information to determine the cause of the issue and deploy the fix with confidence. Oh, and did I mention that this was all done on a Friday? Something I would’ve never done at previous companies.
Not only was the issue fixed so that our customers could carry on as usual, but our test suite is also more robust, and we have more logging information in our system for future troubleshooting. I updated a test to handle this specific case. I also left the extra logging in the code so that we’ll have this information in our logs if we run into an issue in the same place.
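A regression test for this case might look like the following Python sketch, written as plain pytest-style test functions. The guard `payload_fits` and the test names are hypothetical, and treating exactly 10,000 bytes as acceptable is an assumption based on the “> 10KB” wording of the error message.

```python
# Assumed limit: the API's "10KB" read as 10,000 bytes.
MAX_PAYLOAD_BYTES = 10_000


def payload_fits(size_bytes: int) -> bool:
    """Hypothetical guard: True when a payload of this size should be sent."""
    return size_bytes <= MAX_PAYLOAD_BYTES


def test_size_from_the_incident_is_rejected():
    # 10,235 bytes is under 10,240 but was still refused by the API.
    assert not payload_fits(10_235)


def test_exact_limit_is_accepted():
    assert payload_fits(10_000)


def test_one_byte_over_the_limit_is_rejected():
    assert not payload_fits(10_001)
```

Pinning the exact size from the incident keeps the test tied to the real failure, and the two boundary cases guard against an off-by-one slipping in later.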
Investing in robust CI/CD infrastructure matters
At FireHydrant, we’ve taken the time to invest in a trustworthy test suite, automated testing, and automated deployments. Having confidence in our processes lets me and everyone on the engineering team dive in to investigate and resolve issues, knowing that the system won’t go down because we’re trying to fix something. This investment keeps paying us dividends because we use the same process regardless of the size of the issue or bug we are seeing.
So ask your team: when will it be worth your time and money to invest in a robust CI/CD solution for your software?