FOSSA has grown a lot since our early days when a small team worked out of a leaky San Francisco office. But we’re fortunate that several of those first employees are still with the company today.

In the second episode of the FOSSA Engineering Podcast, two of our longtime engineers reflect on the technology choices early-stage companies face, including several that our business navigated during its formative years. They also offer guidance for developers currently facing similar decisions.

Episode Outline

  • Introductions
  • Single-page applications vs. multi-page applications: 3:40
  • Language choices and diversity: 8:59
  • Picking databases: 11:54
  • Whether (and how) to use ORMs: 13:38
  • How to think about rewrites: 17:45
  • Preparing for on-prem environments: 21:06
  • Linting and types: 26:12
  • Approaches to creating documentation: 31:11
  • Considerations for refactoring: 36:35
  • The sprint process: 42:34
  • Final thoughts and takeaways: 49:55

Episode Highlights

When to use an ORM

ORMs have their place in extremely basic applications. But I think most applications don’t stay there, so I’d advise against starting with an ORM.

The real problem with ORMs is that they encourage you to write methods that do a lot of database interaction in the business logic. When you look at the controllers that we have in our web application server, a lot of them have functions where they take ORM objects as parameters. And then in the middle of the function, they will reload the ORM object, or they will change a field and they will save the field. This function is impossible to unit test because the only way you can unit test it is by mocking out the entirety of the ORM contract, and there's no way I'm ever going to do that.

Instead, if you’re going to use an ORM, use it primarily for data access and then pass plain structs into your functions. Do business logic purely within the function, and then at the end of the function, return a result that you then use the ORM to persist back into the database.
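As a concrete illustration of that shape, here's a minimal TypeScript sketch. The Subscription domain, the repository interface, and all the names in it are hypothetical; the point is that the ORM stays behind a data-access boundary while the business logic works on plain structs:

```typescript
// A plain struct: what the pure business logic sees instead of an ORM object.
interface Subscription {
  id: number;
  seatsUsed: number;
  seatLimit: number;
}

// Pure business logic: plain data in, plain data out. No database access,
// so this is trivially unit-testable with no mocks.
function applySeatUpgrade(sub: Subscription, extraSeats: number): Subscription {
  if (extraSeats <= 0) {
    throw new Error("extraSeats must be positive");
  }
  return { ...sub, seatLimit: sub.seatLimit + extraSeats };
}

// Data-access boundary: the only layer that touches the ORM. In a real
// codebase this interface would wrap Sequelize/TypeORM/etc. calls.
interface SubscriptionRepo {
  load(id: number): Promise<Subscription>;
  save(sub: Subscription): Promise<void>;
}

// Thin controller: load a plain struct, run the pure logic, persist the result.
async function upgradeSeats(
  repo: SubscriptionRepo,
  id: number,
  extraSeats: number,
): Promise<void> {
  const sub = await repo.load(id);
  const updated = applySeatUpgrade(sub, extraSeats);
  await repo.save(updated);
}
```

The payoff is that applySeatUpgrade can be unit tested by passing in a plain object literal and asserting on the returned value, with no ORM mocking at all.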

The way ORMs are designed really discourages this sort of unit-testable, pure logic. The reason I think you shouldn't even start with an ORM is that I have so many times written code and thought, “This code is terrible, and obviously we will throw it away in the next two weeks when we rewrite this subsystem” — only to see the code in production four years later, get angry at whoever wrote the code, and then run git blame and realize it was me.

When to rewrite (and when not to rewrite)

Rewrites are always about the balance between value and cost. And I think there are a lot of mistakes on both ends of the spectrum.

On one end of the spectrum, people will often rewrite for technical purity, which is the wrong way to think about rewrites. Instead, you should always look at customer value. Is the rewrite actually going to enable a new feature that my customers will pay for and that will make them happier? Is it going to enable increased velocity on new features? Is it going to enable me to fix bugs that my customers are upset about and that are hard for me to fix today?

If the answers to those questions are no, then don't rewrite.

On the other hand, I think there's also a lot of fear around doing a rewrite when you know it's the right thing to do. And, especially in earlier-stage companies, rewriting is much less scary than the horror stories suggest, because a lot of those horror stories come out of larger development companies.

Rewriting is easier in two scenarios. The first is when you still have the people who wrote the original system: they know the original program, and they are deep experts in its domain. I think a lot of rewrites fail because the rewriting team didn't actually understand how the original program worked and why it was designed the way it was. The rewriting team then learns the hard way about the flaws the original program was actually trying to fix.

The second scenario where rewrites make a lot of sense is when the legacy code base is having an outsized impact on the morale of the team. I've said before that teams aren't real — individuals are real. And when the specific individuals on your team are miserable at their day jobs, they are profoundly less productive.

Rewriting can often be a really good way to not just improve morale but also increase productivity: it helps people get more motivated, and it brings the primitives you have in code in line with the actual concepts in your business domain that you need to be able to express.

Planning for an on-premises deployment

One key thing for companies to keep in mind when they're thinking about going on-prem is that you won't have the same access to insights into your application that you have in the cloud. This sounds kind of obvious, but it's easy to lose sight of in the moment.

Additionally:

  • Don't use cloud services. Don't use SQS or any cloud service that does not have a very strong, deployable on-premises equivalent. There are some services you can use because they do have clear equivalents: S3, for example, has MinIO, and RDS is just managed Postgres. Those are services you can rely on, but be aware that your customers will need to be able to operate them.
  • Kubernetes is this monstrosity of complexity, but somehow enterprises have all been tricked into being able to run it. So, if you want to target a common deployment platform, Kubernetes is actually great. When we started, we did not target Kubernetes; we targeted Docker containers. The ability to run and scale multi-node clusters at the Kubernetes level of abstraction, as opposed to the Docker level of abstraction, is dramatically different. Kubernetes is great for being able to have a uniform on-premises deployment.
  • Before you have your second on-premises customer, automate the process of debugging the first one. Specifically, every time you talk to that first customer and need to walk them through commands to pull logs, try to automate those actions — build a debug bundle, build a debug service — and that will save you so much pain in the long run. Rather than sitting on a Zoom call with your customer and telling them exactly what SQL query to type into the terminal they have screen-shared with you, you can ask them to press a button or run a script that generates the logs you need (see the sketch after this list). That makes a world of difference when it comes to debugging.
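To make the debug bundle idea concrete, here is a minimal sketch of a one-endpoint debug service using only Node.js built-ins. Every specific in it (the log path, the port, the APP_VERSION variable, the diagnostics collected) is a hypothetical placeholder rather than FOSSA's actual implementation:

```typescript
// Minimal "debug bundle" service sketch. All paths, ports, and env vars
// below are hypothetical placeholders. The goal: a customer clicks one
// URL instead of running SQL queries over a screen-shared terminal.
import * as http from "node:http";
import { readFile } from "node:fs/promises";

async function buildDebugBundle(): Promise<string> {
  // Gather whatever you would otherwise ask the customer to collect by hand.
  const appLog = await readFile("/var/log/app/app.log", "utf8")
    .catch(() => "<log unavailable>");

  return JSON.stringify(
    {
      generatedAt: new Date().toISOString(),
      version: process.env.APP_VERSION ?? "unknown",
      // Ship only the tail of the log to keep the bundle small.
      recentLogs: appLog.split("\n").slice(-500),
      // In a real service, also include the output of your usual diagnostic
      // SQL queries here (applied migrations, queue depths, and so on).
    },
    null,
    2,
  );
}

http
  .createServer(async (_req, res) => {
    // Serve the bundle as a downloadable file the customer can attach
    // to a support ticket.
    res.setHeader("Content-Type", "application/json");
    res.setHeader(
      "Content-Disposition",
      'attachment; filename="debug-bundle.json"',
    );
    res.end(await buildDebugBundle());
  })
  .listen(8080);
```

The same gathering logic can also back a one-shot script that the customer runs and sends you the output of, which is often easier in locked-down environments than exposing an HTTP endpoint.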

Creating strong documentation

I have heard the take that code should be self-documenting enough that it does not require comments. I think this is completely absurd. The reason is that code is often describing a series of steps: the “how” of executing a program. The useful kinds of documentation are not about how to do something, or necessarily what is happening, but often about why. What are the specific semantics of a function that cannot be expressed through its signature? Why do we make this API call to work around a particular quirk in a particular third-party provider? Comments like these show people who are later going to refactor the code what the intent behind the behavior was: whether a particular behavior is intentional, whether it's a bug, or whether it's a workaround that was intentionally included to meet a product requirement.

The example that's always given when people say code should be self-documenting is a comment above a function that says, “This function adds A plus B,” sitting on top of a function body that adds A plus B. That comment is completely useless. The useful version of that comment is the 60-line explanation that says: normally you wouldn't want to add A to B, but we do it here because of a particular edge case in a particular provider; this integration has a weird quirk, and if you don't add A to B, here is the Jira ticket for the bug that happens.
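Here is a compressed TypeScript version of that contrast; the provider name and ticket number are invented for illustration:

```typescript
// Useless comment: restates what the code already says.
// Adds a and b.
function add(a: number, b: number): number {
  return a + b;
}

// Useful comment: explains *why* the surprising thing is correct.
// AcmePay's API returns refund amounts already negated, so adding the
// refund to the charge (rather than subtracting it) yields the correct
// net balance. Do not "simplify" this to a subtraction; see JIRA-1234
// (hypothetical ticket) for the double-counting bug that causes.
function netBalance(charge: number, refund: number): number {
  return charge + refund;
}
```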

When to refactor (and when not to refactor)

I kind of view refactors like I view rewrites. They're obviously not the same thing, but I think people approach them with a similar hesitance a lot of the time, especially in more dynamically typed languages where you aren't too sure what contracts need to be upheld. But I actually think refactors often make sense and should be done fairly aggressively. As time goes by, requirements and internal semantics change. And if you or your team are concerned about doing these refactors, how can you possibly keep up with those changes?

Similar to rewrites, when I see teams considering refactors, especially early-stage teams, they often fall into traps on one or the other extreme end of the spectrum.

On the one hand, you have teams that love to refactor and love to write perfect code. That's nice, but your customers aren't paying you for perfect code. Your customers are paying you for whatever business value you're delivering. I think this is the much easier trap to avoid because it's pretty clear. You can usually feel it in the pit of your stomach when you're wasting time refactoring. A lot of times, that kind of refactor happens when you're not making enough money, and you viscerally feel that you're procrastinating.

The other kind of trap that people fall into is much harder to avoid, which is that you have a bunch of customers and need to ship a bunch of features — and you’re making a boatload of money — and because of that, you have no time to refactor. I think that line of reasoning is a little bit silly because your ability to refactor is sort of a direct reflection of your understanding of your current system.

As soon as you stop refactoring, you slowly begin to lose that muscle: you lose the ability to refactor simply because you're not exercising it anymore. As that happens, contracts become less clear and semantics become overloaded. These things inevitably happen when you're not focused on refactoring and documentation.

Episode Hosts and Guests

Sara Beaudet, Support Engineer, FOSSA: Sara is the host of the FOSSA Engineering Podcast. They are passionate about cybersecurity, open-source software, and helping people explore the world of technology.

Leo Zhang, Software Engineer, FOSSA: Leo is an engineer on FOSSA's Platform team, which owns the back-end analysis services that power FOSSA's underlying data platform.

Jessica Black, Software Engineer, FOSSA: Jessica is an engineer on FOSSA’s analysis team, which handles the translation of your projects into usable data for the FOSSA website. When not programming, she’s usually curled up with a dark fantasy book or playing an MMO.