Sitting here, having made it through the holiday Eat-a-thon with nothing worse than a slight cold, and reviewing 2014, I feel that we’re looking back at a pretty momentous year in IT. I think it’s fair to say that, in Docker and, more generally, containers and microservices, we’re facing perhaps the greatest potential change in how we deliver and run software services since the arrival of virtual machines.
Such a potential change naturally brings with it lots of interest and enthusiasm, and inevitably much hype and bandwagon-jumping too. But amidst all the noise, let’s not forget that it’s really early days yet. Not just the technologies, but also the process and practices around microservices, containers, and Docker: pretty much everything in this space is either still waiting to be invented, or needs a lot of fine-tuning.
These are the 8 questions I think every organization needs to be investigating in relation to containers before making any kind of decision: Storage, Failover, Delivery model, Release strategy, Ownership, Patching, Support and Licensing.
Don’t get me wrong: I think microservices and IoT will become a central part of the way we design and deliver IT services (there’s an interesting tension here between more and more decentralized computing capacity, and pulling back ever greater amounts of data to central locations because we can’t process it otherwise, but that’s a topic for a different blog post). Microservices feature heavily on the XebiaLabs roadmap, and I think every organization needs to be at least investigating them, and containers as a very promising candidate implementation technology.
Still, I think the gung-ho “Let’s Dockerize everything” initiatives we’re seeing organizations embark on are pretty brave. To put it in a more nuanced way: if you have identified and demonstrated a significant technical benefit that containers, and specifically Docker, will bring to the delivery of your applications, and you have the kind of technical resources at your disposal that can discover solutions to complex, unsolved problems, and are willing to go through the churn and reimplementation that comes with any v1.x technology, then adoption now certainly makes sense.
But if you’re about to Dockerize everything based purely on the notion that “containers will make everything better,” and are expecting to be able to combine existing off-the-shelf solutions, get your teams trained up on some best practice and quickly have a stable, secure runtime platform up and running, you’ll be in for a lot of surprises.
Having said that, and given my belief in microservices, I think it’s critical that enterprises continue to investigate how containers (Docker, or Mesos, or Rocket, or any of the other alternatives we’ll doubtless be seeing soon) can work for them. On the basis of a number of recent discussions, here are eight questions I think any organization needs to consider in 2015:
1. Storage
Perhaps the most obvious technical question right now: how to handle persistent, writable storage for containers? How and where to store the data, how to link to it from containers, how to replicate and back it up, how to allow data to move with containers if a container is started on a different machine, and how to do all that quickly?
2. Failover
In a microservices environment, common approaches to finding other endpoints your service needs to talk to are to query some kind of service registry, and/or to use stable names for services and resolve these to the actual endpoints using e.g. DNS. In an environment where bringing up a new VM may take minutes, the time taken for service registries or DNS caches to update is small by comparison. With containers that can come back up in seconds, endpoint rebinding can quickly become a bottleneck, especially if many containers are failing over at the same time.
I recently saw a very impressive demo of a solution that, using commodity hardware, was able to recover from a full rack failure – restarting, remounting and rebinding 10,000 containers – in under 10s. But this was all custom engineering and experimentation…you can’t get this kind of thing off the shelf.
What are your failover requirements? How will you address these with containers?
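To make the rebinding concern a little more concrete, here is a minimal sketch of a client-side endpoint cache sitting in front of a service registry. The `Registry` and `CachingResolver` classes, the 30-second TTL and the addresses are all hypothetical and exist only for illustration; real registries and DNS resolvers are considerably more involved.

```python
class Registry:
    """Hypothetical in-memory service registry: service name -> endpoint."""

    def __init__(self):
        self._endpoints = {}

    def register(self, name, endpoint):
        self._endpoints[name] = endpoint

    def lookup(self, name):
        return self._endpoints[name]


class CachingResolver:
    """Client-side cache that re-queries the registry only after ttl seconds."""

    def __init__(self, registry, ttl, clock):
        self._registry = registry
        self._ttl = ttl
        self._clock = clock      # callable returning the current time in seconds
        self._cache = {}         # name -> (endpoint, time it was fetched)

    def resolve(self, name):
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]     # may be stale if the container has moved
        endpoint = self._registry.lookup(name)
        self._cache[name] = (endpoint, now)
        return endpoint


# Simulated clock so the example is deterministic.
now = [0.0]
registry = Registry()
registry.register("orders", "10.0.0.5:8080")
resolver = CachingResolver(registry, ttl=30.0, clock=lambda: now[0])

first = resolver.resolve("orders")        # fetched and cached at t=0

# The container fails over and is back on another host two seconds later.
now[0] = 2.0
registry.register("orders", "10.0.0.9:8080")
during_ttl = resolver.resolve("orders")   # still the old endpoint

now[0] = 31.0
after_ttl = resolver.resolve("orders")    # cache expired, new endpoint
```

In this sketch the container is back after two seconds, but the client keeps calling the old address until the cache entry expires. The cache lifetime, not the restart time, ends up determining how long the failover really takes.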
3. Delivery model
If you’re going to be deploying containers, what are the developers going to be delivering? Do they hand over the container descriptor (Dockerfile or similar)? If so, how do you pass around any external resources referenced from the descriptor? Or are developers going to provide a “compiled” image, avoiding the external dependency problem but increasing the size of the deliverable and making it much harder for the recipient to inspect or verify what is actually going to be run?
Or are you looking to avoid asking developers to learn yet another “delivery format” (the difficulty of getting developers to learn Puppet, Chef or a similar provisioning “language” is one of the main problems we hear about in organizations that tried to introduce them as developer deliverables), and are looking for some kind of way to “auto-generate” container descriptors or images from the same code artifacts that developers are delivering today?
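As a rough illustration of the “auto-generate” option, the sketch below renders a very small Dockerfile from the kind of metadata a build already knows about. The function name, the base image `example/java-base:1.0` and the artifact path are all made up for this example, and a real generator would have to deal with ports, environment, users, volumes and much more.

```python
import json


def render_dockerfile(base_image, artifact, target_dir, command):
    """Render a minimal Dockerfile for a single build artifact.

    All parameters are illustrative assumptions, not an established format.
    """
    lines = [
        "FROM {}".format(base_image),
        "COPY {} {}".format(artifact, target_dir),
        # json.dumps yields the JSON array used by the exec form of CMD.
        "CMD {}".format(json.dumps(command)),
    ]
    return "\n".join(lines) + "\n"


descriptor = render_dockerfile(
    base_image="example/java-base:1.0",   # hypothetical Operations-owned base image
    artifact="target/orders.jar",         # hypothetical artifact path
    target_dir="/app/",
    command=["java", "-jar", "/app/orders.jar"],
)
print(descriptor)
```

Developers keep delivering the same artifact they deliver today, and the descriptor becomes a by-product of the pipeline rather than something they have to learn to write.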
4. Release strategy
Are your services sufficiently independent for you to be able to deploy each service to production on every commit, or on a set schedule? Or do you need to bundle multiple services in a release train-type setup, where the latest candidate versions of multiple services are tested and, if successful, released together on a daily, weekly etc. basis? If the latter, how many release trains will you use? One for the entire organization? One per team, or business unit?
As the number of candidate versions in a release train goes up, the chances of test failures usually also increase. Do you abort the release train if any tests fail, thereby possibly preventing 199 “good” services from going live because of one problematic one? Or does it make sense to try to remove the “bad apples” and re-test the remaining candidates?
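The “remove the bad apples and re-test” option could look roughly like the sketch below. It assumes a hypothetical `run_tests` callable that can attribute failures to specific candidates, which in practice is often the hard part; the service names are invented.

```python
def run_release_train(candidates, run_tests, max_rounds=3):
    """Re-test the train after removing failing candidates.

    candidates: set of service names on the train.
    run_tests:  hypothetical callable that takes the current set of
                candidates and returns the subset blamed for failures.
    Returns (released, removed).
    """
    remaining = set(candidates)
    removed = set()
    for _ in range(max_rounds):
        failed = run_tests(remaining) & remaining
        if not failed:
            return remaining, removed       # the whole train is green
        remaining -= failed
        removed |= failed
    return set(), removed | remaining       # still red: abort the train


def fake_tests(services):
    # "billing" is broken; "invoices" only passes alongside the new "billing".
    if "billing" in services:
        return {"billing"}
    if "invoices" in services:
        return {"invoices"}
    return set()


released, removed = run_release_train(
    {"orders", "billing", "invoices", "search"}, fake_tests
)
```

The second round of the example shows why the re-test matters: removing one candidate can cause another, previously passing candidate to fail.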
For the actual production deployment, will you create an entire “mirror” environment (perhaps the one used for your most extensive tests in the release train?) and then gradually direct traffic over to it? Or will you only deploy the new services alongside the existing ones? If so, how do you handle a gradual bleed-over, bearing in mind that the new services may not be ones that are directly hit by users? Or do you instead switch over to the new services totally, relying on the ability to quickly switch back if there is a problem? If so, do you switch one service (well, cluster of interdependent services, really) at a time, or all at once?
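One common way to implement a gradual bleed-over is to route a fixed percentage of callers to the new version based on a hash of some stable key. The sketch below assumes such a key (a user or session ID) is available at the point where the routing decision is made; for services that are not hit directly by users, that key would have to be propagated downstream, which is an additional assumption.

```python
import hashlib


def choose_version(request_key, new_version_percent):
    """Route a stable share of traffic to the new version.

    Hashing the key keeps each caller on the same side for as long as the
    percentage does not change, and raising the percentage only ever moves
    callers from "old" to "new".
    """
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100    # 0..99
    return "new" if bucket < new_version_percent else "old"


keys = ["user-{}".format(i) for i in range(1000)]
share_at_10 = sum(choose_version(k, 10) == "new" for k in keys) / len(keys)
print("share routed to the new version at 10%:", share_at_10)
```

Setting the percentage to 0 or 100 gives you the “switch back” and “switch over totally” cases from the same mechanism.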
5. Ownership
Part of the decision regarding the delivery model for containers is also who owns/is responsible for which piece of the deliverable. Does Operations simply provide the container runtime, with developers responsible for any problems that happen inside the container? Is Operations also responsible for the “base descriptor” or “base image” from which each developer-provided container must inherit? How to identify whether a problem is caused by the “base” part of the image, or the developer-added part, especially if the technology allows the developers to modify or override parts of the base image?
There is also a security-related aspect to these questions: can you sufficiently isolate/sandbox your containers to prevent them from affecting other systems? Or do you simply trust that the descriptor/image delivered by the development teams will not do “bad things” – unintentionally or otherwise?
If you’re thinking about giving your development teams responsibility for delivering the full container descriptor and trusting them to do The Right Thing, also in terms of things like configuring the file system for their container, have you asked them if they’d be willing to support the system in production?
6. Patching
Will you need the capability to make changes to running systems, e.g. to apply security patches or other critical changes? If you are thinking about adopting immutable infrastructure principles, how quickly can you update all affected container definitions or images and release new versions? If the patch can be applied to a base descriptor or image that development teams use, can you “override” the base image used in their build process or does this require all development teams to change their source code/build definition?
If you are thinking about modifying running containers, ensure the container technology you are considering supports this, and also provides an audit trail. How will you ensure that any changes made to running containers find their way back into the container definitions?
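If you go the immutable route, the first thing you need to know after a patch is which images have to be rebuilt. A minimal sketch, assuming you maintain some inventory of which image builds on which base (the inventory format and all image names here are hypothetical):

```python
def images_to_rebuild(parents, patched_base):
    """Return every image that directly or indirectly builds on patched_base.

    parents: dict mapping image name -> the base image it builds on.
    """
    # Invert the inventory: base image -> images built directly on it.
    children = {}
    for image, base in parents.items():
        children.setdefault(base, []).append(image)

    affected, stack = set(), [patched_base]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in affected:
                affected.add(child)
                stack.append(child)
    return affected


inventory = {
    "ops/java-base": "ops/os-base",
    "ops/node-base": "ops/os-base",
    "team-a/orders": "ops/java-base",
    "team-b/search": "ops/node-base",
    "team-c/legacy": "vendor/other-os",
}
affected = images_to_rebuild(inventory, "ops/os-base")
print(sorted(affected))
```

Note that the list includes images owned by development teams, which brings you straight back to the ownership question above.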
7. Support
I recently heard the following story: a company was running Docker on an operating system for which they had a support contract. But the containers running in Docker were using a base image with an OS that was not supported. All of a sudden, none of the “machines” (i.e. the containers) that were effectively hosting their applications were supported. When they started having problems, their support contract didn’t help them at all.
Do you have the necessary support arrangements in place for whatever might be running in your containers?
8. Licensing
If you are using any components in your services that are licensed on a per-system basis, have you checked what the vendor’s licensing policy regarding containers is? Does the vendor consider each container a separate system, or are they regarded simply as processes on the system that is actually hosting your container framework? In other words, if you are running 10 instances of a licensed process on one box right now, would your cost stay the same if you ran them in containers, or would it increase by a full order of magnitude?
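The arithmetic behind that last question is trivial, but worth spelling out. The price below is an invented figure purely for illustration; only the factor of ten comes from the example above.

```python
def license_cost(price_per_system, hosts, containers, container_counts_as_system):
    """License cost under the two possible vendor interpretations."""
    systems = containers if container_counts_as_system else hosts
    return price_per_system * systems


PRICE = 5000    # hypothetical price per licensed "system"

per_host = license_cost(PRICE, hosts=1, containers=10,
                        container_counts_as_system=False)
per_container = license_cost(PRICE, hosts=1, containers=10,
                             container_counts_as_system=True)

print(per_host, per_container, per_container // per_host)
```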
So much for the questions…but what are the answers? Unfortunately, based on many conversations, I think the candid answer at this point is: nobody knows. In some areas, there are lots of interesting ideas but no obvious winners yet. Other questions are only now showing up on the radar of discussions, and in many cases, the “right” answer will depend very much on an organization’s specific context.
Luckily, given the pace at which things are currently happening, I think we’ll know a whole lot more by the same time next year. Here’s to an exciting 2015!