“Well, you know. We all want to change the world”
My title on LinkedIn is “Reluctant Architect”. This should not be considered a reflection about how I feel about my job, the company I work at, or the work that I do. It’s more of a reflection about what the rest of the industry thinks of an Architect in the software sense.
Basically, the term has been completely hijacked and the use is completely wrong. For the most part architect in the computing industry is some old white guy who’s completely forgotten anything about software development (if they ever did any) and spends all day writing PowerPoint presentations on choosing WebSphere vs Weblogic. This is then inflicted on the organisation like stone tablets being handed down from the mountain.
I can’t find enough words to describe how much I disagree with the concept, the implementation and the horrors that are perpetrated by organisations that follow this model. It is the ultimate in disempowerment, it actively discourages teams from learning and puts power in the hands of people least capable of using it effectively.
So, while I’ve given a roasting to the way my industry has traditionally handled architecture, how, and what do I do differently?
My background is software development, and I’ve spent a lot of years working with, and mentoring teams, but without a doubt the biggest influence on my recent career has been becoming a parent. I have the most wonderful 8 year old boy, who has the same level of enthusiasm for life as his father along with the youthful confidence that everything he knows is right. At this point, you have to transition from “telling” to “experiencing”. No amount of me telling George that “eating that many lollies will make you feel sick” would convince him. So, short of doing things that would actually (or likely) kill him, I encourage him to safely explore his boundaries. Quite often there is joy in the discovery, and quite often there is cuddles and comforting words after a “learning experience”.
So, being an architect, and a reluctant one at that.
(From dictionary.com) 1550s, from Middle French architecte, from Latin architectus, from Greek arkhitekton “master builder, director of works,” from arkhi- “chief” (see archon) + tekton “builder, carpenter”. An Old English word for it was heahcræftiga “high-crafter.”
This pretty much sums up what I feel about the role in general. I am an old white guy with many scars from building systems the wrong way, or seeing other teams build things the wrong way and I wasn’t quick enough to help them. I try very hard to build relationships with the technical staff across the organisation so I can influence their approaches and thinking without needing to actually tell them “do it this way”. This sounds all a bit unicorn and rainbows and holding hands summer-time walks on the beach, but I’d say there’s very few people at REA that have any doubt about my position on various topics, and what my likely response is if they test those boundaries.
Specifically, what does this look like in practice? Glad you asked! I’ll outline the process that I go through (and have done) at REA focussing on architectural change.
When I joined REA nearly 4 years ago, there was a small number of large applications, there was strong coupling and releases were painful. We were tied strongly to data centres with applications running on racked tin. Applications made many assumptions about how close they were to each other (latency for example). Control of applications were “tier” based (back-end vs front-end) and there was contention across the organisation for product releases.
Working with Rich, the main goal was to structure the organisation to allow faster releases and improve the quality of the systems (reduce coupling) to make this possible. There was a heavy investment into cloud computing (using Amazon) as the means to reduce contention in the development and testing area, with still having a pathway to multiple production deployment environments controlled by the business aligned organisational structures (we call them “Lines of Business”)
A dashboard for each Line of Business that shows all cloud providers in the world, their latencies and suitability for applications, including cost. The teams are able to deploy services across the globe, according to data transfer requirements, latency rules and follow the “cheapest” computing and storage options.
Yeah, something like that, but less Neo and more graphs.
We need to have our monolithic coupled applications split so that each Line of Business can deploy them independently. Our operational staff need to have visibility into the health of the applications without actually knowing where they are. The systems need to support increased speed of delivery of new function for each of the Lines of Business.
The final attribute is considered one of the driving reasons for these changes – so I’m going to focus on it in future sections. However, at this point most of the work that I do is making sure the technical leaders for the various Lines of Business understand the vision and the direction without interfering too much in the actual implementation.
There’s also a lot more to the ongoing strategy involved, but that’s probably another topic for another time.
I strongly value autonomy and self-discovery by the teams. I think learning by doing is the most powerful approach and Open Source ecosystems have shown that the mutations from different development approaches (generally) improve the state of the art as the development teams learn from previous implementations.
In terms of the design of “the architectural direction and improvements” I’ll explain how I’m influencing the understanding and behaviour around application deployment, modularity and most importantly monitoring and fault tolerance.
I realise that “make application deployment, modularity etc, etc better” isn’t a desirable directive, because it’s not very useful and because in many cases people don’t have a clear idea what “better” is. For developers especially many of these concepts are quite foreign, so what I aim for is smaller fine grained directives that help to provide some gentle prodding for exploration in the right areas.
By doing this, what I’m trying to get teams to “work through” is the potential difficulties involved in implementing some of the architectural improvements in their specific contexts. If I actually knew the answers I’d probably work with the teams directly, but I rarely know “exactly how” teams should implement things. I’m blessed by being surrounded by an awesome group of developers and technical specialists that are very capable of implementing improvements in their own contexts, my role is to show them the path to take.
Taking the example of “improve modularity and decoupling”. What is needed under these circumstances is independent services. However, a key part of the total system improvements, especially when relating to multiple independent systems is monitoring and fault tolerance (and investigation). REA Group use AWS for many of our system deployments, so some of this is ‘more important’ than dealing with racked tin, but the same principles should apply.
So, now we think a bit. What can I do at this point, and what principles can I impose on the teams to move in the right direction. One of the most expensive parts of software is the operation of running systems. Most of this is because the monitoring, logging and debugging tools are “left as afterthoughts”. I could say “make sure all systems have monitoring, logging and debugging according to the checklist defined in subsection 42, document 27b-6”. That sort of directive could sound familiar to many people, and is pretty much everything I despise about “enterprise architects”.
My directive was “remove ssh access to all deployed systems, nobody can log in”.
To say the response was incendiary was possibly an understatement. Nomex underwear is standard issue for my job, but it’s very interesting to see how often it’s needed. The other thing that interested me was “what roles gave what responses”.
For the most part, experience ops people (hi Cos, love your work) saw through my facade and knew what was up. They’re also generally used to working in constrained environments, and as a result have a huge toolbox to still effectively do their work. The other good news is that these wonderful people also become great advocates for improvement, because most of the burden of running systems fall in their laps.
Developers are predictably “y u no luv me” because what their main focus is to develop and deploy rapidly, debug issues locally and repeat. There’s probably a good reason for excluding some of these principles during development, but as I will repeat, unless the developers feel the pain themselves, it’s unlikely that changes are going to be made. All that does mean is that the operation team gets sad.
Why did I choose that particular course of action?
Well, it’s pretty controversial, so there’s lots of talk (and complaining) so people communicate with each other about how terribly unreasonable that I am, and how I don’t understand “the reality’ of software development. It’s visible and easy to do (don’t start sshd, easy) and should it turn out to be a retrograde step, it’s easy to change back.
The other benefits we see from this is that our systems start to become immutable – a property that I find particularly valuable in coding, and it transfers nicely to system components as well. This is a great thing in AWS land because we can just shoot the cattle and launch another one, and I know that nobody has been on the box “fixing” it.
By not being able to log into the box, we have to think hard about how we log and monitor our system components, especially important things like tracing transactions through the system. What sort of identifiers? What do our components do under failure conditions, network connections etc
There’s a school of thought that I should carefully explain my reasoning behind my decisions so it’s clear to everybody, and there is limited “heat and light”. There may be some merit to that, but my role is not a dictator, it’s an educator and a communication enabler. I don’t mind if the teams are all plotting to subvert me, or even getting together to bitch about how unreasonable I am, but the point is, they’re talking about it – with each other. That’s a big win. I love watching our internal IRC when somebody proposes “how about I use a bastion box to get to the instances” and there tends to be a few comments like “don’t say that, you’ll make Jon angry”, or “shhhh, don’t say it out loud”. That’s fine. It means that people are paying attention, and that even tongue in cheek comments like that make me feel like a difference is being made.
The second part is that I’m not always sure that I’m right. Sometimes I just go with “this seems like a good idea”. Like parenting with George, provided nobody is going to die (or projects fail spectacularly) then making these sorts of decisions and directions will gain valuable insight into our projects and systems, even if we start to go in the wrong direction.
The astute readers here (well, let’s face it, if you’re reading my blog, you’re probably already astute) will notice that I’ve only described a very thin slice for the implementation. Yes, that’s true. This is a big thing, it’s a long term view and to be honest, it’s sometimes disheartening to have to wait. It’s worth the wait, just need to hold firm and be confident that what you’re doing is the right thing. So don’t be confused that the descriptions above cover all that is needed, even from a pure “this is the direction” point. There’s probably 20-30 separate parts of the implementation that is being influenced at any point in time.
I’m looking for long term change with the work I do. Not short term fixes. I want teams to participate in this journey and not just be told what it looks like. There’s also significant parts of cultural change that form part of what I’m aiming for. People do get stuck “thinking the same way” and it’s my role to try and encourage them to think in a way that systems will be constructed in superior ways.
I hope this turned out to be of value to some people, I’m happy to discuss it further in public, or private emails. I’m very happy to help organisations understand how architecture can be better integrated with the development and operation of systems.