Category Archives: Development

On Code Rot

During a meeting recently, an open question came up along the lines of “code doesn’t rust, right?”. As a developer, I know this is not the case, because code rot, but I found it hard to describe the term succinctly. Code rot happens for many different reasons depending on the perspective you’re taking, and is linked to many other terms. Simply listing some of these seemed like both downplaying the problem and hiding the reasons for addressing it behind jargon. These thoughts stayed with me afterwards, and I thought I’d have a look at how other people describe it. Despite the general agreement on this Stack Exchange question that the Wikipedia description is poor, I actually think it’s a pretty good one:

Software rot, also known as code rot, bit rot, software erosion, software decay or software entropy describes the perceived “rot” which is either a slow deterioration of software performance over time or its diminishing responsiveness that will eventually lead to software becoming faulty, unusable, or otherwise called “legacy” and in need of upgrade. This is not a physical phenomenon: the software does not actually decay, but rather suffers from a lack of being responsive and updated with respect to the changing environment in which it resides.

I say this is a good description because it does not necessarily tie what is a fairly flexible term to one particular perspective, but also because it makes mention of two specific things:

Software entropy

The principle behind software entropy is that, as software is modified, its complexity, and thereby its entropy, increases. This link between complexity and entropy is what makes approaches such as KISS or YAGNI important in software development (more on this later). Entropy can (and indeed must) be mitigated through refactoring, but technical debt will increase entropy. And technical debt is interesting, because you already have some, even if you wrote your software yesterday. You might have only a small amount – that unit test you didn’t have time to write, or the documentation you didn’t think was necessary – but in any system with any history you will have accumulated some. Often there are business pressures behind that debt (and this is no comment on the business, because there are often good reasons for it). But that debt needs to be repaid, because the patchy solution that was traded off for an earlier release is going to come back and haunt you. Which leads nicely onto the second item…

A changing environment

With complex systems, the changes in the environment can be brought about by a staggering number of factors. Even when excluding hardware, new versions of operating systems, application servers, or browsers can break your product without touching a line of code. Protocols and standards change. Without realising it, your product has accumulated massive technical debt by not updating the frameworks and libraries chosen 10 years ago. That legacy part of your codebase? You know, the one no one wants to touch, ever, lest it collapse under its own weight, creating a black hole which pulls you right in, nullifying any concept of space and time? Yeah, it turns out that while it remained anchored in 2005 listening to Crazy Frog, everything around it moved on and it doesn’t really work very well anymore, at least by modern standards. In other words, code which stagnates while everything around it fluctuates is rotting, and worse – it doesn’t even need to be buggy. What if your product is now too slow to compete with alternative products? If a particular feature runs noticeably slower than others or has a considerably different UI? If usability is poor? Admittedly, the line between poor design or maintenance (i.e. technical debt) and good old code rot is a very blurred one, but the two are inextricably linked.

How did we get here?

So your code is now a soggy pile of rotting mush. How did it get there? As I mentioned earlier, the reasons for code rot are legion. Last year Quinn Norton wrote a particularly incisive article titled Everything Is Broken. The main focus of the article is on security, but there’s plenty to relate to software development in general, and I highly recommend you read the entire thing. Some particularly pertinent quotes:

Written by people with either no time or no money, most software gets shipped the moment it works well enough to let someone go home and see their family. What we get is mostly terrible.

Your average piece-of-shit Windows desktop is so complex that no one person on Earth really knows what all of it is doing, or how.

We’re back at complexity. This is an area I’m familiar with from my own experience, and it is why I feel that YAGNI is an important principle. As your product gains more features over time, complexity increases, and inevitably so does software bloat. Preventing unnecessary complexity and bloat doesn’t just help to prevent code rot, it also reduces maintenance. On a long-term product, this is an important factor, as an expanded feature-set is necessarily going to require expanded maintenance.

But how do I fix it?

There is a tendency, when faced with overly complex code, to throw it out and start again. After all, it is nicer to play with a nice new ball than it is to play with a roughly spherical collection of patches. I include myself here – it is, or at least feels, harder to understand what old code you didn’t write does, than to start afresh asking the more fundamental question: what does this code need to do? When you are beyond the point at which refactoring is a realistic option, or when software bloat is severe enough, this is the right approach. Unfortunately, it is unlikely to be straightforward to recognise these situations if understanding the code is already a problem. Consider the possible time implications of addressing problem areas:

  1. Spend time understanding the code (t_uc) and refactor (t_r): t = t_uc + t_r
  2. Spend time understanding the code (t_uc) and start over (t_s): t = t_uc + t_s
  3. Spend time understanding the requirements (t_ur) and start over (t_s): t = t_ur + t_s

The tendency to start over comes from two assumptions. The first is that t_ur < t_uc – understanding the requirements is a subset of understanding the code – so option 2 is never worth it. The second, independent of the first, is that even if t_s > t_r, only starting over reduces bloat and complexity, so t_s is preferable. Put together, option 3 seems preferable. But the second assumption doesn’t follow from the first – refactoring can reduce bloat and complexity too – so option 1 is back in the game. It is also possible that t_s < t_r, putting option 2 in the mix too.
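To make that concrete with some entirely made-up numbers: suppose understanding the code takes 4 weeks (t_uc), understanding the requirements takes 2 (t_ur), the refactor takes 3 (t_r), and the rewrite takes 8 (t_s). Then:

  1. Option 1: t = 4 + 3 = 7 weeks
  2. Option 2: t = 4 + 8 = 12 weeks
  3. Option 3: t = 2 + 8 = 10 weeks

Here the refactor wins. Halve the rewrite to 4 weeks and option 3 wins at 6 weeks instead. The ordering hinges entirely on estimates which are hard to make with any confidence up front – which is rather the point.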

To cut a long story short: it doesn’t matter much which approach you take. Sure, some may take more time than you strictly needed, but what actually matters is addressing the problem.

Photo by pedrik via Flickr.


Continuous Integration Is Not Enough

“Do you have any experience of Continuous Integration?” seems to be a relatively common interview question in software development these days. It makes sense – checking that unit tests still pass, and that the product builds as expected with every commit to the main development line, ensures that only small amounts of effort at a time are required to maintain software quality, and that it doesn’t all need to happen in one large block after development is complete.

However, with the Release Early, Release Often philosophy, software development is moving beyond CI, and towards Continuous Delivery (ensuring every change is deployable to a production environment) or even Continuous Deployment (where every change is in fact automatically deployed to a production environment). One of the drawbacks of RERO can be that users need to update their software more often, but in web applications this doesn’t come into play. This means Continuous Delivery and Continuous Deployment are perfectly suited to web applications.

Continuous Delivery and Deployment bring several advantages. For example, developers or systems administrators spend less time building and deploying, as the process is largely automated, and it becomes easier to detect which change introduced a bug (especially ones which adversely affect performance). Additionally, features and bugfixes are released as soon as they are ready – they do not need to wait until the release window if they’re ready early, or wait until the next release cycle if they need another day. Similarly, releases are not held up due to an issue on a specific change. This means customers get features faster, and can provide feedback and be more involved in the development process.

These advantages do come at a cost, as enabling a software project to move to Continuous Delivery and Deployment can be a complicated process, depending on the state of the product. To start with, you will need a DVCS such as Git in order to separate each development or fix into a separate branch. This will also help with merging. Trying to do it on Subversion will kill you. If you’re not already using a DVCS, there’s going to be a learning curve.

Your life will also be easier if you’re using cloud computing to host your web application. It will mean that deploying a new version can be done by creating entirely new servers to deploy the application onto, instead of offlining/updating/onlining servers in sequence. The advantages here are that two versions can run concurrently, and that rolling back a release is as simple as offlining the new servers. It also means that you’re temporarily increasing capacity during a deploy, instead of temporarily decreasing it, and deploys are faster, which is essential when deploying often. Again, moving to cloud hosting may involve a learning curve.
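Purely as an illustrative sketch of that style of deploy – the CloudClient class and every method on it below are hypothetical placeholders for whatever your hosting provider’s SDK actually offers – it might look something like this:

    <?php
    // Illustrative sketch only: CloudClient and all of its methods are
    // hypothetical stand-ins for your hosting provider's real SDK.
    $cloud = new CloudClient(getenv('CLOUD_API_KEY'));

    $newVersion = '2.4.1';
    $oldServers = $cloud->findServers('my-web-app');            // currently live
    $newServers = $cloud->launchServers('my-web-app', $newVersion, 4);

    if ($cloud->healthCheck($newServers)) {
        // Both versions are briefly running concurrently; switch traffic over,
        // then retire the old set.
        $cloud->pointLoadBalancerAt($newServers);
        $cloud->terminate($oldServers);
    } else {
        // Rolling back is as simple as offlining the new servers.
        $cloud->terminate($newServers);
    }

The details will vary wildly by provider, but the shape – new servers up alongside the old, traffic switched once they check out, roll back by discarding the new set – stays the same.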

Finally, there will be a cost involved in the setup or creation of deployment tools, as there are likely to be significant changes in process here (configuration, scripts, builds, etc.). Along with hosting, this might require a dedicated systems administrator or a move towards DevOps. Again, there is a learning curve and/or a cost to this.

With all this in mind, perhaps it is time to start asking whether candidates have an interest in, or experience of, Continuous Delivery instead.

Photo by Noah Sussman via flickr.


The Cost Of Replacing Staff

An article in The Telegraph earlier this year caught my eye. I’ve mentioned once or twice that staff turnover, and the resulting knowledge loss, can be costly. The article provides some numbers which are really quite startling:

The average fee for replacing a departing staff member is £30,614, says Oxford Economics and income protection providers Unum. This figure comprises two typical amounts – £5,433 for logistics, such as agency fees and advertising, and wages during the time when a new employee is yet to reach optimum productivity level, believed to be an average of 28 weeks at a cost of £25,182.

And specifically for IT:

IT and other technology is most affected by high staff turnover. The overall sector figure is approximately £1.9bn per year. Workers take more than seven months to reach their peak, at a cost of £31,808.

With most probationary periods lasting three to six months, this means that on average a developer will reach the end of probation before reaching optimum productivity. Consider also that the average salary for a software developer in the UK is £30,000. This means the average cost of replacing a developer is higher than the cost of paying that developer’s salary for a whole year. Over the course of a year, losing one staff member a month would cost a staggering £367,368, or £381,696 in the case of losing one developer a month. Depending on the size of the company, this can be an expense it simply cannot absorb.
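For clarity, those annual figures are simply the per-person averages quoted above multiplied out:

  • All sectors: 12 × £30,614 = £367,368
  • Developers: 12 × £31,808 = £381,696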

According to Forbes, employees changing jobs can expect a salary increase of between 10 and 20%. Retaining those 12 departing employees with an equivalent raise, then, would cost between £36,000 and £72,000. It’s still a lot of money, but it’s only a fraction of the cost of replacing them. Such a large gap also leaves the company room to address other issues that aid staff retention – employee benefits, training, work environment, morale, and so forth.
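Again, that range is just the 10 to 20% increase applied to the £30,000 average developer salary:

  • 10% rise: 12 × £3,000 = £36,000
  • 20% rise: 12 × £6,000 = £72,000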

Photo by cobalt123 via flickr.


The Importance Of… Roadmaps

I started off the Importance Of… series to write about some aspects of software development which are sometimes overlooked. In what was going to be my last post, I mentioned roadmaps in passing. In particular, I said that customer-led developments shouldn’t affect or obstruct the roadmap. The rationale behind this is simple – the roadmap defines where you need to take your product; it effectively defines the primary development objectives. Anything detracting from your primary objectives will by definition be an obstacle for your product.

In the first place then, a roadmap will force you to think about the priorities, risks, and rewards of developments planned for your product. In turn, this clarifies the direction of the product – where it’s heading and what it’s aiming to do. And direction is invaluable when it comes to engaging with people, both internally and externally.

Externally, providing a roadmap to customers will boost confidence. It demonstrates that there are objectives, a thought process, a plan. Existing customers know what to expect, can start planning for those deliverables, can see progress as roadmap items are delivered. Potential customers can determine whether it’s the product they’re looking for now or in the future, can see it’s an active project with progressive thinking. A roadmap shows your customers that you are committed to improving your offerings.

Internally, that direction is perhaps even more important. It ensures that everyone is working towards the same goals. Aligned objectives mean less friction, and more clarity on what to work on now and next. From a software development perspective, it means it is easier to keep multiple development teams in sync, and easier to plan and schedule work. Dependencies across teams can be identified and taken into account.

For developers in particular, having a roadmap can make architectural decisions a less daunting proposition. People sometimes refer to future-proofing a feature or development. While it is technically impossible to future-proof a software product (the 640K quote, while a misattribution, remains a great cautionary tale), knowing where a product is headed can certainly reduce the number of problems you encounter in the future. It allows developers to consider flexible approaches, discard ones which will not work when other developments come into play, or even delay certain aspects until they can be kept in step. [As an aside, the term future-proof isn’t great for software development – can we use future-harden instead?]

A roadmap is an essential part of any company strategy. Without one, you are lost.

Photo by Scorpions and Centaurs via flickr.


The Importance Of… Listening To Developers

Everyone’s heard that you should listen to your users, right? After all, if you don’t, you risk alienating and losing them. My particular favourite take on this is Jeff Atwood’s, of Coding Horror, Stack Overflow, and now Discourse fame (and if you don’t know who Jeff Atwood is, where have you been?). He wrote a blog post called Listen to Your Community, But Don’t Let Them Tell You What to Do. You can see this way of thinking in a lot of successful websites and applications – if Facebook hadn’t followed this path, it would have died years ago after reverting every change users complained about. In a way, this also applies to software products where customers pay for developments – by all means allow this, but don’t let that affect your roadmap or you will end up without a product.

Somewhat less popular, but still going strong (and perhaps growing stronger with the appearance of shows like Undercover Boss), is the idea that you should listen to your employees. An interesting take on this can be found in an article by Matt Linderman (of 37Signals/Basecamp) called Marketing to your own team:

You’re not just sending out a message externally, you’re sending one out internally too. If your employees don’t believe it, the whole plan falls apart.

It’s not good enough to just have meetings where employees talk to you – it’s all just platitudes if you’re not engaging in a conversation. Listen to your employees, market to your employees, or they will lose interest.

Of course, this is just one of the reasons you should listen to your developers. From a technical perspective, and in the absence of empirical data (such as usability studies), developers should be your first port of call when it comes to figuring out how to develop your product – they know your product inside out, they are invested in it to at least some degree, and they have an interest in what you’re doing (at least, I find web application developers have a general interest in the web). If they didn’t, they wouldn’t be there. Their knowledge and input are valuable, whether it’s suggesting improvements to customer requests or weighing in on strategic direction, because they understand the expectations of software products.

Ignoring developers’ input risks alienating them. It can demonstrate a lack of trust, and a removal of responsibility. And employees who feel alienated, or without responsibility, will soon leave. High staff turnover will in turn lead to other problems. Knowledge loss, leading to difficulties in maintaining and developing the products, is perhaps the most obvious. In the highly interconnected development community, it can also make recruitment difficult, which will make any recovery harder to achieve. And if the turnover cycle is short enough, you will end up unable to promote internally. For example, junior roles tend to last 2 or 3 years, and mid-level roles anything from 3 to 5 years. This means senior roles are generally available only after 5 to 8 years. If you’re losing developers after just 2 or 3 years, the knowledge loss will be severe, you’ll be spending a lot of time on training which you don’t get to see a return on, and you’d better hope the code is very well documented.

Photo by Ky via flickr.


The Importance Of… Process

Early this year, Jesse James Garrett tweeted about process:

This appears to contradict the title of this post, so to give you some more context, I should explain that Adaptive Path is a user experience design and consultancy firm, and show you follow-ups and replies in the Twitter conversation that followed:

From a user experience consulting and design perspective, this makes perfect sense – analysing what each particular customer and their users need, and how they interact, will by necessity be different every time. However, it also hints at the inherent complexity that you will sometimes find in web application development. Because from a software development perspective, you need to be Schrödinger about process.

To clarify: at a high level, you need to ruthlessly follow the software development process – gather requirements, design, implement, test, release, repeat. Sometimes, there will be pressure to skip one of the steps. For example, starting implementation without designing first. As I wrote last week, this will cost you. Similarly, in Agile environments, and at a lower level, there is sometimes pressure to change the definition of a Sprint once started. This misses the point of an Agile approach – it is between Sprints that priorities should be changed, and the incremental iterations will produce the desired result. Changing priorities within a Sprint will confuse the team or corrupt work allocation, potentially elongate the Sprint and delay the release, or even produce unreleasable code – it moves you away from “Release Early, Release Often”.

At the same time, you must constantly refine, or change, your process. In Scrum, this is done at the Sprint retrospective, by asking what went well and what could be improved. The theory is to continue doing what went well, and to change or avoid what didn’t. The complexity I mentioned comes into play here. There needs to be context to what you continue to do, and what you stop doing. If implementing a design from a third party didn’t go well, the answer is not necessarily to avoid third party designs altogether – it may be that you need to avoid that particular third party, or that you need to discuss the design requirements with third parties, or any number of possibilities particular to the situation.

Different aspects of product development will necessarily have different processes for achieving their goals. In web application development, requirements gathering for the design, functionality, architecture, and user experience/accessibility/usability of the product could all have different processes. Coordinating the implementation aspects and processes is logistically complex. If the processes which are in place are not followed, it is entirely possible that that coordination will be lost, and your team will end up at a standstill.

Photo by Matthew Burpee via flickr.


The Importance Of… Specs

Have you ever built a house? It’s an analogy often used when developing software. So, have you? Me neither, but one of the things that most people understand about building a house is that you need architectural plans. Floor plans in particular will tell you the dimensions of each room, detail fixtures and electrical items, note finishes and construction methods, and so on. They don’t just tell you where a door is, but also which direction it will open in. It means that when the construction crew turns up, they don’t need to ask how many bathrooms you want, and where you want to put them. This is all perfectly logical. No one would be accused of stalling for waiting until the floor plans are available.

For some reason, this doesn’t translate to software development in quite the same way. Sometimes feature implementations will be requested without more than a description of an idea. Developers will receive it and have hundreds of questions, because an essential consulting and requirements gathering stage has been skipped, and there is no spec for them to implement.

Why does this happen? Mainly because, like all analogies, this one breaks when stretched. Developing software is often not like building a house. Moving a wall is costly – that is easy to understand, because you can see the physical effort involved. In software, though, moving a wall does not afford the spectator quite so clear a view – it can be difficult for someone without a development background to understand why the changes they’re asking for are complicated and expensive to undertake. After all, if repainting the walls is as simple as changing a hexadecimal reference, why should moving the kitchen to the back of the house be any more difficult?

Another problem that seems common is a basic misunderstanding about agile processes. Look up “agile” in a dictionary and you’ll see it described as flexible, nimble, acrobatic. This is what people understand when you say you’re agile, and not so much the idea of iterative development processes. “Well, they’re flexible” they’ll think, “we can always change it later”. And yes, you can tweak things as you go along, but putting the basement in the attic isn’t really a tweak.

Perhaps more importantly, there is a fundamental difference between commissioning your dream house and requesting a change on a software product. The former is often a deeply personal affair, in which the customer is willing to invest their time as well as their money. Getting a customer to invest their time in defining the exact behaviour they want their application to have appears to be much harder – after all they’ve outsourced the work for a reason. I’m told some companies will even reject customers who aren’t willing to invest their time in this way.

Some time ago, Steve Yegge wrote an absolutely brilliant blog post entitled Have you ever legalized marijuana? If you haven’t read it, please read it now. Go ahead, I’ll wait. Done? Good, isn’t it? Steve makes a very strong point in the article – that what seems like a simple idea, which at first glance makes perfect sense, may turn out to be completely insane when it comes to implementation. If customers write a spec they might see this – the process of writing the spec will force them to think about how it might all work, and they might quickly come to the conclusion that it’s not going to work. More often than not, however, the job of analytical thinking will fall to a development team. This tends to make sense, as this is something developers are well equipped to do, and which they are good at.

But what happens when developers have to think about how to move the basement into the attic, or their feedback isn’t given due consideration? Steve mentions something which isn’t necessarily front and centre of the article, but which I’ve found to be true:

Because that kind of shit happened at Amazon pretty much every week I was there, for almost seven years. (And astonishingly, we actually managed to launch at least half those crazy ideas, by burning through people like little tea lights.)

You will burn out your developers. Working on insane projects will lead you to insanity. So not only will you have to deal with the logistical nightmare of working without a plan, suffer the delays and extra expenses that that will incur, and strain the relationships with your clients, you will also find yourself having to recruit more often (and consequently spending more time and money on training).

If you take away only one thing, make it this: if you set off without a spec, you will never know whether you’re delivering what was expected. I know, I know – sometimes it feels like the way to go about a feature is to start coding and see what’s possible (I’ve done this myself). But even in a healthy agile environment, a spec will provide you with a solid platform to start off from. Spend the time to prepare a spec and discuss the issues around it upfront, and save yourself the strife.

Photo by Will Scullin via flickr.


The Importance Of… Testing

Given the importance of testing a software product, it is surprising to still find people who consider it a menial task. Despite testing being a defined stage in the software development life cycle, and increasingly sophisticated tools and approaches, some still believe that testing is simply about “checking it works”. Perhaps this is more prevalent in web application development. After all, anyone should be able to use a website, so perhaps these people think that testing is just about using the site and finding things that don’t work.

Of course, this doesn’t paint an accurate picture of what testing can or should involve. A test engineer working on a moderately complex web application can find many ways to spend their time:

  • Unit testing (although it is more common for developers to write their own unit tests, there can be advantages to having a test engineer write them separately)
  • Integration testing
  • Functional testing
  • End-to-end testing
  • Regression testing
  • Load/stress testing
  • Usability testing
  • Browser testing

It is this last item that seems to be the focus in web application development, to the detriment of everything else. All areas of testing require some specialist technical knowledge, so it can be irksome to find browser testing categorised as something anyone can do, particularly when this area can be a world in and of itself.

For example, support for multiple browser versions will increase testing time dramatically (particularly when Internet Explorer is involved), and will require a more advanced understanding of the behaviour of, and technology support provided by, each of those versions. Similarly, mobile or responsive support (almost certainly a requirement today as mobile use increases) will require an understanding of the particular idiosyncrasies of each platform, device resolutions, and the design changes across breakpoints.

All of this is required to ensure that the product being released works as intended in every way. A bad release can turn your customers and potential customers against you, or cause financial penalties when SLAs are not met. Insufficient testing can also mean more time is wasted in solving an issue – investigating an issue 2 months after the development is much harder than doing it the same week, as the developer will have forgotten the details of the implementation and will need to re-investigate. It will be even harder if the original developer of the feature has left. Skipping load testing can put you in an awkward position at peak times, while skipping usability testing might mean you find fewer users than you expected.

Lack of proper testing will also pit the development team against you – they want a product they can be proud of, and they will realise, even if you don’t, that skimping on the testing will cause them more headaches and create more unnecessary work. No one wants to stay late at the office to prepare an emergency release for an easily preventable problem.

So please, hire test engineers, and spend the time to test thoroughly. A good test engineer will be a very valuable asset across every aspect of the product’s reliability and behaviour, and will save you money in the long run.

Photo by Pascal via flickr.


The Importance Of… Cross-Functional Teams

In this series of blog posts I will be exploring what I feel are some of the important aspects of software development management, particularly in relation to web application development.

A cross-functional team is generally defined as a group of people with different expertise working towards a single goal or objective. Within software development, this means a team of people who will collectively hold all the skills required to develop, build, and deliver the product. For example, when developing a web application, you would want the development team to have expertise not just in coding in the programming language of choice, but also in databases, systems, usability, UI/UX/front-end design, front-end development, testing, technical writing, and so forth.

The shared single goal and the very nature of cross-functional teams make the greatest advantages of the approach seem fairly self-explanatory, yet difficult to describe. Perhaps Mary and Tom Poppendieck summarised it best in their book Lean Software Development. They described seven types of waste in software development, and cross-functional teams can help with all of them:

  • Partially done work, or work which has not yet been delivered. For example, work which is yet to be tested. Cross-functional teams minimise the effect of this by removing external dependencies – continuous testing can take place, and in an Agile environment delivery is made at the end of each iteration, ensuring partially done work is not left waiting.
  • Extra features, or providing more features than have been requested. This is particularly pertinent when comparing Waterfall and Agile methodologies – in Waterfall this will not be apparent until delivery, whereas with Agile methodologies there will be more opportunity to act. A cross-functional team can help by providing domain experts who can assess when to stop working on a feature.
  • Relearning, or reinventing the wheel. A cross-functional team can share knowledge and prevent the team from working on a problem which has already been solved.
  • Hand-offs. Handing off work to a third party will suffer from two problems: the inevitable documentation, and a potential degradation of knowledge. Within a cross-functional team, less documentation will be required as there will already be prior shared knowledge, and there will similarly be less room for misunderstanding. All this will lead to fewer delays.
  • Delays. As alluded to above, cross-functional teams will reduce delays caused by communication with, and waiting on, external parties, but also those due to, for example, differing priorities.
  • Task switching. Interruptions can be minimised when work can be coordinated within a single team, without external dependencies.
  • Defects. Not just bugs, but misunderstanding the functionality and delivering the wrong thing. A cross-functional team is better suited to defining the functionality, and involvement from all team members means better test coverage.

There is a certain amount of disagreement on whether Scrum should include specialists in defining a cross-functional team, or whether cross-functionality should be achieved by having team members work in areas outside their area of expertise. In practical terms, I’ve found it to be a little bit of both, but either way I think it’s clear that cross-functional teams are important.

Photo by Simon Liu via flickr.


Building A Twitterbot

Back in June 2012, I was watching Esc and Ctrl – a series of videos from Jon Ronson for The Guardian, in which he investigates attempts to control the Internet in some way, shape, or form. Some of the videos dealt with Twitterbots, and they piqued my interest in how the bots could generate responses from real Twitter users. As I had some free time at lunch, I thought I’d investigate how difficult writing a Twitterbot could be.

As it turns out, writing a Twitterbot is remarkably simple. This is probably testament to Twitter’s API – as far as I could tell, pretty much everything I would want to do as a user was also available via the API, and the rich data provided by Twitter was well suited to an application which would effectively parse it in order to generate a response. Additionally, there were plenty of libraries readily available for use in a variety of programming languages, so the barrier to entry was very low. I decided to write one, but I needed an idea for a bot.

Some 5 years earlier, I had read The Raw Shark Texts by Steven Hall. For those who haven’t read it, a brief synopsis: a man named Eric Sanderson is chased by a conceptual shark called a Ludovician, which feeds on words, language, human memory, and the intrinsic sense of self. Eric tries to throw the shark off his scent by travelling through unspace – forgotten tunnels and abandoned buildings – and by using other people’s information, writing, and dictaphone recordings. Eric wraps himself in a chaos of others. (As an aside, if you’re interested in the book, I recommend a physical copy, as it is beautifully styled with typography pictures which might lose something in an ebook format).

This felt like a perfect fit for a Twitterbot – what if the Ludovician was searching for Eric on Twitter? A drop of information in an ocean of data would attract it, and there was scope for creating a bit of a mashup. I set off to write it, using PHP for quick development. First, I created an outline of what the script would do:

  1. Search Twitter for references to “Eric Sanderson”, “ludovician”, “luxophage”, or “cognicharius” – these would be rare enough to be people talking about the book, or an exact name match which might entice people to investigate further
  2. For each of the resulting matches, follow any users which weren’t already being followed, store the geo location (if available), and issue a reply to their tweet
  3. Every couple of dozen tweets, average out the locations collected to provide a position for the shark, and give the likely location of Eric based on the number of matched tweets per location
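Stripped right down, and purely as an illustrative sketch – the $twitter and $db objects and all of their methods below are placeholders rather than any real library – a script along those lines might look something like this:

    <?php
    // Illustrative sketch only: $twitter stands in for whichever Twitter API
    // library you choose, and $db for wherever state is persisted between runs.
    $terms = ['"Eric Sanderson"', 'ludovician', 'luxophage', 'cognicharius'];

    foreach ($terms as $term) {
        foreach ($twitter->search($term) as $tweet) {
            // Follow any users which aren't already being followed.
            if (!$db->isFollowed($tweet->user->id)) {
                $twitter->follow($tweet->user->id);
                $db->markFollowed($tweet->user->id);
            }

            // Store the geo location, if the tweet has one.
            if (!empty($tweet->coordinates)) {
                $db->storeLocation($tweet->coordinates);
            }

            // Issue the next reply in the (increasingly stylised) sequence.
            $twitter->reply($tweet->id, $tweet->user->screen_name, $db->nextResponse());
        }
    }

    // Every couple of dozen tweets, average the collected coordinates to place
    // the shark, and weight the locations by tweet count to guess where Eric is.
    if ($db->tweetsSinceLastReport() >= 24) {
        $twitter->tweet($db->sharkReport());
    }

Rate limits, error handling, and duplicate detection are left out, but that is essentially the whole outline.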

The responses that the bot would issue would start off as relatively descriptive text, and I was hoping to provide increasing measures of stylised text, much like the book offers, with some specific quotes. This is difficult on Twitter, as ASCII art is limited by the formatting, and there is little control of this on the Twitter site or clients. For this reason I decided to use a technique which was used to great effect on YouTube – superscript and subscript characters. The full list of responses I prepared was as follows:

  • Pattern recognised, following
  • Reincident pattern, targeting
  • Reincident pattern, circling
  • Reincident pattern, increasing priority
  • Memorising pattern
  • Pattern memorised, attempting to locate
  • Location compromised, reverting to data collection
  • Target acquired, deploying luxophage: ·~~-=<{((©@) [defined in the book as a parasite that prevents humans from thinking clearly]
  • Awaiting luxophage attachment
  • Preparing for second stage

  • ȇ̑̐ͫ͗ͣ͒͐͠͏̭̱͕̟̝̀͘͡y͒ͩ­̵̷̜̞̼̼̱͓̬̂͒̉̌͐ͣ̾ͨ͡ͅȩ̧̛͙̤̪̞͕̰̃̓͋̈ͩ̒͗̓ͥͫͧ O ̵͓̳̗͓͖̉͛̆ͩ͐̈́͐͗̂̄ͅe̘͕̞͖̼̘̞͊̑̆̓ͩ̿̈̀y̵̴̛̻̯̟̯̩͒͐̄ͦ̓̆̓͠­̹̣e

  • t̤̦͚̭̙͉̒ͯ̍͂͌̈́̍̌͂̿͘ẽ̝͉̩ͧͯ͗ͣ͢e­̲͍t͙̠̣̽̀̈́̚’ͤ͆ͭ̓ͣ̓ͬ҉͉̜̟̩̯h̲͕̋ͮ ̦̯ͅ ͮ̌̎ͬ̿͆<͕­̘͈V̑̿͋̽͢V̦̻ͧͥ̽͗̎ͩ͟V̹̲͚̗̺͞V͎̬̦͞V̝̥͕ͧ͞>̡̣̯̠̱̪͈ͩ

  • for̽g͌ö̜̯̟́ť͚̱̗̺t͎̝̬̦͂e­̲̼̼̳͕̦̯̚ͅn͕ͣs̯͉̓o̦̜̿͋̽m̟̃ͧͯ͗ͣͅẽͬ̿͆ͩ̎̂͡’͕̯̑̑t͕h̰i̯̟n͚̗̺g͎̬̦i̲͕̦̯ͅmp̷̜o͙̤̪͛̆ṛ̡̯̑̆ͩt̲͕̳̗̦̯ͮ̌̿ͅͅ

  • ̵<­ca̤̦͚̭̒ͯ̍͂͌̈́̍ͅpͯḯṯ͆a̜̞̼̼̬ͅl̷ͦ͗ ̭̦̯ͮ̎̓ͥ ̭̦̯ͮ̎̎͋ͅ O̭̦̯̎̎͋͡ͅ ̭̦̯ͮ̎̎͋ ̭̦̯ͮ̎̎ͥͅ o̵͓͖̺ͦ͗ͅpen>­̵

  • g̰̰̰͂a̰̰̰̰̰͂͂ḭ̰̰̰̰͂͂͂n̰̰̰̰̰͂͂͂͂ḭ̰̰̰̰͂͂͂͂n̰̰̰̰̰͂͂͂͂͂g̰̰̰̰̰͂͂͂͂͂ O̦̦̦̰̎̎̎ f̰̰̰̰̰̰͂͂͂͂a̰̰̰̰̰̰͂͂͂͂͂s̰̰̰̰̰̰͂͂͂͂͂t̰̰̰̰̰̰͂͂͂͂

  • f͚̭͌̈́̍̌i̿ͩ͘͟r̻̯̟͠s̈́͐͗̂̄t͓̳ͅ ҉͉̜̟t̷̜̞͡h͓̬̃̓͋̈ͩͅi̧̛ͥͫͧn͙̤̪g̼̼̱̉͛̆ͩ͐̈́ͅs͕̟̐ͫ͗ͣ ̙͉̿͘f͕̞͖͐̄ͦi̼̱̓̆̓ͅr͏̭̱͕̟̀͡s̤̪̞͕̰͞t̠̱̪̑͋̽
  • h͙̠̣̽̀̈́’a̞̼̼̱ͣ̑͗ͅṛ̡̯̠ͬḑ̧̛͙̤̱̪ͩ ͏̭̱̀͘͡t̳̗͓͖ͧͥ̽͗ͅo̘͕̞͖̼ͦ̓̆̓ ̓ͬ҉͉̜̟s͚̭͂͌̈́̍ḙ̱͕̟ͤ̿͋̽e̲͕̦̯ͤ̑̐ͫ͗ͅ
  • Â̵͓̄ͅq͕̟ͯ͗ͣṳ̪̞ͫ͗ả͛̆ͩr̞͖̼ḯ͙̠u̞̼ͭ̓ͣm’̠̣ͤ͆ ̿͆w̲ͬ̋ͮǎ̧̧͙̎s͓̬̃ͧͯͅ ͕͞g̵̷͛̆ͩ͡i̘͈̓̆̓g̳̗͛̆ͩͅa͕̟̝͋̈ͩṋ̱ͣ͒͐t̟̯̩ͧͥi̱̪͒̉̌c̯̠̱ͣ̾ͨ
  • ȗ̟̝̐n͕ͤ̍͢r̼̼ͯ̈́a̱͕ͭ̍v̟̝ͬ̌e͕̞͖ͣ͗l̦̯͘͡ͅl̘͕̞̒i̠̣ͬ̌̎n̲͍ͤ͆gͩ­͚̭̂ ͏̀t͙̤͞ḧ̤̦́͐͗o͛̆ͩu͓͒̉̌ͅgͧ̽­̘͈h̠̱̪ͮ̌ț͚͒ͩ ҉͉̜̟d’̡̣̯ř̦̯a̼̱ͯg̭ͯ
  • ˙ǝʃqıssod ʎɐʍ ʎɹǝʌǝ uı uɐıɔıʌopn⅂ ǝɥʇ oʇuı ‘ǝɟıʃ ʎW

The final response was based on a separate “fragment” of the novel, which fans of the novel were more likely to have read, and made reference to the Ludovician and Eric becoming tangled.

I had initially considered the idea of taking the geo coordinates and doing a reverse lookup with Google Maps in order to get a near exact address to use in the response, but given the responses planned, I felt it might come across as a little too creepy/menacing.


The only thing left was to create a Twitter account for the bot. @Ludovician was unfortunately already taken, so I decided on @Cognicharius – the name of the family the Ludovician purportedly belongs to. I used an ASCII art creator to generate an image from a shark photo as its avatar. I set it loose on June 14th 2012, just four lunchtimes after starting.

The bot lived for approximately one year – until the 1.0 Twitter API was retired in June 2013 in favour of the 1.1 API. I didn’t upgrade the bot as I felt the experiment had run its course. In its year of life, @Cognicharius had tweeted 738 times, followed 310 Twitter users, and garnered 91 followers (far more than I can claim). It earned several retweets (including some from Steven Hall himself), generated some amusing responses from people as you can see below, and even got someone to create a Google Map tracking the location of the shark.

Screengrabs of Twitter conversations with @Cognicharius

Maybe the next bot will just have to message all those users about this blog post.

Photo by staxnet via flickr.