The Great Node MPRIS Project

I think one of the things that makes me different from other people is that it really bothers me when things don’t work correctly. I feel a compulsion to fix things when I see that they’re broken. As I’ve written about in the past, it’s not glamorous work to be a bug fixer. You don’t get the same credit as the original author. But it’s still important work to do and I find it oddly satisfying to put things back into their intended order.

The Bug

This project started with a bug on my issue tracker for Playerctl that was submitted two years ago. Media players implement a standard protocol on the Linux Desktop called MPRIS which is used for desktop integration. This allows things like the media keys to work, and the desktop to have widgets that allow you to see what song is playing, adjust the volume, and things like that. Playerctl is a utility people use to make their own desktop media player integrations.

When I built the media players affected by the bug and tested them, I found that the bug was in their code and there was nothing I could do on my side to make this work. This makes things a lot more complicated for me. It’s a lot more difficult to understand the inner workings of code that you didn’t write. And since these are established projects, I would have to communicate clearly what needed to be done and make the fixes in the least intrusive way possible so people would accept the fix. There is a whole established etiquette for this within the open source community that needs to be followed in situations like these.

The Broken Library

What the broken media players have in common is that they all have a dependency on a library called mpris-service. I was really lucky here because the owner of the library is someone who I have worked with a lot in the past, Simon (emersion), who is an amazingly talented and responsive open source developer. We met in person about a year ago at a hackathon for Sway.

On his issue tracker, I found all the same issues. Only the very basic features of MPRIS were working and everything else was broken. I was surprised that the library had gotten such wide adoption in the state it was in. Three major media players were using it despite all the bugs, and no progress had been made on the issue for years. I decided to make this my responsibility to help out a friend with a buggy library he didn’t have time to fix (because he’s busy doing other amazing work), for the users of Playerctl, and to improve the Linux Desktop environment.

The Next Broken Library

But it turned out that the bug wasn’t in Simon’s library either. He was using a library for the underlying protocol of MPRIS (called DBus) which simply wasn’t working correctly. It didn’t have support for the data types that are used in MPRIS. And further, both the implementation and the user interface were very bad because it uses platform-specific code written in C++ which makes it less portable across systems. This introduced some build errors in the media players, which they got around in various hacky ways with their own forks of the library. This definitely needed to be fixed.
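To give a sense of the kind of data types involved, here is a minimal sketch of an MPRIS Metadata value, which has the DBus type a{sv} (a dictionary from strings to variants, where each variant carries its own type signature). The post does not say which types the old library was missing, so this is only an illustration of what a DBus library has to represent. The track path, title, and artist are made up, and check_variant is a hypothetical helper that only knows the four signatures used here.

```python
# Illustrative only: a typical MPRIS "Metadata" value. Each entry pairs a
# DBus type signature with a value. All concrete values are invented.
metadata = {
    "mpris:trackid": ("o", "/org/example/track/1"),  # object path
    "mpris:length": ("x", 215_000_000),              # int64, in microseconds
    "xesam:title": ("s", "Example Song"),            # string
    "xesam:artist": ("as", ["Example Artist"]),      # array of strings
}


def check_variant(signature, value):
    """Return True if value is plausible for one of the signatures used above.

    Hypothetical helper for this sketch; a real DBus library handles the
    full type system, not just these four signatures.
    """
    if signature in ("s", "o"):
        return isinstance(value, str)
    if signature == "x":
        is_int = isinstance(value, int) and not isinstance(value, bool)
        return is_int and -2**63 <= value < 2**63
    if signature == "as":
        return isinstance(value, list) and all(isinstance(v, str) for v in value)
    return False


if __name__ == "__main__":
    for key, (signature, value) in metadata.items():
        print(key, signature, check_variant(signature, value))
```

A library that cannot express a variant nested inside a dictionary has no way to send a value like this one.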

The problem though was the DBus library was just not written in such a way that it could ever support MPRIS. Also, the owner seems to have abandoned the project and is no longer taking submissions for fixes. It was then I realized why this hasn’t been fixed. This was going to take a lot of work.

There was some discussion about using an alternative DBus library called dbus-native which had gotten some support by the library users. This path seemed promising because this library was much cleaner than the other DBus library and didn’t require compiling platform-specific C++ code. So I set out to make mpris-service work with this new library.

This didn’t work either. While dbus-native has great internal features, the user interface for creating DBus services did not support some very basic features I needed to implement an MPRIS service, and adding them would require a very extensive rewrite of the top layer of the library.

My Very Own DBus Library

Since I knew this was the only way to get this bug fixed, I did this rewrite and submitted a pull request on the dbus-native project. This pull request remained open for a few months before I realized that it would probably not be merged. This is totally understandable because lots of old code depends on this library that could break with my changes, and reviewing the code is a lot of effort that I couldn’t expect someone to do just to help fix my silly Playerctl bug.

So I decided to fork the library with all my changes and release it as a new library called dbus-next. I also fixed a lot of other bugs and added an integration test suite for all the new functionality that has very good coverage. So now NodeJS finally has a working DBus library. Great.

After that work was done, I then rewrote mpris-service to use my new library and everything worked great.

Media player implementations

Now that the mpris-service library works, people are starting on implementations of MPRIS on media players written in NodeJS and I’m doing my best to help out.

Now it’s possible for all these media players to support Linux Desktop integration. And when that work is done, I can finally close that Playerctl bug on my issue tracker.

Impressions of AlphaStar

Recently I heard that DeepMind has turned its attention towards making a StarCraft II bot in a similar way to how it made AlphaGo, the bot that recently proved to be capable of playing the game of Go on a very high level. The SC2 bot turned out to be really good as well. It beat two excellent pro players in decisive victories in a series of five games. StarCraft is a series that is very dear to me. I first picked up the original game in the late 90s when it came out and have at some times played at a fairly high level. A lot has been said about the games and I’d like to add my perspective about the performance of AlphaStar.

How does AlphaStar work?

I’m not an expert in machine learning, and the details are quite dense, so if you want an actual technical explanation, check out their whitepaper if you’re up for it. Otherwise, enjoy my very naive attempt at understanding how this works.

The bot plays programmatically through a headless interface that’s pretty much like a human would use. The domain of possible actions is pretty daunting considering how many places on the screen there are to click. However, it probably simplifies a lot at higher levels of reasoning. If the idea is “move the Stalker away from the Marines”, the exact angle at which that happens is probably not very important, and you really only have maybe like three or four sensible actions in that case. But still it seems like a pretty difficult technical challenge to overcome.

For the higher level gameplay, they broke the game into a few simple challenges.

  • Mine as many minerals as you can in a certain time
  • Build as many units as you can in a certain time
  • Win a battle against enemy units
  • Find things on the map to kill

These things are essentially what you do while you play the game. They created “agents” that do these things with complicated neural networks with a lot of different parameters to tweak and selected the best ones through a process of training. Then I think they glued these things together and ran them all at the same time and basically got something that plays StarCraft. The mining part mines minerals, the building part builds units, the finding part finds enemies, and the winning part wins the battles. Repeat until you win or lose.

This is a really fascinating way to think about the game. It seems so obvious, but in reality humans are thinking about things in a completely different way. Humans start with very high level plans and then think about execution afterwards sort of like a basketball play. I am going to open with this build order, then try to do a heavy Immortal push, and if I see X then I’ll do Y, etc. How does a person come up with a crazy idea like this? Who knows. Machines can’t seem to think this way though.

Whatever high level plans you think the machine is thinking of just seems to emerge out of the details. It’s sort of like when you see a V formation of birds in the sky. They don’t all get together and decide to fly in that formation. It’s just the easiest thing to do because flying like that cuts down on wind resistance, and any bird who doesn’t do it won’t be able to keep up. The machine looks at the details of the situation, and then estimates the probabilities of certain actions (actually to the end of the game) and then picks the action with the best chance of winning. You can really see this at work in the bot’s play style.

How does AlphaStar play?

With such a different approach to the game, AlphaStar naturally has come up with some completely new strategies for playing. A lot has been made in particular about two of its behaviors.

For one, AlphaStar will almost always overbuild workers in the beginning part of the game. It builds about 20 to a human’s 16. This is definitely the most practical result I’ve seen come out of the project because it’s something that a human being can easily copy and test to see if it works. This actually makes a lot of sense because it’s an easy way to counter all the early harass that Protoss has that usually picks off about two to four probes anyway. I expect this to become a new standard on ladder. It would be interesting if we still saw the behavior in matchups with less early worker harass pressure.

Another strange thing AlphaStar is doing is not building a wall at the base entrance, which is considered to be a best practice among human players. It’s difficult to interpret this, however. The purpose of the wall is partially to address the very human problem of a slow reaction to an Adept harass, but also to block an Adept shade from getting into the base to begin with by building a Pylon for a full block. It would be understandable to think that the Pylon block strategy would not emerge quickly because it takes quite a bit of high level thinking. So I think people will continue doing this. The machine is after all not perfect.

There are some other strategies that AlphaStar notably does not use. For instance, it does not use Sentries to block ramps and it doesn’t drop. These might also be a bit too complicated to emerge from the limited time they trained the agents.

AlphaStar does however have a very entertaining play style. It micromanages its units perfectly in every situation, sometimes even at multiple locations at once. At one point, it executed a perfect three-pronged Stalker ambush in the middle of the map. Each group of Stalkers almost seemed to be controlled by separate players. Much of human play optimizes for the limited attention of the person, but a machine has no such restrictions. Each Stalker can move out of the way of fire at exactly the right moment to avoid destruction. Seeing the game played perfectly was truly amazing.

This point did, however, receive some criticism. If AlphaStar is trying to teach us how StarCraft should properly be played and the answer is “just have perfect mechanics”, then that is not very interesting. Sort of like how it’s pretty trivial to create a chess AI that can beat a human opponent with only 500ms on the clock. On every tick, AlphaStar has the human equivalent of hours of pondering for each small move.

While AlphaStar did put on a very impressive show, I still found the play style to be very cold and mechanical. I didn’t feel what was described by people who watched the AlphaGo games who thought that agent played in a human-like way. AlphaStar did do some really insane stuff. But it seemed to almost completely ignore the unit composition of its opponent and most of its decisions seemed to be predicated entirely on the assumption of perfect control. For instance, massing Disruptors, as it did in the game against TLO, is a strategy you could not possibly use without perfect control. It’s almost as if it is playing a completely different game than we are. There’s an entirely different set of constraints that the game is simply not balanced for. The same isn’t true in a game like Go which does not reward reaction time.

In fact, a lot of StarCraft II mechanics are specifically designed for the fact that humans do have a limited number of things they can focus on. For instance, Queens do not auto-inject Hatcheries precisely because that would make Zerg imbalanced in the early game. Human-scale focus is baked into many of the mechanics of the game.

What can we learn from this?

My primary takeaway is that machines like this just do random dumb shit until they find something that works. I really like what this company is doing though because I think overall these sorts of things can have a positive impact on our culture. They sort of remind me of Boston Dynamics, a company that seems to be in the business of making random cool things for YouTube videos.

I hope this project can influence game designers to make better competitive games. It puts into focus what machines do well versus what humans do well. I think game designers should take a cue from this to maximize game design for human skills by rewarding high-level thinking and creative problem solving over mechanical mastery. Now that computers are better than us at StarCraft II, the challenge should be to create a game that humans will always be better at. There must be some kind of game like that, right? I’m not sure anymore actually.

At least we know if the Zerg are real and they ever invade Earth, we should have a chance to defeat them now.

Write Drunk Edit Sober

The famous quote “write drunk, edit sober” is often attributed to Ernest Hemingway. When I first heard this, I thought of how I could incorporate this idea into my own creative process and came to the conclusion it is a terrible idea. As a creative professional, I need to come up with creative ideas every day. If I needed to rely on alcohol for my creative process, I would very quickly destroy my health. But something about this idea still rings true to me so I think it’s worth some time to analyze to see if there’s anything we can learn from it not just for writing, but for any creative activity.

Write Drunk

“I hate to advocate drugs, alcohol, violence, or insanity to anyone, but they’ve always worked for me.” – Hunter S. Thompson

The first part is write drunk. The first thing I think about when I hear this is one of my favorite authors, Hunter S. Thompson, and the book Fear and Loathing in Las Vegas. Thompson’s inspiration for writing often comes from altered states of consciousness from the use of drugs and alcohol. Obviously you don’t need to go this far to be a creative person. But there is certainly something about the creative process that makes it feel like a different state of mind than normal waking consciousness. Creativity seems to flourish not by direct effort, but by the suppression of some more rational part of our personality that is responsible for inhibition.

The key insight is that the creative state of mind is sort of like being drunk. If we change this to “write as if you were drunk”, the advice becomes much more practical. Someone who is drunk tends to be bursting with ideas. Most of the ideas are really bad, but there’s a lot to choose from. The drunk person has so many ideas precisely because he doesn’t care whether they’re bad or not. The alcohol suppresses the critical faculty responsible for their immediate evaluation.

We can mimic the creative part of the drunken state by slowing the feedback loop between idea generation and evaluation. An idea that seems bad may actually lead to some valuable path that we may have never discovered if we cut off the line of reasoning too early. For instance in chess, a common tactic is to make a move that intentionally loses a piece, sometimes even an important one like the queen, to gain a positional advantage that will win the game. If we consider the move and then immediately evaluate it, it might seem insane to intentionally lose our queen and the opportunity to win will be lost.
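The chess example can be sketched as a toy search problem. This is only an illustration under made-up assumptions: the idea names and scores below are invented, and the two pick functions are hypothetical stand-ins for “evaluate immediately” versus “follow the line of reasoning before judging”.

```python
# Each idea has an immediate score (how it looks at first glance) and the
# follow-up ideas it unlocks. All names and numbers are invented.
IDEAS = {
    "start": {"score": 0, "next": ["safe move", "queen sacrifice"]},
    "safe move": {"score": 1, "next": ["small edge"]},
    "small edge": {"score": 2, "next": []},
    "queen sacrifice": {"score": -9, "next": ["forced mate"]},
    "forced mate": {"score": 100, "next": []},
}


def best_final_score(idea):
    """Judge an idea by the best end result reachable from it."""
    node = IDEAS[idea]
    if not node["next"]:
        return node["score"]
    return max(best_final_score(follow_up) for follow_up in node["next"])


def pick_immediately(idea):
    """Evaluate each option the moment it comes up and keep the best-looking one."""
    return max(IDEAS[idea]["next"], key=lambda option: IDEAS[option]["score"])


def pick_after_exploring(idea):
    """Hold off on judging until each option has been followed to its end."""
    return max(IDEAS[idea]["next"], key=best_final_score)


if __name__ == "__main__":
    print(pick_immediately("start"))
    print(pick_after_exploring("start"))
```

With these numbers, immediate evaluation settles for the safe move, while delayed evaluation finds the sacrifice because it only judges the end of the line.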

Edit Sober

Another great insight in this quote is that the creative process happens in two distinct stages. There is the drunk stage where you freely come up with ideas without judgement and then there is the sober stage where you pick one of the ideas and start to flesh it out. These two stages are completely different contexts, and switching between them incurs some overhead cost. Knowing when to switch is an important part of being creative.

With this process, the skill of creativity is to recognize a good idea through a process of selection. You become sort of like a music critic rummaging through recently released albums trying to find something to recommend to your readers. Sometimes a great piece of music won’t just jump out at you. Some of my favorite albums required multiple listenings for me to appreciate them. Many good ideas will challenge you to find their value. These tend to be the best ideas though, because if it were obvious, then everybody would be doing it already.

This selection process must be done sober. One of the problems with actually getting drunk is that drunk people make really bad decisions when it comes to selecting something to act on. I think we’ve all had that experience.

Applications to Programming

Since my craft is computer programming, I’ve thought about this quote in the context of what I do. Write drunk edit sober works for writing code too. When you first start on a project, none of the rules for best practices are practical. They just get in the way and slow you down. I make big ugly monoliths, write giant functions, hardcode everything, copy and paste big swaths of code around, and all the other stuff they teach you not to do the first day on the job. I can write code really quickly and efficiently this way because I’m self-taught and this is exactly what I did for the first few years and nobody told me any differently. I made some really beautiful disasters like this.

These days, nobody ever sees my code in this state because after I’ve gotten something basically working, only then do I clean it up and make it into something pretty. After it works and all the details are in place, cleaning things up becomes really easy. The abstractions are neat and pretty because they were made at the last minute by necessity, not up front because of a guess. None of the cruft survives the editing process. Anything I show to anyone has probably been rewritten three or four times just by drunkenly iterating through bad ideas and then polishing up whatever is left.
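As a small, invented illustration of the two stages (not code from any real project), here is the same receipt printer written twice. The first version is hardcoded and copy-pasted but works. The second pulls out the one abstraction that the working version made obvious. The item names and prices are placeholders.

```python
def report_drunk():
    # First pass: everything hardcoded and pasted, but it produces output.
    out = ""
    total = 0
    out += "coffee: 3.50\n"
    total += 3.50
    out += "bagel: 2.25\n"
    total += 2.25
    out += "total: " + "%.2f" % total + "\n"
    return out


def format_line(label, amount):
    """The one helper the first pass showed was actually needed."""
    return f"{label}: {amount:.2f}\n"


def report_sober(items):
    # Second pass: same behavior, with the data passed in instead of pasted.
    lines = [format_line(name, price) for name, price in items]
    lines.append(format_line("total", sum(price for _, price in items)))
    return "".join(lines)


if __name__ == "__main__":
    items = [("coffee", 3.50), ("bagel", 2.25)]
    print(report_drunk() == report_sober(items))
```

Because the first version already runs, it doubles as a check that the cleanup did not change the behavior.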

I really wish language designers would take my workflow into account by providing me with tools to support both stages of work. I think we can split up programming languages (and frameworks) into those that are drunk and those that are sober.

Drunk Languages

  • JavaScript
  • Python
  • C
  • Scala

Sober Languages

  • Rust
  • Java
  • Go

The problem with this dichotomy is that it is very hard to write sober code in a drunk language and it’s very hard to write drunk code in a sober language. This is my main problem with Rust as it is right now. It’s extremely hard to drunkenly iterate with Rust code because it forces you to deal with a bunch of details you aren’t prepared to think about. In Rust, your code won’t even compile unless it’s guaranteed to be thread safe, memory safe, and a lot of other things. Once you actually get your code to compile, it tends to be extremely reliable and safe. But your abstractions are going to be weird because it’s so painful to try new things without slogging through a bunch of details first.

A language like JavaScript has the opposite problem. Writing drunk code is really easy because there aren’t many rules. But cleaning it up after is very hard because your code doesn’t have a lot of enforceable structure to it. Anything can be anything at any time, which is very liberating in the first stage because it gives you a lot of flexibility, but becomes infuriating when you realize it’s nearly impossible to finish a big project cleanly. I’ve heard a lot of people complain about this when their Node projects ultimately become unmanageable when the structure becomes difficult to reason about.
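Here is a minimal sketch of the “anything can be anything” problem, written in Python (another language from the drunk list above) rather than JavaScript. The track data and function names are invented for illustration. Nothing stops a string from ending up where a number was expected, and the mistake only shows up when the code runs.

```python
def total_length(tracks):
    """Add up track lengths, assuming every length is a number of seconds."""
    return sum(track["length"] for track in tracks)


def try_total(tracks):
    """Run total_length and report a type mix-up instead of crashing."""
    try:
        return total_length(tracks)
    except TypeError as error:
        return f"failed at runtime: {type(error).__name__}"


if __name__ == "__main__":
    tracks = [{"length": 200}, {"length": 180}]
    print(try_total(tracks))

    # Somewhere else in a big project, a length quietly becomes a string.
    tracks.append({"length": "3:15"})
    print(try_total(tracks))
```

A sober language would reject the second call before the program ever ran, which is exactly the kind of check that gets in the way during the first stage.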

I really wish someone would make a language that did both of these things well.

So I think the things we can learn from this quote are 1) don’t judge your ideas too early and 2) design your APIs for both stages of the development process.

Happy coding and please drink responsibly.

Balance What You Read, Think, and Do

In the past few years, I’ve learned a lot of new things. The software industry changes very quickly, and I need to stay up to date on the current trends and practices to be effective at my job. To make things more difficult for myself, I’ve made an effort to work in as many different fields of technology as I can. I’ve done frontend, databases, embedded systems, graphics, DevOps, and management in tens of programming languages, and along the way I’ve established myself as a capable generalist problem solver in several domains.

If you need to learn a lot of new things quickly, it pays dividends to be mindful of your learning process. Going from a cold start in a new field can be intimidating and stagnating in a field you already know can be a frustrating experience. To help me overcome these challenges, I’ve developed a system for learning to help me make decisions for how to spend my time in the most effective way possible, which I’ll present here. It’s written with software in mind, but I believe it can be applicable to any kind of learning. This system is a work in progress, so feedback is appreciated in the comments.

The three vital activities for learning are reading, thinking, and doing. These activities should be balanced to create a positive feedback loop. Each activity is a force multiplier for the next. The better you read, the better the quality of your thoughts become. Better thoughts will lead to more effective action. And more effective action will lead to better reading.

Learn for a Purpose

Learning for its own sake is a beautiful endeavor that everyone should engage in from time to time. Everyone should know a little bit about things like history, chemistry, or the classics. You don’t need a system like this for that kind of learning. An important part of this system is having some sort of purpose to work towards. If you want to be a novelist, you should be working towards writing a book. If you want to be a well respected software engineer, you should be working towards creating great software. At the end of the day, we are judged by the results we achieve and not our capabilities.

Don’t take results too seriously though. Learning is a long process and not everything you do will have a direct impact on what will ultimately become your greatest achievements. Plan to throw one away. Take risks and expect to produce some really bad stuff while you are learning. Use these experiences to refine your purpose.

The Three Activities

Now I’ll explain the role of the three activities and most importantly how to find a balance between them.

Reading

When asked about his genius, Isaac Newton famously said “If I have seen further it is by standing on the shoulders of Giants.” How to read well is an art unto itself that deserves another article. In this context, reading has an important relationship with the other two activities.

Any piece of writing on its own is simply some arrangement of symbols on a page. What brings the writing to life is the experience a person has while reading it. Every person who reads a passage in a novel gets a different mental image of the setting and the characters because we bring our own experiences to the scene. If the passage takes place in a port, I’ll imagine it’s like a port I’ve been to. If one of the characters acts like one of my friends, I’ll relate to the character like I relate to my friend.

Reading nonfiction is the same way. If I’m trying to build microservices, I’ll have a completely different experience with a book on microservice architecture if I’ve actually tried to build one. If I’ve thought deeply about microservices, I may find the author has put into words exactly what I was thinking but in a more eloquent way. Use the other two activities to provide context and purpose for your reading. Experience on the subject you are reading about deepens your reading experience so you can spend your reading time more efficiently.

Reading Too Much

To be well read in itself is rarely the purpose we are trying to achieve with learning. Read too much and you risk becoming the stereotypical academic locked away in an ivory tower and disconnected from the real world. The character who comes to mind is Chidi from The Good Place. Chidi is a college professor who is an “expert” on the moral philosophy of Immanuel Kant. Chidi is very well read, but is characteristically averse to making actual moral decisions. Without the context of being a moral person, we see through the course of the show that Chidi actually has not gained any understanding of morality despite being well read on the subject, and ultimately ends up in Hell because of it.

Reading Too Little

Reading too little is a missed opportunity to learn from the experiences of those who have come before you in the field. As a beginner, I sometimes find myself averse to reading because I believe it will stifle my creativity or I just want the experience of figuring things out for myself for fun. As I learn more and become an expert, I tend to think I already know everything there is to know and my mind closes to new ideas. It’s important to fight these tendencies and keep reading on a subject no matter what your skill level is. Reading too little leads to stagnation of your thoughts.

Thinking

The philosopher René Descartes was famous for locking himself in his room, lying in bed all day and thinking deeply about things. During these bouts of meditation, he came up with the Cartesian coordinate system that we all learn about in high school, and a lot of other influential ideas in philosophy like “I think therefore I am”. Similarly, Immanuel Kant would take very long walks every day where he would think about the great ideas of his moral philosophy. Socrates even went so far as to say that the unexamined life is not worth living.

The purpose of high quality reading is high quality thinking. High quality thinking leads to high quality actions, like in the examples above.

Thinking Too Much

“A person who thinks all the time has nothing to think about except thoughts. So he loses touch with reality, and lives in a world of illusion.” – Alan Watts

Low quality thinking spins around in a circle. At its worst, it becomes existential despair, like the famous opening line from the play Waiting for Godot: “Nothing to be done.” Low quality thoughts simply lead to more thinking ad infinitum.

The most important skill to develop with your thinking is knowing when to stop. When you have something that looks like a good idea, it’s time to go to the next step and start implementing it. You don’t need to have all the details worked out in advance. Things will become much more clear once you have at least part of your vision out of your head and into the real world. The only way to have your thoughts build on top of each other is having something in front of you to give you different things to think about.

Thinking Too Little

The risk of thinking too little is doing the wrong thing. Without taking the time to absorb what you read, you may develop a sense of false confidence where you believe you are an expert in a field you really don’t know much about. No matter how much work you put into creating your work, if you start with a bad idea you will not be successful. In business, this may lead to the very common mistake of creating the wrong product for the market. If you feel like you are just spinning your wheels without really going anywhere, you probably need to spend some more time thinking about what you’re doing.

Doing

“Real artists ship.” – Steve Jobs

The purpose of going through this process is to actually create something valuable and that happens in this stage. Now that you’ve thought about what to do and read enough to know how to do it, it’s time to get to work.

Doing Too Much

It may seem counterintuitive, but spending too much time on action directed towards your goal can be unproductive if you aren’t mindful of what you are doing. When playing the piano, there is a big difference between practice and performance. When I learned how to play the piano, I started with scales and exercises. I found these exercises to be tedious because what I really wanted to play was Mozart. As I got better and learned a few songs, I found that improving at these scales and exercises was the only way I could improve at playing songs. The key insight I learned from this experience is that you don’t get better at playing Mozart by playing Mozart. If you spend too much time on your performance without practicing, you will learn bad habits and it will be harder to get better.

Software is exactly the same way. You have to practice at it to get better, and this practice is a completely different sort of exercise than what you will do at your job. When you practice, spend your time poring over the code, rewriting it until you get everything perfect. Go very slowly and strive for perfection, like a pianist who slowly plays a passage of Mozart over and over until it is perfect. Then when it is time to write code under the pressure of a deadline, you’ll get a better result much faster. Both practice and performance are essential for effectively getting the most out of what you do.

Doing Too Little

Without having actual experience, you won’t have context to absorb what you read to the fullest extent. Without getting your thoughts out into the real world, you’ll need to keep so much in your head that there won’t be room for anything else. Taking too little action can cause your learning to stagnate just as much as too little reading or thinking. And at the end of the day, it’s time to start performing and working towards what will become your greatest achievements, because that is after all the point of going through all this work.

Conclusion

This system has come about from years of observing my own process of learning and it seems to work for me pretty well. However, I don’t consider it complete and this is my first time writing it down so let me know what you think. There’s a lot here that I’d like to refine in further articles.

How to Make an Open Source Feature Request

When using an open source project, you may find it lacks some important feature you need to work with it effectively. When you are in this situation, you can either request the feature you need and implement it yourself, or just look for another project to use that has something closer to what you need. While most of the time people pick the second option, requesting and implementing the feature yourself can have a lot of benefits.

  • Maintenance work on the feature can be shared by all of its users
  • Working within the project exposes powerful internals that can give you exactly what you are looking for
  • Implementing the feature can give you deep knowledge of the project that can be shared within your organization

If you always choose to look for another project when it lacks a feature, you are missing out on one of the main benefits of using open source software: the fact that you have access to the source and are able to change it. This is an enormous amount of power to have, and the top technology companies take advantage of it. When time and budget allow, it should be considered as an option for your important dependencies.

Making good feature requests is an essential skill to master to be productive with open source projects. As an open source maintainer, I’ve seen a lot of variation in the quality of feature requests to my projects over the years. Making a good feature request is much more difficult than people realize. It’s part creative, part sales, and part technical. But when you get it right, it’s one of the most rewarding experiences I’ve had as a developer.

It may seem intimidating at first, but it’s much easier when you know the rules. In this article, I’d like to share some things I’ve learned about making good feature requests to help you create contributions that are able to get the attention of maintainers so they can be accepted into the project.

Creating an Issue

Once you have something in mind to work on, the first step is always to make sure you have an issue to work from. The most common beginner mistake I see is to start coding right away. This works for simple bugs when you fix something that’s obviously wrong, but any nontrivial feature will require some discussion before it’s ready to be implemented. You want to get people involved in the design process as soon as possible. The amount of discussion you generate with your proposal is a great way to gauge how interested other people are in the feature. Someone who is involved early in the process will have the best understanding of your goals and will provide valuable feedback that will guide your development. Treat anyone who gets involved early in the discussion as a potential user of the feature. Even if they seem adversarial or the feedback is negative, taking the time to respond at all should be taken as a sign of respect and an indication that a consensus is possible.

If the project is active, chances are someone may have already thought of your feature and proposed it on the issue tracker. Spend five to ten minutes searching for your issue with different wordings and see if you can find something similar to avoid creating a duplicate issue. If an issue already exists, read through the discussion carefully because it can save you a lot of time by not duplicating a discussion that has already happened or not trying an implementation strategy that is known to have problems. Check if someone is already working on the feature. If you see that someone is actively working on it, you can still contribute by adding your opinion to the discussion, testing the active branch, and reviewing the code. However, don’t get discouraged if you see someone who claims to be working on the feature but doesn’t have an active branch. In my experience, about eighty percent of the people who start working on something never actually finish it. Ask if you can pick up the work where they left off and try to credit them as best you can in your work. The best thing to do would be to use their commits directly in your branch, but that’s not always possible so at least give some thanks in your commit message.

Once you have an issue to work from, now it’s time to explain what you want to do. A good feature request always has at minimum these three components:

  1. The use case
  2. The approach
  3. The test

The Use Case

Coming to an agreement on the use case is the most important part of the discussion. If the maintainers agree that the use case is important to support, the rest is just implementation details. If the maintainers believe the use case is not valid, then nothing else is important. Don’t try to sell a beautiful approach for an invalid use case. The three most effective ways to justify a use case are 1) appeal to the project’s mission, 2) appeal to similar features, 3) demonstrate user demand.

The project’s mission is often best expressed in its description, which usually takes a form like “a library to do X”. Justify your use case by explaining how your feature helps the user accomplish X more effectively with this library. A more detailed description of the project mission is usually included in the project overview or the README. You may even find your feature is explicitly requested or blacklisted within the documentation. You can use all of these sources of information to support your case.

Additionally, look for features in the project that are similar to yours. This sort of appeal is extremely efficient because you get to reuse all of the justification that was used for the similar feature, which by default is assumed to be valid or otherwise the similar feature would be deprecated. On a related note, keep in mind that this sort of justification is so powerful that maintainers may be wary of accepting even small features that may expand the scope of the project in unpredictable ways. To accept a certain feature with a certain justification is to implicitly accept all future features proposed within the same scope. If the project doesn’t have the resources to support the whole class of features, this is a good reason for the maintainer to reject the smaller feature even if it doesn’t add a lot of complexity by itself.

Finally, it is important to demonstrate user demand. Most often the user of the feature will be yourself or another project you are involved with. If you can, demonstrate a concrete use case with issues from other projects. Explain how adding the feature will help to fix those issues on the other projects. High demand for a feature means a larger pool of developers who can potentially come fix bugs when things break, as well as more influence within the project space.

The Approach

Discussion of the approach should come after everyone has come to a rough consensus on the nature and validity of the use case. Give a general overview of the changes you will need to make to implement the feature in a few sentences. For example, explain whatever new classes you might need to add or how the existing functions need to be modified. The amount of complexity a maintainer will allow in an approach usually relates to the strength of the use case, with a stronger use case warranting a greater amount of complexity. Changes that break backwards compatibility and need a major version bump need the most justification, so don’t propose these kinds of changes unless it’s clear to everyone why it’s important to do so.

Explaining the approach is an important step because people who know more about the project will often have valuable feedback on what you’ve proposed. What might seem simple to you may not be extensible enough to accommodate future planned work, or there might be unwritten conventions to follow so your code fits better with the project’s style. Knowing these things up front will save you a lot of time in code review.

The Test

Finally, you should include a test with your feature request. These don’t have to be formal tests, just an example snippet that demonstrates what the important part of the API will do when you are done. Including a short test tends to bring about a good discussion of details you might not have thought of like error handling. You need to have a test in mind during development anyway so you might as well post it on the issue. Be honest about any edge cases and bring up any problems you find in the issue as early as possible.
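
As a sketch of what such a snippet might look like, suppose you are proposing an optional max_length parameter for a slugify function. Both the function and the parameter are hypothetical, made up for illustration. The stand-in implementation is only there so the snippet runs; the assertions are the part you would actually post on the issue.

```python
import re


def slugify(text, max_length=None):
    """Stand-in for a hypothetical library function.

    `max_length` is the proposed feature: truncate the slug without
    leaving a trailing hyphen.
    """
    slug = re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')
    if max_length is not None:
        if max_length < 0:
            raise ValueError('max_length must not be negative')
        slug = slug[:max_length].rstrip('-')
    return slug


# Existing behavior must not change when the new parameter is omitted.
assert slugify('Hello, World!') == 'hello-world'

# Proposed behavior: the slug is cut to at most max_length characters.
assert slugify('Hello, World!', max_length=5) == 'hello'

# Edge case worth raising on the issue: truncation lands on a separator.
assert slugify('Hello, World!', max_length=6) == 'hello'

# Error handling is part of the proposal too.
try:
    slugify('Hello', max_length=-1)
except ValueError:
    pass
else:
    raise AssertionError('expected ValueError for a negative max_length')
```

Even a snippet this small forces decisions (what happens at a separator, what happens with a negative value) that are cheaper to discuss on the issue than in code review.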

Issue Discussion Etiquette

These sorts of discussions are what give open source projects their reputation for being unfriendly places. Please keep in mind that discussions on open source issue trackers have very little resemblance to what is commonly called “normal human interaction” and a different set of rules tends to apply. They can sometimes resemble a game unto themselves, much like poker, with lots of posturing and bluffing. You may even sometimes feel like a lawyer in a courtroom. The most important thing to keep in mind is to always begin your thoughts from a place of respect and do not take things personally. The goal of the discussion is to always move forward towards a consensus. If you sense that no progress is being made, do not repeat your points. Wait a few days for others to chime in with a fresh opinion. Be willing to accept that not every idea you propose can come to a consensus in the project, and have a backup plan such as forking or starting a new project that can accommodate your use case.

Now Start Coding

Congratulations, your feature request was accepted. With all you’ve been through, this might seem like the easy part. It is very rare to have a feature rejected at this phase, but I have regrettably seen it happen before. There’s a lot more that can be said after this point such as how to make a good pull request and how to respond to feedback during code review, but I’ll leave those topics to another post.

Messing Around with JavaScript Decorators

I’ve been looking at the new features in ES6 and boy has JavaScript changed a lot in the last few years. The ECMAScript standards team is really doing a great job making the language more comfortable to work with. My favorite features are block scoped variables with let, a better syntax for defining a class (I never really understood what a “prototype” was), built-in support for loading and exporting modules, and native support for promises. Proxy objects seem like they could be a powerful tool as well. My first impression is that JavaScript is starting to look like a more liberal version of Python, which isn’t a bad thing.

One of the features I looked for in ES6 and could not find was support for function decorators. Decorators can be a great thing to have in a language sometimes. When they fit into an API, they fit really well, and I often use them in my Python library code. I was surprised this feature didn’t make it into ES6 because decorators are used extensively as part of Angular and React and have native support in TypeScript.

The proposal for decorators can be found in this repository with user guide located here. The proposal is currently at stage two, which means it will likely be included in the language in the next major update but is not ready to be used in production code yet. I wrote some sample decorators to test out the new features which you can find in my notes repository here and I’d like to explain to you how they work.

Note: I am not an expert JavaScript, front-end, or Node developer so if you see anything I’m doing wrong in these examples, please let me know in the comments or in an email.

Building the Project

Unfortunately, there is no native implementation for decorators in Node (currently version 11) or any browser yet so you must compile your decorator code with a project called Babel. This requires two additional plugins, @babel/plugin-proposal-decorators and @babel/plugin-proposal-class-properties.

npm install --save-dev @babel/cli \
    @babel/core \
    @babel/plugin-proposal-class-properties \
    @babel/plugin-proposal-decorators

Babel must also be configured with some options in a .babelrc file you can find here.

Now you can compile and run your code with babel like this and it should work correctly:

babel ./index.js -o index-compiled.js && node ./index-compiled.js

Anatomy of a Decorator

A decorator is basically just a function that gets called in the context of a target method or property that is able to change its state somehow. Here is an annotated example of a decorator which does nothing:

function decorator(descriptor) {
  // alter the descriptor to change its properties and return it
  descriptor.finisher = function(klass) {
    // now you get the class it was defined on at the end
  }
  return descriptor;
}
 
class Example {
  @decorator
  decorated() {
    return 'foo';
  }
}

The descriptor that is returned from the decorator is an object that you can mutate to change the target method. The important properties of this object are:

  • kind – whether this is a ‘method’, a ‘field’, or something else
  • key – the name of what is being decorated (in this case, ‘decorated’)
  • descriptor – contains configuration for the property, and the value which you can hook into with custom behavior
  • finisher – add a function to be called after the class is defined for customization of the class

To have your decorator take parameters, use a function that returns a decorator like the one above.

function decorator(options) {
  // options are passed with the decorator method
  return function(descriptor) {
    return descriptor;
  }
}
 
class Example {
  @decorator({foo:'bar'})
  decorated() {
    return 'foo';
  }
}

Example: Log a Warning When Calling a Deprecated Method

function deprecated(descriptor) {
  // save the given function itself and replace it with a wrapped version that
  // logs the warning
  let fn = descriptor.descriptor.value;
  descriptor.descriptor.value = function() {
    console.log('this function is deprecated!');
    return fn.apply(this, arguments);
  }
  return descriptor;
}
 
class Example {
  @deprecated
  oldFunc(val) {
    return 'oldFunc: ' + val;
  }
}
 
let ex = new Example();
ex.oldFunc('foo');
// > this function is deprecated!
// > oldFunc: foo
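
For comparison, here is roughly the same idea written as a Python decorator, since Python’s decorators were my reference point above. Treat it as a sketch and not a translation of the proposal’s semantics: a Python decorator receives the function itself, not a descriptor object, so the wrapping is more direct. The names mirror the JavaScript example.

```python
import functools


def deprecated(fn):
    """Wrap fn so that every call first logs a deprecation warning."""
    @functools.wraps(fn)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        print('this function is deprecated!')
        return fn(*args, **kwargs)
    return wrapper


class Example:
    @deprecated
    def old_func(self, val):
        return 'old_func: ' + val


ex = Example()
result = ex.old_func('foo')
# prints "this function is deprecated!"; result is 'old_func: foo'
```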

Example: Make a Property Readonly

function readonly(descriptor) {
  descriptor.descriptor.writable = false;
  return descriptor;
}
 
class Example {
  @readonly
  x = 4;
}
 
let ex = new Example();
ex.x = 8;
ex.x;
// returns 4. note that you don't get a warning for trying to set a readonly property

Example: Reflect a Class

Given a class, we want to find all the methods that are decorated with a certain decorator. We will use the @property decorator for this. This sort of thing would normally be used for a base class in your API that your user is expected to override.

function property(descriptor) {
  descriptor.finisher = function(klass) {
    klass.properties = klass.properties || [];
    klass.properties.push(descriptor.key);
  };
  return descriptor;
}
 
class Example {
  @property
  someProp = 5;
 
  @property
  anotherProp = 'foo';
 
  static listProperties() {
    return this.properties || [];
  }
}
 
Example.listProperties()
// > [ 'someProp', 'anotherProp']

Conclusion

Decorators open up a lot of possibilities in a language and I am looking forward to their inclusion into JavaScript. I plan to use them in a Node library I am writing right now. However, reflection is a very powerful tool and should not be used without careful consideration. Make sure the decorator pattern actually fits your use case before you decide to use it. Have fun with decorators!

Playerctl at Version 2.0

I’ve spent the last month revisiting an older project of mine called Playerctl. I wrote the first version nearly five years ago over a weekend and have been making small tweaks to it here and there in my free time since then. The idea was to make a command line application to control media players so I could bind a command to keyboard key combinations to have media key functionality. I do mostly everything on the keyboard so I found it distracting to reach over to the mouse, find the media player window, and click on a button whenever I wanted to pause the player or skip a track on the radio. Playerctl works great for this. These lines have been in my i3 config for years.

bindsym $mod+space exec --no-startup-id playerctl play-pause
bindsym $mod+$mod2+space exec --no-startup-id playerctl next

Another goal of the project was to access track metadata for a “now playing” statusline indicator in i3bar, tmux, or whatever. I did a few implementations of this, but never really made a satisfying statusline (more on that later).

Other people seem to have found it useful too, and to this day it’s the most popular project I’ve published under my name on GitHub. With users come issues when people find bugs and limitations in the interface for their use case. These discussions with users have guided the development of version 2.0 of Playerctl and I think I’ve addressed everyone’s concerns. The main points that drove development for this version were:

  • Make the command line application easier to use with multiple players running at the same time
  • Make it easier to print metadata and properties in the format you want
  • Make it easier to make a statusline application
  • Make the library more usable

It wasn’t easy to do these things, but in the end, I’m pretty happy with how things came out and I hope everybody enjoys the new features. Version 2.0 was a big effort and represents nearly a rewrite of v0.6.2 and a doubling of the size of the code base to accommodate the new features. In this post, I’d like to go over the changes, share some of the rationale for the design choices I made, and alert you to some breaking changes in the new version.

Player Selection Overhaul

In the old version, you could select players by passing a name or comma-separated list of players to the --player flag but I found this feature rather limiting because the behavior was simply to execute the command on all the players. That makes it not very usable for the play command because you almost never want all your players to start playing something at the same time. My thinking is most people have something like a “main music player” which is good at handling large playlists and then a secondary player they use for movies or other random media on their system. So I changed the default behavior of the --player flag to only execute the command on the first player in the list that is running and supports the command. That way people can pick the priority of the players they want to control, and if the command is not supported (such as if you command the player to go to the next track but there is no next track), it will skip the player and go to the next one. You can still get the old behavior by combining this flag with the --all-players flag.

Another requested feature was the ability to ignore players instead of explicitly whitelisting them. This is useful for instance to ignore players that are not really media players like Gwenview. Now you can pass those players with --ignore-player and they will simply be ignored.

Another detail about player selection is that now a player name will match all the instances of the players, which it didn’t do before. So if I pass --player=vlc, all the instances of VLC that are open will be selected.
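
The selection rules described above can be sketched in a few lines of Python. This is only an illustration of the behavior, not the actual Playerctl code: the function names are hypothetical, and it assumes instances are named like vlc or vlc.instance7.

```python
def instances_of(name, running):
    """Return the running instances that match a player name.

    Assumes instances are named like 'vlc' or 'vlc.instance7'.
    """
    return [p for p in running if p == name or p.startswith(name + '.')]


def select_players(running, supports, players=None, ignored=(),
                   all_players=False):
    """Pick the players a command should run on.

    running     -- names of the running player instances
    supports    -- callable(instance) -> bool: can it run the command?
    players     -- priority-ordered names (like --player), None for any
    ignored     -- names to skip (like --ignore-player)
    all_players -- return every match instead of only the first
    """
    skip = {p for name in ignored for p in instances_of(name, running)}
    if players is None:
        candidates = [p for p in running if p not in skip]
    else:
        candidates = []
        for name in players:
            for p in instances_of(name, running):
                if p not in skip and p not in candidates:
                    candidates.append(p)
    usable = [p for p in candidates if supports(p)]
    return usable if all_players else usable[:1]


running = ['Gwenview', 'vlc.instance7', 'spotify']


def can_go_next(player):
    # Pretend spotify has no next track, so it cannot handle "next".
    return player != 'spotify'


print(select_players(running, can_go_next, players=['spotify', 'vlc']))
# prints: ['vlc.instance7']
```

Here spotify has the higher priority but cannot handle the command, so the command falls through to the VLC instance.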

I think that should cover everybody’s needs the best I can, but there are always going to be edge cases that I won’t be able to address without mind reading abilities.

Format Strings

Before format strings, the way to get multiple properties was normally with multiple calls to the CLI like this:

artist=$(playerctl metadata artist)
title=$(playerctl metadata title)
echo "${artist} - ${title}"

But that’s obviously not a very elegant solution and doesn’t scale very well if you want to print more than a few things. I implemented a few features in this version to address this. One feature is that you can now specify multiple keys and each key you specify will be outputted on its own line.

$ playerctl metadata artist title
> Katy Perry
> California Gurls

Then you can split on the newline and they’ll be in an array in that order. I still wasn’t very happy with this because it doesn’t express what people are actually trying to do. What people said they wanted was for a raw playerctl metadata call to output JSON they could parse, which I wasn’t willing to do because I didn’t want to add a dependency just for that feature, and even then, scripts would need something like jq to parse the output. What I did do is make the metadata call output a table (instead of the serialized GVariant it printed before), which is great for readability, but I wouldn’t recommend parsing it.

vlc   mpris:trackid             '/org/videolan/vlc/playlist/37'
vlc   xesam:title               Synthestitch
vlc   xesam:artist              Garoad
vlc   xesam:album               VA-11 HALL-A - Second Round
vlc   xesam:tracknumber         34
vlc   vlc:time                  281
vlc   mpris:length              281538480
vlc   xesam:contentCreated      2016
vlc   vlc:length                281538
vlc   vlc:publisher             3

So after deliberating for a while, I decided to do the hard thing and just go ahead and write my own template language based loosely on jinja2 but with a lot fewer features. The parser was a lot of fun to write and I’m happy with how it came out. Now you can do this:

$ playerctl metadata --format '{{artist}} - {{title}}'
> Garoad - Your Love is a Drug
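
To give a feel for what the substitution does, here is a minimal Python sketch of the variable part of such a template language. It is not the parser in Playerctl (which is written in C and also handles helpers); the render function and the choice to render unknown keys as empty strings are assumptions made for illustration.

```python
import re

# Matches {{ name }}; names may contain a namespace colon, e.g. mpris:length.
_VARIABLE = re.compile(r'\{\{\s*([A-Za-z_][\w:]*)\s*\}\}')


def render(template, metadata):
    """Replace each {{key}} with metadata[key]; unknown keys become ''."""
    return _VARIABLE.sub(lambda m: str(metadata.get(m.group(1), '')), template)


track = {'artist': 'Garoad', 'title': 'Your Love is a Drug'}
print(render('{{artist}} - {{title}}', track))
# prints: Garoad - Your Love is a Drug
```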

I even put template helpers into the language for additional formatting you may want to do to make the variables more readable. This is the format string I use for my statusline generator right now:

fmt='{{playerName}}: {{artist}} - {{title}} {{duration(position)}}|{{duration(mpris:length)}}'
playerctl metadata --format "${fmt}"
> vlc: Garoad - Dawn Approaches 3:28|4:10

The duration() helper converts a time in microseconds, such as the position, to hh:mm:ss format. A few other helpers are documented in the man page.
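
The conversion itself is simple arithmetic. Here is a Python sketch of it, written to match the m:ss shape of the sample output above; dropping the hours field when it is zero is an assumption of this sketch, not something taken from the Playerctl source.

```python
def duration(microseconds):
    """Format a time in microseconds as m:ss, or h:mm:ss past an hour."""
    total_seconds = int(microseconds) // 1_000_000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        return '{}:{:02d}:{:02d}'.format(hours, minutes, seconds)
    return '{}:{:02d}'.format(minutes, seconds)


print(duration(281538480))  # the mpris:length value from the table above
# prints: 4:41
```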

Follow Mode

If you wanted to use the CLI to make a statusline before, you basically had to poll. And if there’s one thing that everybody hates to do, it’s polling. It would be better to have a tail -f style flag that blocks and automatically updates when things change. This was the hardest feature to add because there was a lot of functionality lacking in the library, and the CLI was designed to be mostly stateless because it was only supposed to be a one-off. There are also a lot of edge cases with players starting, exiting, and changing states which are difficult to get right. I put a lot of detail into making sure the most relevant thing is shown on the statusline based on the input. If you have players passed with --player, it will show them in order of player priority based on the last player that has changed. If you pass --all-players, it just shows whichever one changed last. I think the last one is what I prefer.

It even works with the --format arg and will tick if you give it a position variable. Here is the grand finale:

fmt='{{playerName}}: {{artist}} - {{title}} {{duration(position)}}|{{duration(mpris:length)}}'
playerctl metadata --all-players --format "${fmt}" --follow

My own personal statusline implementation of this can be seen here in i3-dstatus, another one of my neglected projects that will get my attention next.

Library Improvements

I originally had bigger plans for the library, but didn’t end up doing as much with it. I still think it’s really cool though, and I want to keep supporting it. The problem was there was no way to listen for players connecting and disconnecting, so you basically had to run your script when you knew your player was running, which is not great. I needed to add this feature for the follow command anyway, so I decided to go ahead and externalize it in the form of a new class called the PlayerctlPlayerManager. This is meant to be a singleton which emits events for when players start, and keeps an up-to-date list of player names that are available to control. It can manage the players for you too and alert you when they exit.

Here is an example of the manager in action:

#!/usr/bin/env python3

from gi.repository import Playerctl, GLib

manager = Playerctl.PlayerManager()

def on_play(player, status, manager):
    print('player is playing: {}'.format(player.props.player_name))

def init_player(name):
    # choose if you want to manage the player based on the name
    if name.name in ['vlc', 'cmus']:
        player = Playerctl.Player.new_from_name(name)
        # connect to whatever you want to listen to
        player.connect('playback-status::playing', on_play, manager)
        # add the player to the list of managed players and be notified when it
        # exits
        manager.manage_player(player)

def on_name_appeared(manager, name):
    # a player is available to control
    init_player(name)

def on_player_vanished(manager, player):
    # a player has exited
    print('player has exited: {}'.format(player.props.player_name))

manager.connect('name-appeared', on_name_appeared)
manager.connect('player-vanished', on_player_vanished)

# manage the initial players
for name in manager.props.player_names:
    init_player(name)

main = GLib.MainLoop()
main.run()

I tried to make as few breaking changes to the library as possible, but a few were inevitable. There are also a few deprecations that will affect anyone who based a script on the previous example code. See the library docs for more details.

There are a lot of other little changes, but those are the main ones. Enjoy Playerctl 2.0!

Software Maintenance is Like Golf

I think software maintenance is one of the least understood concepts among engineering managers. By maintenance I mean all the small tasks developers do to make their code nicer to work with: refactoring, testing, and fixing bugs. This part of the development process is difficult to manage for several reasons. For one, developers tend to be quite bad at making a case for why these activities are necessary. Maintenance is essentially a technical task so there is a mismatch in communication between decision makers and developers that is difficult to overcome. Often a developer’s arguments for refactoring reduce to aesthetic principles and coding best practices that are difficult to reason about clearly but may still be impactful on the outcome of the project. The decision to spend resources for maintenance activities must involve a degree of trust in developers to spend their time wisely. And my experience tells me that often they don’t, so maybe managers are right to be a little skeptical.

While a little skepticism towards maintenance activities is healthy for managers, I believe this point of view when taken to extremes can cause managers to hold an inaccurate model of how their software is actually being developed. I once had a manager say to me that “refactoring is a bad word” and in my personal experience, tasks related to code quality have been the most difficult to pitch because this bias is so common. In fact, Stripe recently published a paper which identifies these activities as waste with the implication that if developers just wrote their code correctly the first time, we could save 85 billion dollars in lost GDP per year.

While it’s possible to be an effective manager with this simple heuristic, I think there’s a different understanding of the process of developing software that is closer to reality although maybe a bit less intuitive. The uncomfortable fact is that maintenance activities are seemingly unavoidable and sometimes refactoring really is the most impactful thing your developers can be doing for the outcome of the project. While these decisions may rarely make or break the project, having a good understanding of software maintenance is a great way to become a more effective leader and gain respect among your engineers.

The Current Metaphor

Metaphors and language shape the culture of our teams. Just like the ancient Greeks created myths to explain the chaos of the world in human terms, we do the same thing as engineers and managers. The current metaphor of maintenance is understood in the same way as home maintenance. For instance, my kitchen sink clogged up recently so I had to have a plumber come to the house to fix it and he charged me $300. It’s easy to understand this cost as a waste. My sink worked perfectly well, then I called the plumber and the end result is the same as it was before the clog: a working sink. If the plumber had offered to rearrange the pipes to be more efficient for a cost of an extra $500, I would have respectfully declined.

Depreciation is a real phenomenon for code, but it happens for a different reason than plumbing or factory equipment. The reason code requires maintenance is not because it wears out when you use it. Code is just a description of a deterministic logical process. Given the same inputs and state of your hardware, you will get the same sort of result now that you will get 20 years from now. Rather, code depreciates because people change the way they use it over time.

With the home maintenance metaphor, bugs are understood to be clear cut and well defined (we call these regressions: when something used to work and now it doesn’t), but the overwhelming majority of bugs I’ve encountered are not regressions, but rather come from the user using the software in a way that was not expected by the original designer. The user did something that seemed like it should have worked, but then the software did something else entirely. In this case, the line between a bug and a feature is not clear and it’s often not useful to make the distinction at all. For instance, if the user installs my software in an environment I didn’t do any tests for and runs into problems, is it a bug because the software is not working correctly or a feature to add support for the new environment? Whatever we call it, usually it doesn’t affect the discussion so we don’t bother with semantics. Sometimes we just tell the person we don’t want to support their use case and close the bug as wontfix. Windows users should be used to this by now.

Most “maintenance” work is actually feature development in disguise. The cause of most bugs is a change in user expectations. Your developers want to refactor the code base because they are anticipating changes in user expectations and they want to get an early start implementing the features they think you’ll need while they have the problem fresh in their mind. If you understand things this way, you should see that the home maintenance metaphor is limiting for practical decision making about maintenance activities.

The Golf Metaphor

I would rather see maintenance not as a wasteful activity, but as an important part of the development process. To me, a software project is a lot like a round of golf. The object of the game of golf is to get a ball into a hole across a field using clubs with as little effort as possible. Effort is measured by strokes, or how many times you hit the ball.

On the first stroke (the drive), you are very far away from the hole. The objective of this stroke is not necessarily to get the ball in the hole. It could happen, but getting a hole in one is really just a lucky outcome that can’t be attributed to the skill of the golfer. There are a lot of factors the golfer is not thinking about at this point like the exact speed of the wind that may alter the ball’s trajectory by several feet. The best you can hope for is to get close enough to make the rest of your shots easier.

Let’s notice some things about a good drive. First of all, the ball travels about 80% of the distance to the hole during this shot. Second, this shot costs the same as any other: one stroke. If you do a naive calculation of distance per stroke, you will come to the conclusion that your drive is by far your most efficient shot. With this data in hand, a good golf coach might tell his player, “Your drive is your most efficient shot, so just do that every time.” Worse yet, if you golf as a team and have a designated driver, you might mistakenly think this is your star player because he moves the ball the farthest. Truly this must be a 10x golfer!

But in reality you can’t drive every time (even if you could, you may never get to the hole). Your next shot requires a different set of skills and even a different set of clubs. The ball will travel much less distance during this shot, but it still counts the same as the shot before: 1 stroke. Finally, you are close enough that it’s time to putt. The putt uses the smallest club and causes the ball to move the shortest distance, but the effort required is the same as the drive.

Software maintenance work is like putting. At this point, small details matter like the contours of the earth and the length of the grass. And while this shot is not nearly as efficient as the drive, it’s the best way to play golf (as long as you don’t make the mistake of doing it too early). Some people even make a game out of just this part, minigolf, and it’s pretty fun.

So if you think about a software project like this, you should see that maintenance work is part of software development, just like putting is part of golf. If your code base needs refactoring or you have bugs, it doesn’t mean that anybody made a mistake or wrote “bad code”, just like a golfer who doesn’t make a hole in one isn’t a bad golfer. That’s just how golf works.

My Definition of a Hacker

The word hacker can be a charged word that means different things to different people. Outside of the software industry, it usually refers to a computer criminal, but there are older definitions still in common use within the industry that don’t have anything to do with criminal behavior (as in the name of the popular forum Hacker News). In this post, I would like to explore the different uses of this word and try to come up with an alternative definition that unifies them all.

Alternative Definitions within the Software Industry

Within the software industry, the term is sometimes used to describe a talented computer programmer who enjoys solving problems with code. The common definition can be used within the industry as well, but this might be seen by some as controversial. In an influential attempt to define the term in 2001, Eric S. Raymond, a legendary self-described “hacker”, made this distinction.

There is another group of people who loudly call themselves hackers, but aren’t. These are people (mainly adolescent males) who get a kick out of breaking into computers and phreaking the phone system. Real hackers call these people ‘crackers’ and want nothing to do with them.

While I agree that the word can and should be used to describe noncriminal behavior, if a colleague calls me in the middle of the night and says to me “hackers just broke into our database”, I’m not going to spend any time correcting his usage. The word cracker seems to have fallen out of style because I’ve never heard anyone use it.

Rather, the word hacker is more often used as a term of endearment among software engineers to denote membership in a particular subculture. Between developers, a hacker is someone who enjoys solving difficult problems in a creative way. Richard Stallman, another legendary hacker, puts it like this:

Hacking means exploring the limits of what is possible, in a spirit of playful cleverness.

Hacker culture values a healthy skepticism of authority and the status quo in search of more effective and more interesting solutions to common problems. You can hear echoes of the hacker ethos in the present Silicon Valley ideal of the disruption of established industries. In fact, both Stallman and Raymond were particularly disruptive to the software industry in their time by developing the Free and Open Source Licenses which now form the legal basis for how companies and individuals share common code bases with each other.

There is one more common usage of the word hack in the industry that is not as flattering. It refers to a solution to a problem that is seen as overly complicated, fragile, or reliant on a low-level interface to accomplish a high-level task. This usage is somewhat related to the common definition of “a professional who is bad at something” (as in “the comedian I saw last night is a hack”), but in programming the word is more specific and I think it’s more related to the above two definitions as this article explains well. The word can also be used positively to describe “a clever hack” which is a surprising use of an algorithm or obscure interface to accomplish a task in an unexpected way. This is the sense of the word in the phrase “life hack”.

Putting it all Together

These definitions might seem to be incompatible with each other, but I think they are related in a simple way. Let’s review them and try to find out what they have in common.

  1. Someone who breaks into computer systems
  2. A creative and disruptive programmer
  3. Inappropriate or surprising use of low-level interfaces

I would like to propose a definition that unifies all of these usages.

A hacker is someone who creates an interface where none existed before.

[note: I think this is an original insight, but please let me know in the comments if someone has already put it like this].

Thus, hackers are people who create new interfaces. This definitely describes the pioneers of the early Internet protocols and the creators of the GNU operating system like Richard Stallman who made a Free Unix interface. These people are legendary hackers and they deserve our praise. It also describes people who break into systems and steal data. For instance, there is not an interface on your banking website that lets you take money out of someone else’s account and put it into yours. If you create this interface (hopefully using a surprising low-level approach!) then you are a hacker and you should go to jail.

Let me explain what I mean in an analogy of a house. A common saying is when God closes a door, he opens a window. Entering a house through the window is a hack in all three senses. It is 1) a way to subvert God’s security system of the house by bypassing the lock on the door to gain access, 2) a creative solution to a technical problem which disrupts our notion of home entry, and 3) the abuse of an interface to do something it was not designed to do. I think all hacks have these elements to them. While good hacks are not criminal, there is still a sort of a sense of mischief to them and a spirit of doing things you are not supposed to do. Someone who solves a problem in the way they are “supposed” to do it is definitely not a hacker after all. And of course, to create any new interface requires using a lower level interface by definition.

Don’t get me wrong, I’m not trying to compare the hacker community to criminals. These are two distinct groups of people with very different goals. But maybe these similarities are why the terms got tangled up together in the media.

Command Line vs GUI

An important topic in interface design for end users that is rarely discussed is when to design an application for the command line and when to design it as a GUI. Most computer users have a strong opinion about this. Nontechnical people often seem to hold the belief that the GUI is simply more advanced technology and command line applications are an artifact of a primitive era, while some hardcore adherents to the philosophy of Unix believe that command line applications are superior. I believe that the truth is somewhere in the middle and each paradigm of UI design can be applied appropriately to different use cases. In this blog post, I will give an outline of things to consider when choosing between the two when designing or using an application.

Technical Literacy of the Users

First consider how technically literate your users are. Not everybody who uses a computer is necessarily a computer person and that’s perfectly ok. Technical literacy proceeds much like normal literacy. When people first learn to read, they tend to start off with picture books. The pictures in a picture book give context to what is going on with the story. When words are necessary, the context of the pictures can help the reader to understand words or concepts they are not familiar with. This is a bit like the icons of a GUI which can give some visual context to a command that might not be easy to understand with words alone.

As we learn to read, we tend to prefer novels. Words are capable of expressing certain subtleties that pictures alone cannot. Some complex processes are much better modeled with words than pictures. What we lose with the visual immediacy of the GUI is made up with a greater capacity for complexity. Anyone who has put together furniture with the picture-only instructions from Ikea might know what I mean (or is that just me?).

Takeaway: always use a GUI if your users are nontechnical.

Composability and Automation

The most obvious weakness of GUIs is that they are notoriously difficult to compose and automate. By compose I mean the ability to take the output of one program and use it as the input of another program. Command line applications can be combined and assembled into scripts that can accomplish a task automatically on certain events. This isn’t possible with GUIs because they require user interaction at runtime. Command line applications are like Lego blocks that can be assembled into whatever you need at the time and rearranged when your needs change.
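To make the idea of composition concrete, here is a minimal sketch in Python of feeding the output of one program into the input of another. The two "programs" are hypothetical stand-ins that I run through the current Python interpreter so the example doesn’t depend on any particular tools being installed; in practice they would be ordinary command line applications.

```python
import subprocess
import sys

# Two tiny stand-in "command line programs", each doing one small job.
# Both are hypothetical examples run via the current Python interpreter.
PRODUCE = [sys.executable, "-c", "print('pear'); print('apple'); print('fig')"]
SORT = [sys.executable, "-c", "import sys; sys.stdout.writelines(sorted(sys.stdin))"]


def pipeline(first, second):
    """Run the equivalent of `first | second` and return the final output."""
    producer = subprocess.Popen(first, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        second, stdin=producer.stdout, stdout=subprocess.PIPE, text=True
    )
    # Close our copy of the pipe so only the consumer holds the read end.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output


result = pipeline(PRODUCE, SORT)
print(result, end="")
```

Neither program knows anything about the other. The only thing they share is a stream of text, which is what lets you rearrange the blocks later.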

This has a strong influence on the interface design in either paradigm. GUI applications need to have strong and well thought out opinions on how the user will use the application. They live alone at the end of a chain of automation so they must provide everything the user needs to accomplish their task. Thus GUI applications tend to have a much larger scope than command line applications making them more difficult to develop and maintain. This isn’t all bad though because highly opinionated software tends to be easier to use, given you use it for the purpose it is designed for.

Command line applications can be designed with a limited scope to accomplish a general task. This is in line with an important tenet of UNIX design philosophy:

Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new “features”. – Doug McIlroy

This design approach is sometimes called “worse is better”. A design that has fewer features (“worse”) will be simpler to use and understand (“better”). Less opinionated software has the advantage of being more extensible, that is, able to be used for a purpose that the designers of the software did not anticipate.

Takeaway: prefer the command line when your application might be used as part of a larger workflow.

The Visualness of the Domain

The most obvious weakness of command line applications and text interfaces is that they are not capable of displaying raster or vector data (e.g., images or elevation data) in their most natural visual form. A full-featured web browser or image editor must always be written as a GUI.

In this situation, it’s often a good idea to provide both a command line interface and a GUI. For instance, when I’m doing image processing, I will start with GUI image editing software like GIMP and then, when the task becomes clearer, I’ll switch to a command line image editing application like ImageMagick which can be automated. Doing a simple task like resizing all the images in a directory is much easier to do on the command line than with a GUI image editor, but determining exactly what is the right size is easier in the GUI.
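As a rough sketch of what that automation can look like, the Python below builds one resize command per JPEG in a directory. I’m assuming the ImageMagick 7 command form `magick INPUT -resize WxH OUTPUT` here; the function name, the default size, and the `resized` output directory are all made up for illustration, and the sketch only builds the commands rather than running them.

```python
from pathlib import Path


def resize_commands(directory, size="800x600", out_dir="resized"):
    """Build one ImageMagick command (as an argv list) per JPEG in `directory`.

    Nothing is executed here; the caller decides when to run the commands.
    """
    src = Path(directory)
    dest = src / out_dir  # hypothetical output location
    commands = []
    for image in sorted(src.glob("*.jpg")):
        # Assumed ImageMagick 7 form: magick INPUT -resize WxH OUTPUT
        commands.append(
            ["magick", str(image), "-resize", size, str(dest / image.name)]
        )
    return commands
```

Once the right size has been worked out in the GUI, each returned list could be handed to `subprocess.run(cmd, check=True)` after creating the output directory.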

Takeaway: if your domain is visual, provide both a command line and a GUI application.

Documentation and Discoverability

GUI applications are known for being difficult to formally document. Instructions on how to do things are presented with screenshots, often with little circles around the right places to click, which the user must repeat themselves to get the right result. Following these instructions for especially complex tasks can start to feel like a scavenger hunt. Seemingly innocuous changes in the UI can cause the documentation to go out of date because the screenshots no longer reflect what the application looks like and updating all the screenshots is tedious. The reality is that documentation for GUIs is rarely helpful and therefore rarely used. Instead, users expect the interface itself to guide them towards their goal. GUIs are great at this because the flow of the application can be laid out visually in a way anyone can understand without reading anything.

This experience isn’t possible with a command line application. Reading (and therefore documentation) is simply required to use the software, whether by passing a --help flag or reading the installed manual page. However, command line applications can be documented much better because the documentation medium is the same as the command medium (text!). Often, documentation can simply be evaluated directly by copying and pasting an example command from the manual page or your web browser. This makes command line applications easier to support through text based communication like email or instant messaging, because telling someone what to type is easier than telling them where to click when you can’t see their screen.
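Here is a small sketch of how the --help text can be generated from the same definitions that parse the command, using Python’s standard argparse module. The tool name `resize-images` and its options are hypothetical and only there for illustration.

```python
import argparse


def build_parser():
    """Define the interface once; argparse derives the --help text from it."""
    parser = argparse.ArgumentParser(
        prog="resize-images",  # hypothetical tool name
        description="Resize every image in a directory.",
    )
    parser.add_argument("directory", help="directory containing the images")
    parser.add_argument(
        "--width",
        type=int,
        default=800,
        help="target width in pixels (default: %(default)s)",
    )
    return parser


# The same text a user would see when passing --help.
help_text = build_parser().format_help()

# An example command line, as it might be pasted from the documentation.
args = build_parser().parse_args(["photos", "--width", "640"])
```

Because the help text and the parser come from one definition, the documentation can’t drift away from the interface the way a set of screenshots can.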

On a related note, GUIs are better at internationalization. Command line applications and their parameters are almost always in English, which not everybody knows, while GUIs can provide an immersive environment in many different languages.

Takeaway: use a GUI if your users don’t like to read.

Portability

The greatest advantage of command line applications is you can run them on systems without a graphical environment. As long as you have a shell on the computer, such as with a remote shell protocol like SSH, you can run the command. This is invaluable when you need to run your application on a remote server or an embedded device. A GUI application can normally only run on a computer with a connected mouse, keyboard, and monitor. Remote desktop protocols exist, but they are heavy on resources and much less reliable than SSH. If your workflow primarily consists of command line applications, it’s possible to work seamlessly on many machines at once which is a big boost to productivity. This is the reason why I prefer to use a lightweight text editor like Vim instead of an IDE.

Takeaway: use the command line if you need to run on servers and embedded devices.

Conclusion

This all might seem obvious, but you’d be surprised at how often I see people choosing the wrong tool for the job. Using both command line applications and GUI applications is an essential part of being a computer power user, and knowing when to use each one is an important skill for computer programmers who want to work as efficiently as possible.