
ToDo or Toodledo. That is the Question. Again.

One of my most popular posts at my other blog is a comparison of ToDo and ToodleDo for the iPhone.  The original post was written a while ago, and both apps have had several significant revisions since then.  So, I'm refreshing the post here: I've gone over the presentation, updated the information to reflect the current versions of these two apps, and tweaked the data to reflect my most current thinking.

I like PDAs because they help me manage the things I have to do – and I’m all about the todo lists.   I don’t know if I’ve become dependent on lists because I have a bad memory, or if my memory is failing because I use lists for everything.  Still, there it is.

Over the past year or so, a number of task manager apps have come out for my beloved iPhone, and I’ve been trying most of them.   It’s surprising how I keep coming back to the same two apps, and equally surprising (to me) that after months of playing around with them, I still can’t quite decide which one I prefer.

The two apps are Appigo’s ToDo, and ToodleDo for the iPhone.  Both cost only a few dollars, and both are very well-rated by the public at large.

So, I figured, let's use some design analysis tools to evaluate the two apps, and see what the numbers say.

I’m going to use two tools: pairwise comparison, and a weighted decision matrix. These tools aren’t only useful for analyzing designs – they’re basic decision-making tools, and they’ve always done right by me to evaluate designs, conceptual or otherwise.

Both tools depend on having a good set of criteria against which the two apps will be compared. You might not know what decision to make, but you need to know how you’ll know you’ve made the right one.  In our case here: How do I know when I’ve found a good task manager app?

The formal term for what I’m doing here is qualitative, multi-criterion decision-making. It generally involves four tasks, which in my case are:
  1. Figure out criteria that apply to any “best” task manager.
  2. Rank the criteria by importance, because some criteria will affect my decision more than others.
  3. Develop a rating scale to rate each app.
  4. Rate the apps with the rating scale and the weights.
Here are my criteria, in no particular order of importance, based on years of using other task management tools:
  • Fast.  No long delays when telling the app to do something.
  • Easy.  Minimal clicking (e.g. not having to hit “accept” or "save" for everything, or burrow into deeply nested forms and subforms).
  • Start dates.  A task shouldn't appear on any standard task list until its start date (if given).
  • Due dates.  Obviously, but not mandatory on all tasks.
  • Repeats.  Repeating tasks at regular intervals.
  • Priorities.  At least three levels of priority for tasks.
  • Sync.  Easy syncing to some fairly robust remote service, using standard formats, that lets me access my tasks from other devices.
  • Groups.  Group tasks by tag or folder or project or whatever.
  • Sorting.  Multiple ways to sort tasks.
  • Hotlist.  Some overview page showing only near-term, important tasks; preferably customizable in terms of how I define "important."
  • Restart.  Picks up next time I run it where I left off last time (oddly, not every iPhone app does this).
  • Recovery.  Be able to uncheck tasks that were accidentally checked off.
  • Subtasks. Treat a single task as if it were a group/project/folder.
  • Checklists. A degenerate case of a task is just an item in a checklist.  Not every "task" really deserves all the attributes.  Checklists that can be used as templates (i.e. copied over and over again) would be even better.
  • Conditional deadlines.  Due dates based on due dates of other items (e.g. task B is due two weeks after task A is completed).
  • Backlinks. Given a task, one-tap access to the group/project/folder in which the task lives.
Oddly, not a single iPhone app I’ve checked out so far meets all my requirements.   In particular, I’ve not heard of an app that even tries to meet the last two requirements. I say “oddly” because I don’t think these requirements are excessive or bizarre, and I do think they'd be immensely useful.  Still, there it is.

Next, we have to develop weights to assign relative importance to the criteria.  The word relative is key here; we’re not going to say that one criterion is certainly and universally more important than any other.  What I want is to know how important each is with respect to the others and my own experience.  Remember, one size never fits all.

This is where pairwise comparison comes in. Details on how this works are given in another web page (it isn’t hard). The chart below shows just the end result. Each cell holds the criterion that I thought was the more important of the pair given by that cell’s row and column. Since it doesn’t make sense to compare something to itself, and since these comparisons are symmetric (comparing A and B is the same as comparing B and A), I only need to fill in a little less than half of the whole chart. If you’re thinking this took a long time, you’d be wrong. It took me about 30 minutes to fill in the whole thing.

Criterion           A  B  C  D  E  F  H  I  J  K  L  M  N  O  P  Q
A  Fast             -  B  A  D  E  F  H  I  J  K  L  M  N  A  P  A
B  Easy                -  C  D  E  B  B  B  J  K  B  B  B  B  P  Q
C  Start Dates            -  D  E  F  H  I  J  C  C  M  N  C  P  C
D  Due Dates                 -  DE D  D  I  D  D  D  D  D  D  D  D
E  Repeats                      -  EF E  I  E  E  E  E  E  E  E  E
F  Priorities                      -  H  I  J  F  F  M  N  O  P  Q
H  Sync                               -  H  J  K  L  H  N  O  H  H
I  Groups                                -  J  I  I  I  I  I  I  I
J  Sorting                                  -  J  J  J  J  O  J  J
K  Hotlist                                     -  L  K  N  K  P  Q
L  Restart                                        -  M  N  L  P  Q
M  Recovery                                          -  N  O  P  M
N  Subtasks                                             -  O  P  N
O  Checklists                                              -  P  O
P  Cond. Deadlines                                            -  P
Q  Backlinks                                                     -

This leads to the following weights:

Fast 2.46%
Easy 6.56%
Start Dates 4.10%
Due Dates 11.48%
Repeats 11.48%
Priorities 4.10%
Sync 5.74%
Groups 9.84%
Sorting 9.84%
Hotlist 4.10%
Restart 3.28%
Recovery 4.10%
Subtasks 6.56%
Checklists 4.92%
Cond. Deadlines 8.20%
Backlinks 3.28%

So this tells me, for instance, that having due dates and repeating tasks are the two most important criteria.  Task grouping and sorting are a close second.  And so on.
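
For anyone who wants to check the arithmetic, here is a minimal Python sketch of one way to turn the chart into weights. It assumes that a criterion's weight is simply the number of times its letter appears in the chart, divided by the total number of letters, and that a two-letter cell such as DE is a tie that credits both criteria. Those are assumptions rather than a statement of the method described on the other page, although they do reproduce the percentages listed above. The names ROWS and pairwise_weights are made up for this illustration.

```python
# Column order used in the chart (the letter G is skipped in the original).
LETTERS = "ABCDEFHIJKLMNOPQ"

# Upper triangle of the chart, one string per row, read left to right.
# ASSUMPTION: a two-letter cell such as "DE" is a tie between D and E.
ROWS = [
    "B A D E F H I J K L M N A P A",  # A Fast
    "C D E B B B J K B B B B P Q",    # B Easy
    "D E F H I J C C M N C P C",      # C Start Dates
    "DE D D I D D D D D D D D",       # D Due Dates
    "EF E I E E E E E E E E",         # E Repeats
    "H I J F F M N O P Q",            # F Priorities
    "H J K L H N O H H",              # H Sync
    "J I I I I I I I",                # I Groups
    "J J J J O J J",                  # J Sorting
    "L K N K P Q",                    # K Hotlist
    "M N L P Q",                      # L Restart
    "N O P M",                        # M Recovery
    "O P N",                          # N Subtasks
    "P O",                            # O Checklists
    "P",                              # P Cond. Deadlines
]


def pairwise_weights(rows, letters):
    """Return each criterion's share of all wins, as a percentage."""
    wins = dict.fromkeys(letters, 0)
    for row in rows:
        for cell in row.split():
            for letter in cell:  # a tie like "DE" credits both D and E
                wins[letter] += 1
    total = sum(wins.values())
    return {letter: 100.0 * wins[letter] / total for letter in letters}


weights = pairwise_weights(ROWS, LETTERS)
for letter in LETTERS:
    print(f"{letter}: {weights[letter]:.2f}%")
```

Counting ties for both sides is why the shares are taken out of 122 wins rather than the 120 pairs in the chart.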

The point of this process is that the human mind is not good at juggling a bunch of variables, but it is very good at comparing one thing against another. Take the trivial case of choosing between three alternatives, A, B, and C. If you prefer A to B, and B to C, then you should accept the logic that A is the most preferred item. To do otherwise just isn’t rational. That’s exactly what pairwise comparison does. And there’s good evidence that this technique actually works.

The next step is to choose a rating scale. This scale will be used to rate each app with respect to each criterion.
There’s a variety of scales I could use, and a great deal of research into qualitative measurement scales has been done. The scale that works best for me – and seems to be the most general – is a five-point scale from -2 to +2, where 0 means “neutral,” -2 means “horrible,” +2 means “excellent,” and -1 and +1 are in-between values. If you prefer something a little finer, you can use a 7-point scale from -3 to +3. I think it’s important to have a zero value to indicate neutrality, and I find it meaningful to have negative numbers stand for bad things and positive numbers for good things.

It’s interesting to note that in some industries (e.g. aerospace), I’ve noticed a tendency to use an exponential scale – something like (0, 1, 3, 9). This is because aerospace people tend to be extremely conservative (for reasons both technical and otherwise), so they tend to underrate the goodness of things. This scale inflates any reasonable rating to make up for that conservatism.

But I’m neither an aerospace engineer nor particularly conservative, so I’ll use the -2 to +2 scale.

Now we can do the weighted decision matrix. The gory details are given elsewhere. The weights come from the pairwise comparison above. In a decision matrix, we rate each alternative against some well-defined reference or base item. We need a reference because we need a fixed point against which to measure things.  For this comparison, I'll use the task manager that I am actually using these days, Pocket Informant for the iPhone, as the reference.

I worked up a weighted decision matrix comparing ToodleDo to ToDo. Here it is:

Criterion           Wgt     Ref (PI)     ToodleDo         ToDo
                           R       S    R       S    R       S
Fast               2.46    0       0    0       0    0       0
Easy               6.56    0       0   -1   -6.56    1    6.56
Start Dates        4.10    0       0    0       0   -2   -8.20
Due Dates         11.48    0       0    1   11.48    1   11.48
Repeats           11.48    0       0    1   11.48    1   11.48
Priorities         4.10    0       0   -1   -4.10    1    4.10
Sync               5.74    0       0    0       0    0       0
Groups             9.84    0       0    0       0    0       0
Sorting            9.84    0       0    1    9.84    0       0
Hotlist            4.10    0       0    1    4.10    1    4.10
Restart            3.28    0       0    0       0    0       0
Recovery           4.10    0       0    0       0    0       0
Subtasks           6.56    0       0    0       0    0       0
Checklists         4.92    0       0    0       0    1    4.92
Cond. Deadlines    8.20    0       0    0       0    0       0
Backlinks          3.28    0       0    0       0    0       0
Total            100.04            0        26.24        34.44

This table might not look like much, but it tells a bit of a story.  The column marked Wgt is the weight of that criterion taken from the pairwise comparison.  Each of the three apps gets two columns.  The R column is the rating I gave it; PI is the reference, so it gets zeros in every category.  That way, if another app does better than the reference, it gets a positive rating, and if it does worse than the reference, it gets a negative rating.  The S column is the actual score, which is the rating multiplied by the weight for that criterion.  The numbers at the bottom of the S columns are just the arithmetic sums of the individual scores.
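
The scoring just described (rating times weight, summed per app) can be sketched in a few lines of Python. The weights and ratings are copied from the table; the dictionary layout, the convention that an unlisted criterion is rated 0, and the name total_score are choices made for this illustration only.

```python
# Weights (in percent) from the pairwise comparison.
WEIGHTS = {
    "Fast": 2.46, "Easy": 6.56, "Start Dates": 4.10, "Due Dates": 11.48,
    "Repeats": 11.48, "Priorities": 4.10, "Sync": 5.74, "Groups": 9.84,
    "Sorting": 9.84, "Hotlist": 4.10, "Restart": 3.28, "Recovery": 4.10,
    "Subtasks": 6.56, "Checklists": 4.92, "Cond. Deadlines": 8.20,
    "Backlinks": 3.28,
}

# Ratings on the -2..+2 scale, relative to the reference app (PI).
# Illustration-only convention: a criterion that is not listed is rated 0,
# meaning "same as the reference".
RATINGS = {
    "PI": {},
    "ToodleDo": {"Easy": -1, "Due Dates": 1, "Repeats": 1,
                 "Priorities": -1, "Sorting": 1, "Hotlist": 1},
    "ToDo": {"Easy": 1, "Start Dates": -2, "Due Dates": 1, "Repeats": 1,
             "Priorities": 1, "Hotlist": 1, "Checklists": 1},
}


def total_score(ratings, weights):
    """Sum of rating * weight over all criteria (missing rating counts as 0)."""
    return sum(weights[name] * ratings.get(name, 0) for name in weights)


totals = {app: round(total_score(r, WEIGHTS), 2) for app, r in RATINGS.items()}
for app, score in totals.items():
    print(f"{app}: {score}")
```

The rounded totals match the bottom row of the table: 0 for PI, 26.24 for ToodleDo, and 34.44 for ToDo.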

If you look at the ratings for ToDo, you see that it’s a bit better than ToodleDo on some points, and a bit worse on others. But the +1's don’t simply cancel out the -1's, because each rating is multiplied by its weight. ToDo's leads on Easy, Priorities, and Checklists are worth 26.24 weighted points, while ToodleDo's leads on Start Dates and Sorting are worth 18.04. That leaves ToDo ahead by 8.2 points, which makes it noticeably better than ToodleDo.

It's interesting to note that this version has me preferring ToDo over ToodleDo, whereas my original post had it the other way around.  This is because of all the updates to both apps since I first compared them.  Even though there are some things about ToodleDo that really turn my crank, ToDo is the better app, because it does better on things that I think are more important.

And that jibes nicely with my intuition.  I started with ToDo, then switched to ToodleDo (just before I did my first comparison).  But now, given the improvements to ToDo, it's taken the lead again.  If it weren't for the decision matrix, I'd only have a "gut feeling" telling me which was better.  But now, having done the comparison twice, I understand and can explain why I preferred one, then the other, then the one again.

One might ask, then, why I'm using Pocket Informant since both ToodleDo and ToDo beat PI.  The answer is simple: appointments.  PI integrates appointments sync'd with Google Calendar right into the app.  Lacking that is an absolute deal-breaker for me: it's just too useful for me to have my appointments and tasks all available under one roof, so to speak.  If I'd've added appointments as a criterion, both ToDo and ToodleDo would have lost to PI.

Back during my first comparison, I ran into a problem with ToodleDo that - though it has been corrected since - remains noteworthy with respect to doing these kinds of comparisons.

The problem was this: ToodleDo used to generate the next in a series of repeating events only when it sync'd with the ToodleDo service.  ToDo, on the other hand, handled repeating events internally.

This was a problem for me when I travelled. I had gone to Berlin for a conference. And I didn’t have a data plan for my iPhone (that’s a whole separate story), so I couldn’t sync either app. But that meant ToodleDo couldn’t roll repeating items over properly.  So before I went to Berlin, I sync’d up ToDo and used it while I was gone.  And when I came back I switched back to ToodleDo.  I did that whenever I travelled.

Does the evaluation consider that? No it doesn’t, because I didn’t. The evaluation is only as good as the evaluator. When I evaluated the two apps, I was nestled snugly at home, WiFi at the ready – and sync’ing either ToDo or ToodleDo was a non-issue. If I’d've done the evaluation in Berlin, I’m sure I’d've gotten different numbers, because the repeating events problem would have been right there in my face, irritating the hell out of me.

So this underscores a limit with the evaluation method – indeed, a limit with any method: it’s only as good as the situation you’re in when you use it. Some people might say a method is only as good as the information you use, but it’s more than that. My situation, in this case, includes me, my goals (at the time), my experiences, all the information I have handy, constraints, and anything else that can possibly influence my decisions at the time.

The problem, then, is that a method depends on the situation when it’s used. But that situation may be different for the person doing the evaluation than for the person(s) who will have to live with the decision being made. Indeed, it’s virtually guaranteed that the situations will be different, if for no other reason than the implications of a decision will only occur later.

Does this put the kibosh on these kinds of methods?

Not at all.   It just means that we must be vigilant and diligent in their application.   If I did the evaluation in Berlin, ToDo would have won, because in that situation, ToodleDo would have scored poorly on repeating events.  This is as it should be.  That means that in the two different situations, the method worked.  The problem is that in any one given situation, there’s no way to take into account any other situations.
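
To see how much a single situation-dependent rating can move a total, here is a small Python sketch. It is only an illustration under stated assumptions: the -2 rating for ToodleDo on Repeats is hypothetical (the post gives no Berlin-specific ratings), it is applied to the current matrix rather than the original one, and the name rescored_total is invented for this example.

```python
# Figures taken from the decision matrix in this post.
REPEATS_WEIGHT = 11.48
TOODLEDO_TOTAL = 26.24
TODO_TOTAL = 34.44


def rescored_total(total, weight, old_rating, new_rating):
    """Adjust a total after the rating on one criterion changes."""
    return total + weight * (new_rating - old_rating)


# HYPOTHETICAL scenario: with no way to sync, ToodleDo's Repeats rating
# drops from +1 to -2. ToDo handles repeats internally, so it is unchanged.
toodledo_offline = round(
    rescored_total(TOODLEDO_TOTAL, REPEATS_WEIGHT, old_rating=1, new_rating=-2),
    2,
)

print("ToodleDo (offline scenario):", toodledo_offline)
print("ToDo (unchanged):", TODO_TOTAL)
```

Because Repeats carries one of the two largest weights, a three-point drop on that one criterion costs ToodleDo 34.44 points, which is enough to push its total below that of the reference app.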

Happily, there is fruitful and vigorous research concerned exactly with this. Some people call it situated cognition; others call it situated reasoning. We’ve not yet figured out how to treat situations reliably, but I think it’s only a matter of time before we do.

In the meantime, there is at least one other possible way to treat other situations. A popular technique to help set up a design problem is the use case (or what I call a usage scenario). These are either textual or visual descriptions of the interactions involved in using the thing you’ll design. They can be quite complex and detailed. Usage scenarios try to capture a specific situation other than the one that includes the designers during the design process. So it’s at least possible that usage scenarios could help designers evaluate designs and products better.

One final caveat: this evaluation is particular to me. It is unlikely that anyone will agree completely with my evaluation, because their situations are different from mine. So I’m not saying ToDo “is better” than ToodleDo. I’m just saying it seems to be better for me.

As they say: your mileage may vary.


  1. That has to be the best reasoned analysis of HOW to compare the various to do apps available on the market that I've seen in days of research. I've tried ToDo and ToodleDo, but have recently been researching 2Do and Pocket Informant due to personal recommendations from others. I've been considering suggesting RTM to my mother, whose one technological feat is learning to use the iPhone, and who has very different requirements for a to do list. Most reviews simply discuss the reviewer's opinions and compare the features. Well done.

  2. Thanks for the kind words. Pocket Informant is very robust, but I prefer having an "action list" like Things by Cultured Code, or Taska, than the way PI does things. I've yet to take a serious look at 2Do, but I don't like the look and feel. Highly unscientific, but there's only so many hours in a day. RTM, similarly, doesn't support the action list approach I like so much.

    There's always subjectivity in these "reviews." That's the advantage of writing about HOW to do things. So it's most gratifying to read your acknowledgement of that.


  3. Greetings from South Africa :-)

    Wow! This article / document is VERY informative! Thank you for publishing - APPRECIATED!

    I am deliberating how to implement a GTD based system having just started using a Blackberry. Pen and paper have been primary for past year or so but always with problems... Previous mobiles included iPaq and Samsung Omnia both using WM with PI (since 2005) and to be honest the OS became the point of failure... now using BB seems a better platform but must now decide 'how' to implement and your input is going to help BIG TIME ! Thanks again.

    Best regards

    Nigel R

  4. You're welcome!

    Unfortunately, I've never used a BB so I can't comment on what might be a good app for GTD on it.

    I do know that Pocket Informant has a version for BB. PI is very good. I don't use it because I personally find it too complicated (remember that I'm more on the AutoFocus side of things myself). However, I have used PI for the iPhone and found it very robust and usable. You might want to read up about it.


  5. For fast and easy work out and manage the things with significance the tool that has been a part and parcel of my life is the cloud based Replicon's task management software (http://www.replicon.com/olp/task-management-software.aspx) which can be better in all terms of the devices like iPad and iPhone.

    1. Stephie - this appears to be simple advertising. Please do not peddle things on my blog.
      In any case, your product appears to be large and complex. My interest here is in smaller systems for PERSONAL task management.
      I honestly don't think you're doing your company any favours by dropping "free ads" here. Pay Google to advertise for you.

  6. Wow, Filippo, awesome analysis! I have been using PI now for about 6 months and really like it. It is complicated, which I think truly does turn some people off.
    I have never used either ToodleDo or ToDo but I see a lot of references to people that use either of these programs alongside PI. My simple question is, with not having used either program, what would be the advantages of using them with PI?


    1. Personally, I would not mix PI with anything else. PI is an all-in-one solution and I think you'd just be asking for trouble trying to use it along-side anything else.

      What advantages might there be? I honestly don't see any at all. ToodleDo and ToDo are really subsets of PI's functionality, but both have more usable interfaces than PI does.

