If you are a frequent user of Excel, you most certainly know the VLOOKUP function. If not you can read this tutorial to learn this must-know function.
But if you have ever worked on large Excel files with hundreds of thousands of rows, you certainly experienced how painful the VLOOKUP function can be. Although very useful to link data from multiple tables, the Excel VLOOKUP function quickly becomes unable to calculate in an acceptable amount of time. On large Excel files, it can take up to an entire hour to recalculate your spreadsheet, with most of the time application failure in the middle.
Waiting for Excel to calculate formulas
The good news is, you can make VLOOKUP formulas work much faster with a technique called the Double-VLOOKUP trick.
So how does it work ? Let’s start with a quick reminder on how the VLOOKUP function works. If you already know that you can move to the next section.
The regular VLOOKUP function
The VLOOKUP functions writes as VLOOKUP(lookup_value,table_array,col_index_num,[range_lookup]), where:
Lookup_value is the ID you are using to link your two tables. For instance your ID is “1234” if you want to lookup a value for ID "1234" on another table.
Table_array is the table in which you are looking. It’s a table with IDs in the first column.
Col_index_range is the index of the column which contains the target value in the table_array.
Range_lookup is the match type. This is going to be the decisive argument in this post, so pay attention. Range_lookup can be TRUE or FALSE (or alternatively, 0 or 1 which Excel translates respectively as TRUE and FALSE). TRUE stands for approximate match, while FALSE stands for exact match.
In the example below, the VLOOKUP function will return the value of column number 2, when it finds the ID "1234" in the table. So it will return the value 17.
Excel regular VLOOKUP function
Unsorted search Versus binary search
Fine observers will have noticed that I haven’t explained the [range_lookup] argument yet. FALSE corresponds to searching unsorted data, while TRUE corresponds to a binary search. What’s the difference?
Regular search (FALSE) is super slow, because Excel checks for each row in the table if it matches your ID, until a match is found. Statistically, it takes searching half of the rows to find a match. If you have 100k rows in your first table and 200x in the second table, that’s 100k x 200k x 0.5= 10 billion operations!
Binary search (TRUE) works only on sorted data, but:
Because your IDs are sorted, Excel can work smarter. On the first calculation, it checks the value in the middle of the table. If it is smaller than your ID, then Excel knows your ID can only be found above (because the column is sorted), and if it is above the ID can only be found below. That means in just 1 calculation, Excel has already ruled out half the column. On the second calculation, it will do the same and rule out again half of the remaining, etc.
Consequence is: binary search is way faster than unsorted search. Way way, way faster.
Binary search is super fast
The only issue is: what happens if your ID does not exist in the table you are looking for? Your VLOOKUP function will return the value for the closest match to your ID, but what you want is usually an exact match. So binary search is faster but you cannot be sure the result is correct! Damn it!
The Double-VLOOKUP trick
The Double-VLOOKUP trick is made to use the crazy speed of the binary search while ensuring the result is correct. This is how we do it:
The first VLOOKUP checks that the ID you are looking for exists in the table, and only if he exists the second VLOOKUP will return the corresponding value. Doing so, you avoid getting wrong values when there is no corresponding ID.
So how much faster is it?
According to the experiment run by Dailydoseofexcel.com, the Double-VLOOKUP is 28 times faster than the regular VLOOKUP formula for a table of 10,000 rows, and this increases up to 3,606 times faster than the regular VLOOKUP for a 1,000,000 table.
Although the regular VLOOKUP function is just simpler to use on a day-to-day basis, keep in mind that if your spreadsheet starts taking time for calculations, you can make it much faster by using the Double-VLOOKUP trick. Just remember to sort your IDs before and your spreadsheet will calculate crazy fast again.
If your version of Excel is not too old, you should also consider using the new XLOOKUP function which replaces VLOOKUP with more flexibility and better performance.
Frequent user of Excel and PowerPoint? Download the Power-user add-in and you will get super powers with tons of templates, charts, icons, diagrams, and many tools that will save you hours!