The Problem
I recently needed to fetch a value from a (comma-separated) text list, which is an equivalent problem to fetching a word from a (space-separated) sentence.
Google turned up some interesting but convoluted results (including the possibility that it couldn’t be done without VBA). Inspired by the solution at SourceDaddy ( http://sourcedaddy.com/ms-excel/get-nth-word.html ), I came up with what I think is a simpler approach.
The Setup
There are four inputs:
A2 contains the list or sentence.
B2 contains a numeric index indicating which item list or word we want to fetch.
C2 contains the delimiting character (either a comma or space).
D2 contains the marker character (we’ll use the caret ^, but you can choose any character or sequence that won’t otherwise appear in the list).
For clarity, let’s name each of these cells: A2 as List, B2 as Index, C2 as Delimit, D2 as Mark
The Solution
Now the formula is:
=MID(LEFT(List, FIND(Mark, SUBSTITUTE(List&Delimit, Delimit, Mark, Index))-1), FIND(Mark, SUBSTITUTE(Delimit&List, Delimit, Mark, Index)), 99)
Notice that there are only three references to the original List, and only two references to the Index.
There are four references to the Mark and six to the Delimit character, but these could be literals.
The Overview
Here’s an overview of what’s going on:
The second FIND(SUBSTITUTE()) gives us the start of the item we want; the first FIND(SUBSTITUTE()) gives us the end of the item we want; LEFT() returns the string up to the end of the item we want; MID() starts from the start of the item we want and runs to the end of the remaining string (up to 99 characters long — this can easily be increased).
The Full Analysis
Let’s break it right down.
Both SUBSTITUTE() functions replace the nth occurrence of the delimiting character in the text (where n=Index) with the Mark character. SUBSTITUTE() is used because it can replace just the nth occurrence (something that FIND() cannot do directly). That’s fine for finding the delimiter *after* the entry we want (since the nth comma will follow the nth item in the list, for example), but we should be looking for the n-1th delimiter to get the delimiter *before* our entry. In fact, both cases can fail if we don’t add the Delimit character to the start or end of the list first.
It’s easier to think about this if you add delimiters to *both* the start and end of the list: Delimit & List & Delimit. Now every item in the list (or word in the sentence) is surrounded by the delimiting character. For example: “21,22,23,24,25” becomes “,21,22,23,24,25,”. If we replace the nth comma with a caret ^, that caret will appear just before the item we want in the list: n=4 gives us “,21,22,23^24,25,”. Similarly, we can replace the n+1th comma with a double caret ^^: n=4 gives us “,21,22,23,24^^25,”. This would still work if we wanted the first or last word (which is where it would break if we didn’t modify the list, regardless of whether we searched for the n-1th, nth or n+1th delimiter).
But note that we’re not trying to mark the start *and* end of our word at the same time. We only need to mark one or the other. That means we can use the same Mark character in both lists (no need for a *double* caret for the end-of-word marker). It also means that if we want the delimiter *before* our word, we add an extra Delimit character to the *start* of the list and replace the nth occurrence. But if we want the delimiter *after* our word, we add an extra Delimit character to the *end* of the list and still replace the nth occurrence. This is kind of elegant, to me. And it doesn’t require any special treatment for the first and last entries in the list.
Using our examples, we get the following two lists (when n=4): “,21,22,23^24,25” and “21,22,23,24^25,”
We can now use the FIND() function to get the character index of the Mark character in each list. In our examples: 10 and 12.
The final bit of cleverness (to my mind) is to use these numbers in the right way. If we go back to the original list, we know our word starts at character 10 and ends at character 11 (=12-1). We can certainly use the first number with the MID() function (and we do). But MID() wants a *length*, not an “end character”. Instead, we can use the LEFT() function to first grab the full list just to the end of our entry, chopping off everything to the right (including the delimiter — that’s what the -1 is for). That gives us: “21,22,23,24” (which is 11 characters long). Now we can finish it off using MID(), from character 10 (in our example) to the end of the string (or at least the first 99 characters, if it’s really long).
The Out-of-Range Errors
The finishing touch is to put the whole thing inside an IFERROR() function, in case someone asks for an item that’s not in the list (eg: if the list/sentence has 10 items/words, it doesn’t make sense to ask for the 0th word, the -5th item or the 21st word). If you want to trap the two types of error (Index too small, Index too big) separately, put an IFERROR() around the LEFT() function and another around the whole thing — the first catches “too big”; the second, “too small”.
The Discussion
The thing I most like about this approach is that we never calculate the same thing twice. The usual approach would be: MID(List, FIND(start index), FIND(end index) – FIND(start index)), which means calculating the start index twice. I hate that. I’d love to be able to do the whole thing with just one reference to List and one reference to Index, but I don’t think this is possible. (Let me know if it is!)
The other great thing is the use of SUBSTITUTE() to mark the nth occurrence of the delimiter. Ideally, FIND() would be able to find the nth occurrence directly, but it can’t — and SUBSTITUTE() adds just a single extra step. (Credit goes to Source Daddy for this.)
The Terms
I’d love to hear people’s thoughts on my approach. If you like it, feel free to use it. You don’t need to credit me, but please don’t claim it as your own.
If you have an approach that you think is better, I’d also love to hear about it. Especially if you can get it down to just a single reference to List!
I hope my analysis is also useful for those learning how to build complex functions in Excel.
(I apologise if this is a known solution to the problem. That would be a case of parallel evolution. Apart from the inspiration from Source Daddy, I’ve developed this on my own.)