• text frequency distribution (2000)

    Author
    Topic
    #1770877

    Hi all,

    I have a column of text entries, many of which are repeating. some fields are blank. I want to us arrays identify the unique fields in one column, and populate the number of times each is used the next column. For example, range “animals” contains:
    dog
    cat
    dog
    dog

    cat

    My results would populate:
    dog 3
    cat 2
    blank field 1

    I can see a few ways to do this manually, in a step by step process, such as filtering out unique fields “unique_animals”, then countif(animals, unique_animals), although that wouldn’t include blank cells, which I also want to count. Given the number of columns I would have to do this for, wondering if there is an all-in-one array formula so I could just populate the output without extracting unique fields first, or have to make a separate formula to count the blanks.

    I think this is a common array, but don’t recall how to do it…

    thanks,

    -Eric

    Viewing 3 reply threads
    Author
    Replies
    • #1788985

      Eric,

      The easiest way is probably to use a PivotTable.

      1. Give your column of data a heading, e.g. ‘animal’.

      2. Create a PivotTable with your column set as the source data.

      3. in the PivotTable Layout, drag the animal column into both the ROW and DATA areas.

      Your PivotTable should now show the unique animals and a count of their instances as you require.

      (PivotTables do have certain limitations, e.g. size, but they do cater nicely for a huge subset of data consolidation or analysis requirements in a very quick and easy fashion)

      Regards,

      • #1788987

        That does not count the blank cells for me. It shows (blank), but there is no count. Is there any way to get it to count blanks?

        • #1789022

          you are right! I completely forgot about the blanks doh

          the only way I think is to replace them with a blank identifier e.g. ‘-‘

          if this is feasible then I still think a PivotTable is by far the easiest way, far simpler then adding columns with long formulae

    • #1788991

      I can’t do it with array formulas or pivot tables. A formula approach from Chip Person is in columns B & C, but it’s unsorted and the blanks count shows in every column C cell which is a duplicate.

      FWIW I can kind of do it with a WorkBook_Sheetchange event and Filter, Advanced, Unique, Another Location, per the attached example; see the Workbook Object. Not sure the code is terribly robust, had to mess about to ensure blank cells are counted. Also, I tried the Worksheet_SelectionChange event, see Sheet object, but it runs much slower on my machine than WorkBook_Sheetchange.

    • #1788995

      I think I am close, but not yet there, anyway, here it is:

      =(IF(COUNTIF($A$1:A1;A1)=1;COUNTIF($A$1:$A$10;$A$1:$A$10);IF(ISBLANK(A1);COUNTBLANK($A$1:$A$10);””)))

      where I assume the data is in A1:A10. Put this formula in B1 and drag down and if you then sort – ascending, (if you want) you are close to what you want. One unique entry should only give you one result in column B, except for the blanks, where you get the same count for each blank. If you sort them afterwards, this is not really a problem.
      Hope this helps.

      • #1788996

        Replacing your “;” argument separator with US convention “,”, I think you have it (where “Animals” is the range name of the list, and starting with cell A2 as the first data cell instead of A1):

        =(IF(COUNTIF($A$2:A2,A2)=1,COUNTIF(Animals,Animals),IF(AND(ISBLANK(A2),COUNTBLANK($A$2:A2)<2),COUNTBLANK(Animals),"")))

        I must remember =COUNTBLANK()!

        • #1789000

          Sorry for the ; John, I should know it by now…

          • #1789001

            Consider it a lesson to me in cross-cultural adapability! laugh It did throw me for a while, asking myself, what the heck are all these semicolons for?

    • #1789020

      You can do this without sorting if you have 3 intermediate columns.

      This example is using a list that extends to row 25.
      Assuming that row 1 has text titles, and the data starts in row 2, then in column B from row 2 onwards have this formula
      =MATCH(A2,$A$1:$A$25,0)
      which calculates the row position of the first occurence of any animal.

      And in column C from row 2 onwards have this
      =IF(NOT(ISNA(INDEX($A$1:$A$25,MATCH(ROW(),$B$1:$B$25,0)))),”here”,”not”)
      which singles out the first occurence of the results in column B.

      And in column D have the number 1 in cell D1, and then also have this formula
      =MATCH(“here”,OFFSET($C$1,D1,0,500,1),0)+D1
      from row 2 onwards. Which collects the individual animal names row positions.

      Then in columns E and F have the results, using these formulae. In column E have this formula
      =IF(ISNA(D2),IF(COUNTIF($E$1:E1,”=”)>0,””,””),INDEX($A$1:$A$25,D2))
      from row 2 onwards. Which creates the names of the animals in the cells at the top of the column,
      together with a special entry for blanks. And then in column F have this formula
      =IF(E2=””,COUNTBLANK($A$1:$A$25),IF(ISNA(D2),””,COUNTIF($A$1:$A$25,”=”&E2)))
      which gives the count for each animal.

      You can then group the columns B thru’ D, or hide them, as they are only necessary as intermediate steps.

      Hope this is useful.
      Glenn B

    Viewing 3 reply threads
    Reply To: text frequency distribution (2000)

    You can use BBCodes to format your content.
    Your account can't use all available BBCodes, they will be stripped before saving.

    Your information: