• Conditionally Void a Cumulative Sum

    Author
    Topic
    #495779

    I have a large data set (800K rows), so I’m looking for 1) any solution to my problem, but 2) an efficient one would be better.

    See the attached file …

    I have columns A and B filled with binary data (ones and zeros). The ones and zeros come in bunches. The two columns are not correlated.

    I want to fill column C from column A. If the cell in A is zero, set the cell in C to be zero (no problem). If the cell in A is a 1, start a cumulative sum in column C (so 0,1,1,1,0) in A converts to (0,1,2,3,0) in C. Again, not a problem.

    Here’s the problem: I want to mask some of column C and put the result in column D. If there’s any cell in column B that’s a 1, and there’s a cumulative sum next to it in column C, then I want the entire set of cells in the cumulative sum to be reset to 0 in column D, including the rows before and after the one where there was a 1 in column B. In the example above, a (1,0,0,0,0) or (0,0,0,0,1) in B does nothing to C, but a (0,1,0,0,0), or a (0,0,1,0,0) or a ((0,0,0,1,0) would convert D to a (0,0,0,0,0). So do other combinations with more than one 1 in any of those three spots.

    For me, the roadblock is that I never know how many cells in D — before the 1, or after the 1, shows up in B — need to be reset to 0.

    Viewing 4 reply threads
    Author
    Replies
    • #1461709

      How about this in D2:

      Code:
      =IF(OR(A2=0,B2=1,AND(A1>0,D1=0)),0,IF(SUM(OFFSET(A2,0,1,MATCH(0,OFFSET(A2,0,0,COUNT(A:A),1),0)-1,1))=0,1+D1,0))

      A formula for C2 (if desired) would be:
      =IF(A2=0,0,1+C1)

      Steve

    • #1461725

      Promising (seems OK at the top). But … it returned a #REF! from row 262,600 downward.

      I think I know where the problem is, but I’m not sure how to address it.

      Because there are 786,435 rows in my worksheet, the count(A:A) portion of the formula will return a large height =offset(), which may carry that range beyond the number of rows allocated by Excel.

      ***********************

      Also, I have 2 rows of labels at the top, and 1 row of labels at the bottom which will make the count(A:A) slightly off. But, I’m not quite sure how this is working yet, so I’m not sure how to adjust for those. What’s the best way to make that modification?

    • #1461763

      Need confirmation …

      I believe I have solved this problem by changing the count(A:A) bit to count(A2:A$786434) for an entry in row 2.

      Am I missing anything that anyone can see?

    • #1461769

      You could use:
      =IF(OR(A2=0,B2=1,AND(A1>0,D1=0)),0,IF(SUM(OFFSET(A2,0,1,MATCH(0,OFFSET(A2,0,0,COUNT(A:A)-COUNT($A1:A1),1),0)-1,1))=0,1+D1,0))

      I reduce the count by the number of entries before. I just used count since I needed to ensure that I had the whole range of rows in a “bunch”. If you know the bunch will have a max number, you could just use that number (10, 100, etc). The exact number is not critical, it just has to be large enough to see the transition in the “bunch” it currently is on.

      Steve

    • #1461866

      Thanks a bunch.

      I’m actually parsing through a matrix that’s written out as a vector. And I can replace the count() with a large integer, since what I’m counting up has to all fall within a single row that’s about 1000 elements long. I’m sure that will save time.

      EOM

    Viewing 4 reply threads
    Reply To: Conditionally Void a Cumulative Sum

    You can use BBCodes to format your content.
    Your account can't use all available BBCodes, they will be stripped before saving.

    Your information: