• Select Random Rows of existing data (Excel 97 SR2)

    Home » Forums » AskWoody support » Productivity software by function » MS Excel and spreadsheet help » Select Random Rows of existing data (Excel 97 SR2)

    • This topic has 20 replies, 6 voices, and was last updated 23 years ago.
    Author
    Topic
    #371872

    I have a spreadsheet with multiple sheets, each sheet has approx 2000 rows of data ( names and addrresses etc ). I want to extract about 100 names and addresses from each sheet.

    How do I do this?

    I have tried the sampling from the data anaysis tool, but this does not seem to like non numeric data.

    Any ideas please

    Carl Cross UK

    Viewing 2 reply threads
    Author
    Replies
    • #592379

      You can do the following in each of the sheets. If the used range has the same size in each of the sheets, you can also select all sheets.

      In the first empty column next to the data, enter =RAND() in the first cell, and fill down to the last row with data.
      Now sort on this column and take the first 100 rows.

      (If you want to be able to restore the original order, also add a column with 1,2, 3, … next to the data before sorting on the random numbers. Then you can sort on this column aftwerwards.)

      • #592381

        Hi Hans,

        Were you reading my mind? (look at the time and content of our posts. Now tell me there is no witchcraft involved .

        • #592385

          Jan Karel,

          The similarity is amazing. Maybe it’s because we’re both Dutch? (insert wooden shoe/windmill/tulips logo here)

          Regards,
          Hans

          • #592386

            Hi Hans,

            <>

            ROFL

            You forgot to add cannabis and XTC though .

            • #592408

              Ah, be careful now! This way, the Lounge will become off-limits for people using filters to protect the innocent…

              Regards,
              Hans

            • #592515

              Sounds like double-dutch grin

    • #592380

      On each sheet, add a column with this formula:

      =RAND()

      Now sort the entire sheet with that column as the primairy key.
      You can now take the first or last 100 rows for your random selection.

    • #592550

      Take a look at THIS THREAD and see if my macro does what you are asking.

      • #592616

        That is a good approach, but I have a concern. A true random selection should eliminate any potential for bias. If the original list is ordered, and the random selection stops, there may be a bias in selection towards the upper parts of the list.

        It would almost seem prudent to insert a column of random values, sort on that column, select the entries there, and then delete the column of random values, and resort to the original format.

        • #592643

          I’m don’t understand what you mean by “If the original list is ordered, and the random selection stops?” Why would the random selection stop?

          • #592703

            Let’s say that you have an original list of 20,000 records, and from this you want a sample of 100 or 0.5%. On your selection of 100 samples, on an example run, let’s say that the 100 selected accounts are reached at record 18,500. Records 18,501 to 20,000 would not be used for the sample. If the original list was sorted in any form (ie Alpha, by Zip/Postal, by Income, etc), the sample would have a built in bias against the latter stages of the list. By randomizing the sort order of the original list first, the bias would be eliminated.

            Hope that clarifies my original message.

            Regards

            • #592904

              The same would be the case if you put random numbers in a column, and sort on that column. If the random number that selects the 100th record were to fall on row 18500, then the last 1,500 rows would not be used. This does not make the selection any less random. The last 1500 rows still had exactly the same probability of being selected as the first 1,500.

            • #592905

              Not necessarily. Let’s assume that the list was sorted by state, and the last 1,500 records in this example contained all the entries for Wyoming, and Washington. (If I’ve forgotten any states near the end of the alphabet, my apologies.) With the list in its natural form, no selections from those two states would be selected. However, if the list was randomized, then the entries for those two states could be selected. The bias created by the sort on state would be eliminated, as the list was now sorted on random numbers.

              I hope that clears it up.

              Regards

            • #592906

              I am not a statician, but I don’t believe that that there is any bias. I think that the probability that no records from those states being selected is the same either way. The probability of the random numbers missing a record from Washington in position 20,000 should be the same as if it was in position 1. If you look at the loop in my macro, each pass through the loop has the same probability of selecting a record in the last 1,500 as the first 1,500 (assuming that the random number generator is truly random and we all know that it is not). Any bias would come much more from the method of choosing the random numbers than from the order of the list.

            • #592909

              I believe that there is a distinction, but to allow us to get back to more Excel specific enquiries, shall we declare this “dead horse” officially “beaten”?

              Regards

            • #592911

              Well, if there is bias, I am interested in where it is coming from. So far, I do not see it.

            • #593078

              Beating this dead horse even more …

              It all depends on what the purpose of this selection is. If you need to have a random selection regardless of state, then there is -of course- no bias (provided XL’s RAND is really a good random number generator…).

              If however one needs to have a representative sample of people from each state, this is just the wrong method to select them and bias has nothing to do with it.

            • #593092

              Ther solution given was perfect for my needs.

              Thank you for the interesting posts esp the bizarre dutch ones

              Carl Cross

            • #593095

              <>

            • #593154

              Yes. If you need a representative sample from each state, then you would need to use this method seprately on each state to pick the number of representatives you need from that state.

    Viewing 2 reply threads
    Reply To: Select Random Rows of existing data (Excel 97 SR2)

    You can use BBCodes to format your content.
    Your account can't use all available BBCodes, they will be stripped before saving.

    Your information: