• First 20 records with a field unique/distinct (200

    Home » Forums » AskWoody support » Productivity software by function » MS Access and database help » First 20 records with a field unique/distinct (200

    Author
    Topic
    #449462

    This is a followup to my question 700541 in the Excel forum, where HansV wisely recommended I turn to Access instead. Having done so, I’m now asking how to tailor my query results to avoid duplicates in one particular field (DISTINCT seems to look for unique records instead).

    The scenario: Imagine a group of clinics (units) collaborating on medical studies. Study participants visit the clinics for their medical procedures. At a given clinic (identified by a unitID number) I want to count the first 20 distinct participants to complete a clinic visit on or after January 1, 2007.

    Here is the SQL code I have, using an example unitID:

    SELECT TOP 20 tblVisits.unitID, tblVisits.protocol, tblVisits.ParticipantID, tblVisits.EnrollmentDate, tblVisits.CompletedVisitDate, tblVisits.Visit
    FROM tblVisits
    WHERE (((tblVisits.unitID)=”45601″) AND ((tblVisits.CompletedVisitDate)>=#1/1/2007#))
    ORDER BY tblVisits.CompletedVisitDate;

    …but I think this will not prevent duplicates in the ParticipantID field. That is, 1 participant may have a second visit before 19 others have had their first.

    How can I get the query to return the first 20 records (chronologically) but only after skipping those records whose ParticipantID has already occurred?

    Thanks in advance,
    Erik

    P.S. I’ve included the fields “protocol” and “EnrollmentDate” because they will be needed later, but they are not relevant to this particular query.

    P.P.S. The next step is, I’ll want to do the same thing for all clinics: the first 20 unique participants at EACH clinic.

    Viewing 1 reply thread
    Author
    Replies
    • #1102044

      You can use two queries for this:

      1) A query based on tblVisits that selects the first visit on or after 1/1/2007 for each participant to a clinic:

      SELECT tblVisits.UnitID, tblVisits.ParticipantID, Min(tblVisits.CompletedVisitDate) AS FirstVisitDate
      FROM tblVisits
      WHERE (((tblVisits.CompletedVisitDate)>=#1/1/2007#))
      GROUP BY tblVisits.UnitID, tblVisits.ParticipantID;

      Save this query as qryParticipants.

      2) A query based on qryParticipants that computes the rank of each visit within the clinic, and selects the records with rank less than or equal to 20:

      SELECT qryParticipants.UnitID, qryParticipants.ParticipantID, qryParticipants.FirstVisitDate
      FROM qryParticipants
      WHERE (((Val(DCount(“*”,”qryParticipants”,”UnitID=” & [UnitID] & ” AND FirstVisitDate<=#" & Format([FirstVisitDate],"mm/dd/yyyy") & "#")))<=20))
      ORDER BY qryParticipants.UnitID, qryParticipants.FirstVisitDate;

      See attached demo (I didn't create the extra fields).

      • #1102056

        Thanks — safe to say I’d have never come up with this on my own…

        But there’s a problem. Working with my data, the first query returns 1914 results…but so does the second query. The problem seems to be somewhere in

        Val(DCount(“*”,”qryParticipants”,”unitID=” & [unitID] & ” AND FirstVisitDate<=#" & Format([FirstVisitDate],"mm/dd/yyyy") & "#"))

        because I exposed it as a field and the value was 0 every time (not 1-20 for each unitID as I expected…and as it comes back in your example). I know it's hard to diagnose without seeing my data, but do you know why that kind of thing might happen?

        Thanks again
        Erik

        • #1102057

          What are the data types of the UnitID and CompletedVisitDate fields in your table?

          • #1102058

            CompletedVisitDate is Date/Time (as expected) but unitID is text — a typical value is 034-001, with the hyphen in the middle. Can the query handle text, or should I add a field for numeric IDs to my units table?

            Thanks

            • #1102060

              Change the SQL of the second query as follows:

              SELECT qryParticipants.UnitID, qryParticipants.ParticipantID, qryParticipants.FirstVisitDate
              FROM qryParticipants
              WHERE (((Val(DCount(“*”,”qryParticipants”,”UnitID=” & Chr(34) & [UnitID] & Chr(34) & ” AND FirstVisitDate<=#" & Format([FirstVisitDate],"mm/dd/yyyy") & "#")))<=20))
              ORDER BY qryParticipants.UnitID, qryParticipants.FirstVisitDate;

              This encloses the value of [UnitID] in double quotes, Chr(34) is the " character.

            • #1102337

              That did it!

              HansV, many thanks again for your time and magic.

              Erik

            • #1102352

              Sorry, one more question:

              When I modify qryParticipants to restrict to only one clinic, the second query returns only 15 records, not 20 (although the first query returns 119 records). Do you know why this is? Here is the modified qryParticipants:

              SELECT tblVisits.unitID, tblVisits.ParticipantID, Min(tblVisits.CompletedVisitDate) AS FirstVisitDate
              FROM tblVisits
              WHERE (((tblVisits.CompletedVisitDate)>=#1/1/2007#))
              GROUP BY tblVisits.unitID, tblVisits.ParticipantID
              HAVING (((tblVisits.unitID)=”052-001″));

              Thanks again–
              Erik

            • #1102364

              Perhaps there were only 15 unique participants for that clinic on or after 1/1/2007?

            • #1102379

              LOL, that was my first thought too. But no, I’ve rechecked the data and there were 119 unique participants.

            • #1102381

              I’d have to see the data to know what is going on…

            • #1102382

              If I can figure out how best to anonymize it (without destroying my problem!) I’ll post an example. But in any event I really appreciate your guidance thus far.

            • #1102447

              Hans–if you have time to look at it, my data is attached…disguised everything except the dates, but consistently.

              Excel file with one worksheet of data per query. SQL of each query is in a text box on the respective worksheet. My question is why the second query yields 15 records and not 20.

              With many thanks in advance,
              Erik

            • #1102455

              I’m afraid that doesn’t help – the SQL on the first sheet mentions a unit 052-001 that isn’t present in the data, and it refers to a field CompletedVisitDate that isn’t present either.

              Could you post a stripped down copy of your database? See post 401925 for instructions.

            • #1102458

              Thanks for the instructions. Database, stripped & zipped, is attached.

            • #1102484

              The problem is that there are too many ties in the calculation of the rank. Here are the first 23 records, with the rank shown:

              FirstVisitDate Rank
              02-jan-07 2
              02-jan-07 2
              03-jan-07 8
              03-jan-07 8
              03-jan-07 8
              03-jan-07 8
              03-jan-07 8
              03-jan-07 8
              04-jan-07 15
              04-jan-07 15
              04-jan-07 15
              04-jan-07 15
              04-jan-07 15
              04-jan-07 15
              04-jan-07 15
              05-jan-07 24
              05-jan-07 24
              05-jan-07 24
              05-jan-07 24
              05-jan-07 24
              05-jan-07 24
              05-jan-07 24
              05-jan-07 24

              Because of the ties, there are only 15 records with rank less than or equal to 20.

              In the attached database, I have used a rather dirty trick to make the values unique: I added the unique ID divided by 100,000 to the visit date. Because there are no ties any more, you get exactly 20 records.

            • #1102493

              A dirty trick is the best kind of trick! grin Thanks.

              One good lesson for me is to remember the connection between “tie” and “rank”…

    • #1102498

      I (almost) never wear a tie. That must be why my rank is low… grin

      • #1103111

        One last (?!) question. In the SQL — or elsewhere — can I specify that VisitDate should be in date format? Right now it comes out 39084 etc. I tried to specify date format in the query’s Design View (Properties, Format) but Date wasn’t one of the options available (general number, currency, Euro etc.)

        Here is the SQL:

        SELECT qryParticipantsFirstVisitChooseDateChooseSite.unitID, qryParticipantsFirstVisitChooseDateChooseSite.protocol, qryParticipantsFirstVisitChooseDateChooseSite.ParticipantID, qryParticipantsFirstVisitChooseDateChooseSite.EnrollmentDate, Int([FirstVisitDate]) AS VisitDate
        FROM qryParticipantsFirstVisitChooseDateChooseSite
        WHERE (((Val(DCount(“*”,”qryParticipantsFirstVisitChooseDateChooseSite”,”UnitID=” & Chr(34) & [UnitID] & Chr(34) & ” AND FirstVisitDate<=#" & Format([FirstVisitDate],"mm/dd/yyyy hh:nn:ss") & "#")))<=20))
        ORDER BY qryParticipantsFirstVisitChooseDateChooseSite.unitID, Int([FirstVisitDate]);

        Thanks,
        Erik

        • #1103163

          Even though date formats aren’t in the dropdown list, you can type one yourself, for example

          Short Date

          or

          mm/dd/yyyy

    Viewing 1 reply thread
    Reply To: First 20 records with a field unique/distinct (200

    You can use BBCodes to format your content.
    Your account can't use all available BBCodes, they will be stripped before saving.

    Your information: