• Finding duplicates, and removing duplicates

    Home » Forums » AskWoody support » Productivity software by function » MS Access and database help » Finding duplicates, and removing duplicates

    Author
    Topic
    #504159

    This is a 2 part problem. I have a sorted query – [Most_FT_3g_v6_Query] that shows the most free throws made for 3 consecutive games. My listing shows many of the same player (i.e. uses 2 of the games that the leader used.) I figured that if I asked for duplicates (players listed for the same season) I would then have a better listing of top 3 game free throws made. But when I make my list of duplicates, it includes the top records. I would expect that the duplicate list would NOT include the top listing, but would include those with the same Roster_ID as shown above. Something tells me that this will involve indexes. I’m not sure how to ‘create’ indexes.43373-Creating-duplicates
    My second problem is when the duplicate list is appropriate, how do I remove those records from the ‘original’ query?

    Viewing 13 reply threads
    Author
    Replies
    • #1548545

      Hi jlwood44,
      I’m not sure why you mention “index,” the purpose of which is to allow the database management system to more quickly fetch database rows that match the query criteria. A database would have to be quite large before the overhead of creating an index would be outweighed by the benefit.

      Also, perhaps you could more clearly explain what you consider “duplicate,” that is, what field(s) you want to essentially be the unique identifier. Would it be player and year, so that there would be only one result for , namely, the result where free throws made over the three games was highest for that combination of player and year? If there were several rows for a given player and year that had the same three-game sum (as for ), which one would you want selected?

      It might help also if you provided the query statement(s) you’re using.

      Hope this helps,
      Dave

      • #1548563

        Hi jlwood44,
        I’m not sure why you mention “index,” the purpose of which is to allow the database management system to more quickly fetch database rows that match the query criteria. A database would have to be quite large before the overhead of creating an index would be outweighed by the benefit.

        Also, perhaps you could more clearly explain what you consider “duplicate,” that is, what field(s) you want to essentially be the unique identifier. Would it be player and year, so that there would be only one result for , namely, the result where free throws made over the three games was highest for that combination of player and year? If there were several rows for a given player and year that had the same three-game sum (as for ), which one would you want selected?

        It might help also if you provided the query statement(s) you’re using.

        Hope this helps,
        Dave

        I agree with your assessment that I find all those with the same name and season, and have only 1 entry, and the others are duplicates. I used the duplicate query and used Roster_ID to find the results, BUT I expected that the ‘original’ would be the top one in the sorted query. When I ran the duplicates, as you can see the duplicates include the top records.
        43378-SQL-for-the-duplicates
        It appears that the sorting happens AFTER the duplicates are determined. If the sorting is done before the duplicates are determined, then the appropriate listing would be created.

    • #1548577

      I think that I misunderstood what the duplicate query would show. I was thinking that it would show the 2nd, 3rd, etc. occurrence of a record, when if there are duplicates, it shows the 1st, 2nd, 3rd occurrence of the record. Then the duplicate query was ‘correct’ as it ran. BUT, it still brings up my other question, how do I (really ACCESS) remove the records after the ‘original’ record.
      I think back to a year ago when I ran a duplicate query for a table to find if I had entered results more than once, and it showed both occurrences. Because this was records in a table, and it was only 1 or 2 duplicates, I manually deleted the duplicates.
      In this case, the ‘original’ info is from a query search. I do NOT want to remove the inputted data, but ‘restrict’ the results from the search to only show the ‘first’ occurrence. And more data is constantly being added – Vironnica just had a game where she made 14 free throws – I’m not sure what her consecutive game results were.

    • #1548578

      I think you have a bigger problem. How can you be having multiple records where many of columns are different between these multiple records, and yet you still call them duplicates? I think your problem is the actual query that is creating these records. Shouldn’t your query be creating just one record per player per season?

    • #1548584

      Hi jlwood44,
      I agree with Rui that “find duplicates” is apparently being used here in a way for which it wasn’t really intended.

      Also, the tables might have been designed to be more robust. Suppose a player had a given three-game FTM sequence occur more than once in a season, such as 12 in game 3, 9 in game 4, and 10 in game 5 and then 12 in game 17, 9 in game 18, and 10 in game 19. This could be accommodated if an additional field that I’ll call g1_number were in the table that contains the FTM triplets. This would allow and to coexist.

      These entries wouldn’t be “duplicates”; they would be in conformance with the table layout. This representation would show where a player achieved a given three-day total more than once in a season, but you could still obtain one query row by including the appropriate criteria in the select statement, and your select statement could specify that the results were to be sorted descending by year.

      Perhaps if you posted the present tables and explained what the FTM data are used for, we would be able to offer more specific suggestions.

      Dave

    • #1548585

      Do you want a single record per player? Or do you want the overall top 3 consecutive games with most free throws made?

      In any case, you can take the query you have now and use it as the entry table for a new query. Then you apply a condition that selects what you want: a TOP 3 SELECT clause, for the latter, or a use a correlated subquery in the WHERE clause or something close to either or both, depending on what you want.

    • #1548643

      As you indicated, I want 1 entry per player. It seems that I would use a WHERE statement. How will the WHERE statement compare to the WHERE statement that was used for the Duplicates Query? The SQL statements are included in an earlier comment (#3).

    • #1548658

      The SQL is not usable, as it is an image. If you post the SQL for the original query, I will write you a correlated subquery to select the record with the higher Total for 3 games per player.

    • #1548740

      SQL for ‘original’ Query
      SELECT Most_FT_3g_v2_3_Query.Player_ID, Most_FT_3g_v2_3_Query.Roster_ID, Most_FT_3g_v2_3_Query.Fall_Year, Most_FT_3g_v2_3_Query.Full_Name, Most_FT_3g_v2_3_Query.g1FTM, Most_FT_3g_v2_3_Query.g2FTM, Most_FT_3g_v2_3_Query.g3FTM, Most_FT_3g_v2_3_Query.Tot3gFTM, Most_FT_3g_v2_3_Query.Tot3gFTA
      FROM Most_FT_3g_v2_3_Query
      ORDER BY Most_FT_3g_v2_3_Query.Tot3gFTM DESC , Most_FT_3g_v2_3_Query.Game_ID DESC;

      SQL for duplicates Query
      SELECT Most_FT_3g_v2_6_Query.Roster_ID, Most_FT_3g_v2_6_Query.Full_Name, Most_FT_3g_v2_6_Query.Player_ID, Most_FT_3g_v2_6_Query.Fall_Year, Most_FT_3g_v2_6_Query.g1FTM, Most_FT_3g_v2_6_Query.g2FTM, Most_FT_3g_v2_6_Query.g3FTM, Most_FT_3g_v2_6_Query.Tot3gFTM, Most_FT_3g_v2_6_Query.Tot3gFTA
      FROM Most_FT_3g_v2_6_Query
      WHERE (((Most_FT_3g_v2_6_Query.Roster_ID) In (SELECT [Roster_ID] FROM [Most_FT_3g_v2_6_Query] As Tmp GROUP BY [Roster_ID] HAVING Count(*)>1 )))
      ORDER BY Most_FT_3g_v2_6_Query.Tot3gFTM DESC;

    • #1548748

      Replace your original query by a query based on this SQL statement. Should be enough for what you want.

      Code:
      SELECT M.Player_ID, M.Roster_ID, M.Fall_Year, M.Full_Name, M.g1FTM, M.g2FTM, M.g3FTM, M.Tot3gFTM, M.Tot3gFTA
      FROM Most_FT_3g_v2_3_Query As M
      WHERE M.Tot3gFTM = (SELECT MAX(T.Tot3gFTM) FROM Most_FT_3g_v2_3_Query As T WHERE [COLOR="#FF0000"]T.Player_ID=M.Player_ID[/COLOR])
      

      I have a doubt, though. It is possible that the condition in red has to be replaced by T.Roster_ID=M.Roster_ID

      This latter version would probably give you multiple seasons for the same player, if the Most_FT_3g_v2_3_Query query gives you values for multiple seasons. If not, the replacement is not needed, even if it can still be performed.

    • #1548922

      43394-Run-of-Query-without-duplicates
      That kind of works. And, I changed to Roster_ID so that a player might be listed for different years. BUT that seems to want to find matching results for a player. Vironnica Drake is listed with 2 times of 31 makes, but she has one of 36, and maybe a couple others bigger than the 31. Katrinna Blackmon had a 29 and a couple others in the 20’s. According to your query run the 2nd highest total is 16.

    • #1548932

      Well, if you want it ordered on M.Tot3gFTA, the subquery needs to get the max of T.Tot3gFTA and not T.Tot3gFTM.

      Code:
      SELECT M.Player_ID, M.Roster_ID, M.Fall_Year, M.Full_Name, M.g1FTM, M.g2FTM, M.g3FTM, M.Tot3gFTM, M.Tot3gFTA
      FROM Most_FT_3g_v2_3_Query As M
      WHERE M.Tot3gFTM = (SELECT MAX(T.Tot3gFTA) FROM Most_FT_3g_v2_3_Query As T WHERE T.Roster_ID=M.Roster_ID)
      

      Yeah, it can show repeats for a player, if the player has multiple instances with the same Tot3gFTA total value. If you want to specify a tie breaking criterium, I may try and solve it.

    • #1548968

      You misunderstood my concern. Look at the results from #12, and compare with #1.
      I want the MAX for Tot3gFTM, but the query in #12 is NOT showing the top totals as shown in both parts of #1
      What happened to Drake making 36 free throws, Blackmon making 29, Guarneri making 26, R. Drake making 25? For some reason none of these are showing in the new query. The next to last field is what really matters.
      Although the query seems to ask for MAX, it does not seem to show the actual MAX.

      • #1548989

        You misunderstood my concern. Look at the results from #12, and compare with #1.
        I want the MAX for Tot3gFTM, but the query in #12 is NOT showing the top totals as shown in both parts of #1
        What happened to Drake making 36 free throws, Blackmon making 29, Guarneri making 26, R. Drake making 25? For some reason none of these are showing in the new query. The next to last field is what really matters.
        Although the query seems to ask for MAX, it does not seem to show the actual MAX.

        I only work with what you give me and it seems you gave me two different queries – V2_6 and V2_3. I don’t know enough about your queries to distinguish them. It seems the queries are different and give different results. The solution seems obvious to me: if you want the results from V2_6, use V2_6. and change the SQL for the query I wrote accordingly – replace all references to V2_3 with V2_6.

    • #1550093

      I have not had a chance to evaluate your answer for several days. v2_3 DOES have Tot3gFTM results greater than 31. I also was getting a parameter request for running each of the queries. I figured out how to replace the ‘alias’ request, so that the parameter does not show. Now when I ran it today it did show V Drake with 36 FTM and Blackmon with 29 FTM and several others also with 20+. I did have Julia Cardwell show 3 times for this season, but all 3 had her with 19 FTM. Not major, but is there a way to have her showing only once. Would the term UNIQUE or DISTINCT be helpful?

    • #1550101

      I would guess using SELECT DISTINCT should work.

    Viewing 13 reply threads
    Reply To: Finding duplicates, and removing duplicates

    You can use BBCodes to format your content.
    Your account can't use all available BBCodes, they will be stripped before saving.

    Your information: