• Access – Crosstab Performance

    • This topic has 9 replies, 5 voices, and was last updated 11 years ago.
    Author
    Topic
    #494088

    Hi,
    I have finally been converted to Access as the data set I’m using grew to a ridiculous size. I am analysing one years’ worth of data and the database itself has grown to be quite large at just over 1GB with a few million rows number of rows. This looks at about 700 stations and for each stations there are many 10,000’s of rows of data, hence the need to switch to a database.

    I then also have an excel spreadsheet that contains some reference data for each station that we intend to update over time, hence why I wanted to keep it separate.

    My question is regarding performance of the database in general. I am running cross tab queries that can take in excess of 20+minutes to run. If this is normal for a data set of this size then I will have to live with it but if not I would appreciate any suggestions on how to speed things up.

    I created a query that combines the two data sets, the raw data and the station key using a relationship, the code of which I have shown below, and this runs relatively quickly (<5 seconds) which was quite encouraging.
    Linked Data Query:

    Code:
    SELECT [2013 Data Private].Hour, [2013 Data Private].Dte, [2013 Data Private].Unidad, [2013 Data Private].[Type Offer], [2013 Data Private].Volume, [2013 Data Private].Price, [2013 Data Private].OM, [Linked MIBEL Codes].Type
    FROM [Linked MIBEL Codes] INNER JOIN [2013 Data Private] ON [Linked MIBEL Codes].[UP Code] = [2013 Data Private].Unidad
    WHERE ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].OM)="O"));
    

    I then built another query off the first (this is the start of many layers) to be able to quickly filter the data under certain criteria, one query being built to filter price in a range of a min, mid and max scenario. Code again shown below for the min filter (one of three) and whilst performance was worse (<10 seconds) I was still happy with this. Admittedly the code does not look good from below but the purpose of the query is to filter by the 14 station types, each with a different price. As I said I think this part works fine in most but for reference purposes

    Code:
    SELECT [2013 Data Private].Hour, [2013 Data Private].Dte, [2013 Data Private].Unidad, [2013 Data Private].[Type Offer], [2013 Data Private].Volume, [2013 Data Private].Price, [2013 Data Private].OM, [Linked MIBEL Codes].Type
    FROM [Linked MIBEL Codes] INNER JOIN [2013 Data Private] ON [Linked MIBEL Codes].[UP Code] = [2013 Data Private].Unidad
    WHERE ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<15) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="COAL")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<30) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="GAS")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<50) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="OIL")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="NUKE")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<30) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="Thermal")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<15) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="Dom Coal")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="Geothermal")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="Pumping Stor.")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="Hydro")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<15) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="N/a")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="Solar PV")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="Solar Ther.")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="Wind")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<15) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="UNKNOWN"));
    

    However the problem comes when I begin to build cross tab queries off this final query which contains the filtered results. The first is a simple average of prices across months however this now takes 5 minutes to run on its own for me to be able to get a table that can be exported to excel. In addition I would love for the cross tab query to show a weighted average of price instead of the simple average it does now.
    Average Price Crosstab Query:

    Code:
    TRANSFORM Avg([Price Development Volume Query Min].Price) AS AvgOfPrice
    SELECT [Price Development Volume Query Min].Type, [Price Development Volume Query Min].Unidad, Avg([Price Development Volume Query Min].Price) AS [Total Of Price]
    FROM [Price Development Volume Query Min]
    GROUP BY [Price Development Volume Query Min].Type, [Price Development Volume Query Min].Unidad
    PIVOT Format([Dte],"mmm") In ("Jan","Feb","Mrz","Apr","Mai","Jun","Jul","Aug","Sep","Okt","Nov","Dez");
    

    The final cross tab query is supposed to simply sum up the total volume offered in each hour, for each day and for each station. Whilst I fully appreciate this gives a v.large data set the query takes in excess of 20 minutes to run and often causes access to crash.
    Volume Crosstab Query:

    Code:
    TRANSFORM Sum([Price Development Volume Query Min].Volume) AS [Volume]
    SELECT [Price Development Volume Query Min].Type, [Price Development Volume Query Min].Unidad, [Price Development Volume Query Min].Date
    FROM [Price Development Volume Query Min]
    GROUP BY [Price Development Volume Query Min].Type, [Price Development Volume Query Min].Unidad, [Price Development Volume Query Min].Date
    PIVOT [Price Development Volume Query Min].Hour;
    

    Any feedback on how to improve performance, if only slightly, would be greatly appreciated. Am fully expecting that I have built in a lot of excess that has potential to be stripped out

    Viewing 4 reply threads
    Author
    Replies
    • #1447121

      Firstly, is the Excel sheet used in any of these queries, that will sow it down.
      Second, do you have appropriate indexes set up to minimise access time.
      Thirdly, the 3rd query I would output to a temporary table and use this in the 4th and 5th queries.
      I hope this helps, Let us know how you go.

      • #1447130

        Hi Patt,

        Firstly thanks for your tips really appreciated. In response to each of your points

        1) The excel sheet is used in the queries. I’d be happy to put this into an access table but is there a way to get this to updated automatically from the excel sheet. This doesnt need to be updated every minute but other ppl will be using this sheet and would liek to preveny changes not being properly represented

        2) I dont have any indexes set up. Would you recomend doing so, maybe for the station codes themselves

        3) Will give this a try

        Thanks and will let you know how all these help

    • #1447131

      Indexes are most important, anywhere you use the WHERE clause these fields should be indexed, also join fields (I believe) need indexing.

      It’s easy enough to import Excel sheets into an Access table using DoCmd.TransferSpreadsheet.

      • #1447133

        Hi Patt,

        Firstly Ive amended the below to create table like you suggested, it certainly does speed up running the subsequent querys having them based off it. Two questions I have, Ive just been using the query design so far but have read about “Temporary Tables”. In the below code could I make a simple change so that this table is temporary and deletes at the end of the session. And then further to your last on indexing in my below code should I try to index “Type Offer” “Price” e.t.c Just concerned quite a large table so may take a very long time to index?

        Code:
        SELECT [2013 Data Private].Unidad, [2013 Data Private].Hour, [2013 Data Private].Dte, [2013 Data Private].Region, [2013 Data Private].[Type Offer], [2013 Data Private].Volume, [2013 Data Private].Price, [2013 Data Private].OM, [MIBEL Codes All].Type INTO [Temp Min Table]
        FROM [MIBEL Codes All] INNER JOIN [2013 Data Private] ON [MIBEL Codes All].[UP Code] = [2013 Data Private].Unidad
        WHERE ((([2013 Data Private].[Type Offer])=”V”) AND (([2013 Data Private].Price)<15) AND (([2013 Data Private].OM)="O") AND (([MIBEL Codes All].Type)="Coal")) OR ((([2013 Data Private].[Type Offer])="V") AND (([2013 Data Private].Price)<30) AND (([2013 Data Private].OM)="O") AND (([MIBEL Codes All].Type)="Gas")) OR ((([2013 Data Private].[Type Offer])="V") AND (([2013 Data Private].Price)<50) AND (([2013 Data Private].OM)="O") AND (([MIBEL Codes All].Type)="Oil")) OR ((([2013 Data Private].[Type Offer])="V") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([MIBEL Codes All].Type)="Nuke")) OR ((([2013 Data Private].[Type Offer])="V") AND (([2013 Data Private].Price)<30) AND (([2013 Data Private].OM)="O") AND (([MIBEL Codes All].Type)="Thermal")) OR ((([2013 Data Private].[Type Offer])="V") AND (([2013 Data Private].Price)<15) AND (([2013 Data Private].OM)="O") AND (([MIBEL Codes All].Type)="Dom Coal")) OR ((([2013 Data Private].[Type Offer])="V") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([MIBEL Codes All].Type)="Geothermal")) OR ((([2013 Data Private].[Type Offer])="V") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([MIBEL Codes All].Type)="Pumping Stor.")) OR ((([2013 Data Private].[Type Offer])="V") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([MIBEL Codes All].Type)="Hydro")) OR ((([2013 Data Private].[Type Offer])="V") AND (([2013 Data Private].Price)<15) AND (([2013 Data Private].OM)="O") AND (([MIBEL Codes All].Type)="N/a")) OR ((([2013 Data Private].[Type Offer])="V") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([MIBEL Codes All].Type)="Solar PV")) OR ((([2013 Data Private].[Type Offer])="V") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([MIBEL Codes All].Type)="Solar Ther.")) OR ((([2013 Data Private].[Type Offer])="V") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([MIBEL Codes All].Type)="Wind")) OR ((([2013 Data Private].[Type Offer])="V") AND (([2013 Data Private].Price)<15) AND (([2013 Data Private].OM)="O") AND (([MIBEL Codes All].Type)="Unknown"));
        
        • #1447484

          Hi AccessNovice,

          Do not delete the temporary table at the end of the session. Reason? Any indexes you’ve created go away along with the table. You will have to recreate the indexes each time you recreate the table. I would create a permanent “temporary table” that has indexes defined. Then, each time you need to use the table, empty it first (DELETE * FROM [MyTempTable]), and refill the table before processing. You can refill it with an append query rather than a make-table query.

          So, in the end, you run two queries to fill your indexed table, the first one deletes everything, the second one appends the records. You can automate this with a macro that runs the two queries in sequence. You could even include one or more crosstab queries in the macro. That way it’s only a single click to load the table and run one or more crosstabs.

          Good luck to you.

          • #1447615

            Hi AccessNovice,

            Do not delete the temporary table at the end of the session. Reason? Any indexes you’ve created go away along with the table. You will have to recreate the indexes each time you recreate the table. I would create a permanent “temporary table” that has indexes defined. Then, each time you need to use the table, empty it first (DELETE * FROM [MyTempTable]), and refill the table before processing. You can refill it with an append query rather than a make-table query.

            So, in the end, you run two queries to fill your indexed table, the first one deletes everything, the second one appends the records. You can automate this with a macro that runs the two queries in sequence. You could even include one or more crosstab queries in the macro. That way it’s only a single click to load the table and run one or more crosstabs.

            Good luck to you.

            I wouldn’t disagree with this per se, but remember if you use this technique to compact the database regularly – Access will not reclaim the space that was used even if you delete the data from the “temporary” tables 🙂

    • #1447139

      On table [2013 Data Private] I wouldn’t index on price, I would index on Type Offer, Type, OM, that should speed up that query substantially to create the temporary table [Temp Min Table].

      Then base your queries on that temp table where you should index on those fields as well as price on a reduced number of fields.

    • #1447597

      What James says is right about “temporary tables”, even though I mentioned them I have never used them, I always create my own tables to be used for these purposes.

      The last time I used these tables I had to import an excel file into a table after deleting the table first, then used ALTER Table in SQL to add fields to that table, you can also add indexes via that method too.

    • #1451016

      I am not that familiar with Access, but in the real database world, you are usually better of truncating a table that deleting it and starting over, because you do not lose your indexes when you truncate the table.
      I also agree that the use of temporary tables should be relegated to data loads.

      HTH
      Peter..

    Viewing 4 reply threads
    Reply To: Access – Crosstab Performance

    You can use BBCodes to format your content.
    Your account can't use all available BBCodes, they will be stripped before saving.

    Your information: