Hi,
I have finally been converted to Access as the data set I’m using grew to a ridiculous size. I am analysing one years’ worth of data and the database itself has grown to be quite large at just over 1GB with a few million rows number of rows. This looks at about 700 stations and for each stations there are many 10,000’s of rows of data, hence the need to switch to a database.
I then also have an excel spreadsheet that contains some reference data for each station that we intend to update over time, hence why I wanted to keep it separate.
My question is regarding performance of the database in general. I am running cross tab queries that can take in excess of 20+minutes to run. If this is normal for a data set of this size then I will have to live with it but if not I would appreciate any suggestions on how to speed things up.
I created a query that combines the two data sets, the raw data and the station key using a relationship, the code of which I have shown below, and this runs relatively quickly (<5 seconds) which was quite encouraging.
Linked Data Query:
SELECT [2013 Data Private].Hour, [2013 Data Private].Dte, [2013 Data Private].Unidad, [2013 Data Private].[Type Offer], [2013 Data Private].Volume, [2013 Data Private].Price, [2013 Data Private].OM, [Linked MIBEL Codes].Type FROM [Linked MIBEL Codes] INNER JOIN [2013 Data Private] ON [Linked MIBEL Codes].[UP Code] = [2013 Data Private].Unidad WHERE ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].OM)="O"));
I then built another query off the first (this is the start of many layers) to be able to quickly filter the data under certain criteria, one query being built to filter price in a range of a min, mid and max scenario. Code again shown below for the min filter (one of three) and whilst performance was worse (<10 seconds) I was still happy with this. Admittedly the code does not look good from below but the purpose of the query is to filter by the 14 station types, each with a different price. As I said I think this part works fine in most but for reference purposes
SELECT [2013 Data Private].Hour, [2013 Data Private].Dte, [2013 Data Private].Unidad, [2013 Data Private].[Type Offer], [2013 Data Private].Volume, [2013 Data Private].Price, [2013 Data Private].OM, [Linked MIBEL Codes].Type FROM [Linked MIBEL Codes] INNER JOIN [2013 Data Private] ON [Linked MIBEL Codes].[UP Code] = [2013 Data Private].Unidad WHERE ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<15) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="COAL")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<30) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="GAS")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<50) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="OIL")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="NUKE")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<30) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="Thermal")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<15) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="Dom Coal")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="Geothermal")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="Pumping Stor.")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="Hydro")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<15) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="N/a")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="Solar PV")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="Solar Ther.")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<5) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="Wind")) OR ((([2013 Data Private].[Type Offer])="v") AND (([2013 Data Private].Price)<15) AND (([2013 Data Private].OM)="O") AND (([Linked MIBEL Codes].Type)="UNKNOWN"));
However the problem comes when I begin to build cross tab queries off this final query which contains the filtered results. The first is a simple average of prices across months however this now takes 5 minutes to run on its own for me to be able to get a table that can be exported to excel. In addition I would love for the cross tab query to show a weighted average of price instead of the simple average it does now.
Average Price Crosstab Query:
TRANSFORM Avg([Price Development Volume Query Min].Price) AS AvgOfPrice SELECT [Price Development Volume Query Min].Type, [Price Development Volume Query Min].Unidad, Avg([Price Development Volume Query Min].Price) AS [Total Of Price] FROM [Price Development Volume Query Min] GROUP BY [Price Development Volume Query Min].Type, [Price Development Volume Query Min].Unidad PIVOT Format([Dte],"mmm") In ("Jan","Feb","Mrz","Apr","Mai","Jun","Jul","Aug","Sep","Okt","Nov","Dez");
The final cross tab query is supposed to simply sum up the total volume offered in each hour, for each day and for each station. Whilst I fully appreciate this gives a v.large data set the query takes in excess of 20 minutes to run and often causes access to crash.
Volume Crosstab Query:
TRANSFORM Sum([Price Development Volume Query Min].Volume) AS [Volume] SELECT [Price Development Volume Query Min].Type, [Price Development Volume Query Min].Unidad, [Price Development Volume Query Min].Date FROM [Price Development Volume Query Min] GROUP BY [Price Development Volume Query Min].Type, [Price Development Volume Query Min].Unidad, [Price Development Volume Query Min].Date PIVOT [Price Development Volume Query Min].Hour;
Any feedback on how to improve performance, if only slightly, would be greatly appreciated. Am fully expecting that I have built in a lot of excess that has potential to be stripped out