Are publicly available offshore wind farm time series of any use?

Offshore wind farm production time series: an introduction to the ENTSO-E and Elexon BMRS datasets

Many people want to know how much energy offshore wind farms produce. Monthly production values are publicly available for some countries (in Denmark via the Ministry of Energy and in the United States via the Energy Information Administration, for instance), and some secondary sources of information exist for other places (for instance the OFGEM ROC register in the UK), but these data remain sparse, bulky, and largely undocumented.

This article presents two datasets which have remained largely unexploited by the wind & site community: the ENTSO-E transparency platform (which includes hourly power time series for parks from the UK, Belgium and Denmark), and the Elexon BMRS B1610 database (half-hourly time series for the UK). In both databases, the data are labelled “Actual Generation Output Per Generation Unit”.

The article first presents the data and their terms of use, and then explains how to build a simple model of the hourly production of a wind farm in the North Sea. I then present a way to compute array cable losses and wind turbine availability losses from the dataset. Finally, I show how the dataset can be used for analysing wake effects.

What are the ENTSO-E and Elexon BMRS data?

The main piece of documentation available to all is Ludovic Blunat’s recent master’s thesis report (June 2020), available at this link (this version of the report only includes the Data and Methods Sections, not the analysis results). Much of the work presented in this article originates either directly from the work carried out by myself and by Ludovic, or from discussions with Gregor Giebel, Niels-Erik Clausen and Andrew Henderson (respectively thesis supervisor, supervisor, and co-supervisor).

Only very few articles have used these BMRS or ENTSO-E park-level data; the studies I could find are listed below:

  • (E. Zaunseder, L. Müller and S. Blankenburg) “High Accuracy Forecasting with Limited Input Data: Using FFNNs to Predict Offshore Wind Power Generation”, 2018-12, LINK.
  • (I. Staffell, R. Green) “How does wind farm performance decline with age?”, 2014-06, LINK.
  • (Gandoin R.) “Using publicly available power time series for re-analysis of offshore wind park energy yield”. VindKraftNet meeting, 2018-04-09, LINK.

You may want to check this list of other, relevant databases for grid studies: https://open-power-system-data.org/data-sources.

For a detailed description of the data, please refer to Ludovic’s master thesis. In essence, the data include hourly (ENTSO-E) and half-hourly (BMRS) park production data, measured typically at the low-voltage side of the offshore substation(s). There can be more than one metering point for a given park. For instance: London Array (4), Beatrice (4), Gunfleet Sands (2), etc.
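As a minimal sketch of how several metering points can be combined into one park-level series, assume the data have already been loaded into a pandas DataFrame with one column per metering point. The column names `BMU_1` and `BMU_2` and all values below are made up for illustration; they are not taken from either database.

```python
import pandas as pd

# Hypothetical half-hourly power [MW] at two metering points of one park
index = pd.date_range("2019-01-01 00:00", periods=4, freq="30min")
bmu = pd.DataFrame(
    {"BMU_1": [10.0, 12.0, 14.0, 16.0], "BMU_2": [5.0, 5.0, 7.0, 9.0]},
    index=index,
)

# Park-level power: sum over all metering points, per timestamp
park_mw = bmu.sum(axis=1)

# Hourly mean power, e.g. for comparison with an hourly series
park_hourly_mw = park_mw.resample("60min").mean()
```

If one metering point has gaps, `sum` skips the missing values by default, which silently lowers the park total; passing `min_count=bmu.shape[1]` keeps such timestamps as missing instead.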

The data start at the end of 2014 for most parks, and are updated every day with a lag of approximately five days. The data can be backfilled, so it is advisable to download the data regularly and save a separate copy every time (should any comparison between the different versions of the databases be needed).

How to download the data?

The data can be downloaded for free from the ENTSO-E transparency platform’s SFTP (folder: /TP_export/ActualGenerationOutputPerUnit) and, for the Elexon BMRS B1610 dataset, via Elexon’s API. You need to register, create an account, and agree to the licensing terms (see below) to access these services.

There is at least one tool for downloading the ENTSO-E data, see this project: https://github.com/EnergieID/entsoe-py (and the function “query_generation_per_plant”) [note: I have not tested it].

What about data licensing?

The data come for free, and fall under two separate licenses.

For BMRS, see this link (archived version). In particular, the license states that [the user is] “free to: copy, publish, distribute and transmit the BMRS Data”. I have furthermore exchanged emails with Elexon, who have confirmed that the data can be shared. I have therefore saved my own copy of the data, which I am happy to share at this LINK.

For ENTSO-E, the situation is different :/

  • Commission Regulation (EU) N°543/2013, in its Article 16 “Actual Generation”, demands that “actual generation output (MW) per market time unit and per generation unit of 100 MW or more installed generation capacity […] shall be published five days after the operational period”. Furthermore, it states that “Generation units and production units respectively shall be considered as primary owners of the relevant information they provide”.
  • Article 4 of the same EU regulation states in its paragraph 5 that “without prejudice to the obligations of the TSOs and of the ENTSO for Electricity laid down in paragraph 1 and Article 3, data can also be published on TSOs’ or other parties’ websites”.

From these, there seems to be no indication that the data cannot be reshared. Yet:

  • The ENTSO-E Terms and Conditions (archived version) for the use of the Transparency Platform state in Section 5 that:
    • “In accordance with the applicable legislation, the Data User shall, when using of the Transparency Platform Data for any purpose whatsoever: […] not cause prejudice to the copyright or related right on a Transparency Platform Data, which may be owned by the concerned Primary Owner of Data.
    • In case of a risk to cause prejudice to said right, the Data User shall seek the prior agreement of the holder of the copyright or related right. Notwithstanding this requirement, as a facilitation for the Data User, ENTSO-E publishes on the Transparency Platform and regularly updates the list of the Transparency Platform Data which can be freely re-used with no need to seek for the prior agreement of the respective Primary Owner of Data.
    • The Data User has responsibility to check this list before each re-use of the Transparency Platform Data”.
  • The list of data available for re-use (archived version) does not include the items listed in Article 16 of the EU regulation.

I have contacted ENTSO-E about this, asking for clarification on how to reach out to the primary owners of the data. From their answer, my understanding is that the data are available but cannot be re-shared. Furthermore, in order to fulfil the Terms and Conditions clause on prejudice to the copyright or related right, I will only show anonymised results, both for ENTSO-E and BMRS; that is, the wind farm names and owners will not be provided, and all results will be normalised to rated power.

What do the data look like?

The Figure below shows an example of time series from the BMRS dataset for a wind farm in British waters. As explained earlier, the dataset consists of power time series at the four metering locations for this wind farm, labelled BMUs (for Balancing Mechanism Units). The sum of the four time series is shown in black; this type of summed series (for other wind farms, not this one) is what has been used below for comparison with the OFGEM data.

This Figure shows an example of power time series (half-hourly timestamps) from the BMRS database.

Data quality and validity

Comparison between the ENTSO-E/BMRS power time series and the publicly available monthly production data from ENS/OFGEM shows good agreement, see the two figures below:

  • For ENTSO-E: the comparison with the ENS data shows good agreement for most wind farms but not all, see the example of Nysted below. This may be due to the location of the metering point.
  • For BMRS: the comparison with the OFGEM data shows in general good agreement, yet with substantial differences for some months.
This Figure shows comparisons between the monthly energy production values from the ENTSO-E dataset and the ENS wind turbine registry dataset, for several Danish wind farms. For all but Nysted, the data show very good agreement.
This Figure shows comparisons between the monthly energy production values from the BMRS B1610 dataset and the OFGEM RO and REGO renewable certificates datasets, for several British wind farms; the data show very good agreement.
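The monthly comparison relies on converting a power time series into monthly energy. A minimal sketch with pandas follows, under the assumption that each half-hourly value is a mean power in MW over its half-hour; if a source reports energy per settlement period instead, the 0.5 h factor is not needed. The constant 10 MW series is synthetic.

```python
import pandas as pd

# Hypothetical half-hourly park power [MW]: constant 10 MW throughout January 2019
index = pd.date_range("2019-01-01 00:00", "2019-01-31 23:30", freq="30min")
power_mw = pd.Series(10.0, index=index)

# Each value is assumed to be a mean power over 0.5 h, so energy = power * 0.5 h
energy_mwh = power_mw * 0.5

# Monthly totals, labelled by the first day of each month
monthly_mwh = energy_mwh.resample("MS").sum()
```

Here January has 31 × 48 = 1488 half-hours, so the monthly total is 10 MW × 0.5 h × 1488 = 7440 MWh, which can then be set against the monthly value from the other source.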

Unfortunately, I could not find any (any!) publicly available studies with 10-minute (or hourly) SCADA time series of offshore wind farms for the time period covered by the BMRS and ENTSO-E datasets (from 2015 onwards). There are publications available, showing Horns Reef 1 data for instance, but these are older and/or do not show timestamps and/or are for other wind farms and/or only show WTG-specific time series.

Therefore, I could not make a formal validation of the time series against SCADA data. Any user who has access to such a dataset is welcome to carry out such analyses, and any feedback would be appreciated ^^.

Worked example – a wind farm in the North Sea

For this example, I have used the ERA5 100 m wind time series (speed and direction), which I have scaled to the expected long-term hub height wind speed for this site (my guess, partly based on experience but also on publicly available and fairly accurate maps from projects like this one from the Met Office). I have also used the air (2 mMSL) and sea surface water temperature time series.
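The text does not say how the scaling is done. One simple possibility, shown here purely as an assumption, is a single multiplicative factor that brings the mean of the ERA5 series to the expected long-term mean. All numbers are made up.

```python
import numpy as np

# Hypothetical ERA5 100 m wind speeds [m/s] (in practice, a long-term series)
era5_ws = np.array([6.0, 8.0, 10.0, 12.0])

# Assumed long-term mean wind speed at hub height for the site [m/s]
target_mean_ws = 9.9

# One constant factor applied to every timestamp
scale = target_mean_ws / era5_ws.mean()
hub_ws = scale * era5_ws
```

A constant factor preserves the shape of the distribution and the timing of events; it does not correct any directional or seasonal bias in the reanalysis.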

I have then computed two wind farm power curves (these are two-dimensional look up tables providing power as a function of wind speed and wind direction), one for unstable atmospheric conditions and one for stable conditions, using the EMD WindPRO software.
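To illustrate how such look-up tables can be applied to a time series, here is a small sketch with made-up tables (normalised to rated power). The interpolation scheme, nearest direction sector and linear interpolation along wind speed, is an assumption for the example and not necessarily what WindPRO does; the names `park_power`, `table_stable` and `table_unstable` are hypothetical.

```python
import numpy as np

# Hypothetical table axes: wind speed nodes [m/s] and direction sector centres [deg]
ws_nodes = np.array([0.0, 4.0, 8.0, 12.0, 25.0])
wd_centres = np.array([0.0, 90.0, 180.0, 270.0])

# Made-up park power, normalised to rated power; rows = wind speed, columns = sector
table_unstable = np.array([
    [0.00, 0.00, 0.00, 0.00],
    [0.05, 0.05, 0.04, 0.05],
    [0.45, 0.44, 0.40, 0.45],
    [0.98, 0.97, 0.95, 0.98],
    [1.00, 1.00, 1.00, 1.00],
])
# Made-up reduction factors per wind speed row for the stable table
table_stable = table_unstable * np.array([[1.0], [0.9], [0.9], [0.97], [1.0]])


def park_power(ws, wd, table):
    """Nearest direction sector, linear interpolation along wind speed."""
    ws = np.asarray(ws, dtype=float)
    wd = np.asarray(wd, dtype=float)
    width = 360.0 / len(wd_centres)  # equal sectors, first one centred on 0 deg
    sector = np.round(wd / width).astype(int) % len(wd_centres)
    power = np.empty_like(ws)
    for j in range(len(wd_centres)):
        mask = sector == j
        power[mask] = np.interp(ws[mask], ws_nodes, table[:, j])
    return power


# Three example timestamps: wind speed, wind direction and stability flag
ws = np.array([8.0, 10.0, 6.0])
wd = np.array([350.0, 95.0, 180.0])
is_stable = np.array([False, False, True])

# Pick the table that matches the stability class of each timestamp
power = np.where(
    is_stable,
    park_power(ws, wd, table_stable),
    park_power(ws, wd, table_unstable),
)
```

The modulo on the sector index handles the wrap-around at north, so that 350° falls in the sector centred on 0°.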

A first, simple model

A comparison of the two time series (metered and model) is shown below. The first Figure shows the two time series (as well as the wind speed and temperature difference time series) for the month of January 2019; clearly the model time series replicates the synoptic (daily, weekly) patterns well but misses some mesoscale events. Also, because it only includes wake losses, the model time series overpredicts the power.

This Figure shows, at the top, metered (blue) and model (red) time series. The model time series only include wake losses and thereby overestimate the power. The two other subplots show respectively hub height wind speed and the absolute difference between air- and water temperatures (a crude measure of the atmospheric stability).

The second Figure shows histograms of the entire, concurrent time series (excluding some short periods with outages). Several differences catch the eye; compared with the model data, the metered data:

  • have a larger number of time stamps with very small power values;
  • show a skewed distribution of large near-rated power values.
This Figure shows histograms of the two concurrent time series (metered and model), for the entire period.

A simple model with electrical- and WTG availability losses

Let us focus on the largest power values. When using smaller bins, the histogram reveals several interesting features, see the Figure below.

As explained in the text, this Figure shows only the distribution of the largest power values, with a smaller bin width than in the preceding Figure. From it, one can infer the maximum wind farm power (i.e. the nominal power minus the array cable electrical losses). One can also infer that the distinct groups of values correspond to 1, 2, … WTGs being down.

First, the largest peak to the right likely corresponds to the mean maximum power, i.e. the sum of all WTG rated powers minus the electrical losses due to array cables; in this case this loss can be estimated at 1.6% (a typical value for an offshore project). A new model time series can then be created by applying this loss factor.

Also, the histogram consists of distinct groups whose modes (the peaks marked in red) correspond to the total park power when {1,2,..} wind turbines are off (the dashed lines). Assuming that the number of WTGs off, for a given timestamp, follows a binomial distribution, one can then try to estimate the mean turbine availability. See the two plots below.
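The loss estimate from the rightmost peak can be sketched as follows. The metered values are synthetic (drawn around 98.4 MW for a hypothetical 100 MW park), and taking the most populated bin of a fine histogram as "the peak" is an assumption made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
nominal_mw = 100.0  # hypothetical sum of all WTG rated powers

# Synthetic stand-in for metered power near rated output [MW] (not real data)
metered_mw = rng.normal(loc=98.4, scale=0.1, size=5000)

# Fine histogram of the largest values; the most populated bin marks the peak
edges = np.linspace(90.0, 100.0, 101)  # 0.1 MW bins
counts, _ = np.histogram(metered_mw, bins=edges)
centres = 0.5 * (edges[:-1] + edges[1:])
peak_mw = centres[np.argmax(counts)]

# Electrical loss relative to nominal power
electrical_loss = 1.0 - peak_mw / nominal_mw

# Apply the loss factor to a (made-up) normalised model series
model_power = np.array([0.2, 0.7, 1.0])
model_with_losses = model_power * (1.0 - electrical_loss)
```

The resolution of the estimate is limited by the bin width, so the bins should be narrow compared with the rated power of a single turbine.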

This Figure shows the result of the analyses which are carried out for determining the mean wind turbine availability, see text.

The first method (first plot) consists in creating a random (binomial) WTG availability time series for each of a number of mean WTG availability values (the x-axis spans from 0.94 to 1.00). Each of these time series is then multiplied by the modelled power time series (the one with electrical losses), and the correlation coefficient of the quantile-quantile (Q-Q) regression between the resulting model time series and the metered time series is calculated. This correlation coefficient is plotted on the y-axis. The mean WTG availability that best matches the dataset is the availability value with the largest correlation (the red dot).
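A rough sketch of this first method is given below, on synthetic data only. The number of turbines, the candidate grid and the way the "metered" series is fabricated are all assumptions for illustration; `qq_correlation` is a hypothetical helper that correlates the sorted values of two equally long series.

```python
import numpy as np

rng = np.random.default_rng(42)
n_wtg = 20      # hypothetical number of turbines
n_steps = 5000  # number of timestamps

# Synthetic normalised model power (with electrical losses), often at rated power
model_power = np.clip(rng.uniform(0.0, 1.6, n_steps), 0.0, 1.0)

# Synthetic "metered" series, fabricated with a known mean availability of 0.97
metered = model_power * rng.binomial(n_wtg, 0.97, n_steps) / n_wtg


def qq_correlation(a, b):
    """Correlation coefficient between the sorted values of two series."""
    return np.corrcoef(np.sort(a), np.sort(b))[0, 1]


# Scan candidate mean availabilities, as on the x-axis of the first plot
candidates = np.linspace(0.94, 1.00, 13)
scores = []
for p in candidates:
    # Fraction of turbines running at each timestamp, drawn from a binomial
    frac_up = rng.binomial(n_wtg, p, n_steps) / n_wtg
    scores.append(qq_correlation(model_power * frac_up, metered))

best_availability = candidates[int(np.argmax(scores))]
```

Each score depends on one random realisation, so the location of the maximum can be noisy; averaging the score over several realisations per candidate is one way to stabilise it.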

The second method starts with computing a histogram of the number of WTGs down. This number is derived by dividing the power time series by the individual turbine nominal power (say the wind farm is 100 MW and each turbine is 5 MW; if the total power is 90 MW, then 2 WTGs are down). In doing so, one assumes that the electrical losses are the same in relative terms, regardless of how many turbines are down. A binomial distribution is then fitted to the histogram, and the mean availability is computed.

Here the two methods agree well (2.91 and 3.07 WTGs down on average), but this may not be the case for all wind farms, because of curtailment and/or power boost, which also form clumps of values in the power histogram.

As illustrated in the histograms below, the resulting time series, which incorporates both the electrical losses and some random availability loss, better matches the metered distribution. There are still more occurrences of small power values in the metered dataset than in the model one. Looking at the absolute difference between the red and the blue histograms (modelled minus metered), these timestamps seem to correspond to outages or episodes where all the turbines are down (the sum of the timestamps in the last 13 grey bins is equal to 71% of the number of timestamps in the first bin).
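The second method can be sketched with the numbers from the example (100 MW park, 5 MW turbines). The counts of turbines down are synthetic, electrical losses are ignored for simplicity, and the sketch assumes that only timestamps where all turbines would otherwise run at rated power are used. For a binomial distribution with a known number of trials, the maximum-likelihood fit reduces to the sample mean divided by the number of turbines.

```python
import numpy as np

park_rated_mw = 100.0
wtg_rated_mw = 5.0
n_wtg = int(park_rated_mw / wtg_rated_mw)  # 20 turbines


def n_wtg_down(power_mw):
    """Number of turbines down, inferred from park power at rated wind speeds."""
    missing_mw = park_rated_mw - np.asarray(power_mw, dtype=float)
    return np.rint(missing_mw / wtg_rated_mw).astype(int)


# The example from the text: 90 MW out of 100 MW means 2 WTGs are down
example = n_wtg_down(90.0)

# Synthetic counts of WTGs down (not real data), drawn with a 3 % outage probability
rng = np.random.default_rng(1)
down = rng.binomial(n_wtg, 0.03, size=10000)

# Binomial fit with known n: the estimate of p is mean(k) / n
p_down = down.mean() / n_wtg
mean_availability = 1.0 - p_down
```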

This figure illustrates that the proposed method accounts fairly reasonably for electrical losses and turbine availability.

Park power curve, a high-level analysis

In the two figures below, I plot:

  • with blue dots: the metered power against the hub height wind speed;
  • with red markers: the median binned values from these blue dots;
  • with black lines:
    • full line: the mean value of the modelled power time series;
    • dashed lines: the 10- and 90-percent quantiles of the modelled power time series.
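The binned statistics listed above can be computed along these lines. This is a minimal sketch with made-up values; `binned_stats` is a hypothetical helper, and the 1 m/s bin edges are an assumption.

```python
import numpy as np


def binned_stats(ws, power, edges):
    """Median, mean and 10/90 % quantiles of power per wind speed bin."""
    ws = np.asarray(ws, dtype=float)
    power = np.asarray(power, dtype=float)
    idx = np.digitize(ws, edges) - 1  # bin i covers edges[i] <= ws < edges[i + 1]
    n_bins = len(edges) - 1
    names = ("median", "mean", "q10", "q90")
    stats = {name: np.full(n_bins, np.nan) for name in names}
    for i in range(n_bins):
        values = power[idx == i]
        if values.size == 0:
            continue  # leave empty bins as NaN
        stats["median"][i] = np.median(values)
        stats["mean"][i] = values.mean()
        stats["q10"][i], stats["q90"][i] = np.quantile(values, [0.1, 0.9])
    return stats


# Made-up hub-height wind speeds [m/s] and normalised power values
ws = [3.2, 3.7, 4.1, 4.9, 5.5]
power = [0.1, 0.3, 0.2, 0.4, 0.6]
edges = np.array([3.0, 4.0, 5.0, 6.0])
stats = binned_stats(ws, power, edges)
```

The same function can be applied to the metered series (for the median markers) and to the modelled series (for the mean and the quantile lines).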

The results are shown for two stability classes: unstable conditions (z/L < -0.1) and stable conditions (z/L > 0.1), where z = 10 mMSL and L is the Monin-Obukhov length computed from the bulk Richardson number using the Grachev and Fairall method described in this study.

Differences between measured and modelled power are here due to errors in either the wind speed or the modelled park power curve. Overall, there is a better match between model and metered data in unstable conditions; this is a recurrent observation across the literature.
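Once L is known, the classification itself is a simple threshold test. The sketch below takes L as given (the computation of L from the bulk Richardson number is not reproduced here); the label "other" for the remaining, near-neutral timestamps is my own naming for the example.

```python
Z_REF = 10.0  # reference height z [m], as in the text


def stability_class(obukhov_length_m):
    """Classify one timestamp from z/L using the thresholds quoted in the text."""
    zeta = Z_REF / obukhov_length_m
    if zeta < -0.1:
        return "unstable"
    if zeta > 0.1:
        return "stable"
    return "other"  # near-neutral cases, not shown in the figures


# Three made-up Monin-Obukhov lengths [m]
classes = [stability_class(L) for L in (-50.0, 50.0, 500.0)]
```

With these thresholds, only |L| < 100 m ends up in one of the two plotted classes.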

See text.
See text.

Wrap-up

We went through the description of the BMRS B1610 and the ENTSO-E datasets, which include offshore wind farm time series. After reviewing the license and terms of use, we looked at how the time series compare with other publicly available sources (monthly production).

Then, we moved on and tried to create a model time series for a given wind farm in the North Sea, accounting not only for wakes but also for electrical losses and wind turbine availability.

Overall the model matches the metered data well, and this opens the way for more advanced studies, including re-analysis of yield, wake modelling, and grid studies.

Thanks for reading, and feel free to send me your feedback.

All the best, Rémi.

2 Comments

  1. Hi Remi

    I hope in your calculation you have extrapolated (horizontally) using the WindPRO model. What is your recommendation to capture the mesoscale effect when there is a limitation for the WRF model? Can we use reanalysis (downscaled) data to extrapolate across the wind farm?

    1. Hello there! It is a difficult question 🙂 I guess it depends on the site, but typically you can improve mesoscale model outputs by using a microscale model (linearised, like WAsP, or CFD) to capture small-scale orography effects; see for instance https://wes.copernicus.org/articles/5/997/2020/. But that is more for onshore. Here offshore, since we use 30-minute or 1-hour time series, we’re filtering out microscale eddies anyway, so I think the mesoscale is appropriate. Then, engineering judgement and experience also play a big role in such complex situations where the risk of bias is large, so some of this risk should be addressed in the uncertainty analysis. Rémi
