1.

## Introduction

Northern marginal seas experience many seasonal variations, which can be expected to modify their circulation field significantly, as many forcing factors vary from season to season. For example, in the Baltic Sea wind forcing is stronger in autumn and winter than in summer and spring. Seasonal ice cover modifies the response to wind forcing in the winter. In the spring, melting sea ice affects surface salinity. River runoffs also increase with melting waters. Precipitation is lowest in the spring and then increases towards autumn. In the summer, increasing temperatures lead to the formation of thermal stratification, which then collapses in the autumn. The interplay of these factors results in seasonal circulation changes that are not always intuitive.

The Gulf of Finland (GoF) is the easternmost sub-basin of the Baltic Sea (Fig. 1). It is a direct continuation of the Baltic Proper with no separating sills between the two. It is estuary-like, with the Neva River — the largest single freshwater input to the Baltic Sea — at the eastern end, and a more saline deep water wedge extending from the Baltic Proper at the western end. These opposite inputs maintain a permanent horizontal density gradient. The GoF is an elongated basin approximately 400 km long. At the eastern end of the basin, there is the shallow and narrow Neva estuary and its transition zone. Next, there is a wider part of the basin between 26°E and 28°E (maximum width c. 135 km) and then a narrower part between 23°E and 26°E (minimum width 48 km). For in-depth descriptions of the physical oceanography of the GoF, see for example Alenius et al. (1998), Soomere et al. (2008), Soomere et al. (2009), Leppäranta and Myrberg (2009), and Myrberg and Soomere (2013).

Fig. 1.

The model domain and bathymetry (in metres). CTD stations referenced in the article (from the east): Haapasaari (orange circle), Länsi-Tonttu (green triangle) and Längden (red square). Kalbådagrund weather station is marked with a magenta star. Station 18A used in analysis is marked with a blue X. Red lines show the location of the transects used in the analysis. Solid, darker lines indicate transects that were used for plots in this article. The map also indicates the approximate location of the Neva river mouth at the eastern end of the basin, along with Kymi, Luga and Narva rivers. Furthermore, approximate locations of the cities of Helsinki (Hel) and Tallinn (Tal) are shown. The inset is the location of the model domain on a map of the Baltic Sea.

Perhaps the simplest way to consider the circulation patterns of the GoF is to begin from standard estuarine circulation, which is established by freshwater forcing and density-driven currents (for a general discussion of estuaries, see e.g. Talley et al., 2011). In the case of GoF this would mean that the fresh river waters from the head of the estuary in the east flow on the surface outwards, towards the mouth of the estuary in the west. A compensating flow of saltier, denser water is transported deeper in the water column to the opposite direction. (See Hela (1952) for an early English-language description of this for the GoF.)

The estuarine circulation is, however, heavily modified by factors such as wind forcing, topography and geostrophic effects. The current direction can change quickly for several reasons. On a timescale of hours to days, periodic processes such as inertial oscillations and seiches can be important. On a timescale of days to weeks, variable winds are the main driver of currents in the GoF. Sometimes surface currents are almost unidirectional nearly everywhere in the whole gulf, if the wind forcing is uniform enough. At other times, there is more spatial variability. Overall, on any given day, the circulation patterns in the GoF can be expected to be very complicated and variable.

The dominating wind direction in the GoF is southwesterly. If there are strong enough winds from this direction, the transport of water near the surface is roughly towards south-east or east, as per the Ekman motion theory (e.g. Cushman-Roisin and Beckers, 2011). This drives water towards the head of the estuary and can lead to full reversal of the normal estuarine circulation, with outward compensating flows deeper in the water column (e.g. Elken et al., 2003; Liblik et al., 2013). This reversal can then lead to a collapse of water column stratification in winter (Elken et al., 2014).

Circulation patterns look different at different timescales. Averaging currents over longer time periods smoothens the signal and damps features visible in the shorter term. In a large estuary like the GoF, where rotational effects are also important, a cyclonic circulation pattern emerges when the currents are averaged over a sufficiently long period of time. Witting (1912) and Palmén (1930) were the first to note this in the GoF, based on light vessel observations. However, these early estimates were based on only a few data points. Nevertheless, later works by for example Hela (1952), Mikhailov and Chernyshova (1997), Andrejev et al. (2004), Maljutenko et al. (2010), Elken et al. (2011), Soomere et al. (2011), Lagemaa (2012), and Westerlund et al. (2018) have mostly confirmed the main outcomes of these early studies. There is some variation in the results of these studies, mostly explained by inter-annual variability and differences in methodology. In general, long-term average currents are on the order of 1–2 cm/s and have low stability.

As there is seasonal variation in the forcing, so there is seasonal variation in the circulation patterns. Already in the twentieth century, it was known that the stability of currents during seasons was higher than annually. Still, seasonal circulation variations in the GoF have been studied relatively little. Witting (1912) published seasonal circulation maps for the GoF (reproduced by Alenius et al., 1998). Later, Hela (1952) computed monthly currents from lightship observations for two stations between Tallinn and Helsinki. More recently, Soomere et al. (2011) divided the year into four periods (calm and windy periods, and transitional periods between them) for their analysis of current-induced surface transport in the GoF.

With changes in climate, changes to seasonal variation of forcing are also expected (BACC II Author Team, 2015). Investigation of seasonal circulation patterns can deepen our understanding of the dynamics of the basin and how circulation conditions in the GoF will change in the coming decades. Furthermore, as the marine traffic in the gulf is very intensive and the shores of the gulf are heavily populated, a better understanding of circulation dynamics is needed for applications such as estimating the transport of oil, chemicals and nutrients.

In recent years, fully automated measurement platforms, satellites and high-resolution modelling have transformed oceanography. Traditionally, oceanographers had to routinely draw their conclusions from a limited number of observations. While this is still commonly the case for many types of observations, at the same time the amount of data that needs to be processed and analysed is growing rapidly. It is not often possible for a scientist to manually investigate all the data. In these cases, new methods such as machine learning can assist. Machine learning algorithms can be used as exploratory tools to find structure in the data. This data can be both observational and modelling datasets. The use of machine learning methods began in the environmental sciences in the 1990s, and they are nowadays used extensively (for an overview, see e.g. Hsieh, 2009). An important example of a machine learning method in the field of oceanography is the self-organising map (SOM), which is a neural-network-based method that can be used for feature extraction (Kohonen, 1982, 2001; Liu and Weisberg, 2011; Thomson and Emery, 2014).

In this study, we examine circulation patterns in the GoF on a timescale of days to years. The aim is to identify frequent patterns in a hydrodynamic model simulation along with factors affecting their emergence. We analyse connections between different timescales and investigate relations between model forcing and circulation patterns. The data from the high-resolution GoF configuration of the numerical 3D model NEMO (Madec and the NEMO team, 2008) are analysed for the years 2007–2013. Circulation conditions are studied more in-depth for chosen north-south sections with the SOM analysis.

2.

## Materials and methods

2.1.

### Modelling

We used the NEMO 3D ocean model (V3.6) for the GoF with a 0.25 NM (nautical mile), or roughly 500 m, horizontal resolution. This setup is similar to the one presented by Westerlund et al. (2018), which was originally based on the setup by Vankevich et al. (2016). The main differences in the configuration in the present study are in the atmospheric forcing, boundary conditions, and the bathymetry. We ran the model from the beginning of 2006 to the end of 2013. We considered the results from 2006 as the initialisation of the model and chose the years 2007–2013 for a closer analysis. The daily mean values of temperature, salinity and current fields were saved for the analysis.

The horizontal resolution of the configuration (500 m) is well below the typical range of the internal Rossby radius (2–4 km) in the GoF (Alenius et al., 2003). The model domain covers the GoF east from the Vormsi–Kimitoön line roughly at 23°E, where an open boundary towards the Baltic Proper is located (see Fig. 1). Model bathymetry is based on the data from the VELMU (Finnish Inventory Programme for the Marine Environment) depth model from SYKE (Finnish Environment Institute) and the Baltic Sea Bathymetry Database (Baltic Sea Hydrographic Commission). This setup has 94 z-coordinate (with partial step) vertical layers. The topmost vertical layers are 1 m thick, and the layer thickness slightly increases with depth, being about 1.08 m at the lower bound of the z-axis. The time step of the model is 100 s. The ice model LIM3 was included in the setup (Vancoppenolle et al., 2009). Due to the high computational requirements of the configuration, the ice model was only run with a thermodynamic formulation. The lateral boundary condition on the open boundary was taken from the NEMO Nordic 2 NM configuration for the Baltic Sea and North Sea (Hordoir et al., 2013, 2015, 2019). We used initial conditions for the beginning of 2006 from this same model run. Flather boundary conditions were used for barotropic velocities and sea surface height; flow relaxation was used for temperature and salinity.

2.1.1.

#### Model forcing

The EURO4M regional reanalysis product (Dahlgren et al., 2016; Landelius et al., 2016) was used as atmospheric forcing, both in the 0.25 NM GoF configuration and in the coarser configuration that provided the boundary condition. This product has the approximate horizontal resolution of 22 km, and its domain is centred in Europe. We used 10-m wind, longwave and shortwave radiation, humidity, 2-m air temperature, and precipitation fields at 3-hour intervals. The reanalysis was produced with the HIRLAM NWP (Numerical Weather Prediction) model version 7.3. It was constrained with the ERA-Interim product (Dee et al., 2011) on lateral boundaries and also via data assimilation.

Previously, forecasts from the operational HIRLAM NWP system have been used as forcing for the high-resolution GoF configuration (Westerlund et al., 2018). While the reanalysis product used as forcing in this study is also based on the HIRLAM model core, there are differences between this forcing dataset and the one that was used by Westerlund et al. (2018). In the reanalysis product, the forcing is more homogeneous for the whole period, since it is produced with the same resolution and version of the system over the whole time period. This is not always the case when forecasts are used. Also, reanalyses use a larger amount of data for the data assimilation, as usually all observational data are not yet available in real-time for the forecast runs.

By using the reanalysis dataset for this study, we have the same forcing dataset for both this model configuration and the coarser model used for boundary conditions. Furthermore, it enables us to have a longer modelling period than by using forecasts alone, where the aforementioned issues with data homogeneity and technical problems make compilation of a long-term forcing dataset challenging or practically impossible.

On the other hand, it should be noted that the resolution of the reanalysis is coarser than that of the NWP system, which might be an issue in some circumstances. For example, in the Finnish archipelago area on the northern coast of our modelling domain the shape of the shoreline is complex and requires sufficient resolution in the atmospheric model. Furthermore, a Europe-wide reanalysis product might not be as well-tuned to the local conditions of the Nordic countries as is an NWP product that is mainly focused on forecasting the weather in that area.

River runoffs included data for the four main rivers in the GoF area and was based on two sources. The river Kymi discharges were taken from the open data of the Finnish Environment Institute (SYKE). For the Neva, Narva and Luga rivers, data were provided by HydroMet as a part of the GoF Year 2014 (GoF2014) activities (http://www.gulfoffinland.fi).

2.2.

### Measurements

We used CTD data from three monitoring stations near the northern coast of the GoF for model validation: Haapasaari (depth 66 m), Länsi-Tonttu (depth 53 m) and Längden (depth 60 m, locations shown in Fig. 1). Temperature and salinity were available at 1, 5, 10, 20 and 40 m depths and the bottom layer, measured 1–3 times a month during the ice-free season.

We used wind measurements from the FMI’s coastal weather station Kalbådagrund (location shown in Fig. 1) to evaluate the accuracy of the meteorological forcing. This weather station is considered to be representative of open sea weather conditions in the GoF, and it has been used in many earlier studies (e.g. Lips et al., 2011; Tuomi et al., 2012). At Kalbådagrund, wind measurements are made at a 32 m height. From this station, we have data for the main meteorological parameters at 10-minute intervals.

2.3.

### Self-organising map (SOM)

The self-organising map (SOM), also known as the Kohonen Map, is an unsupervised learning algorithm based on artificial neural networks (Kohonen, 1982, 2001). It can be used as a way to reduce the dimensionality and extract features of a large dataset in a way similar to the classical Empirical Orthogonal Function (EOF) analysis or Principal Component Analysis (PCA). This will facilitate visualisation and analysis of the high-dimensional data, which is projected non-linearly, for example, in one- or two-dimensional space. The algorithm finds weight vectors from the original data, which are used to form a map or a codebook. The weights or nodes in this map can be reshaped back to characteristic data patterns. The best matching node is assigned for each point of the original time series (best matching unit, BMU). This way all data points in the time series are clustered into a predefined number of groups represented by nodes of the map. One of the benefits of this method for Earth sciences is that topological properties of the input space are preserved, that is, similar states are near each other on the map.

SOM has previously been used for a number of applications in physical oceanography (Liu and Weisberg, 2011; Thomson and Emery, 2014) from analysing ADCP to satellite data to model results. Applications include identification of characteristic current patterns (e.g. Liu and Weisberg, 2005) and hydrodynamics of coastal and estuarine environments (Williams et al., 2014). There are also other recent examples, for instance from the Mediterranean Sea (Falcieri et al., 2014; Fraysse et al., 2014) and the Pacific Ocean (Hisaki, 2013).

SOM offers some benefits with respect to EOF/PCA analysis. For example, the temporal mean does not have to be removed prior to applying the algorithm, which makes the output easier to visualise (Liu and Weisberg, 2005; Liu et al., 2006a). The outputs of the algorithm are not anomalies, as with EOF/PCA. Also, unlike EOF/PCA analysis, SOM is a non-linear method, which makes it better-suited for asymmetric patterns. Furthermore, missing data values are easier to handle in SOM analysis. A basic example of how SOM and EOF methods differ for feature extraction was provided by (Liu et al., 2006b). They combined several waveforms into a time series and then tried to recover them with both analysis methods. The EOF analysis was unable to recover the sine, step, sawtooth and cosine functions present in the data. A SOM map, however, recovered the waveforms successfully.

An inherent problem with the SOM algorithm is that the user predefines map dimensions. This always involves striking a balance between the easy interpretability of the output and the accuracy of mapping. This deficiency could be addressed with Growing Hierarchical Self Organised Maps (GHSOM) (Thomson and Emery, 2014), but those also add complexity to the analysis. It is also worth noting that statistical clustering algorithms such as SOM do not consider physical conditions, and any placement of a particular data point in any cluster is not definitive but depends on the chosen analysis parameters.

We used the SOMpy implementation of the algorithm (https://github.com/sevamoo/SOMPY). There are a number of parameters that can be adjusted for the SOM. We mostly followed suggestions by Liu et al. (2006b) for small map sizes: we used rectangular neural lattice, planar map shape and batch training. Unlike Liu et al. (2006b), we used random initialisation, as for our application the number of iterations before stabilisation was manageable. We used the Gaussian neighbourhood function, which according to Liu et al. (2006b) gives the smoothest patterns with the lowest noise levels.

3.

## Results

3.1.

### Model validation

Westerlund et al. (2018) validated a model configuration similar to the one in this paper. They compared salinity fields to gridded CTD salinity observations for three cruises in June 2013, June 2014 and September 2014. These 5-day cruises covered over 80 stations each in the western GoF, resulting in a grid with about 4 NM horizontal resolution across the gulf and 9 NM resolution along the gulf. It was found that the vertical structure of the salinity field was overall well-reproduced. In some cases the freshest water on the surface was incorrectly located and distributed wider than in the observations. This is often the case for model configurations of this area (Myrberg et al., 2010). Benchmarking the circulation patterns visually against a HIROMB-based reanalysis product showed that, overall, the patterns were similar in both models. The NEMO configuration in Westerlund et al. (2018) was mostly the same as the one in this study, but it used different atmospheric forcing, boundary conditions, and bathymetry. The atmospheric forcing in Westerlund et al. (2018) was based on FMI-HIRLAM forecasts for the years 2012–2014, and it used a 2 NM Baltic Sea–North Sea configuration on the boundary forced with the same forcing dataset. The domain of that configuration was slightly smaller with the western edge located at 23.5°E.

Here we present additional validation to the version of the configuration at hand. Statistical comparison to CTD observations from three frequently sampled stations (Haapasaari, Länsi-Tonttu and Längden, see Table 1) revealed that the ability of the model to reproduce temperature and salinity values in the area during the time period of investigation is in line with other GoF models. The model performed best in the western GoF, although the differences between stations were relatively small, especially for temperature. The model consistently seems to overestimate temperatures at these stations. Salinity biases are also in most cases negative, indicating modelled salinities at these stations were too high. Errors near the surface are in general smaller than lower in the water column.

The differences between meteorological forcing used in this study and the one used by Westerlund et al. (2018) were discussed in Section 2.1.1. To assess the differences between the forcing datasets, we calculated the statistics for wind forcing at the Kalbådagrund meteorological station for 2013, which was a common year for both of these datasets. Table 2 shows that FMI-HIRLAM has a lower wind speed RMS error and bias at this station. A comparison of wind speed time series (not shown) reveals that while both forcing datasets appear to be similar, the reanalysis-forcing in general has less extreme wind speeds than the forecast-forcing. There are fewer differences in the wind direction.

3.2.

### Seasonal mean currents

Figure 2 shows near-surface currents averaged over a 7-year period 2007–2013. In Fig. 3, mean currents are shown separately for spring (March–May), summer (June–August), autumn (September–November) and winter (December–February) periods. We see that those features that become most clearly visible in the mean over the whole run are those that also consistently appear in the seasonal means. However, those features that change place or direction from one season to another are not clearly visible in the full mean.

Fig. 2.

Annual mean circulation averaged from 0 to 7.5 m depth for the years 2007–2013. Velocities are in m/s. Vector arrows are drawn for every 15 grid points in the longitudinal direction and every 13 grid points in the latitudinal direction.

Fig. 3.

Mean circulation in 2007–2013 averaged from 0 to 7.5 m depth for spring (top), summer, autumn and winter (bottom). Velocities are in m/s. Vector arrows are drawn for every 15 grid points in the longitudinal direction and in every 13 grid points in the latitudinal direction.

The average over the whole period shows moderate (less than 2 cm/s) mean current speeds near the northern coast directed mostly towards west, and stronger (mostly from 2 to 6 cm/s) currents near the centre-line of the gulf. Several loops appear. There is a clear loop between 23°E and 24°E in the west, and a second one around 26°E. There are clear alongshore currents outwards from Neva along the northern coast. Another along-shore current is located west of Narva Bay along the southern coast. See Westerlund et al. (2018) for a discussion of this feature.

Seasonal averages reveal distinctly different structures of the circulation field from one season to another, and from the average over the whole run. We see that stronger currents generally appear in the autumn and the winter. This is expected, as wind speeds are also higher in those seasons than in summer and spring. An outflowing surface current near the northern coast is visible in the summer, winter, and to some extent in the spring. The location of the current pattern has some north-south variability. There is an inflow near the central part of the GoF in summer, autumn and winter seasons, but not in the spring.

In the summer, it is easy to distinguish an inflow near the centre of the gulf and an outflow near the northern coast. The same kind of pattern is visible also in the winter, although in general the field is less structured.

In the spring, there is no clear inflow in the western gulf, but rather an outflow both near the southern coast and the centre line of the GoF. Average currents near the northern coast are small. In the autumn there is a strong loop in the westernmost part of the domain and inflowing currents near the centre of the gulf.

3.3.

### Inter-annual variability

As noted in Section 1, there is inter-annual variability in the circulation field, which explains many of the differences between different studies. Also in our results, circulation maps differ from one year to another. We can demonstrate this by investigating more closely the mean surface circulation maps for 2007–2008 and for 2010–2011 (not shown). These periods were chosen to facilitate the comparison to circulation maps presented by Elken et al. (2011) and Lagemaa (2012) for 2006–2008 and 2010–2011, calculated from the HIROMB model at 1 NM resolution (see Section 4).

When compared to the map of the full simulation period in Fig. 2, the map for 2007–2008 shows somewhat stronger currents overall in the domain. There is a stronger westward current on the northern coast of the GoF extending from the east to approximately 25°E, with current speeds of mostly 2–4 cm/s. On the other hand, the map for 2010–2011 shows much weaker currents in this area on the northern coast. The current towards the east, visible near the centre line in the wider part of the GoF, is shifted towards the northern coast. Overall, the current speeds are lower than in 2007–2008.

3.4.

### SOM analysis of daily mean currents

Averages of the circulation fields revealed interesting differences between the seasons. We used a one-dimensional SOM analysis to further understand how circulation varied during the modelling period on different timescales. We performed the analysis for daily mean velocity fields for the years 2007–2013. This averaging period is longer than the period of inertial oscillations in the Baltic Sea (about 14 hours) and shorter than the longest seiche periods in the basin (around 31 hours). Due to the nature of the algorithm, we estimated that occasional seiches would not affect the output of the clustering in a significant way. We tested this with a sensitivity study that showed filtering the input with a rolling mean filter (window from 1 to 10 days) caused only small changes in the output of the SOM algorithm. Cf. for example, Leppäranta and Myrberg (2009) or Suhhova et al. (2018) for a discussion of periodic oscillations in the Baltic Sea.

Modelled currents were analysed by extracting 7 north-south sections along the domain from 24°E to 28°E (cf. Fig. 1). Locations of these sections were chosen so they would represent topographically and dynamically different domains of the GoF. The easternmost was located in the transitional area between the Neva estuary and the wider gulf, then two in the wider part of the basin between 26°E and 27°E. The rest were in the western gulf. Each of these sections was separately analysed. Vertical velocity was omitted from the analysis; only horizontal velocity components were considered. The number of nodes was set to five, which provided an acceptable compromise between the generality of the information, easy interpretability and the level of detail.

The resulting maps for four of these sections are presented in Figs. 4, 5, 6 and 7. As we chose to use five map nodes, we see five current patterns in each map. Each of these characteristic patterns represents a number of daily current fields that the algorithm has determined to belong to the same cluster. The patterns are topologically ordered on a 1D line so that patterns next to each other represent clusters that the algorithm considers similar. Conversely, patterns furthest away from each other at both ends of the map are considered by the algorithm to represent clusters that differ most from each other. Each pattern or node, numbered here from 0 to 4, also shows the percentage of daily fields that fall into this cluster. For example, in Fig. 4 we see that node 0 shows inflowing currents in the deepest parts of the section, and outflowing currents near the surface. Node 4 shows the opposite. Between these two outermost cases, there are patterns depicting what seem to be transitional states between the two.

Fig. 4.

SOM patterns for modelled currents at a north-south section at 24.04°E. Data covers the years 2007–2013. Nodes are numbered from 0 to 4. Frequency of occurrence as percentages is indicated for each node. Colour contours indicate the u component of velocity, with positive values (red hues) for eastward velocities (into the page). Grey vector arrows show the v component of the velocity field, with every tenth vector displayed. Velocities are in cm/s.

Fig. 5.

Like Fig. 4, but at 24.81°E.

Fig. 6.

Like Fig. 4, but at 26.26°E.

Fig. 7.

Like Fig. 4, but at 27.13°E.

The BMU time series corresponding to the map in Fig. 4 is shown in Fig. 8. For each day in the model run, it shows which cluster that day belongs to. From the BMU time series, we see that there is inter-annual variability and frequent changes of the best matching node. BMU plots for the other sections (not shown) are similar overall, and the BMU is often the same at all the sections. There are some differences, however, and sections further to the east show more frequent and rapid changes of the BMU.

Fig. 8.

SOM BMU time series for the section in Fig. 4. Node numbers are the same as in Fig. 4.

For all inspected sections, the results consistently showed a node in one end (node 4), where the zonal component of the surface current is towards east, either across the whole gulf or almost across the gulf. This depicts reverse estuarine circulation. The thickness of the surface layer varies, but it is most commonly less than 20 m. It is slightly thicker in the western sections than in the east. In the other end, there is a node that shows zonal currents that are mostly directed towards west on the surface (node 0). This depicts normal estuarine circulation.

Overall, nodes representing normal estuarine circulation display a more heterogeneous structure than nodes depicting reversed estuarine circulation. In many of the nodes representing normal estuarine circulation, there are maxima and minima present near the surface. If we compare these nodes to the mean circulation maps presented in Section 3.2, we see that those complex circulation patterns with numerous eddies and loops visible in those maps cannot emerge alone from averages of the relatively homogeneous and overall more barotropic circulation patterns that were observed in the model for reverse estuarine circulation. The nodes representing normal estuarine circulation are a significant contribution to the horizontal structures visible in the seasonal means.

In general, when moving from west to east, the sections display a gradually less regular structure. The shallower, wider part of the gulf in the east has a more complex bottom bathymetry, which also seems to affect the modelled patterns. The relative frequency of the two outermost nodes is quite close to each other for most sections, with both being the BMU for a little more than 1/5 of all days in the dataset. The transitional nodes are the BMU slightly less frequently. The transitional states depict situations where there is a significant amount of variability in the surface circulation, with non-permanent structures like loops or eddies near the surface.

The two outermost nodes shown in the SOM analysis can be more concretely understood by taking two example days when they were the best matching unit and then visualising the circulation in the GoF (Fig. 9). The case when node 4 was the BMU (16 December 2013) shows a uniform surface layer moving to the east and a compensating flow below it. This resembles reversed estuarine circulation. A comparison of this case from December 2013 to published observations from around this time (Lips et al., 2017, their fig. 2) shows concurring results, with outbound currents in most of the water column, but also inbound currents near the surface. In the case when node 0 was the BMU (21 January 2013), the areas of strong westward surface flows are near the coasts, and strong flows are in general more jet-like. This more closely resembles normal estuarine circulation. In this case, current speeds seem to be overall slightly higher near the northern coast.

Fig. 9.

Daily mean velocity maps and zonal component of velocity at north-south section at 24.04°E for 21 January 2013 (normal estuarine circulation) and 16 December 2013 (reversed estuarine circulation). Vector arrows are drawn for every 19 grid points in the longitudinal direction and every 17 grid points in the latitudinal direction. Location of the north-south section is indicated by a green line on the velocity maps.

3.5.

### Wind forcing and currents

Next, we compared the BMU at 24.04°E to model wind forcing at the Kalbådagrund weather station to evaluate the connection between circulation patterns and wind forcing. The years 2011 and 2013 are shown in Figs. 10 and 11. In general, it seems that there is a relation between wind and the BMU. We see that when southwesterly winds dominate, nodes indicating eastward flows near the surface are more frequent at 24.04°E. An example of this can be seen in late 2011. This event from December 2011 to January 2012 was documented by Liblik et al. (2013), based on two ADCPs and CTD data. On the other hand, we see that in summer 2011, southwesterly and easterly winds alternate, and current nodes alternate as well. There is also an interesting period in late 2013, when southwesterly winds dominate for approximately 3 months. This is associated with reversed estuarine circulation in the model. This event is significantly longer in the model than the reversal event in 2011, which according to Liblik et al. (2013) was unusually large. Lips et al. (2017) reported observations for the latter part of the 2013 event, which show a qualitative agreement with the model. An outflowing compensating current is clearly visible in the current observations. Vertical CTD sections show a retreating salt wedge near the bottom during the event, also suggesting outflowing currents. (Modelled surface salinity in the GoF during the 2013 event is discussed further in Section 3.7.)

Fig. 10.

Top: wind speed and direction used as model forcing at Kalbådagrund, 2011 (black dots). 14-day running mean overlay (red line). Bottom: the BMU timeseries at 24.04°E for the same period. Wind vector averages were calculated component-wise.

Fig. 11.

Like Fig. 10, but for 2013.

When we look at a histogram of wind direction for different nodes in Fig. 12, we see that reversal of estuarine circulation is clearly associated with southwesterlies and westerlies. There is another but smaller peak in the distribution for normal estuarine circulation for easterlies. Transitional nodes have a more even distribution. If we look at the corresponding histogram for wind speed (not shown), such clear differences are not observed. Reversed estuarine circulation is associated with slightly higher wind speeds, which is expected, as southwesterly winds tend to be stronger than easterlies in the GoF.

Fig. 12.

Wind direction distribution for BMU’s of SOM nodes as a stacked histogram, 2007–2013. Wind direction taken from the model forcing at the Kalbådagrund station. BMU from the section at 24.04°E.

3.6.

### Seasonality of circulation, based on the SOM analysis

The SOM analysis also allows a more fine-grained understanding of seasonal frequencies of different circulation types, for example by investigating the BMU hit count for each node for each day of the year over the whole modelling period (not shown). This means, that for each day of the year and for each SOM node, we calculate how many times that particular node is the BMU, summed over the modelled time period. This analysis revealed that the relative frequency of the nodes as the BMU changes from one season to another. No clear way to divide the year to different kinds of circulation regimes emerges from this analysis. Overall, we see that in our dataset fully developed reversed estuarine circulation is more common early and late in the year, while the transitional nodes are the best matching unit more frequently in the middle of the year. There is also a period in the autumn when normal estuarine circulation is more common. Another way of looking at the same situation is that from September to March transitional nodes are relatively rare, but from March to September they become more common. These results can also be compared to the division used by Soomere et al. (2011), where May–August is considered the calm period and October–March is considered the windy period. These time spans do not clearly stand out in our analysis.

3.7.

### Salinity gradients in the GoF

One of the main features of the salinity field in the GoF is that surface salinity gradients across the gulf are slanted and the field is not homogeneous on sections across the gulf. Surface salinity is, on average, lower on the northern coast than on the southern coast. For example, Kikas and Lips (2016) reported an average salinity difference of roughly 0.5 g/kg on the Tallinn-Helsinki Ferrybox line for the years 2007–2013 (their Fig. 3). This salinity difference is one of the most important indirect observations suggesting that a cyclonic long-term mean circulation pattern exists in the GoF. The SOM analysis gives an intuitive way to understand how this horizontal salinity structure emerges from the daily circulation patterns.

As noted, the reverse circulation field is quite homogeneous near the surface, as are the transitional states closest to it. Therefore the asymmetry in the long-term salinity structure can only come about from the circulation nodes depicting normal estuarine circulation, where flow is more heterogeneous across the GoF. Typically during normal estuarine circulation, there are outflowing currents present in the model, often on both coasts. Currents near the northern coast are often stronger than near the southern coast, as was the case in Fig. 9.

The surface salinity fields in Fig. 13, taken from the model around the same time as the circulation maps in Fig. 9, demonstrate this process. In January 2013, after normal estuarine circulation had been the dominant unit in the SOM analysis for some time, the surface salinity field shows clearly slanted salinity gradients, with higher salinity on the southern coast than on the northern coast. In December, after reversed estuarine circulation had been dominant for most of autumn, the surface salinity field shows a more complex structure. This point is further elaborated when we compare the difference in the model salinity on the northern and southern coast of the model to the BMU from the SOM analysis (Fig. 14). We see that, on average, salinity is higher on the southern coast when normal estuarine circulation is the BMU. On the other hand, the salinity difference is smaller, or salinity can even be higher, on the northern coast when the SOM analysis suggests there was reversed estuarine circulation in the GoF. This comparison does not take into account the time it takes for the salinity field to react to changed circulation patterns. But it still shows that the BMU and salinity differences are connected to each other and that normal estuarine circulation is required to establish the long-term average salinity field.

Fig. 13.

Mean surface salinity (PSU) in the model for late January 2013 and mid-December 2013 (cf. Fig. 9).

Fig. 14.

Histogram of modelled salinity (PSU) difference $\Delta S$ between northern and southern coast of GoF at 5 m depth, 2007–2013. Nodes 0 and 1 (normal estuarine circulation) and 3 and 4 (reverse estuarine circulation) are combined in this Figure. $\Delta S={S}_{18A}-{S}_{\mathit{LT}},$ where ${S}_{18A}$ is salinity at station 18 A near the southern coast and SLT at station Länsi-Tonttu near the northern coast (locations in Fig. 1).

4.

## Discussion

Seasonal averaging of the circulation fields revealed interesting differences between the seasons. In the spring, for example, there is often ice cover in the area early in the season and thermocline begins to develop closer to summer. We also know that in the spring, winds are generally weaker than on average, but runoffs are larger (as noted by e.g. Hela, 1952). These differences seem to show up in our results as weaker surface currents in the seasonal average and a stronger outflow from the GoF towards the Baltic Proper.

It is also worth considering the many features visible in the seasonal averages that were not visible in the mean of the whole simulation period. For example, there is at least some outflow visible near the northern coast in three of the four seasonal plots. But this feature is practically non-existent in the full mean, as inflowing currents in the same area late in the year overshadow it in the averages.

This same phenomenon was illustrated on a different scale in the BMU figures from the SOM analysis, where we can see several time periods when circulation quickly alternates from one outermost node to another (e.g. in summer 2011). When an average field is calculated over a period of quick changes between different circulation patterns, this results in a pattern that in practice was not present during that period. Long-term patterns represent different processes from short-term patterns.

The modelled mean over the whole 7-year high-resolution model run of the GoF did not reveal the classical cyclonic circulation pattern first described by Witting (1912) and later by Palmén (1930). However, our analysis suggests several reasons why this was so.

Variation from one year to another is an important factor. As our analysis showed, for some periods the mean circulation field resembles more the classical cyclonic pattern than for other periods. The choice of averaging interval is always subjective. A full analysis of inter-annual variability of circulation patterns in the GoF would require a multi-decadal high-resolution model run. This goal is getting closer with new high-resolution model studies such as the one presented in this study, and new multi-decadal analyses such as the ones published recently by Maljutenko and Raudsepp (2019) for the GoF (1 NM horizontal resolution) and Jędrasik and Kowalewski (2019) for the whole Baltic Sea (3 NM horizontal resolution).

In addition to inter-annual variability, our study underlines the importance of wind forcing for the long-term mean circulation field in the GoF. From our analysis, it is possible to see how different patterns contribute to the long-term means. Differences in forcing can lead to differences in the frequencies of the BMU nodes. If model forcing over- or under-represents some particular wind circumstances, these errors accumulate in the long-term averages. For example, from our analysis it seems that common-enough standard estuarine circulation is required for the cyclonic mean circulation pattern to emerge. Therefore, differences in wind direction distribution affecting the frequency of standard estuarine circulation may be one factor why some authors have obtained the classical cyclonic mean circulation pattern while some have not. For instance, if the wind direction distribution in the model forcing data has too frequent southwesterlies, the cyclonic mean circulation pattern would be weaker in the model than in reality.

Westerlund et al. (2018) also presented maps of the long-term mean circulation in the GoF. There were some differences between the results, most of which are likely normal variability between the years and due to differences in methodology. For example, the overall values of mean currents were slightly greater in Westerlund et al. (2018), but as that paper had a shorter averaging interval than this study, it was expected. Westerlund et al. (2018) used data gathered from the operational FMI-HIRLAM model as atmospheric forcing. This paper used the EURO4M atmospheric reanalysis. By comparing the forcing datasets for the overlapping period of these studies, we saw that the FMI-HIRLAM forecasts represented extreme wind events better than the reanalysis product. This may also contribute to the differences in results.

The forcing is also an important difference between the model runs in this study and the ones presented in earlier studies, such as Andrejev et al. (2004). The meteorological dataset in Andrejev et al. (2004) had geostrophic wind forcing with one-degree resolution which was extrapolated to the sea surface and corrected with a constant multiplier. It is possible that with a higher resolution forcing with higher, more variable and less smooth wind speed and direction, the circulation features are less persistent than in the study by Andrejev et al. (2004). Further study is needed to investigate this more closely.

Andrejev et al. (2004) discussed the two-layer structure of the circulation in the GoF visible in their results. Circulation in the very top layer seemed to be mainly wind-driven, whereas in the layers below that a more permanent structure could be observed in their longer-term averages, with outflowing current near the northern coast. Our SOM analysis revealed that many of the nodes had a similar circulation structure to the one presented by Andrejev et al. (2004), even though it does not show up as clearly in the mean values over the whole period. It seems that the averaged current fields presented by Andrejev et al. (2004) correspond more closely to the transient nodes in our SOM analysis. It is possible to find points from the SOM analysis where the currents seem to be quite stable at certain depths.

Elken et al. (2011) and Lagemaa (2012) presented mean circulation maps for two periods partially covered by our analysis period, calculated from the HIROMB model at 1 NM resolution. The map for 2006–2008 showed more features consistent with the cyclonic circulation pattern, with a relatively strong (around 5 cm/s) current along the northern coast of the GoF, and a number of loops in the southern side of the GoF. The map for 2010–2011 did not show such a strong current along the northern coast. Lagemaa (2012) also presented a map of the 2010–2011 circulation field run with a higher 0.5 NM resolution of the model. When we compare these maps to our results (see Section 3.3), we note that we saw similar differences in the mean circulation maps for 2007–2008 and 2010–2011 near the northern coast, although the magnitude of the currents was somewhat lower in our results. The map for 2007–2008 overall resembled more the traditional cyclonic pattern than the one for 2010–2011. We also note from their results that improved resolution intensified currents in many parts of the domain, for example in the southern coast west of Narva Bay, where their results show a relatively strong outward along-shore flow. This feature is similar to the one observed in our results, and it was discussed in length by Westerlund et al. (2018).

Lagemaa (2012) also presented an analysis of wind stress for the two periods discussed and noted that 2010–2011 saw much lower wind stresses along the dominating wind direction than 2006–2008. Our analysis shows that in addition to the wind stress, the wind direction distribution also needs to be considered. There were considerable differences in the wind direction distribution from one year to another, for example in the frequency of southwesterlies.

The results of the EOF/PCA analysis of GoF currents carried out by Elken et al. (2011) provide an interesting point of comparison for our results. They analysed the HIROMB model results for sections in the GoF. Decomposition of zonal currents into EOF modes revealed first what they called a ‘barotropic mode’ (42% of explained variance at a north-south section located at 24.38°E), showing unidirectional currents in the water column. The second mode they called the ‘Ekman mode’ (18% of variance), which showed uniform currents in the upper part of the water column, but then a compensating current of opposite direction in the deeper part. The third mode (7% of variance) and the fourth mode (6% of variance) showed a clearly non-uniform structure in the meridional direction, unlike the two first modes. They identified the third mode as the ‘Bennett-Csanady’ mode, representing a situation in long channels where along-wind coastal jets are compensated by an opposite direction flow in the middle of the channel. As the SOM analysis identifies prototypical flow patterns and the EOF/PCA analysis is a linear decomposition of the flow field anomalies into modes, these two results are not directly comparable. But nevertheless, we can see how the structure of the nodes representing standard estuarine circulation and reverse estuarine circulation, and especially the most notable heterogeneous structures discussed in this article, can arise as a linear combination of these EOF/PCA modes.

Elken et al. (2011) divided the GoF into two regions based on circulation variability. The western region behaves like a wide channel, while the eastern region has a more complex circulation structure due to the topographical features and the vicinity of the Neva estuary. This was seen also in our investigation when sections from east and west were compared. Our analysis suggests that the transition between these two states usually takes place somewhere near 26°E where the gulf widens. This is also consistent with the persistency maps by Andrejev et al. (2004), which showed lower values of persistency in the eastern parts of the GoF. The exact location of this transition zone can of course vary in time.

The SOM analysis revealed that in general the circulation patterns in the GoF can be classified with a one-dimensional presentation, with standard estuarine circulation in one end and reversed estuarine circulation in the other. The response of the circulation field to changing forcing can be fairly rapid, although it may take a day or two due to the inertia of the system. Reversal of estuarine circulation has been studied in the GoF by, for example, Elken et al. (2003), Liblik et al. (2013), Elken et al. (2014) and Lilover et al. (2017). This means events where southwesterly winds push the surface waters towards the head of the estuary and deeper waters are flowing outwards. Southwesterly winds dominate in the area, and the long axis of the gulf is oriented roughly in the west-east direction. These two factors together support reversal of the estuarine circulation. Our data suggest that standard and reversed estuarine circulation are roughly as common, although further study is required to build confidence in what the exact percentages of the two modes are in the GoF overall.

Some indication of the relative frequencies can be inferred from the analysis of flow variability by Lilover et al. (2017), based on data from 10 ADCP installations between 2009 and 2014, measuring usually 4–5 months each. They analysed data from four installations near the thalweg and categorised the flow into four regimes: estuarine circulation (EC), reversed estuarine circulation (REC), unidirectional inflow (UIN) and unidirectional outflow (UOUT). They found that REC was the most common flow type in their data (EC 26%, REC 30%, UIN 25%, UOUT 19%). EC was more common in the summer (34%) than in the winter (17%). UIN was more common than UOUT in the winter but not in the summer. Overall their results show relatively common reversals of estuarine circulation, both in summer and in winter, as did our data. Due to differences in methodology, such as the definition of categories, the relative frequencies of the regimes presented by Lilover et al. (2017) are not directly comparable to our results. Further study would be required for comprehensive comparison and to pinpoint the reasons for the differences. For example, it is possible that some cases categorised as UOUT in their analysis might be reversed estuarine circulation in ours if the layer with eastwards flow in the surface is thin and that is not reliably captured by the ADCP measurement. (They report that uppermost reliable measurements were 5 m or 10 m below the surface, depending on the location.) The same is possible for UIN and standard estuarine circulation. Furthermore, our transitional nodes could be categorised in any of the categories in their analysis, depending on the exact location of the contours.

The analysis of surface salinity in the model revealed that the traditional surface salinity pattern with slanted salinity gradients and lower salinities on the northern coast than on the southern coast emerges from the highly variable currents in the GoF visible on the timescale of days. The heterogeneous structures in surface circulation during normal estuarine circulation supports this pattern, even though a cyclonic circulation pattern does not typically appear in daily averages. Furthermore, as the reversal of estuarine circulation was seen to disrupt the traditional salinity pattern, and as this reversal is associated with southwesterly winds, it is understandable that any changes in the wind direction distribution will also affect the salinity pattern. If reversals become more common, we can expect on average saltier water on the northern coast of the GoF and fresher water on the southern coast than now.

The results of this article rely mainly on model calculations, which is natural given that models enable the study of current patterns in a way that is not possible from spatially or temporally sparse observations. But it is important to keep in mind the value of observations. For example, existing long-term ADCP measurements (e.g. Rasmus et al., 2015; Lilover et al., 2017; Lips et al., 2017; Suhhova et al., 2018) and Ferrybox measurements (e.g. Kikas and Lips, 2016) could be used far more to build confidence in model results. These kinds of comparisons will be in a key role in the work to determine if in fact the long-term circulation patterns in the GoF are changing, as some of the recent modelling studies suggest. Also, new measurements could be helpful. For example, a series of ADCP installations on ideally several latitudinal sections would enable a more detailed investigation of how well models are able to capture true circulation features.

Self-organising maps proved to be a powerful tool for the analysis of the circulation in the area. While they have been successfully used for numerous oceanographic applications around the world, to our knowledge this is the first application to the hydrodynamic modelling of the Baltic Sea. This analysis can be considered complementary to many more traditional techniques such as the EOF/PCA analysis.

As other techniques, SOM also has pros and cons. In addition to the general issues mentioned in Section 2.3, in our case the selection of input to the algorithm was non-trivial. We had to choose by hand which sections were analysed, as it was impossible to analyse the full model output at once due to computational limits. Although we tested several different locations for the sections, it is still possible that some other choice of sections would have resulted in differing results. The brute force way of addressing this problem would be to run the analysis again for larger parts of the output data at once, when available computing power allows it. Another issue that required consideration was how the input data to the algorithm should have been filtered to remove periodic motions. In the end, we opted to use daily averages, as the use of additional filtering did not seem to greatly affect the output of the algorithm. But for other applications and/or algorithms, it may be necessary to use a longer time-averaging window, for instance. Nevertheless, even with these limitations and when applied with care, these algorithms can provide significant insight into huge datasets.

Further applications of the SOM technique could be illuminating. For example, here we chose to use a relatively small, but robust 1D map to make the results more easily accessible. More detailed information might be extracted by using a more refined approach. One might, for example, try to use a larger 2D map to chart transitions from one circulation state to another. This method, along with other machine learning methods, could be applied more extensively, both to modelling and observational data sets in the future. It could be used, for example, as a tool for exploratory analysis of huge modelling or observational datasets.

5.

## Conclusions

We applied the NEMO 3D hydrodynamic model to the analysis of circulation patterns in the GoF. Based on a high-resolution 7-year run of the model, we studied how circulation patterns in the GoF change from season to season.

The main conclusions are:

• There is clear seasonal variation in the circulation patterns in the GoF.
• In many cases, averages hide or dampen circulation patterns that move or change direction from season to season.
• SOM analysis of the modelled currents emphasised the estuarine character of the GoF. It showed how circulation in the gulf changes rapidly between normal estuarine circulation and reverse estuarine circulation, along with transitional states between the two. The dominant wind direction supports this reversal.
• The SOM analysis demonstrated that in our model results, both normal and reverse estuarine circulations are roughly as common in the GoF.
• The emergence of the cyclonic mean circulation pattern seems to require that standard estuarine circulation is common enough during the averaging period, as during standard estuarine circulation there are more heterogeneous structures in the surface currents. A multi-decadal high-resolution model run would be required for a full analysis of inter-annual variability of circulation patterns in the GoF.
• The long-term surface salinity field structure in the GoF, where surface salinities are higher on the southern coast and across-gulf salinity gradients are slanted, is also supported by the heterogeneous structure of the surface currents during normal estuarine circulation. Surface currents during reversed estuarine circulation are quite homogeneous and do not support this structure.