## 1. Introduction

Owing to their tremendous societal impacts and scientific complexity, tropical cyclones (TCs) have been one of the most intensely studied subjects in recent decades. Individual TCs undergo the stages of genesis, growth, and decay. Fewer than one hundred TCs form across the globe every year, and objective TC observations by satellites are available only for the past ~40 years. When converted into gridded data, the observed number of TC geneses in a 2.5°lat×2.5°lon grid box during a given month of a given year is either 0 or 1 in most cases. If composite analysis is performed with these gridded TC data for a particular climate variation mode (e.g., monsoon, El Nino-Southern Oscillation (ENSO), or Madden-Julian Oscillation (MJO)), the resulting composite anomalies of TC genesis frequency (GFq) become so noisy that a decisive interpretation of the composite results is hindered. For a more robust analysis of the temporal variations of TC genesis in a gridded space, it is necessary either to reduce the spatiotemporal resolution of the analysis substantially or to use other TC genesis variables that vary more smoothly in space and time than GFq.

A few studies have attempted to generate smoothly varying GFq in a gridded space. For example, Zhao et al. (2010) and Chand et al. (2017) computed spatially smoothed GFq by partitioning individual TC genesis events into nearby grid boxes, using a uniform or two-dimensional Gaussian distribution of a specified width as the normalized probability density function (PDF) of each TC genesis event. However, the shape and width of the assumed PDF are somewhat arbitrary, with no physical basis supporting their formulation. Emanuel and Nolan (2004) proposed a genesis potential index, $GPI=C\cdot{\left|{\eta}_{850}\right|}^{1.5}\cdot RH_{600}^{3}\cdot PI^{3}\cdot{\left(1+0.1\cdot V_{shear}\right)}^{-2}$, as a proxy for TC genesis, where *η*_{850} is the absolute vorticity at 850 hPa, RH_{600} is the relative humidity at 600 hPa, PI is the potential intensity, V_{shear} is the magnitude of the vertical shear of the horizontal wind vector between 200 and 850 hPa, and C is a constant. The four environmental variables constituting the genesis potential index (GPI) vary smoothly in space and time and are known to represent well the key physical processes controlling TC genesis. Camargo et al. (2007) showed that GPI reproduces reasonably well the annual cycle of GFq in different ocean basins as well as its ENSO composites. Although GPI varies smoothly in space and time and captures some important aspects of GFq, it is an approximate proxy consisting purely of large-scale environmental variables, so it is limited in its ability to depict the spatiotemporal variations of the observed TC genesis accurately. For example, Wang and Moon (2017) showed that GPI is limited in representing the intra-seasonal variations of the observed TC genesis.
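For reference, the GPI expression above can be evaluated pointwise as in the following minimal Python sketch. It assumes the four inputs are already on a common grid and in whatever units the constant C is calibrated for; the function name `gpi` and the default `c=1.0` are illustrative placeholders, not values taken from Emanuel and Nolan (2004).

```python
import numpy as np


def gpi(eta850, rh600, pi, vshear, c=1.0):
    """GPI = C * |eta850|^1.5 * RH600^3 * PI^3 * (1 + 0.1 * Vshear)^-2.

    eta850 : absolute vorticity at 850 hPa
    rh600  : relative humidity at 600 hPa
    pi     : potential intensity
    vshear : magnitude of the 200-850 hPa vertical wind shear
    c      : the constant C (placeholder value; any variable scaling is
             assumed to be folded into it)
    """
    eta850, rh600, pi, vshear = (np.asarray(v, dtype=float)
                                 for v in (eta850, rh600, pi, vshear))
    return (c * np.abs(eta850) ** 1.5 * rh600 ** 3 * pi ** 3
            * (1.0 + 0.1 * vshear) ** -2)
```

The absolute value on *η*_{850} makes the index insensitive to the hemispheric sign of vorticity, so the same function applies to both hemispheres.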

In this paper, we introduce a genesis probability (GPr) of TC, a spatially blended GFq obtained by combining the discrete GFq with continuous GPI, as will be explained in Section 2. In Section 3, we analyze the spatiotemporal variations and composite anomalies of gridded GFq, GPI, and GPr in association with several climate variation modes and show that GPr converges to GFq over a long-term period, varies more smoothly than GFq, and represents the spatiotemporal variations of GFq much better than GPI and other spatially smoothed GFqs suggested by previous studies. A summary and conclusions are provided in Section 4.

## 2. Data and Analysis Methods

### 2.1. Tropical Cyclone Data

The observed TCs from January 1979 to December 2016 (38 years) are obtained from the International Best Track Archive for Climate Stewardship (IBTrACS; Knapp et al., 2010), which compiles individual TC track data across the globe from various sources, including the Regional Specialized Meteorological Center (RSMC) Tokyo-Typhoon Center over the western North Pacific (Kunitsugu, 2012) and the US Navy’s Joint Typhoon Warning Center (JTWC; Chu et al., 2002).

To compensate for the short observational period, we also analyzed TCs simulated by the Seoul National University Atmosphere Model Version 0 with a Unified Convection Scheme (SAM0-UNICON; Park et al., 2019) in a fully coupled mode at a horizontal resolution of 1°lat×1°lon under pre-industrial conditions for 600 years. SAM0-UNICON (or simply SAM0) is one of the general circulation models participating in phase 6 of the Coupled Model Intercomparison Project (CMIP6; Eyring et al., 2016). It is based on the Community Atmosphere Model version 5 (CAM5; Neale et al., 2010; Park et al., 2014), but CAM5’s shallow (Park and Bretherton, 2009) and deep (Zhang and McFarlane, 1995) convection schemes are replaced by a unified convection scheme, UNICON (Park, 2014a,b), with a revised treatment of convective detrainment processes (Park et al., 2017). Park et al. (2019) showed that the overall mean climate, ENSO, and global warming during the 20th century simulated by SAM0 are similar to those from CAM5, but that SAM0 substantially improves the simulations of tropical cyclones (Song et al., 2019), the MJO (Madden and Julian, 1971; Yoo et al., 2015; Ahn et al., 2019), and the diurnal cycle of precipitation.

We used the six-hourly instantaneous model outputs at the horizontal resolution of 1°lat×1°lon to identify the onset of TCs. The criteria used for defining the onset of simulated TCs (i.e., TC genesis) are identical to those used by Park et al. (2019). Following previous studies (i.e., Walsh, 1997; Hodges et al., 2003; Bengtsson et al., 2006), Park et al. (2019) defined the onset of a SAM0-simulated TC as the time when the relative vorticity at 850 hPa, *ξ*_{850}, exceeds 12.5×10^{−5} s^{−1}, the warm-core strength, *ξ*_{850}−*ξ*_{200}, exceeds 12.5×10^{−5} s^{−1}, and both conditions are satisfied for at least two consecutive days. The observed and simulated individual TCs are converted into regularly gridded monthly data at a horizontal resolution of 2.5°lat×2.5°lon.
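The genesis criteria and the gridding step can be sketched as follows. This is a simplified illustration, assuming the two vorticity series already follow a single candidate vortex (the tracking itself is not shown) and using the Northern Hemisphere sign convention for vorticity; the helper names `onset_index` and `grid_genesis` are hypothetical. Monthly gridded GFq would then follow by applying `grid_genesis` separately to the genesis points of each month and year.

```python
import numpy as np

STEPS_PER_DAY = 4      # six-hourly model output
THRESH = 12.5e-5       # s^-1, threshold used for both criteria in the text


def onset_index(xi850, xi200, min_days=2):
    """First six-hourly step of a run in which both criteria hold for at
    least `min_days` consecutive days, or None if no such run exists.

    xi850, xi200: relative vorticity (s^-1) at 850 and 200 hPa along one
    candidate vortex (Northern Hemisphere sign convention assumed).
    """
    xi850 = np.asarray(xi850, dtype=float)
    xi200 = np.asarray(xi200, dtype=float)
    ok = (xi850 > THRESH) & (xi850 - xi200 > THRESH)
    need = min_days * STEPS_PER_DAY
    run = 0
    for t, flag in enumerate(ok):
        run = run + 1 if flag else 0
        if run == need:
            return t - need + 1   # start of the qualifying run = onset
    return None


def grid_genesis(lats, lons, res=2.5):
    """Count genesis points in regular res x res boxes.

    Rows start at 90S and columns at 0E; longitudes are wrapped into [0, 360).
    """
    nlat, nlon = int(round(180 / res)), int(round(360 / res))
    counts = np.zeros((nlat, nlon))
    iy = np.clip(((np.asarray(lats, dtype=float) + 90.0) // res).astype(int),
                 0, nlat - 1)
    ix = ((np.asarray(lons, dtype=float) % 360.0) // res).astype(int) % nlon
    np.add.at(counts, (iy, ix), 1)   # unbuffered add handles repeated boxes
    return counts
```

`np.add.at` is used instead of `counts[iy, ix] += 1` because the latter would count two geneses in the same box only once.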

### 2.2. Computation of TC GPr

Our objective is to compute TC GPr from individual TC genesis events (GFq). A fundamental assumption of our approach is that GFq computed from individual TC genesis events, although seemingly sporadic and scattered, is a discretized realization of GPr, which is mainly determined by the set of large-scale environmental conditions controlling the formation of TCs, i.e., GPI. Consequently, GPr can be represented by an intermediate combination of GFq and GPI. As GPI varies more smoothly in space and time than GFq, GPr will also vary more smoothly than GFq.

We now explain how to compute the monthly gridded GPr(*i*,*m*,*y*) for each year from all TC genesis events, where *i*, *m*, and *y* are the indices denoting individual 2.5°lat×2.5°lon grid boxes, months, and years, respectively. Suppose that there are *N* TC genesis events in total across the globe during the *Y* years. Consider the *n*th TC genesis event, which occurs in the grid box with index *j*=13 (i.e., the central box of the 5×5 block of 25 grid boxes in Fig. 1b) during calendar month *m* of year *y*. We compute the genesis probability GPr(*i*,*m*,*y*;*n*) corresponding to the *n*th TC genesis by partitioning a unit count among the 25 nearby grid boxes in proportion to the product of the monthly genesis probability index [GPrI(*i*,*m*,*y*), which will be defined later] and the squared inter-annual correlation coefficient [*r*^{2}(*i*,*m*)|_{y=1,Y}] between the monthly GPrI in the grid box where the *n*th TC is generated [GPrI(13,*m*)|_{y=1,Y}] and that in each of the 25 nearby grid boxes [GPrI(*i*,*m*)|_{y=1,Y}]:

$$GPr\left(i,m,y;n\right)=\frac{r^{2}\left(i,m\right)|_{y=1,Y}\cdot GPrI\left(i,m,y\right)}{\sum_{i'=1}^{25}r^{2}\left(i',m\right)|_{y=1,Y}\cdot GPrI\left(i',m,y\right)}\qquad(1)$$

where we set GPr(*i*,*m*,*y*;*n*)=0 if the grid box is a non-ocean grid box or *r*^{2}(*i*,*m*)|_{y=1,Y} < 0.5; the threshold of 0.5 is chosen empirically to exclude weakly correlated grid boxes, and the excluded grid boxes do not enter the normalization. By construction, $\sum_{i=1}^{25}GPr\left(i,m,y;n\right)=1$. The monthly gridded GPr(*i*,*m*,*y*) for each year is obtained by summing GPr(*i*,*m*,*y*;*n*) over all TC genesis events, i.e., $GPr\left(i,m,y\right)=\sum_{n=1}^{N}GPr\left(i,m,y;n\right)$. Note that $\sum_{i=1}^{I}GPr\left(i,m,y\right)=\sum_{i=1}^{I}GFq\left(i,m,y\right)$, where *I* is the total number of grid boxes across the globe; that is, GPr conserves the global total number of TC geneses. Eq. (1) thus distributes the localized GFq of each TC (1 in the grid box where the TC is generated and 0 elsewhere) into the nearby grid boxes in proportion to the spatiotemporal coherence of the monthly GPrI(*i*,*m*,*y*).
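A minimal NumPy sketch of Eq. (1) and of the summation over events is given below. It assumes that GPrI for one calendar month is stored as an array of shape (*Y*, nlat, nlon), that the 25 boxes form a 5×5 block centred on the genesis box with longitude wrap-around, and that excluded boxes (land or *r*^{2} < 0.5) are dropped before normalization. The function names and the fallback for an all-zero weight are our own illustrative choices, not part of the original method.

```python
import numpy as np


def neighbourhood(iy, ix, nlon):
    """Rows/columns of the 5x5 block centred on box (iy, ix); longitude wraps.

    Assumes the genesis box lies at least two rows from the grid's edges in latitude.
    """
    return np.arange(iy - 2, iy + 3), np.arange(ix - 2, ix + 3) % nlon


def event_shares(gpri_m, y, iy, ix, ocean, r2_min=0.5):
    """Eq. (1): split one genesis event (a unit count) over its 25 neighbours.

    gpri_m : GPrI for one calendar month m, shape (Y, nlat, nlon)
    y      : year index of the event; (iy, ix) is the genesis box
    ocean  : boolean ocean mask, shape (nlat, nlon)
    """
    rows, cols = neighbourhood(iy, ix, gpri_m.shape[2])
    block = gpri_m[:, rows][:, :, cols]                 # (Y, 5, 5) time series
    a = block - block.mean(axis=0)
    centre = gpri_m[:, iy, ix]
    b = (centre - centre.mean())[:, None, None]
    denom = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    # Squared inter-annual correlation with the centre box (0 where undefined).
    r2 = np.divide((a * b).sum(axis=0), denom,
                   out=np.zeros((5, 5)), where=denom > 0) ** 2
    w = r2 * block[y]                                   # W = r^2 * GPrI
    w[~ocean[np.ix_(rows, cols)] | (r2 < r2_min)] = 0.0
    if w.sum() == 0.0:      # fallback (our assumption): keep event in its own box
        w[2, 2] = 1.0
    return w / w.sum(), rows, cols


def gpr_field(gpri_m, y, events, ocean, r2_min=0.5):
    """GPr(i, m, y): sum of the Eq. (1) shares of all events in month m, year y.

    events : list of (iy, ix) genesis boxes for that month and year
    """
    gpr = np.zeros(gpri_m.shape[1:])
    for iy, ix in events:
        w, rows, cols = event_shares(gpri_m, y, iy, ix, ocean, r2_min)
        gpr[np.ix_(rows, cols)] += w
    return gpr
```

Because the shares of every event sum to one, the global sum of the returned field equals the number of events, which is the conservation property stated above.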

As shown in Fig. 1a, GPrI(*i*,*m*,*y*) is obtained from GPr(*i*,*m*,*y*) using an iterative regression analysis. At the first iteration, GPrI(*i*,*m*,*y*) is set to GPI(*i*,*m*,*y*), whereas at each subsequent iteration, GPrI(*i*,*m*,*y*) is obtained from a multiple log-linear regression of the climatological monthly gridded GPr, $\overline{GPr\left(i,m\right)}$ (where the overbar denotes the average over all *Y* years), over the ocean between 40°N and 40°S onto the four climatological monthly environmental variables constituting GPI [i.e., absolute vorticity at 850 hPa (*η*_{850}), relative humidity at 600 hPa (RH_{600}), potential intensity (PI), and the magnitude of vertical shear of the horizontal wind vector between 200 and 850 hPa (V_{shear})]. The number of data points used for the regression analysis is *I*_{to}×12, where *I*_{to} is the number of 2.5°lat×2.5°lon grid boxes in the tropical oceans between 40°N and 40°S. At each iteration, the monthly values of the four environmental variables constituting GPI, *ϕ*(*i*,*m*,*y*) (where *ϕ*=*η*_{850}, RH_{600}, PI, V_{shear}), are inserted into the climatological regression equation to compute GPrI(*i*,*m*,*y*). The iteration, i.e., {GPrI(*i*,*m*,*y*), GFq(*m*,*y*)|_{n=1,N}}→GPr(*i*,*m*,*y*) from Eq. (1) followed by {GPr(*i*,*m*,*y*), *ϕ*(*i*,*m*,*y*)}→GPrI(*i*,*m*,*y*) from the regression analysis, is repeated 15 times, by which point a reasonably convergent solution is obtained (i.e., the exponents of the four individual environmental variables have converged).

In the case of the observations with *Y*=38, the regression analysis at the last iteration produces $\overline{GPr\left(i,m\right)}\propto{\left|{\eta}_{850}\right|}^{1.08}RH_{600}^{2.02}PI^{1.92}{\left(1+0.1V_{shear}\right)}^{-2.13}$, whereas in the case of the simulations with *Y*=600, it becomes $\overline{GPr\left(i,m\right)}\propto{\left|{\eta}_{850}\right|}^{1.46}RH_{600}^{2.09}PI^{1.64}{\left(1+0.1V_{shear}\right)}^{-1.12}$. Although the same environmental variables are used in the regression analysis, our GPrI differs from GPI because GPrI is the best fit to the smoothed TC probability, GPr, whereas GPI is the best fit to the localized TC probability, GFq. Nevertheless, the exponents of the four environmental variables in the observed GPrI are roughly similar to those of the GPI proposed by Emanuel and Nolan (2004). Note that the unit of GPr is [#/2.5°lat×2.5°lon/month], identical to that of GFq. In summary, our procedure transforms the PDF of individual TC genesis events from a localized δ-function (i.e., GFq) to a spatially smoothed empirical distribution function (i.e., GPr).
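The regression step can be sketched as an ordinary least-squares fit in log space, $\ln\overline{GPr}=c_0+a\ln|\eta_{850}|+b\ln RH_{600}+p\ln PI+q\ln(1+0.1V_{shear})$, whose coefficients *a*, *b*, *p*, and *q* correspond to the exponents quoted above. This is a sketch under stated assumptions: the exact regression setup (e.g., how boxes with zero GPr enter the logarithm) is not specified in the text, so the `floor` guard and the names `fit_gpri` and `predict_gpri` are hypothetical. In the full iteration, `predict_gpri` applied to the monthly *ϕ*(*i*,*m*,*y*) would supply the GPrI for the next evaluation of Eq. (1), starting from GPrI = GPI and repeating 15 times.

```python
import numpy as np


def fit_gpri(gpr_clim, eta, rh, pi, vshear, floor=1e-6):
    """Least-squares fit of
        ln GPr = c0 + a ln|eta| + b ln RH + p ln PI + q ln(1 + 0.1 Vshear)
    over the I_to x 12 climatological (ocean box, month) points.

    All arguments are 1-D arrays of equal length. `floor` is a hypothetical
    guard against ln(0) in boxes with zero GPr.
    Returns the coefficient vector (c0, a, b, p, q).
    """
    gpr_clim = np.asarray(gpr_clim, dtype=float)
    X = np.column_stack([
        np.ones_like(gpr_clim),
        np.log(np.abs(eta)),
        np.log(rh),
        np.log(pi),
        np.log1p(0.1 * np.asarray(vshear, dtype=float)),  # ln(1 + 0.1 Vshear)
    ])
    y = np.log(np.maximum(gpr_clim, floor))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_gpri(coef, eta, rh, pi, vshear):
    """Insert monthly phi(i, m, y) into the fitted relation to obtain GPrI."""
    c0, a, b, p, q = coef
    eta, rh, pi, vshear = (np.asarray(v, dtype=float)
                           for v in (eta, rh, pi, vshear))
    return (np.exp(c0) * np.abs(eta) ** a * rh ** b * pi ** p
            * (1.0 + 0.1 * vshear) ** q)
```

Fitting in log space turns the power-law form into a linear problem, so each iteration needs only one linear solve.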

## 3. Results

The weighting function used in Eq. (1) (W=r^{2}·GPrI) spreads the discrete GFq of individual TCs into the nearby grid boxes according to the spatiotemporal coherence of the controlling environmental variables. If the GPr obtained from Eq. (1) is a true GPr of TCs, its long-term average in a sufficiently small grid box should be similar to that of GFq. Table 1 summarizes the spatiotemporal correlation coefficients (r) and root-mean-square errors (rmse) between the seasonal climatologies of GFq and GPr at various horizontal resolutions. For comparison, the same analysis is also shown for W=r^{2} and for the weighting functions suggested by previous studies (W=1 for Uniform and W from a two-dimensional Gaussian distribution for Gaussian). To complement the short period of available observations, the same analysis applied to the 600-year simulation is also shown. As the size of the grid box increases, the differences in r and rmse between our GPr based on Eq. (1) and the other GPrs become small (note that if the grid box covered the entire globe, the correlation coefficient between GFq and GPr would be 1 for all weighting functions). The Gaussian weighting performs better than the uniform weighting. For the observations, our GPr at 2.5°lat×2.5°lon (5°lat×5°lon) performs as well as the uniform-weighted GPr at 5°lat×5°lon (10°lat×10°lon). For the simulation, our GPr at 2.5°lat×2.5°lon (5°lat×5°lon) performs as well as the Gaussian-weighted GPr at 5°lat×5°lon (10°lat×10°lon). At all horizontal resolutions, for both the observations and the simulation, our GPr based on Eq. (1) produces the best results among the GPrs with different weighting functions. With sufficiently long simulation data, our GPr at a horizontal resolution of 5°lat×5°lon is almost perfectly correlated with GFq (r=0.974), supporting its validity as a proxy representing a true TC GPr.
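The resolution comparison summarized in Table 1 can be mimicked with the sketch below, which coarsens 2.5°lat×2.5°lon count fields by summing *k*×*k* blocks and then computes r and rmse over all points. Summing (rather than averaging) the counts, and pooling all grid points and seasons into a single correlation, are our assumptions about how the table was built; the helper names `coarsen` and `r_and_rmse` are hypothetical.

```python
import numpy as np


def coarsen(field, factor):
    """Sum a (lat, lon) count field over factor x factor blocks.

    factor=2 maps 2.5x2.5 deg boxes to 5x5 deg; both dimensions must be
    divisible by `factor`.
    """
    nlat, nlon = field.shape
    return field.reshape(nlat // factor, factor,
                         nlon // factor, factor).sum(axis=(1, 3))


def r_and_rmse(a, b):
    """Correlation and root-mean-square error over all points (and seasons,
    if the inputs are stacked along an extra axis)."""
    a = np.ravel(np.asarray(a, dtype=float))
    b = np.ravel(np.asarray(b, dtype=float))
    r = np.corrcoef(a, b)[0, 1]
    rmse = np.sqrt(np.mean((a - b) ** 2))
    return r, rmse
```

For example, `r_and_rmse(coarsen(gfq_clim, 2), coarsen(gpr_clim, 2))` would give the 5°lat×5°lon entry for one season, where `gfq_clim` and `gpr_clim` are hypothetical climatological arrays.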
In summary, our GPr reasonably converges to GFq when averaged over a long-term period at a moderate grid size, depicts GFq with greater spatial detail than the other spatially smoothed GFqs, and represents the spatiotemporal variations of GFq much better than GPI (e.g., r(GFq,GPr)=0.974 while r(GFq,GPI)=0.574 at 5°lat×5°lon for the simulation). In the remainder of this section, we compare the performances of GFq, GPr, and GPI in representing the climatology and the climate-mode-induced variability of TC genesis.

Figure 2 shows the climatological distributions of the observed GFq, GPr, and GPI during July-October (JASO) and December-March (DJFM), and Fig. 3 shows the inter-annual time series of seasonal GFq, GPr, and GPI in several regions marked in Fig. 2. The overall spatial pattern of GPr is similar to that of GFq, with high TC genesis over the subtropical western and eastern Pacific and the western and eastern Atlantic Oceans during JASO and over the southern Indian and southwestern Pacific Oceans during DJFM. However, GPr is more smoothly distributed than GFq, and a few isolated spots of high GFq over the subtropical western Pacific Ocean are smoothed out in GPr. The overall spatial pattern of GPI is roughly similar to those of GFq and GPr, but GPI is even more smoothly distributed than GPr and has non-zero values where TCs do not form in reality: in the Persian Gulf and Red Sea during JASO and off the western coast of South Africa during DJFM. The spatial correlation between GPI and GFq is 0.670 during JASO and 0.714 during DJFM, lower than that between GPr and GFq (0.924 during JASO and 0.869 during DJFM). As with the spatial pattern, GPr varies more smoothly in time than GFq but less smoothly than GPI (Fig. 3). For example, over the South Indian Ocean, the values of GFq are either 0 or 0.084 (except for the year 2011), whereas GPr takes more diverse non-zero values, and GPI takes even more diverse non-zero values that vary smoothly. In some years, GPr is non-zero even though the corresponding GFq is zero, owing to the probability contributed from nearby grid boxes through the spatiotemporal coherence of GPI. As with the spatial correlation, the temporal correlations between GPr and GFq are higher than those between GPI and GFq in all the regions examined. This indicates that GPr represents the temporal variations of TC genesis better than GPI does.

Figure 4 shows the composite anomalies of GFq, GPr, and GPI associated with the Asia summer monsoon over the western North Pacific (WNP) Ocean, ENSO, and the MJO. The active Asia summer monsoon years are defined as the years when the WNP monsoon index is positive. Here, the WNP monsoon index is defined as the difference in 850-hPa zonal wind between the area averages over (100°E-130°E, 5°N-15°N) and (110°E-140°E, 20°N-30°N) during June to September (Wang and Fan, 1999). El Nino, neutral ENSO, and La Nina years are defined as years in which the standardized, detrended monthly sea surface temperature (SST) anomalies averaged over the NINO34 region (170°W-120°W, 5°S-5°N) during November to January are greater than 1, between −1 and 1, and smaller than −1, respectively. To define the MJO phase of each day, following Wheeler and Hendon (2004), we conducted a multivariate empirical orthogonal function analysis of bandpass-filtered (20-100 days) outgoing long-wave radiation (OLR) and zonal winds at 850 and 200 hPa averaged over 15°S-15°N. The daily MJO index is obtained as the square root of the sum of the squares of the first two normalized principal components. Days with an MJO index greater than 1 were grouped into eight MJO phases (P1, P2, ..., P7, and P8) based on the two principal components.
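The three climate-mode classifications described above can be sketched as follows. The sketch assumes that the area-averaged winds, the November-January-mean NINO34 anomalies, and the two normalized principal components (RMM1 and RMM2) have already been computed. Linear detrending, the population standard deviation, and the phase numbering (phase 1 for angles in [-180°, -135°) of the RMM1-RMM2 plane, following the usual Wheeler-Hendon phase diagram) are assumptions, and all function names are hypothetical.

```python
import numpy as np


def wnp_monsoon_index(u850_south, u850_north):
    """Wang and Fan (1999) WNP index per year: June-September U850 averaged
    over (100-130E, 5-15N) minus that over (110-140E, 20-30N).
    Inputs are the two pre-computed box averages; a year is 'active' if > 0."""
    return np.asarray(u850_south, dtype=float) - np.asarray(u850_north, dtype=float)


def enso_class(nino34_ndj):
    """+1 El Nino, 0 neutral, -1 La Nina from NDJ-mean NINO34 anomalies per year."""
    x = np.asarray(nino34_ndj, dtype=float)
    t = np.arange(x.size)
    x = x - np.polyval(np.polyfit(t, x, 1), t)   # remove a linear trend (assumed)
    z = (x - x.mean()) / x.std()                  # standardize
    return np.where(z > 1, 1, np.where(z < -1, -1, 0))


def mjo_phase(rmm1, rmm2):
    """MJO phase 1-8 for days with amplitude sqrt(RMM1^2 + RMM2^2) > 1, else 0."""
    rmm1 = np.asarray(rmm1, dtype=float)
    rmm2 = np.asarray(rmm2, dtype=float)
    amp = np.hypot(rmm1, rmm2)
    # Eight 45-degree sectors counted anticlockwise from -180 degrees.
    sector = np.floor((np.arctan2(rmm2, rmm1) + np.pi) / (np.pi / 4)).astype(int)
    return np.where(amp > 1, sector % 8 + 1, 0)
```

Since the square root is monotonic, thresholding the amplitude at 1 selects the same days as thresholding the sum of squares at 1.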

During the positive phase of the Asia summer monsoon, an anomalous cyclonic vortex develops over the western North Pacific Ocean along 20°N between 125°E and 145°E. In this region, GFq anomalies tend to be positive (Chen et al., 2004), but the overall anomaly pattern is too noisy. In contrast, GPr is much better organized in space than GFq, with systematic increases in the center and southeastern portions of the anomalous cyclonic vortex and decreases in the surrounding areas, qualitatively similar to GPI. The composite analysis of TC genesis associated with ENSO shows more drastic differences between GFq and GPr. Almost none of the GFq anomalies are statistically significant at the 80% confidence level in a two-sided Student’s t-test, and anomalies of opposite signs are interspersed in many regions. However, during El Nino, GPr is significantly positive along the northern flank of the anomalous warm SST in the central and eastern equatorial Pacific and significantly negative over the subtropical western Atlantic (including a portion of the eastern Pacific near the coast) and the subtropical western Pacific Oceans. Similarly significant GPr anomalies of the opposite sign are observed during La Nina, when GPr is significantly positive over the western Philippines, extending northeastward into southern Japan. Consistent with previous studies (Choi et al., 2019; Li et al., 2013; Murakami et al., 2011), GPI represents the interannual variation of TC genesis associated with ENSO well. The overall patterns of GPI are smoother than those of GFq and GPr. In the subtropical western Pacific Ocean and northern Australia, GFq tends to be positive (negative) in association with negative (positive) OLR anomalies during MJO phases 1 and 2 (3 and 4), and a similar association between GFq and OLR is observed over the eastern equatorial Pacific and subtropical western Atlantic Oceans (Li and Zhou, 2014; Maloney and Hartmann, 2000a,b).
However, most GFq anomalies are not statistically significant at the 80% confidence level. As in the cases of the Asia monsoon and ENSO, the GPr anomalies associated with the MJO are better organized in space and more statistically significant than those of GFq. The pattern correlations between GPr and GFq are higher than those between GPI and GFq.
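The composites and their significance test can be sketched as below. The text specifies a two-sided Student's t-test at the 80% confidence level but not its exact form, so testing the composite-year anomalies against zero with `scipy.stats.ttest_1samp` is our assumption; a two-sample test against the remaining years would be an equally plausible reading. The function name `composite` is hypothetical.

```python
import numpy as np
from scipy import stats


def composite(field, selected, alpha=0.20):
    """Composite anomaly and a two-sided t-test significance mask.

    field    : (Y, nlat, nlon) seasonal GFq, GPr, or GPI for each year
    selected : boolean array of length Y marking the composite years
               (e.g., El Nino years or active-monsoon years)
    alpha    : 0.20 corresponds to the 80% confidence level
    """
    anom = field - field.mean(axis=0)          # anomaly w.r.t. all-year mean
    comp = anom[selected].mean(axis=0)
    # One-sample, two-sided test of the composite-year anomalies against zero
    # (assumed form of the test); boxes with zero variance give NaN -> False.
    _, p = stats.ttest_1samp(anom[selected], 0.0, axis=0)
    return comp, p < alpha
```

For sparse fields such as the observed GFq, many boxes contain identical values in all composite years, so the test returns NaN there and those boxes are simply marked as not significant.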

Figure 5 shows composite anomalies similar to those in Fig. 4, obtained from the 600-year SAM0-UNICON simulation. For the simulation, we performed a separate iterative regression, independent of the observation data, to derive the GPrI used in computing GPr. Compared with the observations, the simulated positive SST anomalies during El Nino years extend too far westward into the western equatorial Pacific Ocean, and accordingly, the simulated positive anomalies of GFq and GPr also extend too far westward. Similar features can be seen during La Nina years. Apart from these, the overall patterns of the simulated composite anomalies are similar to those of the observations. However, owing to the much longer record, the simulated GFq is much less noisy than the observed GFq and more similar to GPr. The simulated pattern correlations between GPr and GFq are much higher than the observed ones shown in Fig. 4 for all the composite cases. This indicates that GPr reasonably captures important temporal variations of GFq associated with various climate modes, such that it can be used as an alternative proxy for GFq in climate research on TC genesis.

## 4. Summary and Conclusion

The genesis of TCs is highly sporadic in space and time, and objective observations of TCs are available for only about 40 years. Consequently, the composite anomalies of the gridded GFq of TCs associated with various climate variation modes are too noisy to be interpreted unless the spatiotemporal resolution of the TC analysis is substantially reduced. To complement GFq, we define the GPr of TCs by distributing GFq into nearby grid boxes in proportion to the spatiotemporal coherence of the GPI. GPr is not a direct observation but a hybrid quantity combining the observations (GFq) with a well-known proxy for TC genesis (GPI). We showed that GPr varies more smoothly in space and time than GFq; that it converges to GFq when averaged over a sufficiently long period at a moderate grid size, supporting its validity as a proxy representing a true TC GPr; that it represents the spatiotemporal variations of GFq much better than GPI; and that it depicts GFq with greater spatial detail than other spatially smoothed GFqs. The observed composite anomalies of GPr associated with the Asia monsoon, ENSO, and MJO are much less noisy and better organized than those of GFq and are therefore more interpretable.

In short, GPr provides climate researchers with a new opportunity to study TC genesis on a regularly gridded domain more effectively than with GFq or GPI. In particular, GPr can be used to project the future spatial distribution of TC genesis with a low-resolution GCM more accurately than GPI.