Introduction to Statistics and Indicators in Finance

Many fields in finance rely on some form of data analysis.

For example, fundamental analysis of a business draws on:
- various figures from the company itself (e.g. from the balance sheet)
- indicators representing the overall market

The results drive decisions, e.g. whether to make an investment.

There are many well-established techniques and statistics for pricing financial products.

Some techniques have become more prominent since the advent of powerful algorithms and artificial intelligence.

Returns

For an investment to be profitable, the money it yields must exceed the initial investment (plus transaction costs).

To assess this, we use the so-called return, usually the discrete return: the relative change in the investment's value $S$.

\[ r_t = \frac{S_t - S_{t-1}}{S_{t-1}} =\frac{S_t}{S_{t-1}} - 1 \]

Note that one time step $t$ can be of arbitrary length, e.g. a day or a month.
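As a minimal sketch, the formula can be written as a small plain-Python function (discrete_returns is a hypothetical helper name, not a library function), applied here to the example prices used in the tables below:

```python
def discrete_returns(prices):
    """Return r_t = S_t / S_{t-1} - 1 for every consecutive pair of prices."""
    return [curr / prev - 1 for prev, curr in zip(prices[:-1], prices[1:])]


prices = [100, 110, 121, 110, 132, 105, 112, 105]
returns = discrete_returns(prices)
print([round(r, 4) for r in returns])
# [0.1, 0.1, -0.0909, 0.2, -0.2045, 0.0667, -0.0625]
```

One price series of length $n$ yields $n - 1$ returns, which is why the first time step has no return.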

Additivity

Returns may be based on different time spans, so we need a way to aggregate them.

The property we are looking for is called additivity: summing shorter-scale returns should give the longer-scale return.

Note that we cannot simply split monthly returns into daily returns; this would require making assumptions about the distribution.

Daily and weekly returns:
To calculate weekly returns from daily returns, we cannot simply add up the daily discrete returns.

time $t$	0	1	2	3	4	5	6	7
prices $S_t$	100	110	121	110	132	105	112	105
return $r_t$	—	0.10	0.10	-0.09	0.2	-0.20	0.07	-0.06

If we simply added up all daily returns, we would find a weekly return of $r_{0,7} \approx 0.11$ (or $0.12$ when adding the rounded values from the table).
However, using the formula from above, we find that $$ r_{0,7} = \frac{S_7}{S_{0}} - 1 = \frac{105}{100} - 1 = 0.05 $$ and conclude that indeed daily returns cannot simply be added up in order to yield the weekly return.
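The comparison can be reproduced in a few lines of plain Python (a sketch; the variable names are our own). The exact sum of the daily discrete returns is about 0.109 (the rounded table values sum to 0.12), whereas compounding them, i.e. multiplying the gross returns $1 + r_t$, recovers $S_7/S_0 - 1$ because the intermediate prices cancel:

```python
import math

prices = [100, 110, 121, 110, 132, 105, 112, 105]
# daily discrete returns r_t = S_t / S_{t-1} - 1
daily = [curr / prev - 1 for prev, curr in zip(prices[:-1], prices[1:])]

naive_sum = sum(daily)                            # wrong aggregation: plain sum
compounded = math.prod(1 + r for r in daily) - 1  # product of gross returns minus 1
direct = prices[-1] / prices[0] - 1               # S_7 / S_0 - 1

print(f"sum of daily returns: {naive_sum:.4f}")   # 0.1087
print(f"compounded return:    {compounded:.4f}")  # 0.0500
print(f"direct weekly return: {direct:.4f}")      # 0.0500
```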

Log returns

With a single transformation, however, we can establish the desired additivity over time.

The transformation needed is the logarithm, which gives us log returns.

Take the logarithm of the (daily) prices and compute the difference between successive values.
Equivalently, take the logarithm of the ratio of the current to the previous price, using $\log{\frac{x}{y}} = \log{x} - \log{y}$: $$ r_t^{log} = \log{\frac{S_t}{S_{t-1}}} = \log{S_t} - \log{S_{t-1}} $$

These log returns (also called continuously compounded returns) now exhibit additivity over time.

Same prices, but with log returns:

time $t$	0	1	2	3	4	5	6	7
prices $S_t$	100	110	121	110	132	105	112	105
log return $r_t^{log}$	—	0.10	0.10	-0.10	0.18	-0.23	0.06	-0.06

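Adding up all log returns gives the weekly log return $$ r_{0,7}^{log} = \log{\frac{S_7}{S_0}} = \log{1.05} \approx 0.05 $$

To convert log returns back to discrete returns, take the exponential (the inverse of the logarithm) of the log return and subtract 1.
Here, $e^{\log{1.05}} - 1 = 0.05$, which matches the discrete weekly return computed above.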
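A short plain-Python sketch (our own variable names) confirms both steps: the daily log returns sum to $\log(105/100)$, and exponentiating recovers the discrete weekly return:

```python
import math

prices = [100, 110, 121, 110, 132, 105, 112, 105]
# daily log returns: log(S_t) - log(S_{t-1})
log_returns = [math.log(curr) - math.log(prev) for prev, curr in zip(prices[:-1], prices[1:])]

weekly_log = sum(log_returns)               # log returns add up over time
weekly_discrete = math.exp(weekly_log) - 1  # back to a discrete return: exp(r^log) - 1

print(f"weekly log return:      {weekly_log:.4f}")       # 0.0488
print(f"weekly discrete return: {weekly_discrete:.4f}")  # 0.0500
```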

Looking at the math with logarithmised prices reveals where this additivity comes from. For the log return from $t=0$ to $t=2$: $$ r_{0,2}^{log} = \log{\frac{S_2}{S_0}} = \log{S_2} - \log{S_0} $$ Adding zero in the form $-\log{S_1} + \log{S_1} = 0$ gives $$ r_{0,2}^{log} = \log{S_2} - \log{S_1} + \log{S_1} - \log{S_0} = (\log{S_2} - \log{S_1}) + (\log{S_1} - \log{S_0}) = r_{1,2}^{log} + r_{0,1}^{log} $$
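The same “add zero” trick works for any number of periods: inserting every intermediate log price turns the multi-period log return into a telescoping sum of single-period log returns,

$$ r_{0,T}^{log} = \log{S_T} - \log{S_0} = \sum_{t=1}^{T} \left(\log{S_t} - \log{S_{t-1}}\right) = \sum_{t=1}^{T} r_t^{log} $$

For the example above with $T = 7$, this is exactly why the seven daily log returns add up to $\log{\frac{105}{100}}$.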

We notice that the log returns are very close to the discrete returns.
This is because the logarithm behaves almost linearly around 1: $\log{(1 + r)} \approx r$ for $r$ close to zero.

$\rightarrow$ for small discrete returns, the log returns are very similar to them.

This almost always holds for daily returns and for even shorter time spans.

For monthly (or quarterly, yearly) returns, there “is more time for the value to develop”: returns tend to be larger in magnitude, and log and discrete returns drift further apart.
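A quick numerical check in plain Python (the return values are arbitrary illustrations) shows how $\log(1 + r)$ tracks $r$ for small returns and drifts away for larger ones; math.log1p computes $\log(1 + r)$ accurately even for tiny $r$:

```python
import math

# discrete returns of increasing magnitude (arbitrary illustrative values)
discrete = [0.001, 0.01, 0.05, 0.2, 0.5, -0.5]
# pair each discrete return with its log return log(1 + r)
rows = [(r, math.log1p(r)) for r in discrete]

for r, log_r in rows:
    print(f"r = {r:+.3f}   log(1 + r) = {log_r:+.4f}   r - log(1 + r) = {r - log_r:+.4f}")
```

Since $\log(1 + r) \le r$ for all $r > -1$, the log return is always the smaller of the two, and the gap grows quickly with the size of the return.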

Let’s have a look at returns using Python.

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'S': [100, 110, 121, 110, 132, 105, 112, 105]},
    index=list(range(8)),
)
df

	S
0	100
1	110
2	121
3	110
4	132
5	105
6	112
7	105

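We can calculate discrete returns simply by using the Series method .pct_change().
Note the NaN value in the first row: there is no previous price to compare with.

To calculate log returns, we can apply numpy’s np.log() to the prices and take the difference of successive values with .diff(), or take the logarithm of 1 plus the discrete returns.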

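# discrete returns from prices: S_t / S_{t-1} - 1
df['discrete_returns'] = df.S.pct_change()
# log returns: difference of successive log prices
df['log_returns'] = np.log(df.S).diff()
# alternative: log(1 + r); the NaN in row 0 simply propagates
df['log_returns_alt'] = np.log(df.discrete_returns + 1)
# back-transformation: exp(log return) - 1
df['discrete_returns_from_log'] = np.exp(df.log_returns) - 1
df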

	S	discrete_returns	log_returns	log_returns_alt	discrete_returns_from_log
0	100	NaN	NaN	NaN	NaN
1	110	0.100000	0.095310	0.095310	0.100000
2	121	0.100000	0.095310	0.095310	0.100000
3	110	-0.090909	-0.095310	-0.095310	-0.090909
4	132	0.200000	0.182322	0.182322	0.200000
5	105	-0.204545	-0.228842	-0.228842	-0.204545
6	112	0.066667	0.064539	0.064539	0.066667
7	105	-0.062500	-0.064539	-0.064539	-0.062500

Portfolios and cross-sectional additivity

A portfolio ($PF$) is a collection of investments, e.g. stocks.

Portfolios can be allocated differently, e.g. according to a desired degree of risk.

We can describe a portfolio’s value as the sum of the values of its constituents.
The value of position $i$ is its weight $w_i$ (fraction of total capital) multiplied by the total portfolio value $S_t$: $$ P^i_t = w_i \cdot S_t $$ where $i$ indicates the company.

Given the prices of the individual stocks in a PF, how can we calculate the performance of the portfolio as a whole? $\rightarrow$ calculate the performance from $t=0$ to $t=1$ for all stocks in our portfolio together.

Use returns!

For the cross-sectional portfolio return, we must not use log returns, since the logarithm of a weighted sum is not the weighted sum of the logarithms.
Discrete returns already possess the desired cross-sectional additivity.

To derive the portfolio return, we use the value development of a single position $i$: $$ P_{t+1}^i = P_t^i (1 + r_{t+1}^i) $$

Summing over all positions, with $S_t = \sum_i P^i_t$ the total portfolio value, shows that discrete returns do indeed possess cross-sectional additivity: $$ S_{t+1} = \sum_i P^i_t (1 + r_{t+1}^i) = \sum_i w_i S_t + \sum_i w_i S_t r_{t+1}^i = S_t \cdot \left(1 + \sum_i w_i r_{t+1}^i\right) $$ with $\sum_i w_i = 1$. Dividing by $S_t$ and subtracting 1 gives $$ r_{t+1}^{PF} = \frac{S_{t+1}}{S_t} - 1 = \sum_i w_i r_{t+1}^i $$ i.e. the portfolio return is the weighted sum of the individual discrete returns.
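To make the derivation concrete, here is a small plain-Python sketch with made-up numbers (two positions, equal initial weights, one period; all values are hypothetical). The portfolio's discrete return equals the weighted sum of the constituents' discrete returns, while the weighted sum of log returns does not reproduce the portfolio's log return:

```python
import math

weights = [0.5, 0.5]     # hypothetical weights w_i, summing to 1
returns = [0.10, -0.05]  # hypothetical one-period discrete returns r^i_{t+1}
value_t = 1000.0         # hypothetical total portfolio value S_t

# P^i_t = w_i * S_t and P^i_{t+1} = P^i_t * (1 + r^i_{t+1})
positions_t = [w * value_t for w in weights]
positions_t1 = [p * (1 + r) for p, r in zip(positions_t, returns)]
value_t1 = sum(positions_t1)  # S_{t+1}

pf_return = value_t1 / value_t - 1                        # actual portfolio return
weighted_discrete = sum(w * r for w, r in zip(weights, returns))
pf_log_return = math.log(value_t1 / value_t)              # actual portfolio log return
weighted_log = sum(w * math.log1p(r) for w, r in zip(weights, returns))

print(f"portfolio return:          {pf_return:.6f}")
print(f"weighted discrete returns: {weighted_discrete:.6f}")
print(f"portfolio log return:      {pf_log_return:.6f}")
print(f"weighted log returns:      {weighted_log:.6f}")
```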

Let’s have a look at the following data:

time $t$	0	1	2	3	4	5	6	7
company A $A_t$	100	110	121	110	132	105	112	105
company B $B_t$	100	120	124	118	117	135	128	115

For simplicity, we assume that we invest the same amount of money in both stocks, which gives initial portfolio weights $w_1=w_2=0.5$. For such a naive portfolio, the portfolio return on day $t$ is just the mean of the returns $r_{i,t}$ of all companies $i$. Strictly speaking, applying equal weights on every day assumes that the portfolio is rebalanced back to $0.5/0.5$ each day; without rebalancing, the weights drift as prices change.

Calculate the daily returns of the portfolio using pandas:

df = pd.DataFrame({
    'A': [100, 110, 121, 110, 132, 105, 112, 105],
    'B': [100, 120, 124, 118, 117, 135, 128, 115],
    },
index=list(range(8)))
df

	A	B
0	100	100
1	110	120
2	121	124
3	110	118
4	132	117
5	105	135
6	112	128
7	105	115

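# daily discrete returns of each stock
df['A_return'] = df.A.pct_change()
df['B_return'] = df.B.pct_change()
# equally weighted portfolio: mean of both returns (row 0 has no return)
df.loc[1:,'naive_pf_return'] = df.loc[1:,['A_return', 'B_return']].mean(axis=1)
df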

	A	B	A_return	B_return	naive_pf_return
0	100	100	NaN	NaN	NaN
1	110	120	0.100000	0.200000	0.150000
2	121	124	0.100000	0.033333	0.066667
3	110	118	-0.090909	-0.048387	-0.069648
4	132	117	0.200000	-0.008475	0.095763
5	105	135	-0.204545	0.153846	-0.025350
6	112	128	0.066667	-0.051852	0.007407
7	105	115	-0.062500	-0.101562	-0.082031

What is the total portfolio return over the span of the 8 days above (7 daily returns)? Again, we can transform the portfolio returns to log returns and sum the whole column. The same rules apply regardless of whether we look at portfolio returns or returns of a single asset:

df['naive_pf_log_returns'] = np.log(df.naive_pf_return + 1)
df

	A	B	A_return	B_return	naive_pf_return	naive_pf_log_returns
0	100	100	NaN	NaN	NaN	NaN
1	110	120	0.100000	0.200000	0.150000	0.139762
2	121	124	0.100000	0.033333	0.066667	0.064539
3	110	118	-0.090909	-0.048387	-0.069648	-0.072192
4	132	117	0.200000	-0.008475	0.095763	0.091451
5	105	135	-0.204545	0.153846	-0.025350	-0.025676
6	112	128	0.066667	-0.051852	0.007407	0.007380
7	105	115	-0.062500	-0.101562	-0.082031	-0.085592

# sum the log returns over time; round only for display, not before exp()
log_return_total = df.naive_pf_log_returns.sum()
print(f'log return over the whole period: {log_return_total:.4f}')
print(f'discrete return over the whole period: {np.exp(log_return_total) - 1:.4f}')

log return over the whole period: 0.1197
discrete return over the whole period: 0.1271
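Because the daily mean implicitly rebalances the portfolio back to equal weights every day, its total return differs from that of buying both stocks at $t=0$ and simply holding them. A minimal pandas sketch on the same prices (our own comparison, not part of the analysis above; the frame is named prices so that it does not overwrite df):

```python
import pandas as pd

prices = pd.DataFrame({
    'A': [100, 110, 121, 110, 132, 105, 112, 105],
    'B': [100, 120, 124, 118, 117, 135, 128, 115],
})

# rebalanced to 50/50 every day: compound the daily mean of the discrete returns
daily_pf = prices.pct_change().mean(axis=1)
rebalanced_total = (1 + daily_pf.dropna()).prod() - 1

# buy and hold: put half of the capital into each stock at t=0, then let the weights drift
shares = 0.5 / prices.iloc[0]             # shares bought per unit of initial capital
pf_value = (prices * shares).sum(axis=1)  # portfolio value over time
buy_and_hold_total = pf_value.iloc[-1] / pf_value.iloc[0] - 1

print(f'daily rebalanced: {rebalanced_total:.4f}')    # 0.1271
print(f'buy and hold:     {buy_and_hold_total:.4f}')  # 0.1000
```

Which of the two numbers is the "right" portfolio return depends on the strategy actually followed; the daily mean only describes a daily rebalanced portfolio.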

Characteristics of returns

Usually, returns exhibit the following:

- expected returns are close to zero (the shorter the time span, the smaller the expected return)
- weak stationarity (i.e. constant expected value and variance over time), though usually with volatility clustering
- a skewed distribution

From these items alone, we can start an analysis of stock returns by looking at some (standardized) moments of the empirical data:

- the average return as an estimate of the expected return
- the empirical variance or standard deviation (volatility)
- the skewness (if negative: left-skewed)
- the (excess) kurtosis (larger $\rightarrow$ fatter tails)

In pandas, each of these is available as a Series method: .mean(), .var() or .std(), .skew() and .kurt(). Note that .kurt() returns the excess kurtosis, which is 0 for a normal distribution.
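To see what these methods compute, the sketch below recomputes the four statistics by hand with NumPy on the toy returns of company A from above. The formulas are the bias-adjusted sample estimators that, to our understanding, pandas uses (standard deviation with ddof=1, adjusted Fisher-Pearson skewness, unbiased excess kurtosis); the final comparison checks that assumption:

```python
import numpy as np
import pandas as pd

# toy prices of company A from above; any return series would do
prices = pd.Series([100, 110, 121, 110, 132, 105, 112, 105], dtype=float)
r = prices.pct_change().dropna().to_numpy()
n = len(r)

dev = r - r.mean()
m2 = np.mean(dev**2)  # biased central moments
m3 = np.mean(dev**3)
m4 = np.mean(dev**4)

mean = r.mean()
std = np.sqrt(np.sum(dev**2) / (n - 1))               # sample standard deviation (ddof=1)
skew = np.sqrt(n * (n - 1)) / (n - 2) * m3 / m2**1.5  # adjusted Fisher-Pearson skewness
kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * (m4 / m2**2 - 3) + 6)  # excess kurtosis

s = pd.Series(r)
print(np.allclose([mean, std, skew, kurt], [s.mean(), s.std(), s.skew(), s.kurt()]))  # True
```

The small-sample corrections matter for short series like this one; for several years of daily returns they are negligible.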

We will have a look at real-world data, downloading close prices using the yfinance package and calculating the returns.

import numpy as np
import yfinance as yf

# daily price history of Microsoft since the start of 2020
msft = yf.Ticker('MSFT').history(start="2020-01-01")

# keep only the close prices (copy to avoid a SettingWithCopyWarning)
msft = msft[['Close']].copy()
msft['daily_return'] = msft['Close'].pct_change()
msft.dropna(inplace=True)

# first four (standardized) moments of the daily returns
avg_return = msft.daily_return.mean()
vola = msft.daily_return.std()
skew = msft.daily_return.skew()
kurt = msft.daily_return.kurt()  # excess kurtosis

print(f'average return {np.round(avg_return, 4)}')
print(f'volatility {np.round(vola, 4)}')
print(f'skewness {np.round(skew, 4)}')
print(f'kurtosis {np.round(kurt, 4)}')

average return 0.0011
volatility 0.0197
skewness 0.0117
kurtosis 6.9091

As discussed in an earlier chapter, it is always a good idea to look at some charts.
We can plot returns over time as well as look at the distribution.

msft.daily_return.plot()

[Figure: MSFT daily returns over time; x-axis: Date]

Over time, we see that the volatility is far from constant.
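A rolling standard deviation is one way to quantify what the plot shows. The sketch below uses simulated, hypothetical returns with a calm and a turbulent regime instead of the downloaded MSFT data, so it runs offline; on real data the same .rolling(...).std() call could be applied to msft.daily_return. The 20-day window is an arbitrary choice:

```python
import numpy as np
import pandas as pd

# hypothetical daily returns: a calm regime followed by a turbulent one
rng = np.random.default_rng(seed=42)
calm = rng.normal(loc=0.0, scale=0.01, size=250)
turbulent = rng.normal(loc=0.0, scale=0.04, size=250)
returns = pd.Series(np.concatenate([calm, turbulent]))

# rolling volatility: standard deviation over the last 20 observations
rolling_vol = returns.rolling(window=20).std()

print(f"mean rolling vol, calm part:      {rolling_vol.iloc[20:250].mean():.4f}")
print(f"mean rolling vol, turbulent part: {rolling_vol.iloc[270:].mean():.4f}")
```

The first 19 values are NaN because a full window is not yet available.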

To look at the distribution, we already know which plot to use: a histogram, here with a kernel density estimate (KDE) on top.

import matplotlib.pyplot as plt
import seaborn as sns

sns.histplot(msft.daily_return, color='forestgreen', kde=True)

[Figure: histogram of MSFT daily returns with kernel density estimate; x-axis: daily_return, y-axis: Count]

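The kernel density estimate (solid line) should be read together with the moments computed above: a skewness of about 0.01 indicates an almost symmetric distribution rather than a clearly left-skewed one, while the excess kurtosis of about 6.9 points to fat tails.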
