Quantum Finance Application on Portfolio Diversification¶
Copyright (c) 2021 Institute for Quantum Computing, Baidu Inc. All Rights Reserved.
Overview¶
Current finance problems can be mainly tackled by three areas of quantum algorithms: quantum simulation, quantum optimization, and quantum machine learning [1,2]. Many financial problems are essentially combinatorial optimization problems, and corresponding algorithms usually have high time complexity and are difficult to implement. Due to the power of quantum computing, these complex problems are expected to be solved by quantum algorithms in the future.
The Quantum Finance module of Paddle Quantum focuses on quantum optimization: how to apply quantum algorithms in real finance optimization problems. This tutorial focuses on how to use quantum algorithms to solve the portfolio diversification problem.
Portfolio Diversification Problem¶
Limited by the lack of professional knowledge and market experience, the investor prefers a passive investment strategy in the actual investment. Index investing is a typical example of passive investing, e.g. an investor buys and holds the Standard & Poor’s 500 (S&P 500) for a long period of time. As an investor, if you do not want to invest in an existing index portfolio, you can also create your specific index portfolio by picking representative stocks from the market.
An important way to balance risk and return in an investment portfolio is to diversify your assets. A specific description of portfolio diversification is as follows: the number of investable stocks is $n$ and the number of stocks included in the portfolio is $K$. Based on some criteria, you need to divide all the stocks into $K$ categories and select the stock from each category that best represents that category. Adding representatives of each category to the index portfolio is better for investment management.
Encoding Portfolio Diversification Problem¶
To transform the portfolio diversification problem into a problem applicable for parameterized quantum circuits, we need to encode the portfolio diversification problem into a Hamiltonian.
To model the problem, two issues need to be clarified. The first one is how to classify different stocks, and the second one is what criteria are used to select representative stocks. To solve these two problems, firstly we define the similarity $\rho_{ij}$ between stock $i$ and stock $j$:
- $\rho_{ii} = 1 \quad $ The stock is similar to itself with a similarity of 1
- $\rho_{ij} \leq 1 \quad$ The larger $\rho_{ij}$ , the higher the similarity between stock $i$ and stock $j$
Due to the correlation of returns between stocks, we can further measure the similarity between the time series on the basis of the covariance matrix. Dynamic Time Warping (DTW) is a common method to measure the similarity of two time series. In this paper, the DTW algorithm is used to calculate the similarity between two stocks. So based on the similarity between different stocks, we can classify the stocks and select representative stocks in each category. We can define $n$ binary variables $x_{ij}$ and $1$ binary variables $y_j$ for each stock. Therefore, given $n$ stocks, there are $n^2 + n$ binary variables. For the variable $x_{ij}$, $i$ denotes the order of stock, and $j$ denotes the position among the $n$ binary variables corresponding to that stock. If two stock has the same index of $j$, they are classified in the same category. Meanwhile, the stock of $i = j$ is the most representative one in that category selected for the index portfolio:
$$ x_{ij}= \begin{cases} 1, & \text{stock $j$ is in the portfolio and it has the highest similarity to stock $i$}\\ 0, & \text{otherwise} \end{cases}, $$$$ y_{j}= \begin{cases} 1, & \text{stock $j$ is selected to the index portfolio}\\ 0, & \text{otherwise} \end{cases}. $$The model can be written as follows:
$$ \mathcal{M}= \max_{x_{ij}}\sum_{i=1}^n\sum_{j=1}^n \rho_{ij}x_{ij}. \tag{1} $$The model needs to satisfy the following constraints:
- Clustering constraint: the index portfolio only include $K$ stocks
- $ \sum_{j=1}^n y_j = K$
- Integer constraint: a stock is either in the index portfolio or not
- $ x_{ij},y_j\in{\{0,1\}}, \forall i = 1, \dots,n; j = 1, \dots, n$
- Consistency constraint: if a stock can represent another stock, it must be in the index portfolio
- $\sum_{j=1}^n x_{ij} = 1, \forall i = 1,\dots,n$
- $x_{ij} \leq y_j, \forall i = 1,\dots,n; j = 1,\dots, n$
- $x_{jj} = y_j, \forall j = 1,\dots,n$
The objective of the model is to maximize the similarity between the $n$ stocks and the selected index stock portfolio.
Since the loss function is to be optimized using the gradient descent method, some modifications made to the loss function based on the model equation and constraints:
$$ \begin{aligned} C_x &= -\sum_{i=1}^{n}\sum_{j=1}^{n}\rho_{ij}x_{ij} + A\left(K- \sum_{j=1}^n y_j \right)^2 + \sum_{i=1}^n A\left(\sum_{j=1}^n 1- x_{ij} \right)^2 \\ &\quad + \sum_{j=1}^n A\left(x_{jj} - y_j\right)^2 + \sum_{i=1}^n \sum_{j=1}^n A\left(x_{ij}(1 - y_j)\right).\\ \end{aligned} \tag{2} $$The first term represents similarity maximization, the next four terms are constraints. $A$ is the penalty parameter, which is usually set to a larger number so that the final binary string representing the index portfolio results satisfies the constraints.
We now need to transform the cost function $C_x$ into a Hamiltonian to realize the encoding of the portfolio diversification problem. Each variable $x_{ij}$ has two possible values, $0$ and $1$, corresponding to quantum states $|0\rangle$ and $|1\rangle$. Note that every variable corresponds to a qubit and so $n^2 + n$ qubits are needed for solving the portfolio diversification problem. The Pauli $Z$ operator has two eigenstates which are the same as the states $|0\rangle$ and $|1\rangle$ . Their corresponding eigenvalues are 1 and -1, respectively. So we consider encoding the cost function as a Hamiltonian using the Pauli $Z$ matrix.
Now we would like to consider the mapping
$$ x_{ij} \mapsto \frac{I-Z_{ij}}{2}, \tag{3} $$where $Z_{ij} = I \otimes I \otimes \ldots \otimes Z \otimes \ldots \otimes I$ with $Z$ operates on the qubit at position $ij$. Under this mapping, the value of $x_{ij}$ represent different meanings. If the qubit $ij$ is in state $|1\rangle$, then $x_{ij} |1\rangle = \frac{I-Z_{ij}}{2} |1\rangle = 1|1\rangle $, which means stock $i$ is in index portfolio. Also, for the qubit $ij$ in state $|0\rangle$, $x_{ij}|0\rangle = \frac{I-Z_{ij}}{2} |0\rangle = 0 |0\rangle $.
Thus using the above mapping, we can transform the cost function $C_x$ into a Hamiltonian $H_C$ for the system of $n^2+n$ qubits and realize the quantumization of the portfolio diversification problem. Then the ground state of $H_C$ is the optimal solution to the portfolio diversification problem. In the following section, we will show how to use a parametrized quantum circuit to find the ground state, i.e., the eigenvector with the smallest eigenvalue.
Paddle Quantum Implementation¶
To investigate the portfolio diversification problem using Paddle Quantum, there are some required packages to import, which are shown below.
# Import packages needed
import numpy as np
import pandas as pd
import datetime
# Import related modules from Paddle Quantum and PaddlePaddle
import paddle
import paddle_quantum
from paddle_quantum.ansatz import Circuit
from paddle_quantum.finance import DataSimulator, portfolio_diversification_hamiltonian
Prepare experimental data¶
In this tutorial, we choose stocks as investment assets. For the data used in the experimental tests, two options are provided:
- The first method is to generate random data according to certain requirements, e.g. number of assets.
If the user prepares data using this method, then when initializing the data, it is necessary to give the list of parameters: a list of names of investable stocks (assets), the start date, and the end date of the trading data.
num_assets = 3 # Number of investable projects
stocks = [("TICKER%s" % i) for i in range(num_assets)]
data = DataSimulator( stocks = stocks, start = datetime.datetime(2016, 1, 1), end = datetime.datetime(2016, 1, 30))
data.randomly_generate() # Generate random data
- The second method is that the user can choose to set the data themselves, i.e. real stock data collected by themselves. Considering that the number of stocks contained in the file may be large, the user can specify the number of stocks used for this experiment, i.e.,
num_assets
as initialized above.
We collect the closing prices of $12$ stocks for $35$ trading days into the realStockData_12.csv
file, where we choose to read only the first $3$ stocks.
In this tutorial, we choose to read real data as experimental data.
df = pd.read_csv('realStockData_12.csv')
dt = []
for i in range(num_assets):
mylist = df['closePrice'+str(i)].tolist()
dt.append(mylist)
# Output the closing price of the seven stocks read from the file for the 35 trading days
print(dt)
# Specify the experimental data as a local file read by the user
data.set_data(dt)
[[16.87, 17.18, 17.07, 17.15, 16.66, 16.79, 16.69, 16.99, 16.76, 16.52, 16.33, 16.39, 16.45, 16.0, 16.09, 15.54, 13.99, 14.6, 14.63, 14.77, 14.62, 14.5, 14.79, 14.77, 14.65, 15.03, 15.37, 15.2, 15.24, 15.59, 15.58, 15.23, 15.04, 14.99, 15.11, 14.5], [32.56, 32.05, 31.51, 31.76, 31.68, 32.2, 31.46, 31.68, 31.39, 30.49, 30.53, 30.46, 29.87, 29.21, 30.11, 28.98, 26.63, 27.62, 27.64, 27.9, 27.5, 28.67, 29.08, 29.08, 29.95, 30.8, 30.42, 29.7, 29.65, 29.85, 29.25, 28.9, 29.33, 30.11, 29.67, 29.59], [5.4, 5.48, 5.46, 5.49, 5.39, 5.47, 5.46, 5.53, 5.5, 5.47, 5.39, 5.35, 5.37, 5.24, 5.26, 5.08, 4.57, 4.44, 4.5, 4.56, 4.52, 4.59, 4.66, 4.67, 4.66, 4.72, 4.84, 4.81, 4.84, 4.88, 4.89, 4.82, 4.74, 4.84, 4.79, 4.63]]
Encoding Hamiltonian¶
Here we construct the Hamiltonian $H_C$ of Eq. (2) with the replacement in Eq. (3).
In the process of encoding Hamiltonian, we first need to calculate the similarity matrix $\rho$ between the returns of each stock, which is available in the finance
module and can be called directly.
rho = data.get_similarity_matrix()
Based on the provided and calculated parameters, the Hamiltonian is constructed below. Here we set the penalty parameter to the number of investable stocks.
q = 2 # Number of stocks in the index portfolio
penalty = num_assets # penalty parameter
hamiltonian = portfolio_diversification_hamiltonian(penalty, rho, q)
Calculating the loss function¶
We adopt a parameterized quantum circuit consisting of $U_3(\vec{\theta})$ and $\text{CNOT}$ gates, that can be constructed by calling the built-in method complex_entangled_layer()
.
After running the quantum circuit, we obtain the circuit output $|\vec{\theta }\rangle$. From the output state of the circuit we can calculate the objective function, and also the loss function of the portfolio diversification problem:
$$ L(\vec{\theta}) = \langle\vec{\theta}|H_C|\vec{\theta}\rangle. \tag{4} $$We then use a classical optimization algorithm to minimize this function and find the optimal parameters $\vec{\theta}^*$. The following code shows a complete network built with Paddle Quantum and PaddlePaddle.
class PDNet(paddle.nn.Layer):
def __init__(self, num_qubits, p, dtype="float64"):
super(PDNet, self).__init__()
self.num_qubits = num_qubits
self.depth = p
self.cir = Circuit(self.num_qubits)
self.cir.complex_entangled_layer(depth=self.depth)
def forward(self):
"""
Forward propagation
"""
state = self.cir(init_state)
loss = loss_func(state)
return loss, self.cir
Training the quantum neural network¶
After defining the quantum neural network, we use the gradient descent method to update the parameters to minimize the expectation value in Eq. (4).
SEED = 1100 # Set a global RNG seed
p = 2 # Number of layers in the quantum circuit
ITR = 150 # Number of training iterations
LR = 0.4 # Learning rate of the optimization method based on gradient descent
Here, we optimize the network defined above in PaddlePaddle.
# number of qubits
n = len(rho)
# Fix paddle random seed
paddle.seed(SEED)
# number of qubits need in circuit
num_qubits = n * (n+1)
# Building Quantum Neural Networks
net = PDNet(num_qubits, p)
# Define the initial state
init_state = paddle_quantum.state.zero_state(num_qubits)
# Define loss function
loss_func = paddle_quantum.loss.ExpecVal(hamiltonian)
# Use Adam optimizer
opt = paddle.optimizer.Adam(learning_rate=LR, parameters=net.parameters())
# Gradient descent iteration
for itr in range(1, ITR + 1):
# run circuit
loss, cir = net()
# compute gradient and optimize
loss.backward()
opt.minimize(loss)
opt.clear_grad()
if itr % 10 == 0:
print("iter:", itr, " loss:", "%.4f"% loss.numpy())
iter: 10 loss: 7.7805 iter: 20 loss: 5.4413 iter: 30 loss: 3.6023 iter: 40 loss: 3.2911 iter: 50 loss: 1.9415 iter: 60 loss: 0.3871 iter: 70 loss: 0.1342 iter: 80 loss: 0.0774 iter: 90 loss: 0.0122 iter: 100 loss: 0.0068 iter: 110 loss: -0.0001 iter: 120 loss: -0.0019 iter: 130 loss: -0.0025 iter: 140 loss: -0.0028 iter: 150 loss: -0.0028
Decoding the quantum solution¶
After obtaining the minimum value of the loss function and the corresponding set of parameters $\vec{\theta}^*$, our task has not been completed. In order to obtain an approximate solution to the portfolio diversification problem, it is necessary to decode the solution to the classical optimization problem from the quantum state $|\vec{\theta}^*\rangle$ output by the circuit. Physically, to decode a quantum state, we need to measure it and then calculate the probability distribution of the measurement results:
$$ p(z) = |\langle z|\vec{\theta}^*\rangle|^2. \tag{5} $$In the case of quantum parameterized circuits with sufficient expressiveness, the greater the probability of a certain bit string, the greater the probability that it corresponds to an optimal solution to the portfolio diversification problem.
Paddle Quantum provides a function to read the probability distribution of the measurement results of the state output by the quantum circuit:
# Repeat the simulated measurement of the circuit output state 2048 times
final_state = cir(init_state)
prob_measure = final_state.measure(shots=2048)
investment = max(prob_measure, key=prob_measure.get)
print("The bit string form of the solution: ", investment)
The bit string form of the solution: 100001001101
After measurement, we have found the bit string with the highest probability of occurrence, the index portfolio in the form of the bit string. As the result above 100001001101
, we have $n = 3$ investable stocks and choose two for the index portfolio。The first $n^2 = 9$ bits of 100001001
represent $x_{ij}$, and every $3$ bits are grouped together. The first bit of the first group of 100
is set to $1$, which means it is classified as a class. The third bit in the second group 001
and the third group 001
is set to $1$, which means they are classified as one class. Also, the positions of $1$ in the first and third groups are satisfied with $i = j$, i.e., these two stocks are the most representative stock of their respective classes. It can be seen that $1$ appears at $j = 1$ and $j = 3$, i.e., two positions are possible for $1$, which corresponds to our presumption of having two stocks in the index portfolio.
The last $3$ position is 101
, which represents $y_j$, indicating that the first stock and the third stock are selected for the index portfolio. If the final result is not such a valid solution as described above, users can still get a better training result by adjusting the parameter values of the parameterized quantum circuit.
Conclusion¶
In this tutorial, we focus on how to classify investable stocks and how to select representative ones for our portfolio. In this problem, each investment item requires $n$ qubits to represent the classification and $1$ qubit to represent whether it is selected for the portfolio or not. Due to the limitation of the number of qubits, the number of investment items that can be handled is still small.
References¶
[1] Orus, Roman, Samuel Mugel, and Enrique Lizaso. "Quantum computing for finance: Overview and prospects." Reviews in Physics 4 (2019): 100028.
[2] Egger, Daniel J., et al. "Quantum computing for Finance: state of the art and future prospects." IEEE Transactions on Quantum Engineering (2020).