In this coursework, you will explore and replicate selected elements of the research presented in the paper "End-to-End Policy Learning of a Statistical Arbitrage Autoencoder Architecture". We will not reproduce the entire study.
Overview
This study redefines Statistical Arbitrage (StatArb) by combining Autoencoder architectures with policy learning to generate trading strategies. Traditionally, StatArb involves estimating the mean of a synthetic asset through classical or PCA-based methods and then building a mean-reversion strategy around it. This paper instead proposes a data-driven approach: an Autoencoder trained on US stock returns is integrated into a neural network that represents the portfolio trading policy and outputs portfolio allocations directly.
Coursework Goal
This coursework replicates selected results from the paper, providing hands-on experience in implementing and evaluating an end-to-end policy learning Autoencoder within financial trading strategies.
Outline
- Data Preparation and Exploration
- Fama-French Analysis
- PCA Analysis
- Ornstein-Uhlenbeck Process
- Autoencoder Analysis
Description: The coursework is graded on a 100-point scale and is divided into five parts (A to E). The mark distribution for each question is shown below:
| Problem | Question | Number of Marks |
|---|---|---|
| Part A | Question 1 | 4 |
| | Question 2 | 1 |
| | Question 3 | 3 |
| | Question 4 | 3 |
| | Question 5 | 1 |
| | Question 6 | 3 |
| Part B | Question 7 | 1 |
| | Question 8 | 5 |
| | Question 9 | 4 |
| | Question 10 | 5 |
| | Question 11 | 2 |
| | Question 12 | 3 |
| Part C | Question 13 | 3 |
| | Question 14 | 1 |
| | Question 15 | 3 |
| | Question 16 | 2 |
| | Question 17 | 7 |
| | Question 18 | 6 |
| | Question 19 | 3 |
| Part D | Question 20 | 3 |
| | Question 21 | 5 |
| | Question 22 | 2 |
| Part E | Question 23 | 2 |
| | Question 24 | 1 |
| | Question 25 | 3 |
| | Question 26 | 10 |
| | Question 27 | 1 |
| | Question 28 | 3 |
| | Question 29 | 3 |
| | Question 30 | 7 |
Please read the questions carefully and do your best. Good luck!
Objectives
1. Data Preparation and Exploration
Collect, clean, and prepare US stock return data for analysis.
2. Fama-French Analysis
Use the Fama-French factors to isolate the idiosyncratic component of stock returns, separating it from the part explained by market-wide effects. This analysis helps in understanding the unique characteristics of individual stocks relative to broader market trends.
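As a rough illustration of this step, the sketch below regresses each stock's returns on a set of factor returns and keeps the residuals as the idiosyncratic component. It uses NumPy least squares on synthetic data. The function name `idiosyncratic_residuals`, the simulated factors, and the chosen betas are assumptions made for illustration; they are not the coursework data or the paper's exact procedure.

```python
import numpy as np


def idiosyncratic_residuals(stock_returns, factor_returns):
    """Regress each stock on the factors (with intercept) and return residuals.

    stock_returns:  (T, N) array, one column per stock.
    factor_returns: (T, K) array, one column per factor.
    Returns (coef, residuals) with coef of shape (K + 1, N); row 0 is the intercept.
    """
    n_obs = factor_returns.shape[0]
    design = np.column_stack([np.ones(n_obs), factor_returns])  # add intercept
    coef, *_ = np.linalg.lstsq(design, stock_returns, rcond=None)
    residuals = stock_returns - design @ coef
    return coef, residuals


# Synthetic data only (assumption): 3 factors, 2 stocks, 250 trading days.
rng = np.random.default_rng(0)
factors = rng.normal(0.0, 0.01, size=(250, 3))
true_betas = np.array([[1.0, 0.5], [0.2, -0.3], [0.0, 0.4]])
noise = rng.normal(0.0, 0.005, size=(250, 2))
stocks = factors @ true_betas + noise

coef, resid = idiosyncratic_residuals(stocks, factors)
print("Estimated betas:\n", coef[1:].round(2))
print("Residual std per stock:", resid.std(axis=0).round(4))
```

With real data, the factor matrix would hold the Fama-French factor returns and the stock matrix the excess returns. `statsmodels.api.OLS` yields the same coefficients and additionally reports standard errors.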
3. PCA Analysis
Employ Principal Component Analysis (PCA) to identify hidden structures and reduce dimensionality in the data. This method helps in extracting significant patterns that might be obscured in high-dimensional datasets.
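To make the idea concrete, here is a minimal sketch of PCA via an eigendecomposition of the correlation matrix of returns. The data is synthetic (one common factor plus noise), and the helper name `pca_factors` is hypothetical. It is meant only to show the mechanics, not the specific configuration used later in the coursework.

```python
import numpy as np


def pca_factors(returns, n_components):
    """PCA on standardized returns using the correlation matrix.

    returns: (T, N) array of asset returns.
    Returns (components, scores, explained_ratio) for the leading components.
    """
    standardized = (returns - returns.mean(axis=0)) / returns.std(axis=0, ddof=1)
    corr = np.cov(standardized, rowvar=False)      # equals the correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]              # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained_ratio = eigvals / eigvals.sum()
    components = eigvecs[:, :n_components]         # (N, n_components) loadings
    scores = standardized @ components             # (T, n_components) factor series
    return components, scores, explained_ratio[:n_components]


# Synthetic data only (assumption): 5 assets driven by one common factor.
rng = np.random.default_rng(0)
common = rng.normal(0.0, 0.01, size=(500, 1))
returns = common @ np.ones((1, 5)) + rng.normal(0.0, 0.005, size=(500, 5))

components, scores, explained = pca_factors(returns, n_components=2)
print("Explained variance ratio:", explained.round(3))
```

Because the simulated assets share a single driver, the first component should absorb most of the variance. On real stock returns the spectrum is usually flatter.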
4. Ornstein-Uhlenbeck Process
Analyze mean-reverting behavior in stock prices using the Ornstein-Uhlenbeck process. This stochastic process is useful for modeling and forecasting based on the assumption that prices will revert to a long-term mean.
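One common way to work with this process is to fit its discrete-time AR(1) form by least squares and map the coefficients back to the continuous-time parameters. The sketch below does this on a simulated path. The parameter values, the daily time step, and the function name `fit_ou` are assumptions chosen for illustration.

```python
import numpy as np


def fit_ou(x, dt=1.0):
    """Estimate OU parameters from a path via the AR(1) regression.

    Model: dX = kappa * (mu - X) dt + sigma dW, whose exact discretization is
    X[t+1] = a + b * X[t] + eps, with b = exp(-kappa * dt), a = mu * (1 - b),
    and Var(eps) = sigma^2 * (1 - b^2) / (2 * kappa).
    """
    x_prev, x_next = x[:-1], x[1:]
    b, a = np.polyfit(x_prev, x_next, 1)           # slope, intercept
    if not 0.0 < b < 1.0:
        raise ValueError("No mean reversion detected (slope outside (0, 1)).")
    resid = x_next - (a + b * x_prev)
    kappa = -np.log(b) / dt
    mu = a / (1.0 - b)
    sigma = np.sqrt(resid.var(ddof=2) * 2.0 * kappa / (1.0 - b**2))
    return kappa, mu, sigma


# Simulate an OU path with known parameters (assumed values, daily steps).
kappa_true, mu_true, sigma_true = 50.0, 0.1, 0.5
dt, n_steps = 1.0 / 252.0, 5000
rng = np.random.default_rng(1)
b_true = np.exp(-kappa_true * dt)
step_sd = sigma_true * np.sqrt((1.0 - b_true**2) / (2.0 * kappa_true))
path = np.empty(n_steps)
path[0] = mu_true
for t in range(1, n_steps):
    path[t] = mu_true + b_true * (path[t - 1] - mu_true) + step_sd * rng.normal()

kappa_hat, mu_hat, sigma_hat = fit_ou(path, dt=dt)
print(f"kappa={kappa_hat:.1f}, mu={mu_hat:.3f}, sigma={sigma_hat:.3f}")
```

The speed of mean reversion `kappa` is typically the noisiest of the three estimates, so short samples can give values far from the truth.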
5. Building a Basic Autoencoder Model
Construct and train a standard Autoencoder to extract residual idiosyncratic risk.
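To show what "extracting the residual" means mechanically, the sketch below trains a tiny one-hidden-layer autoencoder with plain gradient descent and takes the reconstruction error as the residual. It is written in NumPy as a stand-in so that every step is visible; the coursework itself builds the model with Keras. The architecture, learning rate, synthetic data, and the name `train_autoencoder` are all assumptions for illustration.

```python
import numpy as np


def train_autoencoder(x, latent_dim, epochs=500, lr=0.05, seed=0):
    """Train x -> tanh(x @ w_enc) -> hidden @ w_dec by gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    n_assets = x.shape[1]
    w_enc = rng.normal(0.0, 0.1, size=(n_assets, latent_dim))
    w_dec = rng.normal(0.0, 0.1, size=(latent_dim, n_assets))
    losses = []
    for _ in range(epochs):
        hidden = np.tanh(x @ w_enc)                # encoder
        recon = hidden @ w_dec                     # linear decoder
        err = recon - x
        losses.append(float(np.mean(err**2)))
        # Backpropagate the mean squared error through both layers.
        grad_recon = 2.0 * err / err.size
        grad_w_dec = hidden.T @ grad_recon
        grad_pre = (grad_recon @ w_dec.T) * (1.0 - hidden**2)
        grad_w_enc = x.T @ grad_pre
        w_enc -= lr * grad_w_enc
        w_dec -= lr * grad_w_dec
    return w_enc, w_dec, losses


# Synthetic data only (assumption): 6 assets sharing one common factor.
rng = np.random.default_rng(42)
common = rng.normal(0.0, 0.01, size=(400, 1))
raw = common @ np.ones((1, 6)) + rng.normal(0.0, 0.005, size=(400, 6))
returns = (raw - raw.mean(axis=0)) / raw.std(axis=0)   # standardize inputs

w_enc, w_dec, losses = train_autoencoder(returns, latent_dim=2)
reconstruction = np.tanh(returns @ w_enc) @ w_dec
residuals = returns - reconstruction                   # idiosyncratic part
print(f"MSE: first epoch {losses[0]:.3f}, last epoch {losses[-1]:.3f}")
```

The residuals play the same role as the regression residuals in the factor-model approaches: whatever the low-dimensional bottleneck cannot reconstruct is treated as stock-specific.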
Libraries
#Data Download
import requests as re
from bs4 import BeautifulSoup
import yfinance as yf
#Data Management
import pandas as pd
import numpy as np
#Statistical Analysis
import statsmodels.api as sm
#Warnings
import warnings
#Visualisation
import matplotlib.pyplot as plt
import seaborn as sns
#import ace_tools as tools
#Machine Learning
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
Data Preparation and Exploration
Q1: (4 Marks)
Write a Python function that accepts a URL parameter and retrieves the NASDAQ-100 companies and their ticker symbols by scraping the relevant Wikipedia page using Requests and BeautifulSoup. Your function should return the data as a list of tuples, with each tuple containing the company name and its ticker symbol. Then, call your function with the appropriate Wikipedia page URL and print the data in a 'Company: Ticker' format.