Stock Market Performance: A look at Russell indexes in R

Monday, June 12, 2017 - 10:15

I just completed the annual "maintenance" on my little stock market indexes returns "app". I've supported a variant of the script for seven years, changing it pretty significantly year to year. The 2016 version was half python and half R, but this year I opted for R entirely. Who knows, maybe next will be all-in python.

The script serves two purposes: the first is to keep me apprised of daily stock market performance; the second, to test the latest language and visualization goodies in R and python. Goodness knows there's no shortage for either language.

 

My starting point is the file download page for Russell Indexes, certainly among the most revered source in the industry. Each evening after market close I run this notebook, which first downloads and munges the latest Russell index data, then computes and graphs portfolio performance, measured as the growth of a hypothetical initial $1 investment. Typically I look at year-to-date and two year-to-date performance, but the functions can handle any start date for which return data's available. The 82 files accessed comprise 41 separate portfolios, each sourced from a current year and historical data file.

 

The remainder of this script reads, wrangles, computes and graphs Russell index returns, one notebook cell at a time.

 

Load R libraries and set the working directory.

In [70]:
options(warn=-1)
options(scipen = 10)

suppressMessages(library(tidyverse))
suppressMessages(library(data.table))
suppressMessages(library(dtplyr))
suppressMessages(library(feather))
suppressMessages(library(lubridate))
suppressMessages(library(ggthemes))
suppressMessages(library(quantmod))

setwd("c:/data/russell/2017")
 

Define the ubiquitous frequenciesdyn function.

In [71]:
frequenciesdyn <- function(DTstr, xstr)
{    
        return(eval(parse(text=sprintf('%s[,.(count=.N),.(%s)]', DTstr, xstr))) %>% 
               arrange(desc(count)) %>% mutate(percent=100*count/sum(count)))
}
 

Set the url strings for historical and year-to date daily values files for selected Russell portfolios. In the end, I work primarily with the U.S. market Top 200, 1000, 3000, Midcap, 2500, and 2000 Russell indexes.

In [72]:
urllst <- c(
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US3000.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US3000.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US2000.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US2000.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US1000.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US1000.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US5015.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US5015.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US5012.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US5012.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US3001.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US3001.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US5014.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US3002.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US3002.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US5014.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US5013.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US5013.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US1001.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US1001.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US1002.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US1002.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US2001.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US2001.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US2002.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US2002.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US5006.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US5006.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US5007.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US5007.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US5016.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US5016.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US5010.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US5010.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US5011.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US5011.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US5003.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US5003.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_R04550.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_R04550.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_R04900.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_R04900.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_R91353.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_R91353.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_R89950.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_R89950.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_R04375.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_R04375.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_D94620.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_D94620.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_D41430.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_D41430.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_D41432.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_D41432.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_D94622.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_D94622.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_D62132.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_D62132.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_D94621.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_D94621.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_US4000.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_US4000.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_D94643.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_D94643.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_D94642.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_D94642.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_D94639.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_D94639.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_D94636.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_D94636.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_D94641.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_D94641.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_D94638.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_D94638.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_D94640.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_D94640.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_D94637.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_D94637.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_D94680.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_D94680.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valuesytd_D41431.csv",
"http://www.ftse.com/products/russell-index-values/home/getfile?id=valueshist_D41431.csv"    
)
 

Read the individual files and consolidate into the wdta data.table with attributes name, pdate, idxwodiv, and idxwdiv. Write the data.table to a text file.

In [73]:
ptmpre <- proc.time()

wdta <- rbindlist(map(urllst,fread),use.names=TRUE, fill=TRUE)[,1:4]
setnames(wdta, c("name",'pdate','idxwodiv','idxwdiv'))

fname <- "russellwork.csv"
fwrite(wdta,fname)

str(wdta)

ptmpost <- proc.time()
print(ptmpost-ptmpre)
 
Classes 'data.table' and 'data.frame':	166356 obs. of  4 variables:
 $ name    : chr  "Russell 3000®" "Russell 3000®" "Russell 3000®" "Russell 3000®" ...
 $ pdate   : chr  "06/05/2017" "06/02/2017" "06/01/2017" "05/31/2017" ...
 $ idxwodiv: num  2649 2654 2644 2620 2621 ...
 $ idxwdiv : num  7355 7368 7340 7273 7274 ...
 - attr(*, ".internal.selfref")=<externalptr> 
   user  system elapsed 
   1.94    0.50   33.32 
 

Start cleaning the dirty data by eliminating garbage characters and shortening the value names. Convert the date string to an R date variable.

In [74]:
dta <- copy(wdta)

dta$pdate <- mdy(dta$pdate)
dta$name <- gsub('[®,™,®,â„¢,Russell, ]','',dta$name)
dta$name <- gsub('Growth','G',dta$name)
dta$name <- gsub('Value','V',dta$name)
dta$name <- gsub('Va','V',dta$name)

str(dta)
 
Classes 'data.table' and 'data.frame':	166356 obs. of  4 variables:
 $ name    : chr  "3000" "3000" "3000" "3000" ...
 $ pdate   : Date, format: "2017-06-05" "2017-06-02" ...
 $ idxwodiv: num  2649 2654 2644 2620 2621 ...
 $ idxwdiv : num  7355 7368 7340 7273 7274 ...
 - attr(*, ".internal.selfref")=<externalptr> 
 

One data problem observed several years ago is a duplication of the final index record from last year in both the historical and year-to-date files for each index. Identify and eliminate one of the duplicates for each index. Check that the de-duping succeded.

In [75]:
frequenciesdyn("dta","name,pdate,idxwodiv,idxwdiv")[count>1] %>% nrow

cdta <- copy(dta)

nrow(cdta)

ndta <- rbind(
cdta[,`:=`(count=.N),.(name,pdate,idxwodiv,idxwdiv)][count>1][(order(name,pdate))][,.SD[.N],.(name,pdate,idxwodiv,idxwdiv)][,c(1:4)],
cdta[,`:=`(count=.N),.(name,pdate,idxwodiv,idxwdiv)][count==1][(order(name,pdate))][,.SD[.N],.(name,pdate,idxwodiv,idxwdiv)][,c(1:4)])
    
nrow(ndta)

frequenciesdyn("ndta","name,pdate,idxwodiv,idxwdiv")[count>1] %>% nrow
 
41
 
166356
 
166315
 
0
 

Alas, there remain problem duplicate records across name, idxwodiv, and idxwdiv, generally surrounding holidays. Eliminate these duplicates, keeping the final one for each repeated group. The cell frequencies invocation at the end confirms that the dups are indeed deleted.

In [76]:
slug <- frequenciesdyn("ndta","name,idxwodiv,idxwdiv")[count>1] 

nndta <- copy(ndta)

nrow(nndta)

nndta <- rbind(
ndta[,`:=`(count=.N),.(name,idxwodiv,idxwdiv)][count>1][(order(name))][,.SD[.N],.(name,idxwodiv,idxwdiv)][,c(1:4)],
ndta[,`:=`(count=.N),.(name,idxwodiv,idxwdiv)][count==1][(order(name))][,.SD[.N],.(name,idxwodiv,idxwdiv)][,c(1:4)])
    
nrow(nndta)

frequenciesdyn("nndta","name,idxwodiv,idxwdiv")[count>1]
 
166315
 
161559
 
nameidxwodividxwdivcountpercent
 

Now "melt" the nndta data.table around idxwodiv/idxwdiv to the new ndtamelt, which is longer and "skinnier" than nndta. Compute daily percent change for each portfolio.

In [77]:
ndtamelt <- melt(nndta,id.vars=c("name","pdate"), variable.name="type")[,pctch := quantmod::Delt(value),.(name,type)][!is.na(pctch)] %>% arrange(name,pdate)
str(ndtamelt)
 
Classes 'data.table' and 'data.frame':	323036 obs. of  5 variables:
 $ name : chr  "1000" "1000" "1000" "1000" ...
 $ pdate: Date, format: "2005-01-03" "2005-01-03" ...
 $ type : Factor w/ 2 levels "idxwodiv","idxwdiv": 1 2 1 2 1 2 1 2 1 2 ...
 $ value: num  1240 2772 1225 2738 1219 ...
 $ pctch: num  -0.518 -0.6251 -0.0125 -0.0123 -0.0045 ...
 - attr(*, ".internal.selfref")=<externalptr> 
 

Write out data sets for the final data.tables.

In [78]:
fname <- "russell.csv"
fwrite(nndta,fname)

fname <- "russellmelt.csv"
fwrite(ndtamelt,fname)

fname <- "russellmelt.feather"
write_feather(ndtamelt,fname)
 

Now define a function to subset the "tall" data.table for a given start date. Include the major Russell U.S. size portfolios Top200, 1000, 3000, Midcap, 2500, and 2000. For each size, include "Balanced", "Growth", and "Value" potfolios. Calculate the growth of $1 over the timeframe from "sdte".

In [79]:
mkdata <- function(dtamelt, sdte="2005-01-01")
{
    portlist <-  c("2000","2000V", "2000G", "3000", "3000V", "3000G", 
               "2500","2500V","2500G", "Midcap","MidcapV", "MidcapG",
               "Top200","Top200V","Top200G", "1000","1000V", "1000G")

    maxdte <- max(dtamelt[type=="idxwdiv" & is.element(name,portlist)]$pdate)
    startdte <- ymd(sdte)
    if(startdte>maxdte)
    {
        print(c(startdte,maxdte))
        return(NULL)
    }
    grphdf <- copy(dtamelt)[type=="idxwdiv" & 
                pdate >= startdte & is.element(name,portlist)] %>% arrange(name, pdate)
    grphdf[,`:=`(grdollar=cumprod(1+pctch)),name]
    
    g <- 1
    m <- min(grphdf$pdate)-1
    u <- unique(grphdf$name)
    lu <- length(u)

    ngrphdf <- rbind(copy(grphdf)[,c("name","pdate","grdollar")],
                     data.frame(name=u,pdate=rep(m,lu),grdollar=rep(g,lu))) 
    
    return(ngrphdf %>% arrange(name,pdate))
}
 

Graph the data computed in mkdata using ggplot2. The facets are ordered by portfolio company size from left to right and top to bottom -- with Top200 representing the 200 largest companies in the Russell 3000, and the 2000 denoting the smallest 2000 firms in the 3000.

In [80]:
mkgraph <- function(dtamelt, sdte="2005-01-01")
{
   
    ngrphdf <- mkdata(dtamelt,sdte)
    
    sizelst <- c("Top200", "1000", "3000", "Midcap", "2500","2000")

    invisible(map(sizelst,function(nm) ngrphdf[grep(nm,name),size:=nm]))    
    ngrphdf$size <- factor(ngrphdf$size,levels=sizelst)   
        
    ngrphdf[,typeport:="Balanced"][grep("G",name),typeport:="Growth"][grep("V",name),typeport:="Value"]

    g1 <- ggplot(ngrphdf, 
    aes(y=grdollar, x=pdate, color=typeport)) +
    geom_line(size=rel(0.8)) +
    theme(plot.title = element_text(size = 10,colour="black")) +  
    theme(axis.text=element_text(size=rel(.8)),axis.title=element_text(size=rel(.8))) +  
    theme(legend.text = element_text(size=rel(0.7)), legend.title=element_text(size=rel(.7)), 
          legend.position="bottom", legend.box="horizontal") + 
                scale_fill_manual(values=c("blue","red","green")) +
    facet_wrap(~ size, ncol=3) + 
    geom_hline(yintercept=1,col="black", size=.5) +
    theme(panel.grid.major = element_line(colour = "white")) +
    theme(strip.text.x = element_text(size = 8)) + 
    theme(axis.text.x = element_text(angle = 45, size=6), axis.title.x = element_text(size=7)) +
    ggtitle(paste("Russell Portfolios -- ", min(ngrphdf$pdate)+1," to ", 
                      max(ngrphdf$pdate), "\n\n",sep="")) +
    theme(legend.title = element_blank()) + 
    ylab("\nGrowth of $1") +
    xlab("Date\n\n")
        
    return(g1)
}
 

Graph the growth of $1 in 2017 through June 5 for eighteen Russell portfolios. Note how "larger" company portfolios from top left to bottom right are better-performing so far this year, as are growth portfolios compared to balanced and value. The Top200 growth index is up over 30\%, while 2000 value is in the red for the year.

In [81]:
g1 <- mkgraph(ndtamelt,sdte="2017-01-01")
print(g1)
 
 

Small and value reigned in 2016, however, demonstrating a year-to-year change in styles. An individual who invested \$1000 in a Russell 2000 value index portfolio at the beginning of 2016 would be smiling at the over \$1600 she'd be counting now.

In [82]:
g2 <- mkgraph(ndtamelt,sdte="2016-01-01")
print(g2)
 
 

Finally, generate growth of $1 return graphs for all downloaded portfolios to a pdf file.

In [83]:
pdf("russellall.pdf",width=9)

dogrph <- function(nm)
{
    grphdf <- ndtamelt[name==nm & type=="idxwdiv",.(pdate=pdate,pctch=pctch)][order(pdate)][,`:=`(grdollar=cumprod(1+pctch))]
    g1 <- ggplot(grphdf, 
        aes(y=grdollar, x=pdate)) +
        geom_line(size=rel(0.8)) +
        labs(x="Date",y="Growth of $1",title=paste(nm, " -- ", min(grphdf$pdate)," to ", max(grphdf$pdate), "\n\n",sep=""))
    print(g1)
    return(NULL)
}
invisible(map(ndtamelt[type=="idxwdiv", .N, name]$name,dogrph))

dev.off()
 
png: 2
 

That's it till next month!

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Contact us today to find out how Inquidia can show you how to collect, integrate and enrich your data. We do data. You can, too.

Would you like to know more?

Sign up for our fascinating (albeit infrequent) emails. Get the latest news, tips, tricks and other cool info from Inquidia.