This report explores stock prices and returns for companies listed on the NASDAQ stock exchange. It explores correlations between multiple stock statistics across different sectors over the past 50 working days, as of April 30, 2017.
I am trying to identify the general trends in price and return to get a better idea of what individual daily traders should focus on to generate higher returns at the highest possible capital utilization.
In other words: what are the cheapest stocks that generate the highest returns on a daily basis?
The dataset was obtained from Yahoo Finance using the tidyquant R package, which returns the data as a tidy “tibble” based on the arguments given to the tq_exchange() and tq_get() functions.
Here is a glimpse of the dataset:
## Observations: 1,839
## Variables: 58
## $ symbol <chr> "AAAP", "AAL", "AAM...
## $ sector <chr> "Health Care", "Tra...
## $ Ask <dbl> 38.21, 43.12, 3.80,...
## $ Ask.Size <dbl> 100, 300, 200, 100,...
## $ Average.Daily.Volume <dbl> 119633, 6971160, 27...
## $ Bid <dbl> 38.16, 43.11, 3.70,...
## $ Bid.Size <dbl> 100, 500, 1100, 300...
## $ Book.Value <dbl> 7.300, 6.970, 5.160...
## $ Change <dbl> 0.2000, 1.0400, -0....
## $ Change.From.200.day.Moving.Average <dbl> 3.8200, -1.5700, -0...
## $ Change.From.50.day.Moving.Average <dbl> 0.0500, 0.2900, -0....
## $ Change.From.52.week.High <dbl> -2.7200, -7.5200, -...
## $ Change.From.52.week.Low <dbl> 14.7000, 18.2700, 0...
## $ Change.in.Percent <dbl> 0.005300, 0.024700,...
## $ Currency <chr> "USD", "USD", "USD"...
## $ Days.High <dbl> 38.390, 43.130, 3.7...
## $ Days.Low <dbl> 37.980, 42.050, 3.6...
## $ Days.Range <chr> "37.98 - 38.39", "4...
## $ Dividend.Pay.Date <chr> NA, "5/30/17", "4/2...
## $ Dividend.per.Share <dbl> NA, 0.40, 0.02, NA,...
## $ Dividend.Yield <dbl> NA, 0.94, 0.51, NA,...
## $ EBITDA <dbl> -7.89e+06, 7.16e+09...
## $ EPS <dbl> -0.670, 4.170, 0.11...
## $ EPS.Estimate.Current.Year <dbl> -0.70, 4.64, NaN, 3...
## $ EPS.Estimate.Next.Quarter <dbl> -0.16, 1.56, 0.00, ...
## $ EPS.Estimate.Next.Year <dbl> 0.31, 5.34, NA, 4.1...
## $ Ex.Dividend.Date <chr> NA, "2/9/17", "4/11...
## $ Float.Shares <dbl> 3.25e+07, 4.46e+08,...
## $ High.52.week <dbl> 40.920, 50.640, 4.6...
## $ Last.Trade.Date <chr> "5/2/17", "5/2/17",...
## $ Last.Trade.Price.Only <dbl> 38.2000, 43.1200, 3...
## $ Last.Trade.Size <dbl> 100, 100, 100, 100,...
## $ Last.Trade.With.Time <chr> "12:08pm - <b>38.20...
## $ Low.52.week <dbl> 23.50, 24.85, 3.06,...
## $ Market.Capitalization <dbl> 1.680e+09, 2.140e+1...
## $ Moving.Average.200.day <dbl> 34.3800, 44.6900, 3...
## $ Moving.Average.50.day <dbl> 38.1500, 42.8300, 3...
## $ Name <chr> "Advanced Accelerat...
## $ Open <dbl> 37.990, 42.060, 3.6...
## $ PE.Ratio <dbl> NA, 10.340, 33.940,...
## $ PEG.Ratio <dbl> 0.00, 3.25, 0.00, 0...
## $ Percent.Change.From.200.day.Moving.Average <dbl> 0.111200, -0.035200...
## $ Percent.Change.From.50.day.Moving.Average <dbl> 0.001200, 0.006800,...
## $ Percent.Change.From.52.week.High <dbl> -0.066500, -0.14850...
## $ Percent.Change.From.52.week.Low <dbl> 0.625500, 0.735200,...
## $ Previous.Close <dbl> 38.00, 42.08, 3.85,...
## $ Price.to.Book <dbl> 5.210, 6.030, 0.750...
## $ Price.to.EPS.Estimate.Current.Year <dbl> NA, 9.290, NA, 12.2...
## $ Price.to.EPS.Estimate.Next.Year <dbl> 123.230, 8.070, NA,...
## $ Price.to.Sales <dbl> 14.2700, 0.5200, 0....
## $ Range.52.week <chr> "23.50 - 40.92", "2...
## $ Revenue <dbl> 1.1700e+08, 4.0400e...
## $ Shares.Outstanding <dbl> 4.40e+07, 4.96e+08,...
## $ Short.Ratio <dbl> 3.99, 4.49, 1.23, 1...
## $ Stock.Exchange <chr> "NMS", "NMS", "NGM"...
## $ Target.Price.1.yr. <dbl> 44.20, 53.13, NA, 6...
## $ Volume <dbl> 55794, 4057054, 469...
## $ Return.Avg.50.day <dbl> 0.001238080, -0.001...
From the above list, we can see that “sector” was loaded as a chr datatype. For the purpose of our exploratory analysis, we need to convert it to a factor.
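Here is a minimal sketch of that conversion, using a small made-up data frame rather than the real dataset (the stocks object and its values are assumptions for illustration only):

```r
# Toy data standing in for the real tibble (values are made up)
stocks <- data.frame(
  symbol = c("AAA", "BBB", "CCC", "DDD"),
  sector = c("Technology", "Finance", "Technology", NA),
  stringsAsFactors = FALSE
)

class(stocks$sector)                     # character before the conversion
stocks$sector <- factor(stocks$sector)   # convert chr to factor
levels(stocks$sector)                    # distinct sectors, sorted alphabetically
table(stocks$sector, useNA = "ifany")    # counts per sector, including missing ones
```

Once the column is a factor, summary() reports counts per level instead of just "character", which is what the sector summaries later in this report rely on.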
The dataset has 58 stock stats (variables) for 1,839 companies (observations) listed on NASDAQ from different sectors.
For the purpose of our analysis, we will be focusing on the following variables:
We will also create two new categorical variables:
##
## |Market < Book| |Market > Book| |Market ~ Book|
## 194 1414 82
##
## No Dividends Pays Dividends
## 721 1118
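The exact rule used to build Value.Status is not shown in this report. The sketch below assumes it is derived from the Price.to.Book ratio, with ratios within 0.1 of 1 counted as "Market ~ Book"; the input values and the cut-offs are illustrative assumptions.

```r
# Toy price-to-book ratios (made-up values, one of them missing)
price_to_book <- c(5.21, 6.03, 0.75, 1.05, 0.95, NA)

# Assumed rule: (-Inf, 0.9] is below book, (0.9, 1.1] is about equal,
# and (1.1, Inf) is above book
value_status <- cut(
  price_to_book,
  breaks = c(-Inf, 0.9, 1.1, Inf),
  labels = c("|Market < Book|", "|Market ~ Book|", "|Market > Book|")
)

# Missing ratios stay NA, which is how NA's can end up in the summary
table(value_status, useNA = "ifany")
```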
Now, here is a glimpse of our final dataset after adding the new variables and renaming some of the columns:
## Observations: 1,839
## Variables: 10
## $ Symbol <chr> "AAAP", "AAL", "AAME", "AAOI", "AAON", "AAPL"...
## $ Name <chr> "Advanced Accelerator Applicatio", "American ...
## $ Sector <fctr> Health Care, Transportation, Finance, Techno...
## $ Price <dbl> 38.1500, 42.8300, 3.8400, 49.1400, 35.5800, 1...
## $ Return <dbl> 0.001238080, -0.001758505, 0.000932000, 0.007...
## $ Volume <dbl> 119633, 6971160, 2718, 2033430, 161241, 23464...
## $ Market.Cap <dbl> 1.680e+09, 2.140e+10, 7.560e+07, 8.910e+08, 1...
## $ Value.Status <fctr> |Market > Book|, |Market > Book|, |Market < ...
## $ Dividends.Status <fctr> Pays Dividends, No Dividends, No Dividends, ...
## $ PEG.Ratio <dbl> 0.00, 3.25, 0.00, 0.72, 2.83, 1.77, 1.28, 2.4...
And here is a summary of all the variables we have now:
## Symbol Name Sector
## Length:1839 Length:1839 Technology :403
## Class :character Class :character Health Care :399
## Mode :character Mode :character Consumer Services:303
## Capital Goods :152
## Finance :100
## (Other) :464
## NA's : 18
## Price Return Volume
## Min. : 0.126 Min. :-0.0418032 Min. : 11
## 1st Qu.: 5.220 1st Qu.:-0.0015492 1st Qu.: 55789
## Median : 15.390 Median : 0.0004590 Median : 205892
## Mean : 32.185 Mean : 0.0002936 Mean : 828559
## 3rd Qu.: 36.805 3rd Qu.: 0.0023159 3rd Qu.: 643716
## Max. :1776.430 Max. : 0.0525529 Max. :67873800
##
## Market.Cap Value.Status Dividends.Status
## Min. :1.260e+06 |Market < Book|: 194 No Dividends : 721
## 1st Qu.:1.120e+08 |Market > Book|:1414 Pays Dividends:1118
## Median :5.131e+08 |Market ~ Book|: 82
## Mean :6.112e+09 NA's : 149
## 3rd Qu.:1.925e+09
## Max. :7.750e+11
##
## PEG.Ratio
## Min. :-2113.330
## 1st Qu.: 0.000
## Median : 0.000
## Mean : 6.952
## 3rd Qu.: 1.680
## Max. : 4884.770
##
Now, let’s start exploring our categorical variables first:
Here is a summary of sectors in this dataset and the number of companies in each sector:
##
## Basic Industries Capital Goods Consumer Durables
## 67 152 71
## Consumer Non-Durables Consumer Services Energy
## 91 303 54
## Finance Health Care Miscellaneous
## 100 399 76
## Public Utilities Technology Transportation
## 59 403 46
NASDAQ is generally known as a tech index or stock market. The plots above show what kind of tech is dominating the NASDAQ. We can see that General Technology, Health Care Technology, and Consumer Services Technology dominate the market.
Here is a summary of the number of companies in this dataset based on whether they pay dividends or not:
##
## No Dividends Pays Dividends
## 721 1118
From the plots above, we can see that a majority of the companies on NASDAQ, about 61% (1,118 of 1,839), pay dividends to their shareholders. It would be interesting to see later on in this project how this might affect trading behavior as reflected in the volume variable.
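As a quick check of that share, the percentages can be computed directly from the counts in the table. This is a small sketch in base R; the dividend_counts name is just for illustration.

```r
# Counts copied from the table above
dividend_counts <- c("No Dividends" = 721, "Pays Dividends" = 1118)

# Each count divided by the total number of companies
shares <- prop.table(dividend_counts)
round(100 * shares, 1)
```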
Here is a summary of the number of companies in this dataset based on whether their market value is above, below, or within a 0.1 range of their book value. (This reflects traders’ perception of the stock’s future value compared to its actual accounting value.)
##
## |Market < Book| |Market > Book| |Market ~ Book|
## 194 1414 82
From the above plots, we can see that the vast majority of the companies have a market value that is higher than their book value. We will see how this relates to returns and volume later on in this project.
Now, let’s start exploring our numerical variables:
This is basically the average price of the stock in the past 50 days.
## Summary of price variable:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.126 5.220 15.390 32.185 36.805 1776.430
## The highest 10 prices:
## [1] 1776.430 884.560 860.260 841.780 775.199 479.710 396.570
## [8] 378.550 369.977 339.040
In the first plot, we can’t really see much about prices. That’s why I zoomed in by cutting off the data points above $100. Still, the distribution is positively skewed and we can’t see where most of the prices are. For that reason, I transformed Price (the x-axis) to a log10 scale in the third plot, which gives a clearer view of the distribution.
From the above table and plots, we can see that 75% of the stocks on NASDAQ are priced below $40 (the third quartile is $36.81), with only a few outliers, as seen in the box plot. This might get interesting when we plot price against volume to see whether price has any effect on how heavily expensive vs. cheap stocks are traded.
We will also plot this against returns to see if expensive stocks generate more returns or not.
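To illustrate why the log10 scale helps with a positively skewed variable like Price, here is a small sketch on simulated data. The prices are drawn from a log-normal distribution purely as an assumption for illustration; they are not the real NASDAQ prices, and the plots in this report were drawn differently.

```r
set.seed(42)  # reproducible toy data

# Simulated right-skewed prices (log-normal); not the real dataset
price <- rlnorm(1000, meanlog = log(15), sdlog = 1.2)

summary(price)          # mean well above the median: long right tail
summary(log10(price))   # mean and median nearly equal after the transform

# Histogram bin counts on the log10 scale, computed without drawing a plot
log_hist <- hist(log10(price), breaks = 20, plot = FALSE)
log_hist$counts
```

On the raw scale a few very large values stretch the axis, so most observations are squeezed into the first few bins. On the log10 scale the same observations spread out into a roughly symmetric shape.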
Now, let’s break the Price plot by Sectors.
First let’s see the summary of price by Sector:
## NASDAQ_Stats$Sector: Transportation
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.8551 8.2600 20.5850 31.5223 47.0213 157.3940
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Energy
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.126 2.353 8.883 17.464 19.131 116.390
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Public Utilities
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2931 5.9157 14.7770 20.0772 27.6691 104.9100
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Basic Industries
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4263 2.8945 11.6600 24.6578 32.8530 116.6800
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Consumer Durables
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.84 5.59 10.27 28.82 35.16 339.04
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Miscellaneous
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.8163 3.8100 12.2350 53.0215 37.8675 1776.4300
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Consumer Non-Durables
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.532 5.053 14.310 29.101 37.365 200.980
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Finance
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.39 13.18 23.24 44.11 55.13 306.66
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Capital Goods
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.690 5.782 19.300 29.588 35.998 287.529
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Consumer Services
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.336 9.045 20.900 38.317 40.498 884.560
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Health Care
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.398 3.425 10.170 27.247 28.205 775.199
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Technology
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.226 6.105 16.360 33.581 40.270 860.260
And now, let’s plot this:
Well, it’s not clear in this faceted plot, but from the table above we can see that prices sit at the higher end for four sectors: Finance, Consumer Services, Transportation, and Capital Goods. They have the highest medians, so now we know which sectors have the most expensive stocks.
To see the price trend more clearly, let’s plot stacked percentage bars:
We can now see the trend: stocks of companies that don’t pay dividends tend to occur more at the higher end of Price (the far right).
What if we facet prices by Dividends.Status?
Summary of Price by Dividends.Status:
## NASDAQ_Stats$Dividends.Status: No Dividends
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.126 10.280 23.640 39.207 49.670 479.710
## --------------------------------------------------------
## NASDAQ_Stats$Dividends.Status: Pays Dividends
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.226 3.889 10.826 27.656 28.700 1776.430
So here we see another clear trend. Stocks of companies that don’t pay dividends are priced higher than those of companies that pay dividends.
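The per-group tables in this section have the layout that base R's by() produces when it is combined with summary(). Here is a minimal sketch on made-up data; the stocks data frame is hypothetical and only mimics the Price and Sector columns.

```r
# Toy data (made-up values) with one numeric and one grouping column
stocks <- data.frame(
  Price  = c(10, 20, 30, 5, 15, 100),
  Sector = factor(c("Finance", "Finance", "Finance",
                    "Energy", "Energy", "Energy"))
)

# Min, quartiles, median, mean, and max for each group
by(stocks$Price, stocks$Sector, summary)

# Just the medians, which are less sensitive to extreme prices than the means
median_price <- tapply(stocks$Price, stocks$Sector, median)
median_price
```

Comparing medians rather than means is the safer choice here, because a single very expensive stock (like the 100 in the Energy group) can pull a group's mean far above its typical price.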
This is the average normalized daily return of the stocks in the past 50 days.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0418032 -0.0015492 0.0004590 0.0002936 0.0023159 0.0525529
From the above table and plot, we can see that stock returns are approximately normally distributed around zero (positive returns = gains, negative returns = losses), which I think is not very promising! However, the median is a little higher than zero (0.0004590), which keeps some hope that overall returns are positive.
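For readers unfamiliar with the Return variable, here is a sketch of how an average daily return can be computed from a series of closing prices. It assumes simple (arithmetic) daily returns; the exact normalization used for Return.Avg.50.day is not shown in this report, and the prices below are made up.

```r
# Hypothetical closing prices for one stock on five consecutive days
close <- c(100, 102, 101, 103, 104)

# Simple daily return: (today - yesterday) / yesterday
daily_return <- diff(close) / head(close, -1)
daily_return

# Average daily return over the window
avg_return <- mean(daily_return)
avg_return
```

With a 50-day window the same calculation would simply run over 51 closing prices instead of five.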
What if we break the Returns plot by Sector? Let’s see the summary of returns by Sector:
## NASDAQ_Stats$Sector: Transportation
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0418032 -0.0032070 -0.0004335 -0.0029956 0.0015379 0.0044112
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Energy
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0130787 -0.0048900 -0.0016554 -0.0020009 -0.0001512 0.0184816
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Public Utilities
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.564e-02 -1.892e-03 3.020e-04 3.126e-05 2.150e-03 1.615e-02
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Basic Industries
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0264309 -0.0019162 -0.0003110 -0.0007337 0.0011251 0.0147655
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Consumer Durables
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.479e-02 -1.630e-03 1.650e-04 2.312e-05 1.931e-03 7.544e-03
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Miscellaneous
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0140327 -0.0018777 0.0003315 0.0007331 0.0025855 0.0237884
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Consumer Non-Durables
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0116463 -0.0019872 0.0001760 0.0000399 0.0018882 0.0094820
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Finance
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0332528 -0.0008052 0.0001535 -0.0003367 0.0014954 0.0152871
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Capital Goods
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0146830 -0.0014667 0.0005600 0.0007901 0.0024704 0.0525529
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Consumer Services
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0229238 -0.0007010 0.0007520 0.0007373 0.0026066 0.0150718
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Health Care
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0323631 -0.0020467 0.0005370 0.0003931 0.0033219 0.0353511
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Technology
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0189735 -0.0008825 0.0006450 0.0007000 0.0022929 0.0295837
From this plot, we can see that some sectors have more negative returns than positive ones, Transportation for example, while others have more positive returns than negative ones, like Public Utilities. We can also notice that the Technology sector has many outliers in both directions.
Let’s do the same as we did with Price and facet by Dividends.Status:
## NASDAQ_Stats$Dividends.Status: No Dividends
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0418032 -0.0009560 0.0004740 0.0002858 0.0018695 0.0189747
## --------------------------------------------------------
## NASDAQ_Stats$Dividends.Status: Pays Dividends
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.0332528 -0.0019175 0.0003820 0.0002986 0.0027517 0.0525529
This shows yet another trend for stocks of companies that don’t pay dividends: they generate slightly higher returns (median 0.000474 vs. 0.000382), although the means of the two groups are almost identical.
This is the daily average number of shares traded per company.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11 55789 205892 828559 643716 67873800
In the first of the Volume plots, we face the same issue we faced when we plotted Price: we can’t really see much because the distribution is positively skewed. For that reason, I transformed Volume (the x-axis) to a log10 scale in the second plot, which gives a clearer view of the distribution.
From the table and the plot above, we see that 75% of the companies have fewer than about two-thirds of a million shares (643,716) traded per day.
Let’s facet that by sector as we did with Price and Return, this time to see which sectors are highly traded or, in other words, highly competitive for daily traders.
Here is the summary of Volume by Sector:
## NASDAQ_Stats$Sector: Transportation
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1157 91879 286721 1514281 798688 28557900
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Energy
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 231 61382 150009 747392 610435 12489100
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Public Utilities
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1892 51186 174013 1408994 424995 44338300
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Basic Industries
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2608 42792 81923 387017 273841 3165230
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Consumer Durables
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 198 29573 84287 249030 218250 3005800
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Miscellaneous
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3259 61206 191723 607815 452434 8710940
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Consumer Non-Durables
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 252 25092 118910 524710 394170 7914830
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Finance
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 287 19144 83046 272162 273978 3154830
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Capital Goods
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1782 45449 139471 332525 272859 5711080
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Consumer Services
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11 51413 232230 841571 701576 26624800
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Health Care
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1500 113726 300444 693319 878052 9696450
## --------------------------------------------------------
## NASDAQ_Stats$Sector: Technology
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 21 73864 319874 1432754 842934 67873800
We can see from the table and plots above that, for example, Consumer Services, Health Care, and Technology have higher volumes than most other sectors. Is this only because there are more companies in those sectors, or because of other factors? We will investigate this further later on in this project.
Now let’s finish this section by faceting by Dividends.Status:
## NASDAQ_Stats$Dividends.Status: No Dividends
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 21 37310 141384 1088399 600323 67873800
## --------------------------------------------------------
## NASDAQ_Stats$Dividends.Status: Pays Dividends
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 11 75882 242167 660987 697288 16022700
We can see from the table above that people trade stocks of companies that pay dividends more on a daily basis (higher median volume). This makes sense, as we saw earlier that stocks of companies that don’t pay dividends are generally more expensive and thus less affordable for individual daily traders.
This is the price-to-earnings (P/E) ratio divided by the earnings-per-share growth rate. It gives a good indication of how the company is priced compared to its earnings growth. Let’s plot our PEG.Ratio variable and see if there are any trends there:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2113.330 0.000 0.000 6.952 1.680 4884.770
In the first plot of PEG Ratio, we were not able to see anything. I needed to zoom in by adding limits to the x-axis to see the distribution. In addition, in the second plot, I thought it would add more insight to color the values by whether they are above zero (which signals a good growth rate) or below zero (which signals that earnings are declining).
It’s interesting to see how big of a range there is for this variable while the median is zero!
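As a worked example of the PEG Ratio, here is a small sketch using the common definition: P/E ratio divided by the expected annual earnings-per-share growth rate expressed in percent. All input values are hypothetical.

```r
# Hypothetical inputs for a single company
price          <- 50    # share price
eps            <- 2.5   # earnings per share over the last year
eps_growth_pct <- 10    # expected annual EPS growth, in percent

pe_ratio  <- price / eps                # 20
peg_ratio <- pe_ratio / eps_growth_pct  # 2
peg_ratio

# With shrinking earnings the growth rate is negative, and so is the PEG ratio
pe_ratio / -5
```

Because the growth rate sits in the denominator, growth rates close to zero produce very large positive or negative values, which is one way a variable like this can end up with such a wide range.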
Let’s break it down by Dividends.Status and by Value.Status:
## NASDAQ_Stats$Dividends.Status: No Dividends
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -50.03 0.00 0.96 6.15 1.97 2628.03
## --------------------------------------------------------
## NASDAQ_Stats$Dividends.Status: Pays Dividends
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2113.330 0.000 0.000 7.469 1.450 4884.770
From the table and plot above, it seems that stocks with no dividends have higher median PEG Ratios (0.96 vs. 0.00). This is in line with the theory that such companies grow faster by reinvesting their earnings into the business.
## NASDAQ_Stats$Value.Status: |Market ~ Book|
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -15.730 0.000 0.000 3.084 1.272 81.450
## --------------------------------------------------------
## NASDAQ_Stats$Value.Status: |Market < Book|
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -8.0400 0.0000 0.0000 0.8774 0.0000 70.8500
## --------------------------------------------------------
## NASDAQ_Stats$Value.Status: |Market > Book|
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2113.330 0.000 0.750 8.709 1.930 4884.770
Here we see that companies with market values that are more than their book values tend to have higher median PEG Ratios.
The dataset has 58 stock stats (variables) for 1,839 companies (observations) listed on NASDAQ from different sectors.
For the purpose of our analysis, we will be focusing on two main variables:
To support the exploration, I have used the following features as they either drive or give a signal on price and return:
I also created three more variables from other features in the original dataset to give more classifications of the stocks:
From the initial exploration, we noticed that stocks of companies that don’t pay dividends are priced higher and generate more returns, yet are traded less on a daily basis.
Daily traders are most probably individual investors, as opposed to long-term institutional investors, so they might not be able to afford expensive stocks and might perceive them as riskier investments. However, expensive stocks tend to generate more positive returns even on a daily basis.
Learning more about the dataset gives us an explanation of why stocks with no dividends are priced higher in the market. The general idea is that when a company doesn’t pay dividends, it reinvests its earnings into the business to generate more growth. This is shown clearly in the PEG.Ratio plots above (these companies tend to have higher median PEG Ratios).
Also, companies with market values above their book values (a high Market-to-Book Ratio) tend to have higher median PEG Ratios. This suggests that the market is good at valuing companies based on their earnings growth rates. If one can identify a company that has a high PEG Ratio but a low Market-to-Book Ratio, it would be a catch, as it is under-priced.
Let’s start by plotting all the numerical variables in pairs to see if there is any clear correlation between any pair:
Judging by the correlation coefficients and the scatter plots in the graph above, there seem to be no clear correlations between our numerical variables.
Let’s experiment with another way to visualize correlations:
ggcorr(NASDAQ_Stats, label = TRUE)
## Warning in ggcorr(NASDAQ_Stats, label = TRUE): data in column(s) 'Symbol',
## 'Name', 'Sector', 'Value.Status', 'Dividends.Status' are not numeric and
## were ignored
The above visualization provides a cleaner way to read the correlation coefficients, and we can see there are no meaningful correlations.
Let’s see if that holds true when we transform the scales to log10. I will start with Price and Volume:
## Correlation between Volume and Price:
## [1] 0.03938329
## Correlation between log10(Volume) and log10(Price):
## [1] 0.1437654
From the table above, we can see that the correlation is slightly higher after transforming the data to a log10 scale, but not enough to form an opinion about the relationship between Price and Volume.
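The comparison above can be sketched as follows. The data here are simulated (two independent log-normal variables), so the numbers will not match the ones reported; the point is only to show the two calls side by side.

```r
set.seed(1)  # reproducible toy data

# Simulated skewed variables (log-normal), unrelated to the real dataset
price  <- rlnorm(500, meanlog = 3,  sdlog = 1)
volume <- rlnorm(500, meanlog = 12, sdlog = 1.5)

cor_raw <- cor(price, volume)                 # Pearson correlation on the raw scale
cor_log <- cor(log10(price), log10(volume))   # Pearson correlation on the log10 scale
c(raw = cor_raw, log10 = cor_log)
```

Pearson correlation on the raw scale is dominated by a handful of extreme values when both variables are heavily skewed, so checking the log10 scale as well is a reasonable sanity check before concluding that there is no relationship.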
I still wanted to see how this would look if I plotted it in two different ways, as seen in the two plots above for Price/Volume:
- In the first plot, I scaled both axes to log10 and zoomed in on the densest chunk of the data points to see how they look.
- In the second plot, I reversed the axes, and here I was able to see the slightly higher correlation calculated above, shown by the white regression line on top of the data points.
What about Return and Volume Change? Maybe high/low returns could be correlated with trading volumes.
## Correlation between Return and Volume Change:
## [1] -0.03895478
## Correlation between log10(Return) and log10(Volume):
## [1] 0.02218874
By experimenting with the scales in the hope of seeing some correlation, we still get nothing. As we can see in the above figure, the regression line is almost parallel to the x-axis.
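One caveat worth keeping in mind when log-transforming returns: log10() is undefined for negative numbers, so losing stocks drop out of the calculation. How this was handled for the figures above is not shown; the sketch below, on made-up values, only illustrates the behavior in base R.

```r
# Made-up values; note that two of the returns are negative
ret    <- c(0.002, -0.001, 0.004, -0.003, 0.001)
volume <- c(1e5, 5e4, 2e6, 3e5, 8e5)

# log10() returns NaN (with a warning) for negative inputs
log_ret <- suppressWarnings(log10(ret))
log_ret

# Only pairs where both values are defined enter the correlation
cor(log_ret, log10(volume), use = "complete.obs")
```

In this toy example the correlation is computed from just three of the five stocks, so a log-scale coefficient for returns describes the gaining stocks only.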
But let’s now try Return and Price, then Return and Market.Cap:
From the above experimental plot, we don’t seem to have any clear correlations between Return and Price.
From the above experimental plot, we don’t seem to have any clear correlations between Return and Market Cap.
As a matter of fact, we are still not able to find correlations between any of our numerical variables, even after the data transformations.
So let’s dig deeper into our categorical variables to find some insights about our dataset.
In the following group of plots, I am just visualizing the summaries by Sector for Price, Return, Volume, and PEG.Ratio.
In the above plot, I created a visualization of the Price summary for each Sector. This helps identify the Sectors with the highest and lowest prices, which in turn helps daily traders achieve better capital efficiency when selecting stocks to buy.
In the above plot, I created a visualization of the Return summary for each Sector. This helps identify the Sectors with the highest and lowest returns, which in turn helps daily traders achieve better capital efficiency when selecting stocks to buy, especially when combined with the previous plot of the Price summary.
For example, we can see that the Finance sector’s stocks are the most expensive and, at the same time, among the least lucrative.
In the above plot, I created a visualization of the Volume summary for each Sector. This helps identify the Sectors with the highest and lowest volumes, so that daily traders can benchmark their trading behavior against the rest of the market.
If we follow our Finance sector example, we can see that its stocks are among the least traded, which makes sense as they are expensive and generate low returns.
For example, Transportation’s stocks have the highest growth (PEG) ratios, they are not as expensive as Finance or Consumer Services, and they are traded less than Health Care and Technology, mainly because they historically had negative returns. But with such high PEG Ratios, it seems Transportation’s stocks would be a good bet, as their earnings are growing fast.
Now let’s plot the density per Sector for the same variables to see if that adds more insights: Price, Return, Volume, and PEG.Ratio.
These density plots confirm that stocks with no dividends are priced higher, more of them generate positive returns, they are traded less, and more of them have higher PEG Ratios.
There is another way to see this more clearly. Let’s try a violin plot:
Here, we can see the trends more clearly. Stocks with no dividends tend to occur more at higher prices, more of them generate positive returns, they are traded less, and more of them have higher PEG Ratios.
Unfortunately, I couldn’t find any strong correlations between the numeric features of this dataset, even with multiple experiments in data scale transformation.
I was able to create visualizations for summaries of all the main features, broken down by Sector and ordered from lowest to highest. This helped me conclude the following:
- Finance is the most expensive sector, yet it generated the lowest positive returns, and it has a relatively low PEG Ratio. This explains why it’s the second least traded sector.
- The highest returns were generated by 3 sectors: Consumer Services, Technology, and Capital Goods (almost the same). Among them, Technology has the cheapest stocks, with the third highest PEG Ratio. This explains why it’s the top traded sector.
- Surprisingly, Transportation has the highest PEG Ratio while being cheaper than Finance and Consumer Services. It’s also third in volume, even though its median return over the period was negative.
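The kind of per-sector ranking behind these conclusions can be sketched in base R as follows. The stocks data frame and its values are made up, and using the median as the summary statistic is an assumption; the plots in this report may have been built differently.

```r
# Made-up data: three sectors, three stocks each
stocks <- data.frame(
  Sector = rep(c("Finance", "Technology", "Energy"), each = 3),
  Price  = c(20, 25, 60, 10, 16, 40, 2, 9, 19),
  Return = c(0.0001, 0.0002, -0.001, 0.0006, 0.0007, 0.002,
             -0.002, -0.001, 0.0001)
)

# Median of each variable per sector
sector_medians <- aggregate(cbind(Price, Return) ~ Sector,
                            data = stocks, FUN = median)

# Order sectors from lowest to highest median price
sector_medians[order(sector_medians$Price), ]
```

Sorting the same table by Return instead of Price makes it easy to spot sectors that are cheap and still rank high on returns.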
This leads me to conclude that general Technology companies on NASDAQ are the best option for daily traders in terms of price and return.
Transportation also seems to be a good bet, as its PEG Ratio is the highest, which suggests it is growing fast and prices may increase soon.
I was also able to confirm the initial exploration observation from the first part of this project. The density plots confirm that stocks with no dividends are priced higher, more of them generate positive returns, they are traded less, and they have higher PEG Ratios.
Given that the Technology and Transportation sectors are the best choices, I would select companies from those sectors that pay no dividends to maximize returns.
Now, let’s experiment with putting it all together. I will start by plotting Price and Return, colored by Volume, and then sized by Market Cap.
Here are the three plots:
From the above 2 plots for Return and Volume, it seems that there is a kind of linear relationship, but the points are scattered all over the chart, and we already know from the second section of this project that there is no correlation.
But these plots at least show that the companies with the highest market caps or even the highest volumes don’t actually generate the highest returns. So we shouldn’t be blinded by how big the market value of a company is or how heavily its shares are traded.
Now let’s try the same two variables: Price and Return, but this time colored by PEG Ratio with only two colors: above or below zero. Then let’s break that down by Dividends.Status and then by Value.Status as we did before:
What’s interesting about the first of the above 3 experimental plots is that there are many companies with negative PEG Ratios (in red) that generated high returns. I would explain that by market speculation or by announcements of earnings that were higher than expected. Investing in those stocks would be risky, especially around earnings announcement times.
In general, we can clearly see that stocks with higher PEG Ratios tend to have higher prices, mainly because they come from financially healthier companies. We also notice that there seem to be more of such companies in the “No Dividends” section of the plot. This makes sense, as they reinvest their earnings to grow faster.
In the third chart, we clearly notice that stocks with higher PEG Ratios occur much more where the market value is higher than the book value, which means they are perceived as more valuable by traders.
Now to get things even more interesting, let’s plot Price and Return, but this time let’s color by Return with two colors (above or below zero), and size by Volume. Then break down by Dividends.Status and then by Value.Status.
In the first chart above, we see very clearly that the highest priced stocks don’t necessarily generate the highest returns. Actually, as seen by the line, the higher the price, the lower the return for stocks with positive returns.
The faceted plots didn’t really give any more information than what we already have from the previous sections.
Below, I am doing exactly the same as the above three charts, but sizing by Market Cap instead of Volume hoping to see some new information.
Unfortunately, the three experimental plots above don’t seem to add any new insights beyond what we already have, except that they make it clearer.
Now let’s replace Price on the x-axis by Volume! Maybe this will reveal a hidden key relationship.
These 3 charts show that the stocks that generate the highest returns are usually not traded that often, which means daily traders shouldn’t just follow the crowds!
What is very interesting about this chart is that there are four companies with minimal trading volume that generate returns as good as those of the most traded and most valued stocks!
Faceted plots didn’t add any value here.
Next, I am repeating the three charts above, but replacing Market Cap with Price for the size.
In the first plot above, we see that Volume has almost no effect on return!
Here the same four companies appear once again, but now we learn that they are also not that expensive! They seem like a good catch for daily traders. These are the kind of stocks that individual daily traders should focus on.
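Stocks like these could also be pulled out with a simple filter instead of being read off the chart. The sketch below uses made-up tickers and numbers, and the median cutoffs are hypothetical choices of mine; the right thresholds for the real data would need some experimenting.

```r
# Made-up example rows (NOT real stocks); column names are assumptions.
stocks <- data.frame(
  symbol = c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF"),
  Price  = c(5, 120, 8, 300, 15, 60),
  Return = c(0.09, 0.01, 0.08, 0.02, -0.03, 0.10),
  Volume = c(2e4, 5e6, 3e4, 9e6, 1e5, 7e6),
  stringsAsFactors = FALSE
)

# Hypothetical cutoffs: at or below the median volume and price,
# at or above the median return.
low_volume  <- stocks$Volume <= quantile(stocks$Volume, 0.5)
high_return <- stocks$Return >= quantile(stocks$Return, 0.5)
cheap       <- stocks$Price  <= quantile(stocks$Price, 0.5)

# Keep rows that pass all three screens, best return first.
candidates <- stocks[low_volume & high_return & cheap, ]
candidates <- candidates[order(-candidates$Return), ]
print(candidates)
```

Tightening the quantiles (for example 0.1 for volume and 0.9 for return) would narrow the list to the most extreme cases.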
Faceted plots didn’t add any value here.
Plotting Price and Return sized by Market Cap and colored by Volume helped in seeing big companies based on their actual returns instead of how big the market value is or how much their stock is traded. We saw that most such big companies are outperformed by smaller ones in terms of returns.
I was surprised to see many companies with negative PEG Ratios (in red) that generated high returns. As explained above, this could be due to either market speculation or better-than-expected earnings announcements. In general, though, I would stay away from those companies unless they announce good new earnings that show real potential for growth.
I was also able to identify some companies with minimal trading volume but with returns as good as those of the most traded and most valued stocks!
Unfortunately, I was not able to spot any linear relationships to build models on. Had we started with the daily prices dataset, we might have been able to predict some future prices. But this one is mainly about insights derived from the stock stats.
This is a great visualization of price, volume and return summaries by sector. It gives daily traders a quick overview of where to focus their attention. Almost half of the valuable information we got from this dataset is loaded in those three box plots.
For example:
- Finance is the most expensive sector, yet it generated the lowest positive returns, and it has a relatively low PEG Ratio. This explains why it’s the second lowest traded sector.
- The highest returns were generated by 3 sectors: Consumer Services, Technology, and Capital Goods (almost the same). Among them, Technology has the cheapest stocks, with the third highest PEG Ratio. This explains why it’s the top traded sector.
- Surprisingly, Transportation has the highest PEG Ratio while being among the lowest priced sectors. It’s also third in volume and returns.
This leads me to conclude that, in general, Technology companies on NASDAQ are the best option for daily traders in terms of price and return.
Transportation also seems to be a good bet, as its PEG Ratio is the highest, which means it’s growing fast and prices would increase soon.
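The sector comparison behind these conclusions can also be expressed as a small summary table. The sketch below uses invented numbers for three sectors purely to show the mechanics; the column names are assumptions, and the output does not reproduce the actual figures from the box plots.

```r
# Invented numbers for illustration only (NOT the real sector stats).
stocks <- data.frame(
  sector    = c("Technology", "Technology", "Finance", "Finance",
                "Transportation", "Transportation"),
  Price     = c(20, 30, 80, 100, 25, 35),
  Return    = c(0.04, 0.06, 0.01, 0.02, 0.03, 0.05),
  Volume    = c(4e6, 6e6, 5e5, 7e5, 1e6, 3e6),
  PEG.Ratio = c(1.5, 1.7, 0.8, 1.0, 2.0, 2.4),
  stringsAsFactors = FALSE
)

# Median of each stat per sector. Note that the formula interface of
# aggregate() drops rows with missing values by default.
sector_summary <- aggregate(
  cbind(Price, Return, Volume, PEG.Ratio) ~ sector,
  data = stocks,
  FUN  = median
)

# Sort so the sectors with the highest median return come first.
sector_summary <- sector_summary[order(-sector_summary$Return), ]
print(sector_summary)
```

I use the median rather than the mean here because it matches the center line of a box plot and is less affected by a few extreme stocks.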
This is an important plot because it shows very clearly that expensive stocks, or stocks of companies with the highest market caps, don’t necessarily generate the highest returns. Actually, as seen by the line, the higher the price, the lower the return for stocks with positive returns. Daily traders should not be blinded by how big the company’s value is.
In this chart, we are actually able to see that most of the stocks that generate the highest returns are not being traded that often. Volume has almost no effect on returns. This means daily traders shouldn’t just follow the crowds!
This chart also shows some companies with minimal trading volume that generate returns as good as those of the most traded and most valued stocks! Such stocks seem to be a good catch for daily traders to keep an eye on.
This has been an amazing experience! It started rough and took much more time than I anticipated, but it was worth all the time I invested in it.
The biggest challenge was learning how to get and prepare the data in the first place. Once that was solved, I struggled to decide what direction to take and which of the 58 variables to consider. But as I started exploring, the direction kind of forced itself. Given the nature of the dataset, it all came down to two main stats, price and return. All other variables/features were used to get more insights about how much it would cost to buy a stock, and how much return that stock would generate.
Unfortunately, there were no clear correlations between any of the numerical variables in the dataset. But I was able to get some interesting and sometimes surprising insights from it. Had I started with the other available dataset, the daily stock prices, I might have been able to create models to predict stock prices and returns. But that could be a good project for the next module: Machine Learning.
In the future, if I’m to analyze this dataset again, I would definitely focus more on specific sectors and get into the details of company names, to develop some opinions about specific companies to direct my attention to and compare with each other.