Posts

Stock Market Predictions and Visualizations

The capstone project has finally come around, and in this blog post I'll discuss my project as well as the steps I took to get there. This time around I worked with stock market data. Unlike most of my previous projects, my goal wasn't to answer a very specific question about the data. Instead, I wanted to build something that would make analyzing the dataset simple and intuitive with just a few clicks: a highly interactive tool containing a large amount of information. I also wanted to include two finance topics that interest me, yield rates and economic sectors. Of course, this is still a data science project, so there also had to be an emphasis on machine learning and how it could be used to predict future prices. First, I'll briefly explain the data I used for this project. There were two datasets from the main US stock exchanges (NYSE and ...

Predicting Housing Prices with ARIMA

For my most recent project, I was assigned the task of analyzing a set of housing prices from various ZIP codes. The goal was to find the top five ZIP codes for investment using ARIMA modeling. I'd like to use this space to walk through my thought process while working through the problem and the methods I used, giving particular attention to the exploratory data analysis step because it was very interesting in this case. The first point of interest is that the question is intentionally vague, so I had to settle a few questions of my own before even attempting to answer it. First of all, to evaluate the best investment I needed to choose a metric. In this case, I decided to use return on investment to measure profitability and to adjust it based on risk. I chose five years as the target time frame since this is a fairly standard investment period and it's short enough that the model ca...

Defaulty Credit Card Data

For my most recent data science project, I decided to take a look at credit card default rates. More specifically, my "business case" was that I had been hired by a bank to help it predict credit card default. While the data was not nearly as interesting as some I've explored earlier, the data science process itself suddenly got far more exciting. Machine learning is the topic I've been waiting for since day one of Flatiron, and this first project leveraging it did not disappoint. I'd like to walk through some of the more interesting parts of the project and discuss what I'm eager to learn more about. To provide a good point of reference for the rest of this highlight reel, I should mention a bit about the data. This was a binary classification problem, and the target was a variable called 'default' that could be zero or one (with the latter indicating that the customer had defaulted)...

Location, Location, Location: Real Estate Data Analysis

For my most recent data science project, I was tasked with creating a business case related to a provided dataset and then solving it. The data in question described a few years of home sales in King County, Washington, along with plenty of variables describing each home sold. I knew right away that I wanted to use price as my outcome variable, since most business cases that came to mind would use it as the deciding factor. When I examined the rest of the variables in the dataset, most had a fairly obvious relationship to price (at least from a high-level overview). For example, I figured that more bedrooms, bathrooms, or square footage would all lead to higher prices, so these didn't really interest me for a business case. What caught my eye instead was the location data associated with each entry. Every house had fairly precise latitude and longitude coordinates, as well as a ZIP code. These are two components that I...

Movie (Data) Magic: Reviewing the Critics

It certainly takes a specific set of skills to be a movie critic. Between knowledge of film history and theory, the ability to articulate thoughts eloquently, and a keen eye for detail, there is a reason many people look to critics when deciding how to spend a Friday night at the movies. Despite the occasional upset fan, there is clearly a demand for movie criticism, and the public has placed its collective trust in this group to judge the quality of films. What I would like to examine in this post, however, is not the critics' ability to grade a film on its merits. Instead, I want to use data to see how well professional reviews predict a movie's box office success. I will also compare critic reviews to audience reviews to see whether one group is a better indicator of a high-grossing movie. Since this post is part of my first project in a data science course, I will also detail s...