Wednesday, April 8, 2015

Predicting Parole Violators

Summary: In this experiment, we build a model that predicts whether an inmate will commit a crime if released on parole.

AzureML Gallery: https://gallery.azureml.net/Details/eae76138f1f84dfeb7bfb6e9ea7dfa26

Description:
In the criminal justice system, a parole board reviews an inmate's application (along with behavior, history, etc.) and decides whether to release the inmate on parole or deny the application. A model of this type can be very helpful to the parole board.

The dataset used in this experiment is taken from 15.071x The Analytics Edge, a course taught by MIT; it is a subset of the data provided by the National Corrections Reporting Program. The model uses Logistic Regression (also known as Logit Regression) for prediction. We will try to predict the value of the "violator" column, which is 1 if the parolee violated the parole or 0 if the parolee did not commit any crime while on parole. Since the outcome of our predictive model is either 0 or 1, we can apply the "Two-Class Logistic Regression" algorithm to our dataset.
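
To make the "either 0 or 1" part concrete, here is a minimal sketch of how a two-class logistic regression turns a weighted sum of features into a probability and then into a label. The weights, bias, and the two feature values are made up for illustration; a trained model would supply its own, and this is not the AzureML module's implementation.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_violator(features, weights, bias, threshold=0.5):
    """Return (probability, label) for one parolee.

    weights and bias are hypothetical here; training would estimate them.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    label = 1 if probability >= threshold else 0
    return probability, label


# Made-up weights and feature values, for illustration only.
prob, label = predict_violator([1.0, 2.0], weights=[0.5, -1.0], bias=0.0)
print(round(prob, 4), label)
```

The label is only the probability compared against a cutoff, so moving the threshold changes which parolees are flagged without changing the underlying model.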

Dataset:
male: 1 if the parolee is male, 0 if female
race: 1 if the parolee is white, 2 otherwise
age: the parolee's age (in years) when he or she was released from prison
state: a code for the parolee's state. 2 is Kentucky, 3 is Louisiana, 4 is Virginia, and 1 is any other state.
time.served: the number of months the parolee served in prison (limited by the inclusion criteria to not exceed 6 months).
max.sentence: the maximum sentence length for all charges, in months (limited by the inclusion criteria to not exceed 18 months).
multiple.offenses: 1 if the parolee was incarcerated for multiple offenses, 0 otherwise.
crime: 2 is larceny, 3 is drug-related crime, 4 is driving-related crime, and 1 is any other crime.
violator: 1 if the parolee violated the parole, and 0 if the parolee completed the parole without violation.
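
If you want to look at the same columns outside AzureML, a small sketch like the following reads them into typed records. It assumes the CSV header uses exactly the column names listed above and that age, time.served, and max.sentence can be read as floats; the two sample rows are made up and are not taken from parole.csv.

```python
import csv
import io
from dataclasses import dataclass

# Code-to-label lookups taken from the column descriptions above.
STATE_NAMES = {1: "Other", 2: "Kentucky", 3: "Louisiana", 4: "Virginia"}
CRIME_NAMES = {1: "Other", 2: "Larceny", 3: "Drug-related", 4: "Driving-related"}


@dataclass
class Parolee:
    male: int
    race: int
    age: float
    state: int
    time_served: float
    max_sentence: float
    multiple_offenses: int
    crime: int
    violator: int


def parse_row(row: dict) -> Parolee:
    """Convert one csv.DictReader row (all strings) into a typed record."""
    return Parolee(
        male=int(row["male"]),
        race=int(row["race"]),
        age=float(row["age"]),
        state=int(row["state"]),
        time_served=float(row["time.served"]),
        max_sentence=float(row["max.sentence"]),
        multiple_offenses=int(row["multiple.offenses"]),
        crime=int(row["crime"]),
        violator=int(row["violator"]),
    )


# Made-up sample rows, for illustration only.
SAMPLE = """male,race,age,state,time.served,max.sentence,multiple.offenses,crime,violator
1,1,33.2,1,5.5,18,0,4,0
0,2,25.0,3,4.0,12,1,2,1
"""

parolees = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for p in parolees:
    print(STATE_NAMES[p.state], CRIME_NAMES[p.crime], p.violator)
```

Note that state and crime are codes standing for categories, not quantities, which is why the sketch maps them to names instead of doing arithmetic on them.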



Steps:
  1. Drag parole.csv from Saved Datasets.
  2. Split the dataset into 70% training data and 30% test data. Set the random seed to 144.
  3. Feed the 70% training data into Train Model and select the column "violator", since this is the column whose outcome we are trying to predict.
  4. Select "Two-Class Logistic Regression" from the Machine Learning section and connect it to Train Model. Leave the default values as they are.
  5. Connect the output of Train Model to Score Model, which also takes the 30% test data and applies the trained model to it.
  6. Lastly, evaluate the output of Score Model using Evaluate Model to determine the accuracy of our model.
  7. Select Visualize from the output node of Evaluate Model and you will see that the accuracy of this model is 0.906 (at a threshold of 0.5), which is pretty awesome. Remember, perfect accuracy is 1. The Area Under the Curve (AUC) is 0.882, which also shows that this is a pretty strong model for parole prediction.

Ref: http://www.icpsr.umich.edu/icpsrweb/NACJD/series/38/studies/26521?archive=NACJD&sortBy=7
Ref: https://www.edx.org/course/v2/analytics-edge-mitx-15-071x-0

Thursday, March 26, 2015

Predictive Analytics with AzureML

I started exploring AzureML (Azure Machine Learning) a few weeks back and quickly fell in love with its simplicity and robustness.

I grabbed the Dow Jones Index sample data from the UC Irvine Machine Learning Repository and applied the Linear Regression algorithm to create a prediction model that estimates future values of Microsoft stock's weekly opening price (so that I can be rich), and here is what my model looks like in AzureML.


First, I remove all rows with missing values from the data. Then I filter for the MSFT symbol in the first split and divide the data in an 80-20 ratio, training the model on 80% of the data with the Linear Regression algorithm. After that, I predict the price variable in Train Model and verify it using the remaining 20% of the data. Finally, I evaluate how effective and reliable the model is.
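
Outside the designer, the same flow might look roughly like the sketch below. The field names ("symbol", "week", "open"), the single week feature, the seed, and every value in the rows are placeholders made up for illustration; they are not the actual columns or prices of the UCI dataset, and the one-variable least-squares fit stands in for the Linear Regression module.

```python
import random


def drop_incomplete(rows):
    """Keep only rows in which no field is missing (None or empty string)."""
    return [r for r in rows if all(v not in (None, "") for v in r.values())]


def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up rows, for illustration only.
rows = [
    {"symbol": "MSFT", "week": 1, "open": 3.0},
    {"symbol": "MSFT", "week": 2, "open": 5.0},
    {"symbol": "MSFT", "week": 3, "open": None},  # missing value, dropped
    {"symbol": "AA", "week": 1, "open": 16.0},    # other symbol, filtered out
    {"symbol": "MSFT", "week": 4, "open": 9.0},
    {"symbol": "MSFT", "week": 5, "open": 11.0},
    {"symbol": "MSFT", "week": 6, "open": 13.0},
]

msft = [r for r in drop_incomplete(rows) if r["symbol"] == "MSFT"]
train, test = split_rows(msft)
slope, intercept = fit_line([r["week"] for r in train], [r["open"] for r in train])
predictions = [slope * r["week"] + intercept for r in test]
print(slope, intercept, predictions)
```

The made-up prices lie exactly on a line, so the fit is perfect here; real stock data will not behave that way.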

At this point I need to seriously improve my model by trying other algorithms, removing/adding variables, etc., because the Coefficient of Determination is nowhere close to 1 and the Mean Absolute Error, Root Mean Squared Error, Relative Absolute Error, and Relative Squared Error are very high. But that's (more or less) what a prediction model will eventually look like in AzureML. It can be published as a web service with a few clicks.

I have published this experiment/source to the AzureML gallery, and it can be accessed here:
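
For reference, these five numbers can be computed from actual and predicted values as in the sketch below. It uses the standard textbook definitions, which I assume match what Evaluate Model reports; the actual and predicted values are made up for illustration.

```python
import math


def regression_metrics(actual, predicted):
    """Return MAE, RMSE, RAE, RSE and R^2 for paired actual/predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Errors of a naive model that always predicts the mean of the actual values.
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    rse = sq_err / sq_dev
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae": abs_err / abs_dev,
        "rse": rse,
        "r2": 1.0 - rse,
    }


# Made-up values, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

The relative errors compare the model against always predicting the mean, so values near 1 mean the model is hardly better than that baseline, and the Coefficient of Determination is simply 1 minus the Relative Squared Error.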

https://gallery.azureml.net/Details/3f4d92649bfa4fa3bf4b0c93a3635227

My next step will be to grab data from SharePoint lists and apply some prediction algos to it.
