Neuroimaging: Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, the paper illustrates how scikit-learn can be used to perform some key analysis steps.
Decision Trees with Scikit & Pandas: The post covers decision trees for classification in Python, using scikit-learn and pandas. The emphasis is on the basics and on understanding the resulting decision tree, including importing a CSV file with pandas, using pandas to prepare the data for the scikit-learn decision tree code, drawing the tree, and producing pseudocode that represents the tree.
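The steps listed in this entry can be sketched roughly as follows. This is a minimal illustration, not the post's own code: the inline CSV text, its column names, and the use of `export_text` for a pseudocode-like rule dump are assumptions chosen for the example.

```python
import io

import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical stand-in for a CSV file on disk (assumed columns and values).
csv_text = """sepal_length,petal_length,species
5.1,1.4,setosa
4.9,1.4,setosa
7.0,4.7,versicolor
6.4,4.5,versicolor
6.3,6.0,virginica
5.8,5.1,virginica
"""

# Import the CSV with pandas.
df = pd.read_csv(io.StringIO(csv_text))

# Prepare the feature matrix and the target column for scikit-learn.
feature_names = ["sepal_length", "petal_length"]
X = df[feature_names]
y = df["species"]

# Fit a classification tree.
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the fitted tree as nested if/else-style rules.
rules = export_text(clf, feature_names=feature_names)
print(rules)
```

For a graphical rendering, `sklearn.tree.plot_tree` or `export_graphviz` could be used instead of the text dump.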
Decomposing the Random Forest model: The author exposes the tree paths behind Random Forest predictions. The sklearn implementation originally required a hacky patch to expose these paths. Fortunately, since 0.17.dev, scikit-learn has two API additions that make this relatively straightforward: obtaining leaf node_ids for predictions, and storing intermediate values in all nodes of a decision tree, not only in leaf nodes. Combining these, it is possible to extract the prediction path for each individual prediction and decompose the prediction by inspecting that path.
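A minimal sketch of the two capabilities mentioned in this entry, using current scikit-learn calls (`apply` for leaf ids, `decision_path` for the visited nodes, and `tree_.value` for per-node values). The iris dataset and the forest settings are assumptions for illustration; the author's decomposition code may look different.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

sample = X[:1]

# Leaf node id reached by the sample in every tree: shape (1, n_estimators).
leaf_ids = forest.apply(sample)

# Inspect the path through the first tree of the forest.
tree = forest.estimators_[0]
node_indicator = tree.decision_path(sample)  # sparse matrix, shape (1, n_nodes)
path = node_indicator.indices                # ids of the nodes the sample visits

for node_id in path:
    # tree_.value holds the class distribution stored at every node,
    # including internal nodes, not only leaves.
    print(node_id, tree.tree_.value[node_id])
```

Comparing the stored values of consecutive nodes along `path` shows how each split shifts the prediction, which is the idea behind decomposing a prediction by its path.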
Feature Unions & Pipeline: Zac Stewart shows the value of pipeline models based on his experience in Kaggle competitions. The pipeline module of scikit-learn allows you to chain transformers and estimators together in such a way that you can use them as a single unit. This comes in very handy when you need to jump through a few hoops of data extraction, transformation, normalization, and finally train your model (or use it to generate predictions).
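As a rough illustration of chaining transformers and an estimator into a single unit, the sketch below combines `FeatureUnion` and `Pipeline`. The dataset, the choice of PCA plus univariate selection, and the step names are assumptions for this example and are not taken from Zac Stewart's post.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# FeatureUnion runs its transformers side by side and concatenates
# their outputs column-wise: 2 PCA components + 1 selected feature.
features = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("kbest", SelectKBest(k=1)),
])

# Pipeline chains the steps so that fit/predict apply them in order.
model = Pipeline([
    ("scale", StandardScaler()),
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])

model.fit(X, y)
preds = model.predict(X[:5])
print(preds)
```

Because the whole chain behaves like one estimator, it can be passed directly to utilities such as `cross_val_score` or `GridSearchCV`.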
Majority Rule Ensemble Classifier in Scikit-learn: A simple and conservative approach to implementing a weighted majority rule ensemble classifier in scikit-learn, which yielded remarkably good results when Sebastian Raschka tried it in a Kaggle competition.
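A weighted majority vote can be sketched with scikit-learn's built-in `VotingClassifier`. This is not the implementation from the post; the base classifiers, the weights, and the dataset below are assumptions chosen only to show the idea.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# voting="hard" takes the majority class label; weights count the
# random forest's vote twice (an arbitrary choice for illustration).
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="hard",
    weights=[1, 2, 1],
)

ensemble.fit(X, y)
preds = ensemble.predict(X)
print(preds[:10])
```

Setting `voting="soft"` would instead average the predicted class probabilities using the same weights.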
Predicting Customer Churn: YHat presents a case study on using scikit-learn to predict customer churn.
Text Classification using NLTK & Scikit learn: A great presentation by Olivier Grisel on using NLTK and scikit-learn for text classification.
Clustering with Sci Kit Learn: The author demonstrates clustering through a worked example based on the K-Means technique.
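For readers unfamiliar with the technique, a K-Means run in scikit-learn might look like the sketch below. The synthetic data from `make_blobs` and the choice of three clusters are assumptions, not details from the linked post.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with three well-separated groups (assumed for the demo).
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

# Fit K-Means and assign every point to its nearest centroid.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)
```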
Classification with Scikit Learn: Covers three different methods: Logistic Regression, Discriminant Analysis, and Nearest Neighbor.
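The three methods named in this entry can be compared side by side with a short sketch such as the one below. The dataset, the use of linear discriminant analysis specifically, and the 5-fold cross-validation setup are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

classifiers = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "discriminant_analysis": LinearDiscriminantAnalysis(),
    "nearest_neighbor": KNeighborsClassifier(n_neighbors=5),
}

# Mean 5-fold cross-validated accuracy for each method.
scores = {
    name: cross_val_score(clf, X, y, cv=5).mean()
    for name, clf in classifiers.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```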
Hidden Markov Models: A very simple example that uses Wikipedia text to create a Hidden Markov Model for sentences.