Python and Machine Learning

For many, Python isn’t a language of the past, but one of the present and future. Although work on Python began in the late 1980s (its first release came in 1991), it’s very prominent in web development and in industries that depend on flexible, simple programming languages.

Python fits that bill better than many other languages. In fact, Python is so easy to learn and use that it’s often one of the first languages newcomers to programming work with. What makes Python so easy to use? First off, it’s an interpreted language, which means there’s no separate compile step: you simply write your code and run it.

Python also follows a design philosophy that favors readable, concise code, so many tasks take only a few lines. And with so many available modules, data types, and libraries, there’s very little this language can’t do.

Although Python is widely used for the web and web applications, it has a number of other tricks up its sleeve. One such trick is Machine Learning. For those who don’t know, Machine Learning is a subset of Artificial Intelligence in which computer algorithms automatically improve through experience. In other words, the computer learns to perform a task from data rather than being explicitly programmed for that task.

Machine Learning has become widespread, with companies like Google, Amazon, LinkedIn, and Facebook depending on it, and it is driving many of today’s biggest technological advances.

Let’s take a look at why you should consider Python for your machine learning needs.

Is Python good for machine learning?

In a word, yes. In fact, Python is widely regarded as the preferred language for Machine Learning. Although Python is slower than compiled languages such as C++, its data-handling capabilities are among the best.

But what makes Python so exceptional with Machine Learning? There are a number of reasons, including:

1- It’s easy to learn (compared with other languages commonly used for ML).

2- It’s open source, which makes it easier to integrate with new technologies as they arise.

3- It runs on all major platforms and can interoperate with many other programming languages, notably C and C++.

4- It has an incredible library ecosystem.

5- It’s very flexible, supporting both object-oriented and scripting styles.

6- It offers plenty of visualization options.

7- It has great community support.

How do I start learning Machine Learning with Python?

To get started with Machine Learning and Python, you must first learn the Python language. Once you’ve taken care of that, you’ll also need to become familiar with Python’s core data libraries, such as NumPy and pandas. With those under your belt, you can then learn one of the ML-specific libraries, such as scikit-learn. You will also need to understand the basic concepts of machine learning. For example, every machine learning algorithm can be described in terms of three components:

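1- Representation, which is how the model represents knowledge (for example, decision trees, sets of rules, or neural networks).

2- Evaluation, which is how candidate programs (models) are scored, using measures such as accuracy, precision and recall, squared error, likelihood, posterior probability, cost, margin, entropy, and Kullback-Leibler (K-L) divergence.

3- Optimization, which is the search process that generates candidate programs and finds the best-scoring one (for example, gradient descent).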
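To make the three components concrete, here is a minimal sketch using NumPy and a made-up toy dataset (roughly y = 3x + 2 plus noise). The representation is a straight line, evaluation is mean squared error, and optimization is plain gradient descent. The function names predict, mse, and fit are hypothetical, chosen only for illustration.

```python
import numpy as np

# Made-up toy data for illustration: y is roughly 3x + 2 plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3.0 * x + 2.0 + rng.normal(0, 1.0, size=100)

# Representation: the model is a straight line, y_hat = w * x + b.
def predict(w, b, x):
    return w * x + b

# Evaluation: mean squared error between predictions and targets.
def mse(w, b, x, y):
    return np.mean((predict(w, b, x) - y) ** 2)

# Optimization: gradient descent on w and b, starting from zero.
def fit(x, y, lr=0.01, steps=2000):
    w, b = 0.0, 0.0
    for _ in range(steps):
        error = predict(w, b, x) - y
        grad_w = 2 * np.mean(error * x)  # d(MSE)/dw
        grad_b = 2 * np.mean(error)      # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = fit(x, y)
print(f"w={w:.2f}, b={b:.2f}, mse={mse(w, b, x, y):.3f}")
```

Swapping any one component (say, a decision tree as the representation, or a different scoring measure) gives a different learning algorithm, which is why the three-part view is a useful way to compare them.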

You will also need to understand the four types of machine learning:

1- Supervised learning is where training data includes the desired output.

2- Unsupervised learning is where training data does not include the desired output.

3- Semi-supervised learning is where training data includes only a few desired outputs (only part of the data is labeled).

4- Reinforcement learning is where an agent learns to achieve a goal in an uncertain, complex environment.
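The first three types can be contrasted on the same data. The sketch below assumes scikit-learn and its bundled copy of the Iris dataset: a logistic regression sees the labels (supervised), k-means sees only the features (unsupervised), and LabelSpreading sees labels for just one row in ten (semi-supervised). Reinforcement learning is left out because it needs an interactive environment rather than a fixed dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)

# Supervised: the training data includes the desired output (y).
clf = LogisticRegression(max_iter=1000).fit(X, y)
supervised_pred = clf.predict(X)

# Unsupervised: only X is given; k-means groups similar rows into 3 clusters.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_

# Semi-supervised: keep the label on every 10th row and mark the rest unlabeled (-1).
labeled = np.arange(len(y)) % 10 == 0
unlabeled = ~labeled
y_partial = y.copy()
y_partial[unlabeled] = -1
ls = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)
semi_pred = ls.transduction_  # labels inferred for every row

print("supervised training accuracy:", (supervised_pred == y).mean())
print("cluster sizes:", np.bincount(cluster_ids))
print("semi-supervised accuracy on unlabeled rows:",
      (semi_pred[unlabeled] == y[unlabeled]).mean())
```

Note that the cluster IDs from k-means are arbitrary numbers: the algorithm finds groups but has no idea what the groups are called, which is exactly what missing labels implies.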

Once you have a solid understanding of how Machine Learning actually works, you are ready to begin your journey by using Python to make it happen.

Is Python fast enough for machine learning?

Although Python isn’t the fastest programming language available, it has already proved itself more than capable of handling the demands of Machine Learning. To overcome what some might consider a shortcoming, Python has a number of tools available to compensate. Libraries such as NumPy and pandas do their heavy lifting in compiled code, so operations applied to whole arrays or tables at once (vectorized operations) run much faster than equivalent pure-Python loops. Pandas, for example, is a tool used in data science for cleaning, transforming, manipulating, and analyzing data, which helps ensure the data you feed into Machine Learning is well prepared and efficiently processed.
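Here is a minimal sketch of both ideas, assuming NumPy and pandas are installed. The first part computes the same sum of squares with a pure-Python loop and with a single vectorized NumPy call and times each (the timings will vary by machine). The second part cleans a small, made-up table with pandas. The helper names and the sample table are hypothetical.

```python
import time

import numpy as np
import pandas as pd

# Part 1: the same computation as a pure-Python loop and as a vectorized NumPy call.
values = np.arange(100_000, dtype=np.float64)

def sum_of_squares_loop(xs):
    """Visit every element in the interpreter, one at a time."""
    total = 0.0
    for v in xs:
        total += v * v
    return total

def sum_of_squares_vectorized(xs):
    """Let NumPy's compiled code compute the whole dot product in one call."""
    return float(np.dot(xs, xs))

start = time.perf_counter()
loop_result = sum_of_squares_loop(values)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vec_result = sum_of_squares_vectorized(values)
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s  vectorized: {vec_seconds:.4f}s")

# Part 2: a made-up table with typical problems (a missing value, numbers stored as text).
raw = pd.DataFrame({
    "petal_length": ["1.4", "4.7", None, "5.1"],
    "species": ["setosa", "versicolor", "versicolor", "virginica"],
})
clean = raw.dropna().copy()                                  # drop rows with missing values
clean["petal_length"] = clean["petal_length"].astype(float)  # convert text to numbers
means = clean.groupby("species")["petal_length"].mean()      # average per species
print(means)
```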

Which Python version is best for machine learning?

At one point, the best version of Python to use for Machine Learning was 2.7. However, Python 2 reached its end of life on January 1, 2020, which means you will have to use Python 3. As of this writing, the most recent stable release of Python is 3.9.0. If you want the most recent features and security updates, your best bet is to use that or a more recent version (if available), as long as the ML libraries you plan to use support it; major libraries sometimes take a while to add support for a new Python release.
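A script can check the interpreter version before doing any work. This is a minimal sketch; the minimum of 3.9 is an assumption taken from the release mentioned above, and the helper name meets_minimum is hypothetical. Adjust the minimum to whatever your libraries require.

```python
import sys

# Assumed minimum (major, minor) version; change it to match your libraries' requirements.
MINIMUM = (3, 9)

def meets_minimum(version_info, minimum=MINIMUM):
    """Return True if the (major, minor) part of version_info is at least minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)

if meets_minimum(sys.version_info):
    print("Python version OK:", sys.version.split()[0])
else:
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is older than "
          f"{MINIMUM[0]}.{MINIMUM[1]}; consider upgrading.")
```

Comparing tuples works because Python compares them element by element, so (3, 10) correctly counts as newer than (3, 9).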

Python machine learning libraries

Before we tackle an example, you’ll need to know which libraries are available to use for Machine Learning in Python. The most widely used include:

NumPy is a library for processing large multi-dimensional arrays and matrices.
SciPy is a library that contains modules for optimization, linear algebra, integration, and statistics.
Scikit-learn is the library used for classical Machine Learning algorithms (in particular, those for supervised and unsupervised learning).
Theano is used to define, evaluate, and optimize mathematical expressions involving multi-dimensional arrays (it is no longer actively developed by its original team).
TensorFlow is a framework for defining and running computations involving tensors.
Keras is a high-level neural networks API.
PyTorch allows developers to perform computations on tensors with GPU acceleration.
Pandas is one of the most popular libraries for data analysis.
Matplotlib is one of the most widely used libraries for data visualization.
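To give a feel for three of these libraries, here is a minimal sketch assuming NumPy, SciPy, and pandas are installed: NumPy solves a small linear system, SciPy minimizes a one-variable function, and pandas summarizes a labeled table. The numbers are made up for illustration; TensorFlow, Keras, and PyTorch are left out to keep the example light.

```python
import numpy as np
import pandas as pd
from scipy import optimize

# NumPy: matrix processing, solving A @ x = b for x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)  # 3*2 + 1*3 = 9 and 1*2 + 2*3 = 8, so x is [2, 3]

# SciPy: optimization, finding the minimum of (t - 1.5)**2 + 0.5.
res = optimize.minimize_scalar(lambda t: (t - 1.5) ** 2 + 0.5)

# pandas: data analysis on a table with labeled columns.
df = pd.DataFrame({"feature": [1.0, 2.0, 3.0, 4.0], "label": ["a", "b", "a", "b"]})
summary = df.groupby("label")["feature"].mean()  # a: (1 + 3) / 2, b: (2 + 4) / 2

print("solution:", x)
print("minimum at t =", round(res.x, 4), "with value", round(res.fun, 4))
print(summary)
```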

Python machine learning example

We’re going to demonstrate a basic experiment using machine learning with Python on Ubuntu Linux. You’ll first need to install Python with the command:

sudo apt-get install python3 -y

Once that installation completes, install the necessary libraries with the command:

sudo apt-get install python3-numpy python3-scipy python3-matplotlib python3-pandas python3-sympy python3-nose python3-sklearn -y

Next, open the Python console with the command:

python3

In that console, check that all of the necessary libraries are installed by pasting the following and pressing Enter:

# Python version
import sys
print('Python: {}'.format(sys.version))
# SciPy
import scipy
print('scipy: {}'.format(scipy.__version__))
# NumPy
import numpy
print('numpy: {}'.format(numpy.__version__))
# matplotlib
import matplotlib
print('matplotlib: {}'.format(matplotlib.__version__))
# pandas
import pandas
print('pandas: {}'.format(pandas.__version__))
# scikit-learn
import sklearn
print('sklearn: {}'.format(sklearn.__version__))

The output should list the version of each library. If any import fails, that library isn’t installed.

Next, we’re going to load the libraries. Paste the following into the console:

# Load libraries
from pandas import read_csv
from pandas.plotting import scatter_matrix
from matplotlib import pyplot
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

Press Enter to run the imports and return to the Python prompt.

Load the Iris flowers dataset for the experiment with the following:

# Load dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = read_csv(url, names=names)

View the first 20 rows of data with the command:

# head
print(dataset.head(20))

You should see the first 20 rows of data printed out.

Let’s summarize all the data by pasting the following into the Python console:

# summarize the data
from pandas import read_csv
# Load dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = read_csv(url, names=names)
# shape
print(dataset.shape)
# head
print(dataset.head(20))
# descriptions
print(dataset.describe())
# class distribution
print(dataset.groupby('class').size())
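Before building models, it usually helps to look at the data. The matplotlib and scatter_matrix imports loaded earlier can be used for that. The sketch below is self-contained: it assumes scikit-learn’s bundled copy of Iris (load_iris) in place of the CSV download and saves the plots to PNG files, with hypothetical file names. If you are continuing in the same console session, you can pass dataset instead of features and call pyplot.show() instead of saving.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
from matplotlib import pyplot
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

# Assumption: the bundled Iris data stands in for the CSV loaded from the URL above.
iris = load_iris(as_frame=True)
features = iris.frame.drop(columns="target")  # keep only the four measurements

# Box-and-whisker plot of each feature, one subplot per column.
features.plot(kind="box", subplots=True, layout=(2, 2), sharex=False, sharey=False)
pyplot.savefig("iris_box.png")

# Histogram of each feature.
features.hist()
pyplot.savefig("iris_hist.png")

# Scatter plot matrix showing every pair of features.
scatter_matrix(features)
pyplot.savefig("iris_scatter.png")

pyplot.close("all")  # free the figures
```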

Finally, you can build and evaluate models from the loaded data by pasting the following into the console:

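# Spot Check Algorithms
from sklearn.svm import SVC
# Split-out validation dataset: the first four columns are features, the fifth is the class
array = dataset.values
X = array[:, 0:4]
y = array[:, 4]
X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20, random_state=1)
models = []
# Default lbfgs solver; the multi_class argument is deprecated in recent scikit-learn releases
models.append(('LR', LogisticRegression(max_iter=1000)))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC(gamma='auto')))
# evaluate each model in turn with 10-fold stratified cross-validation
results = []
names = []
for name, model in models:
    kfold = StratifiedKFold(n_splits=10, random_state=1, shuffle=True)
    cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring='accuracy')
    results.append(cv_results)
    names.append(name)
    print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))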

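After pasting the above into the console, press Enter (the interactive console needs a blank line to finish the for loop). It will print each model’s mean cross-validated accuracy, with the standard deviation in parentheses, so you can compare the models and choose the most accurate.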
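Cross-validation scores only estimate accuracy on the training portion. A natural follow-up is to fit the chosen model once and score it on the held-out validation set, which is what the accuracy_score, confusion_matrix, and classification_report imports are for. This sketch is self-contained: it assumes scikit-learn’s bundled Iris data instead of the CSV, and it picks SVC purely as an example; substitute whichever model scored best for you.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: the bundled Iris data stands in for the CSV loaded from the URL.
X, y = load_iris(return_X_y=True)
X_train, X_validation, Y_train, Y_validation = train_test_split(
    X, y, test_size=0.20, random_state=1)

# Example choice only: replace SVC with the model that performed best in the spot check.
model = SVC(gamma="auto")
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)

print(accuracy_score(Y_validation, predictions))         # fraction predicted correctly
print(confusion_matrix(Y_validation, predictions))       # rows: true class, columns: predicted
print(classification_report(Y_validation, predictions))  # precision, recall, F1 per class
```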

And that’s a fairly simple example of Machine Learning with Python.

Conclusion

And that’s how and why Python has become a leading language for Machine Learning. With this information, you are ready to begin your journey in this fascinating field. There’s a lot to learn and we’ve only scratched the surface, so expect to spend a good amount of time learning the ropes of this complex technology.

If you enjoyed this article, check out one of our other Python articles.