Fake News Classifier to Tackle COVID-19 Disinformation-II

An effort to tackle one of the most pressing problems faced by the world currently, Fake News.

Shaunak Varudandi
Towards Data Science


(Image by author)

Introduction

Coronavirus (COVID-19) is an infectious disease that has resulted in an ongoing pandemic. The disease was first identified in Wuhan, China, in December 2019. As of 21st May, 2021, more than 160 million cases have been reported across 180 countries and territories. The sheer scale of this pandemic has led to myriad problems for the current generation. One of the acute problems that I have come across is the circulation of bogus news articles. In today’s world, spurious news articles can cause panic and mass hysteria. I realized the gravity of this problem and decided to base my next machine learning project on resolving this issue.

This is my second article on the Fake News Classifier project. It elaborates on the steps needed to deploy the trained SVM classifier as a web application on Heroku using the Flask framework. Those who haven't read my 1st article can do so via the following link.

Problem Statement

To develop a web application that accurately classifies a news article on COVID-19 into real news or fake news.

Work Flow

In my 1st article, Fake News Classifier to Tackle COVID-19 Disinformation-I, I performed Data Pre-Processing, Data Engineering, Model Training, Model Testing, and Model Evaluation in Google Colab. I concluded that, of the algorithms I evaluated, the SVM Machine Learning algorithm gave the best results when tasked with classifying a news article as Genuine or Fake.

Since the initial analysis was completed in the 1st article, the steps described below will focus on deploying the trained SVM model as a web application on Heroku using the Flask framework.

Step 1: Train the SVM Machine Learning model and save the trained model and the Tf-Idf vectors as separate pickle files.

First, I created a Python script file known as model.py and performed all the requisite steps needed to train an SVM model using the training data. The detailed step-by-step approach can be found in Part-I of this article. The trained SVM classifier and the Tf-Idf vectors were then each saved to their own pickle file (model.pkl and tfidf.pkl).
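The save step can be sketched roughly as follows. This is a minimal illustration, not the project's actual model.py: it assumes scikit-learn's TfidfVectorizer and SVC with a linear kernel, assumes that tfidf.pkl holds the fitted vectorizer, and uses a handful of made-up placeholder sentences instead of the real dataset. The helper names train_and_save and load_artifacts are hypothetical.

```python
import pickle
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Placeholder training data, for illustration only (not the project's dataset).
texts = [
    "health officials publish updated vaccination guidance",
    "hospital reports number of recovered patients",
    "drinking bleach cures the virus overnight",
    "secret chip hidden inside every vaccine dose",
]
labels = [1, 1, 0, 0]  # assumed encoding: 1 = real news, 0 = fake news


def train_and_save(texts, labels, out_dir):
    """Fit a Tf-Idf vectorizer and an SVM, then pickle each to its own file."""
    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(texts)

    model = SVC(kernel="linear")  # assumption: the article does not name the kernel
    model.fit(features, labels)

    out_dir = Path(out_dir)
    with open(out_dir / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    with open(out_dir / "tfidf.pkl", "wb") as f:
        pickle.dump(vectorizer, f)
    return model, vectorizer


def load_artifacts(in_dir):
    """Load the pickled model and vectorizer back, as the web app would."""
    in_dir = Path(in_dir)
    with open(in_dir / "model.pkl", "rb") as f:
        model = pickle.load(f)
    with open(in_dir / "tfidf.pkl", "rb") as f:
        vectorizer = pickle.load(f)
    return model, vectorizer
```

The fitted vectorizer has to be saved alongside the model because new text must be transformed with the same vocabulary and weights that the classifier saw during training.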

Step 2: Create a web application.

Now that we have saved our trained SVM classifier as well as the Tf-Idf vectors, it is time to turn our attention to creating the web application. To do so, I have created a Python script file known as app.py. This file acts as the backend of our web application: it processes the data entered by a user and classifies the news article using the trained SVM classifier.

One more file of paramount importance is the Index.html file. Index.html contains the code for the front end of our web application, and it communicates with app.py (our backend) to display results to the user.
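A backend along these lines might look like the sketch below. It is an assumption-laden illustration rather than the project's app.py: the route names, the form field name news, the label encoding (1 = real) and the create_app helper are all hypothetical, and a tiny inline template stands in for the real HTML file so that the example is self-contained.

```python
from flask import Flask, render_template_string, request

# Minimal stand-in for the front-end page; the real project keeps its HTML
# in the templates folder and would use render_template instead.
PAGE = """
<form method="post" action="/predict">
  <textarea name="news"></textarea>
  <button type="submit">Classify</button>
</form>
{% if result %}<p>{{ result }}</p>{% endif %}
"""


def create_app(model, vectorizer):
    """Build a Flask app around an already-loaded classifier and vectorizer."""
    app = Flask(__name__)

    @app.route("/")
    def home():
        # Show the empty form.
        return render_template_string(PAGE)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Read the submitted article, vectorize it, and classify it.
        text = request.form.get("news", "")
        features = vectorizer.transform([text])
        label = model.predict(features)[0]
        # Assumed label encoding: 1 = real news, anything else = fake news.
        result = "Real news" if label == 1 else "Fake news"
        return render_template_string(PAGE, result=result)

    return app
```

In a deployed app.py, the model and vectorizer would be loaded from model.pkl and tfidf.pkl with pickle at start-up, and the result of create_app would be assigned to a module-level variable named app so that a WSGI server such as gunicorn can find it.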

Step 3: Create the configuration files needed to host the web application on Heroku.

Before we can host the web application on Heroku, some important configuration files are needed. The 1st file is known as the “Procfile”. A Procfile declares the command that Heroku runs to start the web application's processes. The Procfile contains the following configuration statement.

web: gunicorn app:app

The first “app” is the name of the Python module that gunicorn should import (app.py), whereas the second “app” is the name of the variable inside that module that holds the Flask application object, i.e. the object created with Flask(__name__). The second required configuration file is the “requirements.txt” file. This file lists the libraries that the web app depends on, so that Heroku can install them when it builds the application.

Step 4: Commit the entire code to a GitHub repository.

The next step is to create a new repository on GitHub. This repository will be connected to the Heroku platform, and Heroku will access all the required code files from the repository itself. The files which I uploaded to my GitHub repository are app.py, model.pkl, tfidf.pkl, nltk.txt, requirements.txt, Procfile, and the templates folder. Once the upload is complete, move to Step 5.
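Before pushing, it can help to check that everything on this list is actually present in the project folder. The snippet below is a small optional sketch using only the standard library; the missing_files helper is hypothetical and not part of the project. It compares names case-sensitively, because Heroku expects the process file to be named exactly Procfile.

```python
from pathlib import Path

# Files and folders listed in this step.
REQUIRED = [
    "app.py",
    "model.pkl",
    "tfidf.pkl",
    "nltk.txt",
    "requirements.txt",
    "Procfile",
    "templates",
]


def missing_files(repo_dir):
    """Return the required names that are absent from repo_dir.

    Names are compared against the actual directory listing, so the check
    stays case-sensitive even on case-insensitive file systems.
    """
    present = {entry.name for entry in Path(repo_dir).iterdir()}
    return [name for name in REQUIRED if name not in present]
```

Calling missing_files(".") from the project root returns an empty list when nothing is missing.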

Step 5: Create a new account on Heroku and link the newly created GitHub repository to your Heroku account.

1 - If you do not have a Heroku account already, create one and log into your Heroku account. Navigate to the top-right corner of your Heroku dashboard and click on “New” and next choose “Create New App”.

(Image by author)

2 - Now, enter your desired app name, let the region be the United States and lastly, click on “Create App”.

(Image by Author)

3 - Once the application is configured, you will be redirected to the dashboard of your newly created Heroku app. Here, we will connect this app to our GitHub repository. Scroll down to the deployment method and click on “Connect to GitHub”.

(Image by Author)

4 - This will open a prompt immediately below the “Connect to GitHub” option where it will ask you to enter the repository name of your web app. Enter the name of your GitHub repository, click on the “Search” button, and then click on the “Connect” option that is located next to your desired repository. Wait for a few seconds and your web application repository will be connected to the Heroku app.

(Image by Author)

Step 6: Deploy the web application using Heroku.

Heroku provides two options for completing the web app deployment: Automatic deployment and Manual deployment. For this project, I chose Manual deployment. To perform a Manual deployment, scroll down and navigate to Manual deploy. Make sure that the repository branch is correct and then click on “Deploy Branch”. The deployment should then start, and you can follow all the installations in the log that is displayed under the Build section.

(Image by Author)

If the installation was successful, you will receive a URL at the end of the build logs. Additionally, you will get a message after the Deploy to Heroku section stating that the “app was successfully deployed”. This URL can be used to access the web application from anywhere around the world and you can share it amongst your peers and let them test the deployed web application.

(Image by Author)

Conclusion

This project was the first end-to-end Machine Learning project that I completed, wherein I successfully deployed my SVM Classifier on the Heroku cloud platform. An end-to-end Machine Learning project opens up several facets of the technology stack and helps you get involved with the front end as well as the back end of an application.

In my opinion, developing highly accurate Machine Learning models is of little use if they stay in your Jupyter or Colab notebook. An end-to-end project provides an opportunity to showcase the accuracy of your model to a broad spectrum of end users, from a layman to an experienced IT professional. This belief has motivated me to make a point of deploying every Machine Learning project that I undertake from now on.

The entire project can be found on my GitHub page. I hope you enjoyed reading my blog.


MBA (Co-op) student at the DeGroote School of Business || Aspiring Business Data Analyst.