
Design Your Own Word Cloud: Data Science

Updated: Jun 29, 2020

In this data science project, learn how to design your own word cloud using Python.

A word cloud is a great way to visually represent words and phrases according to their importance.

Work through the three datasets provided and create a word cloud for each. Once you are comfortable, try it with your own datasets.

Introduction:


In simple terms, data science is the study of data using various algorithms and processes. More technically, it helps derive useful information from a pool of data, which may be structured or unstructured. Fields such as mathematics, statistics, and computer science all play a crucial role in it.

Terms we often hear today, such as data mining, machine learning, deep learning, and big data, are all closely tied to data science.

Typical data science deliverables include prediction, classification, recommendation, pattern recognition, fraud detection, forecasting, and automated processes. The data science lifecycle runs from acquiring data through to drawing conclusions.

No data experiment gets far without programming languages, tools, and techniques.

Here we use Python, a versatile, high-level, general-purpose programming language used for software development, data science, web applications, and more. Whether you are writing a one-line “Hello World” or thousands of lines of code, Python offers a vast collection of libraries that help you get results quickly.

This project visualizes a word cloud from a dataset you load. You can also customize the code along the way, for example by choosing your own data or changing the font size, width, and height of the word cloud.


Prerequisites:

  1. Install Python for Windows or Mac.

  2. Install Anaconda for Windows or Mac.


Requirement:


1. This project uses two Python libraries: Matplotlib and WordCloud.


2. Install these libraries on your system using the following commands:

OS: Windows (run Command Prompt) or Mac (Terminal window)

Matplotlib, used for plotting: pip install matplotlib

WordCloud, used for generating the word cloud: pip install wordcloud
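
After running the two pip commands, a quick check can confirm that Python can find both libraries. The sketch below is an optional addition rather than part of the original project. The helper name check_packages is made up for illustration, and it only checks that each package can be located, not that it works correctly.

```python
from importlib.util import find_spec


def check_packages(names=("matplotlib", "wordcloud")):
    """Return a dict mapping each package name to True if Python can find it."""
    return {name: find_spec(name) is not None for name in names}


# Report the status of the two libraries this project needs.
for name, installed in check_packages().items():
    status = "installed" if installed else f"missing (try: pip install {name})"
    print(f"{name}: {status}")
```

If either library is reported as missing, rerun the matching pip command in the same environment that Jupyter Notebook uses.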


3. Download the datasets used to create the word clouds:


Sports.txt (378 B)

Top Movies.txt (2 KB)

Countries.txt (2 KB)



Steps:


  1. Open the Anaconda application from your system to launch Jupyter Notebook.

  2. Create a new notebook in Jupyter Notebook to start coding.

  3. Import the libraries required to generate a word cloud, as shown in the code below.

  4. Data Analysis: load the dataset and inspect the data.

  5. Data Visualization: generate a word cloud from the dataset by setting the background color, font size, width, and height.
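
For the Data Analysis step, it can help to see which words occur most often before drawing anything. The following is a minimal sketch using only the Python standard library. The helper count_words and the sample sentence are made up for illustration, and WordCloud does its own tokenizing and stop word removal, so its internal counts may differ from these.

```python
import re
from collections import Counter


def count_words(text, top_n=10):
    """Return the top_n most frequent words in text as (word, count) pairs."""
    # Lowercase the text and treat runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)


# Made-up sample text; replace it with the contents of one of the datasets.
sample = "Cricket football cricket tennis football cricket"
print(count_words(sample, top_n=3))
# prints [('cricket', 3), ('football', 2), ('tennis', 1)]
```

The words with the highest counts are the ones you should expect to appear largest in the word cloud.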


Pseudo Code:


Input : Python libraries, datasets
Output : Data-driven word cloud

    # A fun way to create your own text and generate a word cloud
    # Import libraries
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud
    %matplotlib inline

    # Data Visualization: visualize the text as a word cloud
    text = "square"
    plt.figure(figsize=(10, 10))
    # repeat=True repeats words until max_words or min_font_size is reached
    wordcloud = WordCloud(background_color='white', repeat=True).generate(text)
    plt.title("My Data WordCloud")
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()

    # Load a dataset and generate a word cloud
    # Import libraries
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud
    %matplotlib inline

    # Data Analysis: load the dataset and inspect it
    with open("Top Movies.txt", "r") as f:
        dataset = f.read()
    print(dataset)

    # Data Visualization: visualize the data as a word cloud
    plt.figure(figsize=(10, 10))
    wordcloud = WordCloud(background_color='white', min_font_size=10,
                          width=500, height=500).generate(dataset)
    plt.title("My Data WordCloud")
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()

    # Repeat for each dataset in another cell of Jupyter Notebook
    # (Top Movies.txt is shown here; swap in Sports.txt or Countries.txt)
    %matplotlib inline
    from wordcloud import WordCloud, STOPWORDS
    import matplotlib.pyplot as plt

    with open("Top Movies.txt", "r") as f:
        dataset = f.read()

    plt.figure(figsize=(10, 10))
    # STOPWORDS is the built-in English stop word list (also the default)
    wordcloud = WordCloud(background_color='white',
                          stopwords=STOPWORDS).generate(dataset)
    plt.title("wordcloud")
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()

Learning Opportunity:


  1. A word cloud is a data visualization technique for text data in which the size of each word indicates how often it occurs.

  2. Word clouds are often used to analyze text from social networking websites.

  3. Try adjusting the color, size, and number of words in your word cloud.

  4. You can also mask your word cloud into any shape of your choice.

  5. Try all of the datasets with different settings.

  6. You can create your own dataset to build your own word cloud.
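
Points 3 and 4 above can be tried together. The sketch below is one possible approach, not the project's official solution: it builds a circular mask and passes it to WordCloud along with a few styling options. The helper names circle_mask and draw_circular_cloud are made up for illustration, and it assumes NumPy is installed (it is included with Anaconda). In a WordCloud mask, cells with the value 255 (white) are left empty and all other cells can hold words.

```python
def circle_mask(size=400):
    """Build a size x size grid: 0 inside a centered circle, 255 outside."""
    center = (size - 1) / 2
    radius = size / 2
    return [
        [0 if (x - center) ** 2 + (y - center) ** 2 <= radius ** 2 else 255
         for x in range(size)]
        for y in range(size)
    ]


def draw_circular_cloud(text, size=400):
    """Render text as a word cloud confined to a circle and return it."""
    # Third-party imports stay local so circle_mask can be used without them.
    import numpy as np
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    mask = np.array(circle_mask(size), dtype=np.uint8)
    cloud = WordCloud(
        background_color="white",
        colormap="viridis",   # any Matplotlib colormap name
        max_words=50,         # cap on the number of words drawn
        max_font_size=80,     # upper bound on word size
        mask=mask,            # white (255) cells are left empty
    ).generate(text)

    plt.figure(figsize=(8, 8))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()
    return cloud
```

To try it in a notebook cell, read one of the datasets into a string and call draw_circular_cloud(dataset). Changing colormap, max_words, or max_font_size is an easy way to see how each setting affects the result.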


Time Required: 40 mins


Cost <= $5

