Updated: Jun 29, 2020
In this data science project, learn how to design your own word cloud using Python.
A word cloud is a great way to visually represent words and phrases according to their importance.
In this project, go through the 3 given datasets and create word clouds. Once you are comfortable, try it on your own datasets.
Data science, in simpler terms, is the study of data using various algorithms and processes. Technically, this field helps in deriving useful information from a pool of data, which may be structured or unstructured. Fields like mathematics, statistics, and computer science play a crucial role in how that data is analyzed and understood.
In the present era of technological advancement, we often hear terms like data mining, machine learning, deep learning, and big data, all of which are closely tied to data science.
Data science deliverables include prediction, classification, recommendation, pattern recognition, fraud detection, forecasting, and automated processes. It is also worth learning about the lifecycle, or phases, of data science, from acquiring data to drawing conclusions.
As we proceed, keep in mind that no data experiment is possible without programming languages, tools, and techniques.
Here, we use Python, a versatile, high-level, general-purpose programming language that can be used for software development, data science, building web applications, and more. Moreover, whether you are writing a single line of code that prints “Hello World” or thousands of lines, Python comes with a vast collection of libraries that are both useful and quick at producing the results you need.
This project is about visualizing a word cloud from an uploaded dataset. It also gives you the freedom to change the code along the way, for example by choosing your own data or changing the font size, width, and height of the word cloud.
Install Python for Windows or Mac.
Install Anaconda for Windows or Mac.
1. To implement this project, we need a few Python libraries, such as Matplotlib and WordCloud.
2. Install these libraries on your system using the following commands:
OS: Windows (run Command Prompt) or Mac (Terminal window)
Matplotlib, used for plotting: pip install matplotlib
WordCloud, used for visualizing: pip install wordcloud
3. Download Datasets to deep dive into creating the word clouds.
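Before moving on, it can help to confirm that the libraries were actually installed into the Python environment you are using. The following is a minimal sketch, not part of the original project; the helper name missing_packages is hypothetical, and it relies only on the standard library.

```python
import importlib.util

# Import names of the libraries installed with pip in the step above.
REQUIRED = ["matplotlib", "wordcloud"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

If a library is reported as missing, rerun the matching pip install command in the same environment that Jupyter Notebook uses.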
Open the Anaconda application from your system to launch Jupyter Notebook.
Create a new notebook in Jupyter Notebook to start coding.
Import all the required libraries to generate a word cloud, as shown in the pseudo-code.
Data Analysis: upload the dataset and analyze the data
Data Visualization: generate a word cloud from the dataset by setting the background color, font size, width, height.
Input: Import Python libraries, datasets
Output: Data-driven Word Cloud

# A fun way to create your own text and generate a word cloud
# Importing libraries
import matplotlib.pyplot as plt
from wordcloud import WordCloud
%matplotlib inline

# Data Visualization: visualize the data using a word cloud
text = "square"
plt.figure(figsize=(10, 10))
# repeat=True lets the single word be drawn repeatedly to fill the image
wordcloud = WordCloud(background_color='white', repeat=True).generate(text)
plt.title("My Data WordCloud")
plt.imshow(wordcloud)
plt.axis("off")
plt.show()

# Upload datasets and generate a word cloud
# Importing libraries
import matplotlib.pyplot as plt
from wordcloud import WordCloud
%matplotlib inline

# Data Analysis: upload the dataset and analyze the data
with open("Top Movies.txt", "r") as f:
    dataset = f.read()
print(dataset)

# Data Visualization: visualize the data using a word cloud
plt.figure(figsize=(10, 10))
wordcloud = WordCloud(background_color='white', min_font_size=10, width=500, height=500).generate(dataset)
plt.title("My Data WordCloud")
plt.imshow(wordcloud)
plt.axis("off")
plt.show()

# Repeat for each dataset (Top Movies.txt shown here) in another cell of the Jupyter Notebook
%matplotlib inline
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

with open("Top Movies.txt", "r") as f:
    dataset = f.read()
plt.figure(figsize=(10, 10))
wordcloud = WordCloud(background_color='white').generate(dataset)
plt.title("wordcloud")
# Bilinear interpolation smooths the rendered image
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()
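The last cell above imports STOPWORDS without using it. As a hedged sketch of how it could be put to work, the code below merges the library's built-in stopword set with a few extra words and passes the result through the stopwords parameter of WordCloud. The helper names merge_stopwords and build_cloud are hypothetical, and the wordcloud import is done lazily inside the function so that the helper can be tried on its own.

```python
def merge_stopwords(base, extra):
    """Combine a base stopword collection with extra words, lower-cased."""
    return set(base) | {word.lower() for word in extra}


def build_cloud(text, extra_stopwords=()):
    """Build a word cloud that skips common words plus any extras given."""
    # Imported here so merge_stopwords works even without wordcloud installed.
    from wordcloud import WordCloud, STOPWORDS

    stopwords = merge_stopwords(STOPWORDS, extra_stopwords)
    return WordCloud(background_color="white", stopwords=stopwords).generate(text)
```

In a notebook cell, something like plt.imshow(build_cloud(dataset, ["movie"])) would then draw the cloud without the word "movie", assuming dataset holds the text read from your file.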
A word cloud is a data visualization technique for representing text data, where the size of each word indicates how frequently it occurs.
It is mostly used for analyzing data from social networking websites.
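To make the link between word size and frequency concrete, here is a small sketch that counts word occurrences with the standard library. The sample text and the helper names are hypothetical; the rendering step uses generate_from_frequencies from the wordcloud library, which accepts a dictionary of word counts, and is imported lazily so that the counting part runs on its own.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word occurs, ignoring case and punctuation."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


def cloud_from_frequencies(freqs):
    """Render a word cloud in which more frequent words are drawn larger."""
    from wordcloud import WordCloud  # needs: pip install wordcloud

    return WordCloud(background_color="white").generate_from_frequencies(freqs)


# Hypothetical sample text; replace it with the contents of your dataset.
sample = "data science loves data and data loves python"
freqs = word_frequencies(sample)
print(freqs.most_common(2))  # [('data', 3), ('loves', 2)]
```

Here "data" occurs three times, so it would be the largest word in the resulting cloud.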
You can adjust the color, size, and number of words in your word cloud.
One of the best ways to customize it is to mask your word cloud into any shape of your choice.
Try generating word clouds for all the datasets with different settings.
You can also create your own dataset to build your own word cloud.
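As one possible way to try the masking and customization ideas above, the sketch below builds a circular mask and passes it to WordCloud together with the max_words and colormap parameters. The helper names circle_mask and circular_cloud and the chosen values are assumptions for illustration. In the wordcloud library, pure white (255) mask cells are treated as masked out, so the circle is filled with 0 and the corners with 255.

```python
def circle_mask(size):
    """Return a size x size grid: 0 inside a centered circle (drawable),
    255 outside it (masked out)."""
    center = radius = size / 2
    return [
        [0 if (x - center) ** 2 + (y - center) ** 2 <= radius ** 2 else 255
         for x in range(size)]
        for y in range(size)
    ]


def circular_cloud(text, size=400):
    """Generate a word cloud confined to a circle."""
    # Imported here so circle_mask can be tried without these libraries.
    import numpy as np
    from wordcloud import WordCloud

    mask = np.array(circle_mask(size), dtype=np.uint8)
    return WordCloud(
        background_color="white",
        mask=mask,            # the shape to fill; overrides width and height
        max_words=100,        # limit on the number of words drawn
        colormap="viridis",   # Matplotlib colormap used for word colors
        repeat=True,          # reuse words to fill the shape for short texts
    ).generate(text)
```

The result can be displayed the same way as the other clouds, with plt.imshow followed by plt.axis("off") and plt.show().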
Time Required: 40 mins
Cost <= $5