Machine Learning has become part of our everyday lives, encountered in music recommendations, search engines, or virtual assistants on our smartphones. Additionally, Machine Learning is now involved in many aspects of business, such as detecting unusual behaviors in a production system, understanding consumer behavior for forecasting, and analyzing medical reports.
Learning a basic Machine Learning algorithm or solving an advanced problem has become easier thanks to easy access to information on the internet and the many open-source tools that save us a lot of time. To develop a software program or solve a problem with a Machine Learning algorithm, it is important to write clear code that produces and presents the algorithm's result. The code should cover all the necessary tasks in a clear structure. In the daily life of a data scientist, you will sometimes write code for a task you have already implemented in a past project. The usual approach is to copy the existing code and paste it where it is needed. This can save a lot of time if the task is complicated or time-consuming, because the code does not need to be rewritten.
However, sometimes a data scientist is not primarily interested in how a particular task works and only needs the result. In that case, it helps to have a reference collection of code snippets that lets you implement the tasks needed for the project quickly.
Coding notebooks
For data scientists, hardly a day goes by without writing a code snippet. Besides reading research papers, writing code for testing and further modifications is essential. In addition to developing plain scripts such as Python files, there are many tools for Machine Learning that enable code development in an interactive environment, such as Jupyter Notebook and Google Colab. Both are considered efficient for developing code before scripts are written for production. In the data science community, Jupyter Notebook is an indispensable tool for research and project development. The ability to run code cell by cell and to use interactive visualizations has made Jupyter Notebook enormously popular in recent years.
For these reasons, it is becoming easier to learn programming, develop software tools, or solve practical problems using Machine Learning. A typical software project involves writing a lot of code. But we have found that sometimes you want to work on a complex task without worrying about simple programming tasks like importing basic libraries, loading data, or visualizing data. Or you might want to save some coding tricks you used in previous projects to make your work easier. There are also so many different ways to write Python code for a specific data science project that finding the right code can be overwhelming for new data scientists. For all these reasons and more, we developed an extension for Jupyter environments, the Snippet Library, that stores code snippets you can use whenever you need them and provides basic code snippets for a typical data science project.
How does the Snippet Library support data scientists?
The idea behind the Snippet Library is to provide a tool for rapid prototyping, so that data analysis workflows can be developed efficiently and easily. It seeks to combine the advantages of graphical tools like KNIME or RapidMiner with those of programming languages like Python. The Snippet Library is a menu extension for Jupyter Notebook. It adds a customizable menu for inserting code snippets, code examples, and boilerplate code into notebooks.
The extension’s dropdown menu includes several snippets for direct use in a notebook. A data scientist can quickly access basic code for many common questions, such as: How do I read a CSV file? How do I visualize missing values in a Pandas DataFrame? How do I display the correlation matrix of my Pandas DataFrame (and make it look good)? There is a snippet for each of these everyday tasks.
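To give a feeling for the kind of code such snippets typically contain, here is a minimal sketch written directly with Pandas. It is not taken from the Snippet Library itself: the sample data and the helper names `load_csv`, `missing_summary`, and `correlation_table` are made up for illustration, and the missing values are only counted rather than plotted.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk.
CSV_TEXT = """age,income,score
25,40000,0.5
32,,0.7
47,82000,
51,90000,0.9
"""


def load_csv(source):
    """Read CSV data from a path or file-like object into a DataFrame."""
    return pd.read_csv(source)


def missing_summary(df):
    """Count the missing values in each column."""
    return df.isna().sum()


def correlation_table(df, decimals=2):
    """Pairwise Pearson correlation of the numeric columns, rounded."""
    return df.select_dtypes("number").corr().round(decimals)


df = load_csv(io.StringIO(CSV_TEXT))
missing = missing_summary(df)
corr = correlation_table(df)

print(missing)
print(corr)
```

In a real project you would pass a file path to `load_csv` instead of the in-memory text. The value of a snippet menu is that small helpers like these do not have to be retyped or searched for each time.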
How to get the Snippet Library running
To use the extension, you first need to install the Snippet Library via pip:
- pip install snippetlib
If you want to try the Snippet Library in JupyterLab, install the following package instead:
- pip install snippetlib_jl
Sometimes the JupyterLab extension does not install properly via pip. If that happens, try installing it from the GitHub repo instead.
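If you are unsure whether the installation of the notebook package worked, a quick check from a notebook cell can help. The sketch below uses only the Python standard library. The import name `snippetlib` is taken from the import statements shown later in this article, and `is_installed` is a helper name chosen for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the current interpreter can find the module."""
    return importlib.util.find_spec(module_name) is not None


# "snippetlib" is the import name used in the commands later in this article.
if is_installed("snippetlib"):
    print("snippetlib was found in this environment.")
else:
    print("snippetlib was not found; check that pip installed it "
          "into the same environment that runs Jupyter.")
```

A common reason for a "not found" result is that pip and Jupyter use different Python environments.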
Using existing code snippets: The menu offers several snippets, grouped by typical tasks in data exploration and visualization:
- Data: includes snippets for reading, writing, and transforming data.
- Modeling: contains snippets for training Machine Learning models, mostly based on sklearn. The code examples include classification, regression, and clustering tasks.
- Plotting: includes snippets for plotting data with Matplotlib, Bokeh, and Pandas.
- Utils: contains snippets that are useful but not mentioned in the above categories, such as creating a Flask server, an interactive table, and more.
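As an illustration of what a snippet in the Modeling category might look like, the following sketch trains a simple classifier with sklearn. It is not copied from the Snippet Library: the dataset, the model, and the function name `train_and_score` are assumptions chosen to keep the example short and self-contained.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def train_and_score(test_size=0.25, random_state=0):
    """Train a classifier on the iris data and return it with its test accuracy."""
    X, y = load_iris(return_X_y=True)

    # Hold out part of the data so the score reflects unseen samples.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y
    )

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


model, accuracy = train_and_score()
print(f"Test accuracy: {accuracy:.2f}")
```

The same pattern of splitting, fitting, and scoring carries over to regression and clustering, with a different estimator and metric.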
Adding custom snippets to the library: The Snippet extension allows users to add their own snippets to the library. There are two ways:
First, snippets can be uploaded from files. To upload a new snippet, run the following Python commands in a notebook cell:
- *from snippetlib import upload_snippet as us*
- *upload_snippets = us.Upload_Snippet()*
After refreshing the page once, the newly added snippet should be visible in the snippet menu. Alternatively, the code snippet can be copied from the cell and directly saved in the Snippet Library.
Creating new snippets: To create a new snippet from within Jupyter Notebook, run the following commands in a notebook cell:
- *from snippetlib import paste_snippet as ps*
- *paste_snippets = ps.Paste_Snippet()*
Again, after refreshing the page once, the newly added snippet should be visible in the snippet menu.
Now it’s up to you: Try out the library and use it to simplify some of your work steps. The Snippet Library lets data scientists prototype rapidly and build data analysis workflows with less effort. With the option to insert snippets using the extension, you can focus on more important tasks in your projects and worry less about re-implementing old code.