Accessing Files from SystemLink™ Server in JupyterHub

This blog is part of the series Leverage SystemLink™ for Machine Learning. Data is central to any analytics or machine learning work. This article explains how to access and manipulate data held in SystemLink™ internal storage from JupyterHub using the Python APIs. We will cover the following:

1. Review of Files in SystemLink™
2. How to upload a file into SystemLink™ Server from JupyterHub
3. How to update metadata of files in SystemLink™ Server
4. How to download/read a file from SystemLink™ server

 

Key Component – Python APIs

SystemLink™ provides the “systemlink” Python library, which contains the Python APIs for accessing data (files and tags) on the SystemLink™ server. These APIs are installed as part of SystemLink™ and can be imported and used inside Jupyter notebooks created in SystemLink™’s JupyterHub plugin, as the following sections show.

Before we look at how to access the files, you can check out this article for Understanding SystemLink™’s file system.

 

Querying and downloading a file from SystemLink™ Server

For training a machine learning model, you might want to read the training data from a set of files stored in SystemLink™. This is a two-step process:

1) Get the file_ids of the list of files
2) Read/Download the files

Let’s use an example where you want to read all the files with the value of property “shift” as “morning”.

 

Step 1: Import the FileIngestionClient and messages, and query for the file IDs

from systemlink.fileingestionclient import messages as fileingestion_messages
from systemlink.fileingestionclient import FileIngestionClient
fileingestionclient = FileIngestionClient(service_name='FileIngestionClient')

The above snippet imports the messages module and the FileIngestionClient class, and creates an instance of the client.

Next, we create a query that selects the files with the required property value, so that we can collect their file IDs.

Below is the query that would filter all the files with the value of property “shift” as “morning”.

equal_op = fileingestion_messages.QueryOperator(fileingestion_messages.QueryOperator.EQUAL)
query_shift = fileingestion_messages.StringQueryEntry("shift", "morning", equal_op)

We can run the following commands to obtain the query results:

res = fileingestionclient.query_files(properties_query=[query_shift])
file_ids = [file.id for file in res.available_files]

file_ids now holds the unique IDs of the files whose “shift” property is “morning”.

Similarly, a query on the metadata property named “Name” can be used to get the ID of a file by its name.
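
The query steps above can be wrapped in a small helper so that the same code serves “shift”, “Name”, or any other string property. This is a minimal sketch: find_file_ids is a hypothetical helper, not part of the systemlink library, and it assumes only the calls already shown in this article (QueryOperator, StringQueryEntry, query_files and available_files). The client and the messages module are passed in as arguments.

```python
def find_file_ids(client, messages, property_name, property_value):
    """Return the IDs of all files whose property equals the given value.

    `client` is assumed to be a FileIngestionClient instance and
    `messages` the fileingestion_messages module imported above.
    This helper is hypothetical and only reuses calls shown in the article.
    """
    equal_op = messages.QueryOperator(messages.QueryOperator.EQUAL)
    query = messages.StringQueryEntry(property_name, property_value, equal_op)
    result = client.query_files(properties_query=[query])
    return [file.id for file in result.available_files]
```

In a notebook this could be called as find_file_ids(fileingestionclient, fileingestion_messages, "shift", "morning").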

 

Step 2: Once you have the ID of a file, you can download it to a given path using the download_file API, as shown below:

fileingestionclient.download_file(file_id, file_path)

file_id is the ID of the file to be downloaded and file_path is the local path where the file is to be saved.
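
To read a whole set of training files, the download call can be repeated for each ID collected in Step 1. The following is a minimal sketch: download_all is a hypothetical helper, not part of the systemlink library, and it assumes only that the client exposes download_file(file_id, file_path) as described above. Naming each local file after its ID is also an assumption made for illustration.

```python
import os


def download_all(client, file_ids, target_dir):
    """Download every file in file_ids into target_dir.

    `client` is assumed to expose download_file(file_id, file_path),
    as FileIngestionClient does in the snippet above. This helper is
    hypothetical and not part of the systemlink library.
    """
    os.makedirs(target_dir, exist_ok=True)
    local_paths = {}
    for file_id in file_ids:
        # Assumption: name each local file after its ID to avoid clashes.
        file_path = os.path.join(target_dir, str(file_id))
        client.download_file(file_id, file_path)
        local_paths[file_id] = file_path
    return local_paths
```

In a notebook this could be called as download_all(fileingestionclient, file_ids, "training_data"), and the returned dictionary maps each file ID to its local path.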

 

Modifying file metadata in SystemLink™ Server

The metadata of any file in SystemLink™ can be added or modified manually by selecting the file in SystemLink™’s file viewer. The same can be done from Jupyter notebooks using the Python APIs, as shown below:

(Read this article for Understanding SystemLink™’s file system and to know more about file metadata)

from systemlink.fileingestionclient import FileIngestionClient
fileingestionclient = FileIngestionClient(service_name='FileIngestionClient')
fileingestionclient.update_file_metadata(file_id, replace_existing, properties)

The above snippet imports the FileIngestionClient class, creates an instance of it, and calls the update_file_metadata method.

update_file_metadata is the API used to add or update metadata on a file that already exists on the SystemLink™ Server. file_id is the ID of the file whose metadata is to be updated. replace_existing is a boolean parameter: set it to True to remove all existing metadata and store only the new properties, or to False to preserve the existing metadata and add or overwrite entries based on the new values. properties is the metadata to be added or updated. It should be a Python dictionary of key-value pairs, where each key is a metadata name and each value is the corresponding metadata value.
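
The effect of replace_existing is easiest to see on a small example. The sketch below is a local model of the behavior described above, written in plain Python; it does not call the server and is not the server's implementation. The function merged_metadata and the property names are made up for illustration, and using string values is an assumption.

```python
def merged_metadata(existing, new_properties, replace_existing):
    """Model the metadata a file would hold after an update (local only)."""
    if replace_existing:
        # All existing metadata is dropped; only the new properties remain.
        return dict(new_properties)
    # Existing metadata is kept; new values overwrite matching keys.
    merged = dict(existing)
    merged.update(new_properties)
    return merged


existing = {"shift": "morning", "operator": "A"}
properties = {"shift": "evening", "station": "3"}

print(merged_metadata(existing, properties, replace_existing=True))
# {'shift': 'evening', 'station': '3'}
print(merged_metadata(existing, properties, replace_existing=False))
# {'shift': 'evening', 'operator': 'A', 'station': '3'}
```

A dictionary such as properties above is what would be passed as the third argument of update_file_metadata.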

 

Uploading a file to SystemLink™ Server

Files of any format can be uploaded to and stored on the SystemLink™ Server. For example, you might want to store a trained machine learning model, some internal parameters, a CSV file, or a Python object. Any of these can be uploaded from a Jupyter notebook by making use of the Python APIs, as shown below.

from systemlink.fileingestionclient import FileIngestionClient
fileingestionclient = FileIngestionClient(service_name='FileIngestionClient')
fileingestionclient.upload_file(local_file_path)

In the above snippet, we are importing the FileIngestionClient class, creating an instance of it and using the upload_file method to upload a local file to SystemLink™ Server.

local_file_path is the path of the local file which is to be uploaded.

The upload_file method returns a unique file ID, which can be used to download that file from SystemLink™. The name of the local file is stored as one of the file's metadata properties, so it can be used in a query to get the ID of the file on the SystemLink™ Server.
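
As an example of storing a Python object such as a trained model, the object can first be written to a local file and that file then uploaded. This is a minimal sketch: upload_object is a hypothetical helper, not part of the systemlink library, and it assumes only that the client exposes upload_file(local_file_path) and returns the file ID, as described above. Using pickle and a temporary directory are choices made for illustration.

```python
import os
import pickle
import tempfile


def upload_object(client, obj, file_name):
    """Pickle obj to a temporary file, upload it, and return the file ID.

    `client` is assumed to expose upload_file(local_file_path), as
    FileIngestionClient does in the snippet above. This helper is
    hypothetical and not part of the systemlink library.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        # The local file name is what gets stored as metadata on upload.
        local_file_path = os.path.join(tmp_dir, file_name)
        with open(local_file_path, "wb") as handle:
            pickle.dump(obj, handle)
        # Upload before the temporary directory is cleaned up.
        return client.upload_file(local_file_path)
```

In a notebook this could be called as file_id = upload_object(fileingestionclient, model, "model.pkl"), where model is any picklable object.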

 

Summary

In this article, we have seen how to use the Python APIs in JupyterHub to access files in SystemLink™.

The complete documentation of the Python APIs can be accessed by appending /niapis/python/ to the address of your SystemLink™ server in your browser. For example, http://localhost/niapis/python/.

Refer to the Jupyter folder in the SystemLink™ GitHub repository for Python API documentation and examples.

 

Written by

Raghul Ravichandran

May 1, 2019