How to clean up JFrog Artifactory artifacts using a Python script

In this article we look at how to clean up a JFrog Artifactory repository by removing artifacts that have not been modified for longer than a defined period, which an organization can set according to its own policies. We will use Python together with the popular requests module and the standard json module to achieve this.

Test Environment

Fedora 32 installed
Python3 installed

Every project that follows a DevOps CI/CD pipeline to build its code uses some tools for source code management and artifact management. Because a CI/CD pipeline builds, tests and releases code for production in short cycles, many builds and test runs happen before the actual artifact is ready for production deployment. The builds pushed to an Artifactory repository during this process accumulate over time. At some stage, the organization needs to clean up its Artifactory repository to get rid of old, unused builds or tags and free up space on the Artifactory server.

So, let's get started and see how we can clean up JFrog Artifactory artifacts using a Python script.

Procedure

Step1: Import the required libraries

Here we are going to use the five libraries below for our artifact cleanup task. The ‘requests’ module is used to send HTTP/HTTPS requests to JFrog Artifactory to fetch details about the repositories and artifacts. The response is then parsed as JSON using the ‘json’ module so that we can pick out the required fields. The ‘datetime’ module is used to capture the current date, and ‘dateutil.parser’ parses the last modified timestamp of each artifact from the JSON response; together they give us the condition for artifact deletion. The ‘sys’ module is used to exit the script if the search request fails. Note that requests and dateutil (the python-dateutil package) are third-party packages and need to be installed separately.

import requests
import json
import datetime
import sys
import dateutil.parser

Step2: Set the Environment variables

Here we set the variables that hold the Artifactory URL, the headers to send with the request, and the request body (an AQL query in plain text). We also set the username and password, which are passed as basic auth data with the HTTP request, along with the current_date and days_older variables used to prepare our cleanup condition. Note that these are ordinary Python variables in the script rather than operating system environment variables.

url = 'https://jfrog_server_fqdn/artifactory'
headers = {"content-type": "text/plain"}
data = 'items.find({"repo": "repository_name", "name": "manifest.json"})'
username = 'username'
password = 'password'
auth=(username, password)
#current_time = datetime.datetime.now()
current_date = datetime.date.today()
#current_date = datetime.datetime.utcnow().isoformat()
days_older = 730
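
Hard-coding a username and password is fine for a quick test, but the same settings can also be read from the process environment. The sketch below is one possible approach and is not part of the original script; the variable names ARTIFACTORY_URL, ARTIFACTORY_USER, ARTIFACTORY_PASSWORD and ARTIFACTORY_DAYS_OLDER, and the helper load_settings, are hypothetical names chosen for illustration.

```python
import os


def load_settings(env=None):
    """Build the script settings from environment variables.

    All variable names below are hypothetical; the fallbacks are the
    placeholder values used in this article.
    """
    if env is None:
        env = os.environ
    return {
        "url": env.get("ARTIFACTORY_URL", "https://jfrog_server_fqdn/artifactory"),
        "auth": (
            env.get("ARTIFACTORY_USER", "username"),
            env.get("ARTIFACTORY_PASSWORD", "password"),
        ),
        # Environment values are strings, so convert the threshold to int.
        "days_older": int(env.get("ARTIFACTORY_DAYS_OLDER", "730")),
    }


settings = load_settings()
print(settings["url"], settings["days_older"])
```

The returned values can then be used in place of the url, auth and days_older variables defined above.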

Step3: Function to list repositories and tags

In this function we pass the Artifactory URL, headers, request body and authentication data, which are sent as part of an HTTP POST request. Note the REST API path ‘/api/search/aql’ appended to the Artifactory URL: this is the search API that uses the Artifactory Query Language (AQL). We use it to search the repository for items named manifest.json, based on the find condition in the request body.
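
The query text sent to the search API can also be assembled programmatically instead of being typed by hand. The sketch below uses a hypothetical helper named build_aql to produce the same items.find(...) string as the data variable. The optional modified_before filter is a variation that is not in the original script, and it assumes that AQL accepts an ISO 8601 date with the $lt comparison operator, so please check it against the AQL documentation for your Artifactory version.

```python
import datetime
import json


def build_aql(repo, name, modified_before=None):
    """Return an AQL items.find(...) query as plain text."""
    criteria = {"repo": repo, "name": name}
    if modified_before is not None:
        # Assumption: AQL accepts an ISO 8601 date with the $lt operator.
        criteria["modified"] = {"$lt": modified_before.isoformat()}
    return "items.find({})".format(json.dumps(criteria))


# Same query as the 'data' variable in this article.
print(build_aql("repository_name", "manifest.json"))

# Variation: let the server filter on the last modified date.
cutoff = datetime.date(2020, 1, 1)
print(build_aql("repository_name", "manifest.json", cutoff))
```

Filtering on the server side would reduce the number of results the script has to loop over, but the client-side date check used in this article works without it.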

Once we have the response from the AQL search API, we parse it as JSON using the ‘json’ module and fetch the results field, which contains one entry per matching artifact with all its details. From each entry we take the repository name, path and last modified fields to use in the artifact cleanup.

We also prepare a new URL named artifact_url, which consists of the Artifactory URL with the repository name and path appended, giving the complete URL for the respective artifact tag.

def list_repo_tags(url, headers, data, auth):

    response = requests.post(url+'/api/search/aql', headers=headers, data=data, auth=auth)
    #print(response.status_code)
    if response.status_code >= 300:
        print("Unable to search artifacts in artifactory repository. Exiting")
        sys.exit(1)
    #print(response.text)
    #print(response.json())

    json_data = json.loads(response.text)
    tags = json_data["results"]

    for eachItem in tags:
        repo = eachItem['repo']
        path = eachItem['path']
        last_modified_date = eachItem['modified']
        #print('{}, {}, {}'.format(repo, path, last_modified_date))
        # 'path' is the folder that contains manifest.json (the tag folder)
        artifact_url = url+"/"+repo+"/"+path
        #print(artifact_url)
        delete_repo_tags(artifact_url, last_modified_date, current_date, days_older, auth)
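
To make the parsing step easier to follow without a live server, here is a small offline sketch. The sample_response text is hypothetical data shaped like the fields the function reads (repo, path, name, modified), and the helper name artifact_urls is also made up for this illustration.

```python
import json

# Hypothetical response body, shaped like the fields the function reads.
sample_response = '''
{
  "results": [
    {"repo": "repository_name", "path": "myimage/1.0", "name": "manifest.json",
     "modified": "2018-03-01T10:15:30.000Z"},
    {"repo": "repository_name", "path": "myimage/2.0", "name": "manifest.json",
     "modified": "2020-05-20T08:00:00.000Z"}
  ]
}
'''


def artifact_urls(base_url, response_text):
    """Yield (artifact_url, modified) pairs from an AQL response body."""
    for item in json.loads(response_text)["results"]:
        # repo + path addresses the folder that holds manifest.json
        url = "{}/{}/{}".format(base_url, item["repo"], item["path"])
        yield url, item["modified"]


base = "https://jfrog_server_fqdn/artifactory"
for artifact_url, modified in artifact_urls(base, sample_response):
    print(artifact_url, modified)
```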

Step4: Delete the artifacts older than the days_older value

In this function we take the prepared artifact_url and the artifact's last modified timestamp. The timestamp is parsed with dateutil and reduced to a date (formated_date in the code), and both it and current_date are normalized to the “%Y-%m-%d” format using the datetime module. We then subtract the two dates to get the number of days between them. If this number of days is greater than the days_older value, the artifact is considered for deletion using requests.delete, as shown below.
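
The day-difference check described above can also be sketched with the standard library alone, since subtracting two date objects directly gives a timedelta. This is an alternative illustration and not the code used in this article: the helper name days_since_modified is hypothetical, it needs Python 3.7 or later for date.fromisoformat, and it assumes the timestamp starts with a YYYY-MM-DD date.

```python
import datetime


def days_since_modified(last_modified, today):
    """Return whole days between an ISO 8601 timestamp and a date.

    Only the leading YYYY-MM-DD part is used, so the time of day and the
    timezone suffix are ignored (an assumption that suits day-level checks).
    """
    modified_date = datetime.date.fromisoformat(last_modified[:10])
    return (today - modified_date).days


today = datetime.date(2020, 6, 1)
days_older = 730
for stamp in ("2018-01-01T00:00:00.000Z", "2019-01-01T00:00:00.000Z"):
    age = days_since_modified(stamp, today)
    # The third value is True when the artifact is a deletion candidate.
    print(stamp, age, age > days_older)
```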

I have commented out the delete request because it would delete the artifacts from the repository. Please make sure you test this code in a test environment before using it in production.

def delete_repo_tags(artifact_url, last_modified_date, current_date, days_older, auth):

    formated_date = dateutil.parser.isoparse(last_modified_date).date()
    #print(artifact_url)
    #print(formated_date)
    #print(current_date)

    date_format = "%Y-%m-%d"
    x = datetime.datetime.strptime(str(formated_date), date_format)
    y = datetime.datetime.strptime(str(current_date), date_format)
    num_of_days = (y - x).days
    #print(num_of_days)

    if num_of_days > days_older:
        print(artifact_url)
        print(formated_date)
        print(current_date)
        print(num_of_days)
        # requests.delete(artifact_url, auth=auth)
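
Instead of commenting the delete call in and out, a dry-run switch is one way to keep the destructive step under control. The sketch below is an assumption-laden illustration, not the article's code: delete_artifact, its http_delete parameter and the FakeResponse and fake_delete stand-ins are hypothetical names, and the fake is only there so the example runs without a server.

```python
def delete_artifact(artifact_url, auth, http_delete, dry_run=True):
    """Delete one artifact, or only report it when dry_run is True.

    http_delete is any callable with the same shape as requests.delete.
    Returns True only after a real, successful deletion.
    """
    if dry_run:
        print("DRY RUN, would delete: {}".format(artifact_url))
        return False
    response = http_delete(artifact_url, auth=auth)
    if response.status_code >= 300:
        print("Delete failed ({}): {}".format(response.status_code, artifact_url))
        return False
    print("Deleted: {}".format(artifact_url))
    return True


class FakeResponse:
    """Stand-in for a requests response so the sketch runs offline."""
    status_code = 204


def fake_delete(url, auth=None):
    """Stand-in for requests.delete; sends nothing over the network."""
    return FakeResponse()


demo_url = "https://jfrog_server_fqdn/artifactory/repository_name/myimage/1.0"
demo_auth = ("username", "password")
delete_artifact(demo_url, demo_auth, fake_delete)                 # dry run only
delete_artifact(demo_url, demo_auth, fake_delete, dry_run=False)  # calls the fake
```

With the real library, passing requests.delete as http_delete together with dry_run=False would send the actual DELETE request, so it is worth reviewing the dry-run output first.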

Step5: Calling the list_repo_tags function

Now that we have all our variables and function definitions ready, let's call our list_repo_tags function, which starts the execution.

list_repo_tags(url, headers, data, auth)

Step6: Complete Code

Here is the complete code for your reference.

[admin@fedser32 rsk-docker]$ cat clean_up_test.py 
#!/usr/bin/env python3

import requests
import json
import datetime
import sys
import dateutil.parser

### Environment variables

url = 'https://jfrog_server_fqdn/artifactory'
headers = {"content-type": "text/plain"}
data = 'items.find({"repo": "repository_name", "name": "manifest.json"})'
username = 'username'
password = 'password'
auth=(username, password)
#current_time = datetime.datetime.now()
current_date = datetime.date.today()
#current_date = datetime.datetime.utcnow().isoformat()
days_older = 730

#print('"current_time : {}"'.format(current_time))
#print('"current_date : {}"'.format(current_date))

### Delete repository tags older than days_older

def delete_repo_tags(artifact_url, last_modified_date, current_date, days_older, auth):

    formated_date = dateutil.parser.isoparse(last_modified_date).date()
    #print(artifact_url)
    #print(formated_date)
    #print(current_date)

    date_format = "%Y-%m-%d"
    x = datetime.datetime.strptime(str(formated_date), date_format)
    y = datetime.datetime.strptime(str(current_date), date_format)
    num_of_days = (y - x).days
    #print(num_of_days)

    if num_of_days > days_older:
        print(artifact_url)
        print(formated_date)
        print(current_date)
        print(num_of_days)
        # print(requests.delete(artifact_url, auth=auth))

### List repository tags

def list_repo_tags(url, headers, data, auth):

    response = requests.post(url+'/api/search/aql', headers=headers, data=data, auth=auth)
    if response.status_code >= 300:
        print("Unable to search artifacts in artifactory repository. Exiting")
        sys.exit(1)
    #print(response.text)
    #print(response.json())

    json_data = json.loads(response.text)
    tags = json_data["results"]

    for eachItem in tags:
        repo = eachItem['repo']
        path = eachItem['path']
        last_modified_date = eachItem['modified']
        #print('{}, {}, {}'.format(repo, path, last_modified_date))
        # 'path' is the folder that contains manifest.json (the tag folder)
        artifact_url = url+"/"+repo+"/"+path
        #print(artifact_url)
        delete_repo_tags(artifact_url, last_modified_date, current_date, days_older, auth)

list_repo_tags(url, headers, data, auth)

Hope you enjoyed reading this article. Thank you.

2 COMMENTS

Danibe

Hi,

thank you very much for this article!
Please, can I ask you how do you use it for nuget repositories?

Thank you.

Regards,
Daniela

    novicejava1

    Thanks. As of now I haven’t had an opportunity to work on nuget repositories. That said, there should be a similar procedure to identify your repositories, list packages or artifacts, and remove artifacts older than a particular timestamp. I haven’t worked with .NET, so I’m not sure how it can be achieved there.