Label Studio for Visual Annotations

Author

Michael Achmann-Denkler

Published

January 29, 2024

The setup for visual annotation projects (as well as video and audio projects) includes an additional step: we need to store our media files somewhere accessible to Label Studio. My choice is Google Cloud Storage buckets, as the setup is (relatively) easy. Additionally, the coupons provided by Google are sufficient to pay for hosting our images on the Google Cloud. The manual below is almost identical to my Medium story “How to Accelerate your Annotation Process for Visual Social Media Analysis with Label Studio”. Please follow the steps outlined in the story (first paragraphs and screenshots) to obtain your cloud credentials JSON file. Alternatively, use the credentials provided on GRIPS for our course.

Cloud Bucket Setup & IAM

Important: in contrast to the Medium story, the Label Studio manual points out one additional step. We still need to configure CORS (Cross-Origin Resource Sharing) for our bucket, so that the browser running Label Studio is allowed to load the images.

Open any page inside your cloud project. First, click the terminal symbol on the right-hand side (yellow, 1). The terminal opens and takes a moment to spin up. In the meantime, visit the Label Studio manual to copy the values.

Copy the values from the documentation into the terminal and hit Enter. In the second step, replace the bucket name with yours.

Grant access and wait a few seconds. Your images are now accessible from Label Studio.
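If you prefer to prepare the CORS configuration as a file, here is a minimal Python sketch that writes one to cors-config.json; you would then apply it in the Cloud Shell with `gsutil cors set cors-config.json gs://label-studio-vsma` (using your own bucket name). The rule values (GET requests from any origin, the Content-Type and Access-Control-Allow-Origin response headers, a one-hour max age) are my assumption of what the Label Studio manual recommends, so compare them with the current manual before applying them. The function names are hypothetical.

```python
import json
from pathlib import Path


def build_cors_rules(origins=("*",), max_age_seconds=3600):
    """Return a GCS CORS configuration: a JSON-serialisable list of rules.

    Assumption: these values mirror the Label Studio manual; verify them there.
    """
    return [
        {
            # "*" allows any origin; you may restrict it to your Label Studio URL
            "origin": list(origins),
            # Label Studio only needs to read (GET) the media files
            "method": ["GET"],
            "responseHeader": ["Content-Type", "Access-Control-Allow-Origin"],
            # How long browsers may cache the preflight response, in seconds
            "maxAgeSeconds": max_age_seconds,
        }
    ]


def write_cors_config(path="cors-config.json", rules=None):
    """Write the rules to a JSON file that `gsutil cors set` can read."""
    rules = build_cors_rules() if rules is None else rules
    Path(path).write_text(json.dumps(rules, indent=2))
    return path
```

For example, `write_cors_config(rules=build_cors_rules(origins=["https://label2.digitalhumanities.io"]))` would restrict access to a single Label Studio instance instead of allowing every origin.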

Creating the Annotation Project

First, let's install the packages:

!pip -q install label-studio-sdk gcloud

Next, set up Google Cloud. Please specify the file path of the credentials file (provided via GRIPS or your own) in order to upload images to the Google Cloud bucket.

#@title ## Gcloud Setup

import json
from gcloud import storage
from oauth2client.service_account import ServiceAccountCredentials

gcloud_credentials_path = '/content/vsma-course-2324-72da2075ad3a.json' #@param {type: "string"}
gcloud_bucket = 'label-studio-vsma' #@param {type: "string"}

# Load the service-account key (JSON file from your own project or from GRIPS)
with open(gcloud_credentials_path, 'r') as f:
  credentials_dict = json.load(f)

credentials = ServiceAccountCredentials.from_json_keyfile_dict(
    credentials_dict
)
# Project ID from the original course setup; adjust it if you use your own Google Cloud project
client = storage.Client(credentials=credentials, project='local-grove-153811')
bucket = client.get_bucket(gcloud_bucket)
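Before connecting, it can help to check that the key file really is a service-account key and to read its project ID from the file instead of hard-coding it. The sketch below assumes the standard fields of a Google service-account JSON key (`type`, `project_id`, `client_email`, `private_key`); `validate_service_account_info` and `load_service_account_info` are hypothetical helpers, not part of the `gcloud` package.

```python
import json

# Fields every Google service-account key file is expected to contain (assumption)
REQUIRED_KEYS = ("type", "project_id", "client_email", "private_key")


def validate_service_account_info(info):
    """Return the project ID of a service-account key dict, or raise ValueError."""
    missing = [key for key in REQUIRED_KEYS if key not in info]
    if missing:
        raise ValueError("Credentials are missing fields: " + ", ".join(missing))
    if info["type"] != "service_account":
        raise ValueError(f"Expected a service-account key, got type={info['type']!r}")
    return info["project_id"]


def load_service_account_info(path):
    """Read a JSON key file and return (credentials_dict, project_id)."""
    with open(path, "r", encoding="utf-8") as f:
        info = json.load(f)
    return info, validate_service_account_info(info)
```

Used in the notebook, `credentials_dict, project_id = load_service_account_info(gcloud_credentials_path)` would let you pass `project=project_id` to `storage.Client`.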

Let’s read the dataframe from previous sessions:

import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/2024-01-19-AfD-Stories-Exported.csv')
df.head()
Unnamed: 0.3 Unnamed: 0.2 Unnamed: 0.1 Unnamed: 0 ID Time of Posting Type of Content video_url image_url Username ... Is Verified Stickers Accessibility Caption Attribution URL image_path OCR Objects caption Vertex Caption Ensemble
0 0 0 0 1 2125373886060513565_1484534097 2019-09-04 08:05:27 Image NaN NaN afd.bund ... True [] Photo by Alternative für Deutschland on Septem... NaN /content/media/images/afd.bund/212537388606051... FACEBOOK\nAfD\nf\nSwipe up\nund werde Fan! NaN a collage of a picture of a person flying a kite an ad for facebook shows a drawing of a facebo... Digital and Social Media Campaigning
1 1 1 1 2 2125374701022077222_1484534097 2019-09-04 08:07:04 Image NaN NaN afd.bund ... True [] Photo by Alternative für Deutschland on Septem... NaN /content/media/images/afd.bund/212537470102207... YOUTUBE\nAfD\nSwipe up\nund abonniere uns! NaN a poster of a man with a red face an advertisement for youtube with a red backgr... Digital and Social Media Campaigning
2 2 2 2 3 2490851226217175299_1484534097 2021-01-20 14:23:30 Image NaN NaN afd.bund ... True [] Photo by Alternative für Deutschland on Januar... NaN /content/media/images/afd.bund/249085122621717... TELEGRAM\nAfD\nSwipe up\nund folge uns! NaN a large blue and white photo of a plane an advertisement for telegram with a blue back... Digital and Social Media Campaigning
3 3 3 3 4 2600840011884997131_1484534097 2021-06-21 08:31:45 Image NaN NaN afd.bund ... True [] Photo by Alternative für Deutschland on June 2... NaN /content/media/images/afd.bund/260084001188499... Pol\nBeih 3x Person, 1x Chair, 1x Table, 1x Picture frame a woman sitting at a desk with a laptop two women are sitting at a table talking to ea... Public Engagement
4 4 4 4 5 2600852794831609459_1484534097 2021-06-21 08:57:09 Image NaN NaN afd.bund ... True [] Photo by Alternative für Deutschland in Berlin... NaN /content/media/images/afd.bund/260085279483160... BERLIN, GERMANY\n2160 25.000\nMON 422 150M\nA0... 4x Person, 1x Furniture, 1x Television a man sitting in front of a screen with a tv a camera is recording a man sitting at a table... Traditional Media Campaigning

5 rows × 23 columns

Next, let’s unzip the images:

!unzip /content/drive/MyDrive/2024-01-19-AfD-Stories-Exported.zip

Upload files to Cloud Bucket

We’re using the naming convention {cloud-bucket}/{username}/{id}.jpg. The naming convention is important, as we will use it later on to map the manual and computational annotations into one dataframe (see Identifier in the text annotation project).
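A vectorized sketch of the same naming convention using plain pandas string concatenation, as an alternative to `df.apply` (the helper name `add_gcs_uri_column` is hypothetical). The blob name used by the upload loop is the same string without the `gs://{bucket}/` prefix.

```python
import pandas as pd


def add_gcs_uri_column(df, bucket, column="Image"):
    """Return a copy of df with gs://{bucket}/{Username}/{ID}.jpg in `column`.

    astype(str) makes sure numeric IDs are concatenated as text.
    """
    df = df.copy()
    df[column] = (
        "gs://" + bucket + "/"
        + df["Username"].astype(str) + "/"
        + df["ID"].astype(str) + ".jpg"
    )
    return df
```

In the notebook this would read `df = add_gcs_uri_column(df, gcloud_bucket)`.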

from tqdm.notebook import tqdm

# Build the Label Studio URL for every row: gs://{bucket}/{username}/{id}.jpg
df["Image"] = df.apply(lambda row: "gs://{}/{}/{}.jpg".format(gcloud_bucket, row['Username'], row['ID']), axis=1)

uploaded_count = 0
skipped_count = 0

# Use tqdm for progress bar
for row in tqdm(df.itertuples(), total=len(df), desc="Uploading Images"):
    filename = "{}/{}.jpg".format(row.Username, row.ID)
    source_filename = row.image_path
    blob = bucket.blob(filename)

    if not blob.exists(client):
        try:
            blob.upload_from_filename(source_filename)
            uploaded_count += 1
        except FileNotFoundError:
            print(f"Uploading {source_filename} failed: Missing File")
    else:
        skipped_count += 1

print()
print(f"Uploaded {uploaded_count} images successfully, skipped {skipped_count} existing files.")
Uploading /content/media/images/afd.bund/2632909594311219564_1484534097.jpg failed: Missing File
Uploading /content/media/images/afd.bund/2637169242765597715_1484534097.jpg failed: Missing File
Uploading /content/media/images/afd.bund/2637310044636651340_1484534097.jpg failed: Missing File
Uploading /content/media/images/afd.bund/2640856259194124126_1484534097.jpg failed: Missing File
Uploading /content/media/images/afd.bund/2643802824089930195_1484534097.jpg failed: Missing File
Uploading /content/media/images/afd.bund/2653863205891438589_1484534097.jpg failed: Missing File
Uploading /content/media/images/afd.bund/2664113842957989541_1484534097.jpg failed: Missing File
Uploading /content/media/images/afd.bund/2671444844831156334_1484534097.jpg failed: Missing File

Uploaded 1 images successfully, skipped 171 existing files.
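The log above shows that a few image_path entries point to files that were not part of the unzipped archive. A minimal sketch of checking this before uploading, using a hypothetical helper `split_by_local_file` that separates rows with and without a local file; it assumes image_path holds local file paths, as in the dataframe above.

```python
from pathlib import Path

import pandas as pd


def split_by_local_file(df, path_column="image_path"):
    """Split df into (rows whose local file exists, rows whose file is missing)."""
    exists = (
        df[path_column]
        .fillna("")  # treat missing paths as missing files
        .map(lambda p: bool(p) and Path(p).is_file())
        .astype(bool)
    )
    return df[exists], df[~exists]
```

For instance, `present, missing = split_by_local_file(df)` followed by `print(len(missing))`. Whether to drop the missing rows before sampling tasks is a judgment call, but otherwise annotators may receive tasks whose Image URL points to an object that does not exist in the bucket.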

LabelStudio Setup

Please specify the URL and API key for your Label Studio instance. The key is read from the Colab secret named in labelstudio_key_name.

import json
from google.colab import userdata

labelstudio_key_name = "label2-key" #@param {type: "string"}
labelstudio_key = userdata.get(labelstudio_key_name)
labelstudio_url = "https://label2.digitalhumanities.io" #@param {type: "string"}
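The cell above reads the API key from Colab's secrets via `userdata.get`. If you run the notebook outside Colab, one option is to fall back to an environment variable. The sketch below assumes the variable has exactly the same name as the Colab secret; `get_secret` is a hypothetical helper.

```python
import os


def get_secret(name):
    """Return a secret from Colab's secret store, else from an environment variable.

    Assumption: outside Colab the key is stored in an environment variable
    with exactly the same name as the Colab secret.
    """
    try:
        from google.colab import userdata  # only importable inside Colab
        return userdata.get(name)
    except ImportError:
        value = os.environ.get(name)
        if value is None:
            raise KeyError(f"Secret {name!r} not found in Colab secrets or environment")
        return value
```

Since most shells do not accept hyphens in variable names, a name like LABEL2_KEY (used for both the Colab secret and the environment variable) is easier to handle than label2-key.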

Create LabelStudio Interface

Before creating the Label Studio project, you need to define your labelling interface. Once the project is set up, you can only edit the interface in Label Studio itself.

interface = """
<View style="display:flex;">
  <View style="flex:33%">
    <Image name="Image" value="$Image"/>
  </View>
  <View style="flex:66%">
"""

Add a simple coding interface

Do you want to add codes (classifications) to the images? Please name your coding variable and add options.
By running this cell multiple times, you’re able to add multiple variables (not recommended).

Add the variable name to coding_name and the comma-separated option labels to coding_values, and define in coding_choice whether to expect single-choice or multiple-choice input for this variable.

#@title ### Codes
#@markdown Do you want to add codes (Classification) to the images? Please name your coding instance and add options. <br/> **By running this cell multiple times you're able to add multiple variables (not recommended)**

coding_name = "Sentiment" #@param {type:"string"}
coding_values = "Positive,Neutral,Negative" #@param {type:"string"}
coding_choice = "single" #@param ["single", "multiple"]

# A <Header> with the variable name, followed by a <Choices> block linked to the Image object via toName
coding_interface = '<Header value="{}" /><Choices name="{}" choice="{}" toName="Image">'.format(coding_name, coding_name, coding_choice)

for value in coding_values.split(","):
  value = value.strip()
  coding_interface += '<Choice value="{}" />'.format(value)

coding_interface += "</Choices>"

interface += coding_interface

print("Added {}".format(coding_name))
Added Sentiment

Don’t forget to run the next cell! It closes the interface XML.

interface += """
        </View>
    </View>
    """
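Because the interface is assembled by string concatenation across several cells, skipping a cell or running one twice can leave malformed XML. As a sketch (hypothetical helper names, same layout as the cells above), the whole config can be built in one function, with attribute values escaped by `xml.sax.saxutils.quoteattr`, and then parsed with `xml.etree.ElementTree` before it is sent to Label Studio. Well-formed XML is only a necessary condition; Label Studio still validates the tags itself.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def build_interface(coding_name, coding_values, coding_choice="single"):
    """Assemble the two-column Image + Choices layout as one XML string."""
    choices = "".join(
        f"<Choice value={quoteattr(v.strip())} />"
        for v in coding_values.split(",")
        if v.strip()  # ignore empty entries such as a trailing comma
    )
    return (
        '<View style="display:flex;">'
        '<View style="flex:33%"><Image name="Image" value="$Image"/></View>'
        '<View style="flex:66%">'
        f"<Header value={quoteattr(coding_name)} />"
        f"<Choices name={quoteattr(coding_name)} choice={quoteattr(coding_choice)} toName=\"Image\">"
        f"{choices}</Choices>"
        "</View></View>"
    )


def check_well_formed(xml):
    """Parse the config; raises xml.etree.ElementTree.ParseError if malformed."""
    return ET.fromstring(xml)
```

`check_well_formed(interface.strip())` can be applied to the notebook's own `interface` string as well.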

Project Upload

This final step creates a Label Studio project and configures the interface. Define a project_name and an identifier_column. Additionally, you may define a sample_percentage to annotate only a random sample; we start with 30%. When working with the open-source version of Label Studio, we need to create one project per annotator, so enter the number of annotators in num_copies to create multiple copies at once.

from label_studio_sdk import Client
import contextlib
import io

project_name = "vSMA Image Test 1"  #@param {type: "string"}
identifier_column = "ID"  #@param {type: "string"}
#@markdown Percentage for drawing a sample to annotate, e.g. 30%
sample_percentage = 30  #@param {type: "number", min:0, max:100}
#@markdown Number of project copies. **Start testing with 1!**
num_copies = 1  #@param {type: "number", min:0, max:3}

sample_size = round(len(df) * (sample_percentage / 100))

ls = Client(url=labelstudio_url, api_key=labelstudio_key)


# Draw the sample once, so that every copy receives the same tasks
df_tasks = df[[identifier_column, 'Image']]
df_tasks = df_tasks.sample(sample_size)
df_tasks = df_tasks.fillna("")

for i in range(0, num_copies):
  # Keep the base name intact, otherwise the suffixes accumulate ("... #0 #1")
  copy_title = f"{project_name} #{i}"
  # Create the project
  project = ls.start_project(
      title=copy_title,
      label_config=interface,
      sampling="Uniform sampling"
  )
  # Configure Cloud Storage (in order to be able to view the images)
  project.connect_google_import_storage(bucket=gcloud_bucket, google_application_credentials=json.dumps(credentials_dict))


  with contextlib.redirect_stdout(io.StringIO()):
    project.import_tasks(
          df_tasks.to_dict('records')
        )

  print(f"All done, created project #{i}! Visit {labelstudio_url}/projects/{project.id}/ and get started labelling!")
All done, created project #0! Visit https://label2.digitalhumanities.io/projects/71/ and get started labelling!
Source: Create Label Studio Project (Images)
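The sampling and naming logic of the upload cell can be isolated into two hypothetical helpers, so it can be tried without a Label Studio instance. Passing random_state (an addition not present in the original notebook) makes the sample reproducible when re-running the notebook, and the copy titles are derived from an unchanged base name.

```python
import pandas as pd


def prepare_tasks(df, identifier_column="ID", image_column="Image",
                  sample_percentage=30, random_state=None):
    """Draw the annotation sample and return it as a list of task dicts."""
    sample_size = round(len(df) * (sample_percentage / 100))
    tasks = (
        df[[identifier_column, image_column]]
        .sample(n=sample_size, random_state=random_state)
        .fillna("")  # empty strings instead of NaN in the imported tasks
    )
    return tasks.to_dict("records")


def copy_titles(base_title, num_copies):
    """One project title per annotator copy, numbered from 0."""
    return [f"{base_title} #{i}" for i in range(num_copies)]
```

With 172 rows and 30%, `prepare_tasks(df, random_state=42)` would return round(51.6) = 52 tasks, each a dict with the ID and Image keys.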

Annotation Interface

The interface created using the notebook above is very basic. Refer to this manual for creating more sophisticated labelling interfaces. In contrast to textual annotations, we need to add the <Image name="Image" value="$Image"/> tag as the object to be annotated. The $Image variable must match the name of the dataframe column holding the Google Cloud Bucket URL (Image, see above).
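A rough check that every variable in the label config is present in the task dictionaries before importing them, under the assumption that the config refers to task fields only through plain `$name` tokens. `missing_task_fields` is a hypothetical helper.

```python
import re


def missing_task_fields(label_config, task):
    """Return the $variables used in a label config that a task dict lacks."""
    variables = set(re.findall(r"\$(\w+)", label_config))
    return sorted(v for v in variables if v not in task)
```

Applied to the notebook, `missing_task_fields(interface, df_tasks.to_dict('records')[0])` should return an empty list.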

Conclusion

This article provided additional information on automatically creating image annotation projects. The provided notebook may easily be modified to handle video and audio files: the code to upload the media files to the cloud bucket would stay the same, but we’d have to modify the filenames (for the proper suffixes) and change the labelling interface to be compatible with video or audio files (see Label Studio documentation). This article describes an alternative path within my annotation manual, which offers more background information on human annotations.
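For video or audio, the blob name needs the right suffix and the interface needs the matching object tag (Label Studio provides `<Video>` and `<Audio>` alongside `<Image>`). A sketch of that mapping, following the {username}/{id} convention used above; the extension list and `object_name_and_tag` are assumptions for illustration.

```python
from pathlib import PurePosixPath

# Assumed subset of extensions; extend as needed for your data
MEDIA_TAGS = {
    ".jpg": "Image", ".jpeg": "Image", ".png": "Image",
    ".mp4": "Video", ".webm": "Video",
    ".mp3": "Audio", ".wav": "Audio",
}


def object_name_and_tag(username, post_id, source_path):
    """Build the {username}/{id}{suffix} blob name and pick the matching object tag."""
    suffix = PurePosixPath(source_path).suffix.lower()
    if suffix not in MEDIA_TAGS:
        raise ValueError(f"Unsupported media type: {suffix!r}")
    return f"{username}/{post_id}{suffix}", MEDIA_TAGS[suffix]
```

The returned tag name would then replace Image in the interface, e.g. `<Video name="Video" value="$Video"/>` if the URL column is named Video.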

Reuse

Citation

BibTeX citation:
@online{achmann-denkler2024,
  author = {Achmann-Denkler, Michael},
  title = {Label {Studio} for {Visual} {Annotations}},
  date = {2024-01-29},
  url = {https://social-media-lab.net/evaluation/label-studio-images.html},
  doi = {10.5281/zenodo.10039756},
  langid = {en}
}
For attribution, please cite this work as:
Achmann-Denkler, Michael. 2024. “Label Studio for Visual Annotations.” January 29, 2024. https://doi.org/10.5281/zenodo.10039756.