Install the label-studio-sdk package for programmatic control of Label Studio:
In [1]:
!pip -q install label-studio-sdk
Next, let’s read the text master file from the previous sessions:
In [5]:
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/2023-12-01-Export-Posts-Text-Master.csv')
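In my video on GPT text classification, I mentioned the problem of the unique identifier: we also need a unique identifier for the annotations. Because a single post can contribute several texts of different types, the shortcode alone is not unique, so we combine the shortcode with the Text Type. Use the code below in our text classification notebook when working with multi-document classifications!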
In [7]:
df['identifier'] = df.apply(lambda x: f"{x['shortcode']}-{x['Text Type']}", axis=1)
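Before relying on the new column, it can help to confirm that the identifiers really are unique. The sketch below mirrors the df.apply call on plain Python records and raises if any identifier repeats. The helper make_identifiers and the sample rows are hypothetical; the "OCR" text type in particular is assumed for illustration only.

```python
from collections import Counter

def make_identifiers(records):
    """Build '<shortcode>-<Text Type>' identifiers and fail on duplicates.

    Mirrors the df.apply(...) line above; `records` is a list of dicts
    with 'shortcode' and 'Text Type' keys (hypothetical helper).
    """
    ids = [f"{r['shortcode']}-{r['Text Type']}" for r in records]
    duplicates = [i for i, n in Counter(ids).items() if n > 1]
    if duplicates:
        raise ValueError(f"Identifiers are not unique: {duplicates}")
    return ids

# Hypothetical rows: one post contributes two different text types.
rows = [
    {"shortcode": "CyMAe_tufcR", "Text Type": "Caption"},
    {"shortcode": "CyMAe_tufcR", "Text Type": "OCR"},  # assumed type
    {"shortcode": "CyL975vouHU", "Text Type": "Caption"},
]
print(make_identifiers(rows))
```

On the DataFrame itself, the equivalent check is df['identifier'].is_unique, which should be True before the annotations are exported.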
In [8]:
df.head()
| | Unnamed: 0 | shortcode | Text | Text Type | Policy Issues | identifier |
|---|---|---|---|---|---|---|
| 0 | 0 | CyMAe_tufcR | #Landtagswahl23 🤩🧡🙏 #FREIEWÄHLER #Aiwanger #Da... | Caption | ['1. Political parties:\n- FREIEWÄHLER\n- Aiwa... | CyMAe_tufcR-Caption |
| 1 | 1 | CyL975vouHU | Die Landtagswahl war für uns als Liberale hart... | Caption | ['Landtagswahl'] | CyL975vouHU-Caption |
| 2 | 2 | CyL8GWWJmci | Nach einem starken Wahlkampf ein verdientes Er... | Caption | ['1. Wahlkampf und Wahlergebnis:\n- Wahlkampf\... | CyL8GWWJmci-Caption |
| 3 | 3 | CyL7wyJtTV5 | So viele Menschen am Odeonsplatz heute mit ein... | Caption | ['Israel', 'Terrorismus', 'Hamas', 'Entwicklun... | CyL7wyJtTV5-Caption |
| 4 | 4 | CyLxwHuvR4Y | Herzlichen Glückwunsch zu diesem grandiosen Wa... | Caption | ['1. Wahlsieg und Parlamentseinstieg\n- Wahlsi... | CyLxwHuvR4Y-Caption |
LabelStudio Setup
Please specify the URL and API key for your Label Studio instance. The key is read from Colab's Secrets under the name label2-key.
In [12]:
import json
from google.colab import userdata

labelstudio_key_name = "label2-key"
labelstudio_key = userdata.get(labelstudio_key_name)
labelstudio_url = "https://label2.digitalhumanities.io"
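The cell above only works inside Google Colab, where userdata.get reads from the notebook's Secrets panel. If you run the notebook elsewhere, a minimal sketch like the following falls back to an environment variable. The helper name get_labelstudio_key and the variable name LABEL_STUDIO_API_KEY are assumptions, not part of the original notebook.

```python
import os

def get_labelstudio_key(secret_name="label2-key", env_var="LABEL_STUDIO_API_KEY"):
    """Return the Label Studio API key (hypothetical helper).

    Tries Colab's Secrets store first; outside Colab the import fails,
    so it falls back to the (assumed) environment variable.
    """
    try:
        from google.colab import userdata
        return userdata.get(secret_name)
    except ImportError:
        pass  # not running in Colab
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"No API key found in Colab secrets or ${env_var}")
    return key
```

Usage: labelstudio_key = get_labelstudio_key(). Keeping the key out of the notebook text means it is not leaked when the notebook is shared.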
Create LabelStudio Interface
Before creating the Label Studio project, you need to define your labelling interface. Once the project is set up, the interface can only be edited within Label Studio itself.
In [9]:
interface = """
<View style="display:flex;">
  <View style="flex:33%">
    <Text name="Text" value="$Text"/>
  </View>
  <View style="flex:66%">
"""
Add a simple coding interface
Do you want to add codes (classifications) to the texts? Please name your coding variable and add its options.
Running this cell multiple times, each time with a different coding_name, adds multiple variables (not recommended).
Add the variable name to coding_name and the comma-separated checkbox labels to coding_values, and set coding_choice to single or multiple to define whether this variable expects single-choice or multiple-choice input.
In [8]:
coding_name = "Sentiment"
coding_values = "Positive,Neutral,Negative"   # comma-separated option labels
coding_choice = "single"                      # "single" or "multiple"

# Open a header and a Choices block bound to the Text tag
coding_interface = '<Header value="{}" /><Choices name="{}" choice="{}" toName="Text">'.format(coding_name, coding_name, coding_choice)

# Add one <Choice> per option
for value in coding_values.split(","):
    value = value.strip()
    coding_interface += '<Choice value="{}" />'.format(value)

coding_interface += "</Choices>"

# Append this variable to the interface opened above
interface += coding_interface
print("Added {}".format(coding_name))
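The cell above inserts the option labels into the XML without escaping, so a label containing characters such as & or a double quote would produce an invalid configuration. The sketch below builds the same snippet with xml.sax.saxutils.quoteattr, which escapes the value and adds the surrounding quotes. The function name build_coding_interface is hypothetical; the accepted modes (single, single-radio, multiple) are those of Label Studio's Choices tag.

```python
from xml.sax.saxutils import quoteattr

def build_coding_interface(name, values, choice="single"):
    """Return a <Header>/<Choices> snippet for one coding variable.

    `values` is the comma-separated string used in coding_values above.
    quoteattr() escapes &, <, > and quotes, and adds the surrounding quotes.
    """
    if choice not in ("single", "single-radio", "multiple"):
        raise ValueError(f"Unsupported choice mode: {choice!r}")
    parts = [
        f"<Header value={quoteattr(name)} />",
        f'<Choices name={quoteattr(name)} choice={quoteattr(choice)} toName="Text">',
    ]
    for value in values.split(","):
        value = value.strip()
        if value:  # skip empty entries, e.g. from a trailing comma
            parts.append(f"<Choice value={quoteattr(value)} />")
    parts.append("</Choices>")
    return "".join(parts)

print(build_coding_interface("Sentiment", "Positive,Neutral,Negative"))
```

In the notebook, interface += build_coding_interface(coding_name, coding_values, coding_choice) would replace the string formatting in the cell above.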
Finally, run the closing cell (In [10]) to close the XML of the annotation interface. Run this cell even if you do not want to add any variables at the moment!
In [10]:
interface += """
  </View>
</View>
"""
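Because the interface is assembled from string fragments across several cells, it is easy to end up with unbalanced tags, for example by skipping the closing cell or re-running a cell. A quick well-formedness check with the standard library's xml.etree.ElementTree can catch this before the project is created. The helper check_interface and the variable example_interface are hypothetical; Label Studio still applies its own validation when the project is created, and this check only covers XML syntax plus repeated Choices names.

```python
import xml.etree.ElementTree as ET

def check_interface(label_config):
    """Parse the label config and return the names of its Choices tags.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed,
    and ValueError if two Choices share a name (likely a mistake, e.g. the
    coding cell was run twice with the same coding_name).
    """
    root = ET.fromstring(label_config.strip())
    names = [c.get("name") for c in root.iter("Choices")]
    if len(names) != len(set(names)):
        raise ValueError(f"Duplicate Choices names: {names}")
    return names

# Hypothetical example; in the notebook you would pass `interface` instead.
example_interface = """
<View style="display:flex;">
  <View style="flex:33%">
    <Text name="Text" value="$Text"/>
  </View>
  <View style="flex:66%">
<Header value="Sentiment" /><Choices name="Sentiment" choice="single" toName="Text"><Choice value="Positive" /></Choices>
  </View>
</View>
"""
print(check_interface(example_interface))  # ['Sentiment']
```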
Project Upload
This final step creates a Label Studio project and configures the interface. Define a project_name, and select the text_column and the identifier_column. The text_column must match the $Text variable used in the Text tag of the interface, because each column name becomes a task variable. Optionally, set a sample_percentage to annotate only a random sample; we start with 30%. When working with the open-source version of Label Studio, we need to create one project per annotator, so enter the number of annotators in num_copies to create multiple copies at once.
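The sample size is computed as round(len(df) * (sample_percentage / 100)), so with 30% of, say, 250 texts, each project receives 75 tasks. Note that Python's round() rounds exact halves to the nearest even number (round(2.5) == 2). The sketch below reproduces the same logic on a plain list and adds a fixed seed so the sample can be redrawn identically later. The function sample_tasks and its seed parameter are assumptions: the notebook's df_tasks.sample(sample_size) call sets no seed, although pandas accepts random_state for that purpose.

```python
import random

def sample_tasks(records, sample_percentage, seed=None):
    """Draw round(n * p / 100) records without replacement.

    Mirrors sample_size = round(len(df) * (sample_percentage / 100));
    `seed` is an assumed addition for reproducibility.
    """
    sample_size = round(len(records) * sample_percentage / 100)
    rng = random.Random(seed)
    return rng.sample(records, sample_size)

texts = [f"post-{i}" for i in range(250)]  # hypothetical identifiers
sample = sample_tasks(texts, 30, seed=42)
print(len(sample))  # 75
```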
In [14]:
from label_studio_sdk import Client
import contextlib
import io

project_name = "vSMA Test 1"
text_column = "Text"              # must match $Text in the interface
identifier_column = "identifier"
sample_percentage = 30
num_copies = 1

sample_size = round(len(df) * (sample_percentage / 100))

ls = Client(url=labelstudio_url, api_key=labelstudio_key)

# Sample once, before the loop, so every copy (annotator) gets the same tasks
df_tasks = df[[identifier_column, text_column]]
df_tasks = df_tasks.sample(sample_size)
df_tasks = df_tasks.fillna("")

for i in range(num_copies):
    # Use a separate variable so titles do not accumulate suffixes (#0 #1 ...)
    copy_title = f"{project_name} #{i}"
    # Create the project
    project = ls.start_project(
        title=copy_title,
        label_config=interface,
        sampling="Uniform sampling"
    )
    # Suppress anything printed to stdout during the import
    with contextlib.redirect_stdout(io.StringIO()):
        project.import_tasks(
            df_tasks.to_dict('records')
        )
    print(f"All done, created project #{i}! Visit {labelstudio_url}/projects/{project.id}/ and get started labelling!")
All done, created project #0! Visit https://label2.digitalhumanities.io/projects/61/ and get started labelling!
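Each row passed to import_tasks becomes one task, and its keys are what the interface refers to as $variables. The sketch below converts records into that flat list-of-dicts shape, replaces missing values with empty strings as fillna("") does, and checks that every $variable referenced in a value attribute of the configuration is present in every task, which would catch a mismatch such as text_column = "text" versus value="$Text". Both helpers, to_tasks and missing_variables, are hypothetical, and the simple regular expression only looks at value="$..." attributes.

```python
import re

def to_tasks(records, identifier_column="identifier", text_column="Text"):
    """Keep two columns per record and turn None into "" (like fillna(""))."""
    def clean(value):
        return "" if value is None else value
    return [
        {
            identifier_column: clean(r.get(identifier_column)),
            text_column: clean(r.get(text_column)),
        }
        for r in records
    ]

def missing_variables(label_config, tasks):
    """Return the $variables used in value="$..." that some task lacks."""
    variables = set(re.findall(r'value="\$(\w+)"', label_config))
    return sorted(v for v in variables if any(v not in t for t in tasks))

# Hypothetical records shaped like rows of df
rows = [
    {"identifier": "CyMAe_tufcR-Caption", "Text": "#Landtagswahl23", "shortcode": "CyMAe_tufcR"},
    {"identifier": "CyL975vouHU-Caption", "Text": None},
]
config = '<View><Text name="Text" value="$Text"/></View>'
tasks = to_tasks(rows)
print(missing_variables(config, tasks))                                 # []
print(missing_variables(config, to_tasks(rows, text_column="text")))  # ['Text']
```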