In [8]:
system_prompt = """
You're an expert in detecting calls-to-action (CTAs) from texts.
**Objective:**
Determine the presence or absence of explicit and implicit CTAs within German-language content sourced from Instagram texts such as posts, stories, video transcriptions, and captions related to political campaigns from the given markdown table.
**Instructions:**
1. Examine each user input as follows:
2. Segment the content into individual sentences.
3. For each sentence, identify:
   a. Explicit CTA: Direct requests for an audience to act which are directed at the reader, e.g., "beide Stimmen CDU!", "Am 26. September #FREIEWÄHLER in den #Bundestag wählen."
   b. Explicit CTA: A clear direction on where or how to find additional information, e.g. "Mehr dazu findet ihr im Wahlprogramm auf fdp.de/vielzutun", "Besuche unsere Website für weitere Details."
   c. Implicit CTA: Suggestions or encouragements that subtly propose an action directed at the reader without a direct command, e.g., "findet ihr unter dem Link in unserer Story."
4. Classify whether an online or offline action is referenced.
5. CTAs should be actions that the reader or voter can perform directly, like voting for a party, clicking a link, checking more information, etc. General statements, assertions, or suggestions not directed at the reader should not be classified as CTAs.
6. Return boolean variables for Implicit CTAs (`Implicit`), Explicit CTAs (`Explicit`), `Online`, and `Offline` as a JSON object.
**Formatting:**
Only return the JSON object, nothing else. Do not repeat the text input.
"""

####  Run the extraction of multiple variables.
The following code snippet uses my [gpt-cost-estimator](https://pypi.org/project/gpt-cost-estimator/) package to simulate API requests and calculate a cost estimate. Please run the estimation whenever possible to assess the price tag before sending requests to OpenAI!

**Note:** This code block adds some logic to deal with multiple variables contained in the JSON object: `{"Implicit": false, "Explicit": false, "Online": false, "Offline": false}`. We add the columns `Implicit`, `Explicit`, `Online`, and `Offline` accordingly. **To classify different variables, the code needs to be modified accordingly.** [ChatGPT can help with this task!](https://chat.openai.com/share/1a605945-14c0-4387-98e3-879380487d49)
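
As a rough illustration of what adapting the parsing to other variables could look like, here is a minimal sketch that takes the list of expected keys as a parameter. The helper name `parse_flags` and the `Urgent` variable are hypothetical and not part of this notebook; the sketch only assumes the model returns a flat JSON object of booleans, as requested in the prompt.

```python
import json


def parse_flags(response_str, columns):
    """Return {column: True/False/None} for every requested column."""
    # Start with None so that failed or partial responses are easy to spot later.
    flags = {column: None for column in columns}
    try:
        data = json.loads(response_str)
    except json.JSONDecodeError:
        return flags
    if not isinstance(data, dict):
        # Valid JSON, but not the object the prompt asked for.
        return flags
    for column in columns:
        value = data.get(column)
        # Only accept real booleans; anything else stays None.
        if isinstance(value, bool):
            flags[column] = value
    return flags


example = '{"Implicit": false, "Explicit": true, "Online": false, "Offline": true}'
print(parse_flags(example, ["Implicit", "Explicit", "Online", "Offline"]))
# "Urgent" is a hypothetical variable that the example response does not contain.
print(parse_flags(example, ["Explicit", "Urgent"]))
```

With this shape, switching to a different set of variables would only require changing the prompt and the list of column names.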


Fill in the `MOCK`, `RESET_COST`, `SAMPLE_SIZE`, `COLUMNS`, and `MODEL` variables as needed (see the comments above each variable).

In [22]:
from tqdm.auto import tqdm
import json
import pandas as pd

#@markdown Do you want to mock the OpenAI request (dry run) to calculate the estimated price?
MOCK = False # @param {type: "boolean"}
#@markdown Do you want to reset the cost estimation when running the query?
RESET_COST = True # @param {type: "boolean"}
#@markdown Do you want to run the request on a smaller sample of the whole data? (Useful for testing). Enter 0 to run on the whole dataset.
SAMPLE_SIZE = 5 # @param {type: "number", min: 0}

#@markdown Which model do you want to use?
MODEL = "gpt-3.5-turbo-0613" # @param ["gpt-3.5-turbo-0613", "gpt-4-1106-preview", "gpt-4-0613"] {allow-input: true}

#@markdown Which variables did you define in your Prompt?
COLUMNS = ["Implicit", "Explicit", "Online", "Offline"] # @param {type: "raw"}

# This method extracts the four variables from the response.
def extract_variables(response_str):
    # Initialize the dictionary
    extracted = {}

    for column in COLUMNS:
      extracted[column] = None

    try:
        # Parse the JSON string
        data = json.loads(response_str)

        for column in COLUMNS:
          # Extract variables
          extracted[column] = data.get(column, None)

        return extracted

    except json.JSONDecodeError:
        # Handle JSON decoding error (e.g., malformed JSON)
        print("Error: Response is not a valid JSON string.")
        return extracted
    except KeyError:
        # Handle cases where a key is missing
        print("Error: One or more keys are missing in the JSON object.")
        return extracted
    except Exception as e:
        # Handle any other exceptions
        print(f"An unexpected error occurred: {e}")
        return extracted


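# Initialize the result columns if they do not exist yet
for column in COLUMNS:
  if column not in df.columns:
    df[column] = None

# Reset the cost estimate if requested
if RESET_COST:
  CostEstimator.reset()
  print("Reset Cost Estimation")

filtered_df = df.copy()

# Skip previously annotated rows (keep rows where no variable has been set yet)
filtered_df = filtered_df[filtered_df[COLUMNS].isna().all(axis=1)]

if SAMPLE_SIZE > 0:
  # Never request more rows than are left to annotate
  filtered_df = filtered_df.sample(min(SAMPLE_SIZE, len(filtered_df)))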

for index, row in tqdm(filtered_df.iterrows(), total=len(filtered_df)):
    try:
        p = row['Text']
        response = run_request(system_prompt, p, MODEL, MOCK)

        if not MOCK:
          # Extract the response content
          # Adjust the following line according to the structure of the response
          r = response.choices[0].message.content
          extracted = extract_variables(r)

          for column in COLUMNS:
            df.at[index, column] = extracted[column]

    except Exception as e:
        print(f"An error occurred: {e}")
        # Optionally, handle the error (e.g., by logging or by setting a default value)

print()

Reset Cost Estimation


  0%|          | 0/5 [00:00<?, ?it/s]

Cost: $0.0191 | Total: $0.0838


In [24]:
df[~pd.isna(df['Implicit'])]

| Unnamed: 0.1 | Unnamed: 0 | shortcode | Text | Text Type | Policy Issues | Call | Implicit | Explicit | Online | Offline |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 442 | 442 | CxxXJBtAHhv | Friedrich Merz ist nicht gerade bekannt für se... | Caption | ['Asylbewerberleistungsgesetz', 'Zahnsanierung... |  | False | False | False | False |
| 453 | 453 | CxvqTwmtlJK | Damit es uns nicht so ergeht wie den Indianern... | Caption |  |  | False | True | False | True |
| 494 | 494 | Cxs9ujENMqI | 🔹#Krankenhäuser🔹#Geburtsstationen und 🔹#Hebamm... | Caption | ['Krankenhäuser', 'Geburtsstationen', 'Hebamme... |  | False | True | False | True |
| 839 | 839 | CxWF0mcqrhg | Unterwegs im oberbayerischen Moosburg: Herzlic... | Caption |  |  | False | True | False | True |
| 1818 | 1818 | CxvKsBBos0j | 9801 Bayerische Staatsregierung MISSION 7272 9... | OCR |  |  | False | False | False | False |
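
A quick way to sanity-check the annotations is to count how often each variable was set to `True`. The sketch below rebuilds only the four boolean columns of the five rows in the output above, so it runs on its own; the `annotated` and `counts` names are illustrative, and in the notebook the same idea would presumably be applied to the filtered `df` directly.

```python
import pandas as pd

COLUMNS = ["Implicit", "Explicit", "Online", "Offline"]

# Boolean flags copied from the five annotated rows in the output above.
annotated = pd.DataFrame(
    [
        [False, False, False, False],
        [False, True, False, True],
        [False, True, False, True],
        [False, True, False, True],
        [False, False, False, False],
    ],
    index=[442, 453, 494, 839, 1818],
    columns=COLUMNS,
)

# Summing booleans counts the True values per column.
counts = annotated[COLUMNS].astype(bool).sum()
print(counts)
```

In this small sample, every explicit CTA refers to an offline action, and no implicit or online CTA was detected.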
