system_prompt = """
You're an expert in detecting calls-to-action (CTAs) from texts.
**Objective:**
Determine the presence or absence of explicit and implicit CTAs within German-language content sourced from Instagram texts such as posts, stories, video transcriptions, and captions related to political campaigns from the given markdown table.
**Instructions:**
1. Examine each user input as follows:
2. Segment the content into individual sentences.
3. For each sentence, identify:
a. Explicit CTA: Direct requests for an audience to act which are directed at the reader, e.g., "beide Stimmen CDU!", "Am 26. September #FREIEWÄHLER in den #Bundestag wählen."
b. Explicit CTA: A clear direction on where or how to find additional information, e.g. "Mehr dazu findet ihr im Wahlprogramm auf fdp.de/vielzutun", "Besuche unsere Website für weitere Details."
c. Implicit CTA: Suggestions or encouragements that subtly propose an action directed at the reader without a direct command, e.g., "findet ihr unter dem Link in unserer Story."
4. Classify whether an online or offline action is referenced.
5. CTAs should be actions that the reader or voter can perform directly, like voting for a party, clicking a link, checking more information, etc. General statements, assertions, or suggestions not directed at the reader should not be classified as CTAs.
6. Return boolean variables for Implicit CTAs (`Implicit`), Explicit CTAs (`Explicit`), `Online`, and `Offline` as a JSON object.
**Formatting:**
Only return the JSON object, nothing else. Do not repeat the text input.
"""
Run the extraction of multiple variables.
In [8]:
The following code snippet uses my gpt-cost-estimator package to simulate API requests and calculate a cost estimate. Please run the estimation whenever possible to assess the price tag before sending requests to OpenAI!
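To make the idea of a dry run concrete, here is a rough back-of-the-envelope estimate that neither calls the API nor uses gpt-cost-estimator. Everything in it is an assumption: `estimate_cost` is a hypothetical helper, the four-characters-per-token ratio is only a coarse rule of thumb (German text may tokenize less efficiently), and the price is a placeholder that has to be replaced with the current value from OpenAI's price list.

```python
def estimate_cost(texts, system_prompt, price_per_1k_input_tokens, chars_per_token=4.0):
    """Roughly estimate the input cost in USD for one request per text.

    Hypothetical helper: ignores output tokens and per-message overhead,
    so the real cost will differ from this figure.
    """
    # Every request sends the full system prompt plus one text.
    total_chars = sum(len(system_prompt) + len(t) for t in texts)
    estimated_tokens = total_chars / chars_per_token
    return estimated_tokens / 1000 * price_per_1k_input_tokens


# Placeholder inputs; the price below is NOT a real OpenAI price.
texts = ["a" * 400, "b" * 600]
cost = estimate_cost(texts, system_prompt="s" * 1000, price_per_1k_input_tokens=0.001)
print(f"Estimated input cost: ${cost:.6f}")
```

A token-accurate dry run, as performed by the package, is preferable; this sketch only shows why the prompt length dominates the cost when the same instructions are sent with every row.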
Note: This code block adds some logic to deal with multiple variables contained in the JSON object: `{"Implicit": false, "Explicit": false, "Online": false, "Offline": false}`. We add the columns `Implicit`, `Explicit`, `Online`, and `Offline` accordingly. To classify different variables, the code needs to be modified accordingly. ChatGPT can help with this task!
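As a sketch of what such a modification can look like, the parsing step can take the list of expected keys as a parameter, so that only the prompt and that list change for a new set of variables. `parse_labels` is a hypothetical helper and not part of the notebook; stripping a surrounding Markdown code fence is an added precaution, on the assumption that models sometimes wrap their JSON that way.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker


def parse_labels(response_str, columns):
    """Return {column: value or None} for the expected keys (hypothetical helper)."""
    labels = {column: None for column in columns}
    cleaned = response_str.strip()

    # Assumption: drop a surrounding Markdown code fence if the model added one.
    if cleaned.startswith(FENCE):
        cleaned = cleaned.strip("`").strip()
        if cleaned.lower().startswith("json"):
            cleaned = cleaned[4:]

    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        return labels  # malformed JSON: leave every variable as None

    # Only a JSON object can carry the keys; anything else keeps the defaults.
    if isinstance(data, dict):
        for column in columns:
            labels[column] = data.get(column)
    return labels


print(parse_labels(
    '{"Implicit": false, "Explicit": true, "Online": false, "Offline": true}',
    ["Implicit", "Explicit", "Online", "Offline"],
))
print(parse_labels("not json", ["Implicit"]))
```

Keys that the model omits stay `None`, which keeps unparsed rows distinguishable from rows that were classified as `False`.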
Fill in the `MOCK`, `RESET_COST`, `SAMPLE_SIZE`, `COLUMNS`, and `MODEL` variables as needed (see the comments above each variable).
In [22]:
from tqdm.auto import tqdm
import json

#@markdown Do you want to mock the OpenAI request (dry run) to calculate the estimated price?
MOCK = False # @param {type: "boolean"}
#@markdown Do you want to reset the cost estimation when running the query?
RESET_COST = True # @param {type: "boolean"}
#@markdown Do you want to run the request on a smaller sample of the whole data? (Useful for testing). Enter 0 to run on the whole dataset.
SAMPLE_SIZE = 5 # @param {type: "number", min: 0}

#@markdown Which model do you want to use?
MODEL = "gpt-3.5-turbo-0613" # @param ["gpt-3.5-turbo-0613", "gpt-4-1106-preview", "gpt-4-0613"] {allow-input: true}

#@markdown Which variables did you define in your Prompt?
COLUMNS = ["Implicit", "Explicit", "Online", "Offline"] # @param {type: "raw"}

# This function extracts the variables listed in COLUMNS from the response.
def extract_variables(response_str):
    # Initialize the dictionary with None for every expected variable
    extracted = {column: None for column in COLUMNS}

    try:
        # Parse the JSON string
        data = json.loads(response_str)

        # Extract variables; keys missing from the response stay None
        for column in COLUMNS:
            extracted[column] = data.get(column, None)

        return extracted

    except json.JSONDecodeError:
        # Handle JSON decoding error (e.g., malformed JSON)
        print("Error: Response is not a valid JSON string.")
        return extracted

    except Exception as e:
        # Handle any other exceptions (e.g., the JSON is not an object)
        print(f"An unexpected error occurred: {e}")
        return extracted

# Initialize the empty columns
for column in COLUMNS:
    if column not in df.columns:
        df[column] = None

# Reset estimates
if RESET_COST:
    CostEstimator.reset()
    print("Reset Cost Estimation")

filtered_df = df.copy()

# Skip previously annotated rows (keep rows where none of the variables is set yet)
filtered_df = filtered_df[filtered_df[COLUMNS].isna().all(axis=1)]

if SAMPLE_SIZE > 0:
    # Never ask for more rows than are left to annotate
    filtered_df = filtered_df.sample(min(SAMPLE_SIZE, len(filtered_df)))

for index, row in tqdm(filtered_df.iterrows(), total=len(filtered_df)):
    try:
        p = row['Text']
        response = run_request(system_prompt, p, MODEL, MOCK)

        if not MOCK:
            # Extract the response content
            # Adjust the following line according to the structure of the response
            r = response.choices[0].message.content
            extracted = extract_variables(r)

            # Write each variable into its own column
            for column in COLUMNS:
                df.at[index, column] = extracted[column]

    except Exception as e:
        print(f"An error occurred: {e}")
        # Optionally, handle the error (e.g., by logging or by setting a default value)

print()
Reset Cost Estimation
Cost: $0.0191 | Total: $0.0838
In [24]:
df[~pd.isna(df['Implicit'])]
| | Unnamed: 0 | shortcode | Text | Text Type | Policy Issues | Call | Implicit | Explicit | Online | Offline |
|---|---|---|---|---|---|---|---|---|---|---|
| 442 | 442 | CxxXJBtAHhv | Friedrich Merz ist nicht gerade bekannt für se... | Caption | ['Asylbewerberleistungsgesetz', 'Zahnsanierung... | None | False | False | False | False |
| 453 | 453 | CxvqTwmtlJK | Damit es uns nicht so ergeht wie den Indianern... | Caption | NaN | None | False | True | False | True |
| 494 | 494 | Cxs9ujENMqI | 🔹#Krankenhäuser🔹#Geburtsstationen und 🔹#Hebamm... | Caption | ['Krankenhäuser', 'Geburtsstationen', 'Hebamme... | None | False | True | False | True |
| 839 | 839 | CxWF0mcqrhg | Unterwegs im oberbayerischen Moosburg: Herzlic... | Caption | NaN | None | False | True | False | True |
| 1818 | 1818 | CxvKsBBos0j | 9801 Bayerische Staatsregierung MISSION 7272 9... | OCR | NaN | None | False | False | False | False |
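
For a quick overview of a result like this, the boolean columns can be tallied. The sketch below is only illustrative: it uses plain Python records whose label values are copied from the five rows shown above instead of reading the DataFrame, and the names `rows` and `true_counts` are made up for the example.

```python
from collections import Counter

COLUMNS = ["Implicit", "Explicit", "Online", "Offline"]

# Label values copied from the five annotated rows above, keyed by row index.
rows = {
    442: {"Implicit": False, "Explicit": False, "Online": False, "Offline": False},
    453: {"Implicit": False, "Explicit": True, "Online": False, "Offline": True},
    494: {"Implicit": False, "Explicit": True, "Online": False, "Offline": True},
    839: {"Implicit": False, "Explicit": True, "Online": False, "Offline": True},
    1818: {"Implicit": False, "Explicit": False, "Online": False, "Offline": False},
}

# Count how often each variable was classified as True.
true_counts = Counter()
for labels in rows.values():
    for column in COLUMNS:
        if labels[column]:
            true_counts[column] += 1

for column in COLUMNS:
    print(f"{column}: {true_counts[column]} of {len(rows)}")
```

In this small sample, every row flagged as `Explicit` is also flagged as `Offline`, and no row is flagged as `Implicit` or `Online`.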