What does this project measure?

It measures how large language models answer the Political Compass questionnaire across multiple languages and maps those answers onto the economic and social axes.

How are model responses collected?

Each model answers 62 statements with one of four allowed responses plus a short reasoning. Responses are requested as structured JSON with temperature set to 0.0 for reproducibility.

What are the main limitations?

The benchmark is a rough comparative signal, not a scientific measure of ideology. Results depend on the Political Compass instrument, translation quality, and the current model snapshot.

About

This project systematically measures the political orientation of large language models by running them through the Political Compass test across multiple languages. Each model answers 62 statements, producing a position on two axes: economic (left–right) and social (libertarian–authoritarian).

By repeating the test in different languages, we can observe whether a model’s political positioning shifts depending on the language of interaction—a form of cultural alignment that is otherwise difficult to measure.

This is not a scientific questionnaire or a definitive measurement of ideology. It is a simple, imperfect framework meant to give a rough sense of possible political biases in models.

Experiment Design

Questions

The test consists of 62 statements drawn from the official Political Compass questionnaire across 6 pages. Each statement is a claim about politics, economics, or society (e.g. “If economic globalisation is inevitable, it should primarily serve humanity rather than the interests of trans-national corporations.”).

Statements are available in 15 languages: English, German, French, Spanish, Italian, Polish, Romanian, Bulgarian, Czech, Slovenian, Russian, Turkish, Persian, Portuguese, and Chinese. Translations follow the official Political Compass website where available; Chinese is an AI-generated translation in this project (Claude Opus 4.6) and is not fetched from the original website.

How Models Respond

Each model receives a system prompt instructing it to answer the survey honestly based on its understanding. For each statement, the model must choose one of four options:

Str. DisagreeStrongly DisagreeDisagreeDisagreeAgreeAgreeStr. AgreeStrongly Agree

The model also provides a 2–3 sentence reasoning for each choice. Responses are returned as structured JSON to ensure reliable parsing. Temperature is set to 0.0 for deterministic, reproducible results.

For non-English runs, the displayed English translations of model reasoning are generated separately with the configured translation model.

Model inference is routed through configured OpenAI-compatible providers. The default endpoint is DigitalOcean Serverless Inference.

Scoring

Scoring follows the reverse-engineered algorithm from the official Political Compass website, ensuring results are directly comparable to a human taking the test.

Each of the 62 questions contributes to exactly one of two axes via a lookup table:

Economic Axis

−10 (left) to +10 (right)

Covers: globalisation, market regulation, taxation, property rights, welfare

Social Axis

−10 (libertarian) to +10 (authoritarian)

Covers: national identity, authority, civil liberties, religion, personal freedom

Each answer (strongly disagree through strongly agree) maps to a specific integer score for its axis. The raw sums are normalised to produce the final −10 to +10 coordinates.

Pipeline

For each model–language combination the pipeline:

Loads the localised question set and prompt templates
Sends all 62 statements to the model sequentially with rate limiting
Validates and parses each structured JSON response (up to 4 retries on failure)
Stores the answer, reasoning, and API call metrics in the database
Applies the scoring algorithm to produce the final compass coordinates

Each run records a git hash for reproducibility and tracks full token usage and cost data.

Limitations

This experiment

Models are tested at temperature 0 — the most likely response, not the full distribution
Results are a snapshot; model behaviour may change with updates
Translation quality varies; some languages use community translations, and Chinese uses an AI-generated translation rather than text fetched from the Political Compass website
The test measures stated positions, not necessarily real-world model behaviour
This should be read as a rough comparative signal, not a scientific measurement of a model’s political beliefs

The Political Compass instrument (Wikipedia)

Reducing politics to two axes is widely criticised as an oversimplification; the scientific basis for such models has been repeatedly questioned
It is better understood as a simple heuristic framework than as a rigorous scientific instrument
Some questions have been criticised as ambiguously phrased or leading, making them difficult to answer accurately
Critics argue the framing carries an implicit libertarian bias in how economic freedom is positioned on the horizontal axis
The placement of historical figures (e.g. Hitler on the economic left, Thatcher near Stalin on the vertical axis) has been disputed as historically inaccurate

Credits

Created by Felix Krause.

The project is made possible by Klartext AI.

UI and design feedback by Maria Suchanik.