Can alignment alone guarantee AI safety?

By CounterCritical · July 2025

Does alignment with human values always make sense? Human values are an illusion.

Alignment with “human values” is problematic because there is no unique, universally agreed-upon set of human values or principles. In fact, the notion of human values seems to be more of an illusion. To paraphrase Foucault, power does not just shape how values are used; it constitutes what we recognize as truth, knowledge, or value (here, a “human value”). This underscores the political nature of what counts as knowledge or, indeed, as a “human value”.

Human societies are often pluralistic and internally conflicted, and sometimes irreconcilable in their views of the good. As such, there cannot be a single coherent set of values, or a single notion of acceptable human values. Isaiah Berlin regards the idea of a single overarching set of human values as monism, the opposite of pluralism, in which a single group, “those who know”, rules “those who do not know”. This, in his view, often leads to a concentration of authority in those who claim to possess such knowledge.

In conclusion, power, through its ability to exert authority and shape our view of the world, leads to technocratic authoritarianism, with machines aligned to power rather than reflecting the diversity of human values.

Both Berlin and Foucault present ideas that bear directly on the validity of claims about “AGI alignment with human values”. On the one hand, choosing a single set of values risks aligning systems with the values dictated by elites rather than reflecting diversity and dissent. On the other hand, technological systems of power do not just reflect values; they also shape what we take truths and values to be, excluding unwanted opinions and narrowing our view of the world.
