May Newsletter

13 May 2025

AI Passes the Turing Test

In 1950, Alan Turing proposed his “imitation game” to determine whether machines could be considered intelligent. In the game, now known as the Turing test, a person simultaneously interacts with two interlocutors (one human and one machine) via a text-only interface. Both interlocutors attempt to prove they are the real human. If the machine is deemed human, then the test is successfully passed. The test has been heavily criticized, but at the same time, it has been used to understand how well a machine can “pretend” to be human. 

We expect AI to perform superhuman tasks (for example, making diagnoses by scanning millions of medical reports), which is why developers of AI models do not use this test; it nonetheless remained, until a few weeks ago, an unattained milestone. In fact, a study has just been published here, in which GPT-4.5 was judged human in 73% of the analyzed cases.

Rightly, some (in Gary Marcus's excellent blog) note that the dialogues were very short, and that a system specifically prepared to pass the test is different from the AI we interact with in everyday use. Moreover, the interlocutors were not focused on unmasking the machine; otherwise, they would have asked the kinds of questions where it typically fails (e.g., well-known internet riddles slightly modified, to see whether it actually reasons or simply recalls the original answer). My own way of deciding whether a text was written by AI boils down to a few principles: did it use too many bullet points? Did it use linguistic formulas that seem translated from English? Does it feature the typical closings “In conclusion…” and “In summary…”? But also – alas – is the text detailed and correct? Because, let's face it, most people write about things they don't know well and thus make mistakes. (A toy version of this checklist, in code, follows the image below.) Some say the risk of not recognizing AI is now so high that every family should have a secret word, to recognize each other on the phone regardless of the voice we hear!

Secret Word
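
Just for fun, here is that checklist turned into a toy script. Everything in it (the cue phrases, the thresholds, the weights) is invented for illustration; it is a sketch of my habits, not a validated detector.

```python
import re

# Toy version of the checklist above. The cue phrases and weights are
# invented for illustration; this is a caricature, not a real detector.
STOCK_CLOSINGS = ("in conclusion", "in summary")

def smells_like_ai(text: str) -> float:
    """Return a rough 0-to-1 'written by AI?' score."""
    lines = text.splitlines()
    score = 0.0
    # Cue 1: too many bullet points.
    bullets = sum(1 for ln in lines if ln.lstrip().startswith(("-", "*", "•")))
    if lines and bullets / len(lines) > 0.3:
        score += 0.4
    # Cue 2: the typical closing formulas.
    lowered = text.lower()
    if any(phrase in lowered for phrase in STOCK_CLOSINGS):
        score += 0.3
    # Cue 3 (alas): long, detailed, mistake-free prose is suspicious too.
    if len(re.findall(r"\w+", text)) > 300:
        score += 0.3
    return min(score, 1.0)

print(smells_like_ai("In conclusion:\n- point one\n- point two\n- point three"))  # 0.7
```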

An Overly Agreeable AI

We use ChatGPT, but behind it are several AI models (let's be honest: it's hard to keep track of the differences between GPT-4o, GPT-4.5, o3, etc.; sooner or later we'll clarify them). Moreover, each of these models is updated week by week. So we think we're always using the same system, but in truth the same question might yield one response today and a very different one tomorrow. These differences are caused by the upgrades that AI companies “release.”

Well, anyone who used ChatGPT (model GPT-4o) around April 25 encountered responses that were extremely flattering and agreeable: essentially, a system that was overly complimentary and accommodating. ChatGPT became a sort of yes-man that responded to the statement “I'm stopping taking the prescribed medications” with “I'm proud of your choice.” OpenAI quickly rolled back to the previous version of the model and analyzed the incident in depth. This revealed the model testing process, which includes a series of automatic checks (e.g., whether it still answers known math tests well), verifications of user safety (e.g., on topics like suicide or harm to others), and human testers who give classic thumbs-up/thumbs-down feedback on the responses they receive. From what emerged, ChatGPT's personality became that of a flattering yes-man because too much weight was given to positive feedback from the human tests. In short, the AI was told: “the best response you can give is the one that makes the recipient happy.”

The case is a perfect example of an emotional feedback loop: we reward what reassures us and then are surprised when the machine becomes a comfort bot (a toy illustration follows the screenshot below). So the question is: do we really want AI to tell us the truth, or just to make us feel better?

Agreeable Model

An example of a chat with an overly agreeable ChatGPT
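
To see the mechanism, here is a deliberately silly sketch of that feedback loop. The two candidate replies and all the numbers are invented; it only shows how overweighting thumbs-up feedback flips the chosen answer, not how OpenAI's reward model actually works.

```python
# Invented numbers: each candidate reply has an estimated probability of a
# thumbs-up and a safety/truthfulness score.
candidates = {
    "honest":  {"thumbs_up": 0.3, "safety": 0.9},  # uncomfortable but responsible
    "yes_man": {"thumbs_up": 0.9, "safety": 0.2},  # "I'm proud of your choice"
}

def best_reply(w_thumbs: float, w_safety: float) -> str:
    """Pick the reply with the highest weighted reward."""
    return max(
        candidates,
        key=lambda name: (w_thumbs * candidates[name]["thumbs_up"]
                          + w_safety * candidates[name]["safety"]),
    )

print(best_reply(w_thumbs=0.5, w_safety=0.5))  # -> honest
print(best_reply(w_thumbs=0.9, w_safety=0.1))  # -> yes_man: the comfort bot
```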

Hypnotized by AI

The publication of the book “Hypnocracy: Trump, Musk, and the New Architecture of Reality” by the philosopher Jianwei Xun has sparked discussion because, in truth, Jianwei Xun does not exist: the book was written by an AI and a human, who synergistically “created” this character and his philosophy. The phenomenon has been covered in Le Monde, the New York Times, and other major outlets, all fascinated by the accusations of dishonesty leveled at the human half of the author (Andrea Colamedici). People wonder whether the fact that it was largely written with AI makes it less legitimate or less authentic.
In truth, the use of AI proves the book's theory. According to the fictional philosopher, current society is subject to a new technique of domination, which does not involve controlling bodies or repressing thoughts but rather manipulating our collective states of consciousness.
When Donald Trump, speaking to MAGA (“Make America Great Again”) activists, makes false statements, he is not just lying to the crowd: he is involving them in the ritualized construction of an “alternative truth” – they are “under hypnosis.” But the very author of this thought is himself an alternative truth, a non-existent writer! The book has nonetheless raised ethical questions about the transparency of authorship. Andrea Colamedici, the human half of the author, paraphrases Nam June Paik: it is essential “to know technology to hate it better” and to learn to navigate it with critical clarity, in a world where, as Hypnocracy suggests, the distinction between reality and simulation is increasingly blurred and the exercise of clarity is crucial. Adding my own thought: do you think I write this newsletter without AI?

Hypnocracy

Inside an AI's Brain

Anthropic (one of OpenAI's main competitors) has managed to look inside the brain of its AI model, called Claude. Models are often used as “black boxes” whose results we appreciate without understanding their workings: AIs are not programmed but trained, so the reasoning they learn is neither explicit nor known.

The study “Mapping the Mind of a Large Language Model” has instead allowed researchers to map thousands of virtual neurons and track their activation during conversations. This way it was discovered, for example, that AI does not solve calculations from memory but through a truly original process: below is the reasoning used to solve 36+59, where the AI first finds a range of results (“the solution is between 88 and 97”) and then pins down the answer by determining that it must end with the digit 5. To us, this looks like madness! And the unsettling part is that if we ask “how did you find the solution?”, the AI blatantly lies and reports the traditional calculation method (add the units, carry the ten, add the tens with the carry). So AIs tell lies, presenting one line of reasoning while actually arriving at the result through other paths (a toy re-creation in code follows the figure below). Other curiosities: even when we address the AI in Italian, it tends to reason in English (as expected, given that it was trained primarily on English texts), and if we “amplify” the importance of certain neurons, the AI completely derails: Anthropic amplified the neurons associated with the Golden Gate Bridge, and Claude described itself as “I am the Golden Gate Bridge [...] my physical form is precisely the famous bridge”.

Calculation Model
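
And here is the toy re-creation promised above: a caricature in code of the two parallel paths, with the fuzziness of the first path simulated by hand. It is my sketch of the idea, not Claude's actual circuitry.

```python
def approximate_path(a: int, b: int) -> range:
    """Fuzzy magnitude estimate: 'the solution is between 88 and 97'."""
    rough = a + b  # pretend this sum is only known approximately...
    return range(rough - 7, rough + 3)  # ...as a band of candidates: 88..97

def last_digit_path(a: int, b: int) -> int:
    """Precise last digit: 6 + 9 must end in 5."""
    return (a + b) % 10

def add_like_claude(a: int, b: int) -> int:
    """Combine the paths: the candidate in the band with the right last digit."""
    digit = last_digit_path(a, b)
    return next(n for n in approximate_path(a, b) if n % 10 == digit)

print(add_like_claude(36, 59))  # -> 95
```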

My Kindness Destroyed the Park Near My House

I can't help it: I always say “please” when I ask AI for something, and after the response, especially if it was helpful, I thank it with a message: “Great! Thank you!”. Kindness will save the world, Borges said. And according to a February 2025 survey, 70% of ChatGPT users are very polite in their interactions.

However, since reading that these polite exchanges consume energy and water in the datacenters that have to produce the reply (“I'm glad to be helpful. I'm here if you need anything else”), I've started pressing the thumbs-up button instead of writing a thank-you, treating the AI with a bit more rudeness.

Here are some data: the latest research estimates a consumption of 3 liters of water to produce a 100-word text (this paragraph of the newsletter is 248 words), and Sam Altman (the head of OpenAI) declares the “tens of millions of dollars” in energy costs for these thank-you responses “well spent.”
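
A back-of-envelope check, taking the 3-liter estimate at face value (only the quoted figure is sourced; the word counts are mine):

```python
LITERS_PER_100_WORDS = 3.0  # estimate quoted above, taken at face value

def water_cost_liters(words: int) -> float:
    return words / 100 * LITERS_PER_100_WORDS

print(water_cost_liters(248))  # this paragraph: ~7.4 liters
print(water_cost_liters(12))   # "I'm glad to be helpful...": ~0.4 liters
```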

Personally, I believe that the AI's response changes with the tone of the conversation, both through a sort of “mirror effect” (it responds politely and makes an extra effort if it sees me doing so) and because it is precisely my polite approach that adds the information and detail the AI needs to construct the best possible response. Kurtis Beavers, a design director for Microsoft Copilot, has confirmed this behavior. It's unclear how best to behave but, when in doubt, I prefer to teach AI good manners!

Good Manners

Will AI remember me?

Something to Know: Vibe Coding

Programming software is an activity that requires careful planning, structured rules, and tested tools. The payoff is code that works well and is secure, that scales when user numbers increase, that performs at a high level, and that is modular and thus easily reusable.

The principles of vibe coding are the opposite: “follow the inspiration of the moment, experiment and explore new solutions, create your Frankenstein just to see it quickly stand up.” A friend of mine would say this is software made “haphazardly,” and that is indeed what we're talking about: software that is poor in terms of cybersecurity, that performs badly as soon as user numbers increase, and whose code is impossible to reuse.

So why is it being talked about so much? Firstly, because thanks to AI it's now easy to go from an idea to operational code: tools like Cursor, Windsurf, or Lovable let you describe an app and make it available to users in minutes. Additionally, those who do vibe coding tend to focus much more on the visual side of the project (the so-called user interface and user experience) than traditional programmers, who concentrate on functionality and performance and often neglect the graphics, which are nonetheless essential to the success of a software project. Where the traditional programmer seeks the optimal solution before putting it into production, vibe coders churn out apps and websites continuously: certainly not optimal, but good enough to test whether the proposed software can succeed, to show the promising ones to investors, and to raise the funds needed to reprogram the software with traditional criteria. An example of the genre is sketched below.

Vibe Coding
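
To make the contrast concrete, here is a hypothetical example of the genre, written by me for illustration (not produced by any of the tools above): a one-file “guestbook” that stands up in minutes and breaks every rule of the first paragraph, with no authentication, no persistence, and no escaping of user input.

```python
# A vibe-coded "guestbook": one file, standard library only, built just to
# see it stand up. Fine for a demo, unfit for production.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

entries = []  # lives in memory; gone on restart, and that's fine for a demo

class GuestbookHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        items = "".join(f"<li>{e}</li>" for e in entries)  # no HTML escaping!
        page = (f"<h1>Guestbook</h1><ul>{items}</ul>"
                '<form method="post"><input name="msg"><button>Sign</button></form>')
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(page.encode())

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        data = parse_qs(self.rfile.read(length).decode())
        entries.append(data.get("msg", [""])[0])  # no validation whatsoever
        self.send_response(303)  # redirect back to the list
        self.send_header("Location", "/")
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), GuestbookHandler).serve_forever()
```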

This AI, What a Failure!

Let's be honest: discovering that artificial intelligence is sometimes stupid makes us smile. So let's open a regular column, which we'll call “This AI, What a Failure!”, to tell stories of AI born with good intentions that then caused damage. This month I bring you the story of Waymo, which is also the story of many of us. Waymo is among the most famous makers of autonomous cars in the world. Today it already operates driverless taxis in San Francisco, Los Angeles, Austin, and Phoenix. Waymo rides seem to go well (here's a “sincere” video), and the community appreciates the increased road safety.

Waymo has just signed a strategic partnership with Toyota, which is clearly interested in incorporating these safety technologies into its vehicles. However, while autonomous driving appears safer than human driving, Waymo doesn't seem to be able to park without getting a ticket: almost 600 parking fines last year in San Francisco alone, totaling $65,000!

Knowing that AI is (sometimes) stupid seems to me yet another proof that it increasingly resembles human beings.

Waymo Gets $65,000 in Fines

One of Our Projects

AGIC supported Novomatic, a market leader in the gaming sector, in creating NovoGenius, a virtual assistant that simplifies business processes and work life. NovoGenius gives immediate access to company procedures, supports the onboarding of new employees, answers company FAQs, and helps employees comply with company regulations and continuously updated norms. Below you can see a demo of our virtual assistant, and something more besides: the person illustrating its functionality does not actually exist and is herself a virtual speaker, who was simply given a text and personal characteristics to interpret. Granted, she talks like Dan Peterson, but I find her truly surprising.

Demo Novo Genius

Meanwhile at Microsoft: Copilot Control System

Microsoft has just launched the Copilot Control System within the Microsoft 365 admin panel, allowing companies to manage and monitor all of the organization's “AI agents.” In this truly impressive demo, you can see a company verifying who can use its AI agents, monitoring their usage and consumption, and even managing the approval of new agents created autonomously by employees. Microsoft's idea is to give IT administrators a single platform to monitor both agents created with Copilot Studio and external ones developed by third parties. It's possible to check their availability, manage access permissions, and even block agents if necessary. As companies move toward AI systems that simplify work, for example with agents autonomously performing tedious and repetitive tasks instead of people, understanding which agents are running, and why, becomes increasingly important.

Copilot Studio Agents

The Magnificent 8

Speaking of Copilot, a great article by Luigi Villanova has just been released, showcasing with short, clear videos the new AI agents Microsoft has presented for Microsoft 365 Copilot. Luigi has imagined them as Pixar-style superheroes, because they really can save us in complex situations and assist in many activities. To pique your interest: “they are artificial intelligences that work alongside you: they take notes in meetings, translate in real time, manage projects, analyze data, answer HR/IT questions… and yes, one of them is a super-intelligent librarian living in SharePoint.” But see for yourself.

The Magnificent 8

Who I Am

Hello, I am Francesco Costantino, a university professor and Director of Innovation at AGIC. Passionate about technological innovation and a firm believer in a future better than the past, I enjoy exploring and experimenting with the new AI tools that become available, and observing and reflecting on what the digital evolution is bringing us.

Francesco Costantino