Can AI be used to predict visa overstayers?
Using classification models to predict which people are likely to overstay their visa
This article has been co-authored by Michelangiolo Mazzeschi.
Focusing on asylum seekers who cross land borders in order to control illegal immigration ignores the real problem: people who overstay their visas (e.g. tourist, student, or medical visas) and people who stay on after their residence or work permits have expired.
Most irregular migrants originally entered the EU legally on short-stay visas, but remain in the EU for economic reasons once their visa has expired.
Overstayers outnumber illegal border crossers
Immigrants who enter the United States legally on student, tourist, or work visas and then stay past their visa's expiration date have outnumbered people crossing the border illegally by a ratio of about 2 to 1. Elsewhere, the issue is even more pronounced: most people who are in Britain illegally, for example, entered legally and simply stayed on after their visa expired.
Systematically identifying people ‘overstaying’ in the Schengen area is one of the area's major challenges, mainly because there is no system for recording entry/exit movements in Europe.
European countries are still unable to fully account for the flows of non-EU nationals who entered the EU legally and then extended their stay without obtaining the necessary permits.
The Schengen Borders Code has no provisions on recording cross-border movements. The current procedure requires only that passports be stamped with the dates of entry and exit. These stamps are the only means available to border guards or national police for calculating whether a right to stay has been exceeded.
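The stamp-based check described above can be sketched as follows. This is a minimal sketch that assumes the standard Schengen short-stay rule of at most 90 days in any 180-day period. It also assumes the stamped dates are available as (entry, exit) pairs and that both the entry day and the exit day count as days of stay. The function names are hypothetical, and this is not an official calculator.

```python
from datetime import date, timedelta

# Assumption: standard Schengen short-stay rule, at most 90 days
# in any 180-day period; entry and exit days both count as days of stay.
MAX_DAYS = 90
WINDOW_DAYS = 180


def days_in_window(stays, check_date):
    """Count days of presence in the 180-day window ending on check_date (inclusive)."""
    window_start = check_date - timedelta(days=WINDOW_DAYS - 1)
    total = 0
    for entry, exit_ in stays:
        # Clip each stamped stay to the window; ignore days after check_date.
        start = max(entry, window_start)
        end = min(exit_, check_date)
        if start <= end:
            total += (end - start).days + 1
    return total


def first_overstay_day(stays):
    """Return the first day on which the 90-day limit is exceeded, or None."""
    for entry, exit_ in sorted(stays):
        day = entry
        while day <= exit_:
            if days_in_window(stays, day) > MAX_DAYS:
                return day
            day += timedelta(days=1)
    return None


# Hypothetical (entry, exit) pairs read from passport stamps: 45 + 51 = 96 days in total.
stays = [(date(2023, 1, 1), date(2023, 2, 14)), (date(2023, 3, 1), date(2023, 4, 20))]
print(first_overstay_day(stays))  # 2023-04-15
```

The sketch works only if every entry and exit was stamped and the stamps are read correctly. A single missing stamp makes the count unreliable.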
The European Travel Information and Authorisation System (ETIAS) was scheduled to be in operation from January 1, 2023, but it applies only to visa-exempt visitors. ETIAS will be a largely automated IT system designed to identify security, irregular-migration, or high epidemic risks, while making border crossings easier for the vast majority of travellers who do not pose such risks.
Is there a way to use AI for profiling immigrants?
Can this be done in a way that avoids discrimination against people on the grounds of sex, colour, ethnic or social origin, genetic features, language, religion or belief, political or any other opinion, membership of a national minority, property, birth, disability, age or sexual orientation?
Suppose the available data on immigrants include fields such as:
- Status Withdrawn
- Country of Residence
- Decision for entering the country
- Resettlement Framework
Is this data (“Data”) useful, and would it be sufficient to predict whether an immigrant will become an “overstayer” after entering the EU legally?
How can AI be applied to immigration?
AI is a powerful tool for making predictions. Given historical data in numerical form, an AI model can find patterns in the data and learn to make the same kind of prediction for new cases. With sufficient data, the same tool could also be used to predict illegal immigration.
How AI works: a very simple example
For any dataset, you have to decide manually which parts of the data act as predictors (Features) and which part you want to predict (Labels).
Once you have made this split, you give both parts to a model, and the AI finds the rules by itself. In other words, the AI learns to predict the Labels from the predictors (Features).
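The following minimal Python sketch makes the Features/Labels split concrete. The dataset is invented, and the function names are hypothetical. For illustration only, it uses a decision stump (a one-level decision tree), one of the simplest learners. The "rule" the model finds by itself is a single threshold on one feature.

```python
# Invented toy data: each row is (feature_a, feature_b, label).
rows = [
    (1.0, 5.0, 0),
    (2.0, 3.0, 0),
    (3.0, 6.0, 0),
    (6.0, 2.0, 1),
    (7.0, 5.0, 1),
    (8.0, 4.0, 1),
]

# Step 1: decide manually which columns are Features and which is the Label.
features = [row[:2] for row in rows]
labels = [row[2] for row in rows]


def predict_one(x, feature_index, threshold, polarity):
    """Apply a single rule: compare one feature against a threshold."""
    above = x[feature_index] > threshold
    return int(above) if polarity == 1 else int(not above)


def fit_stump(features, labels):
    """Step 2: try every feature, threshold and direction; keep the rule with the most correct answers."""
    best = None  # (n_correct, feature_index, threshold, polarity)
    for j in range(len(features[0])):
        for t in sorted({x[j] for x in features}):
            for polarity in (1, -1):
                n_correct = sum(
                    predict_one(x, j, t, polarity) == y for x, y in zip(features, labels)
                )
                if best is None or n_correct > best[0]:
                    best = (n_correct, j, t, polarity)
    return best[1:]


rule = fit_stump(features, labels)
print(rule)  # (0, 3.0, 1): predict 1 when feature_a > 3.0
print(predict_one((5.0, 1.0), *rule))  # 1
```

Real models such as logistic regression or random forests search over much richer families of rules, but the workflow is the same: Features and Labels in, a learned rule out.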
There are different kinds of AI models. Depending on which one we choose, the results will vary, and the final output can be more or less accurate.
Training an AI to predict overstayers
In the paper “Artificial Intelligence and Predicting Illegal Immigration to the USA”, researchers Azizi and Yektansani built a model (“Model”) able to estimate the probability of an individual overstaying in the US.
The Model uses the Features (the predictors: Sex, Age, N. Children, Wage, …) and takes Legal Status (0 = Undocumented, 1 = Legal Immigrant) as the Label (the value we want to predict).
To test the performance of their AI, Azizi and Yektansani split the dataset in a 70:30 proportion. The larger chunk, a random 70% of the data (4,396 samples), was used to train the AI to find the rules (“Rules”). The remaining 30% of the data (1,885 samples) was used to test the Model.
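A random 70:30 split like the one described above can be sketched in a few lines. This is a minimal sketch, not the authors' code. The helper name `random_split` and the seed are hypothetical, and the records are replaced by stand-in integers. Rounding the training size down reproduces the reported counts for 4,396 + 1,885 = 6,281 samples; whether the authors rounded the same way is not stated.

```python
import random


def random_split(samples, train_fraction=0.7, seed=42):
    """Shuffle a copy of the samples and cut it into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_train = int(len(shuffled) * train_fraction)  # round the training size down
    return shuffled[:n_train], shuffled[n_train:]


# Stand-ins for the 4,396 + 1,885 = 6,281 records.
samples = list(range(6281))
train, test = random_split(samples)
print(len(train), len(test))  # 4396 1885
```

Keeping the test set untouched during training is what makes the later accuracy figure meaningful: the Model is scored only on records it has never seen.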
Once the AI had found the Rules mapping Features to Labels, the researchers tested it on the remaining 30% of the data. The chart below shows the accuracy of the different models.
After comparing several classification models to find the one with the best predictions, the researchers reached an accuracy of 80%.
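Comparing classification models by accuracy on the test set can be sketched as follows. All labels and predictions here are invented for illustration and are not the researchers' results. The sketch also scores a majority-class baseline, which always predicts the most common label. Such a baseline is a common reference point: an accuracy figure means more when it clearly beats it.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Invented test labels (0 = Undocumented, 1 = Legal Immigrant) and model outputs.
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
majority_label = Counter(y_true).most_common(1)[0][0]
predictions = {
    "model_a": [1, 0, 1, 0, 0, 1, 0, 1, 1, 1],
    "model_b": [1, 1, 1, 1, 0, 0, 0, 1, 0, 0],
    "majority_baseline": [majority_label] * len(y_true),
}

scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
best = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.0%}")
# model_a: 80%
# model_b: 70%
# majority_baseline: 60%
```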
How can the Model be improved?
To build a Model that can be used more effectively and neutrally, the following improvements could be implemented:
More data: accumulating more data, in compliance with privacy rules, may improve the accuracy of the Model. Data that could be useful for profiling immigrants and assessing the risk of a possible overstay include:
- Personality Traits (OCEAN model) of the immigrant
- Gini Coefficient of the country
- Hofstede cultural dimensions
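In practice, a country's Gini coefficient would normally be taken from published national statistics rather than computed. For reference, though, the standard definition is easy to state: the sum of absolute differences over all pairs of incomes, divided by twice the squared number of people times the mean income. The function name below is hypothetical, and the sketch assumes non-negative values.

```python
def gini(values):
    """Gini coefficient from the mean absolute difference:
    G = sum_i sum_j |x_i - x_j| / (2 * n^2 * mean).
    Assumes non-negative values (e.g. incomes)."""
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # nobody has any income: treat as perfect equality
    total_abs_diff = sum(abs(a - b) for a in values for b in values)
    return total_abs_diff / (2 * n * n * mean)


print(gini([10, 10, 10, 10]))  # 0.0 (perfect equality)
print(gini([0, 0, 0, 100]))    # 0.75 (one person holds everything)
```

The pairwise loop is quadratic in the number of values, which is fine for illustration. Country-level features like this one would be joined to each individual record through the Country of Residence field.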
No discrimination: the Model should also be trained in a way that avoids bias and guarantees that its findings are not influenced by profiling based on sex, race, religion, or ethnicity.
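One possible way to check for such bias is demographic parity: compare how often the Model flags people in each group of a protected attribute. Here the attribute is kept out of the Features and used only for auditing. This is a minimal sketch with invented data and hypothetical function names. Demographic parity is only one of several fairness criteria. Removing protected attributes from the Features does not remove bias on its own, because other Features can act as proxies for them.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Share of people flagged (prediction == 1) within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for pred, group in zip(predictions, groups):
        counts[group][0] += pred
        counts[group][1] += 1
    return {group: flagged / total for group, (flagged, total) in counts.items()}


def demographic_parity_difference(predictions, groups):
    """Largest gap in flag rates between any two groups (0 means equal rates)."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Invented audit data: 1 = flagged as a likely overstayer, 0 = not flagged.
# The group attribute is used only for auditing, never as a model Feature.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(positive_rate_by_group(predictions, groups))         # {'A': 0.75, 'B': 0.25}
print(demographic_parity_difference(predictions, groups))  # 0.5
```

A large gap, such as the 0.5 in this invented example, would be a signal to investigate which Features drive the difference before the Model is used.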