The internet is the world’s largest information database. Inside the giant mountain of data it contains, there are insights and tools to address just about any issue your organization might be facing. The problem is that, when confronted with an issue, you can’t simply brandish the whole mountain at it. To carve out the right tools, OSINT needs to be applied smartly.
What Is OSINT? How Does It Work?
Open Source Intelligence, or OSINT, is the process of collecting data from any source of information that is legally public. This can be done manually with any number of online search tools, or with software that fetches and processes the data automatically, sometimes with the help of machine learning. Generally, the process includes these steps:
- Determining where the public information is
- Determining exactly what information needs to be extracted
- Harvesting the data from where it is stored
- Preparing the data for optimized analysis
- Enriching the gathered data by combining it with existing data, then analyzing the result
- Reporting any insights from the data
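To make the steps above more concrete, here is a minimal Python sketch of how they might chain together. It is an illustration under stated assumptions, not how any particular OSINT tool works: the sources, field names, sample records, and functions (harvest, prepare, enrich, report) are all hypothetical, and the “harvested” data is hard-coded in memory rather than fetched from the web. The point is how each stage narrows raw public data toward something usable.

```python
from datetime import date

# Step 1 (hypothetical): public sources already located. In practice these
# would be fetched from the web; here they are hard-coded sample records.
PUBLIC_SOURCES = {
    "marketplace": [
        {"handle": "seller42", "phone": "+44 20 7946 0000", "posted": "2021-03-01"},
        {"handle": "seller42", "phone": "+44 20 7946 0000", "posted": "2021-03-01"},  # duplicate
        {"handle": "", "phone": None, "posted": "not a date"},  # corrupt
    ],
    "social": [
        {"handle": "seller42", "city": "London", "posted": "2022-07-15"},
    ],
}

# Hypothetical data the organization already holds, keyed by handle.
EXISTING_RECORDS = {"seller42": {"account_id": "A-1001"}}

# Step 2: the fields this particular question needs.
WANTED_FIELDS = ("handle", "phone", "city", "posted")


def harvest(sources):
    """Step 3: extract the wanted fields from every source, tagging provenance."""
    rows = []
    for name, records in sources.items():
        for rec in records:
            row = {field: rec.get(field) for field in WANTED_FIELDS}
            row["source"] = name
            rows.append(row)
    return rows


def prepare(rows, cutoff):
    """Step 4: drop corrupt, outdated, and duplicate rows; parse dates."""
    clean, seen = [], set()
    for row in rows:
        if not row["handle"]:
            continue  # corrupt: no identifier to link on
        try:
            posted = date.fromisoformat(row["posted"])
        except (TypeError, ValueError):
            continue  # corrupt: missing or unparseable date
        if posted < cutoff:
            continue  # outdated
        key = (row["source"], row["handle"], row["phone"], row["city"], posted)
        if key in seen:
            continue  # exact duplicate
        seen.add(key)
        clean.append({**row, "posted": posted})
    return clean


def enrich(rows, existing):
    """Step 5: merge public rows into copies of existing records by handle."""
    profiles = {handle: dict(record) for handle, record in existing.items()}
    for row in rows:
        profile = profiles.setdefault(row["handle"], {})
        for field in ("phone", "city"):
            if row[field] and field not in profile:
                profile[field] = row[field]
        profile.setdefault("sources", set()).add(row["source"])
    return profiles


def report(profiles):
    """Step 6: summarize each enriched profile as one line of text."""
    lines = []
    for handle in sorted(profiles):
        p = profiles[handle]
        sources = ", ".join(sorted(p.get("sources", ())))
        lines.append(f"{handle}: phone={p.get('phone')} city={p.get('city')} sources=[{sources}]")
    return lines


if __name__ == "__main__":
    cleaned = prepare(harvest(PUBLIC_SOURCES), cutoff=date(2021, 1, 1))
    for line in report(enrich(cleaned, EXISTING_RECORDS)):
        print(line)
```

In this sample, the duplicate and corrupt marketplace records are discarded during preparation, and the two surviving rows are merged into the single existing record for seller42, so the report prints one line combining the phone number from the marketplace with the city from the social post.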
Imagine giving a press conference where you present a recent pain point in your organization and ask for suggestions. The room suddenly fills with shouted answers: many are bad, and even more are totally irrelevant. Applying OSINT is how you turn down the volume on the bad responses, so you can hear the handful of good ideas being whispered at the back of the room.
What Qualifies as Publicly Available Information?
Why a structured OSINT process matters for a database as large as the open internet is easiest to see by considering how much data is publicly visible.
Publicly visible data includes:
- Any information associated with any IP address
- Every single public social media post, including photos, videos, and text
- Digital marketplace listings
- Maps and geolocations on posts and submitted data
- Government-level public information like marriages, property deeds, court proceedings
- Phone numbers, and the devices, accounts, and locations associated with them
These are only the most commonly used datasets, yet they already represent a massive amount of data. For example, a reverse phone lookup on a number associated with a smartphone might surface every app and social media account the user has registered with it, along with personal details they have added to those profiles, such as employment, interests, and photos. Raw data like this is not particularly valuable without treatment, because most of what a given query returns will be irrelevant, outdated, corrupt, or simply bad. This is where OSINT steps in.
Who Is OSINT Relevant For? Who Uses It?
Open Source Intelligence is woven into data science. Data scientists in almost any industry or sector will, at some point, work with datasets that have been aggregated through an OSINT process.
OSINT will be at its most relevant when addressing specific security needs, but in general, the applications can be thought of as either increasing profits or keeping people safer.
For corporations, OSINT can improve profit margins through:
- Customer segmentation: Applying an OSINT process to incoming customer traffic can help your company distinguish customers of different value, who can then be directed down different pipelines to maximize revenue
- Targeted marketing: Advertisers and affiliates can use OSINT to learn both what niche customers want and how best to engage their attention
- Competitor/market monitoring: Data scientists might set up an OSINT process to regularly scan the publications of a competitor or industry peer for news relevant to their market
Entities whose primary concern is safety, like fraud prevention teams, counter-terrorism units, investigators, and law enforcement can leverage OSINT to:
- Investigate: When confronting a security concern or a crime, any single data point about a potential bad actor can open up a wealth of identifying information that might lead to an arrest
- Mitigate fraud: Fraud and risk management tools use OSINT to build digital footprints of incoming users, which can then be assessed for indicators of potential malicious behavior
- Quality check: Pen testers (penetration testers), white-hat hackers employed to test how easily a security system can be breached, use OSINT aggregators to look for vulnerabilities in their own organization’s structure
- Background check: Arrests, court proceedings, and other government-held records are public and can provide insight into a person’s history
Notably, because OSINT is by definition open, potential bad actors can use it too. Security professionals like pen testers keep this in mind and simulate the attacks they think a real fraudster might attempt. Staying ahead of anonymous criminals’ creativity is unrealistic; keeping pace is achievable, and a fully outfitted cybersecurity team will include pen testers and a fraud prevention stack to that end.
Case Study
In a BBC Panorama documentary and accompanying article, journalist Kafui Okpattah details how he used OSINT, specifically social media lookups, to identify a fraudster.
The perpetrator, a YouTube rapper going by the name Tankz, regularly posted videos of his exploits involving stolen personally identifiable information (PII) and digital marketplace fraud, often posing with ill-gotten goods or advertising reams of stolen identities for sale.
Okpattah describes manually combing through Tankz’s open-source data, starting with his YouTube videos, to pull out identifying information. By comparing the backgrounds of the videos with Google Street View imagery, the investigators placed Tankz in the neighborhood around Wembley Stadium in London, then pinpointed his physical location by combining glimpses of his car with details of his student accommodation. To determine his name, the investigators followed the trail of his music to iTunes, where it was registered under a real name, Luke Joseph. Tankz’s eBay account also pointed to the name Luke Joseph, and this second, independent match gave the investigators confidence they had found the right person.
After the piece was published, the major social media platforms removed the associated accounts, effectively ending Joseph’s fraud career. Every step of the investigation relied solely on OSINT.
Conclusion
Writing for the Police Foundation, soon-to-be Metropolitan Police Commissioner Sir Mark Rowley QPM describes OSINT as a “critical component of modern intelligence and investigative tools.” Sir Mark writes as the former head of U.K. Counter Terrorism Policing, where his team was responsible for foiling 27 would-be extremist plots. He notes that OSINT research yielded information that helped in this work, information that could not be found in existing datasets or curated databases. Choosing not to consult aggregated OSINT data, he says, creates the potential for “both embarrassment and intelligence failure.”
For his service as head of U.K. Counter Terrorism Policing, Sir Mark was made a Knight Bachelor.
While using OSINT might not earn every data scientist who employs it a knighthood, avoiding both embarrassment and intelligence failure is a modest and achievable goal.