How to Scale Your OSINT Research with Data Enrichment by Gergo Varga

Open source intelligence (OSINT) provides a rich treasure-trove of freely available data. It’s possible to tap into it with nothing more than a web browser and a search engine. 

A data point such as an email address or telephone number is all that’s needed to start following a trail – and there are many possible directions to take. For example, you can Google an email address, run the domain through a free WHOIS checker, search for it on data breach databases, or on multiple social networks. 

The trouble is, it’s a slow process. 

For example, say you’ve been harvesting email addresses and want to drill down to the best marketing targets, or that you wish to batch check the legitimacy of potential customers. OSINT research is a great way of doing these things, but manual searches can take forever. It just doesn't scale.   

Data enrichment can provide the solution, and that’s what we look at in this guide. Read on to find out how data enrichment can help you to perform your OSINT research in a faster, more automated way. 

Data Enrichment 101

Data enrichment is the process of gathering a wealth of information from a single data point and combining it into a report or profile. Such a pursuit can start with the phone number provided by a website user, as SEON explains in a breakdown of reverse phone lookups, with their IP address, or with any other data point that is easy to access or ask for.

For example, from an email address, you can – among other things – find out:

  • Which online profiles the address is linked to. This covers both social media accounts, and accounts on many websites and services, such as Amazon, Skype, Quora, and Discord.
  • What domain the email address uses, how long it’s been active, whether it’s suspicious, and whether it has valid MX records. 
  • Whether the email address has been involved in historic data breaches. 

From a phone number, you can ascertain things like:

  • Which carrier manages the number.
  • Whether it’s a mobile or fixed line. 
  • CallerID details (CNAM).
  • Whether it’s a genuine or virtual number.

As you can imagine, there are many possible ways to make use of this information. On the legitimate side, it’s often used for fraud prevention and customer verification purposes. As Trifacta says, there are several benefits to data enrichment, including how it “offers opportunities for cross-sells and upsells because a business has the right data and knows its customers well.”

It would be remiss, however, not to also mention that OSINT data can be used for less honorable purposes – by everyone from marketers to cybercriminals. Fittingly, Cisco describes it as “a boon and an Achilles’ heel.”

How Can Data Enrichment Help with OSINT Research?

There are three key ways that data enrichment can help with OSINT research: speed, automation, and bulk processing.

Much of the information listed above is freely available. It is, after all, open. But a business looking to onboard a new customer cannot spend hours searching through dozens of social networks and online databases. Data enrichment tools do it in seconds: Simply feed in a data point (such as email address, IP address or phone number) and receive a full breakdown of all the available information. 

In some cases, single lookups of this kind are sufficient. A manual search tool or a browser extension considerably speeds up small-scale OSINT research tasks. However, tasks with more volume or throughput benefit from a level of automation.

This can include batch processing tools, APIs that integrate with existing systems, and tools that include risk score functionality. The latter can add or subtract points based on factors that support or question the legitimacy of a phone number, email address, or other data point. Instead of manually reviewing all of the provided OSINT data, users can simply sort by a risk score, calculated using criteria of their choice. 
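
The add-or-subtract idea can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's actual scoring model: the field names, the point values, and the `risk_score` function are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class EmailProfile:
    """Enriched data for one email address (field names are illustrative)."""
    email: str
    linked_profiles: int     # number of social/web accounts found
    breach_count: int        # number of known breaches the address appears in
    has_valid_mx: bool       # domain has valid MX records
    disposable_domain: bool  # domain belongs to a throwaway email provider


def risk_score(p: EmailProfile) -> int:
    """Add points for suspicious signals, subtract points for reassuring ones.

    The weights below are arbitrary assumptions; tune them to your own criteria.
    """
    score = 0
    if not p.has_valid_mx:
        score += 10
    if p.disposable_domain:
        score += 8
    if p.linked_profiles == 0:
        score += 4
    else:
        score -= min(p.linked_profiles, 5)  # cap the benefit of many profiles
    if p.breach_count == 0:
        score += 2  # never seen in a breach: possibly a fresh throwaway address
    else:
        score -= 2
    return score


profiles = [
    EmailProfile("alice@example.com", 6, 2, True, False),
    EmailProfile("x9f3@tempmail.example", 0, 0, True, True),
    EmailProfile("bob@example.org", 1, 0, True, False),
]

# Highest risk first, so reviewers can start at the top of the list.
ranked = sorted(profiles, key=risk_score, reverse=True)
for p in ranked:
    print(risk_score(p), p.email)
```

Because each rule is a visible line of code, it is easy to see why a given address scored the way it did and to adjust the weights.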

Real-Life Use Cases

Let’s consider a couple of examples of how data enrichment is used in practice: 

An eCommerce store can integrate a fraud prevention tool with its online ordering systems. As soon as a customer provides an email address and/or phone number, the system can perform an automated check – via an API – that generates a risk score. 

A low risk score could allow an order to go straight through with no friction. Meanwhile, a high risk score due to factors like the use of a suspicious email address or temporary phone number could trigger a manual check, or cause the order to be rejected automatically. 
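
As a rough sketch of that routing logic, the function below maps a score to one of three outcomes. The function name `route_order` and both threshold values are hypothetical; a real integration would take them from the merchant's own risk policy.

```python
def route_order(score: int, review_at: int = 5, reject_at: int = 12) -> str:
    """Map a risk score to an action using two thresholds (assumed values)."""
    if score >= reject_at:
        return "reject"
    if score >= review_at:
        return "manual_review"
    return "approve"


for score in (1, 7, 14):
    print(score, route_order(score))
```

Keeping the thresholds as parameters makes it simple to tighten or relax the policy without touching the scoring rules themselves.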

This goes beyond fraud prevention too, as a similar check can tell the merchant which customers are more likely to spend more or be more receptive to cross-selling.

Marketers can also make use of data enrichment. A firm in possession of a large email list may wish to filter out people based in certain countries or remove suspicious and fake addresses before commencing a campaign. 

OSINT research is invaluable here but needs to be done in a fast and automated way. For this latter use case, a CSV file that’s easy to sort and manipulate is ideal.   
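
To illustrate the marketing use case, here is a minimal sketch that filters an enriched email list with Python's standard `csv` module. The column names (`email`, `country`, `suspicious`) and the sample rows are assumptions; a real export will have its own layout.

```python
import csv
import io

# Stand-in for an enriched CSV export; the columns are assumed for this example.
ENRICHED_CSV = """email,country,suspicious
alice@example.com,HU,false
bob@example.org,US,false
x9f3@tempmail.example,US,true
"""


def filter_list(rows, excluded_countries):
    """Keep rows outside the excluded countries that are not flagged suspicious."""
    return [
        r for r in rows
        if r["country"] not in excluded_countries
        and r["suspicious"].lower() != "true"
    ]


rows = list(csv.DictReader(io.StringIO(ENRICHED_CSV)))
campaign = filter_list(rows, excluded_countries={"HU"})
for r in campaign:
    print(r["email"])
```

With a real file, `io.StringIO(ENRICHED_CSV)` would be replaced by `open(path, newline="")`.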

Example Crossover Tools and How They Are Used

Here are a few example tools that can assist with OSINT research. While each tends to have a primary purpose, it’s often possible to adapt them to your specific use case.

SEON

As part of its end-to-end fraud prevention solution, SEON offers everything from simple (free) online phone number and email lookup tools to a fully-fledged API. The social media lookup alone queries over 50 social networks – something that would be hugely laborious to do manually. 

SEON incorporates risk scores that are highly customizable and use a whitebox approach. This means that you can have full visibility of how the risk score is calculated. It also uses machine learning to spot new fraud patterns.  

Clearbit

Clearbit is a marketing-focused tool, intended to build prospect lists and to finely target advertising and email campaigns. It’s particularly strong on company and contact data.

Drawing from over 250 data sources, Clearbit is known for its open-source intelligence on companies rather than individuals. It looks for everything from HQ addresses to estimated annual company revenue. 

While Clearbit has extensive functionality, it all comes at a price. Other than a couple of basic free tools, everything is available to paying subscribers only. This is very much an enterprise-grade tool. 

BeenVerified 

BeenVerified is a US-only tool. It’s focused on helping users carry out due diligence on everything from people to properties and vehicles, primarily using OSINT data. You can also look up criminal records with BeenVerified.

In order to comply with legislation, BeenVerified has to be very specific about how it can be used. It’s intended more as a consumer tool than for business use. For example, it’s not supposed to be used for employment screening or credit checks. That said, there are APIs available.  

How to Use Data Enrichment to Speed Up Your OSINT Research

Here’s an example of how to use a data enrichment tool to speed up your own OSINT research. It uses the free trial of SEON’s product. 

Before beginning, you need a list of email addresses, phone numbers, or IP addresses you wish to research.

  1. Sign up for SEON’s free 14-day trial. You will need to activate your account by clicking a link in the activation email.
  2. Once you gain access, navigate to the “Manual Page” link on the left. From here, you can try out the tool with a single email address, phone number or IP address to get an idea of the enriched information you can access. For example, if you enter an email address and click “Submit”, you see a comprehensive set of information in the right-hand panel. It includes a breakdown of linked online profiles and social media accounts, data on the email domain, and a list of data breaches the email account has been caught up in. Perhaps ironically, for fraud prevention and email validation purposes, seeing that an email address has been part of a data breach is a good sign. Per an UpGuard article, individual data breaches have leaked up to 10.88 billion records in recent years, with Yahoo’s 3 billion email breach coming in second, and dozens more major incidents have been made public. Legitimate email addresses that have been around for at least a few years are therefore very likely to have been part of one, while an address that has never shown up in a breach may be a throwaway address that is not in genuine, active use.
  3. Navigate to the “Batch Test” tab. This is where you can research email addresses in bulk. One option is to paste in a list of addresses, one per line, with the results all appearing in the results panel at the bottom of the page.
  4. Perhaps even more useful to many is the CSV import and export functionality on the same page. This allows you to upload a batch of addresses as a CSV file and receive all of the results together as an emailed CSV.

Using the tool in this way allows you to manipulate and process the results however you wish – sorting based on different criteria, as needed.
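
Before pasting or uploading a batch as in steps 3 and 4, it can help to clean the list so that lookups are not wasted on duplicates or obvious junk. The sketch below is one possible approach; `prepare_batch` is a hypothetical helper, and its sanity check is deliberately rough rather than a full validation of email syntax.

```python
def prepare_batch(raw_lines):
    """Normalize, roughly validate, and de-duplicate a list of email addresses.

    Addresses are lowercased for de-duplication, which is a common
    simplification rather than a strict rule of the email standards.
    """
    seen = set()
    cleaned = []
    for line in raw_lines:
        email = line.strip().lower()
        # Rough sanity check only: exactly one "@", a local part, a dotted domain.
        if email.count("@") != 1:
            continue
        local, domain = email.split("@")
        if not local or "." not in domain:
            continue
        if email not in seen:
            seen.add(email)
            cleaned.append(email)
    return cleaned


raw = ["Alice@Example.com ", "alice@example.com", "not-an-email", "", "bob@example.org"]
batch = prepare_batch(raw)
print("\n".join(batch))  # one address per line, ready to paste
```

The required column layout for a CSV upload is not covered here; check the tool's own import instructions before building the file.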

SEON’s free trial also includes access to the REST APIs. This is restricted to 120 searches, which is sufficient for a small research project or to get used to how the system works. The API works with Python, Java, cURL and PHP, and is well documented for developers. 
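
With only 120 trial searches available, it is worth budgeting them in code. The sketch below does not call SEON's API: the `lookup` argument stands in for whatever function wraps the real request, and `fake_lookup` is a placeholder used purely so the example runs offline.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def enrich_with_quota(
    items: Iterable[str],
    lookup: Callable[[str], Dict],
    quota: int = 120,
) -> Tuple[Dict[str, Dict], List[str]]:
    """Look up each unique item once and stop spending when the quota is used up.

    Returns the results keyed by item, plus the items that were skipped.
    """
    results: Dict[str, Dict] = {}
    skipped: List[str] = []
    for item in items:
        if item in results:
            continue  # already looked up: do not spend quota on duplicates
        if len(results) >= quota:
            skipped.append(item)
            continue
        results[item] = lookup(item)
    return results, skipped


def fake_lookup(email: str) -> Dict:
    """Placeholder for a real API call; returns only what can be derived locally."""
    return {"email": email, "domain": email.split("@")[1]}


emails = [f"user{i}@example.com" for i in range(125)] + ["user0@example.com"]
results, skipped = enrich_with_quota(emails, fake_lookup)
print(len(results), "looked up;", len(skipped), "skipped")
```

Passing the lookup in as a function keeps the quota logic independent of any particular client library, so the same loop works with a real API wrapper later.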

As you will quickly realize, data enrichment is extremely useful in itself, but the true power lies in the ability to do batch lookups. API integration can then greatly increase its potential.

Obviously, the tools listed here are far from an exhaustive selection. OSINT tools run from consumer-focused online lookups to enterprise-grade systems with a price to match. There are also CLI-based tools like MOSINT that perform similar functions.  

Ultimately, data enrichment is about compiling all of that freely available OSINT data without the need for lots of manual intervention. An hour spent automating the process will achieve a whole lot more in the long run than an hour flicking from search box to search box.   


ABOUT THE AUTHOR

Gergo Varga has been fighting online fraud since 2009 at various companies – even co-founding his own anti-fraud startup. He's the author of the Fraud Prevention Guide for Dummies – SEON Special edition. He currently works as the Senior Content Manager / Evangelist at SEON, using his industry knowledge to keep marketing sharp, communicating between the different departments to understand what's happening on the frontlines of fraud detection. He lives in Budapest, Hungary, and is an avid reader of philosophy and history.

 

 

June 21, 2022
© HAKIN9 MEDIA SP. Z O.O. SP. K. 2013