In OSINT, a little Python can go a long way

OSTC
Dec 27, 2022

A lot of different people get into Open Source Intelligence, or OSINT, for a lot of different reasons. Whether it’s because you’re an intrepid detective on a quest for justice, an investigative journalist looking to verify a key source, or merely an inquisitive [*cough* nosy] individual hoping to find interesting tidbits of information on people or groups [*cough* dig up some dirt], knowing how to program (even a little) can give you a serious advantage in the OSINT game.

Imagine being able to automatically gather huge amounts of data from almost anywhere on the open Internet, analyze vast datasets, map out complex networks, and even track people’s locations with just a few lines of code. That’s where Python comes in.

This versatile (yet surprisingly accessible/readable) programming language is a kind of Swiss Army Knife for OSINT practitioners, allowing you to automate all sorts of tasks and uncover information and patterns that might otherwise remain hidden.

Python tasks for OSINT [image by TRADECRAFT]

So how can you use Python in your OSINT work? Let’s have a quick look.

Web Scraping

This is the process of extracting data from online sources using automation. Python has a number of pre-packaged modules, or libraries, that allow you to write code that can navigate to an online source, identify the data you’re interested in, and extract and structure it.

For example, if you’re interested in a particular Twitter user, you can use the Tweepy library to pull whatever the Twitter API exposes about them, including their profile details, the content of their tweets, and any associated media files or links. And if you’re looking to collect data from a website, you could use the Requests library to download its pages and Beautiful Soup to extract the information you need, then save the data to a file or database for later analysis.
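As a minimal sketch of the Requests and Beautiful Soup approach, the snippet below pulls every link out of a page. The sample HTML, the example.com URLs, and the helper names `fetch_html` and `extract_links` are made up for illustration. It parses an inline string so it runs without touching the network; `fetch_html` shows how you might download a live page instead.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical sample page, standing in for HTML you would normally download.
SAMPLE_HTML = """
<html><body>
  <h1>Staff directory</h1>
  <a href="https://example.com/alice">Alice</a>
  <a href="https://example.com/bob">Bob</a>
  <a>No link here</a>
</body></html>
"""


def fetch_html(url):
    """Download a page and return its HTML (not called below; needs network access)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return response.text


def extract_links(html):
    """Return a list of {'text', 'url'} dicts, one per <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        {"text": a.get_text(strip=True), "url": a["href"]}
        for a in soup.find_all("a", href=True)
    ]


links = extract_links(SAMPLE_HTML)
for link in links:
    print(link["text"], "->", link["url"])
```

From here, `links` could be written to a CSV file or a database for later analysis.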

Which brings me to my next point…

Data Mining and Analysis

Python has a number of libraries that can be used to mine, analyze, and visualize data. These libraries allow you to import and process large datasets and create charts, plots, and other visualizations to help you understand and communicate the data.

For example, let’s say you have a dataset containing information about a person’s social media activity, including the dates and times of their posts, the number of likes and comments they received, and the content of their posts. You could use libraries such as spaCy, Gensim, and TextBlob to mine and analyze the textual data to find patterns and to answer questions like: What are the most common topics the person writes about? Which hashtags do they use? and What are their feelings on a specific issue?
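To make the hashtag question concrete, here is a small sketch that counts hashtags across a handful of posts. Note that it deliberately uses only Python’s standard library rather than spaCy, Gensim, or TextBlob, so it covers the simplest question only; topic and sentiment analysis are where those libraries would come in. The posts and the `count_hashtags` helper are invented for illustration.

```python
import re
from collections import Counter

# Made-up posts standing in for a real dataset.
posts = [
    "Great turnout at the rally today! #election #vote",
    "Remember to #Vote on Tuesday",
    "New blog post about hiking trails #outdoors",
]

# A hashtag here is a '#' followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#(\w+)")


def count_hashtags(texts):
    """Count hashtags across all posts, ignoring case."""
    counts = Counter()
    for text in texts:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts


hashtag_counts = count_hashtags(posts)
print(hashtag_counts.most_common(3))
```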

You could then use libraries like Pandas and Matplotlib to quantify, measure, and visually represent the data to answer questions such as: When is this person most active online? and Which posts gained them the most followers in the shortest amount of time?
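The “when is this person most active?” question might look something like this in Pandas. The timestamps, like counts, and column names are invented sample data, so treat it as a sketch rather than a recipe.

```python
import pandas as pd

# Made-up post log; in practice this would come from scraped or exported data.
posts = pd.DataFrame(
    {
        "posted_at": [
            "2022-12-01 08:15",
            "2022-12-01 21:40",
            "2022-12-02 21:05",
            "2022-12-03 21:55",
            "2022-12-04 08:30",
        ],
        "likes": [12, 40, 35, 51, 9],
    }
)
posts["posted_at"] = pd.to_datetime(posts["posted_at"])

# Count posts per hour of day to see when the account is most active.
posts_per_hour = posts["posted_at"].dt.hour.value_counts().sort_index()
busiest_hour = posts_per_hour.idxmax()

# Average likes per hour hints at when posts get the most engagement.
avg_likes_per_hour = posts.groupby(posts["posted_at"].dt.hour)["likes"].mean()

print(posts_per_hour)
print("Busiest hour:", busiest_hour)
```

With Matplotlib installed, `posts_per_hour.plot(kind="bar")` would turn the same counts into a bar chart.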

Network Analysis

Libraries such as NetworkX and PyGraphviz allow you to analyze the relationships between individuals or groups by creating visualizations of their connections and mathematically assessing the importance of a specific person or entity within the network.

For example, let’s say you want to understand the relationships between a group of people on social media. You could use Python and a network analysis library to create a graph representing the connections between these people, with each person represented as a node and their connections represented as edges. You could then use the library to calculate various metrics, such as degree centrality (how many connections a person has) or betweenness centrality (how often a person sits on the shortest path between two other people, acting as a bridge). You could also use the library to identify groups or communities within the network or to identify key influencers or leaders.
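Here is a rough sketch of that idea with NetworkX. The five people and their connections are made up; the centrality functions are NetworkX’s own, and both return scores normalized to the range 0 to 1 rather than raw counts.

```python
import networkx as nx

# Made-up connections: each pair is two accounts that interact with each other.
connections = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("dave", "erin"),
]

graph = nx.Graph()
graph.add_edges_from(connections)  # nodes are created automatically

# Degree centrality: share of the other nodes each person is directly connected to.
degree = nx.degree_centrality(graph)

# Betweenness centrality: share of shortest paths between other pairs
# that pass through each person.
betweenness = nx.betweenness_centrality(graph)

for person in sorted(graph.nodes, key=betweenness.get, reverse=True):
    print(f"{person:6} degree={degree[person]:.2f} betweenness={betweenness[person]:.2f}")

# The person who most often bridges otherwise separate parts of the network.
top_bridge = max(betweenness, key=betweenness.get)
print("Key bridge:", top_bridge)
```

In this toy network carol links the alice/bob cluster to dave and erin, which is why she comes out on top for both measures.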

Geolocation

Libraries such as Geocoder and Geopy allow you to convert physical addresses or place names into latitude and longitude coordinates (geocoding), and to turn coordinates back into addresses (reverse geocoding). This can be useful for identifying the location of individuals or organizations based on their online activity.

For example, let’s say you want to understand the locations of a person’s social media activity. You could use a geolocation library to convert any location data you can find in their posts (e.g. geotags, place names, addresses, street signs, landmarks, etc.) into latitude and longitude coordinates. You could then use a mapping library, such as Folium, to create a map showing the locations of the person’s activity. This could help you understand where the person spends their time, and potentially even identify patterns in their movements.
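A sketch of the geocoding step, assuming Geopy with OpenStreetMap’s Nominatim service as the backend. The helper names (`make_nominatim_geocoder`, `geocode_places`, `fake_geocode`) and the user-agent string are hypothetical. Because a real lookup needs network access, the demo at the bottom swaps in a tiny offline stand-in with approximate coordinates for the Eiffel Tower.

```python
from types import SimpleNamespace


def make_nominatim_geocoder(user_agent):
    """Build a rate-limited geocode function backed by Nominatim.

    Needs the geopy package and network access, so it is not called below.
    """
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # Nominatim's public server asks for at most one request per second.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


def geocode_places(places, geocode):
    """Map each place name to (latitude, longitude), skipping names that fail to resolve."""
    coordinates = {}
    for place in places:
        location = geocode(place)  # geopy returns None when nothing matches
        if location is not None:
            coordinates[place] = (location.latitude, location.longitude)
    return coordinates


# Offline stand-in for a real geocoder so the example runs without network access.
KNOWN_PLACES = {
    "Eiffel Tower, Paris": SimpleNamespace(latitude=48.8584, longitude=2.2945),
}


def fake_geocode(query):
    return KNOWN_PLACES.get(query)


coords = geocode_places(["Eiffel Tower, Paris", "Nowhere in particular"], fake_geocode)
print(coords)
```

For live lookups you would pass `make_nominatim_geocoder("my-osint-notebook")` in place of `fake_geocode`. Each resulting coordinate pair could then be dropped onto a Folium map as a marker.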

These are just some of the ways that Python can support OSINT work, making it an excellent choice for gathering and analyzing information from publicly available sources.

But wait, there’s more!

With its large community of developers, Python has an ever-expanding list of useful libraries and tools, such as Selenium for automating web browsing tasks, Whois for querying domain registration records to gather information about domain names, Phonenumbers for parsing and validating phone number data, and Scikit-Learn for machine learning and predictive analytics.

So should you add Python programming to your New Year’s resolutions? Well, let me put it this way: If learning OSINT gives wings to your curiosity, then learning Python straps a warp drive on its backside. Nuff said?

For more OSINT and Python tools, tips, and topics, follow TRADECRAFT on Twitter, LinkedIn, and Mastodon
