Well, as the title suggests, we are going to look into how to harvest information using a few different methods. These range from writing scrapers for specific websites that display the email addresses of their clients (behind an authentication mechanism, of course) to sniffing your network for the email addresses (and maybe bank and credit card details) of users who are logged into it and naive enough to check their email or do financial transactions over a network that can be compromised.
Naturally, by now, I hope you understand that the “harvesting information” part of the title is just there to sanitize this blog post, so that people cannot say I am putting up evil ideas using the noble python language. How to use a tool is up to you. It can be used for benign purposes, but it can also be used for “not so benign” purposes. The decision is for you to make. I will just introduce you to what can be done using python in the context of network intelligence gathering.
Enough preamble; let us now look at the tools available for the “trade”. Later, we will see how we can use them from within python code. I will primarily be using a Linux distro (specifically Ubuntu 14.04), but you can definitely stretch this to other Linux and UNIX distros, and possibly Microsoft Windows and Mac too. I think that nowadays Windows has most of the tools that are available as open source code on the internet. They may have been modified to a certain extent and made proprietary, but they are available within the Microsoft suites.
The Tools for the Trade:
The following are some of the more popular tools used in the domain of packet sniffing. Please note that popularity doesn't mean they are the best. Popularity depends on the size of the community that supports the tool, and there are vested interests of commercial organizations that create these communities. I do not want to get deeper into this muck, but I am stating this here so as to make you aware (if you are not already aware) that there are tools far better than the ones listed below. I will name a few in the course of this post, and I will leave the rest as an exercise for you.
- Pcap – This is the packet capture interface of the (in)famous libpcap library, which you would find on practically any Unix or Linux machine. There is a port for MS Windows too, so if you are a Windows buff, you won't be disappointed. This is a lethal tool for packet capturing: with the interface in promiscuous mode, it captures every packet the interface can see, even those being routed to some other node in the network. However, in this article, we will concentrate on a Python extension named pcapy, which exposes the functions of the pcap library. Typically, hackers/programmers use pcapy to capture packets on the network and Impacket to analyze their contents, because Impacket is a very handy tool for decoding network packets and displaying them on the screen or writing the output to a file.
- Nmap – No article on network snooping can ignore this particular tool. It has been in use for ages now, and naturally, python has an extension for this extremely potent tool. Strictly speaking, Nmap is a network scanner rather than a packet sniffer: it is used for host discovery, port scanning, and service and OS detection, mostly on *nix platforms, but there is a Windows port as well. It is one of the tools that can tell you nearly every fact you need to know about your network. The same goes for networks other individuals use, if you can gain access to their private broadband connections, and I am sure you can collect phone numbers from them ( :-D ) along with other sensitive information like bank transaction details, online banking passwords, etc. Nice way of treating your unsuspecting neighbours, but bear in mind that this can be dangerous for you if your city's cybercrime department gets to know about it. And they do come to know if you are doing it frequently with large sums of money, and the cops may take you in as one of their guests in the cells.
- dpkt – This is a tool for creating and parsing captured packets. This is widely used for information gathering from data inside the captured packets, and this data may compromise a target location (which may be your computer).
- pyshark – A Python wrapper around tshark, the command line version of Wireshark. Possibly you have heard about Wireshark already, but you might not know what pyshark is capable of. It provides the user with the information from captured packets. The info includes (but is not limited to) the IP/IPv6 destination address, the length of the packet in bytes, the source address (again IP/IPv6), the index of the TCP stream to which the packet belongs, etc. Needless to mention, this is a tool of choice for most network hackers.
- Scapy – Scapy is different from the above tools as it is not exactly a tool, but a sort of framework for network packet manipulation. If you place your Network Interface Card (NIC) in promiscuous mode, it will let you capture packets that are not intended for you but travel over the same network you use. It is written in Python, hence most python pros like it.
- Raw Sockets and ICMP – This is a bare-metal implementation of a packet capturing tool, and only seasoned programmers/hackers use it. You can manipulate this to do whatever you want, as it gives you all the flexibility at the cost of writing a lot of code to make the socket(s) work and capture low layer protocol packets (like Internet Control Message Protocol, commonly known as ICMP). In this context, let me first make you aware that the networks we use are usually described in terms of the OSI model of networking. This model consists of 7 layers, and from bottom (data transfer through wire) to top (presentation of data to the user on a monitor), they are as follows: (i) Physical, (ii) Data Link, (iii) Network, (iv) Transport, (v) Session, (vi) Presentation, and (vii) Application. However, the most popular protocol suite (TCP/IP) does not adhere strictly to this model. In the TCP/IP model, the lowest 2 layers are clubbed into one link layer, and the top 3 are again clubbed together into a single application layer.
- Snoopy from Sensepost – This tool uses python and drones to capture the SSIDs that nearby wireless devices send out in their probe requests, and then poses as the network those probes are looking for. That makes the device “think” that it has reached a network it already knows, and then once the user makes a financial transaction, all the details pass through this rogue interceptor. What happens to the data after that is entirely up to the person who captured it.
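To make the layer picture from the raw sockets entry a little more concrete, here is a small sketch that lines up the seven OSI layers against the commonly used four-layer TCP/IP model. The names OSI_LAYERS and OSI_TO_TCPIP are just labels made up for this illustration; nothing here comes from any of the libraries discussed in this post.

```python
# The seven OSI layers, listed from bottom (wire) to top (user).
OSI_LAYERS = [
    "Physical",
    "Data Link",
    "Network",
    "Transport",
    "Session",
    "Presentation",
    "Application",
]

# How each OSI layer is usually folded into the four-layer TCP/IP model.
OSI_TO_TCPIP = {
    "Physical": "Link",
    "Data Link": "Link",
    "Network": "Internet",
    "Transport": "Transport",
    "Session": "Application",
    "Presentation": "Application",
    "Application": "Application",
}

for number, name in enumerate(OSI_LAYERS, start=1):
    print(number, name, "->", OSI_TO_TCPIP[name])
```

A sniffer that works at the link layer sees everything above it wrapped inside each frame, which is why the later examples peel the layers off one at a time.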
In the following section, we will go over 4 of the above mentioned tools/apps one by one, and we will write some code to do something like a “hello world” program. I hope you would be able to gather more knowledge once you have read this post, and find out a tool that works best for you. Again, please note that there are other open source tools that are better than these, and maybe you should go to Github and take a look at the available mechanisms for the purpose of packet capturing and network sniffing. Needless to say that these will also capture private email addresses, which is the excuse behind this post.
Working with these Tools:
In order to work with these tools, the first thing that you need to do is to install them. So, I will take one tool at a time, show you how to install it on your computer system, and then get into the coding part. Please note that since I use Ubuntu, I would use “apt-get” to install the libraries required for the functioning of these tools, but if you are on some other flavour of Linux or Unix, you would possibly be using “yum” or “rpm” or any other command line package manager in much the same way I use “apt-get”. Basically, you need to pay attention to the libraries I install, as you would need to do the same in whatever way you deem fit.
Pcap : In order to use pcap, you would first need to download and install the “libpcap-dev” library. I did this in the following way:
sudo apt-get install libpcap-dev
Please note I used “sudo” - you would need root permissions to install most of these things. Next, you would need to install “python-libpcap” or “python-pypcap” (depending on the version of the system you are using). I installed “python-pypcap”, and I did it in the following way:
sudo apt-get install python-pypcap
Alternatively, you could use the following if your Ubuntu version is older than 12.04:
sudo apt-get install python-libpcap
Finally, install the python packages themselves (pcapy and impacket) using pip. One point to note here is that it is always advisable to use a python virtualenv to work these things out. Of course the system libraries will be installed globally, but at least the python packages will reside in the selected virtual environment and hence they would not interfere with your global python environment.
That should be enough to make pcap, pcapy and impacket available to your python interpreter. Note that my network interface name is 'wlan0', so I will be using it in all my examples. If your interface name is different (like 'eth0', 'wlan', etc), then you would need to replace the interface name with the name of your interface. Now let's see some code here:
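A minimal sketch of listing the available interfaces could look like the following. It assumes Python 3 and that pcapy was installed with pip; pcapy.findalldevs() is the call that does the work. The helper name list_interfaces and the fallback to the standard library (used only when pcapy cannot be imported) are additions for this illustration, and the fallback will not show pseudo-devices such as 'any'.

```python
import socket


def list_interfaces():
    """Return the names of the network interfaces visible on this machine."""
    try:
        import pcapy  # third-party; assumed to be installed via pip
    except ImportError:
        # Fallback for machines without pcapy (standard library, Unix only).
        return [name for _index, name in socket.if_nameindex()]
    return pcapy.findalldevs()


print(list_interfaces())
```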
Running the above code provides the following output on my computer:
['eth0', 'wlan0', 'docker0', 'bluetooth0', 'nflog', 'nfqueue', 'vmnet1', 'vmnet8', 'any', 'lo']
The output will vary according to what interfaces you use, but it will be something similar. Next, let us capture and decode some packets. For the purpose of looking at the data later on, we will be saving the decoded data in a text file:
p = pcapy.open_live("wlan0", 2048, False, 1000)
Let us dissect the above code. The first argument to pcapy.open_live() is the name of the network interface to monitor. In my case it is “wlan0”. Yours could be different. The second parameter is the snapshot length, that is, the maximum number of bytes to pick up from each packet. I have set it to 2048. The third parameter is the mode in which the capture will run. If you want to use 'promiscuous' mode, set it to True. Otherwise, set it to False, and that is exactly what I did. Alternatively, you could set it to 1 or 0, 1 being promiscuous = True, and 0 being promiscuous = False. Setting promiscuous to True will allow pcapy to capture packets that are not destined for the host on which it is set up. Use it with caution, and do it only if it is required. The fourth parameter is the read timeout. The value is in milliseconds, so I have set it to 1000, which is basically 1 second. You may set it to 0 if you do not want the read operation to time out, but normally it is not a very good idea.
Next, we call the method “setfilter” of the object returned from pcapy.open_live(). We set this to 'tcp', so that of all the raw packets arriving at the datalink layer, only the ones carrying TCP are handed to us. Formally, setfilter sets the BPF filter (Berkeley Packet Filter). A BPF filter will gather only packets of the specific type that is set in its argument (in our case, it is 'tcp').
Next, we are going to write a small callback for processing the received packets. In this callback, we will just print out the packet contents, but you are free to add logic to do something more useful.
The EthDecoder helps in parsing the ethernet packet by creating a “decoder” object. We print that out. Next we look into the IP packet inside the ethernet packet by calling the “child()” method of the ethernet packet. Once we print that out, we look into the TCP packet inside the IP packet by calling “child()” once more.
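BPF filter expressions are plain strings, so they are easy to assemble in code. The helper below, bpf_filter, is a hypothetical convenience written for this post (it is not part of pcapy); it only builds the string that would then be passed to setfilter.

```python
def bpf_filter(protocol="tcp", port=None, host=None):
    """Build a simple BPF expression such as 'tcp and port 443'."""
    parts = [protocol]
    if port is not None:
        parts.append(f"port {port}")
    if host is not None:
        parts.append(f"host {host}")
    return " and ".join(parts)


# With a reader obtained from pcapy.open_live(), the filter from the text
# would be applied like this (left as a comment because it needs pcapy):
#     reader.setfilter(bpf_filter())

print(bpf_filter())
print(bpf_filter("tcp", port=443))
```

Narrowing the filter this way keeps the capture small, which matters once you let the capture run for a while.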
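To show what “a packet inside a packet” means at the byte level, here is a rough standard-library sketch that walks an Ethernet frame down to its TCP ports. This is not how Impacket is implemented; it is only an illustration of the same nesting, it handles plain IPv4 only, and parse_frame is a name invented for this example.

```python
import socket
import struct


def parse_frame(frame):
    """Walk Ethernet -> IPv4 -> TCP headers of one raw frame."""
    # Ethernet header: destination MAC, source MAC, 2-byte EtherType.
    _dst_mac, _src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    info = {"ethertype": ethertype}
    if ethertype != 0x0800:  # anything other than IPv4 is left alone
        return info

    ip = frame[14:]
    header_len = (ip[0] & 0x0F) * 4  # IHL field, counted in 32-bit words
    info["protocol"] = ip[9]
    info["src_ip"] = socket.inet_ntoa(ip[12:16])
    info["dst_ip"] = socket.inet_ntoa(ip[16:20])
    if ip[9] != 6:  # protocol number 6 is TCP
        return info

    tcp = ip[header_len:]
    info["src_port"], info["dst_port"] = struct.unpack("!HH", tcp[:4])
    return info
```

Each step simply skips the header of the current layer and reads the next one, which is what the chain of child() calls does for you.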
In order to run the callback, we need to pass it to the “loop ()” method of the object returned by pcapy.open_live. So, the last line would be something like this:
The first parameter is the packet limit. We have set it to 10 here. However, you may make it infinite by making it -1.
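Putting the pieces together, the whole capture might be sketched as below. It assumes Python 3, that pcapy and impacket are installed, and that the script is run with enough privileges to open the interface (usually root). The function name capture and the output file name packets.txt are placeholders chosen for this sketch.

```python
def capture(interface="wlan0", packet_limit=10, outfile="packets.txt"):
    """Capture TCP packets on one interface and write them, decoded, to a file."""
    # Imported here so the file can be loaded on machines where the
    # third-party packages are missing.
    import pcapy
    from impacket.ImpactDecoder import EthDecoder

    decoder = EthDecoder()
    # snaplen 2048 bytes, promiscuous mode off, 1000 ms read timeout
    reader = pcapy.open_live(interface, 2048, False, 1000)
    reader.setfilter("tcp")

    with open(outfile, "w") as out:
        def handle_packet(header, data):
            ethernet = decoder.decode(data)  # the ethernet packet
            ip = ethernet.child()            # the IP packet inside it
            tcp = ip.child()                 # the TCP packet inside that
            for layer in (ethernet, ip, tcp):
                out.write(str(layer) + "\n")

        # Stop after packet_limit packets; -1 means keep going.
        reader.loop(packet_limit, handle_packet)
```

Calling capture() on your own machine, as root, should leave the decoded packets in the text file for later inspection.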
The output would be something like this.
This isn't very interesting as I took the liberty to not show you what I am accessing from my computer. But, if you want to do that, you could set the first param of the “loop” call to -1, and then start capturing packets as you work. (Or better still, put this program on your target computer, make another program to channel the output in real time to your own computer, and then you could see what your target is doing. You could use a socket to do the channelling, and have this program run in the background using “supervisord”, so that it starts running as soon as your target computer boots up).
Nmap : nmap is a wonderful command line utility available for all Linux and Unix distros. It does some pretty awesome things and we will take a look at it shortly. How do we access its powers from within python? Well, you can simply Google that up, but just to save you a little time, this is how it can be done.
pip install python-nmap
Alternatively, what you can do is download the python-nmap tar-gzipped file from here: https://files.pythonhosted.org/packages/dc/f2/9e1a2953d4d824e183ac033e3d223055e40e695fa6db2cb3e94a864eaa84/python-nmap-0.6.1.tar.gz
Once it has been downloaded, you can extract it using the following command from the command line (please excuse me for targeting linux and unix platforms, you can certainly do all these things on windows computers too, but the process would certainly be different. For example, you would possibly extract the tar-gzipped file on a windows platform by right-clicking on it and selecting the “extract” option from the context menu).
tar xvfz python-nmap-0.6.1.tar.gz
Next, you would need to get into the extracted directory and run the setup.py file to build and install it. This is how it is done:
python setup.py install
That should work out and have the nmap extension for python installed. Now let us take a baby step to see what nmap can do for us. We will create a port scanner first.
nm = nmap.PortScanner()
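The line above only creates the scanner object. A fuller sketch of the scan, assuming python-nmap and the nmap binary are both installed, could look like this. The helper names port_spec and scan_local_ports are made up for this example; PortScanner.scan() takes the host and the port range as strings.

```python
def port_spec(first, last):
    """Format a port range the way nmap expects it, e.g. '22-443'."""
    return f"{first}-{last}"


def scan_local_ports(first=22, last=443, host="127.0.0.1"):
    """Scan a port range on one host and return the scanner holding the results."""
    import nmap  # python-nmap, third-party; also needs the nmap binary itself

    scanner = nmap.PortScanner()
    scanner.scan(host, port_spec(first, last))
    return scanner


print(port_spec(22, 443))
```

Keeping the default host on 127.0.0.1 means the sketch only ever probes the machine it runs on unless you tell it otherwise.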
What you are trying to do here is asking nmap to show you what ports are open in the range of 22 to 443 on your localhost. You would get an output similar to this:
The output looks a bit gibberish at first glance, but if you take a closer look at it, you would get a whole lot of information on what is going on on the ports that are open within the 22 to 443 range. If you just wanted to know which ports are open in the above mentioned range, you could simply do the following:
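One way to pull out just the open ports is sketched below. It relies on the scan result for a host behaving like a dictionary with a 'tcp' entry that maps each port number to its details (including a 'state' field), which is how python-nmap lays it out. The function name open_tcp_ports is hypothetical, and the sample dictionary is made-up data standing in for a real scan result.

```python
def open_tcp_ports(host_result):
    """Return the sorted TCP ports whose state is 'open' for one scanned host."""
    tcp = host_result.get("tcp", {})
    return sorted(port for port, details in tcp.items()
                  if details.get("state") == "open")


# Made-up data in the same shape as nm['127.0.0.1'] after a scan.
sample = {
    "tcp": {
        22: {"state": "open", "name": "ssh"},
        80: {"state": "open", "name": "http"},
        8080: {"state": "closed", "name": "http-proxy"},
    }
}
print(open_tcp_ports(sample))
```

With a real scanner object named nm, the call would be open_tcp_ports(nm['127.0.0.1']).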
The output is simple here:
[80, 25, 139, 443, 22]
That is a list of the ports that are open on your computer. You could very well try it over a larger range (maybe 22 to 8888, since a lot of people, especially python developers, run web servers on ports like 8000, 8001, 8888, etc to test their web apps before deploying them on port 80).
Now, once you get this information on what ports are open and which applications are using those ports (for example, you can see that Samba smbd is using port 139), you could use a vulnerability in that software to do something funny (or maybe malicious, depending on the sort of person you are).
In the next step with nmap, let us scan a networked environment and see which nodes are up and running and which are not. In order to do that, you have to write the following code:
nm.scan(hosts='scanme.nmap.org', arguments='-n -v -A')
The arguments -n, -v and -A stand for “Never do DNS resolution”, “Increase verbosity level” and “Enable OS detection, version detection, script scanning, and traceroute” respectively. The output of this command is voluminous and I am putting only selected parts of the output below. Please run this on your own computer to see the entire effect. Also, please change the name of the host to point to the host you are seeking info on.
You can look up more capabilities of the nmap tool by typing nmap -h on your command line (not the python interpreter prompt, but the linux/unix command line). Use it to your advantage, as this is really a very powerful tool.
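Back in python, the result of such a scan can be walked host by host to see which nodes are up. The sketch below only assumes the all_hosts() method of the scanner and the state() method of each host entry, both of which python-nmap provides; summarize is a name invented for this example.

```python
def summarize(scanner):
    """Return one 'host is state' line per host found by a finished scan."""
    lines = []
    for host in scanner.all_hosts():
        lines.append(f"{host} is {scanner[host].state()}")
    return lines
```

After a call such as nm.scan(hosts='scanme.nmap.org', arguments='-n -v -A'), summarize(nm) would give you a quick overview before you dig into the voluminous details.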
Next comes Pyshark.
Pyshark: Pyshark is an extremely versatile tool and it helps in packet capturing as well as analysis of data inside the packet. Installation of pyshark is very simple, especially so when you are installing using “pip”. All you need to do is the following:
pip install pyshark
That's it. Pyshark is now installed as a python module and you can start using it by adding “import pyshark” statement in your code. Now let us take a look at the code to capture packets. Please note that I am using the interface named “wlan0”, and you should adjust it to whatever interface you are using. If you are not so experienced in finding out the interfaces on your computer, just run “ifconfig -a” from the command line. In the response of this command, look for the interface that provides you with an IP address (not the loopback one, 127.0.0.1)
Once you know which interface to capture from (for example, mine is wlan0, with the IP address 192.168.1.4), you may start to capture packets that are passing through the selected interface. The code for that is pretty simple, and it is shown below:
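A minimal sketch of a live capture, assuming pyshark and tshark are installed and the script has permission to capture on the interface. LiveCapture and sniff_continuously are pyshark calls; the wrapper name capture_live and the packet count of 10 are choices made for this example.

```python
def capture_live(interface="wlan0", packet_count=10):
    """Print packets from a live interface as they arrive."""
    import pyshark  # third-party; needs tshark installed as well

    capture = pyshark.LiveCapture(interface=interface)
    # Yields packets one at a time and stops after packet_count of them.
    for packet in capture.sniff_continuously(packet_count=packet_count):
        print(packet)
```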
The output of this tiny script is quite a lot, so I am just putting a random section of it.
Alternatively, you may use pyshark to handle previously captured data (may be using pcap/pcapy, or some other tool). The syntax of such a code would be very similar to the above mentioned script, except that it will take its input from the file itself and not from the network interface (wlan0 in our case above).
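Reading from a saved capture could be sketched as follows, again assuming pyshark and tshark are installed. FileCapture is the pyshark class for this; the function name read_capture is made up for this example, and the path you pass in is whatever file your earlier capture produced.

```python
def read_capture(path):
    """Print the top-most protocol and the length of every packet in a capture file."""
    import pyshark  # third-party; needs tshark installed as well

    capture = pyshark.FileCapture(path)
    try:
        for packet in capture:
            print(packet.highest_layer, packet.length)
    finally:
        capture.close()  # shuts down the tshark process behind the capture
```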
The result of the above code would be similar to that of the LiveCapture code. But what do you do with the results of such output? Well, for starters, you could monitor the output for some time and pick out the encrypted data that is being sent. That is what you will see whenever the website you are logging into is secure (using https). You can start the above script and try to log in to your gmail account, and you would get output similar to the following:
Reading the credentials would require decrypting that data. Of course that is not as easy as it sounds: a passive capture does not contain the session keys that TLS uses, so sniffing alone does not give you the plaintext. You would need a good deal of cryptography knowledge, and more likely some social engineering skills, to get any further.
Also, please keep in mind that pyshark can be run in promiscuous mode to sniff all packets even if they are not destined for the machine on which it is being run.
Scapy : Scapy is more of a framework than a tool to sniff and manipulate packets. With scapy, you can sniff packets in promiscuous mode. This may be done by typing in the following command on the command line of your linux/unix systems.
sudo ip link set wlan0 promisc on
This will set the wlan0 interface in promiscuous mode, so that it accepts all packets that reach it and not only the ones addressed to it. Let us see some code here:
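A minimal sniffing sketch with scapy, assuming scapy is installed and the script runs with root privileges. sniff() and its iface, filter, count and prn arguments are part of scapy; the wrapper name sniff_tcp and the default values are choices made for this example.

```python
def sniff_tcp(interface="wlan0", count=10):
    """Capture a handful of TCP packets and print a one-line summary of each."""
    from scapy.all import sniff  # third-party; capturing needs root

    # prn is called for every packet; whatever it returns gets printed.
    return sniff(iface=interface, filter="tcp", count=count,
                 prn=lambda packet: packet.summary())
```

The value returned by sniff() is a list of the captured packets, so you can keep working with them after the capture ends.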
The above code is possibly a bit off the mark and you may need to modify it a bit to make it work. But the gist is exactly as shown above. If you are using a version of python older than 2.7.x, then you would possibly need to upgrade to 2.7.x, as “cryptography” will remove support for python versions older than 2.7.x.
Well, all in all, all of these are very handy and powerful tools. I have just scratched the surface of this topic in this article, and there is actually a lot more to it. Please feel free to research this by yourself, as I think that it is the best way to learn something.