Block Millions Of Ads Network-wide With A Raspberry Pi-hole 2.0
Update 2016-04-24 : Please note this guide is now out of date, and it is recommended that you use the automated install to install and update Pi-hole.
Technically, this guide will still work, but it does not include any of the features and improvements that have been added since.  This article will, however, still give you some insight into how the Pi-hole functions.

The Pi-hole can block ads for all devices on your network.  All you need is a Raspberry Pi connected to your router.  It was inspired as a low-cost, open-source alternative to the AdTrap.  The Pi-hole works on the B, B+, and Pi 2 (it can also run on the Zero, but you need a micro-USB-to-Ethernet adapter).

Block Over 1.6 Million Ad-serving Domains (And More)

Currently, over 1.6 million domains known to serve advertisements (if you decide to use the mahakala source list) can be blocked before they even reach your computer.  Lists of these domains are maintained by different sites across the Internet.  The Pi-hole queries these sites and compiles them into an aggregated list, which is then stored on the Raspberry Pi and used to block advertisements.

Anytime a Website wants to load something from a domain on the list, the request is redirected back to the Pi, so the real server is never contacted.  This not only speeds up your Web browsing, but it also reduces your bandwidth usage because your browser never downloads the ad from the real server.

Setting It Up (The Easy Way)

These are the steps you will take to activate your Pi-hole and begin blocking ads.  If you prefer to tinker around and do things on your own, take a look at the manual setup.

  1. Install Raspbian onto the Pi
  2. Run this command: curl -L install.pi-hole.net | bash
  3. Configure your computers/devices to use the Pi as their DNS server [critical step]

That Magical Moment…

Once the installation is complete and you have configured your devices to use the Pi-hole as their DNS server, ads are instantly blocked without any configuration.  It’s one of those things that is difficult to describe and more something you need to experience.

Optional Settings After The Pi-hole Is Set Up

Alternative Installation Method (DietPi Image)

If you want something even easier, you can try burning this DietPi image to your SD card.  It is already set up as a Pi-hole, though it isn’t as up-to-date as the method above; it is something we are working on.

If You Already Had A Pi-hole Set Up…

It probably wasn’t working that well due to the last list update (unless you fixed it), so I suggest just getting a new SD card or wiping it and starting over.  Make a backup of it first if you are worried.

You will be better off installing the new Pi-hole (using the install command).  The original one is very primitive compared to all the enhancements that have been made.

Caveats

Some ads still get through via IPv6 (though there are some things you can try to improve this) and HTTPS.  Near the end of 2015, I noticed a lot of the bigger ad networks started delivering a lot more ads via IPv6.  I am still working on improving IPv6 blocking, but as you may know, the Pi-hole is free and powered by your donations.

How It Works

First, I’ll explain the concepts.  Then, I’ll break down how the script works.  After that, I’ll give some tips on customizing your Pi-hole to add more domains to block or use alternative DNS servers.  Finally, I’ll cover, in detail, how to manually set this up.

The Concept

The Pi-hole answers your computer’s DNS queries and if it is a domain on the ad list, it sends the request to the Web server that is also installed on the Pi.  This Web server just serves up a blank page.  So instead of loading the ad from the real domain, the blank page from the Pi is downloaded, thus “blocking” the ad.

Understanding The Components

DNS

When you visit a Website by typing in the domain name (google.com), there is a server out on the Internet that looks up that domain’s IP address and sends you to the correct server (computers understand numbers, not letters). If you knew Google’s IP address, you could go directly there by typing that in instead of google.com.

Armed with the knowledge that each domain exists at a certain IP address, consider that when you visit a Website, that site might be trying to load stuff from other domains (such as advertisements).  So your computer makes a DNS query to find out the IP address of each of those resources.  If you are using a DNS server out on the Internet, you get back the real address and the content (ads included) gets downloaded.  But since the Pi-hole is a DNS server, it can tell your computer that SomeAdDomain.com exists at a fake IP address.
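
As a rough illustration (the ad domain and the 192.168.1.141 address below are just placeholders; substitute your own Pi’s address), you can compare what a public resolver says against what a Pi-hole says:

dig +short doubleclick.net @8.8.8.8        # the ad network's real address
dig +short doubleclick.net @192.168.1.141  # the Pi-hole answers with the Pi's own address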

Web Server

Now that we can tell your computer to download something from your fake IP address, we can use a Web server to serve up a tiny blank image file or Webpage.  So as far as your computer is concerned, it asked where a resource was, found it, and downloaded it.  From your point of view, you downloaded a blank file instead of the advertisement.

The hosts File

/etc/hosts  works just like DNS, except that it’s faster because it is the first place your computer checks to find out where a domain is.  It can also be used to do other things.  In the case of the Pi-hole, this file contains all the entries for the ad-serving domains, which tells your computer that each of those sites exists at the IP address of the Raspberry Pi.

If you know about the hosts file, you may be thinking: why do you need a DNS server if the hosts file already does the redirecting?  And doesn’t the hosts file only apply to the computer it exists on?  Normally, yes, but when you tell dnsmasq  to use the file, it gives the hosts file a super power and makes it available to any device that queries the DNS server (i.e. any device on your network), in essence transforming it into a DNS zone file.
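
A couple of hypothetical entries from such a file look like this (192.168.1.141 standing in for your Pi’s address):

192.168.1.141 doubleclick.net
192.168.1.141 ads.example.com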

The Script

After November 15, 2015, the script’s code was refactored, so this post is a little outdated from what you will find on Github, but it is still very relevant to understanding how everything works.  All the same concepts are still in the script–they have just been turned into functions to make the code easier to maintain.

Now that you know how the Pi-hole works, I’ll explain how the script was designed to find ad-serving domains and compile them into a huge list, which is then used to create the hosts file. Throughout the script, you will notice a science/Star Trek theme. So things like variable names and echo commands are references to that. 🖖

Get The Raspberry Pi’s IP Address

In order for clients to access the Pi’s Webserver, they need to know where to find it.  Make sure you set a static IP address as it will be stored in this variable and be used to populate the hosts file.

piholeIP=$(ip -4 addr show | awk '{match($0,/[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/); ip = substr($0,RSTART,RLENGTH); print ip}' | sed '/^\s*$/d' | grep -v "127.0.0.1")

Set Some Variables

So I don’t repeat myself (my golden rule of scripting), I use variables to store all of the filenames.

basename=pihole
piholeDir=/etc/$basename
adList=$piholeDir/gravity.list
blacklist=$piholeDir/blacklist.txt
whitelist=$piholeDir/whitelist.txt
latentWhitelist=$piholeDir/latentWhitelist.txt
justDomainsExtension=domains
matter=$basename.0.matter.txt
andLight=$basename.1.andLight.txt
supernova=$basename.2.supernova.txt
eventHorizon=$basename.3.eventHorizon.txt
accretionDisc=$basename.4.accretionDisc.txt
eyeOfTheNeedle=$basename.5.wormhole.txt

Getting The Lists (And Being A Respectful Netizen)

There are many sites out there that maintain lists of ad-serving domains.  These URLs are stored in the sources  array.  An array is used so we can loop through every domain inside and do something to it.  This makes it easier to add additional URLs whenever you want.

A for  loop is used to run some commands on every entry in the array.  First, I store the URL in a variable for better readability.

url=${sources[$i]}

Next, I extract just the domain name and store it in a variable, again, for better readability.

domain=$(echo "${sources[$i]}" | cut -d'/' -f3)

I also set the user agent in a variable as it seems to be a universal one that works for every URL that is being downloaded.

agent="Mozilla/10.0"

I create another variable to save the file with a consistent naming convention of list.indexNumber.domainName.  This helps with debugging and also names the files nicely in case you ever want to look at them.

saveLocation=$origin/"list"."$i"."$domain".$justDomainsExtension

Next, a case  statement determines if the URL needs a special curl  command.  If it doesn’t, it sets another variable to simply hold the curl command.

cmd="curl"

Now, we are ready to download the lists. Previously, I was storing the lists in RAM (/run/shm) so it was faster, but that ended up causing a lot of errors when the list grew very large.  So now, the lists are downloaded directly to the SD card–but it doesn’t do it recklessly. Thanks to some bash-fu, we can reduce the amount of I/O that takes place while also respecting the bandwidth of the sites that are hosting the lists.

First, a temporary file is made.

patternBuffer=$(mktemp)

Then it checks if the list from the source already exists and is (-r)eadable. If it isn’t, another variable is set, which contains another part of the curl command.

if [[ -r $saveLocation ]]; then
heisenbergCompensator="-z $saveLocation"
fi

Now that all the components of the curl command have been established, we can execute it to download the list.

$cmd -s $heisenbergCompensator -A "$agent" $url > $patternBuffer

If there is (-s)omething in the $patternBuffer, then the list can be transported to your computer.

The content in the $patternBuffer is in a state of flux and needs to be materialized. Lieutenants sed and awk take care of this and save the cleaned up list into the $saveLocation declared earlier.

awk '($1 !~ /^#/) { if (NF>1) {print $2} else {print $1}}' $patternBuffer | sed -nr -e 's/\.{2,}/./g' -e '/\./p' > $saveLocation

If there are no changes, the list is skipped as there is no need to re-download it. Doing it this way respects the bandwidth of the people operating the sites as there is no need to download the entire list if the exact same one is already on the computer.  Some of the sites will block your IP address if you try to download from them too often, which I learned the hard way while trying to test out the Pi-hole.

Another benefit of downloading it locally is that you always have a copy if the site is no longer available.

Now, while many of the lists downloaded are already in hosts format, their formatting is inconsistent and I like things to look nice and readable, so I have some code to format everything the same way.  For example, one list has a single space between the IP address and the domain name, while others have tabs or additional whitespace.

So one list looks like this:

127.0.0.1 lb.usemaxserver.de
127.0.0.1 tracking.klickthru.com
127.0.0.1 gsmtop.net
127.0.0.1 click.buzzcity.net

And another like this:

127.0.0.1       005.free-counter.co.uk
127.0.0.1       006.free-adult-counters.x-xtra.com
127.0.0.1       006.free-counter.co.uk
127.0.0.1       007.free-counter.co.uk

After the sed and awk finish processing the file, only the domain name remains, allowing me to consistently format them in another step.

lb.usemaxserver.de
tracking.klickthru.com
gsmtop.net
click.buzzcity.net
005.free-counter.co.uk
006.free-adult-counters.x-xtra.com
006.free-counter.co.uk
007.free-counter.co.uk

Aggregating The Lists

Now that there are files from each source that contain only domain names, it is time to aggregate them into a single list while removing any carriage returns or line feeds.

find $origin/ -type f -name "*.$justDomainsExtension" -exec cat {} \; | tr -d '\r' > $origin/$matter

Blacklist entries are appended to the list if /etc/pihole/blacklist.txt  exists.

cat $blacklist >> $origin/$matter

The /etc/pihole/whitelist.txt  file is also parsed if it exists and is stored in a temporary file because it needs special formatting to properly find the domains to remove.

awk -F '[# \t]' 'NF>0&&$1!="" {print "^"$1"$"}' $whitelist | sed 's/\./\\./g' > $latentWhitelist

In order for grep  to accurately remove entries from the list, a dollar sign ($) must be appended to the end of each line.  Then, you can use the -x option to remove the entries.

grep -vxf $latentWhitelist $origin/$matter > $origin/$andLight
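
As a quick illustration of that formatting step, here is what happens to a single (hypothetical) whitelist entry:

echo "github.com" | awk -F '[# \t]' 'NF>0&&$1!="" {print "^"$1"$"}' | sed 's/\./\\./g'
# prints ^github\.com$ -- grep -vxf then removes only lines that match this pattern exactly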

Also, since we want to be able to access our sources for future updates, those get whitelisted as well in case they happen to end up on one of the lists.

for url in ${sources[@]}
do
    echo "$url" | awk -F '/' '{print "^"$3"$"}' | sed 's/\./\\./g' >> $latentWhitelist
done

Formatting The List

Now that there is a full list of domains with the blacklist and whitelist applied, I use sed to remove some different types of line feeds (invisible characters at the end of a line of text), which if not removed, result in duplicate domains in the list that sort cannot detect.

cat $origin/$andLight | sed $'s/\r$//' | sed '/^\s*$/d' > $origin/$supernova

Finally, the list can be alphabetized and duplicates can be removed.

sort -u $origin/$supernova > $origin/$eventHorizon

The list is still just a list of domain names, so it needs to be formatted like a hosts file: the IP address, a space, and then the domain name.

cat $origin/$eventHorizon | awk '{sub(/\r$/,""); print "'"$piholeIP"' " $0}' > $origin/$accretionDisc

The last step is to copy the formatted file over as /etc/pihole/gravity.list.

sudo cp $origin/$accretionDisc $adList
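
For the new list to take effect, dnsmasq needs to re-read it.  A minimal way to do that (the exact mechanism the script uses may differ) is to restart the service:

sudo service dnsmasq restart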

If your brain isn’t hurting too much after reading that, you can continue on to learn how to block malware and tracking sites by adding your own sources to the script, making the hosts file even more powerful.

Blocking Malware, Tracking Sites, And More By Adding Your Own Sources

Recently, I rebuilt the gravity.sh  script (the magic behind the Pi-hole) from the ground up to make it easier to add your own sources.  You will need to know how to write shell scripts to do this (but you might be able to figure it out by reading through this section).  This will be done by editing /usr/local/bin/gravity.sh  on your Raspberry Pi-hole.

Add Sources That Have A List Of Domains

Before actually editing the script, you should curl  the URL of the site that has the list of domains you want to block and figure out how to get just the domain names from it.  If the list you are downloading is already in hosts format (IP address, whitespace, domain name), then you only need to add your URL(s) into the sources array by putting each in single quotes–one per line–and the script will take care of the rest.

sources=('https://adaway.org/hosts.txt'
'http://adblock.gjtech.net/?format=unix-hosts'
'http://adblock.mahakala.is/'
'http://hosts-file.net/.%5Cad_servers.txt'
'http://www.malwaredomainlist.com/hostslist/hosts.txt'
'http://pgl.yoyo.org/adservers/serverlist.php?'
'http://someonewhocares.org/hosts/hosts'
'http://winhelp2002.mvps.org/hosts.txt'
'http://YOUR_DOMAIN_HERE.com'
'http://ANOTHER_DOMAIN_HERE.com')

If your list needs additional formatting to get just the domain name, you still need to put it in the sources array, but then you need to figure out how to parse out only the domain names (sed and awk will be your friends).

As an example, this list blocks spam domains, but it has a lot of characters to remove, like exclamation points and carets.  So the following sample from the list found at http://spam404bl.com/spam404scamlist.txt

||unlimited-hacks.net^
||pakcircles.com^
||cracksplay.com^
||fbgamecheatz.info^
||linkz.it^

would need to be parsed down to:

unlimited-hacks.net
pakcircles.com
cracksplay.com
fbgamecheatz.info
linkz.it
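
As a hedged sketch (not the exact command in the script, and assuming every entry follows the ||domain^ pattern), sed could strip that markup like this:

# strip the leading || and trailing ^ from each Adblock-style entry
curl -s http://spam404bl.com/spam404scamlist.txt | sed -e 's/^||//' -e 's/\^$//' > spam404.domains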

Once you have figured out the command to use, you will need to add an entry inside the case statement.  Put the domain name in quotes followed by a closing parenthesis.  After the parenthesis, put your curl command in, following the same format as the other entries already in the statement.

case "$domain" in
        "adblock.mahakala.is")
            agent='Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0'
            cmd="curl -e http://forum.xda-developers.com/"
            ;;

        "pgl.yoyo.org")
            cmd="curl -d mimetype=plaintext -d hostformat=hosts"
            ;;

        # Default is a simple curl request
        *) cmd="curl"
    esac

Now, just execute the gravity.sh  script again and it will download and append the new list!

Additional DNS Blacklists

Here are a few links to get you started.  I haven’t verified whether any of these lists are accurate or still updated.  They just came up in my searches while I was creating the Pi-hole, so add them to the script at your own risk.

  • http://cdn.files.trjlive.com/hosts/hosts-v8.txt
  • https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
  • https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
  • https://zeustracker.abuse.ch/blocklist.php?download=domainblocklist
  • http://sysctl.org/cameleon/hosts
  • http://securemecca.com/Downloads/hosts.txt
  • https://hosts.neocities.org/
  • http://tcpdiag.dl.sourceforge.net/project/adzhosts/HOSTS.txt
  • http://optimate.dl.sourceforge.net/project/adzhosts/HOSTS.txt
  • http://mirror1.malwaredomains.com/files/justdomains
  • https://raw.githubusercontent.com/reek/anti-adblock-killer/master/anti-adblock-killer-filters.txt
  • http://spam404bl.com/spam404scamlist.txt
  • http://malwaredomains.lehigh.edu/files/domains.txt
  • http://download.networktoolbox.de/dnsmasq.conf.tail
  • http://jansal.googlecode.com/svn/trunk/adblock/hosts
  • http://www.sa-blacklist.stearns.org/sa-blacklist/sa-blacklist.current
  • https://easylist-downloads.adblockplus.org/malwaredomains_full.txt
  • https://easylist-downloads.adblockplus.org/easyprivacy.txt
  • https://easylist-downloads.adblockplus.org/easylist.txt
  • https://easylist-downloads.adblockplus.org/fanboy-annoyance.txt
  • http://www.fanboy.co.nz/adblock/opera/urlfilter.ini
  • http://www.fanboy.co.nz/adblock/fanboy-tracking.txt

Using Different DNS Servers

You can change the secondary and tertiary DNS servers the Pi-hole uses.  In the automated setup, I just use Google’s public ones, but you might enjoy using OpenDNS, which offers more features.  Its servers are found at 208.67.222.222 and 208.67.220.220.  These settings are changed in /etc/dnsmasq.conf as shown below.

domain-needed
bogus-priv
no-resolv
server=208.67.222.222
server=208.67.220.220
interface=eth0
listen-address=127.0.0.1
cache-size=10000
log-queries

You can also use any DNS server of your choosing.  There might be a faster one based on your geographical location.

Setting It Up Manually (Building It From Scratch)

This section will give a detailed walkthrough of setting up a Pi-hole from scratch.  You can also reference the original article for more information but it might confuse you more since it is pretty primitive and outdated compared to this article.

Conceptual Overview

  1. Install Raspbian
  2. Set a static IP address
  3. Install the Web server
  4. Install the DNS server
  5. Configure DNS to use secondary and tertiary public DNS servers to look up legitimate sites
  6. Configure DNS to use the hosts file as the first location to query (so ad queries can be blocked)
  7. Configure the Web server to redirect ad domains back to the Pi
  8. Create a blank HTML page or a tiny GIF file to take the place of the ads
  9. Create a script to download the lists and save it as the hosts file
  10. (optional) Install a streaming server to attempt to skip video/audio ads (not just block them)
  11. Set your computer to use the Pi as its DNS server [Critical Step]
  12. Test it out!

Install Raspbian

There are plenty of guides out there on how to do this, so I won’t cover it here.  But you should update everything once it is installed using these commands:

sudo apt-get -y update
sudo apt-get -y upgrade

Set A Static IP Address

There are two ways your Raspberry Pi can get its IP address on the network.  The first is via DHCP, in which the IP address is automatically assigned–this is what your Pi will use by default.  The second is a static address, which is useful if you want to always know what address to find the Pi at (although a similar effect can be achieved via a DHCP reservation).

A static IP address will work better for this project, so I will show that configuration first.  This is because your Pi is acting as a server, so when you put entries into the hosts file with the Pi’s IP address, that address should not change, which is a possibility when using DHCP.

Static Configuration

You will need to know a little bit about your network including the subnet mask, gateway (IP of the router), and network address to set a static IP.  Below is a sample config that may work for most people, but you may need to adjust it depending on how your network is setup.

Your network addressing may begin with 192.168.0.x and not 192.168.1.x.  You may even be using a 172.16.x.x or 10.x.x.x address.  But since I made this project to be used at home, your router most likely hands out 192.168.x.x addresses.  Either way, you will need to modify the file to suit your environment.

The method to set a static address is different in Raspbian “Jessie”: you should now use /etc/dhcpcd.conf to do this.
Set A Static IP In Raspbian Jessie

sudo vi /etc/dhcpcd.conf

Append this to the bottom of the file, substituting the correct IP address you want.

interface eth0
static ip_address=192.168.1.141/24
static routers=192.168.1.1
static domain_name_servers=192.168.1.1
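
For the new address to take effect, the simplest option (restarting the networking stack would also work, but a reboot leaves no doubt) is:

sudo reboot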
Set A Static IP In Raspbian Wheezy

Modify/add the relevant lines so the file looks like the one below.  (The config below only works for “Wheezy”; see above for the latest method to set a static IP.)

sudo vi /etc/network/interfaces

auto lo
iface lo inet loopback

auto eth0
allow-hotplug eth0
#iface eth0 inet manual
iface eth0 inet static
        address 192.168.1.122
        netmask 255.255.255.0
        network 192.168.1.0
        broadcast 192.168.1.255
        gateway 192.168.1.1

auto wlan0
allow-hotplug wlan0
iface wlan0 inet manual
wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf

auto wlan1
allow-hotplug wlan1
iface wlan1 inet manual
wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf

DHCP Configuration

DHCP won’t work well for this project.  If your IP address happens to change (because it is not set statically), you wouldn’t be able to properly access Web pages, and you would have to run the gravity.sh script again to re-create the hosts file with the new IP address.  While it is possible to use a DHCP reservation to always get the same IP, it’s usually better to just set a static IP address instead.

If you do decide to use DHCP, you do not need to change anything as the Pi already has this set.  But you will need to run the script anytime your IP address changes.

Install The Web Server

It is possible to use the Pi-hole without a Web server; requests for ad domains will simply return an error instead of a Webpage because nothing will be listening at the address they get sent to.

Speed-wise, the difference between using a Web server or not is negligible.  But using the Web server gives you a lot more functionality, including the ability to improve performance with mod_expire , redirect stubborn types of ads that aren’t blocked just by domain name, and send out your own content in place of ads.  You also get some great logging features, which can be used for some fun stuff.

Install lighttpd

sudo apt-get -y install lighttpd

Adjust Permissions

Make www-data  the new owner of the directory that serves up the Webpage:

sudo chown www-data:www-data /var/www/html

Modify the permissions to make it writeable:

sudo chmod 775 /var/www/html

Add the pi user to the www-data  group:

sudo usermod -a -G www-data pi

Install The DNS server

sudo apt-get -y install dnsmasq

Once this finishes, all of the necessary components are installed, but they still need to be configured.

Start DNS When The System Boots

dnsmasq  won’t start at boot by default, so you need to tell it to do so.

sudo update-rc.d dnsmasq enable

Edit The DNS Config File

/etc/dnsmasq.conf  is the file that contains all of the settings for dnsmasq , so that is the file you need to edit.  Linux config files are nothing more than simple text files with specific options that a program reads.  So all you need to do is put those options in the file.

First, rename the original file so it’s available as a backup in case you ever want a clean version of it.

sudo mv /etc/dnsmasq.conf /etc/dnsmasq.orig

Then create a new /etc/dnsmasq.conf as shown below:

sudo vi /etc/dnsmasq.conf

domain-needed
bogus-priv
no-resolv
server=8.8.8.8
server=8.8.4.4
interface=eth0
listen-address=127.0.0.1
cache-size=10000
log-queries
log-facility=/var/log/pihole.log
local-ttl=300
log-async

Take note of the server lines.  These are the secondary and tertiary DNS servers to use for lookups if the first DNS server (your Raspberry Pi) doesn’t have a record for the site.  These are necessary to look up legitimate Websites because, when we are done, your Pi’s DNS service will only know about the ad domains in your hosts file; it won’t know about any real Webpages on the Internet unless those values are set.

Configure DNS To Use Secondary And Tertiary Public DNS Servers To Look Up Legitimate Sites

The config above uses Google’s public DNS servers (8.8.8.8 and 8.8.4.4) but you may want to consider using OpenDNS, which might be a bit faster and also provides some other features such as parental controls and filtering.  They are found at 208.67.222.222 and 208.67.220.220.  To use them, just put those IP addresses in place of Google’s.

The options set in the config file above are as follows:

  • domain-needed:  never forward names without a dot or domain part.  This prevents single hostnames–like raspberrypi –from being forwarded to the upstream DNS servers.  There is no need to ask the Internet where that computer is, because it only exists on your local network.
  • bogus-priv: never forward reverse-lookup queries which have the local subnet’s IP range to upstream.  Your local network, probably 192.168.x.x, doesn’t exist on the Internet, so similar to domain-needed, these do not need to be forwarded upstream.
  • no-resolv: do not use /etc/resolv.conf  (see below–this method is a bit easier than using a separate file).  That file can get overwritten when using DHCP, so the Pi-hole instead uses the server options shown below.
  • server: upstream name servers to use.  Your Pi already uses itself, so you just need to put two public DNS servers here
  • interface: specifies that the ethernet port is used
  • listen-address: this tells what address to listen for queries on.  If using IPv6, you will need another entry that listens on ::1
  • cache-size: set as large as possible (10,000) to allow for super-fast DNS queries
  • log-queries: logs all name resolutions to /var/log/daemon.log (the default location–see below), which allows you to see what sites are being visited, even in real time while your Pi-hole is running.
  • log-facility=/var/log/pihole.log: chooses a different location for the logs to go to.  Since all DNS queries will pass through the Pi-hole, it should have its own log file (daemon.log is a shared log between any running daemons).  This makes it easier to parse through and extract data.
  • log-async: this allows it to continue functioning without being blocked by syslog, and allows syslog to use dnsmasq for DNS queries without risking deadlock
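
Once dnsmasq is running with this config, an optional sanity check (not part of the original walkthrough; you may need to install the dnsutils package to get dig on the Pi) is to query it directly and make sure a normal lookup still works:

dig +short example.com @127.0.0.1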
The Resolver File (Not Used In The Pi-hole, But You Can If You Want)

/etc/resolv.conf  is a file that sets what servers to use to try to resolve domain names (this file does the same thing as the server= value we set in /etc/dnsmasq.conf).  dnsmasq  checks this file by default unless you use the no-resolv  option, which we already did above.  You can choose to use this file, but it is easier to have everything in one place.

Also, this file will cause problems if you are using DHCP (another reason to use a static IP).  Unless you have your DHCP server (usually the router) set to use the Pi-hole as its primary DNS, it will overwrite the file on the next reboot, wiping out your custom settings.

There can only be three entries in this file for name servers.  Add Google’s public DNS servers, OpenDNS ones, or others of your choosing.

nameserver 8.8.8.8
nameserver 8.8.4.4

As previously mentioned, the Pi will check with itself first, so the entries above will allow all queries to go through the Raspberry Pi first, and if it cannot resolve the name, it will use the servers you set.

Configure DNS To Use The Hosts File As The First Location To Query

For simplicity, I designed the Pi-hole to use /etc/pihole/gravity.list, which is just a file formatted like /etc/hosts  but lives in a different location (I still call it a “hosts” file).  It works fast with over 900,000 entries.

You actually do not need to do anything to configure dnsmasq  to read the real hosts file (/etc/hosts) first; it already does that by default.  By simply adding entries into the hosts file, blocking will just start working.  But there is actually another reason I included this section in the article.

There is a special option you can enable in the dnsmasq  config file, /etc/dnsmasq.conf , which allows you to use a file formatted like the hosts file, but one that can exist anywhere on the system and be named anything.  If this is an option you want to use, your dnsmasq.conf  file should contain a line like this:

addn-hosts=/path/to/hostsfile

So you can choose to use the option above if you want the file to exist somewhere else and leave the hosts file alone.
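
With the Pi-hole layout described in this article, that line would look like this (assuming the default gravity.list location):

addn-hosts=/etc/pihole/gravity.list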

Configure The Web Server To Redirect Ad Domains To A Blank Page

Now that DNS is able to redirect advertisement queries back to itself using the hosts file, we can use the Web server to point them to a blank Webpage (or other resources like a tiny transparent GIF file).

Just like dnsmasq , the Web server has a text config file: /etc/lighttpd/lighttpd.conf .  Back up the original file in the same fashion as the dnsmasq  config file.

sudo mv /etc/lighttpd/lighttpd.conf /etc/lighttpd/lighttpd.orig

Create a new file as shown below.

sudo vi /etc/lighttpd/lighttpd.conf

server.modules = (
    "mod_expire",
    "mod_compress",
    "mod_redirect",
    "mod_rewrite"
)

server.document-root        = "/var/www"
server.upload-dirs          = ( "/var/cache/lighttpd/uploads" )
server.errorlog             = "/var/log/lighttpd/error.log"
server.pid-file             = "/var/run/lighttpd.pid"
server.username             = "www-data"
server.groupname            = "www-data"
server.port                 = 80


index-file.names            = ( "index.php", "index.html", "index.lighttpd.html" )
url.access-deny             = ( "~", ".inc" )
static-file.exclude-extensions = ( ".php", ".pl", ".fcgi" )

compress.cache-dir          = "/var/cache/lighttpd/compress/"
compress.filetype           = ( "application/javascript", "text/css", "text/html", "text/plain" )

# default listening port for IPv6 falls back to the IPv4 port
include_shell "/usr/share/lighttpd/use-ipv6.pl " + server.port
include_shell "/usr/share/lighttpd/create-mime.assign.pl"
include_shell "/usr/share/lighttpd/include-conf-enabled.pl"

# Set access to 1 day for better query performance when the list gets so large
# http://jacobsalmela.com/raspberry-pi-block-ads-adtrap/#comment-2013820434
$HTTP["url"] =~ "^/pihole/" {
        expire.url = ("" => "access plus 1 days")
}

# Rewrites all URLs to the /var/www/pihole/index.html
$HTTP["host"] =~ ".*" {
     url.rewrite = (".*" => "pihole/index.html")
}

Then, restart lighttpd  to apply everything.

sudo service lighttpd restart

Understanding The lighttpd Config File

This file is way more complex than the dnsmasq  one, so I will break each part down and explain what it does.  This should also help if you want to modify the file to redirect the requests somewhere else (i.e. a streaming server), which is what you probably need to do to block video ads like those that Hulu Plus has (skipping video ads is still a work in progress).

First, the additional modules are loaded.  This is like turning on a feature so you can use it.

server.modules = (
    "mod_expire",
    "mod_compress",
    "mod_redirect",
    "mod_rewrite"
)

Then, there are a bunch of server settings, like the port of the server and the folder where the Website resources are stored.

server.document-root        = "/var/www/html"
server.upload-dirs          = ( "/var/cache/lighttpd/uploads" )
server.errorlog             = "/var/log/lighttpd/error.log"
server.pid-file             = "/var/run/lighttpd.pid"
server.username             = "www-data"
server.groupname            = "www-data"
server.port                 = 80

These are some other basic settings, which I honestly don’t know whether you need or not.  But all the resources I researched had them, so I just did the same.

index-file.names            = ( "index.php", "index.html", "index.lighttpd.html" )
url.access-deny             = ( "~", ".inc" )
static-file.exclude-extensions = ( ".php", ".pl", ".fcgi" )

compress.cache-dir          = "/var/cache/lighttpd/compress/"
compress.filetype           = ( "application/javascript", "text/css", "text/html", "text/plain" )

The next lines say to use IPv6, and use some config files.

include_shell "/usr/share/lighttpd/use-ipv6.pl " + server.port
include_shell "/usr/share/lighttpd/create-mime.assign.pl"
include_shell "/usr/share/lighttpd/include-conf-enabled.pl"

Now we get into the custom settings that are specific to the Pi-hole.  This first part says that if the URL points to something in the /var/www/pihole  folder, its expiration time should be one day.

This is one of the ways the Pi-hole saves processing; the client doesn’t need to contact the Pi-hole about this URL for one day.  So when a query comes through, it doesn’t have to see if there is something new on the page.  And really, this could be set to any time frame because it’s all pointing to your fake Web server anyway.

$HTTP["url"] =~ "^/pihole/" {
        expire.url = ("" => "access plus 1 days")
}

Finally, this next block says that every domain should be sent to /var/www/pihole/index.html .  Since the only requests that get routed back to the Pi are ad domains, we can use the wildcard pattern (.*) to send them all into the “black hole”.

$HTTP["host"] =~ ".*" {
     url.rewrite = (".*" => "pihole/index.html")
}

You can use rewrite or redirect here.  The difference?  url.rewrite  doesn’t change the URL client-side, so if you went to doubleclick.net, your browser would still show doubleclick.net even though you were really being shown a page from /var/www/pihole/ .  url.redirect  still sends you somewhere else, but you would see the change on your end.  It’s kind of confusing, but this link explains the differences better than I can.
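
For comparison, a hypothetical url.redirect version might look like the following (192.168.1.141 standing in for your Pi’s address); the address in the client’s browser would visibly change:

$HTTP["host"] =~ ".*" {
     url.redirect = (".*" => "http://192.168.1.141/pihole/index.html")
}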

Create A Blank HTML Page Or A Tiny GIF File To Take The Place Of The Ads

The Web server now knows to send ad domains to /var/www/pihole/index.html , but that file doesn’t exist yet.  There are a few options you can choose from for this Webpage.  Some work better than others, so experiment with what works best for you.  I have found that a simple blank Webpage works the best in most situations, but you can really make it into whatever you want:

sudo mkdir /var/www/pihole
sudo vi /var/www/pihole/index.html

<html>
<body>
 </body>
</html>

There are a few other options you can try.  One is to just show a tiny 1×1 transparent image file (it should be saved in /var/www/pihole ).  That image file can be replaced with a GIF of a different size, the smallest being 25 bytes.

<html>
 <body>
  <img src="25Bytes.gif" alt="" />
 </body>
</html>

That seemed to work OK for me, but I found it worked better if I put it in a <div></div> and defined the height and width:

<html>
 <body>
  <div style="height: 1px; width: 1px;"><img src="25Bytes.gif" alt="" width="1" height="1" /></div> 
 </body>
</html>

Another fun option is to use a base64 encoded 1×1 GIF.  The cool part is that you do not even need an actual file to do this.  I didn’t have as much luck with this, but it’s still pretty cool.  Just create the index.html  page as shown below:

<html>
 <body>
  <img src="data:image/gif;base64,R0lGODlhAQABAAAAACwAAAAAAQABAAA=" alt="" />
 </body>
</html>

One last option is to not use an .html file at all and instead, rewrite/redirect the pages directly to the image file by setting it like this in /etc/lighttpd/lighttpd.conf :

$HTTP["host"] =~ ".*" {
     url.rewrite = (".*" => "pihole/25Bytes.gif")
}

It’s impossible to test every site, so mess around and choose the one that seems to work best for you.  As I mentioned previously, the simple blank page worked the best for me.

Create A Script To Download The Lists And Save It As The Hosts File

Download the script (reminder: the script found at the link is much different now, but the concepts are still the same) and make sure it is executable.

curl -o /usr/local/bin/gravity.sh https://raw.githubusercontent.com/jacobsalmela/pi-hole/master/gravity.sh
chmod 755 /usr/local/bin/gravity.sh

Now that everything is set up, you need to actually get the list of the domains by running the script (make sure your static IP address is set, or your hosts file will be misconfigured).

sudo /usr/local/bin/gravity.sh
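
To confirm the list was actually generated, you can count the entries (the path assumes the default adList location):

wc -l /etc/pihole/gravity.list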

I covered the script in detail earlier, so I will just cover a few lines you may want to change depending on your setup.  If you do edit it, you will just need to run the script again.

Changing Variables

First off, you may want to change the location of some of the files generated by the Pi-hole.  The variables below will save the files somewhere else.  Leave off the trailing slash on directories.  Also, if you decided to use the addn-hosts= option in dnsmasq.conf , make sure that path matches what you put in the $adList  variable below.

origin=~/etc/pihole
adList=/etc/hosts

(Optional) Install A Streaming Server To Attempt To Block Video/audio Ads

I will be covering this in a separate article, because this article is already long enough!  I will update it when I get better results.  For now, experiment with the link in the header of this section or check out this pull request.

Set Your Computer To Use The Pi As Its DNS Server [Critical Step]

If you don’t set your computer to use the Pi as its DNS server, you won’t be able to benefit from all of your hard work.  This is different for every operating system, but Google’s instructions are pretty helpful.

Set The DNS Server on Your Router Instead Of Configuring Each Device On Your Network

It is possible to have your router force your network devices to use the Pi-hole as their DNS server.  It involves changing the DNS server in the DHCP options section of your router.  To work properly, it should be the only DNS server you set, even though there will likely be fields for two more (don’t forget, the Pi-hole is already configured to use other upstream servers).

Unfortunately, this part is a bit difficult to describe as every router is different.  Also be aware that it might not work with some routers.

Test Functionality

Restart the Pi after making all these changes.  I have an FAQ setup for testing and troubleshooting, but I also explain some of the tests you can run below.

sudo shutdown -r now

Test One: Check DNS And The Hosts File

From a computer that is using the Pi as its DNS server, ping an ad domain; it should return the IP address of your Raspberry Pi (not the real IP of the domain).

ping 247media.com

PING 247media.com (192.168.1.117): 56 data bytes
64 bytes from 192.168.1.117: icmp_seq=0 ttl=64 time=0.060 ms
64 bytes from 192.168.1.117: icmp_seq=1 ttl=64 time=0.111 ms
64 bytes from 192.168.1.117: icmp_seq=2 ttl=64 time=0.120 ms
64 bytes from 192.168.1.117: icmp_seq=3 ttl=64 time=0.089 ms
64 bytes from 192.168.1.117: icmp_seq=4 ttl=64 time=0.067 ms

If you get replies from the Pi’s address, you know dnsmasq  and the hosts file are both working properly.

Test Two: Check The Web Server

As long as your computer is on the same network as the Pi, you can view the blank Webpage by navigating to the Pi’s IP address from a browser.  If you are in Chrome or Firefox, you can right click > Inspect Element and verify that it is your HTML code.

Test Three: Check Name Resolution And The Web Server Together

Try going to a domain on the ad list in a browser.  If it loads your Pi’s Webpage, then both services are working correctly.  When originally setting this up, I had the hosts file on the RPi using 127.0.0.1, but this meant that when a client requested the domain, it would try to connect to its loopback address, not the Pi’s.  The Webpage would not load because the Web server wasn’t running on the client (127.0.0.1).  This also explained why I could access the Web page via IP address but not via domain name.

Test Four: Verify Upstream DNS And Observe The Speed And Performance Benefits Of The Pi-hole

You can verify that the Pi-hole is actually making your DNS queries–thus browsing the Internet–faster.  Make sure you have your computer using the Pi as its DNS server and then run the dig  command on a domain you have never visited before.  It doesn’t need to be an advertising domain; it can be any domain as long as you haven’t been to it yet (this will ensure the response has not already been cached).

In the example below, I dig a site that I have not been to and is not an ad server.

dig pi-hole.net

Take note of two things in the output of the command below:

  1. the Website’s real address is shown
  2. the response time is 148ms.

; <<>> DiG 9.8.3-P1 <<>> pi-hole.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18944
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;pi-hole.net. IN A

;; ANSWER SECTION:
pi-hole.net. 247 IN A 184.168.47.225

;; Query time: 148 msec
;; SERVER: 192.168.1.200#53(192.168.1.200)
;; WHEN: Wed Nov 25 14:54:26 2015
;; MSG SIZE rcvd: 45

Now, take a look at the results when I run the same command again.  Thanks to the caching we enabled, the response time is now 6ms; much better!  This happens because we set the option cache-size=10000  in the /etc/dnsmasq.conf  file.  This cache is stored in RAM and will be flushed whenever the service is restarted.

; <<>> DiG 9.8.3-P1 <<>> pi-hole.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18944
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;pi-hole.net. IN A

;; ANSWER SECTION:
pi-hole.net. 247 IN A 184.168.47.225

;; Query time: 6 msec
;; SERVER: 192.168.1.200#53(192.168.1.200)
;; WHEN: Wed Nov 25 14:57:10 2015
;; MSG SIZE rcvd: 45

Next, I dig  a site that is an ad server to show that it gets redirected back to the Pi instead of to the real IP address of the domain.

dig a-ads.com

; <<>> DiG 9.8.3-P1 <<>> a-ads.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18944
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;a-ads.com. IN A

;; ANSWER SECTION:
a-ads.com. 247 IN A 192.168.1.200

;; Query time: 305 msec
;; SERVER: 192.168.1.200#53(192.168.1.200)
;; WHEN: Wed Nov 25 16:22:38 2015
;; MSG SIZE rcvd: 45

And again, when I run it the second time (see the faster response time):

; <<>> DiG 9.8.3-P1 <<>> a-ads.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 18944
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;a-ads.com. IN A

;; ANSWER SECTION:
a-ads.com. 247 IN A 192.168.1.200

;; Query time: 8 msec
;; SERVER: 192.168.1.200#53(192.168.1.200)
;; WHEN: Wed Nov 25 16:23:11 2015
;; MSG SIZE rcvd: 45

Create A Recurring Task to Update the Ad List

If you installed the Pi-hole using the automated install command, a system-wide crontab is installed to /etc/crontab, which will do this for you, but you can also make your own as described below.

In case any of the Websites that compile the lists of ad servers ever update them, set gravity.sh  to run once a week using crontab .  This is much more respectful of their bandwidth than the original Pi-hole was, because the script now only downloads a list from a site if it has changed.

sudo crontab -e

Append the following line to the bottom of the file and save it:

@weekly /usr/local/bin/gravity.sh

Now the ad list should always stay up-to-date whenever the Websites it is pulled from update theirs.  You can just leave your Pi-hole plugged in next to your router and always be blocking the most ads possible!

Alternative To /etc/hosts

You don’t have to use the hosts file (or addn-hosts ); instead, you can use the /etc/dnsmasq.d  folder, although performance starts to suffer with that method once the list of domains gets past about 120,000.  Any file in this directory is considered a config file and is loaded when dnsmasq  starts.  The syntax of the example line below says “if the address queried is 101com.com (an ad server), send it to this IP address (the Pi–192.168.1.117 or whatever you set it to).”

address=/101com.com/192.168.1.117

This file is full of these rules, which are loaded whenever dnsmasq  is.  If a DNS query matches one of the rules, the answer returned is 192.168.1.117, which is the Raspberry Pi.  Since the request gets redirected there, it never reaches the real ad server.
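
As a rough sketch (the file names and the 192.168.1.117 address are placeholders, not part of the original Pi-hole), such a file could be generated from a domains-only list with awk, followed by a dnsmasq restart:

# turn each bare domain into a dnsmasq address= rule pointing at the Pi
awk '{print "address=/"$1"/192.168.1.117"}' domains.txt | sudo tee /etc/dnsmasq.d/02-adblock.conf > /dev/null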

One problem with this method is that the file is a lot bigger than a hosts file because of the additional characters.  Because of this and the sheer quantity of domains to parse through, it can really slow things down.

How Websites Are Displayed Using The Pi-hole

Depending on the Website and the content you decided on for your index.html  file, you may see different results.  Sometimes, the ad-space is completely shrunk down as shown below.

But other times, a box or blank whitespace may still remain where the ads used to be.

Try changing the index.html  file to use different methods and see how things work for you.  It’s likely that the Pi-hole won’t be able to block every ad perfectly, but it does work for the most part.  The Internet is organic, and advertisers will always find a way to inject ads.

Success!

You can test it at AdsBlocker Testing or Simple Ad Block’s test.

If you want a real-life test, I would suggest Cafemom or Adsreview (from the videos at the beginning of this post).  Articles from Macworld or OSX Daily also make good tests, as those sites have a lot of ads.

Some Ads Are Still Showing

Ads can be served from many different domains and the lists may not know about all of them.  The first thing to do is find out what domains are being used on a certain site, and then add/remove domains from your list.  The easiest way to do this is using DNSThingy’s Whitelist Assistant for Chrome.  Details on how to use it for this purpose can be found here.

Watching the Log File And Viewing Performance

When we edited the dnsmasq.conf  file, we added the debug option (log-queries ).  This is very useful for finding out what URLs ads might be coming from.  You can simply watch the log file as you navigate to a site on your computer.  To watch the log file in real time, use this command.  It is kind of fun to see what pops up when you navigate to a site.

tail -f /var/log/daemon.log

But since we added the line log-facility=/var/log/pihole.log , the path just needs to be adjusted.  If you installed the Pi-hole before August 2015, the command above will still work.

tail -f /var/log/pihole.log

Once you find a URL you want to block, you can just append it to your /etc/pihole/blacklist.txt.
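
For example (ads.example.com is just a placeholder), you can append the domain and then re-run gravity.sh so the blacklist gets folded back into gravity.list:

echo "ads.example.com" | sudo tee -a /etc/pihole/blacklist.txt
sudo /usr/local/bin/gravity.sh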

An LCD Screen For Viewing In Real-time What Ads Are Being Blocked

Using this log file, you can make a real-time monitor for the Pi-hole that lets you see what domains are being blocked.  I am using a 2.8″ LCD screen so I can just peek at it while browsing.  This real-time monitor is also available over ssh .  See the video below of it in action.

Web Interface For Viewing Your Pi-hole’s Performance

The automated installer includes a nice Web interface, which shows stats from a Web page.  You can manually install it.

Donations For Continued Development

The Pi-hole is free, but powered by your donations.  I mostly work on it for fun in my free time, but when I do receive donations, I push myself to do some extra development.

The Pi-hole source code is available on GitHub at pi-hole/pi-hole: “A black hole for Internet advertisements.”