Pi-hole: Raspberry Pi Ad-Blocker [Advanced Setup]

Roll-your-own Pi-hole Ad-blocker!

Automated Install

Run this command only on a clean install of Raspbian on a Raspberry Pi.

curl -s "https://raw.githubusercontent.com/jacobsalmela/pi-hole/master/automated%20install/basic-install.sh" | bash

Point your devices to use the Raspberry Pi as their DNS server and you are all set!

Advanced Setup

This guide assumes you have already set up a basic ad-blocker on the Raspberry Pi; you will simply be replacing the gravity.sh script with the new one below.

In the advanced setup, the script pulls known ad servers from multiple sources instead of the single source used by the original script.

Also check out my efforts to skip Hulu video ads

Below is the advanced gravity.sh script; you can also view the fully commented version on GitHub. Be sure to set the piholeIP variable (near the top of the script) to the IP address of your Raspberry Pi. You may also want to use the commented version of the script because it contains extra echo commands that show how far along in the process it is.

#!/bin/bash
# The Pi-hole now blocks over 120,000 ad domains
# Address to send ads to (the RPi)
piholeIP="127.0.0.1"
# Optionally, uncomment to automatically detect the address.  Thanks Gregg
#piholeIP=$(ifconfig eth0 | awk '/inet addr/{print substr($2,6)}')

# Config file to hold URL rules
eventHorizion="/etc/dnsmasq.d/adList.conf"

echo "Getting yoyo ad list..." # Approximately 2452 domains at the time of writing
curl -s -d mimetype=plaintext -d hostformat=unixhosts http://pgl.yoyo.org/adservers/serverlist.php? | sort > /tmp/matter.txt
echo "Getting winhelp2002 ad list..." # 12985 domains
curl -s http://winhelp2002.mvps.org/hosts.txt | grep -v "#" | grep -v "127.0.0.1" | sed '/^$/d' | sed 's/\ /\\ /g' | awk '{print $2}' | sort >> /tmp/matter.txt
echo "Getting adaway ad list..." # 445 domains
curl -s https://adaway.org/hosts.txt | grep -v "#" | grep -v "::1" | sed '/^$/d' | sed 's/\ /\\ /g' | awk '{print $2}' | grep -v '^\\' | grep -v '\\$' | sort >> /tmp/matter.txt
echo "Getting hosts-file ad list..." # 28050 domains
curl -s http://hosts-file.net/.%5Cad_servers.txt | grep -v "#" | grep -v "::1" | sed '/^$/d' | sed 's/\ /\\ /g' | awk '{print $2}' | grep -v '^\\' | grep -v '\\$' | sort >> /tmp/matter.txt
echo "Getting malwaredomainlist ad list..." # 1352 domains
curl -s http://www.malwaredomainlist.com/hostslist/hosts.txt | grep -v "#" | sed '/^$/d' | sed 's/\ /\\ /g' | awk '{print $3}' | grep -v '^\\' | grep -v '\\$' | sort >> /tmp/matter.txt
echo "Getting adblock.gjtech ad list..." # 696 domains
curl -s http://adblock.gjtech.net/?format=unix-hosts | grep -v "#" | sed '/^$/d' | sed 's/\ /\\ /g' | awk '{print $2}' | grep -v '^\\' | grep -v '\\$' | sort >> /tmp/matter.txt
echo "Getting someone who cares ad list..." # 10600
curl -s http://someonewhocares.org/hosts/hosts | grep -v "#" | sed '/^$/d' | sed 's/\ /\\ /g' | grep -v '^\\' | grep -v '\\$' | awk '{print $2}' | grep -v '^\\' | grep -v '\\$' | sort >> /tmp/matter.txt
echo "Getting Mother of All Ad Blocks list..." # 102168 domains!! Thanks Kacy
curl -A 'Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0' -e http://forum.xda-developers.com/ http://adblock.mahakala.is/ | grep -v "#" | awk '{print $2}' | sort >> /tmp/matter.txt

# Sort the aggregated results and remove any duplicates
echo "Removing duplicates and formatting to address=/<ad domain>/"$piholeIP
cat /tmp/matter.txt | sed $'s/\r$//' | sort | uniq | sed '/^$/d' | awk -v "IP=$piholeIP" '{sub(/\r$/,""); print "address=/"$0"/"IP}' > /tmp/andLight.txt

# Count how many domains were added so it can be displayed to the user
numberOfAdsBlocked=$(cat /tmp/andLight.txt | wc -l | sed 's/^[ \t]*//')
echo "$numberOfAdsBlocked ad domains added to the blacklist"

# Turn the file into a dnsmasq config file
mv /tmp/andLight.txt $eventHorizion

# Restart DNS
service dnsmasq restart
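
Each source in the script above goes through roughly the same steps: strip comments, pull the domain out of a hosts-format line, then turn each unique domain into a dnsmasq address rule. As a rough sketch of that idea (the function names extract_domains and to_dnsmasq_rules are my own for illustration and are not part of gravity.sh), it could be factored like this:

```shell
#!/bin/bash
# Sketch only: these helper names are hypothetical, not part of the original script.

# Read hosts-format text on stdin ("<ip> <domain> [# comment]")
# and print one domain per line, skipping comments and localhost.
extract_domains() {
    # 1) drop DOS carriage returns, 2) strip comments, 3) keep the second field
    tr -d '\r' |
        sed 's/#.*$//' |
        awk 'NF >= 2 && $2 != "localhost" {print $2}'
}

# Read domains on stdin and print unique dnsmasq rules pointing at the given IP.
to_dnsmasq_rules() {
    local ip="$1"
    sort -u | awk -v ip="$ip" 'NF {print "address=/" $0 "/" ip}'
}
```

With those in place, one source could be processed with something like curl -s http://someonewhocares.org/hosts/hosts | extract_domains | to_dnsmasq_rules "$piholeIP". Lists that put the domain in a different column (such as the malwaredomainlist one) would still need their own awk field.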

The original list of ad servers from pgl.yoyo.org was about 42KB but the aggregated list from the script above is 5.5MB with around 120,000 ad domains!  That is a lot more ad servers we can now block with the Raspberry Pi!

Depending on your network connection, this could take a while to download (although the script is now much faster than it used to be). The first time you run it, you may want to send it to the background with this command:

sudo /usr/local/bin/gravity.sh &


51 Replies to “Pi-hole: Raspberry Pi Ad-Blocker [Advanced Setup]”

  1. I started the script and it seems to be just sitting here for past 15 mins doing nothing….I know you said it takes a while…

    1. It does take a while. It might be even slower depending on the speed of your SD card…
      One thing you can try is to add some echo commands after each line of code so you know where it is in the process, or you could also run the script using bash -x to see the debug output.

  2. Have used this script on a debian wheezy build. It works great, but certain web pages hang for ages before loading. Any ideas?

    1. Does the site load OK once it has been cached? I have noticed that some ads are tightly-integrated into some sites and have trouble loading unless you view them. There are so many ad URLs that get blocked, so maybe they are actually blocking something legitimate.

  3. I, too, saw a lot of extreme slowness with the advanced gravity. I reverted back to the original script and it works a lot better.

    1. I updated the script with some echo statements so that you can see where it is in the process, and it seems to work a lot better now. The gravity-adv.sh script blocks a lot more domains, but yes, it will take longer to download. There is also a lot of parsing/formatting done by bash that takes additional time. You probably don’t even need to run it once a week. I’m sure there are not that many domains that get changed/added.

  4. Jacob,

    I installed the advanced script. I realized, in my testing back and forth, I had forgotten to put in my correct pi hole IP! It is working now…sorta….I still get ads…I am not sure if it is a case of the server being busy and forwarding to other dns servers…I will continue to play with it.

    1. I’m sure there are still domains out there that are not on the list. If you do see an ad, you can always view the DNS log file using tail -f /var/log/daemon.log and try to figure out where it is coming from and then add that to the list manually.
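
To make the log-watching suggestion above a little more concrete, here is a small sketch that lists the most frequently queried domains from dnsmasq log lines. It assumes query logging is turned on (the log-queries option in dnsmasq) and that the lines look like "query[A] some.domain from 192.168.1.5"; the function name top_queried_domains is made up for this example.

```shell
#!/bin/bash
# Sketch only: assumes dnsmasq is logging queries in its usual
# "query[TYPE] <domain> from <client>" format.

# Read log lines on stdin and print queried domains, most frequent first.
top_queried_domains() {
    awk '/query\[/ {print $(NF-2)}' |   # the domain is the third field from the end
        sort | uniq -c | sort -rn |     # count occurrences, highest count first
        awk '{print $2}'                # drop the count, keep the domain
}
```

You could then run something like sudo tail -n 500 /var/log/daemon.log | top_queried_domains | head while loading a page that still shows ads, and look for domains that do not belong to the site itself.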

  5. Well, here’s the dilemma – I want to use this in my corp env – if I set the blocker’s resolv.conf to 127.0.0.1, and the 2 Google DNS, then any machine using the blocker for DNS cannot find local DNS sources – like my Exchange server – it forwards the requests to Google – even though the client device has the blocker as the first DNS, BUT also the 2 AD DNS servers as 2nd and 3rd.

    If I set the resolv.conf on the blocker to 127.0.0.1, and then my 2 AD DNS servers, then local resource lookup is fine, but then the ads are not getting blocked consistently…

      1. On my windows test client, I have set my AD DNS as the first dns, the pi as 2nd, and the 2nd AD DNS as 3rd. Local resolution works fine, and the blocker works now too. I wouldn’t think it would, but I guess the pi “answers” the dns queries the fastest for WWW.

          1. I am cheating. I set up a small debian vm in my VMWare environment, using your guide. Works like a charm. I am also only testing with my account, so far.

  6. if you want, you can add this in to the list, its good to use

    echo "Getting someonewhocares.org ad list..."
    curl -s someonewhocares.org/hosts/hosts | grep -v "#" | sed '/^$/d' | sed 's/\ /\\ /g' | awk '{print $2}' | sort >$

  7. Well, the deployment at home is going great, but the test in my corp environment isn’t as great. I find it is best to set the resolv.conf on the pi to itself only. Then, on the client machine, set the DNS as I said in my earlier posts. However, sometimes local DNS resolution goes wonky and cannot find a resource. And often the ads still get served because the AD servers “answer” first.

    Is there any way to turn this into an IE socks5 or other proxy, and take care of it in the web browser only instead?

      1. Any chance you could come up with a quick guide?

        I played around with the server and such you suggested and it wasn’t any better. I am sure I was doing it wrong.

          1. Yeah I took another look and I think I have it configured right now. I’ll let you know how it works out.

  8. thanks for the handy how-to, i’m just in the middle of setting up my own DNS server. it occurred to me, that it would be nice to be able to set up a ‘whitelist’ so I could keep a list of sites that I DONT want to be included in the adList.conf file.

    is there a simple way to do this?

    my bash skills are basic to say the least…

    great site though! 🙂

    1. If you didn’t set up the cron job, just go in and delete an entry from adList.conf. It actually isn’t that difficult to whitelist a site via the script and I plan to implement this in the script as a few other people have asked for the feature. However, I only blog as a side-project, so I haven’t made time to do that yet. Basically, you would need an if or case statement within the script that would say, “if one of the URLs being added equals this domain, do not add it to the list.”

      I will post the whitelist feature on the Github script, so you can watch there for the change, but it might not be for a little while.

      1. Something like:

        # Sort the aggregated results and remove any duplicates
        # AND Apply whitelist
        echo "Removing duplicates and formatting to address=/<ad domain>/"$piholeIP
        cat /tmp/matter.txt | sed $'s/\r$//' | sort | uniq | sed '/^$/d' | cat - /usr/local/etc/whitelist.txt | sort | uniq -u | awk -v "IP=$piholeIP" '{sub(/\r$/,""); print "address=/"$0"/"IP}' > /tmp/andLight.txt
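
One caveat with the uniq -u approach above: a domain that is in whitelist.txt but not in the aggregated list appears exactly once, so it passes through uniq -u and ends up being blocked. A possible alternative, sketched here with grep instead (the function name apply_whitelist is hypothetical; the whitelist is assumed to hold one domain per line), only ever removes lines:

```shell
#!/bin/bash
# Sketch only: assumes the whitelist file holds one domain per line.

# Read domains on stdin and drop any that match a whole line of the whitelist file.
apply_whitelist() {
    local whitelist="$1"
    if [ -s "$whitelist" ]; then
        # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file
        grep -vxFf "$whitelist"
    else
        cat    # missing or empty whitelist: pass everything through unchanged
    fi
}
```

In the pipeline it would sit right after the sort | uniq step, for example ... | sort | uniq | apply_whitelist /usr/local/etc/whitelist.txt | awk ..., before the domains are formatted into address= lines.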

          1. Thanks for the great work guys. On a similar note, is there a way to keep manual DNS redirect entries after updating the adserver list?

            I think at the moment if I make a manual entry (say to point myserver.com to my home server’s static LAN IP) it will be removed when running the update script. Am I correct?

          2. Affirmative. However, if you were to modify the script to simply append your server after adding all of the ad-domains, it will always add it back in.

            For example, add a line like this at the end of the script:

            echo "address=/myserver.com/10.10.20.20" >> /etc/dnsmasq.d/adList.conf

            but make sure you have the double >> and not a single one, otherwise, you will overwrite the entire file.

            I think that should work. Let me know.

  9. This works great, thanks. But when I enabled IPv6 on my network, the advertisements came back. I know that dnsmasq supports IPv6, but how do I block them on an IPv6 network?

    1. Good question. I know one person who set this up on a VM and simply enabled IPv6 and it worked…My guess is that you would need to enable IPv6 in /etc/network/interfaces, otherwise it is just communicating over IPv4. I’ll keep working on this myself, but if you figure it out, let me know; I would love to update this page with instructions for IPv6.
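
One thing that might help here, though I have not tested it on this setup: dnsmasq's address= option accepts either an IPv4 or an IPv6 address, and the line can be repeated to answer both A and AAAA queries for the same domain. A minimal sketch, where the function name dual_stack_rules and the addresses in the usage line are placeholders of my own:

```shell
#!/bin/bash
# Sketch only: the helper name and the example addresses are hypothetical.

# Print two dnsmasq rules for one domain: one IPv4 answer and one IPv6 answer.
dual_stack_rules() {
    local domain="$1" ipv4="$2" ipv6="$3"
    printf 'address=/%s/%s\n' "$domain" "$ipv4"
    printf 'address=/%s/%s\n' "$domain" "$ipv6"
}
```

For example, dual_stack_rules ads.example.com 192.168.1.2 ::1 prints one rule per address family. Devices may also pick up a separate IPv6 DNS server from the router, in which case they would bypass the Pi no matter what is in adList.conf.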

  10. I installed this and it seems to be working, but I am getting large gray squares wherever an ad would be. I verified that lighttpd was running and I’ve updated the IP in gravity.sh. Any ideas?

    1. It has been suggested to use a 1x1px transparent image, which is a good idea considering that’s basically what pixelserv does. I plan to implement this in the script.

  11. Some of the lists you pull from have DOS-style line endings and it’s causing duplicates in the resulting “andLight.txt” file. E.g., look for “www.spinbox.com” in — you will notice there are two; one of them has the DOS-style line ending.

    This can be fixed by piping “matter.txt” through sed to remove the DOS line endings. Then sort/uniq will filter out more of the duplicates.

    Change:
    cat /tmp/matter.txt | sort | uniq | sed '/^$/d' | awk […]

    To:
    cat /tmp/matter.txt | sed $'s/\r$//' | sort | uniq | sed '/^$/d' | awk […]

    The resulting “andLight.txt” file drops from 6.3MB to 5.6MB.
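
To see why the carriage returns matter, here is a tiny sketch that counts unique lines with and without stripping them. It uses tr rather than the sed expression above, purely to keep the example portable; the function names are made up for illustration.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical.

# Count unique lines on stdin exactly as they arrive.
count_unique_raw() {
    sort | uniq | wc -l | tr -d ' '
}

# Count unique lines on stdin after deleting carriage returns.
count_unique_stripped() {
    tr -d '\r' | sort | uniq | wc -l | tr -d ' '
}
```

Feeding both functions the same domain twice, once with a DOS line ending and once without, gives two "unique" lines from the first and a single line from the second, because uniq treats the invisible trailing carriage return as part of the domain.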

    1. Interesting. I did try this and it seems to remove the duplicates. I will add this to the script.

      Bummer, that means it doesn’t get to 140,000 ad domains like I thought…but 124,000 is still pretty good!

  12. Nice work! I implemented your script on NUC using Apache.

    Nevertheless, I am having trouble with iOS Devices on some pages (e.g. spiegel.de, a German magazine). It takes ages to load the page. I changed the response from a blank page to a 1×1 transparent pixel which had no effect. The page loads fine on my Notebook or a Windows Phone.

    Pointing to 127.0.0.1 instead of the NUC’s IP address solves this problem on all devices.

    Anyone having similar problems?

    Regards,
    Arne

  13. Hi Jacob.

    Just wondering... Tested this on my iPhone (Wi-Fi, no mobile data)

    on http://ads-blocker.com/testing/

    Here I am seeing the ads.

    I set my router to use my adblocking device ( 🙂 ) as DNS.. so shouldn’t all devices connected to my router use the adblocker ?

    1. I think it should. I have never tried, but if my logic is correct, I think if you set the Pi as a primary DNS server within your router settings and then set your iPhone’s DNS server to be your router, it should work. So your iPhone would query your router, which would query the primary DNS server, and then any additional ones.

    1. I don’t have an automatic uninstaller. You would just need to remove dnsmasq and lighttpd like this:

      sudo apt-get remove dnsmasq
      sudo apt-get remove lighttpd

    1. The CPU usage is pretty low, as only responding to queries and serving HTML files isn’t very intensive. A SATA connection would drastically improve logging if you could redirect all of the logs through that device. It would also extend the life expectancy of your SD card.

      1. Banana Pi Pro with Bananian v15.08 (Debian 8/Jessie) installed, booting off an Intel 120GB SSD and running Pi-hole. Gravity is pulling in 75578 domains and still blocking ads on the sites I frequently visit; I can’t tell the difference. I did the installation by executing the commands from your script one line at a time. The only one I didn’t execute was usermod -a -G www-data pi, because I would have to replace “pi” with “root” and I’m not sure if that’s safe. Thinking about it, it probably is, because the web interface is not opened in my firewall. I also had to omit the sudo commands since Bananian logs in with the root account. There is a Raspbian build for Banana Pis; maybe I’ll try that in the coming weeks. Let me test this setup for a few weeks to make sure it’s stable.

        1. Sounds fun. The more platforms it works on, the better!

          There are plenty of other flavors that have root as the default user, which can be good or bad. I have just been a long time Raspbian fan, so that’s where I began the project.
