Upcoming Features For The Ad-blocking Pi-hole: Back-end

The Pi-hole blocks ads for your entire network and it does so without any attention from you.  I get comments and emails every day from people saying that they like how well it works.  This is nice because, since I know that it is operational, I can work on fun, new features.  I prefer to do things this way as there is no point in improving something that is already broken.  So thankfully, the Pi-hole continues to run!

In this post, I will tell you about some back-end improvements, and then I will be making another post about something really fun that I have been working on for the front-end.

Back-end Features

Debugging Using X-Headers

Until I figured out a fix, if you had the Pi-hole dashboard installed, you were not able to access it while connected to the Pi-hole because all requests, including those for the dashboard, would get redirected to a blank page.

I now have it set up so that you can run a curl command to see if everything is working properly.  Take a look at the commands below.  First, I download the headers of the dashboard to verify it is working, which the X-Pi-hole header confirms.

me@MacBookAir:~$ curl -I http://192.168.1.141/admin/index.php
HTTP/1.1 200 OK
X-Pi-hole: The Pi-hole Web interface is working!
Content-type: text/html; charset=UTF-8
Date: Fri, 09 Oct 2015 22:16:55 GMT
Server: lighttpd/1.4.35

Next, I download an ad-serving domain to verify the blank page gets downloaded and that the cache control is working.  This time, the X-header has changed to show that the black hole is devouring advertisements.  The cache-control part already existed since this was part of the original Pi-hole, but it’s nice to have anyway.

me@MacBookAir:~$ curl -I a-ads.com
HTTP/1.1 200 OK
X-Pi-hole: A black hole for Internet advertisements.
Expires: Sat, 10 Oct 2015 22:16:58 GMT
Cache-Control: max-age=86400
Content-Type: text/html
Accept-Ranges: bytes
ETag: "831319033"
Last-Modified: Sat, 03 Oct 2015 16:56:15 GMT
Content-Length: 49
Date: Fri, 09 Oct 2015 22:16:58 GMT
Server: lighttpd/1.4.35
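
The two checks above could also be scripted.  What follows is only a minimal sketch: the helper names pihole_header and check_pihole_header are made up for illustration and are not part of the Pi-hole.  The functions read curl's header output on standard input, so they can be tried without a Pi-hole on the network.

```shell
# Hypothetical helpers (not part of the Pi-hole) for checking the X-Pi-hole header.

# Print the value of the first X-Pi-hole header found on stdin.
pihole_header() {
    # curl prints header lines ending in CRLF, and header names are
    # case-insensitive, so strip carriage returns and match in lowercase.
    tr -d '\r' | awk '!found && tolower($0) ~ /^x-pi-hole:/ {
        sub(/^[^:]*:[ \t]*/, "")   # drop the header name and leading spaces
        print
        found = 1
    }'
}

# Succeed only if the header value on stdin equals the expected text.
check_pihole_header() {
    local expected="$1" actual
    actual="$(pihole_header)"
    [ "$actual" = "$expected" ]
}

# Example, using the address from the post (adjust for your own network):
#   curl -sI http://192.168.1.141/admin/index.php | pihole_header
#   curl -sI a-ads.com | check_pihole_header "A black hole for Internet advertisements."
```

Reading from standard input keeps the parsing separate from the network request, so the same helpers work on saved header dumps as well.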

Access Log

Another useful feature is an access log of the resources that clients request.  Here is a snippet of a log.  First, my own site was trying to track me via WordPress stats.  Then, you can see my curl commands going through, and finally, me writing this post.

.....
pixel.wp.com 192.168.1.134 - - [09/Oct/2015:22:10:57 +0000] "GET /g.gif?v=wpcom-no-pv&x_notes-menu-impressions=nonzeroasync-jetpack&baba=0.1243435454 HTTP/1.1" 200 49 "http://pi-hole.net/install-the-web-interface-on-an-existing-pi-hole/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.7424.55 Safari/375.36"
192.168.1.141 192.168.1.134 - - [09/Oct/2015:22:11:37 +0000] "HEAD /admin/index.php HTTP/1.1" 200 0 "-" "curl/7.43.0"
192.168.1.141 192.168.1.134 - - [09/Oct/2015:22:15:04 +0000] "HEAD / HTTP/1.1" 200 0 "-" "curl/7.43.0"
pixel.wp.com 192.168.1.134 - - [09/Oct/2015:22:17:27 +0000] "GET /g.gif?v=wpcom-no-pv&x_notesmenu-impressions=zero-jetpack&baba=0.4221303828526288 HTTP/1.1" 200 49 "http://jacobsalmela.com/wp-admin/post-new.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.5.4123.15 Safari/537.36"
.....
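
As a rough illustration of the kind of trending the log allows, the sketch below counts requests per requested host.  It assumes the layout shown in the snippet above (requested host in the first field, client address in the second) and reads the log on standard input, because the location of the log file depends on your lighttpd configuration.  The function names top_hosts and top_clients are hypothetical.

```shell
# Hypothetical helpers for summarizing the access log shown above.
# Assumed line layout: <requested host> <client IP> - - [timestamp] "request" ...

# Count requests per requested host (field 1), busiest host first.
top_hosts() {
    awk '{ count[$1]++ } END { for (h in count) print count[h], h }' |
        sort -k1,1nr -k2,2
}

# Count requests per client address (field 2), busiest client first.
top_clients() {
    awk '{ count[$2]++ } END { for (c in count) print count[c], c }' |
        sort -k1,1nr -k2,2
}

# Example (the path is a placeholder; use wherever your access log lives):
#   top_hosts < /path/to/access.log | head
```

Entries whose first field is the Pi-hole's own address are direct requests, such as the curl checks, rather than blocked ad domains.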

You can use the access log for debugging, trending, or… my next fun, front-end project.  Here is a hint:

pi-hole-pong

19 Replies to “Upcoming Features For The Ad-blocking Pi-hole: Back-end”

  1. Great news about the back-end! I also checked the web interface post you made earlier regarding the anon web interface (did not try it yet since there is no installation manual available). Installed the one you made and am very happy with the general stats. Noticed the index2.html in the folder, would be a blast if there would be more information available in the future from the logs 🙂

    1. The anon interface was a project someone else started and is pretty in depth. It translates the lists and files into a database for easier manipulation. Ultimately, it was too complex for me and I think most of my user base, which is why I went with AdminLTE.

      It’s certainly in the plans to do a lot more cool stats like those seen on index2.html.

      1. I will give it a whirl. I noticed it searches for the log in a different place (there is some sort of documentation) and I am not sure if I need to install (and configure) a MySQL(-like) database for it to work. It’s nice to have some interface for the logs (for example, when you want to troubleshoot why something does not work and might need to greenlist it).

        Looking forward to the new stats! I did notice the current web pi-hole mentions “today” everywhere but it does not seem to reset after 24 hrs? Is it a “total” or am I just incorrect?

  2. Still loving my pihole although my wife complains about wanting to click on some ads… It would be really great if it were easier to add URLs to the white list from the browser interface (even better through a browser extension a la Ad Block) and especially without having to rerun the whole script just to add one element to the list. Actually I haven’t tried the UI yet at all so some of this might already be there!

    I have found some of the lists to contain non ad URLs e.g. Microsoft.com and others so I’m somewhat frequently whitelisting domains.

    1. I have removed the (by far biggest) list of adblock.mahakala.is myself. The total domains dropped to “just” ~66.000 but I still don’t see ads, and now certain websites that failed work again. I did not know this list and searching for it I learned more people had false positives. As an extra bonus I think it keeps the system much faster (since obviously that list is huge). Your mileage may vary of course.

      1. I also removed the adblock.mahakala.is list. I understand why it’s in pihole as it pushes the domains blocked to 1,6 million which looks nicer than just a couple of hundred thousand blocked domains. But the list is crap as it blocks far too many normal websites. A new user of pihole who suddenly finds http://www.microsoft.com blocked and doesn’t know why it’s blocked and how to fix it easily probably won’t be a very happy user.

        @Jacob, the entry in the gravity.sh script for source “http://hosts-file.net/.%5Cad_servers.txt” is not valid anymore. It should be http://hosts-file.net/download/hosts.txt.

        @DonLexos if you correct the hosts-file.net entry in the script you should see the domains blocked go up to around 380000.

        1. @godisard:disqus : thank you for the tip! I have changed that and I bumped it from 66.662 to 377.169

          Agree with the mahakala list, I did a google search and found more users with issues with this list.

          @jacobsalmela:disqus: is there an easy way to reset the stats on the Admin interface? Right now it says ‘today’ but it’s rather ‘total’ and it’s rising all fine, however I would like to make 1 nice cut now that everything is set correctly.

          Oh and btw I was not able to install the dnsgui interface since the script uses files that are not in a pi-hole installation and are not on github (yet) so we need to be patient until the author finishes his how-to manual.

          1. Looking around on hosts-file.net I came across http://hosts-file.net/hphosts-partial.txt. According to the site it “contains a list of sites that have been added after the last full release of hpHosts” (which is the list at http://hosts-file.net/download/hosts.txt). Nice addition to the full list if you want to have the latest and can’t wait till the full list gets updated.

            For DietPi users (and something Jacob can maybe add to his instructions): if you want to add a cron job on DietPi to run the gravity.sh script automatically every week, you have to do the following (it works differently than on wheezy):

            Create and edit your weekly cron job:
            nano /etc/cron.weekly/my_cron_job

            Enter the following into your new cron job:
            #!/bin/bash

            #Run Gravity
            /usr/local/bin/gravity.sh

            #Restart dnsmasq
            service dnsmasq restart

            Give your cron job execute permissions so it can run:
            chmod +x /etc/cron.weekly/my_cron_job

            Test your cron job:
            /etc/cron.weekly/my_cron_job

            Thanks to the creator of DietPi for the instructions.

          2. Well, I cannot believe that I missed this, but if you look on line 145, 162, and 179-180, you will notice I created a variable called today, which was supposed to be included in the awk query so only results from today would be shown in the interface. I know I had it in there when I made the interface, but must have made a mistake when committing it to Github.

            Anyway, until I fix that, you can run this command once a day (probably via crontab):

            sudo truncate /var/log/pihole.log --size 0

            This will truncate the log file, which may also be beneficial to do for long time users because as the log file grows, the Web interface will have a harder and harder time parsing the logs because each time you refresh the page, it has to look through the entire log file.

            My plan for this is to move the logs somewhere else for long-term storage and then truncate the log daily. This would also be used for long-term statistics and trend data, which I hope to eventually include in the Web interface.

          3. Just FYI. It seems to work on DietPi correctly. Every new day, or after a reboot, all stats (except for the number of blocked domains) are at 0 in the web interface.

          4. Yeah, DietPi actually has its own log flushing mechanism, which is a feature of the OS (part of the slimmed-down version of Raspbian). So DietPi users are not seeing the same issue.

          5. I found the /var/log/pihole.log as the source and truncated it (differently though (cp /dev/null pihole.log), yours seems nicer). Would be nice to save the value first somewhere and then truncate the log, to keep ‘total’ stats available. For now I simply replaced the ‘daily’ with ‘total’ in the admin/index.php 😉

            Maybe in the future add a script that can be CRONed daily in the same manner as the weekly update that does exactly this? (Get the totals, save it somewhere and then truncate the log). If it’s just a value (I have no idea how to do additions in text file records) you can add it quite simply in the admin interface by simply displaying the text in “/var/log/pihole-stats.log” or something similar.

          6. Eventually, that is the plan: some sort of long-term statistics that can be reviewed, with pretty graphs and all.

        2. Yes, it seems they have many different types of lists you can get. I suppose for the purposes of the Pi-hole (ad-blocking) the existing list is good. But the additional domains would also be nice to block, but maybe it is best left to people to add in themselves.

          As for the other problem, I think I will comment out the mahakala list and let users enable it themselves. As the user base has grown, there have been more and more reports of issues with the mahakala list.

          1. I would add the full list in the script if I were you as it also blocks sites “engaged in the distribution of malware”. Besides, the full list pushes the domains blocked to 380000 which is nicer for advertising pihole than 66000 (which is what you get if you disable the crappy mahakala list) ;-).

            My sources now look like this:

            sources=('https://adaway.org/hosts.txt'
            'http://adblock.gjtech.net/?format=unix-hosts'
            'http://www.hostsfile.org/Downloads/hosts.txt'
            'http://hosts-file.net/download/hosts.txt'
            'http://www.malwaredomainlist.com/hostslist/hosts.txt'
            'http://pgl.yoyo.org/adservers/serverlist.php?'
            'http://someonewhocares.org/hosts/hosts'
            'http://winhelp2002.mvps.org/hosts.txt'
            'http://hosts-file.net/hphosts-partial.txt')

            That blocks 389,877 ad and malicious domains but leaves valid domains like apple.com (which was once also blocked by the mahakala list) and http://www.microsoft.com alone.

    2. Maybe a proxy server that removes ads directly from certain major websites would be an idea for the next step in Pi ad blocking.
      Some major websites like Google place prominent ad links as their first search results. Normally those links would redirect through ad networks, and then the original link would finally load. I noticed, too, that women want to click on those links and get angry when a blank page appears.

    3. That feature is in the works. Progress on the whole project is slower than a dedicated software company since it is just me working on it. I also have a full-time job.

      I usually do a development boost when I get some donations that make the extra time worth it.

      Hang in there and the features will trickle in!

      As for the non-ad domains, most of them come from the mahakala list. I’m not sure why they are on there, but you could simply add the domain to your /etc/pihole/whitelist.txt and then run /usr/local/bin/gravity.sh again.

  3. Hi anon,

    Thanks for your reply. I already played with your interface a bit and indeed learned that it needs additional settings to work with the pi-hole system. I was mainly curious about the reporting aspect (since pi-hole has the blocking part covered) and noticed that out of the box it was not working.

    I do believe that by changing a few items it could be used for the reporting part (provided you can fetch the data and put it into the database your system uses). As far as the blocklists are concerned there are many methods of doing exactly that, and by removing the big one (adblock.mahakala.is) that blocks a lot of real traffic the system works quite well.

    For me I am just eager to get more information (and statistics) from the system, also to purge older log files to keep the system lean, sticking them in a database seems like a perfect way to handle that.

    I appreciate the system you have setup and hope you will eventually find the time to finish the setup documents since I am eager to give it a try. For now I noticed that in a configuration file there were some references to scripts that are not available on github (or maybe just not available on a pi with pi-hole installed, I need to check that).

    Keep up the good work.

    Kind regards,

    Alex

    1. @Alex,
      Thanks for your interest. dnsgui is fundamentally incompatible with pi-hole for a few reasons. I will try to outline some of the issues. It is all about which system (dnsgui or the pi-hole gui) purges the dnsmasq logs. With the help of a cron job, dnsgui reads the logs every 30 min, calculates the hit count for that log, then updates the database hit count for existing urls and adds new urls and their hit counts. It then purges the dnsmasq logfile. At this point the logs are gone! Nothing is left for the pi-hole gui to work on. Note that it is essential that the log gets purged after the hit count is calculated. If the logs are not purged at the end of the hit count, the calculations will be wrong, as it will continue to increase the hit counts for those entries. Complications will also occur if the dnsmasq conf files are changed outside the dnsgui web interface, as my database also keeps track of which block list blocked a given url. This is a minor issue, as it can be resolved by importing the conf file into the database again. My guess is that eventually pi-hole will have to adopt a similar system to store the processed logs in some form of long-term storage. I do not know what Jacob will be opting for (a simple text file vs .tar.bz2 vs sqlite), so I believe it is not worth the effort to change dnsgui to work on top of pi-hole. However, I have a few solutions in mind for getting both pi-hole and dnsgui to work side by side. Unfortunately, they involve alterations to both systems.

      P.S. I will put the .sh script that you think is missing on git. The script will not do anything without a proper visudo entry. See the readme on git.

      https://github.com/Anon135813/dnsblocker-webgui

      regards
      anon
