Something on my phone is waiting for me to go to sleep to contact ad servers
I realized last week that my phone's DNS wasn't going through my Pi-hole – part of this is Comcast xFi's fault, part of this is Android's fault, and some of it was simply my fault for making some overly-effective firewall rules to prevent my server from being an open DNS relay. Ultimately, I "fixed" it properly by pointing Tailscale's MagicDNS at one of my Tailscale nodes running a Pi-hole server and leaving Tailscale running on my phone. Bit of extra battery usage, but not significant enough to care unless I'm also using my server as an Exit Node.
But then I looked at my traffic graphs:
i plugged my phone back in to my pihole and ... something is misbehaving :) making thousands of attempts to look up doubleclick and adsense.
— rrix (@rrrrrrrix) April 12, 2022 pic.twitter.com/axIwC9FFfU
Something on my phone makes thousands of requests overnight to look up googleadservices.com and ad.doubleclick.net… I smell an ad-impression fraud playbook, or at least someone doing too many DNS lookups. While my pihole does have a nearly constant background chatter of ad-server lookups on the order of 1-3 per minute, overnight these jump up to nearly 25 per minute.
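If I ever want those per-minute counts out of something more reproducible than squinting at graphs, a little sketch like this against Pi-hole's long-term query database would do it. The database path and the queries table layout here are assumptions based on a stock Pi-hole install, and it would need to run on the Pi-hole host itself.

import sqlite3
from collections import Counter
from datetime import datetime

# Pi-hole's FTL long-term database; default location on a stock install (assumption)
conn = sqlite3.connect("/etc/pihole/pihole-FTL.db")
rows = conn.execute(
    "SELECT timestamp, domain FROM queries "
    "WHERE domain LIKE '%doubleclick%' OR domain LIKE '%googleadservices%'"
).fetchall()

# bucket the lookups by minute so the overnight spike is obvious
per_minute = Counter(
    datetime.fromtimestamp(ts).strftime("%Y-%m-%d %H:%M") for ts, _domain in rows
)
sorted(per_minute.items())[-10:]  # the ten most recent minutes with matching lookups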
On the one hand, this is expected and accepted behavior on mobile devices despite being a huge freakin' problem, but this one in particular raised my hackles: the periods where it went crazy are more or less the exact times I was asleep, and it's really disturbing to think something might be monitoring me specifically to hide its activity. This is at the very least spooky, and at most a pretty blatant violation of privacy norms. Or maybe it's only running when my phone is plugged in so that nobody worries about battery usage.
But what is normal after all?
Can I figure out which app is doing this?
I poked around in network data usage screens and battery usage screens, but of course that didn't elucidate much. Maybe tomorrow night I'll leave my phone unplugged and see if the battery usage reflects that.
But is there anything in the phone's debug logs? I ran adb logcat and saved the output to a file for analysis.
It's easy enough to slurp this file into Python to analyze it. If I want to do more I can pull in Pandas and NumPy to do some Jupyter-style exploration, but for now I just want to see some simple distributions of the messages. Logcat can give me some of this.
export LOG_FILE=~/org/data/20/220412T115028.569214/adb-output
adb logcat -d -v long > $LOG_FILE
adb logcat -S 2>/dev/null
None of this is particularly… exciting or elucidating, is it? The biggest logging systems that jump out at me are my keyboard input method, the media scanner (which gets confused trying to analyze all the files I sync to the device via Syncthing), YouTube, my calendar app… hmm. Not much "there there".
import pathlib

in_file = pathlib.Path("data/20/220412T115028.569214/adb-output")
assert in_file.exists()

output = ''
with open(in_file, "r") as f:
    output = f.read()
assert len(output) > 0

blocks = output.split("\n\n")
[[block] for block in blocks[0:5]]
So using adb with -v long like I did breaks each message into blocks separated by a blank line. Good enough to parse like this, especially since all I really care about is the metadata header, usually the first line – the first record looks corrupted, but who knows… let's see:
import re

headers = []
for block in blocks:
    lines = re.split("\n+", block)
    header = list(filter(lambda line: line.startswith("[") and line.endswith("]"), lines))
    second_line = ("\n".join(lines[2:]))
    headers.append(header + [second_line])

# only print the uhh header
[[header[0]] for header in headers[0:10]]
Cool, so now all my headers are extracted – let's parse this! It's not exactly trivial; of course these lines aren't regular, though you can see they generally line up in the same columns, especially the fields I care about (the time, and that last field, the emitting module). We'll be ugly about it.
import re
from datetime import datetime

export = []

for header in headers:
    try:
        working_header = header[0]
    except IndexError:
        print("bad header: {}".format(header))
        continue
    parts = re.split(r"\s+", working_header)
    try:
        burp = "2022-{} {}000".format(parts[1], parts[2])  # (ref:burp)
        assembly = [datetime.strptime(burp, "%Y-%m-%d %H:%M:%S.%f"), parts[-2], header[-1]]
        export.append(assembly)
    except IndexError:
        continue

len(export)
I am lazy with intermediate variable names… (burp) takes the date and time and crams them into a thing I can call datetime.strptime on, and parts[-2] is the second-to-last element from that header, the Android module which emitted it.
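To make that concrete, here's what those slices pull out of a header of the same shape as the real ones – the header below is fabricated for illustration, not copied from my logs.

import re
from datetime import datetime

# a made-up `-v long` style header, just to show which fields the slicing grabs
example = "[ 04-11 23:42:07.123  1234: 5678 I/chatty ]"
parts = re.split(r"\s+", example)
# parts[1] is the date, parts[2] the time, parts[-2] the priority/tag pair
burp = "2022-{} {}000".format(parts[1], parts[2])
when = datetime.strptime(burp, "%Y-%m-%d %H:%M:%S.%f")
module = parts[-2]
(when, module)  # => (datetime(2022, 4, 11, 23, 42, 7, 123000), 'I/chatty')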
And so I want all the messages between 21:00 and 09:00… I could do this within the logcat invocation, and probably should, but let's do some more nasty shit with datetime since I'm already invoking strptime…
start_window = datetime(2022, 4, 11, 22, 0)
end_window = datetime(2022, 4, 12, 9, 0)

def date_filter_fn(pair):
    dt = pair[0]
    if dt > start_window and dt < end_window:
        return True
    return False

filtered = list(filter(date_filter_fn, export))
len(filtered)
That's not as many as I would have hoped!! Let's see where they come from.
sum_map = {}
for msg in filtered:
    module = msg[1]
    sum_map[module] = sum_map.get(module, 0) + 1

sum_map
So chatty is .. chatty? grr.
grep -A1 "I/chatty" $LOG_FILE | head
Well, that's not so useful, is it… some things are being too chatty and are getting elided from the logs.
Same string extraction gymnastics… Tired of ugly Python? So am I… enjoy an ugly shell pipeline! One day maybe I'll just learn awk outright.
grep -A1 "I/chatty" $LOG_FILE | grep expire | awk '{print $2}' | sed -e 's/:.*//' | sort | uniq -c | sort -nr
So this seems to be a dead end for tracking this down – I'm sure I gave "informed consent" when I installed whatever app is tracking my sleep schedule to run advertising fraud schemes.
What's next, must I install network monitoring software directly on my phone to track this down?
What about dumpsys output?
Okay, that was a dead end… Poking around in some poorly written Stack Overflow questions and SEO'd Android dev blog Google results, I found that there is a command on Android systems called dumpsys which will… dump system information to the terminal. In particular, it can dump detailed network statistics broken down by the process UID responsible for the traffic. Since my phone isn't doing a whole lot overnight, in theory those 20,000+ DNS requests should wind up in here somehow, one hopes…
adb shell dumpsys netstats detail > /tmp/netstats
echo "[[file:/tmp/netstats]]"
The output of the dumpsys command is just barely structured… what if I yaml load it, lul:
import yaml

try:
    with open("/tmp/netstats", "r") as f:
        dictmaybe = yaml.safe_load(f.read())
except Exception:
    dictmaybe = "lol no"

dictmaybe
Cool, so I gotta parse this word soup myself.
import pathlib
import re
from datetime import datetime

lines = pathlib.Path("/tmp/netstats").read_text().split("\n")

PARSE_STATE = None
curr_uid = None

parsed_data = []

for line in lines:
    if line.startswith("UID stats"):
        PARSE_STATE = "UID"
        continue
    if PARSE_STATE is not None:
        match = re.search(r"^(?P<whitespace> +)", line)

        if match is None:
            print("nope")
            level = 0
            PARSE_STATE = None
            curr_uid = None
            continue
        level = int(len(match.group("whitespace")) / 2)
        if level == 1 and line.startswith("  ident"):
            # extract UID
            m = re.search(r"uid=(?P<uid>\d+)", line)
            if not m:
                pass
                #print("line bad {}".format(line))
            else:
                curr_uid = m.group("uid")

        if level == 3:
            # example line
            # st=1648792800 rb=0 rp=0 tb=76 tp=1 op=0
            m = re.search(r"st=(?P<st>\d+) rb=(?P<rb>\d+).*tb=(?P<tb>\d+).*", line)
            if m is None:
                print("uhh {}".format(line))
                continue
            ts = datetime.fromtimestamp(int(m.group("st")))

            parsed_data.append([ts, curr_uid, m.group("rb"), m.group("tb")])

len(parsed_data)
Now we can shove the parsed data into a Pandas DataFrame and plot the output…
import pandas as pd

time_index = [r[0] for r in parsed_data]
uid_index = [r[1] for r in parsed_data]
midx = pd.MultiIndex.from_arrays([time_index, uid_index], names=["ts", "uid"])

df = pd.DataFrame(data=parsed_data, columns=["ts", "uid", "rx", "tx"], index=time_index)
df = df.sort_index()
df['ts'] = pd.to_datetime(df['ts'])
df['rx'] = pd.to_numeric(df['rx'])
df['tx'] = pd.to_numeric(df['tx'])
df['uid'] = pd.to_numeric(df['uid'])
df.head()

downsize = df.loc["2022/04/11 02:00":"2022/04/12 09:00"]
downsize.head()
import matplotlib
import matplotlib.pyplot as plt
import seaborn

uids = downsize["uid"].unique()
fig, ax = plt.subplots()
seaborn.set_style("ticks")
for uid in uids:
    this_uid_df = downsize.loc[downsize["uid"] == uid]
    total_bytes = this_uid_df["tx"].sum()
    if total_bytes > 64000:
        seaborn.lineplot(
            ax=ax,
            data=this_uid_df.groupby("ts").sum(),
            legend="full",
            y="tx", x="ts",
            palette="deep",
            label=f"{uid} ({total_bytes})",
        )

h, l = ax.get_legend_handles_labels()
ax.legend_.remove()
fig.autofmt_xdate()
fig.legend(h, l, loc="center right", ncol=4, mode="tight", fontsize=6)

loc = 'data/20/220412T115028.569214/plt1.png'
fig.savefig(loc, bbox_inches=matplotlib.transforms.Bbox([[0,0],[11,5]]))
loc
Some interesting outliers, but no idea which apps they belong to… let's see.
maxis = downsize.groupby("uid")["tx"].sum().sort_values().iloc[::-1]
maxis[0:19]
And the top 10 UIDs are….
top10 = maxis.index[0:50].astype(str)
"\n".join(top10)
10009
10316
10329
10263
10195
10282
10167
10276
10325
0
10123
10250
10125
10281
10202
1010123
1010125
10274
10218
1010228
10318
1010167
10247
10191
10264
10343
10286
10272
10307
1073
10185
10186
10285
1010205
10270
10278
1010185
10337
10287
10136
10279
10269
10204
10205
10254
10112
10459
10209
1000
1010223
Okay so… this gives me the transmitted bytes per UID in those 2-hour sampling windows, which is quite useful. Using a different dumpsys command I can figure out which app each UID belongs to; it will export information about the packages, including the UID mapping.
Run in shell:
adb shell dumpsys package > packages_dump.txt
export PKG_FILE=data/20/220412T115028.569214/packages_dump.txt
function finduid() {
echo -ne "$1 "
(grep -F -B1 "userId=$1" $PKG_FILE || echo "unknown") | head -1
}
finduid <<top-10-network-uids()>>
Analyzing the list of "network-noisy" packages
Okay, so we've narrowed it down – let's skim through this list and see if anything that shouldn't be there jumps out at me…
- com.urbandroid.sleep.addon.port is the sleep tracker I use, and it does make a shitload of network calls. This isn't surprising, as I have their cloud backup enabled with data going back to 2012. I've been paying for it, so I'll be pretty sour if they're running ad fraud on my phone.
- Tailscale, Nextcloud, Syncthing, and Firefox (with uBlock Origin and other addons installed) shouldn't be doing these things and will naturally have a lot of network traffic. I sure hope I can trust these things, as I can't feasibly replace them.
- reddit.news is Relay for Reddit, the Reddit mobile app I use. I wouldn't put it past them or some JS loaded in a site's web frame, but it happens regularly.
- com.amtrak.rider could be doing mean things – their site already runs a bunch of spying bullshit… but background ad fraud? idk.
- net.daylio is a mood/habit tracking app which has a nightly automated backup function. I hope they're not doing anything untoward.
- Some com.google packages – they wouldn't ad-fraud themselves, right?
- com.jumboprivacy, which is a paid service that runs little privacy-preserving scripts against web services on your device… uh huh…
- The Google quick search app is a bit surprising to me, I suppose.
And of course there are some things being run as root (uid 0), and some other high-level UIDs which aren't reflected in my package list….
Something approaching a conclusion
All of this leaves me where I started, confused and a bit disoriented, unsure of what is happening on my own device in my own home and basically powerless to do anything about it.
But it raises a lot of questions, the main one of course being "what did I learn?"
I learned that:
- the Android system environment is always doing "something", and is usually too chatty about what those "something"s are for you to be able to really see why.
- adb dumpsys has a lot of useful information about your phone, including timeseries data.
- there is basically always unsurprising data usage "at scale" on my phone.
Another question: did I actually learn anything useful in these stats? Remember that this all started with tracking down DNS traffic, which of course is minuscule compared to even a single JPG or /r/formula1 comment thread loaded by Relay. In reality, any app on my system could be running a campaign like this, and there are not a lot of tools to explore this data usage.
Of course, you gave your "informed consent" when you installed these apps!
What is there to do for someone who doesn't consent?
dark laughter
how do you feel about the unabomber manifesto?
…
I'll disable some of these apps or at least force close them tonight and see how many times we try to load ad-impressions overnight.
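Since I'm already driving everything through adb, a loop like this is probably how I'll do the force-closing – the package list below is just an illustrative guess pulled from the suspects above, not a verdict, and pm disable-user would be the heavier-handed alternative.

import subprocess

# hypothetical shortlist; swap in whatever the finduid output actually implicates
suspects = ["com.amtrak.rider", "net.daylio", "com.jumboprivacy"]

for pkg in suspects:
    # force-stop is reversible and only lasts until the app is next launched;
    # `adb shell pm disable-user --user 0 <pkg>` would disable it outright
    subprocess.run(["adb", "shell", "am", "force-stop", pkg], check=True)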
I also noticed within the dumpsys service there is a way to dump the active "activities" on an Android device. Perhaps I'll set up ADB in a cron job and see what sort of interesting things run overnight, and write some awful plain-text parser again to extract the useful bits of data out of it.
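A sketch of that collector might look something like this – it assumes the phone stays reachable over adb all night (USB or wireless debugging), the output path is made up, and cron would just invoke it every few minutes.

import subprocess
from datetime import datetime
from pathlib import Path

# where the snapshots accumulate for the inevitable awful plain-text parser (arbitrary path)
out = Path.home() / "org/data/activity-snapshots.log"

# `dumpsys activity activities` dumps the current activity stacks and tasks
snapshot = subprocess.run(
    ["adb", "shell", "dumpsys", "activity", "activities"],
    capture_output=True, text=True, check=True,
).stdout

with out.open("a") as f:
    f.write("=== {} ===\n{}\n".format(datetime.now().isoformat(), snapshot))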
My Librem 5 will supposedly ship soon, and while it'll be nice to have something approaching a libre, or at least introspectable, userspace and a set of mostly functional free apps, there are parts of my life that won't fit on this device. Even if it's living in a desk drawer 18 hours a day, if it's just running weird ad campaigns while I sleep or while it charges, should I feel okay with that?
How should I?
Note to self on opening this doc:
Make sure to hack in a Python for the session; I don't feel like putting this somewhere for direnv.
(setq org-babel-python-command
      (concat (s-chomp (shell-command-to-string "nix-build ~/org/nix-shells/python-pandas.nix"))
              "/bin/python"))