A Study of Android Application Security
William Enck, Damien Octeau, Patrick McDaniel, and Swarat Chaudhuri
Systems and Internet Infrastructure Security Laboratory
Department of Computer Science and Engineering
The Pennsylvania State University
{enck, octeau, mcdaniel, swarat}@cse.psu.edu
Abstract
The fluidity of application markets complicates smartphone security. Although recent efforts have shed light on particular security issues, there remains little insight into broader security characteristics of smartphone applications. This paper seeks to better understand smartphone application security by studying 1,100 popular free Android applications. We introduce the ded decompiler, which recovers Android application source code directly from its installation image. We design and execute a horizontal study of smartphone applications based on static analysis of 21 million lines of recovered code. Our analysis uncovered pervasive use/misuse of personal/phone identifiers, and deep penetration of advertising and analytics networks. However, we did not find evidence of malware or exploitable vulnerabilities in the studied applications. We conclude by considering the implications of these preliminary findings and offer directions for future analysis.
1 Introduction
The rapid growth of smartphones has led to a renaissance for mobile services. Go-anywhere applications support a wide array of social, financial, and enterprise services for any user with a cellular data plan. Application markets such as Apple's App Store and Google's Android Market provide point-and-click access to hundreds of thousands of paid and free applications. Markets streamline software marketing, installation, and update, therein creating low barriers to bring applications to market, and even lower barriers for users to obtain and use them.
The fluidity of the markets also presents enormous security challenges. Rapidly developed and deployed applications [40], coarse permission systems [16], privacy-invading behaviors [14, 12, 21], malware [20, 25, 38], and limited security models [36, 37, 27] have led to exploitable phones and applications. Although users seemingly desire it, markets are not in a position to provide security in more than a superficial way [30]. The lack of a common definition for security and the volume of applications ensure that some malicious, questionable, and vulnerable applications will find their way to market.
In this paper, we broadly characterize the security of
applications in the Android Market. In contrast to past
studies with narrower foci, e.g., [14, 12], we consider a
breadth of concerns including both dangerous functional-
ity and vulnerabilities, and apply a wide range of analysis
techniques. In this, we make two primary contributions:
• We design and implement a Dalvik decompiler,
ded. ded recovers an application’s Java source
solely from its installation image by inferring lost
types, performing DVM-to-JVM bytecode retarget-
ing, and translating class and method structures.
• We analyze 21 million LOC retrieved from the top
1,100 free applications in the Android Market using
automated tests and manual inspection. Where pos-
sible, we identify root causes and posit the severity
of discovered vulnerabilities.
Our popularity-focused security analysis provides insight into the most frequently used applications. Our findings inform the following broad observations.
1. Similar to past studies, we found wide misuse of privacy-sensitive information, particularly phone identifiers and geographic location. Phone identifiers, e.g., IMEI, IMSI, and ICC-ID, were used for everything from "cookie-esque" tracking to account numbers.
2. We found no evidence of telephony misuse, back-
ground recording of audio or video, abusive connec-
tions, or harvesting lists of installed applications.
3. Ad and analytics network libraries are integrated with 51% of the applications studied, with AdMob (appearing in 29.09% of apps) and Google Ads (appearing in 18.72% of apps) dominating. Many applications include more than one ad library.
4. Many developers fail to securely use Android APIs. These failures generally fall into the classification of insufficient protection of privacy-sensitive information. However, we found no exploitable vulnerabilities that can lead to malicious control of the phone.
This paper is an initial but not final word on Android application security. Thus, one should be circumspect about any interpretation of the following results as a definitive statement about how secure applications are today. Rather, we believe these results are indicative of the current state, but there remain many aspects of the applications that warrant deeper analysis. We plan to continue with this analysis in the future and have made the decompiler freely available at http://siis.cse.psu.edu/ded/ to aid the broader security community in understanding Android security.
The following sections reflect the two thrusts of this work: Sections 2 and 3 provide background and detail our decompilation process, and Sections 4 and 5 detail the application study. The remaining sections discuss our limitations and interpret the results.
2 Background
Android: Android is an OS designed for smartphones.
Depicted in Figure 1, Android provides a sandboxed ap-
plication execution environment. A customized embed-
ded Linux system interacts with the phone hardware and
an off-processor cellular radio. The Binder middleware and application API run on top of Linux. To simplify,
an application’s only interface to the phone is through
these APIs. Each application is executed within a Dalvik
Virtual Machine (DVM) running under a unique UNIX
uid. The phone comes pre-installed with a selection of
system applications, e.g., phone dialer, address book.
Applications interact with each other and the phone
through different forms of IPC. Intents are typed inter-
process messages that are directed to particular appli-
cations or systems services, or broadcast to applications
subscribing to a particular intent type. Persistent content
provider data stores are queried through SQL-like inter-
faces. Background services provide RPC and callback
interfaces that applications use to trigger actions or access data. Finally, user interface activities receive named action signals from the system and other applications.
Binder acts as a mediation point for all IPC. Access
to system resources (e.g., GPS receivers, text messag-
ing, phone services, and the Internet), data (e.g., address
books, email) and IPC is governed by permissions as-
signed at install time. The permissions requested by the
application and the permissions required to access the
application's interfaces/data are defined in its manifest file. To simplify, an application is allowed to access a
resource or interface if the required permission allows it. Permission assignment, and indirectly the security policy for the phone, is largely delegated to the phone's owner: the user is presented a screen listing the permissions an application requests at install time, which they can accept or reject.

[Figure 1: The Android system architecture. Installed applications and system applications each run in their own DVM; Binder mediates IPC atop embedded Linux, which interfaces with the cellular radio, GPS receiver, Bluetooth, and display.]
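The install-time mediation described above can be sketched as a simple set-membership check. This is a simplified, hypothetical model of the framework's behavior, not its actual implementation:

```python
# Sketch of Android's install-time permission mediation (simplified,
# hypothetical data model -- not the real framework code).
def can_access(granted_permissions, required_permission):
    """An app may use a resource or interface iff the permission the
    resource requires is among those granted at install time.
    A resource with no required permission is open to all apps."""
    return required_permission is None or required_permission in granted_permissions

# Example: an app whose manifest requested only INTERNET.
granted = {"android.permission.INTERNET"}
```

Because the user's only choices at install time are to accept every requested permission or abort the install, the effective policy is all-or-nothing per application.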
Dalvik Virtual Machine: Android applications are written in Java, but run in the DVM. The DVM and Java bytecode run-time environments differ substantially:
Application Structure. Java applications are composed of one or more .class files, one file per class. The JVM loads the bytecode for a Java class from the associated .class file as it is referenced at run time. Conversely, a Dalvik application consists of a single .dex file containing all application classes.
Figure 2 provides a conceptual view of the compilation process for DVM applications. After the Java compiler creates JVM bytecode, the Dalvik dx compiler consumes the .class files, recompiles them to Dalvik bytecode, and writes the resulting application into a single .dex file. This process consists of the translation, reconstruction, and interpretation of three basic elements of the application: the constant pools, the class definitions, and the data segment. A constant pool describes, not surprisingly, the constants used by a class. This includes, among other items, references to other classes, method names, and numerical constants. The class definitions consist of basic information such as access flags and class names. The data element contains the method code executed by the target VM, as well as other information related to methods (e.g., number of DVM registers used, local variable table, and operand stack sizes) and to class and instance variables.
Register architecture. The DVM is register-based, whereas existing JVMs are stack-based. Java bytecode can assign local variables to a local variable table before pushing them onto an operand stack for manipulation by opcodes, but it can also just work on the stack without explicitly storing variables in the table. Dalvik bytecode assigns local variables to any of the 2^16 available registers. The Dalvik opcodes directly manipulate registers, rather than accessing elements on a program stack.
Instruction set. The Dalvik bytecode instruction set is substantially different than that of Java. Dalvik has 218 opcodes while Java has 200; however, the nature of the opcodes is very different. For example, Java has tens of opcodes dedicated to moving elements between the stack and local variable table. Dalvik instructions tend to be longer than Java instructions; they often include the source and destination registers. As a result, Dalvik applications require fewer instructions. In Dalvik bytecode, applications have on average 30% fewer instructions than in Java, but have a 35% larger code size (bytes) [9].

[Figure 2: Compilation process for DVM applications. Java source code (.java files) → Java compiler → .class files (each with its own constant pool, class info, and data) → dx → a single .dex file (header, shared constant pool, class definitions, data).]
Constant pool structure. Java applications replicate elements in constant pools within the multiple .class files, e.g., referrer and referent method names. The dx compiler eliminates much of this replication. Dalvik uses a single pool that all classes simultaneously reference. Additionally, dx eliminates some constants by inlining their values directly into the bytecode. In practice, integers, long integers, and single and double precision floating-point elements disappear during this process.
Control flow structure. Control flow elements such as loops, switch statements, and exception handlers are structured differently in Dalvik and Java bytecode. Java bytecode structure loosely mirrors the source code, whereas Dalvik bytecode does not.
Ambiguous primitive types. Java bytecode variable assignments distinguish between integer (int) and single-precision floating-point (float) constants and between long integer (long) and double-precision floating-point (double) constants. However, Dalvik assignments (int/float and long/double) use the same opcodes for integers and floats, i.e., the opcodes are untyped beyond specifying precision.
Null references. The Dalvik bytecode does not specify a null type, instead opting to use a zero value constant. Thus, constant zero values present in the Dalvik bytecode have ambiguous typing that must be recovered.
Comparison of object references. The Java bytecode uses typed opcodes for the comparison of object references (if_acmpeq and if_acmpne) and for null comparison of object references (ifnull and ifnonnull). The Dalvik bytecode uses a simpler integer comparison for these purposes: a comparison between two integers, and a comparison of an integer and zero, respectively. This requires the decompilation process to recover types for integer comparisons used in DVM bytecode.
Storage of primitive types in arrays. The Dalvik bytecode uses ambiguous opcodes to store and retrieve elements in arrays of primitive types (e.g., aget for int/float and aget-wide for long/double), whereas the corresponding Java bytecode is unambiguous. The array type must be recovered for correct translation.
3 The ded decompiler
Building a decompiler from DEX to Java for the study proved to be surprisingly challenging. On the one hand, Java decompilation has been studied since the 1990s: tools such as Mocha [5] date back over a decade, with many other techniques being developed [39, 32, 31, 4, 3, 1]. Unfortunately, prior to our work, there existed no functional tool for the Dalvik bytecode. Because of the vast differences between the JVM and DVM, simple modification of existing decompilers was not possible.
The choice to decompile to Java source rather than operate on the DEX opcodes directly was grounded in two reasons. First, we wanted to leverage existing tools for code analysis. Second, we required access to source code to identify false positives resulting from automated code analysis, i.e., to perform manual confirmation.
ded extraction occurs in three stages: a) retargeting, b) optimization, and c) decompilation. This section presents the challenges and process of ded, and concludes with a brief discussion of its validation. Interested readers are referred to [35] for a thorough treatment.
3.1 Application Retargeting
The initial stage of decompilation retargets the application .dex file to Java classes. Figure 3 overviews this process: (1) recovering typing information, (2) translating the constant pool, and (3) retargeting the bytecode.
Type Inference: The first step in retargeting is to identify class and method constants and variables. However, the Dalvik bytecode does not always provide enough information to determine the type of a variable or constant from its register declaration. There are two generalized cases where variable types are ambiguous: 1) constant and variable declarations only specify the variable width (e.g., 32 or 64 bits), but not whether the value is a float, integer, or null reference; and 2) comparison operators do not distinguish between integer and object reference comparison (i.e., null reference checks).
Type inference has been widely studied [44]. The seminal Hindley-Milner [33] algorithm provides the basis for type inference algorithms used by many languages such as Haskell and ML. These approaches determine unknown types by observing how variables are used in operations with known type operands. Similar techniques are used by languages with strong type inference, e.g., OCaml, as well as weaker inference, e.g., Perl.

[Figure 3: Dalvik bytecode retargeting. (1) DEX parsing; (2) Java .class conversion, comprising missing type inference (CFG construction, type inference processing), constant pool conversion (constant identification, constant pool translation), and method code retargeting (bytecode reorganization, instruction set translation); (3) Java .class optimization.]
ded adopts the accepted approach: it infers register types by observing how they are used in subsequent operations with known type operands. Dalvik registers loosely correspond to Java variables. Because Dalvik bytecode reuses registers whose variables are no longer in scope, we must evaluate the register type within its context of the method control flow, i.e., inference must be path-sensitive. Note further that ded type inference is also method-local. Because the types of passed parameters and return values are identified by method signatures, there is no need to search outside the method.
There are three ways ded infers a register's type. First, any comparison of a variable or constant with a known type exposes the type. Comparison of dissimilar types requires type coercion in Java, which is propagated to the Dalvik bytecode; hence, legal Dalvik comparisons always involve registers of the same type. Second, instructions such as add-int only operate on specific types, manifestly exposing typing information. Third, instructions that pass registers to methods or use a return value expose the type via the method signature.
The ded type inference algorithm proceeds as follows. After reconstructing the control flow graph, ded identifies any ambiguous register declaration. For each such register, ded walks the instructions in the control flow graph starting from its declaration. Each branch of the control flow encountered is pushed onto an inference stack, i.e., ded performs a depth-first search of the control flow graph looking for type-exposing instructions. If a type-exposing instruction is encountered, the variable is labeled and the process is complete for that variable. There are three events that cause a branch search to terminate: a) when the register is reassigned to another variable (e.g., a new declaration is encountered), b) when a return function is encountered, and c) when an exception is thrown. After a branch is abandoned, the next branch is popped off the stack and the search continues. Lastly, type information is forward propagated, modulo register reassignment, through the control flow graph from each register declaration to all subsequent ambiguous uses.
This algorithm resolves all ambiguous primitive types, except for one isolated case: when all paths leading to a type-ambiguous instruction originate with ambiguous constant instructions (e.g., all paths leading to an integer comparison originate with registers assigned a constant zero). In this case, the type does not impact decompilation, and a default type (e.g., integer) can be assigned.
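The search described above can be sketched as a depth-first walk over a control flow graph. The instruction and CFG encodings here are hypothetical simplifications for illustration, not ded's actual data structures:

```python
# Illustrative sketch of ded's path-sensitive, method-local type
# inference. cfg maps a node id to (instruction, successor ids); an
# instruction may carry an 'exposes' type (e.g. add-int exposes int)
# or a 'terminates' flag (reassignment, return, or throw).
def infer_type(cfg, start):
    stack = [start]          # inference stack of branches to explore
    visited = set()
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        insn, succs = cfg[node]
        if "exposes" in insn:        # type-exposing instruction found
            return insn["exposes"]
        if insn.get("terminates"):   # abandon this branch
            continue
        stack.extend(succs)          # push outgoing branches
    # All paths ambiguous: the type cannot matter, default to integer.
    return "int"

# Two paths from an ambiguous constant declaration; only one
# reaches a type-exposing add-int.
cfg = {
    0: ({"op": "const v0, #0"}, [1, 2]),
    1: ({"op": "return", "terminates": True}, []),
    2: ({"op": "add-int", "exposes": "int"}, []),
}
```

The final default mirrors the isolated case above: when every branch dead-ends without exposing a type, an arbitrary default is safe because the type does not affect decompilation.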
Constant Pool Conversion: The .dex and .class file constant pools differ in that: a) Dalvik maintains a single constant pool for the application, whereas Java maintains one for each class; and b) Dalvik bytecode places primitive type constants directly in the bytecode, whereas Java bytecode uses the constant pool for most references. We convert constant pool information in two steps.

The first step is to identify which constants are needed for a .class file. Constants include references to classes, methods, and instance variables. ded traverses the bytecode for each method in a class, noting such references. ded also identifies all constant primitives.

Once ded identifies the constants required by a class, it adds them to the target .class file. For primitive type constants, new entries are created. For class, method, and instance variable references, the created Java constant pool entries are based on the Dalvik constant pool entries. The constant pool formats differ in complexity; specifically, Dalvik constant pool entries use significantly more references to reduce memory overhead.
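The per-class pool construction can be sketched as follows; the pool and bytecode encodings are hypothetical simplifications of the .dex structures:

```python
# Sketch of splitting Dalvik's single application-wide constant pool
# into a per-class Java pool: collect only the shared-pool entries a
# class actually references, plus primitives inlined in its bytecode.
def build_class_pool(shared_pool, method_bytecode):
    pool = []
    for insn in method_bytecode:
        # class/method/field references resolve through the shared pool
        if "ref" in insn and shared_pool[insn["ref"]] not in pool:
            pool.append(shared_pool[insn["ref"]])
        # primitives that dx inlined must become real pool entries again
        if "const" in insn and insn["const"] not in pool:
            pool.append(insn["const"])
    return pool

shared = {0: "Ljava/lang/String;", 1: "println(Ljava/lang/String;)V"}
code = [{"ref": 1}, {"const": 42}, {"ref": 1}]
```

Note the deduplication: a method referenced twice yields one pool entry, which is the per-class analogue of the sharing dx performed application-wide.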
Method Code Retargeting: The final stage of the retargeting process is the translation of the method code. First, we preprocess the bytecode to reorganize structures that cannot be directly retargeted. Second, we linearly traverse the DVM bytecode and translate it to the JVM.

The preprocessing phase addresses multidimensional arrays. Both Dalvik and Java use blocks of bytecode instructions to create multidimensional arrays; however, the instructions have different semantics and layout. ded reorders and annotates the bytecode with array size and type information for translation.

The bytecode translation linearly processes each Dalvik instruction. First, ded maps each referenced register to a Java local variable table index. Second, ded performs an instruction translation for each encountered Dalvik instruction. As Dalvik bytecode is more compact and takes more arguments, one Dalvik instruction frequently expands to multiple Java instructions. Third, ded patches the relative offsets used for branches based on preprocessing annotations. Finally, ded defines exception tables that describe try/catch/finally blocks. The resulting translated code is combined with the constant pool to create a legal Java .class file.
The following is an example translation for add-int:

    Dalvik                 Java
    add-int d0, s0, s1     iload s0
                           iload s1
                           iadd
                           istore d0

where ded creates a Java local variable for each register, i.e., d0 → d0, s0 → s0, etc. The translation creates four Java instructions: two to push the variables onto the stack, one to add, and one to pop the result.
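The expansion can be sketched mechanically; this is a simplified illustration, as real ded operates on parsed bytecode structures rather than strings:

```python
# Sketch of the register-to-stack expansion for add-int: one Dalvik
# register instruction becomes four JVM stack instructions.
def translate_add_int(dest, src1, src2):
    return [
        f"iload {src1}",   # push first operand onto the operand stack
        f"iload {src2}",   # push second operand
        "iadd",            # pop both, push their sum
        f"istore {dest}",  # pop the sum into the destination variable
    ]
```

This one-to-many expansion is exactly why Dalvik applications need fewer (but longer) instructions than their Java counterparts, as noted in Section 2.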
3.2 Optimization and Decompilation
At this stage, the retargeted .class files can be decompiled using existing tools, e.g., Fernflower [1] or Soot [45]. However, ded's bytecode translation process yields unoptimized Java code. For example, Java tools often optimize out unnecessary assignments to the local variable table, e.g., unneeded return values. Without optimization, decompiled code is complex and frustrates analysis. Furthermore, artifacts of the retargeting process can lead to decompilation errors in some decompilers. The need for bytecode optimization is easily demonstrated by considering decompiled loops. Most decompilers convert for loops into infinite loops with break instructions. While the resulting source code is functionally equivalent to the original, it is significantly more difficult to understand and analyze, especially for nested loops. Thus, we use Soot as a post-retargeting optimizer. While Soot is centrally an optimization tool with the ability to recover source code in most cases, it does not process certain legal program idioms (bytecode structures) generated by ded. In particular, we encountered two central problems involving: 1) interactions between synchronized blocks and exception handling, and 2) complex control flows caused by break statements. While the Java bytecode generated by ded is legal, the source code failure rate reported in the following section is almost entirely due to Soot's inability to extract source code from these two cases. We will consider other decompilers in future work, e.g., Jad [4], JD [3], and Fernflower [1].
3.3 Source Code Recovery Validation
We have performed extensive validation testing of ded [35]. The included tests recovered the source code for small, medium, and large open source applications and found no errors in recovery. In most cases the recovered code was virtually indistinguishable from the original source (modulo comments and method local-variable names, which are not included in the bytecode).
Table 1: Studied Applications (from Android Market)

Category          Total     Retargeted  Decompiled        LOC
                  Classes   Classes     Classes
Comics              5627      99.54%      94.72%       415625
Communication      23000      99.12%      92.32%      1832514
Demo                8012      99.90%      94.75%       830471
Entertainment      10300      99.64%      95.39%       709915
Finance            18375      99.34%      94.29%      1556392
Games (Arcade)      8508      99.27%      93.16%       766045
Games (Puzzle)      9809      99.38%      94.58%       727642
Games (Casino)     10754      99.39%      93.38%       985423
Games (Casual)      8047      99.33%      93.69%       681429
Health             11438      99.55%      94.69%       847511
Lifestyle           9548      99.69%      95.30%       778446
Multimedia         15539      99.20%      93.46%      1323805
News/Weather       14297      99.41%      94.52%      1123674
Productivity       14751      99.25%      94.87%      1443600
Reference          10596      99.69%      94.87%       887794
Shopping           15771      99.64%      96.25%      1371351
Social             23188      99.57%      95.23%      2048177
Libraries           2748      99.45%      94.18%       182655
Sports              8509      99.49%      94.44%       651881
Themes              4806      99.04%      93.30%       310203
Tools               9696      99.28%      95.29%       839866
Travel             18791      99.30%      94.47%      1419783
Total             262110      99.41%      94.41%     21734202
We also used ded to recover the source code for the top 50 free applications (as listed by the Android Market) from each of the 22 application categories, 1,100 in total. The application images were obtained from the market using a custom retrieval tool on September 1, 2010. Table 1 lists decompilation statistics. The decompilation of all 1,100 applications took 497.7 hours (about 20.7 days) of compute time. Soot dominated the processing time: 99.97% of the total time was devoted to Soot optimization and decompilation. The decompilation process was able to recover over 247 thousand classes spread over 21.7 million lines of code, representing about 94% of the total classes in the applications. All decompilation errors are manifest during/after decompilation, and thus the affected classes are excluded from the study reported in the later sections. There are two categories of failures:
Retargeting Failures. 0.59% of classes were not retargeted. These errors fall into three classes: a) unresolved references which prevent optimization by Soot, b) type violations caused by Android's dex compiler, and c) extremely rare cases in which ded produces illegal bytecode. Recent efforts have focused on improving optimization, as well as redesigning ded with a formally defined type inference apparatus. Parallel work on improving ded has reduced these errors by a third, and we expect further improvements in the near future.
Decompilation Failures. 5% of the classes were successfully retargeted, but Soot failed to recover the source code. Here we are limited by the state of the art in decompilation. In order to understand the impact of decompiling ded-retargeted classes versus ordinary Java .class files, we performed a parallel study to evaluate Soot on Java applications generated with traditional Java compilers. Of 31,553 classes from a variety of packages, Soot was able to decompile 94.59%, indicating we cannot do better while using Soot for decompilation.
A possible way to improve this is to use a different decompiler. Since our study, Fernflower [1] was available for a short period as part of a beta test. We decompiled the same 1,100 optimized applications using Fernflower and achieved a recovery rate of 98.04% of the 1.65 million retargeted methods, a significant improvement. Future studies will investigate the fidelity of Fernflower's output and its appropriateness as input for program analysis.
4 Evaluating Android Security
Our Android application study consisted of a broad range of tests focused on three kinds of analysis: a) exploring issues uncovered in previous studies and malware advisories, b) searching for general coding security failures, and c) exploring misuse/security failures in the use of the Android framework. The following discusses the process of identifying and encoding the tests.
4.1 Analysis Specification
We used four approaches to evaluate recovered source code: control flow analysis, data flow analysis, structural analysis, and semantic analysis. Unless otherwise specified, all tests used the Fortify SCA [2] static analysis suite, which provides these four types of analysis. The following discusses the general application of these approaches. The details of our analysis specifications can be found in the technical report [15].
Control flow analysis. Control flow analysis imposes constraints on the sequences of actions executed by an input program P, classifying some of them as errors. Essentially, a control flow rule is an automaton A whose input words are sequences of actions of P, i.e., the rule monitors executions of P. An erroneous action sequence is one that drives A into a predefined error state. To statically detect violations specified by A, the program analysis traces each control flow path in the tool's model of P, synchronously "executing" A on the actions executed along this path. Since not all control flow paths in the model are feasible in concrete executions of P, false positives are possible. False negatives are also possible in principle, though uncommon in practice. Figure 4 shows an example automaton for sending intents. Here, the error state is reached if the intent contains data and is sent unprotected without specifying the target component, resulting in a potential unintended information leakage.

[Figure 4: Example control flow specification, an automaton over an intent i with states init, empty, targeted, has_data, and error, and transitions p1 = i.$new_class(...), p2 = i.$new(...) | i.$new_action(...), p3 = i.$set_class(...) | i.$set_component(...), p4 = i.$put_extra(...), p5 = i.$set_class(...) | i.$set_component(...), p6 = $unprotected_send(i) | $protected_send(i, null).]
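As an illustration, such an automaton can be run over an abstract event trace. The event names and the transition topology below are our reading of the figure, not Fortify's actual rule encoding:

```python
# Sketch of the Figure 4 automaton: flag an intent that carries extra
# data and is sent without a target component (event encoding is
# hypothetical; a real rule would match on API call sites).
def run_intent_automaton(events):
    state = "init"
    for ev in events:
        if state == "init" and ev == "new_class":
            state = "targeted"                    # p1: created with target
        elif state == "init" and ev in ("new", "new_action"):
            state = "empty"                       # p2: created untargeted
        elif state == "empty" and ev in ("set_class", "set_component"):
            state = "targeted"                    # p3: target added
        elif state == "empty" and ev == "put_extra":
            state = "has_data"                    # p4: data attached
        elif state == "has_data" and ev in ("set_class", "set_component"):
            state = "targeted"                    # p5: target added late
        elif state == "has_data" and ev in ("unprotected_send",
                                            "protected_send_null"):
            return "error"                        # p6: leaked broadcast
    return state
```

Running it on the trace ["new", "put_extra", "unprotected_send"] reaches the error state, while the same trace with a set_class before the send stays safe.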
Data flow analysis. Data flow analysis permits the declarative specification of problematic data flows in the input program. For example, an Android phone contains several pieces of private information that should never leave the phone: the user's phone number, IMEI (device ID), IMSI (subscriber ID), and ICC-ID (SIM card serial number). In our study, we wanted to check that this information is not leaked to the network. While this property can in principle be coded using automata, data flow specification allows for a much easier encoding. The specification declaratively labels program statements matching certain syntactic patterns as data flow sources and sinks. Data flows between the sources and sinks are violations.
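A minimal sketch of such a source-to-sink check follows; the statement encoding, source list, and sink list are hypothetical simplifications of the declarative specification:

```python
# Toy taint tracking: values returned by a source that reach a sink's
# arguments (directly or via assignment) are flagged as violations.
SOURCES = {"TelephonyManager.getDeviceId"}
SINKS = {"OutputStream.write", "HttpClient.execute"}

def find_flows(statements):
    """statements: (lhs, call, args) triples in execution order."""
    tainted, violations = set(), []
    for lhs, call, args in statements:
        if call in SOURCES:
            tainted.add(lhs)                      # source return value
        elif call == "assign" and args[0] in tainted:
            tainted.add(lhs)                      # taint propagates
        elif call in SINKS and any(a in tainted for a in args):
            violations.append(call)               # tainted data at sink
    return violations
```

A real specification, of course, handles aliasing, fields, and inter-procedural flows; this only shows the source/sink labeling idea.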
Structural analysis. Structural analysis allows for declarative pattern matching on the abstract syntax of the input source code. Structural analysis specifications are not concerned with program executions or data flow; therefore, analysis is local and straightforward. For example, in our study, we wanted to specify a bug pattern where an Android application mines the device ID of the phone on which it runs. This pattern was defined using a structural rule stating that the input program calls a method getDeviceId() whose enclosing class is android.telephony.TelephonyManager.
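Because the rule is purely structural, it reduces to a local match on each call site; the call-site representation here is hypothetical:

```python
# Sketch of the structural rule for device-ID mining: match a call to
# getDeviceId() whose receiver class is TelephonyManager. No execution
# or data-flow reasoning is needed, so the check is per-call-site.
def matches_device_id_rule(call_site):
    return (call_site["method"] == "getDeviceId"
            and call_site["receiver_class"]
                == "android.telephony.TelephonyManager")
```

The receiver-class condition is what keeps the rule precise: an unrelated getDeviceId() on an application's own class does not match.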
Semantic analysis. Semantic analysis allows the specification of a limited set of constraints on the values used by the input program. For example, a property of interest in our study was that an Android application does not send SMS messages to hard-coded targets. To express this property, we defined a pattern matching calls to Android messaging methods such as sendTextMessage(). Semantic specifications permit us to directly specify that the first parameter in these calls (the phone number) is not a constant. The analyzer detects violations of this property using constant propagation techniques well known in the program analysis literature.
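The constant-propagation check can be sketched as follows; the statement encoding and the example number are hypothetical:

```python
# Toy constant propagation: flag sendTextMessage() calls whose first
# argument (the destination number) is traceable to a string literal.
def find_hardcoded_sms(statements):
    """statements: (lhs, op, args) triples in execution order."""
    constants, hits = {}, []
    for lhs, op, args in statements:
        if op == "const":
            constants[lhs] = args[0]              # v = "..." literal
        elif op == "assign" and args[0] in constants:
            constants[lhs] = constants[args[0]]   # constness propagates
        elif op == "sendTextMessage" and args[0] in constants:
            hits.append(constants[args[0]])       # hard-coded target
    return hits
```

A destination assigned from user input never enters the constants map, so only literally hard-coded numbers are reported, matching the premium-rate SMS concern in Section 5.2.1.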
4.2 Analysis Overview
Our analysis covers both dangerous functionality and vulnerabilities. Selecting the properties for study was a significant challenge. For brevity, we only provide an overview of the specifications here; the technical report [15] provides a detailed discussion.
Misuse of Phone Identifiers (Section 5.1.1). Previous studies [14, 12] identified phone identifiers leaking to remote network servers. We seek to identify not only the existence of such data flows, but to understand why they occur.
Exposure of Physical Location (Section 5.1.2). Previous studies [14] identified location exposure to advertisement servers. Many applications provide valuable location-aware utility, which may be desired by the user. By manually inspecting code, we seek to identify the portion of the application responsible for the exposure.
Abuse of Telephony Services (Section 5.2.1). Smartphone malware has sent SMS messages to premium-rate numbers. We study the use of hard-coded phone numbers to identify SMS and voice call abuse.
Eavesdropping on Audio/Video (Section 5.2.2). Audio and video eavesdropping is a commonly discussed smartphone threat [41]. We examine cases where applications record audio or video without control flows to UI code.
Botnet Characteristics (Sockets) (Section 5.2.3). PC botnet clients historically use non-HTTP ports and protocols for command and control. Most applications use HTTP client wrappers for network connections; therefore, we examine Socket use for suspicious behavior.
Harvesting Installed Applications (Section 5.2.4). The
list of installed applications is a valuable demographic
for marketing. We survey the use of APIs to retrieve this
list to identify harvesting of installed applications.
Use of Advertisement Libraries (Section 5.3.1). Previous
studies [14, 12] identified information exposure to
ad and analytics networks. We survey inclusion of ad and
analytics libraries and the information they access.
Dangerous Developer Libraries (Section 5.3.2). During
our manual source code inspection, we observed danger-
ous functionality replicated between applications. We re-
port on this replication and the implications.
Android-specific Vulnerabilities (Section 5.4). We
search for non-secure coding practices [17, 10], including:
writing sensitive information to logs, unprotected
broadcasts of information, IPC null checks, injection
attacks on intent actions, and delegation.
General Java Application Vulnerabilities. We look for
general Java application vulnerabilities, including mis-
use of passwords, misuse of cryptography, and tradi-
tional injection vulnerabilities. Due to space limitations,
individual results for the general vulnerability analysis
are reported in the technical report [15].
5 Application Analysis Results
In this section, we document the program analysis results
and manual inspection of identified violations.
Table 2: Access of Phone Identifier APIs

Identifier       # Calls   # Apps   # w/ Permission*
Phone Number       167       129        105
IMEI               378       216        184†
IMSI                38        30         27
ICC-ID              33        21         21
Total Unique         -       246        210

* Defined as having the READ_PHONE_STATE permission.
† Only 1 app did not also have the INTERNET permission.
5.1 Information Misuse
In this section, we explore how sensitive information is
being leaked [12, 14] through information sinks includ-
ing OutputStream objects retrieved from URLConnec-
tions, HTTP GET and POST parameters in HttpClient
connections, and the string used for URL objects. Future
work may also include SMS as a sink.
5.1.1 Phone Identifiers
We studied four phone identifiers: the phone number, IMEI
(device identifier), IMSI (subscriber identifier), and ICC-ID
(SIM card serial number). We performed two types of
analysis: (a) we scanned for APIs that access identifiers,
and (b) we used data flow analysis to identify code capable
of sending the identifiers to the network.
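As a rough illustration of pass (a), the API scan can be pictured as a text search over recovered source plus a manifest check. The TelephonyManager method names are real Android APIs; the file layout and helper names are illustrative assumptions, not the paper's tooling:

```python
import re

# Android TelephonyManager methods returning each phone identifier.
IDENTIFIER_APIS = {
    "getLine1Number":     "Phone Number",
    "getDeviceId":        "IMEI",
    "getSubscriberId":    "IMSI",
    "getSimSerialNumber": "ICC-ID",
}

def scan_identifier_calls(source_files):
    """Pass (a): map each identifier type to the files that call its API."""
    hits = {}
    for path, code in source_files.items():
        for api, ident in IDENTIFIER_APIS.items():
            if re.search(r"\b" + api + r"\s*\(", code):
                hits.setdefault(ident, set()).add(path)
    return hits

def has_read_phone_state(manifest):
    """Check whether the app declares the permission these APIs require."""
    return "android.permission.READ_PHONE_STATE" in manifest

files = {"App.java": "String imei = tm.getDeviceId();",
         "Reg.java": "form.put('num', tm.getLine1Number());"}
print(scan_identifier_calls(files))
```

Comparing the scan results against the manifest is what surfaces the permission gap discussed next (applications with identifier calls but without READ_PHONE_STATE).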
Table 2 summarizes API calls that receive phone
identifiers. In total, 246 applications (22.4%) included
code to obtain a phone identifier; however, only 210 of
these applications have the READ_PHONE_STATE permission
required to obtain access. Section 5.3 discusses code
that probes for permissions. We observe from Table 2
that applications most frequently access the IMEI (216
applications, 19.6%). The phone number is used second
most (129 applications, 11.7%). Finally, the IMSI and
ICC-ID are very rarely used (less than 3%).
Table 3 indicates the data flows that exfiltrate phone
identifiers. All 33 applications have the INTERNET
permission, but 1 application does not have the READ_
PHONE_STATE permission. We found data flows for all
four identifier types: 25 applications have IMEI data
flows; 10 applications have phone number data flows;
5 applications have IMSI data flows; and 4 applications
have ICC-ID data flows.
To gain a better understanding of how phone identifiers
are used, we manually inspected all 33 identified applications,
as well as several additional applications that
contain calls to identifier APIs. We confirmed exfiltration
for all but one application. In this case, code complexity
hindered manual confirmation; however, we identified a
different data flow not found by program analysis. The
analysis informs the following findings.
Finding 1 - Phone identifiers are frequently leaked
through plaintext requests. Most sinks are HTTP
GET or POST parameters. HTTP parameter names
for the IMEI include: “uid,” “user-id,” “imei,” “deviceId,”
“deviceSerialNumber,” “devicePrint,” “X-DSN,”
and “uniquely code”; phone number names include
“phone” and “mdn”; and IMSI names include “did” and
“imsi.” In one case we identified an HTTP parameter for
the ICC-ID, but the developer mislabeled it “imei.”

Table 3: Detected Data Flows to Network Sinks

                    Phone Identifiers    Location Info.
Sink                # Flows   # Apps     # Flows   # Apps
OutputStream           10        9          0         0
HttpClient Param       24        9         12         4
URL Object             59       19         49        10
Total Unique            -       33          -        13
Finding 2 - Phone identifiers are used as device fingerprints.
Several data flows directed us towards code
that reports not only phone identifiers, but also other
phone properties to a remote server. For example, a wallpaper
application (com.eoeandroid.eWallpapers.cartoon)
contains a class named SyncDeviceInfosService that collects
the IMEI and attributes such as the OS version
and device hardware. The method sendDeviceInfos()
sends this information to a server. In another
application (com.avantar.wny), the method PhoneStats.toUrlFormatedString()
creates a URL parameter
string containing the IMEI, device model, platform, and
application name. While the intent is not clear, such fingerprinting
indicates that phone identifiers are used as
more than a unique identifier.
Finding 3 - Phone identifiers, specifically the IMEI,
are used to track individual users. Several
applications contain code that binds the IMEI as
a unique identifier to network requests. For example,
some applications (e.g., com.Qunar and
com.nextmobileweb.craigsphone) appear to bundle the
IMEI in search queries; in a travel application
(com.visualit.tubeLondonCity), the method refreshLiveInfo()
includes the IMEI in a URL; and a “keyring” application
(com.froogloid.kring.google.zxing.client.android)
appends the IMEI to a variable named retailerLookupCmd.
We also found functionality that includes
the IMEI when checking for updates (e.g.,
com.webascender.callerid, which also includes the
phone number) and retrieving advertisements (see Finding 6).
Furthermore, we found two applications
(com.taobo.tao and raker.duobao.store) with network access
wrapper methods that include the IMEI for all connections.
These behaviors indicate that the IMEI is used
as a form of “tracking cookie”.
Finding 4 - The IMEI is tied to personally identifiable
information (PII). The common belief that the
IMEI-to-phone-owner mapping is not visible outside
the cellular network is no longer true. In several
cases, we found code that bound the IMEI to account
information and other PII. For example, applications
(e.g., com.slacker.radio and com.statefarm.pocketagent)
include the IMEI in account registration and login requests.
In another application (com.amazon.mp3), the
method linkDevice() includes the IMEI. Code inspection
indicated that this method is called when the user
chooses to “Enter a claim code” to redeem gift cards.
We also found IMEI use in code for sending comments
and reporting problems (e.g., com.morbe.guarder and
com.fm207.discount). Finally, we found one application
(com.andoop.highscore) that appears to bundle the IMEI
when submitting high scores for games. Thus, it seems
clear that databases containing mappings between physical
users and IMEIs are being created.
Finding 5 - Not all phone identifier use leads to exfiltration.
Several applications that access phone identifiers
did not exfiltrate the values. For example, one application
(com.amazon.kindle) creates a device fingerprint for
a verification check. The fingerprint is kept in “secure
storage” and does not appear to leave the phone. Another
application (com.match.android.matchmobile) assigns
the phone number to a text field used for account
registration. While the value is sent to the network during
registration, the user can easily change or remove it.
Finding 6 - Phone identifiers are sent to advertisement
and analytics servers. Many applications have
custom ad and analytics functionality. For example,
in one application (com.accuweather.android), the class
ACCUWX AdRequest is an IMEI data flow sink. Another
application (com.amazon.mp3) defines the Android service
component AndroidMetricsManager, which is an IMEI
data flow sink. Phone identifier data flows also occur
in ad libraries. For example, we found a phone number
data flow sink in the com/wooboo/adlib_android
library used by several applications (e.g., cn.ecook,
com.superdroid.sqd, and com.superdroid.ewc). Section 5.3
discusses ad libraries in more detail.
5.1.2 Location Information
Location information is accessed in two ways: (1) calling
getLastKnownLocation(), and (2) defining callbacks in
a LocationListener object passed to requestLocationUpdates().
Due to code recovery failures, not all LocationListener
objects have corresponding requestLocationUpdates()
calls. We scanned for all three constructs.
Table 4 summarizes the access of location information.
In total, 505 applications (45.9%) attempt to access
location, but only 304 (27.6%) have the permission to do so.
This difference is likely due to libraries that probe for
permissions, as discussed in Section 5.3. The separation
between LocationListener and requestLocationUpdates()
is primarily due to the AdMob library, which defines the
former but has no calls to the latter.
Table 4: Access of Location APIs

Identifier               # Uses   # Apps   # w/ Perm.*
getLastKnownLocation       428      204       148
LocationListener           652      469       282
requestLocationUpdates     316      146       128
Total Unique                 -      505       304†

* Defined as having a LOCATION permission.
† In total, 5 apps did not also have the INTERNET permission.
Table 3 shows detected location data flows to the network.
To overcome missing-code challenges, the data
flow source was defined as the getLatitude() and getLongitude()
methods of the Location object retrieved from
the location APIs. We manually inspected the 13 applications
with location data flows. Many data flows appeared
to reflect legitimate uses of location for weather,
classifieds, points of interest, and social networking services.
Inspection of the remaining applications informs
the following findings:
Finding 7 - The granularity of location reporting may
not always be obvious to the user. In one application
(com.andoop.highscore), both the city/country and
geographic coordinates are sent along with high scores.
Users may be aware of regional geographic information
associated with scores, but it is unclear whether users are
aware that precise coordinates are also used.
Finding 8 - Location information is sent to advertisement
servers. Several location data flows appeared to
terminate in network connections used to retrieve ads.
For example, two applications (com.avantar.wny and
com.avantar.yp) appended the location to the variable
webAdURLString. Motivated by [14], we inspected the
AdMob library to determine why no data flow was found
and determined that source code recovery failures led to
the false negatives. Section 5.3 expands on ad libraries.
5.2 Phone Misuse
This section explores misuse of the smartphone inter-
faces, including telephony services, background record-
ing of audio and video, sockets, and accessing the list of
installed applications.
5.2.1 Telephony Services
Smartphone malware can provide direct compensation
using phone calls or SMS messages to premium-rate
numbers [18, 25]. We defined three queries to identify
such malicious behavior: (1) a constant used for the SMS
destination number; (2) creation of URI objects with a
“tel:” prefix (used for phone call intent messages) and
containing the string “900” (a premium-rate number prefix
in the US); and (3) any URI objects with a “tel:” prefix.
The analysis informs the following findings.
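Queries (2) and (3) can be pictured as a simple pattern match over recovered code. The regex and the code-as-string setup below are illustrative assumptions, not the actual query language used by the analysis:

```python
import re

# Query (3): find all "tel:" URIs in recovered source.
# Query (2): among those, flag numbers containing the premium-rate
# prefix "900".
TEL_URI = re.compile(r'tel:[/]*([0-9+\-]+)')

def tel_uris(code):
    """All phone numbers appearing in "tel:" URIs."""
    return TEL_URI.findall(code)

def premium_rate_suspects(code):
    """Subset of tel: numbers containing the string "900"."""
    return [n for n in tel_uris(code) if "900" in n]

snippet = ('Uri.parse("tel://0900-9292"); '
           'Uri.parse("tel:18005551212");')
print(tel_uris(snippet))
print(premium_rate_suspects(snippet))
```

As the findings note, a substring match on “900” is deliberately coarse: it flags the Dutch premium-rate number above but also benign numbers that merely contain “900”, so manual inspection of the hits is still required.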
Finding 9 - Applications do not appear to be using fixed
phone number services. We found zero applications using
a constant destination number for the SMS API.
Note that our analysis specification is limited to constants
passed directly to the API and final variables, and therefore
may have false negatives. We found two applications
creating URI objects with the “tel:” prefix and
containing the string “900”. One application included
code to call “tel://0900-9292”, which is a premium-rate
number (€0.70 per minute) for travel advice in the
Netherlands. However, this did not appear malicious, as
the application (com.Planner9292) is designed to provide
travel advice. The other application contained several
hard-coded numbers with “900” in the last four digits
of the number. The SMS and premium-rate analysis results
are promising indicators of the non-existence of malicious
behavior. Future analysis should consider more
premium-rate prefixes.
Finding 10 - Applications do not appear to be misusing
voice services. We found 468 URI objects with
the “tel:” prefix in 358 applications. We manually
inspected a sample of applications to better understand
phone number use. We found: (1) applications frequently
include call functionality for customer service;
(2) the “CALL” and “DIAL” intent actions were used
equally for the same purpose (CALL calls immediately
and requires the CALL_PHONE permission, whereas DIAL
requires user confirmation in the dialer and no permission);
and (3) not all hard-coded telephone numbers are
used to make phone calls, e.g., the AdMob library had an
apparently unused hard-coded phone number.
5.2.2 Background Audio/Video
Microphone and camera eavesdropping on smartphones
is a real concern [41]. We analyzed application eavesdropping
behaviors, specifically: (1) recording video
without calling setPreviewDisplay() (this API is always
required for still image capture); (2) AudioRecord.read()
in code not reachable from an Android activity component;
and (3) MediaRecorder.start() in code not reachable
from an activity component.
Finding 11 - Applications do not appear to be misusing
video recording. We found no applications that record
video without calling setPreviewDisplay(). The query
did not consider the value passed to the preview
display, and therefore may produce false negatives.
For example, the “preview display” might be one pixel
in size. The MediaRecorder.start() query detects audio
recording, but it also detects video recording. This query
found two applications using video in code not reachable
from an activity; however, the classes extend SurfaceView,
which is used by setPreviewDisplay().
Finding 12 - Applications do not appear to be misusing
audio recording. We found eight uses of AudioRecord.read()
in seven applications without a control flow
path to an activity component. Of these applications,
three provide VoIP functionality, two are games that repeat
what the user says, and one provides voice search.
In these applications, audio recording is expected; the
lack of reachability was likely due to code recovery failures.
The remaining application did not have the required
RECORD_AUDIO permission, and the code was most likely
part of a developer toolkit. The MediaRecorder.start()
query identified an additional five applications recording
audio without reachability from an activity. Three of these
applications have legitimate reasons to record audio:
voice search, game interaction, and VoIP. Finally, two
games included audio recording in a developer toolkit,
but no record permission, which explains the lack of
reachability. Section 5.3.2 discusses developer toolkits.
5.2.3 Socket API Use
Java sockets represent an open interface to external ser-
vices, and thus are a potential source of malicious be-
havior. For example, smartphone-based botnets have
been found to exist on “jailbroken” iPhones [8]. We ob-
serve that most Internet-based smartphone applications
are HTTP clients. Android includes useful classes (e.g.,
HttpURLConnection and HttpClient) for communicating
with Web servers. Therefore, we queried for applications
that make network connections using the Socket class.
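A rough sketch of such a query, here extended to also note hard-coded IP/port pairs on non-standard ports (a pattern Finding 14 examines): the regex, names, and code-as-string setup are illustrative assumptions, not the actual query:

```python
import re

# Flag direct java.net.Socket construction with a hard-coded IP address,
# keeping only connections to non-standard ports.
SOCKET_CTOR = re.compile(
    r'new\s+Socket\s*\(\s*"(\d{1,3}(?:\.\d{1,3}){3})"\s*,\s*(\d+)\s*\)')

def suspicious_sockets(code, standard_ports=(80, 443)):
    """Hard-coded (ip, port) Socket targets on non-standard ports."""
    return [(ip, int(port)) for ip, port in SOCKET_CTOR.findall(code)
            if int(port) not in standard_ports]

code = ('Socket s = new Socket("208.94.242.218", 2009); '
        'Socket t = new Socket("209.85.227.147", 80);')
print(suspicious_sockets(code))
```

Such a syntactic query only surfaces candidates; as the findings below show, each hit still needs manual review against the application's stated purpose.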
Finding 13 - A small number of applications include
code that uses the Socket class directly. We found
177 Socket connections in 75 applications (6.8%). Many
applications are flagged for inclusion of well-known
network libraries such as org/apache/thrift, org/
apache/commons, and org/eclipse/jetty, which
use sockets directly. Socket factories were also detected.
Identified factory names such as TrustAllSSLSocketFactory,
AllTrustSSLSocketFactory, and NonValidatingSSLSocketFactory
are interesting as potential vulnerabilities,
but we found no evidence of malicious use. Several
applications also included their own HTTP wrapper
methods that duplicate functionality in the Android libraries,
but these did not appear malicious. Among the applications
including custom network connection wrappers
is a group of applications in the “Finance” category implementing
cryptographic network protocols (e.g., in the
com/lumensoft/ks library). We note that these applications
use Asian character sets for their market descriptions,
and we could not determine their exact purpose.
Finding 14 - We found no evidence of malicious behavior
by applications using Socket directly. We manually
inspected all 75 applications to determine if Socket
use seemed appropriate based on the application description.
Our survey yielded a diverse array of Socket uses,
including file transfer protocols, chat protocols, audio
and video streaming, and network connection tethering,
among other uses excluded for brevity. A handful
of applications have socket connections to hard-coded
IP addresses and non-standard ports. For example, one
application (com.eingrad.vintagecomicdroid) downloads
comics from 208.94.242.218 on port 2009. Additionally,
two of the aforementioned financial applications
(com.miraeasset.mstock and kvp.jjy.MispAndroid320)
include the kr/co/shiftworks library that connects to
221.143.48.118 on port 9001. Furthermore, one application
(com.tf1.lci) connects to 209.85.227.147 on port 80
in a class named AdService and subsequently calls getLocalAddress()
to retrieve the phone's IP address. Overall,
we found no evidence of malicious behavior, but several
applications warrant deeper investigation.
5.2.4 Installed Applications
The list of installed applications provides valuable marketing
data. Android has two relevant types of APIs: (1)
a set of get APIs returning the list of installed applications
or package names; and (2) a set of query APIs that
mirrors Android's runtime intent resolution, but can be
made generic. We found 54 uses of the get APIs in 45
applications, and 1015 uses of the query APIs in 361 applications.
Sampling these applications, we observe:
Finding 15 - Applications do not appear to be harvesting
information about which applications are installed
on the phone. In all but two cases,
the sampled applications using the get APIs search
the results for a specific application. One application
(com.davidgoemans.simpleClockWidget) defines a
method that returns the list of all installed applications,
but the results were only displayed to the user. The
second application (raker.duobao.store) defines a similar
method, but it only appears to be called by unused
debugging code. Our survey of the query APIs identified
three calls within the AdMob library duplicated in
many applications. These uses queried specific functionality
and thus are not likely to harvest application information.
The one non-AdMob application we inspected
queried for specific functionality, e.g., speech recognition,
and thus did not appear to attempt harvesting.
5.3 Included Libraries
Libraries included by applications are often easy to iden-
tify due to namespace conventions: i.e., the source
code for com.foo.appname typically exists in com/foo/
appname. During our manual inspection, we docu-
mented advertisement and analytics library paths. We
also found applications sharing what we term “developer
toolkits,” i.e., a common set of developer utilities.
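The namespace convention makes library detection largely a matter of mapping class names to paths and matching known prefixes. A minimal sketch, using a small excerpt of the ad library paths from Table 5 (the helper names are illustrative assumptions):

```python
# Detect included ad/analytics libraries by mapping each decompiled
# class name (com.foo.appname.Clazz) to its source path (com/foo/appname)
# and matching against known library path prefixes.
AD_LIBRARY_PATHS = ["com/admob/android/ads", "com/google/ads",
                    "com/flurry/android", "com/adwhirl"]

def package_to_path(class_name):
    """com.foo.appname.Clazz -> com/foo/appname"""
    return "/".join(class_name.split(".")[:-1])

def included_ad_libraries(class_names):
    paths = {package_to_path(c) for c in class_names}
    return sorted(lib for lib in AD_LIBRARY_PATHS
                  if any(p == lib or p.startswith(lib + "/") for p in paths))

classes = ["com.admob.android.ads.AdView",
           "com.example.game.MainActivity",
           "com.flurry.android.FlurryAgent"]
print(included_ad_libraries(classes))
```

Obfuscated libraries (the “Obf.” rows of Table 5) keep their package paths even when class and method names are mangled, which is why this path-based identification still works for them.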
5.3.1 Advertisement and Analytics Libraries
We identified 22 library paths containing ad or analytics
functionality. Sampled applications frequently contained
multiple of these libraries. Using the paths listed in Table 5,
we found: 1 app has 8 libraries; 10 apps have 7 libraries;
8 apps have 6 libraries; 15 apps have 5 libraries;
37 apps have 4 libraries; 32 apps have 3 libraries; 91 apps
have 2 libraries; and 367 apps have 1 library.

Table 5: Identified Ad and Analytics Library Paths

Library Path                          # Apps   Format   Obtains*
com/admob/android/ads                    320     Obf.    L
com/google/ads                           206     Plain   -
com/flurry/android                        98     Obf.    -
com/qwapi/adclient/android                74     Plain   L, P, E
com/google/android/apps/analytics         67     Plain   -
com/adwhirl                               60     Plain   L
com/mobclix/android/sdk                   58     Plain   L, E
com/millennialmedia/android‡              52     Plain   -
com/zestadz/android                       10     Plain   -
com/admarvel/android/ads                   8     Plain   -
com/estsoft/adlocal                        8     Plain   L
com/adfonic/android                        5     Obf.    -
com/vdroid/ads                             5     Obf.    L, E
com/greystripe/android/sdk                 4     Obf.    E
com/medialets                              4     Obf.    L
com/wooboo/adlib_android                   4     Obf.    L, P, I†
com/adserver/adview                        3     Obf.    L
com/tapjoy                                 3     Plain   -
com/inmobi/androidsdk                      2     Plain   E
com/apegroup/ad‡                           1     Plain   -
com/casee/adsdk                            1     Plain   S
com/webtrends/mobile                       1     Plain   L, E, S, I
Total Unique Apps                        561     -       -

* L = Location; P = Phone number; E = IMEI; S = IMSI; I = ICC-ID
† In 1 app, the library included “L”, while the other 3 included “P, I”.
‡ Direct API use not decompiled, but wrapper .getDeviceId() called.
Table 5 shows advertisement and analytics library use.
In total, at least 561 applications (51%) include these
libraries; however, additional libraries may exist, and
some applications include custom ad and analytics func-
tionality. The AdMob library is used most pervasively,
existing in 320 applications (29.1%). Google Ads is used
by 206 applications (18.7%). We observe from Table 5
that only a handful of libraries are used pervasively.
Several libraries access phone identifier and location
APIs. Given the library purpose, it is easy to speculate
about data flows to network APIs. However, many of
these flows were not detected by program analysis. This
is likely a result of code recovery failures and flows
through Android IPC. For example, AdMob has known
location-to-network data flows [14], and we identified
a code recovery failure for the class implementing that
functionality. Several libraries are also obfuscated, as
mentioned in Section 6. Interestingly, 6 of the 13 libraries
accessing sensitive information are obfuscated.
The analysis informs the following additional findings.
Finding 16 - Ad and analytics library use of phone identifiers
and location is sometimes configurable. The
com/webtrends/mobile analytics library (used by
com.statefarm.pocketagent) defines the WebtrendsIdMethod
class, which specifies four identifier types. Only one
type, “system id extended,” uses phone identifiers (IMEI,
IMSI, and ICC-ID). It is unclear which identifier type
was used by the application. Other libraries provide similar
configuration. For example, the AdMob SDK documentation [6]
indicates that location information is only
included if a package manifest configuration enables it.
Finding 17 - Analytics library reporting frequency is often
configurable. During manual inspection, we encountered
one application (com.handmark.mpp.news.reuters)
in which the phone number is passed to FlurryAgent.onEvent()
as generic data. This method is called
throughout the application, specifying event labels such
as “GetMoreStories,” “StoryClickedFromList,” and “ImageZoom.”
Here, we observe that the main application code
specifies not only the phone number to be reported, but
also the report frequency.
Finding 18 - Ad and analytics libraries probe for permissions.
The com/webtrends/mobile library accesses
the IMEI, IMSI, ICC-ID, and location. Its WebtrendsAndroidValueFetcher
class uses try/catch blocks that
catch the SecurityException thrown when an application
does not have the proper permission. Similar functionality
exists in the com/casee/adsdk library (used
by com.fish.luny). In AdFetcher.getDeviceId(), Android's
checkCallingOrSelfPermission() method is evaluated
before accessing the IMSI.
5.3.2 Developer Toolkits
Several inspected applications use developer toolkits
containing common sets of utilities identifiable by class
name or library path. We observe the following.
Finding 19 - Some developer toolkits replicate dangerous
functionality. We found three wallpaper
applications by developer “callmejack” that include
utilities in the library path com/jackeeywu/apps/
eWallpaper (com.eoeandroid.eWallpapers.cartoon,
com.jackeey.wallpapers.all1.orange, and com.jackeey.
eWallpapers.gundam). This library has data flow sinks
for the phone number, IMEI, IMSI, and ICC-ID. In July
2010, Lookout, Inc. reported a wallpaper application
by developer “jackeey,wallpaper” as sending these
identifiers to imnet.us [29]. This report also indicated
that the developer changed his name to “callmejack”.
While the original “jackeey,wallpaper” application was
removed from the Android Market, the applications by
“callmejack” remained as of September 2010.
Finding 20 - Some developer toolkits probe for permissions.
In one application (com.july.cbssports.activity),
we found code in the com/julysystems library that
evaluates Android's checkPermission() method for the
READ_PHONE_STATE and ACCESS_FINE_LOCATION permissions
before accessing the IMEI, phone number, and
last known location, respectively. A second application
(v00032.com.wordplayer) defines the CustomExceptionHander
class to send an exception event to an HTTP
URL. The class attempts to retrieve the phone number
within a try/catch block, catching a generic Exception.
However, the application does not have the
READ_PHONE_STATE permission, indicating the class is
likely used in multiple applications.
Finding 21 - Well-known brands sometimes commission
developers that include dangerous functionality.
The com/julysystems developer toolkit identified
as probing for permissions exists in two applications
with reputable application providers. “CBS
Sports Pro Football” (com.july.cbssports.activity) is provided
by “CBS Interactive, Inc.”, and “Univision Fútbol”
(com.july.univision) is provided by “Univision Interactive
Media, Inc.”. Both have location and phone state
permissions, and hence potentially misuse information.
Similarly, “USA TODAY” (com.usatoday.android.
news) provided by “USA TODAY” and “FOX News”
(com.foxnews.android) provided by “FOX News Network,
LLC” contain the com/mercuryintermedia
toolkit. Both applications contain an Android activity
component named MainActivity. In the initialization
phase, the IMEI is retrieved and passed
to ProductConfiguration.initialize() (part of the com/
mercuryintermedia toolkit). Both applications have
IMEI-to-network data flows through this method.
5.4 Android-specific Vulnerabilities
This section explores Android-specific vulnerabilities.
The technical report [15] provides specification details.
5.4.1 Leaking Information to Logs
Android provides centralized logging via the Log API,
which can be displayed with the “logcat” command.
While logcat is a debugging tool, applications with the
READ_LOGS permission can read these log messages. The
Android documentation for this permission indicates that
“[the logs] can contain slightly private information about
what is happening on the device, but should never contain
the user's private information.” We looked for data
flows from phone identifier and location APIs to the Android
logging interface and found the following.
Finding 22 - Private information is written to Android's
general logging interface. We found 253 data flows in 96
applications for location information, and 123 flows in
90 applications for phone identifiers. Frequently, URLs
containing this private information are logged just before
a network connection is made. Thus, the READ_LOGS
permission allows access to private information.
5.4.2 Leaking Information via IPC
As shown in Figure 5, any application can receive intent
broadcasts that do not specify the target component or
protect the broadcast with a permission (the permission
variant is not shown). This is unsafe if the intent contains
sensitive information. We found 271 such unsafe intent
broadcasts with “extras” data in 92 applications (8.4%).
Sampling these applications, we found several such intents
used to install shortcuts to the home screen.

Figure 5: Eavesdropping on unprotected intents. A partially
specified intent (action “pkgname.intent.ACTION” only) is
received both by pkgname.FooReceiver and by a malicious
application's receiver registering the same filter; a fully
specified intent also names the component pkgname.FooReceiver.
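The unsafe pattern of Figure 5 can be modeled with a toy broadcast resolver. All application, receiver, and action names below are illustrative, and the model greatly simplifies Android's real intent resolution:

```python
# Toy model of Android broadcast resolution: an intent carrying only an
# action string reaches every receiver filtering on it, including one
# registered by another (malicious) application. Naming the component
# restricts delivery to the intended receiver.

class Receiver:
    def __init__(self, app, name, action_filter):
        self.app, self.name, self.action_filter = app, name, action_filter

def deliver(receivers, action, component=None):
    """Return the receivers that would get this broadcast."""
    if component is not None:              # fully specified intent
        return [r for r in receivers if r.name == component]
    return [r for r in receivers if r.action_filter == action]

receivers = [Receiver("pkgname", "pkgname.FooReceiver",
                      "pkgname.intent.ACTION"),
             Receiver("malicious", "malicious.BarReceiver",
                      "pkgname.intent.ACTION")]

# Partially specified: both apps receive the (possibly sensitive) extras.
print([r.app for r in deliver(receivers, "pkgname.intent.ACTION")])
# Fully specified: only the intended component receives it.
print([r.app for r in deliver(receivers, "pkgname.intent.ACTION",
                              component="pkgname.FooReceiver")])
```

In real Android, protecting the broadcast with a permission (the variant not shown in Figure 5) achieves a similar restriction without naming the component.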
Finding 23 - Applications broadcast private information
in IPC accessible to all applications. We found
many cases of applications sending unsafe intents to
action strings containing the application's namespace
(e.g., “pkgname.intent.ACTION” for application pkgname).
The contents of the bundled information varied.
In some instances, the data was not sensitive,
e.g., widget and task identifiers. However, we also
found sensitive information. For example, one application
(com.ulocate) broadcasts the user's location to the
“com.ulocate.service.LOCATION” intent action string
without protection. Another application (com.himsn)
broadcasts the instant messaging client's status to the
“cm.mz.stS” action string. These vulnerabilities allow
malicious applications to eavesdrop on sensitive information
in IPC and, in some cases, gain access to information
that requires a permission (e.g., location).
5.4.3 Unprotected Broadcast Receivers
Applications use broadcast receiver components to receive
intent messages. Broadcast receivers define “intent
filters” to subscribe to specific event types and are public.
If the receiver is not protected by a permission, a malicious
application can forge messages.
Finding 24 - Few applications are vulnerable to forging
attacks on dynamic broadcast receivers. We found
406 unprotected broadcast receivers in 154 applications
(14%). We found a large number of receivers subscribed
to system-defined intent types. These receivers
are indirectly protected by Android's “protected broadcasts,”
introduced to eliminate forging. We found one
application with an unprotected broadcast receiver for a
custom intent type; however, it appears to have limited
impact. Additional sampling may uncover more cases.
5.4.4 Intent Injection Attacks
Intent messages are also used to start activity and service
components. An intent injection attack occurs if the intent
address is derived from untrusted input.
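This property can be pictured as a small taint-propagation pass: values produced by untrusted sources (network reads, IPC extras) are tracked through copies, and a violation is flagged when one reaches an intent-address sink. The mini-IR below is an illustrative assumption, not the paper's analyzer:

```python
# Minimal taint-propagation sketch of the intent injection property.
# Sources: untrusted input APIs. Sinks: calls that set an intent address.
UNTRUSTED_SOURCES = {"Intent.getStringExtra", "URLConnection.getInputStream"}
INTENT_ADDRESS_SINKS = {"Intent.<init>", "Intent.setAction",
                        "Intent.setComponent"}

def find_intent_injection(statements):
    tainted, flagged = set(), []
    for kind, fn, dst, args in statements:
        if kind == "call" and fn in UNTRUSTED_SOURCES:
            tainted.add(dst)                   # source: result is tainted
        elif kind == "assign" and args[0] in tainted:
            tainted.add(dst)                   # propagate through copies
        elif kind == "call" and fn in INTENT_ADDRESS_SINKS:
            if any(a in tainted for a in args):
                flagged.append(fn)             # tainted intent address
    return flagged

prog = [("call",   "Intent.getStringExtra", "v1", []),
        ("assign", None,                    "v2", ["v1"]),
        ("call",   "Intent.setAction",      None, ["v2"])]
print(find_intent_injection(prog))
```

The three sinks listed mirror the classification used below (Intent constructor, setAction(), and setComponent()); over-approximating where Intent objects come from is also what produces the false positives the text discusses.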
We found 10 data flows from the network to an intent
address in 1 application. We could not confirm
the data flow and classify it as a false positive. The data
flow sink exists in a class named ProgressBroadcastingFileInputStream.
No decompiled code references this
class, and all data flow sources are calls to URLConnection.getInputStream(),
which is used to create InputStreamReader
objects. We believe the false positive results
from the program analysis modeling of classes extending
InputStream.
We found 80 data flows from IPC to an intent address
in 37 applications. We classified the data flows by the
sink: the Intent constructor is the sink for 13 applications;
setAction() is the sink for 16 applications; and setComponent()
is the sink for 8 applications. These sets
are disjoint. Of the 37 applications, we found that 17
applications set the target component class explicitly (all
except 3 use the setAction() data flow sink), e.g., to relay
the action string from a broadcast receiver to a service.
We also found four false positives due to our assumption
that all Intent objects come from IPC (a few exceptions
exist). For the remaining 16 cases, we observe:
Finding 25 - Some applications de?ne intent addresses
based on IPC input. Three applications use IPC input
strings to specify the package and component names for
the setComponent() data ?ow sink. Similarly, one appli-
cation uses the IPC “extras” input to specify an action to
an Intent constructor. Two additional applications start
an activity based on the action string returned as a result
from a previously started activity. However, to exploit
this vulnerability, the applications must ?rst start a ma-
licious activity. In the remaining cases, the action string
used to start a component is copied directly into a new
intent object. A malicious application can exploit this
vulnerability by specifying the vulnerable component’s
name directly and controlling the action string.
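The unsafe pattern in Finding 25 can be sketched with a minimal stand-in for Android's Intent class. All names here are hypothetical and the Intent class below is a toy substitute (real code would use android.content.Intent); the sketch only illustrates copying an attacker-controllable action string into a new intent versus validating it first.

```java
// Toy stand-in for android.content.Intent: only the action field matters here.
class Intent {
    final String action;
    Intent(String action) { this.action = action; }
}

public class IntentInjectionSketch {
    // Unsafe: the action string from an incoming (attacker-controllable)
    // intent is copied directly into a new intent, as in Finding 25.
    static Intent relayUnsafe(Intent incoming) {
        return new Intent(incoming.action);
    }

    // Safer: only relay actions from a fixed whitelist.
    static Intent relayChecked(Intent incoming) {
        if (!"com.example.ACTION_SYNC".equals(incoming.action)) {
            throw new IllegalArgumentException("unexpected action");
        }
        return new Intent(incoming.action);
    }
}
```

A malicious sender can drive relayUnsafe() to emit any action it chooses; relayChecked() rejects everything outside the expected set.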
5.4.5 Delegating Control
Applications can delegate actions to other applications using a "pending intent." An application first creates an intent message as if it were performing the action. It then creates a reference to the intent based on the target component type (restricting how it can be used). The pending intent recipient cannot change values, but it can fill in missing fields. Therefore, if the intent address is unspecified, the remote application can redirect an action that is performed with the original application's permissions.
Finding 26 - Few applications unsafely delegate actions. We found 300 unsafe pending intent objects in 116 applications (10.5%). Sampling these applications, we found an overwhelming number of pending intents used for either: (1) Android's UI notification service; (2) Android's alarm service; or (3) communicating between a UI widget and the main application. None of these cases allow manipulation by a malicious application. We found two applications that send unsafe pending intents via IPC. However, exploiting these vulnerabilities appears to provide negligible adversarial advantage. We also note that a more sophisticated analysis framework could be used to eliminate the aforementioned false positives.
5.4.6 Null Checks on IPC Input
Android applications frequently process information from intent messages received from other applications. Null dereferences cause an application to crash, and can thus be used as a denial of service.
Finding 27 - Applications frequently do not perform null checks on IPC input. We found 3,925 potential null dereferences on IPC input in 591 applications (53.7%). Most occur in classes for activity components (2,484 dereferences in 481 applications). Null dereferences in activity components have minimal impact, as the application crash is obvious to the user. We found 746 potential null dereferences in 230 applications within classes defining broadcast receiver components. Applications commonly use broadcast receivers to start background services; therefore, it is unclear what effect a null dereference in a broadcast receiver will have. Finally, we found 72 potential null dereferences in 36 applications within classes defining service components. Application crashes corresponding to these null dereferences have a higher probability of going unnoticed. The remaining potential null dereferences are not easily associated with a component type.
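A minimal sketch of the missing check: Android delivers intent "extras" as a Bundle, for which a plain Map stands in here; the handler and key names are hypothetical. Dereferencing a missing extra without a check is exactly the crash pattern of Finding 27.

```java
import java.util.Map;

public class NullCheckSketch {
    // Hypothetical IPC-input handler. A plain Map stands in for the
    // Android Bundle of intent extras.
    static String handleCommand(Map<String, String> extras) {
        String action = extras.get("action");
        if (action == null) {               // the check many applications omit
            return "ignored";
        }
        return "handled:" + action.trim();  // safe: action is known non-null
    }
}
```

Without the null check, a sender that simply omits the "action" extra crashes the receiving component.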
5.4.7 SDcard Use
Any application that has access to read or write data on the SDcard can read or write any other application's data on the SDcard. We found 657 references to the SDcard in 251 applications (22.8%). Sampling these applications, we found a few unexpected uses. For example, the com/tapjoy ad library (used by com.jnj.mocospace.android) determines the free space available on the SDcard. Another application (com.rent) obtains a URL from a file named connRentInfo.dat at the root of the SDcard.
5.4.8 JNI Use
Applications can include functionality in native libraries using the Java Native Interface (JNI). As these methods are not written in Java, they have inherent dangers. We found 2,762 calls to native methods in 69 applications (6.3%). Investigating the application package files, we found that 71 applications contain .so files. This indicates that two applications with an .so file either do not call any native methods, or the code calling the native methods was not decompiled. Across these 71 applications, we found 95 .so files, 82 of which have unique names.
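Declaring a native method requires no Java implementation, which is why a static scan can count native calls even when no .so is loaded. The sketch below (hypothetical class and method names, not the paper's tooling) counts native declarations by modifier flags, one plausible way such a tally could be made:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class JniScanSketch {
    // Hypothetical class declaring a native method; merely declaring it
    // compiles and loads fine without any .so being present (it only
    // fails if actually invoked).
    static class UsesNative {
        public native int decodeFrame(byte[] data);
        public int pureJava() { return 0; }
    }

    // Count native methods the way a simple scan might: by modifier flags.
    static long countNativeMethods(Class<?> c) {
        long n = 0;
        for (Method m : c.getDeclaredMethods()) {
            if (Modifier.isNative(m.getModifiers())) n++;
        }
        return n;
    }
}
```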
6 Study Limitations
Our study was limited in three ways: a) the studied applications were selected with a bias towards popularity; b) the program analysis tool cannot compute data and control flows for IPC between components; and c) source code recovery failures interrupt data and control flows. Missing data and control flows may lead to false negatives. In addition to the recovery failures, the program analysis tool could not parse 8,042 classes, reducing coverage to 91.34% of the classes.
Additionally, a portion of the recovered source code was obfuscated before distribution. Code obfuscation significantly impedes manual inspection. It likely exists to protect intellectual property; Google suggests obfuscation using ProGuard (proguard.sf.net) for applications using its licensing service [23]. ProGuard protects against readability only and does not obfuscate control flow. Therefore it has limited impact on program analysis.
Many forms of obfuscated code are easily recognizable: e.g., class, method, and field names are converted to single letters, producing single-letter Java filenames (e.g., a.java). For a rough estimate of the use of obfuscation, we searched for applications containing a.java. In total, 396 of the 1,100 applications contain this file. As discussed in Section 5.3, several advertisement and analytics libraries are obfuscated. To obtain a closer estimate of the number of applications whose main code is obfuscated, we searched for a.java within a file path equivalent to the package name (e.g., com/foo/appname for com.foo.appname). Only 20 applications (1.8%) have this obfuscation property, which is expected for free applications (as opposed to paid applications). However, we stress that the a.java heuristic is not intended to be a firm characterization of the percentage of obfuscated code, but rather a means of acquiring insight.
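The a.java heuristic described above can be sketched as follows. The helper is hypothetical (the paper's actual tooling may differ); it flags an application's main code as likely obfuscated when a.java sits under the directory matching the package name:

```java
import java.util.List;

public class ObfuscationHeuristic {
    // Sketch of the a.java heuristic: flag the application's main code as
    // likely obfuscated if a.java appears under the directory matching its
    // package name (e.g., com/foo/appname for com.foo.appname).
    static boolean mainCodeObfuscated(String packageName, List<String> sourceFiles) {
        String prefix = packageName.replace('.', '/') + "/";
        return sourceFiles.contains(prefix + "a.java");
    }
}
```

An a.java elsewhere in the tree (e.g., inside an obfuscated ad library's path) does not trigger the flag, which is why this count is lower than the raw 396-application figure.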
7 What This All Means
Identifying a singular take-away from a broad study such as this is non-obvious. We come away from the study with two central thoughts; one having to do with the study apparatus, and the other regarding the applications.
ded and the program analysis specifications are enabling technologies that open a new door for application certification. We found the approach rather effective despite existing limitations. In addition to further studies of this kind, we see the potential to integrate these tools into an application certification process. We leave such discussions for future work, noting that such integration is challenging for both logistical and technical reasons [30].
On a technical level, we found the security characteristics of the top 1,100 free popular applications to be consistent with smaller studies (e.g., Enck et al. [14]). Our findings indicate an overwhelming concern for misuse of privacy sensitive information such as phone identifiers and location information. One might speculate this occurs due to the difficulty in assigning malicious intent.
Arguably more important than identifying the existence of information misuse, our manual source code inspection sheds more light on how information is misused. We found phone identifiers, e.g., phone number, IMEI, IMSI, and ICC-ID, were used for everything from "cookie-esque" tracking to account numbers. Our findings also support the existence of databases external to cellular providers that link identifiers such as the IMEI to personally identifiable information.
Our analysis also identified significant penetration of ad and analytics libraries, occurring in 51% of the studied applications. While this might not be surprising for free applications, the number of ad and analytics libraries included per application was unexpected. One application included as many as eight different libraries. It is unclear why an application needs more than one advertisement and one analytics library.
From a vulnerability perspective, we found that many developers fail to take necessary security precautions. For example, sensitive information is frequently written to Android's centralized logs, as well as occasionally broadcast to unprotected IPC. We also identified the potential for IPC injection attacks; however, no cases were readily exploitable.
Finally, our study only characterized one edge of the application space. While we found no evidence of telephony misuse, background recording of audio or video, or abusive network connections, one might argue that such malicious functionality is less likely to occur in popular applications. We focused our study on popular applications to characterize those most frequently used. Future studies should take samples that span application popularity. However, even these samples may miss the existence of truly malicious applications. Future studies should also consider several additional attacks, including installing new applications [43], JNI execution [34], address book exfiltration, destruction of SDcard contents, and phishing [20].
8 Related Work
Many tools and techniques have been designed to identify security concerns in software. Software written in C is particularly susceptible to programming errors that result in vulnerabilities. Ashcraft and Engler [7] use compiler extensions to identify errors in range checks. MOPS [11] uses model checking to scale to large amounts of source code [42]. Java applications are inherently safer than C applications and avoid simple vulnerabilities such as buffer overflows. Ware and Fox [46] compare eight different open source and commercially available Java source code analysis tools, finding that no one tool detects all vulnerabilities. Hovemeyer and Pugh [22] study six popular Java applications and libraries using FindBugs extended with additional checks. While the analysis included non-security bugs, the results motivate a strong need for automated analysis by all developers. Livshits and Lam [28] focus on Java-based Web applications. In the Web server environment, inputs are easily controlled by an adversary and, left unchecked, can lead to SQL injection, cross-site scripting, HTTP response splitting, path traversal, and command injection. Felmetsger et al. [19] also study Java-based web applications; they advance vulnerability analysis by providing automatic detection of application-specific logic errors.
Spyware and privacy-breaching software have also been studied. Kirda et al. [26] consider behavioral properties of BHOs and toolbars. Egele et al. [13] target information leaks by browser-based spyware explicitly using dynamic taint analysis. Panorama [47] considers privacy-breaching malware in general using whole-system, fine-grained taint tracking. Privacy Oracle [24] uses differential black box fuzz testing to find privacy leaks in applications.
On smartphones, TaintDroid [14] uses system-wide dynamic taint tracking to identify privacy leaks in Android applications. By using static analysis, we were able to study a far greater number of applications (1,100 vs. 30). However, TaintDroid's analysis confirms the exfiltration of information, while our static analysis only confirms the potential for it. Kirin [16] also uses static analysis, but focuses on permissions and other application configuration data, whereas our study analyzes source code. Finally, PiOS [12] performs static analysis on iOS applications for the iPhone. The PiOS study found that the majority of analyzed applications leak the device ID and that over half of the applications include advertisement and analytics libraries.
9 Conclusions
Smartphones are rapidly becoming a dominant computing platform. Low barriers of entry for application developers increase the security risk for end users. In this paper, we described the ded decompiler for Android applications and used decompiled source code to perform a breadth study of both dangerous functionality and vulnerabilities. While our findings of exposure of phone identifiers and location are consistent with previous studies, our analysis framework allows us to observe not only the existence of dangerous functionality, but also how it occurs within the context of the application.
Moving forward, we foresee ded and our analysis specifications as enabling technologies that will open new doors for application certification. However, the integration of these technologies into an application certification process requires overcoming logistical and technical challenges. Our future work will consider these challenges, and broaden our analysis to new areas, including application installation, malicious JNI, and phishing.
Acknowledgments
We would like to thank Fortify Software Inc. for providing us with a complimentary copy of Fortify SCA to perform the study. We also thank Suneel Sundar and Joy Marie Forsythe at Fortify for helping us debug custom rules. Finally, we thank Kevin Butler, Stephen McLaughlin, Patrick Traynor, and the SIIS lab for their editorial comments during the writing of this paper. This material is based upon work supported by the National Science Foundation Grants No. CNS-0905447, CNS-0721579, and CNS-0643907. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
References
[1] Fernflower - Java decompiler. http://www.reversed-java.com/fernflower/.
[2] Fortify 360 Source Code Analyzer (SCA). https:
//www.fortify.com/products/fortify360/
source-code-analyzer.html.
[3] Jad. http://www.kpdus.com/jad.html.
[4] JD Java Decompiler. http://java.decompiler.free.fr/.
[5] Mocha, the Java decompiler. http://www.brouhaha.com/~eric/software/mocha/.
[6] ADMOB. AdMob Android SDK: Installation Instruc-
tions. http://www.admob.com/docs/AdMob_Android_SDK_
Instructions.pdf. Accessed November 2010.
[7] ASHCRAFT, K., AND ENGLER, D. Using Programmer-Written
Compiler Extensions to Catch Security Holes. In Proceedings of
the IEEE Symposium on Security and Privacy (2002).
[8] BBC NEWS. New iPhone worm can act like botnet
say experts. http://news.bbc.co.uk/2/hi/technology/
8373739.stm, November 23, 2009.
[9] BORNSTEIN, D. Google I/O 2008 - Dalvik Virtual Machine Internals. http://www.youtube.com/watch?v=ptjedOZEXPM.
[10] BURNS, J. Developing Secure Mobile Applications for Android.
iSEC Partners, October 2008. http://www.isecpartners.
com/files/iSEC_Securing_Android_Apps.pdf.
[11] CHEN, H., DEAN, D., AND WAGNER, D. Model Checking One
Million Lines of C Code. In Proceedings of the 11th Annual Net-
work and Distributed System Security Symposium (Feb. 2004).
[12] EGELE, M., KRUEGEL, C., KIRDA, E., AND VIGNA, G. PiOS:
Detecting Privacy Leaks in iOS Applications. In Proceedings of
the Network and Distributed System Security Symposium (2011).
[13] EGELE, M., KRUEGEL, C., KIRDA, E., YIN, H., AND SONG,
D. Dynamic Spyware Analysis. In Proceedings of the USENIX
Annual Technical Conference (June 2007), pp. 233–246.
[14] ENCK, W., GILBERT, P., CHUN, B.-G., COX, L. P., JUNG,
J., MCDANIEL, P., AND SHETH, A. N. TaintDroid: An
Information-Flow Tracking System for Realtime Privacy Moni-
toring on Smartphones. In Proceedings of the USENIX Sympo-
sium on Operating Systems Design and Implementation (2010).
[15] ENCK, W., OCTEAU, D., MCDANIEL, P., AND CHAUDHURI,
S. A Study of Android Application Security. Tech. Rep. NAS-
TR-0144-2011, Network and Security Research Center, Depart-
ment of Computer Science and Engineering, Pennsylvania State
University, University Park, PA, USA, January 2011.
[16] ENCK, W., ONGTANG, M., AND MCDANIEL, P. On Lightweight Mobile Phone Application Certification. In Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS) (Nov. 2009).
[17] ENCK, W., ONGTANG, M., AND MCDANIEL, P. Understand-
ing Android Security. IEEE Security & Privacy Magazine 7, 1
(January/February 2009), 50–57.
[18] F-SECURE CORPORATION. Virus Description: Viver.A.
http://www.f-secure.com/v-descs/trojan_symbos_
viver_a.shtml.
[19] FELMETSGER, V., CAVEDON, L., KRUEGEL, C., AND VIGNA,
G. Toward Automated Detection of Logic Vulnerabilities in Web
Applications. In Proceedings of the USENIX Security Symposium
(2010).
[20] FIRST TECH CREDIT UNION. Security Fraud: Rogue Android
Smartphone app created. http://www.firsttechcu.com/
home/security/fraud/security_fraud.html, Dec. 2009.
[21] GOODIN, D. Backdoor in top iphone games stole
user data, suit claims. The Register, November 2009.
http://www.theregister.co.uk/2009/11/06/iphone_
games_storm8_lawsuit/.
[22] HOVEMEYER, D., AND PUGH, W. Finding Bugs is Easy. In Pro-
ceedings of the ACM conference on Object-Oriented Program-
ming Systems, Languages, and Applications (2004).
[23] JOHNS, T. Securing Android LVL Applications.
http://android-developers.blogspot.com/2010/
09/securing-android-lvl-applications.html, 2010.
[24] JUNG, J., SHETH, A., GREENSTEIN, B., WETHERALL, D.,
MAGANIS, G., AND KOHNO, T. Privacy Oracle: A System for
Finding Application Leaks with Black Box Differential Testing.
In Proceedings of the ACM conference on Computer and Com-
munications Security (2008).
[25] KASPERSKY LAB. First SMS Trojan detected for smartphones running Android. http://www.kaspersky.com/news?id=207576158, August 2010.
[26] KIRDA, E., KRUEGEL, C., BANKS, G., VIGNA, G., AND KEM-
MERER, R. A. Behavior-based Spyware Detection. In Proceed-
ings of the 15th USENIX Security Symposium (Aug. 2006).
[27] KRALEVICH, N. Best Practices for Handling Android User
Data. http://android-developers.blogspot.com/2010/
08/best-practices-for-handling-android.html, 2010.
[28] LIVSHITS, V. B., AND LAM, M. S. Finding Security Vulnera-
bilities in Java Applications with Static Analysis. In Proceedings
of the 14th USENIX Security Symposium (2005).
[29] LOOKOUT. Update and Clarification of Analysis of Mobile Applications at Blackhat 2010. http://blog.mylookout.com/2010/07/mobile-application-analysis-blackhat/, July 2010.
[30] MCDANIEL, P., AND ENCK, W. Not So Great Expectations:
Why Application Markets Haven’t Failed Security. IEEE Secu-
rity & Privacy Magazine 8, 5 (September/October 2010), 76–78.
[31] MIECZNIKOWSKI, J., AND HENDREN, L. Decompiling java us-
ing staged encapsulation. In Proceedings of the Eighth Working
Conference on Reverse Engineering (2001).
[32] MIECZNIKOWSKI, J., AND HENDREN, L. J. Decompiling java
bytecode: Problems, traps and pitfalls. In Proceedings of the 11th
International Conference on Compiler Construction (2002).
[33] MILNER, R. A theory of type polymorphism in programming.
Journal of Computer and System Sciences 17 (August 1978).
[34] OBERHEIDE, J. Android Hax. In Proceedings of SummerCon
(June 2010).
[35] OCTEAU, D., ENCK, W., AND MCDANIEL, P. The ded Decom-
piler. Tech. Rep. NAS-TR-0140-2010, Network and Security Re-
search Center, Department of Computer Science and Engineer-
ing, Pennsylvania State University, University Park, PA, USA,
Sept. 2010.
[36] ONGTANG, M., BUTLER, K., AND MCDANIEL, P. Porscha:
Policy Oriented Secure Content Handling in Android. In Proc. of
the Annual Computer Security Applications Conference (2010).
[37] ONGTANG, M., MCLAUGHLIN, S., ENCK, W., AND MC-
DANIEL, P. Semantically Rich Application-Centric Security in
Android. In Proceedings of the Annual Computer Security Appli-
cations Conference (2009).
[38] PORRAS, P., SAIDI, H., AND YEGNESWARAN, V. An Analysis
of the Ikee.B (Duh) iPhone Botnet. Tech. rep., SRI International,
Dec. 2009. http://mtc.sri.com/iPhone/.
[39] PROEBSTING, T. A., AND WATTERSON, S. A. Krakatoa: De-
compilation in java (does bytecode reveal source?). In Proceed-
ings of the USENIX Conference on Object-Oriented Technologies
and Systems (1997).
[40] RAPHEL, J. Google: Android wallpaper apps were not security
threats. Computerworld (August 2010).
[41] SCHLEGEL, R., ZHANG, K., ZHOU, X., INTWALA, M., KAPADIA, A., AND WANG, X. Soundcomber: A Stealthy and Context-Aware Sound Trojan for Smartphones. In Proceedings of the Network and Distributed System Security Symposium (2011).
[42] SCHWARZ, B., CHEN, H., WAGNER, D., MORRISON, G.,
WEST, J., LIN, J., AND TU, W. Model Checking an Entire
Linux Distribution for Security Violations. In Proceedings of the
Annual Computer Security Applications Conference (2005).
[43] STORM, D. Zombies and Angry Birds attack: mobile phone mal-
ware. Computerworld (November 2010).
[44] TIURYN, J. Type inference problems: A survey. In Proceedings
of the Mathematical Foundations of Computer Science (1990).
[45] VALLEE-RAI, R., GAGNON, E., HENDREN, L., LAM, P., POM-
INVILLE, P., AND SUNDARESAN, V. Optimizing java bytecode
using the soot framework: Is it feasible? In International Confer-
ence on Compiler Construction, LNCS 1781 (2000), pp. 18–34.
[46] WARE, M. S., AND FOX, C. J. Securing Java Code: Heuristics
and an Evaluation of Static Analysis Tools. In Proceedings of the
Workshop on Static Analysis (SAW) (2008).
[47] YIN, H., SONG, D., EGELE, M., KRUEGEL, C., AND KIRDA,
E. Panorama: Capturing System-wide Information Flow for Mal-
ware Detection and Analysis. In Proceedings of the ACM confer-
ence on Computer and Communications Security (2007).
Notes
1. The undx and dex2jar tools attempt to decompile .dex files, but were non-functional at the time of this writing.
2. Note that it is sufficient to find any type-exposing instruction for a register assignment. Any code that could result in different types for the same register would be illegal. If this were to occur, the primitive type would be dependent on the path taken at run time, a clear violation of Java's type system.
3. Fortunately, these dangerous applications are now nonfunctional, as the imnet.us NS entry is NS1.SUSPENDED-FOR.SPAM-AND-ABUSE.COM.
Markets streamline software marketing, installation, and update—therein creating low barriers to bring applications to market, and even lower barriers for users to obtain and use them.
The fluidity of the markets also presents enormous security challenges. Rapidly developed and deployed applications [40], coarse permission systems [16], privacy-invading behaviors [14, 12, 21], malware [20, 25, 38], and limited security models [36, 37, 27] have led to exploitable phones and applications. Although users seemingly desire it, markets are not in a position to provide security in more than a superficial way [30]. The lack of a common definition for security and the volume of applications ensures that some malicious, questionable, and vulnerable applications will find their way to market.
In this paper, we broadly characterize the security of applications in the Android Market. In contrast to past studies with narrower foci, e.g., [14, 12], we consider a breadth of concerns including both dangerous functionality and vulnerabilities, and apply a wide range of analysis techniques. In this, we make two primary contributions:
• We design and implement a Dalvik decompiler, ded. ded recovers an application's Java source solely from its installation image by inferring lost types, performing DVM-to-JVM bytecode retargeting, and translating class and method structures.
• We analyze 21 million LOC retrieved from the top 1,100 free applications in the Android Market using automated tests and manual inspection. Where possible, we identify root causes and posit the severity of discovered vulnerabilities.
Our popularity-focused security analysis provides insight into the most frequently used applications. Our findings inform the following broad observations.
1. Similar to past studies, we found wide misuse of privacy sensitive information—particularly phone identifiers and geographic location. Phone identifiers, e.g., IMEI, IMSI, and ICC-ID, were used for everything from "cookie-esque" tracking to account numbers.
2. We found no evidence of telephony misuse, background recording of audio or video, abusive connections, or harvesting lists of installed applications.
3. Ad and analytics network libraries are integrated with 51% of the applications studied, with AdMob (appearing in 29.09% of apps) and Google Ads (appearing in 18.72% of apps) dominating. Many applications include more than one ad library.
4. Many developers fail to securely use Android APIs. These failures generally fall into the classification of insufficient protection of privacy sensitive information. However, we found no exploitable vulnerabilities that can lead to malicious control of the phone.
This paper is an initial but not final word on Android application security. Thus, one should be circumspect about any interpretation of the following results as a definitive statement about how secure applications are today. Rather, we believe these results are indicative of the current state, but there remain many aspects of the applications that warrant deeper analysis. We plan to continue with this analysis in the future and have made the decompiler freely available at http://siis.cse.psu.edu/ded/ to aid the broader security community in understanding Android security.
The following sections reflect the two thrusts of this work: Sections 2 and 3 provide background and detail our decompilation process, and Sections 4 and 5 detail the application study. The remaining sections discuss our limitations and interpret the results.
2 Background
Android: Android is an OS designed for smartphones. Depicted in Figure 1, Android provides a sandboxed application execution environment. A customized embedded Linux system interacts with the phone hardware and an off-processor cellular radio. The Binder middleware and application API run on top of Linux. To simplify, an application's only interface to the phone is through these APIs. Each application is executed within a Dalvik Virtual Machine (DVM) running under a unique UNIX uid. The phone comes pre-installed with a selection of system applications, e.g., phone dialer, address book.
Applications interact with each other and the phone through different forms of IPC. Intents are typed interprocess messages that are directed to particular applications or system services, or broadcast to applications subscribing to a particular intent type. Persistent content provider data stores are queried through SQL-like interfaces. Background services provide RPC and callback interfaces that applications use to trigger actions or access data. Finally, user interface activities receive named action signals from the system and other applications.
Binder acts as a mediation point for all IPC. Access to system resources (e.g., GPS receivers, text messaging, phone services, and the Internet), data (e.g., address books, email), and IPC is governed by permissions assigned at install time. The permissions requested by the application and the permissions required to access the application's interfaces/data are defined in its manifest file. To simplify, an application is allowed to access a resource or interface if the required permission allows it.

[Figure 1: The Android system architecture. Installed and system applications each execute in a Dalvik VM (DVM) atop the Binder middleware and embedded Linux, which interfaces with the cellular radio, GPS receiver, Bluetooth, and display.]

Permission assignment—and indirectly the security policy for the phone—is largely delegated to the phone's owner: the user is presented a screen listing the permissions an application requests at install time, which they can accept or reject.
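As an illustrative sketch of this model (the package, component, and custom permission names below are hypothetical), an application's manifest both requests the permissions it needs and declares the permission callers of its components must hold:

```xml
<!-- Hypothetical fragment: the application requests Internet access, and
     requires callers of its service to hold a custom permission. -->
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.app">
  <uses-permission android:name="android.permission.INTERNET" />
  <application>
    <service android:name=".SyncService"
             android:permission="com.example.app.USE_SYNC" />
  </application>
</manifest>
```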
Dalvik Virtual Machine: Android applications are written in Java, but run in the DVM. The DVM and Java bytecode run-time environments differ substantially:
Application Structure. Java applications are composed of one or more .class files, one file per class. The JVM loads the bytecode for a Java class from the associated .class file as it is referenced at run time. Conversely, a Dalvik application consists of a single .dex file containing all application classes.
Figure 2 provides a conceptual view of the compilation process for DVM applications. After the Java compiler creates JVM bytecode, the Dalvik dx compiler consumes the .class files, recompiles them to Dalvik bytecode, and writes the resulting application into a single .dex file. This process consists of the translation, reconstruction, and interpretation of three basic elements of the application: the constant pools, the class definitions, and the data segment. A constant pool describes, not surprisingly, the constants used by a class. This includes, among other items, references to other classes, method names, and numerical constants. The class definitions consist of basic information such as access flags and class names. The data element contains the method code executed by the target VM, as well as other information related to methods (e.g., number of DVM registers used, local variable table, and operand stack sizes) and to class and instance variables.
Register architecture. The DVM is register-based, whereas existing JVMs are stack-based. Java bytecode can assign local variables to a local variable table before pushing them onto an operand stack for manipulation by opcodes, but it can also just work on the stack without explicitly storing variables in the table. Dalvik bytecode assigns local variables to any of the 2^16 available registers. The Dalvik opcodes directly manipulate registers, rather than accessing elements on a program stack.
[Figure 2: Compilation process for DVM applications. The Java compiler turns .java source into .class files (each with its own constant pool, class info, and data); the dx compiler consumes these and emits a single .dex file with a shared header, constant pool, class definitions, and data section.]

Instruction set. The Dalvik bytecode instruction set is substantially different than that of Java. Dalvik has 218 opcodes while Java has 200; however, the nature of the opcodes is very different. For example, Java has tens of opcodes dedicated to moving elements between the stack and local variable table. Dalvik instructions tend to be longer than Java instructions; they often include the source and destination registers. As a result, Dalvik applications require fewer instructions. In Dalvik bytecode, applications have on average 30% fewer instructions than in Java, but have a 35% larger code size (bytes) [9].
Constant pool structure. Java applications replicate elements in constant pools within the multiple .class files, e.g., referrer and referent method names. The dx compiler eliminates much of this replication. Dalvik uses a single pool that all classes simultaneously reference. Additionally, dx eliminates some constants by inlining their values directly into the bytecode. In practice, integers, long integers, and single and double precision floating-point elements disappear during this process.
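The pooling behavior described above can be illustrated with a small sketch. This is not dx's actual implementation; it is a toy model in which per-class pools are merged into one deduplicated shared pool, and purely numeric entries are dropped because their values would be inlined into the bytecode:

```java
import java.util.*;

public class PoolMerge {
    // Merge per-class constant pools into a single shared pool, dropping
    // numeric constants that would be inlined into the bytecode by dx.
    static SortedSet<String> merge(List<Set<String>> perClassPools) {
        SortedSet<String> shared = new TreeSet<>();
        for (Set<String> pool : perClassPools) {
            for (String entry : pool) {
                if (!entry.matches("-?\\d+")) { // numeric constants get inlined
                    shared.add(entry);          // everything else is deduplicated
                }
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        List<Set<String>> pools = List.of(
            Set.of("java/lang/Object", "println", "42"),
            Set.of("println", "java/lang/String"));
        System.out.println(merge(pools));
    }
}
```

Running the sketch on two overlapping per-class pools yields a single pool with one copy of "println" and no "42" entry, mirroring the deduplication and inlining described above.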
Control flow structure. Control flow elements such as loops, switch statements, and exception handlers are structured differently in Dalvik and Java bytecode. Java bytecode structure loosely mirrors the source code, whereas Dalvik bytecode does not.
Ambiguous primitive types. Java bytecode variable assignments distinguish between integer (int) and single-precision floating-point (float) constants, and between long integer (long) and double-precision floating-point (double) constants. However, Dalvik assignments use the same opcodes for int/float and for long/double, i.e., the opcodes are untyped beyond specifying precision.
Null references. The Dalvik bytecode does not specify a null type, instead opting to use a zero value constant. Thus, constant zero values present in the Dalvik bytecode have ambiguous typing that must be recovered.
Comparison of object references. The Java bytecode uses typed opcodes for the comparison of object references (if_acmpeq and if_acmpne) and for null comparison of object references (ifnull and ifnonnull). The Dalvik bytecode uses simpler integer comparisons for these purposes: a comparison between two integers, and a comparison of an integer and zero, respectively. This requires the decompilation process to recover types for integer comparisons used in DVM bytecode.
Storage of primitive types in arrays. The Dalvik bytecode uses ambiguous opcodes to store and retrieve elements in arrays of primitive types (e.g., aget for int/float and aget-wide for long/double), whereas the corresponding Java bytecode is unambiguous. The array type must be recovered for correct translation.
3 The ded decompiler
Building a decompiler from DEX to Java for the study proved to be surprisingly challenging. On the one hand, Java decompilation has been studied since the 1990s: tools such as Mocha [5] date back over a decade, with many other techniques being developed [39, 32, 31, 4, 3, 1]. Unfortunately, prior to our work, there existed no functional tool for the Dalvik bytecode. Because of the vast differences between the JVM and DVM, simple modification of existing decompilers was not possible.
Our choice to decompile to Java source, rather than operate on the DEX opcodes directly, was grounded in two reasons. First, we wanted to leverage existing tools for code analysis. Second, we required access to source code to identify false positives resulting from automated code analysis, i.e., to perform manual confirmation.
ded extraction occurs in three stages: a) retargeting, b) optimization, and c) decompilation. This section presents the challenges and process of ded, and concludes with a brief discussion of its validation. Interested readers are referred to [35] for a thorough treatment.
3.1 Application Retargeting
The initial stage of decompilation retargets the application .dex file to Java classes. Figure 3 overviews this process: (1) recovering typing information, (2) translating the constant pool, and (3) retargeting the bytecode.
Type Inference: The first step in retargeting is to identify class and method constants and variables. However, the Dalvik bytecode does not always provide enough information to determine the type of a variable or constant from its register declaration. There are two generalized cases where variable types are ambiguous: 1) a constant or variable declaration only specifies the variable width (e.g., 32 or 64 bits), but not whether it is a float, integer, or null reference; and 2) comparison operators do not distinguish between integer and object reference comparison (i.e., null reference checks).

Type inference has been widely studied [44]. The seminal Hindley-Milner [33] algorithm provides the basis for type inference algorithms used by many languages such
as Haskell and ML. These approaches determine unknown types by observing how variables are used in operations with known-type operands. Similar techniques are used by languages with strong type inference, e.g., OCaml, as well as weaker inference, e.g., Perl.

[Figure 3: Dalvik bytecode retargeting. The three stages are: (1) DEX parsing; (2) Java .class conversion, comprising missing-type inference, constant pool conversion, and method code retargeting (CFG construction, type inference processing, constant identification, constant pool translation, bytecode reorganization, and instruction set translation); and (3) Java .class optimization.]
ded adopts the accepted approach: it infers register types by observing how they are used in subsequent operations with known-type operands. Dalvik registers loosely correspond to Java variables. Because Dalvik bytecode reuses registers whose variables are no longer in scope, we must evaluate a register's type within the context of the method control flow, i.e., inference must be path-sensitive. Note further that ded type inference is also method-local. Because the types of passed parameters and return values are identified by method signatures, there is no need to search outside the method.
There are three ways ded infers a register's type. First, any comparison of a variable or constant with a known type exposes the type. Comparison of dissimilar types requires type coercion in Java, which is propagated to the Dalvik bytecode; hence, legal Dalvik comparisons always involve registers of the same type. Second, instructions such as add-int only operate on specific types, manifestly exposing typing information. Third, instructions that pass registers to methods or use a return value expose the type via the method signature.
The ded type inference algorithm proceeds as follows. After reconstructing the control flow graph, ded identifies any ambiguous register declaration. For each such register, ded walks the instructions in the control flow graph starting from its declaration. Each branch of the control flow encountered is pushed onto an inference stack, i.e., ded performs a depth-first search of the control flow graph looking for type-exposing instructions. If a type-exposing instruction is encountered, the variable is labeled and the process is complete for that variable. There are three events that cause a branch search to terminate: a) when the register is reassigned to another variable (i.e., a new declaration is encountered), b) when a return function is encountered, and c) when an exception is thrown. After a branch is abandoned, the next branch is popped off the stack and the search continues. Lastly, type information is forward propagated, modulo register reassignment, through the control flow graph from each register declaration to all subsequent ambiguous uses.
This algorithm resolves all ambiguous primitive types, except for one isolated case: when all paths leading to a type-ambiguous instruction originate with ambiguous constant instructions (e.g., all paths leading to an integer comparison originate with registers assigned a constant zero). In this case, the type does not impact decompilation, and a default type (e.g., integer) can be assigned.
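The path-sensitive search above can be modeled as a small sketch (ours, not ded's actual code): a depth-first walk over CFG nodes that stops at reassignments, returns, and throws, and labels the register at the first type-exposing instruction. The node and instruction encodings are hypothetical simplifications.

```java
import java.util.*;

public class TypeSearch {
    enum Kind { NEUTRAL, EXPOSES_INT, EXPOSES_FLOAT, REASSIGN, RETURN, THROW }

    // Depth-first search from the register's declaration node for the first
    // type-exposing instruction; a branch is abandoned at reassignment,
    // return, or throw, mirroring the three termination events in the text.
    static Kind infer(Map<Integer, Kind> insn, Map<Integer, List<Integer>> succ, int decl) {
        Deque<Integer> stack = new ArrayDeque<>();
        Set<Integer> visited = new HashSet<>();
        stack.push(decl);
        while (!stack.isEmpty()) {
            int n = stack.pop();
            if (!visited.add(n)) continue;
            Kind k = insn.getOrDefault(n, Kind.NEUTRAL);
            if (k == Kind.EXPOSES_INT || k == Kind.EXPOSES_FLOAT) return k; // label found
            if (k == Kind.REASSIGN || k == Kind.RETURN || k == Kind.THROW) continue; // abandon
            for (int s : succ.getOrDefault(n, List.of())) stack.push(s);
        }
        return Kind.NEUTRAL; // all paths ambiguous: a default type may be assigned
    }

    public static void main(String[] args) {
        // decl(0) branches to node 1 (return) and to 2 -> 3 (add-int exposes int)
        Map<Integer, Kind> insn = Map.of(1, Kind.RETURN, 3, Kind.EXPOSES_INT);
        Map<Integer, List<Integer>> succ = Map.of(0, List.of(1, 2), 2, List.of(3));
        System.out.println(infer(insn, succ, 0)); // EXPOSES_INT
    }
}
```

The NEUTRAL result at the end corresponds to the isolated case above, where every path originates with an ambiguous constant and a default type suffices.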
Constant Pool Conversion: The .dex and .class file constant pools differ in that: a) Dalvik maintains a single constant pool for the application, whereas Java maintains one for each class; and b) Dalvik bytecode places primitive type constants directly in the bytecode, whereas Java bytecode uses the constant pool for most references. We convert constant pool information in two steps.

The first step is to identify which constants are needed for a .class file. Constants include references to classes, methods, and instance variables. ded traverses the bytecode for each method in a class, noting such references. ded also identifies all constant primitives.
Once ded identifies the constants required by a class, it adds them to the target .class file. For primitive type constants, new entries are created. For class, method, and instance variable references, the created Java constant pool entries are based on the Dalvik constant pool entries. The constant pool formats differ in complexity; specifically, Dalvik constant pool entries use significantly more references to reduce memory overhead.
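The first step can be sketched as a pass over each method's instructions, collecting the constants the target .class file must declare. This is a hypothetical illustration, not ded's code, and the Insn representation is invented for the example:

```java
import java.util.*;

public class ConstScan {
    // A toy instruction: an opcode plus an optional constant operand
    // (a class, method, or field reference, or a primitive constant).
    record Insn(String op, String operand) {}

    // Collect the constants a class's methods reference, so matching
    // entries can be created in the target .class constant pool.
    static Set<String> neededConstants(List<List<Insn>> methods) {
        Set<String> pool = new LinkedHashSet<>();
        for (List<Insn> code : methods)
            for (Insn i : code)
                if (i.operand() != null) pool.add(i.operand());
        return pool;
    }

    public static void main(String[] args) {
        List<Insn> method = List.of(
            new Insn("invoke", "java/io/PrintStream.println"),
            new Insn("const", "42"),
            new Insn("add", null));
        System.out.println(neededConstants(List.of(method)));
    }
}
```

Deduplication falls out of the set representation; the second step (emitting actual constant pool entries) is omitted.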
Method Code Retargeting: The final stage of the retargeting process is the translation of the method code. First, we preprocess the bytecode to reorganize structures that cannot be directly retargeted. Second, we linearly traverse the DVM bytecode and translate it to the JVM.

The preprocessing phase addresses multidimensional arrays. Both Dalvik and Java use blocks of bytecode instructions to create multidimensional arrays; however, the instructions have different semantics and layout. ded reorders and annotates the bytecode with array size and type information for translation.

The bytecode translation linearly processes each Dalvik instruction. First, ded maps each referenced register to a Java local variable table index. Second, ded performs an instruction translation for each encountered Dalvik instruction. As Dalvik bytecode is more compact and takes more arguments, one Dalvik instruction frequently expands to multiple Java instructions. Third, ded patches the relative offsets used for branches based on preprocessing annotations. Finally, ded defines exception tables that describe try/catch/finally blocks. The resulting translated code is combined with the constant pool to create a legal Java .class file.
The following is an example translation for add-int:

    Dalvik                  Java
    add-int d0, s0, s1      iload s0
                            iload s1
                            iadd
                            istore d0

where ded creates a Java local variable for each register, i.e., register d0 maps to variable d0, s0 to s0, etc. The translation creates four Java instructions: two to push the variables onto the stack, one to add, and one to pop the result.
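The expansion above can be sketched as a tiny emitter (a toy illustration, not ded's implementation), assuming registers have already been mapped to local-variable-table slots:

```java
import java.util.List;

public class AddIntTranslator {
    // Expand one register-based "add-int dst, srcA, srcB" Dalvik instruction
    // into the four stack-based JVM instructions shown in the text.
    static List<String> translate(int dst, int srcA, int srcB) {
        return List.of(
            "iload " + srcA,   // push first operand onto the operand stack
            "iload " + srcB,   // push second operand
            "iadd",            // pop both, push their sum
            "istore " + dst    // pop the result into the destination slot
        );
    }

    public static void main(String[] args) {
        System.out.println(translate(0, 1, 2));
    }
}
```

The one-to-many expansion illustrates why retargeted Java bytecode contains more instructions than the Dalvik original.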
3.2 Optimization and Decompilation
At this stage, the retargeted .class ?les can be de-
compiled using existing tools, e.g., Fern?ower [1] or
Soot [45]. However, ded’s bytecode translation process
yields unoptimized Java code. For example, Java tools
often optimize out unnecessary assignments to the local
variable table, e.g., unneeded return values. Without op-
timization, decompiled code is complex and frustrates
analysis. Furthermore, artifacts of the retargeting pro-
cess can lead to decompilation errors in some decompil-
ers. The need for bytecode optimization is easily demon-
strated by considering decompiled loops. Most decom-
pilers convert for loops into in?nite loops with break
instructions. While the resulting source code is func-
tionally equivalent to the original, it is signi?cantly more
dif?cult to understand and analyze, especially for nested
loops. Thus, we use Soot as a post-retargeting optimizer.
While Soot is centrally an optimization tool with the abil-
ity to recover source code in most cases, it does not pro-
cess certain legal program idioms (bytecode structures)
generated by ded. In particular, we encountered two
central problems involving, 1) interactions between syn-
chronized blocks and exception handling, and 2) com-
plex control ?ows caused by break statements. While the
Java bytecode generated by ded is legal, the source code
failure rate reported in the following section is almost en-
tirely due to Soot’s inability to extract source code from
these two cases. We will consider other decompilers in
future work, e.g., Jad [4], JD [3], and Fern?ower [1].
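The loop issue can be seen in a small constructed example: the two methods below are functionally equivalent, but without optimization many decompilers emit the second form for the first, which is harder to read, especially when nested.

```java
public class LoopForms {
    // The for-loop as a developer would write it.
    static int sumFor(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) sum += i;
        return sum;
    }

    // The shape unoptimized decompilation often recovers:
    // an infinite loop with an explicit break.
    static int sumDecompiled(int n) {
        int sum = 0;
        int i = 0;
        while (true) {
            if (i >= n) break;
            sum += i;
            i++;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumFor(5) + " " + sumDecompiled(5)); // 10 10
    }
}
```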
3.3 Source Code Recovery Validation
We have performed extensive validation testing of ded [35]. The included tests recovered the source code for small, medium, and large open source applications and found no errors in recovery. In most cases the recovered code was virtually indistinguishable from the original source (modulo comments and method local-variable names, which are not included in the bytecode).
Table 1: Studied Applications (from Android Market)

Category          Total    Retargeted  Decompiled
                  Classes  Classes     Classes     LOC
Comics            5627     99.54%      94.72%      415625
Communication     23000    99.12%      92.32%      1832514
Demo              8012     99.90%      94.75%      830471
Entertainment     10300    99.64%      95.39%      709915
Finance           18375    99.34%      94.29%      1556392
Games (Arcade)    8508     99.27%      93.16%      766045
Games (Puzzle)    9809     99.38%      94.58%      727642
Games (Casino)    10754    99.39%      93.38%      985423
Games (Casual)    8047     99.33%      93.69%      681429
Health            11438    99.55%      94.69%      847511
Lifestyle         9548     99.69%      95.30%      778446
Multimedia        15539    99.20%      93.46%      1323805
News/Weather      14297    99.41%      94.52%      1123674
Productivity      14751    99.25%      94.87%      1443600
Reference         10596    99.69%      94.87%      887794
Shopping          15771    99.64%      96.25%      1371351
Social            23188    99.57%      95.23%      2048177
Libraries         2748     99.45%      94.18%      182655
Sports            8509     99.49%      94.44%      651881
Themes            4806     99.04%      93.30%      310203
Tools             9696     99.28%      95.29%      839866
Travel            18791    99.30%      94.47%      1419783
Total             262110   99.41%      94.41%      21734202
We also used ded to recover the source code for the top 50 free applications (as listed by the Android Market) from each of the 22 application categories, 1,100 in total. The application images were obtained from the market using a custom retrieval tool on September 1, 2010. Table 1 lists decompilation statistics. The decompilation of all 1,100 applications took 497.7 hours (about 20.7 days) of compute time. Soot dominated the processing time: 99.97% of the total time was devoted to Soot optimization and decompilation. The decompilation process was able to recover over 247 thousand classes spread over 21.7 million lines of code. This represents about 94% of the total classes in the applications. All decompilation errors are manifest during/after decompilation, and thus are ignored for the study reported in the later sections. There are two categories of failures:
Retargeting Failures. 0.59% of classes were not retargeted. These errors fall into three classes: a) unresolved references which prevent optimization by Soot, b) type violations caused by Android's dex compiler, and c) extremely rare cases in which ded produces illegal bytecode. Recent efforts have focused on improving optimization, as well as redesigning ded with a formally defined type inference apparatus. Parallel work on improving ded has been able to reduce these errors by a third, and we expect further improvements in the near future.
Decompilation Failures. 5% of the classes were successfully retargeted, but Soot failed to recover the source code. Here we are limited by the state of the art in decompilation. In order to understand the impact of decompiling ded-retargeted classes versus ordinary Java .class files, we performed a parallel study to evaluate Soot on Java applications generated with traditional Java compilers. Of 31,553 classes from a variety of packages, Soot was able to decompile 94.59%, indicating we cannot do better while using Soot for decompilation.

A possible way to improve this is to use a different decompiler. Since our study, Fernflower [1] was available for a short period as part of a beta test. We decompiled the same 1,100 optimized applications using Fernflower and had a recovery rate of 98.04% of the 1.65 million retargeted methods, a significant improvement. Future studies will investigate the fidelity of Fernflower's output and its appropriateness as input for program analysis.
4 Evaluating Android Security
Our Android application study consisted of a broad range of tests focused on three kinds of analysis: a) exploring issues uncovered in previous studies and malware advisories, b) searching for general coding security failures, and c) exploring misuse/security failures in the use of the Android framework. The following discusses the process of identifying and encoding the tests.
4.1 Analysis Speci?cation
We used four approaches to evaluate recovered source code: control flow analysis, data flow analysis, structural analysis, and semantic analysis. Unless otherwise specified, all tests used the Fortify SCA [2] static analysis suite, which provides these four types of analysis. The following discusses the general application of these approaches. The details of our analysis specifications can be found in the technical report [15].
Control flow analysis. Control flow analysis imposes constraints on the sequences of actions executed by an input program P, classifying some of them as errors. Essentially, a control flow rule is an automaton A whose input words are sequences of actions of P, i.e., the rule monitors executions of P. An erroneous action sequence is one that drives A into a predefined error state. To statically detect violations specified by A, the program analysis traces each control flow path in the tool's model of P, synchronously "executing" A on the actions executed along this path. Since not all control flow paths in the model are feasible in concrete executions of P, false positives are possible. False negatives are also possible in principle, though uncommon in practice. Figure 4 shows an example automaton for sending intents. Here, the error state is reached if the intent contains data and is sent unprotected without specifying the target component, resulting in a potential unintended information leakage.
[Figure 4: Example control flow specification. States: init, empty, has_data, targeted, error. Transition predicates: p1 = i.$new_class(...); p2 = i.$new(...) | i.$new_action(...); p3 = i.$set_class(...) | i.$set_component(...); p4 = i.$put_extra(...); p5 = i.$set_class(...) | i.$set_component(...); p6 = $unprotected_send(i) | $protected_send(i, null).]
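The intent automaton can be sketched as a small state machine. This is our simplified model, not Fortify's rule language; the action names loosely follow the figure's predicates. An intent that acquires data (put_extra) and is then sent without a target component drives the machine into the error state:

```java
import java.util.List;

public class IntentAutomaton {
    enum State { INIT, EMPTY, HAS_DATA, TARGETED, ERROR }

    // One transition of the automaton for a single observed action.
    static State step(State s, String action) {
        switch (action) {
            case "new": return s == State.INIT ? State.EMPTY : s;
            case "new_class": return s == State.INIT ? State.TARGETED : s;
            case "set_component":
                return (s == State.EMPTY || s == State.HAS_DATA) ? State.TARGETED : s;
            case "put_extra": return s == State.EMPTY ? State.HAS_DATA : s;
            case "unprotected_send":
                return s == State.HAS_DATA ? State.ERROR : s; // data but no target
            default: return s;
        }
    }

    // "Execute" the automaton over an action sequence from one program path.
    static State run(List<String> actions) {
        State s = State.INIT;
        for (String a : actions) s = step(s, a);
        return s;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("new", "put_extra", "unprotected_send")));
    }
}
```

Running the erroneous sequence ends in ERROR, while inserting set_component before the send keeps the machine in TARGETED, matching the figure's safe path.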
Data flow analysis. Data flow analysis permits the declarative specification of problematic data flows in the input program. For example, an Android phone contains several pieces of private information that should never leave the phone: the user's phone number, IMEI (device ID), IMSI (subscriber ID), and ICC-ID (SIM card serial number). In our study, we wanted to check that this information is not leaked to the network. While this property can in principle be coded using automata, data flow specification allows for a much easier encoding. The specification declaratively labels program statements matching certain syntactic patterns as data flow sources and sinks. Data flows between the sources and sinks are violations.
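The source/sink idea can be approximated with a purely illustrative taint sketch (not the tool's algorithm): taint starts at labeled sources, propagates along assignments to a fixpoint, and a violation is any tainted value reaching a sink argument.

```java
import java.util.*;

public class TaintSketch {
    // assignedFrom maps each variable to the variable or source it was
    // assigned from; propagate taint to a fixpoint, then check sink args.
    static boolean flows(Map<String, String> assignedFrom,
                         Set<String> sources, Set<String> sinkArgs) {
        Set<String> tainted = new HashSet<>(sources);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (var e : assignedFrom.entrySet())
                if (tainted.contains(e.getValue()) && tainted.add(e.getKey()))
                    changed = true;
        }
        for (String arg : sinkArgs)
            if (tainted.contains(arg)) return true; // source-to-sink flow
        return false;
    }

    public static void main(String[] args) {
        // imei = getDeviceId(); url = imei; networkWrite(url) -> violation
        Map<String, String> assigns = Map.of("imei", "getDeviceId", "url", "imei");
        System.out.println(flows(assigns, Set.of("getDeviceId"), Set.of("url")));
    }
}
```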
Structural analysis. Structural analysis allows for declarative pattern matching on the abstract syntax of the input source code. Structural analysis specifications are not concerned with program executions or data flow; therefore, analysis is local and straightforward. For example, in our study, we wanted to specify a bug pattern where an Android application mines the device ID of the phone on which it runs. This pattern was defined using a structural rule stating that the input program calls a method getDeviceId() whose enclosing class is android.telephony.TelephonyManager.
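The locality of the rule makes it easy to sketch. The MethodCall representation below is a hypothetical stand-in for an abstract-syntax node, not Fortify's rule language:

```java
public class StructuralRule {
    // A simplified call-site node (hypothetical, for illustration only).
    record MethodCall(String enclosingClass, String methodName) {}

    // The structural rule from the text: flag getDeviceId() calls whose
    // enclosing class is android.telephony.TelephonyManager.
    static boolean flagsDeviceId(MethodCall c) {
        return c.methodName().equals("getDeviceId")
            && c.enclosingClass().equals("android.telephony.TelephonyManager");
    }

    public static void main(String[] args) {
        MethodCall c = new MethodCall("android.telephony.TelephonyManager", "getDeviceId");
        System.out.println(flagsDeviceId(c)); // true
    }
}
```

Because the rule inspects each call site in isolation, no execution or flow reasoning is needed, which is what makes structural analysis local and cheap.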
Semantic analysis. Semantic analysis allows the specification of a limited set of constraints on the values used by the input program. For example, a property of interest in our study was that an Android application does not send SMS messages to hard-coded targets. To express this property, we defined a pattern matching calls to Android messaging methods such as sendTextMessage(). Semantic specifications permit us to directly specify that the first parameter in these calls (the phone number) is not a constant. The analyzer detects violations of this property using constant propagation techniques well known in the program analysis literature.
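A minimal constant-propagation check in this spirit (our sketch, not the analyzer's implementation) follows copy assignments backward from the destination argument; reaching a string literal means the destination is a compile-time constant and the call is flagged:

```java
import java.util.*;

public class HardcodedTarget {
    // Follow copy assignments (a = b) back to either a string literal
    // (constant destination: flagged) or an unknown input (not flagged).
    static boolean isConstant(String var, Map<String, String> literals,
                              Map<String, String> copies) {
        Set<String> seen = new HashSet<>();
        while (seen.add(var)) {          // cycle guard
            if (literals.containsKey(var)) return true;
            var = copies.get(var);
            if (var == null) return false; // reaches a non-constant input
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> lits = Map.of("num", "9001234567");
        Map<String, String> copies = Map.of("dest", "num");
        // sendTextMessage(dest, ...): dest copies num, a literal -> flagged
        System.out.println(isConstant("dest", lits, copies)); // true
    }
}
```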
4.2 Analysis Overview
Our analysis covers both dangerous functionality and vulnerabilities. Selecting the properties for study was a significant challenge. For brevity, we only provide an overview of the specifications. The technical report [15] provides a detailed discussion of the specifications.
Misuse of Phone Identifiers (Section 5.1.1). Previous studies [14, 12] identified phone identifiers leaking to remote network servers. We seek not only to identify the existence of data flows, but to understand why they occur.

Exposure of Physical Location (Section 5.1.2). Previous studies [14] identified location exposure to advertisement servers. Many applications provide valuable location-aware utility, which may be desired by the user. By manually inspecting code, we seek to identify the portion of the application responsible for the exposure.

Abuse of Telephony Services (Section 5.2.1). Smartphone malware has sent SMS messages to premium-rate numbers. We study the use of hard-coded phone numbers to identify SMS and voice call abuse.

Eavesdropping on Audio/Video (Section 5.2.2). Audio and video eavesdropping is a commonly discussed smartphone threat [41]. We examine cases where applications record audio or video without control flows to UI code.

Botnet Characteristics (Sockets) (Section 5.2.3). PC botnet clients historically use non-HTTP ports and protocols for command and control. Most applications use HTTP client wrappers for network connections; therefore, we examine Socket use for suspicious behavior.
Harvesting Installed Applications (Section 5.2.4). The list of installed applications is a valuable demographic for marketing. We survey the use of APIs to retrieve this list to identify harvesting of installed applications.

Use of Advertisement Libraries (Section 5.3.1). Previous studies [14, 12] identified information exposure to ad and analytics networks. We survey the inclusion of ad and analytics libraries and the information they access.

Dangerous Developer Libraries (Section 5.3.2). During our manual source code inspection, we observed dangerous functionality replicated between applications. We report on this replication and its implications.

Android-specific Vulnerabilities (Section 5.4). We search for non-secure coding practices [17, 10], including: writing sensitive information to logs, unprotected broadcasts of information, IPC null checks, injection attacks on intent actions, and delegation.

General Java Application Vulnerabilities. We look for general Java application vulnerabilities, including misuse of passwords, misuse of cryptography, and traditional injection vulnerabilities. Due to space limitations, individual results for the general vulnerability analysis are reported in the technical report [15].
5 Application Analysis Results
In this section, we document the program analysis results and our manual inspection of identified violations.
Table 2: Access of Phone Identifier APIs

Identifier      # Calls  # Apps  # w/ Permission*
Phone Number    167      129     105
IMEI            378      216     184†
IMSI            38       30      27
ICC-ID          33       21      21
Total Unique    -        246     210†

* Defined as having the READ_PHONE_STATE permission.
† Only 1 app did not also have the INTERNET permission.
5.1 Information Misuse
In this section, we explore how sensitive information is being leaked [12, 14] through information sinks, including OutputStream objects retrieved from URLConnections, HTTP GET and POST parameters in HttpClient connections, and the string used for URL objects. Future work may also include SMS as a sink.
5.1.1 Phone Identi?ers
We studied four phone identifiers: phone number, IMEI (device identifier), IMSI (subscriber identifier), and ICC-ID (SIM card serial number). We performed two types of analysis: a) we scanned for APIs that access identifiers, and b) we used data flow analysis to identify code capable of sending the identifiers to the network.

Table 2 summarizes API calls that receive phone identifiers. In total, 246 applications (22.4%) included code to obtain a phone identifier; however, only 210 of these applications have the READ_PHONE_STATE permission required to obtain access. Section 5.3 discusses code that probes for permissions. We observe from Table 2 that applications most frequently access the IMEI (216 applications, 19.6%). The phone number is used second most (129 applications, 11.7%). Finally, the IMSI and ICC-ID are very rarely used (less than 3%).
Table 3 indicates the data flows that exfiltrate phone identifiers. All 33 such applications have the INTERNET permission, but one application does not have the READ_PHONE_STATE permission. We found data flows for all four identifier types: 25 applications have IMEI data flows; 10 applications have phone number data flows; 5 applications have IMSI data flows; and 4 applications have ICC-ID data flows.

To gain a better understanding of how phone identifiers are used, we manually inspected all 33 identified applications, as well as several additional applications that contain calls to identifier APIs. We confirmed exfiltration for all but one application. In this case, code complexity hindered manual confirmation; however, we identified a different data flow not found by program analysis. The analysis informs the following findings.
Finding 1 - Phone identifiers are frequently leaked through plaintext requests. Most sinks are HTTP GET or POST parameters. HTTP parameter names
Table 3: Detected Data Flows to Network Sinks

                  Phone Identifiers    Location Info.
Sink              # Flows  # Apps      # Flows  # Apps
OutputStream      10       9           0        0
HttpClient Param  24       9           12       4
URL Object        59       19          49       10
Total Unique      -        33          -        13
for the IMEI include: "uid," "user-id," "imei," "deviceId," "deviceSerialNumber," "devicePrint," "X-DSN," and "uniquely code"; phone number names include "phone" and "mdn"; and IMSI names include "did" and "imsi." In one case we identified an HTTP parameter for the ICC-ID, but the developer mislabeled it "imei."
Finding 2 - Phone identifiers are used as device fingerprints. Several data flows directed us towards code that reports not only phone identifiers, but also other phone properties, to a remote server. For example, a wallpaper application (com.eoeandroid.eWallpapers.cartoon) contains a class named SyncDeviceInfosService that collects the IMEI and attributes such as the OS version and device hardware. The method sendDeviceInfos() sends this information to a server. In another application (com.avantar.wny), the method PhoneStats.toUrlFormatedString() creates a URL parameter string containing the IMEI, device model, platform, and application name. While the intent is not clear, such fingerprinting indicates that phone identifiers are used for more than a unique identifier.
Finding 3 - Phone identifiers, specifically the IMEI, are used to track individual users. Several applications contain code that binds the IMEI as a unique identifier to network requests. For example, some applications (e.g., com.Qunar and com.nextmobileweb.craigsphone) appear to bundle the IMEI in search queries; in a travel application (com.visualit.tubeLondonCity), the method refreshLiveInfo() includes the IMEI in a URL; and a "keyring" application (com.froogloid.kring.google.zxing.client.android) appends the IMEI to a variable named retailerLookupCmd. We also found functionality that includes the IMEI when checking for updates (e.g., com.webascender.callerid, which also includes the phone number) and retrieving advertisements (see Finding 6). Furthermore, we found two applications (com.taobo.tao and raker.duobao.store) with network access wrapper methods that include the IMEI for all connections. These behaviors indicate that the IMEI is used as a form of "tracking cookie."
Finding 4 - The IMEI is tied to personally identifiable information (PII). The common belief that the IMEI-to-phone-owner mapping is not visible outside the cellular network is no longer true. In several cases, we found code that bound the IMEI to account information and other PII. For example, some applications (e.g., com.slacker.radio and com.statefarm.pocketagent) include the IMEI in account registration and login requests. In another application (com.amazon.mp3), the method linkDevice() includes the IMEI. Code inspection indicated that this method is called when the user chooses to "Enter a claim code" to redeem gift cards. We also found IMEI use in code for sending comments and reporting problems (e.g., com.morbe.guarder and com.fm207.discount). Finally, we found one application (com.andoop.highscore) that appears to bundle the IMEI when submitting high scores for games. Thus, it seems clear that databases containing mappings between physical users and IMEIs are being created.
Finding 5 - Not all phone identifier use leads to exfiltration. Several applications that access phone identifiers did not exfiltrate the values. For example, one application (com.amazon.kindle) creates a device fingerprint for a verification check. The fingerprint is kept in "secure storage" and does not appear to leave the phone. Another application (com.match.android.matchmobile) assigns the phone number to a text field used for account registration. While the value is sent to the network during registration, the user can easily change or remove it.
Finding 6 - Phone identifiers are sent to advertisement and analytics servers. Many applications have custom ad and analytics functionality. For example, in one application (com.accuweather.android), the class ACCUWX AdRequest is an IMEI data flow sink. Another application (com.amazon.mp3) defines the Android service component AndroidMetricsManager, which is an IMEI data flow sink. Phone identifier data flows also occur in ad libraries. For example, we found a phone number data flow sink in the com/wooboo/adlib_android library used by several applications (e.g., cn.ecook, com.superdroid.sqd, and com.superdroid.ewc). Section 5.3 discusses ad libraries in more detail.
5.1.2 Location Information
Location information is accessed in two ways: (1) calling getLastKnownLocation(), and (2) defining callbacks in a LocationListener object passed to requestLocationUpdates(). Due to code recovery failures, not all LocationListener objects have corresponding requestLocationUpdates() calls. We scanned for all three constructs.

Table 4 summarizes the access of location information. In total, 505 applications (45.9%) attempt to access location, but only 304 (27.6%) have the permission to do so. This difference is likely due to libraries that probe for permissions, as discussed in Section 5.3. The separation between LocationListener and requestLocationUpdates() is primarily due to the AdMob library, which defines the former but has no calls to the latter.
Table 4: Access of Location APIs

Identifier              # Uses  # Apps  # w/ Perm.*
getLastKnownLocation    428     204     148
LocationListener        652     469     282
requestLocationUpdates  316     146     128
Total Unique            -       505     304†

* Defined as having a LOCATION permission.
† In total, 5 apps did not also have the INTERNET permission.
Table 3 shows detected location data flows to the network. To overcome missing-code challenges, the data flow source was defined as the getLatitude() and getLongitude() methods of the Location object retrieved from the location APIs. We manually inspected the 13 applications with location data flows. Many data flows appeared to reflect legitimate uses of location for weather, classifieds, points of interest, and social networking services. Inspection of the remaining applications informs the following findings:
Finding 7 - The granularity of location reporting may not always be obvious to the user. In one application (com.andoop.highscore) both the city/country and geographic coordinates are sent along with high scores. Users may be aware of regional geographic information associated with scores, but it is unclear whether users are aware that precise coordinates are also used.
Finding 8 - Location information is sent to advertisement servers. Several location data flows appeared to terminate in network connections used to retrieve ads. For example, two applications (com.avantar.wny and com.avantar.yp) appended the location to the variable webAdURLString. Motivated by [14], we inspected the AdMob library to determine why no data flow was found and determined that source code recovery failures led to the false negatives. Section 5.3 expands on ad libraries.
5.2 Phone Misuse
This section explores misuse of smartphone interfaces, including telephony services, background recording of audio and video, sockets, and access to the list of installed applications.
5.2.1 Telephony Services
Smartphone malware can provide direct compensation using phone calls or SMS messages to premium-rate numbers [18, 25]. We defined three queries to identify such malicious behavior: (1) a constant used for the SMS destination number; (2) creation of URI objects with a "tel:" prefix (used for phone call intent messages) and the string "900" (a premium-rate number prefix in the US); and (3) any URI objects with a "tel:" prefix. The analysis informs the following findings.
Finding 9 - Applications do not appear to be using fixed phone number services. We found zero applications using a constant destination number for the SMS API. Note that our analysis specification is limited to constants passed directly to the API and final variables, and therefore may have false negatives. We found two applications creating URI objects with the "tel:" prefix and containing the string "900". One application included code to call "tel://0900-9292", which is a premium-rate number (€0.70 per minute) for travel advice in the Netherlands. However, this did not appear malicious, as the application (com.Planner9292) is designed to provide travel advice. The other application contained several hard-coded numbers with "900" in the last four digits of the number. The SMS and premium-rate analysis results are promising indicators of the non-existence of malicious behavior. Future analysis should consider more premium-rate prefixes.
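The constructs matched by the three queries can be sketched as follows (our example; the numbers and message are hypothetical, the APIs are the real Android ones):

```java
// Sketch of the code patterns targeted by the three telephony queries.
import android.content.Context;
import android.content.Intent;
import android.net.Uri;
import android.telephony.SmsManager;

class PremiumRateSketch {
    static final String DEST = "90055501"; // query (1): constant SMS destination

    void examples(Context ctx) {
        // Query (1): a constant destination number passed to the SMS API
        SmsManager.getDefault().sendTextMessage(DEST, null, "hi", null, null);

        // Queries (2) and (3): "tel:" URIs, with and without a "900" prefix
        Uri premium = Uri.parse("tel:900-555-0100");
        Intent call = new Intent(Intent.ACTION_CALL, premium); // needs CALL_PHONE
        ctx.startActivity(call);
    }
}
```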
Finding 10 - Applications do not appear to be misusing voice services. We found 468 URI objects with the "tel:" prefix in 358 applications. We manually inspected a sample of applications to better understand phone number use. We found: (1) applications frequently include call functionality for customer service; (2) the "CALL" and "DIAL" intent actions were used equally for the same purpose (CALL initiates the call immediately and requires the CALL_PHONE permission, whereas DIAL presents the dialer for user confirmation and requires no permission); and (3) not all hard-coded telephone numbers are used to make phone calls, e.g., the AdMob library had an apparently unused hard-coded phone number.
5.2.2 Background Audio/Video
Microphone and camera eavesdropping on smartphones is a real concern [41]. We analyzed application eavesdropping behaviors, specifically: (1) recording video without calling setPreviewDisplay() (this API is always required for still image capture); (2) AudioRecord.read() in code not reachable from an Android activity component; and (3) MediaRecorder.start() in code not reachable from an activity component.
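The constructs behind these queries can be sketched as follows (our example, not from the study; the media APIs are real, the surrounding method structure is illustrative):

```java
// Sketch of the recording constructs matched by queries (1)-(3).
import android.media.AudioRecord;
import android.media.MediaRecorder;
import android.view.SurfaceHolder;

class RecordingSketch {
    // Query (1): legitimate video capture calls setPreviewDisplay();
    // recording without it suggests the user cannot see the camera is on.
    void startVideo(MediaRecorder mr, SurfaceHolder holder) {
        mr.setPreviewDisplay(holder.getSurface());
        mr.start(); // query (3) flags start() unreachable from an activity
    }

    // Query (2): raw audio capture; flagged when no control-flow path
    // reaches this code from an activity component.
    int captureAudio(AudioRecord recorder, short[] buf) {
        return recorder.read(buf, 0, buf.length);
    }
}
```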
Finding 11 - Applications do not appear to be misusing video recording. We found no applications that record video without calling setPreviewDisplay(). The query reasonably did not consider the value passed to the preview display, and therefore may create false negatives. For example, the "preview display" might be one pixel in size. The MediaRecorder.start() query detects audio recording, but it also detects video recording. This query found two applications using video in code not reachable from an activity; however, the classes extended SurfaceView, which is used by setPreviewDisplay().
Finding 12 - Applications do not appear to be misusing audio recording. We found eight uses in seven applications of AudioRecord.read() without a control flow path to an activity component. Of these applications, three provide VoIP functionality, two are games that repeat what the user says, and one provides voice search. In these applications, audio recording is expected; the lack of reachability was likely due to code recovery failures. The remaining application did not have the required RECORD_AUDIO permission, and the code was most likely part of a developer toolkit. The MediaRecorder.start() query identified an additional five applications recording audio without reachability to an activity. Three of these applications have legitimate reasons to record audio: voice search, game interaction, and VoIP. Finally, two games included audio recording in a developer toolkit, but no record permission, which explains the lack of reachability. Section 5.3.2 discusses developer toolkits.
5.2.3 Socket API Use
Java sockets represent an open interface to external services, and thus are a potential source of malicious behavior. For example, smartphone-based botnets have been found on "jailbroken" iPhones [8]. We observe that most Internet-based smartphone applications are HTTP clients. Android includes useful classes (e.g., HttpURLConnection and HttpClient) for communicating with Web servers. Therefore, we queried for applications that make network connections using the Socket class directly.
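The distinction can be sketched in plain Java (our example, not from the study): HTTP client classes hide the underlying socket, whereas the flagged applications construct Socket objects directly.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.Socket;
import java.net.URL;

class NetSketch {
    // Typical HTTP client use: the protocol details (and the underlying
    // socket) are wrapped by HttpURLConnection. Note openConnection()
    // does not yet touch the network.
    static HttpURLConnection httpClient(String addr) {
        try {
            return (HttpURLConnection) new URL(addr).openConnection();
        } catch (IOException e) {
            return null; // malformed URL
        }
    }

    // The pattern the study queried for: direct use of the Socket class
    // (left unconnected here).
    static Socket rawSocket() {
        return new Socket();
    }
}
```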
Finding 13 - A small number of applications include code that uses the Socket class directly. We found 177 Socket connections in 75 applications (6.8%). Many applications are flagged for inclusion of well-known network libraries such as org/apache/thrift, org/apache/commons, and org/eclipse/jetty, which use sockets directly. Socket factories were also detected. Identified factory names such as TrustAllSSLSocketFactory, AllTrustSSLSocketFactory, and NonValidatingSSLSocketFactory are interesting as potential vulnerabilities, but we found no evidence of malicious use. Several applications also included their own HTTP wrapper methods that duplicate functionality in the Android libraries, but these did not appear malicious. Among the applications including custom network connection wrappers is a group of applications in the "Finance" category implementing cryptographic network protocols (e.g., in the com/lumensoft/ks library). We note that these applications use Asian character sets for their market descriptions, and we could not determine their exact purpose.
Finding 14 - We found no evidence of malicious behavior by applications using Socket directly. We manually inspected all 75 applications to determine if Socket use seemed appropriate based on the application description. Our survey yielded a diverse array of Socket uses, including file transfer protocols, chat protocols, audio and video streaming, and network connection tethering, among other uses excluded for brevity. A handful of applications have socket connections to hard-coded IP addresses and non-standard ports. For example, one application (com.eingrad.vintagecomicdroid) downloads comics from 208.94.242.218 on port 2009. Additionally, two of the aforementioned financial applications (com.miraeasset.mstock and kvp.jjy.MispAndroid320) include the kr/co/shiftworks library, which connects to 221.143.48.118 on port 9001. Furthermore, one application (com.tf1.lci) connects to 209.85.227.147 on port 80 in a class named AdService and subsequently calls getLocalAddress() to retrieve the phone's IP address. Overall, we found no evidence of malicious behavior, but several applications warrant deeper investigation.
5.2.4 Installed Applications
The list of installed applications provides valuable marketing data. Android has two relevant API types: (1) a set of get APIs returning the list of installed applications or package names; and (2) a set of query APIs that mirror Android's runtime intent resolution, but can be made generic. We found 54 uses of the get APIs in 45 applications, and 1,015 uses of the query APIs in 361 applications. Sampling these applications, we observe:
Finding 15 - Applications do not appear to be harvesting information about which applications are installed on the phone. In all but two cases, the sampled applications using the get APIs search the results for a specific application. One application (com.davidgoemans.simpleClockWidget) defines a method that returns the list of all installed applications, but the results were only displayed to the user. The second application (raker.duobao.store) defines a similar method, but it only appears to be called by unused debugging code. Our survey of the query APIs identified three calls within the AdMob library duplicated in many applications. These uses queried specific functionality and thus are not likely to harvest application information. The one non-AdMob application we inspected queried for specific functionality, e.g., speech recognition, and thus did not appear to attempt harvesting.
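The two API types discussed above can be sketched as follows (our example; the PackageManager calls are real, the wildcard-style query intent is illustrative):

```java
// Sketch of the "get" vs. "query" API types for enumerating applications.
import android.content.Intent;
import android.content.pm.PackageManager;

class InstalledAppsSketch {
    void example(PackageManager pm) {
        // (1) "get" APIs: enumerate everything that is installed
        pm.getInstalledApplications(0);
        pm.getInstalledPackages(0);

        // (2) "query" APIs: mirror intent resolution; a broad intent
        // (here, all launchable activities) makes the query generic
        Intent i = new Intent(Intent.ACTION_MAIN);
        i.addCategory(Intent.CATEGORY_LAUNCHER);
        pm.queryIntentActivities(i, 0);
    }
}
```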
5.3 Included Libraries
Libraries included by applications are often easy to identify due to namespace conventions: i.e., the source code for com.foo.appname typically exists in com/foo/appname. During our manual inspection, we documented advertisement and analytics library paths. We also found applications sharing what we term "developer toolkits," i.e., common sets of developer utilities.
5.3.1 Advertisement and Analytics Libraries
We identified 22 library paths containing ad or analytics functionality. Sampled applications frequently contained more than one of these libraries. Using the paths listed in Table 5, we found: 1 app has 8 libraries; 10 apps have 7 libraries; 8 apps have 6 libraries; 15 apps have 5 libraries; 37 apps have 4 libraries; 32 apps have 3 libraries; 91 apps have 2 libraries; and 367 apps have 1 library.

Table 5: Identified Ad and Analytics Library Paths

Library Path                       # Apps  Format  Obtains*
com/admob/android/ads                 320  Obf.    L
com/google/ads                        206  Plain   -
com/flurry/android                     98  Obf.    -
com/qwapi/adclient/android             74  Plain   L, P, E
com/google/android/apps/analytics      67  Plain   -
com/adwhirl                            60  Plain   L
com/mobclix/android/sdk                58  Plain   L, E‡
com/millennialmedia/android            52  Plain   -
com/zestadz/android                    10  Plain   -
com/admarvel/android/ads                8  Plain   -
com/estsoft/adlocal                     8  Plain   L
com/adfonic/android                     5  Obf.    -
com/vdroid/ads                          5  Obf.    L, E
com/greystripe/android/sdk              4  Obf.    E
com/medialets                           4  Obf.    L
com/wooboo/adlib_android                4  Obf.    L, P, I†
com/adserver/adview                     3  Obf.    L
com/tapjoy                              3  Plain   -
com/inmobi/androidsdk                   2  Plain   E‡
com/apegroup/ad                         1  Plain   -
com/casee/adsdk                         1  Plain   S
com/webtrends/mobile                    1  Plain   L, E, S, I
Total Unique Apps                     561  -       -

* L = Location; P = Phone number; E = IMEI; S = IMSI; I = ICC-ID
† In 1 app, the library included "L", while the other 3 included "P, I".
‡ Direct API use not decompiled, but wrapper .getDeviceId() called.
Table 5 shows advertisement and analytics library use. In total, at least 561 applications (51%) include these libraries; however, additional libraries may exist, and some applications include custom ad and analytics functionality. The AdMob library is used most pervasively, existing in 320 applications (29.1%). Google Ads is used by 206 applications (18.7%). We observe from Table 5 that only a handful of libraries are used pervasively.

Several libraries access phone identifier and location APIs. Given the library purpose, it is easy to speculate that this data flows to network APIs. However, many of these flows were not detected by program analysis. This is likely a result of code recovery failures and flows through Android IPC. For example, AdMob has known location-to-network data flows [14], and we identified a code recovery failure for the class implementing that functionality. Several libraries are also obfuscated, as mentioned in Section 6. Interestingly, 6 of the 13 libraries accessing sensitive information are obfuscated. The analysis informs the following additional findings.
Finding 16 - Ad and analytics library use of phone identifiers and location is sometimes configurable. The com/webtrends/mobile analytics library (used by com.statefarm.pocketagent) defines the WebtrendsIdMethod class, which specifies four identifier types. Only one type, "system id extended," uses phone identifiers (IMEI, IMSI, and ICC-ID). It is unclear which identifier type was used by the application. Other libraries provide similar configuration. For example, the AdMob SDK documentation [6] indicates that location information is only included if a package manifest configuration enables it.
Finding 17 - Analytics library reporting frequency is often configurable. During manual inspection, we encountered one application (com.handmark.mpp.news.reuters) in which the phone number is passed to FlurryAgent.onEvent() as generic data. This method is called throughout the application, specifying event labels such as "GetMoreStories," "StoryClickedFromList," and "ImageZoom." Here, we observe that the main application code specifies not only the phone number to be reported, but also the report frequency.
Finding 18 - Ad and analytics libraries probe for permissions. The com/webtrends/mobile library accesses the IMEI, IMSI, ICC-ID, and location. Its WebtrendsAndroidValueFetcher class uses try/catch blocks that catch the SecurityException thrown when an application does not have the proper permission. Similar functionality exists in the com/casee/adsdk library (used by com.fish.luny): in AdFetcher.getDeviceId(), Android's checkCallingOrSelfPermission() method is evaluated before accessing the IMSI.
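The two probing patterns described above can be sketched as follows (our example; the Android APIs are real, the method and class names are hypothetical):

```java
// Sketch of the two permission-probing patterns used by libraries.
import android.content.Context;
import android.content.pm.PackageManager;
import android.telephony.TelephonyManager;

class PermissionProbeSketch {
    // Pattern 1: attempt the access and swallow the SecurityException,
    // so the same library code runs with or without the permission.
    static String imsiOrNull(TelephonyManager tm) {
        try {
            return tm.getSubscriberId();  // requires READ_PHONE_STATE
        } catch (SecurityException e) {
            return null;                  // probe failed: permission not held
        }
    }

    // Pattern 2: explicitly check the permission before accessing.
    static String imsiIfPermitted(Context ctx, TelephonyManager tm) {
        if (ctx.checkCallingOrSelfPermission("android.permission.READ_PHONE_STATE")
                == PackageManager.PERMISSION_GRANTED) {
            return tm.getSubscriberId();
        }
        return null;
    }
}
```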
5.3.2 Developer Toolkits
Several inspected applications use developer toolkits containing common sets of utilities identifiable by class name or library path. We observe the following.
Finding 19 - Some developer toolkits replicate dangerous functionality. We found three wallpaper applications by developer "callmejack" that include utilities in the library path com/jackeeywu/apps/eWallpaper (com.eoeandroid.eWallpapers.cartoon, com.jackeey.wallpapers.all1.orange, and com.jackeey.eWallpapers.gundam). This library has data flow sinks for the phone number, IMEI, IMSI, and ICC-ID. In July 2010, Lookout, Inc. reported a wallpaper application by developer "jackeey,wallpaper" as sending these identifiers to imnet.us [29]. The report also indicated that the developer changed his name to "callmejack". While the original "jackeey,wallpaper" application was removed from the Android Market, the applications by "callmejack" remained as of September 2010.
Finding 20 - Some developer toolkits probe for permissions. In one application (com.july.cbssports.activity), we found code in the com/julysystems library that evaluates Android's checkPermission() method for the READ_PHONE_STATE and ACCESS_FINE_LOCATION permissions before accessing the IMEI and phone number, and the last known location, respectively. A second application (v00032.com.wordplayer) defines a CustomExceptionHandler class to send an exception event to an HTTP URL. The class attempts to retrieve the phone number within a try/catch block, catching a generic Exception. However, the application does not have the READ_PHONE_STATE permission, indicating the class is likely reused across multiple applications.
Finding 21 - Well-known brands sometimes commission developers that include dangerous functionality. The com/julysystems developer toolkit identified as probing for permissions exists in two applications with reputable application providers. "CBS Sports Pro Football" (com.july.cbssports.activity) is provided by "CBS Interactive, Inc.", and "Univision Fútbol" (com.july.univision) is provided by "Univision Interactive Media, Inc.". Both have location and phone state permissions, and hence can potentially misuse information. Similarly, "USA TODAY" (com.usatoday.android.news) provided by "USA TODAY" and "FOX News" (com.foxnews.android) provided by "FOX News Network, LLC" contain the com/mercuryintermedia toolkit. Both applications contain an Android activity component named MainActivity. In the initialization phase, the IMEI is retrieved and passed to ProductConfiguration.initialize() (part of the com/mercuryintermedia toolkit). Both applications have IMEI-to-network data flows through this method.
5.4 Android-specific Vulnerabilities
This section explores Android-specific vulnerabilities. The technical report [15] provides specification details.
5.4.1 Leaking Information to Logs
Android provides centralized logging via the Log API, which can be displayed with the "logcat" command. While logcat is a debugging tool, applications with the READ_LOGS permission can read these log messages. The Android documentation for this permission indicates that "[the logs] can contain slightly private information about what is happening on the device, but should never contain the user's private information." We looked for data flows from phone identifier and location APIs to the Android logging interface and found the following.
Finding 22 - Private information is written to Android's general logging interface. We found 253 data flows in 96 applications for location information, and 123 flows in 90 applications for phone identifiers. Frequently, URLs containing this private information are logged just before a network connection is made. Thus, the READ_LOGS permission allows access to private information.
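The flagged pattern looks like the following sketch (ours; the URL, tag, and helper are hypothetical, Log.d() and the Location getters are real):

```java
// Sketch of the flagged flow: a URL carrying private data is logged
// just before the network request is made.
import android.location.Location;
import android.util.Log;

class LogLeakSketch {
    static String buildUrl(Location loc) {
        String url = "http://ads.example.com/fetch?lat=" + loc.getLatitude()
                + "&lon=" + loc.getLongitude();
        // Any application holding READ_LOGS can now recover the coordinates.
        Log.d("AdFetcher", "requesting " + url);
        return url;
    }
}
```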
5.4.2 Leaking Information via IPC
As shown in Figure 5, any application can receive intent broadcasts that do not specify the target component or protect the broadcast with a permission (the permission variant is not shown). This is unsafe if the intent contains sensitive information. We found 271 such unsafe intent broadcasts with "extras" data in 92 applications (8.4%). Sampling these applications, we found several such intents used to install shortcuts to the home screen.

Figure 5: Eavesdropping on unprotected intents. A fully specified intent message (action "pkgname.intent.ACTION", component "pkgname.FooReceiver") is delivered only to pkgname.FooReceiver in application pkgname. A partially specified intent message (action only) is also delivered to malicious.BarReceiver in application malicious, whose intent filter matches "pkgname.intent.ACTION".
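The unsafe pattern can be sketched as follows (our example; the action string, extras, and permission name are hypothetical, the sendBroadcast() overloads are real):

```java
// Sketch of an unsafe broadcast: sensitive "extras" sent without a
// receiver permission, so any matching receiver can eavesdrop.
import android.content.Context;
import android.content.Intent;

class UnsafeBroadcastSketch {
    static void broadcastLocation(Context ctx, double lat, double lon) {
        Intent i = new Intent("pkgname.intent.LOCATION_UPDATE");
        i.putExtra("lat", lat);
        i.putExtra("lon", lon);
        // No receiverPermission argument: any installed application with a
        // matching intent filter receives these extras.
        ctx.sendBroadcast(i);
        // Safer: ctx.sendBroadcast(i, "pkgname.permission.LOCATION");
    }
}
```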
Finding 23 - Applications broadcast private information in IPC accessible to all applications. We found many cases of applications sending unsafe intents to action strings containing the application's namespace (e.g., "pkgname.intent.ACTION" for application pkgname). The contents of the bundled information varied. In some instances, the data was not sensitive, e.g., widget and task identifiers. However, we also found sensitive information. For example, one application (com.ulocate) broadcasts the user's location to the "com.ulocate.service.LOCATION" intent action string without protection. Another application (com.himsn) broadcasts the instant messaging client's status to the "cm.mz.stS" action string. These vulnerabilities allow malicious applications to eavesdrop on sensitive information in IPC, and in some cases, gain access to information that requires a permission (e.g., location).
5.4.3 Unprotected Broadcast Receivers
Applications use broadcast receiver components to receive intent messages. Broadcast receivers define "intent filters" to subscribe to specific event types and are public by default. If a receiver is not protected by a permission, a malicious application can forge messages.
Finding 24 - Few applications are vulnerable to forging attacks on dynamic broadcast receivers. We found 406 unprotected broadcast receivers in 154 applications (14%). We found a large number of receivers subscribed to system-defined intent types. These receivers are indirectly protected by Android's "protected broadcasts," which were introduced to eliminate forging. We found one application with an unprotected broadcast receiver for a custom intent type; however, it appears to have limited impact. Additional sampling may uncover more cases.
5.4.4 Intent Injection Attacks
Intent messages are also used to start activity and service
components. An intent injection attack occurs if the in-
tent address is derived from untrusted input.
We found 10 data flows from the network to an intent address in 1 application. We could not confirm the data flow and classify it as a false positive. The data flow sink exists in a class named ProgressBroadcastingFileInputStream. No decompiled code references this class, and all data flow sources are calls to URLConnection.getInputStream(), which is used to create InputStreamReader objects. We believe the false positive results from the program analysis modeling of classes extending InputStream.
We found 80 data flows from IPC to an intent address in 37 applications. We classified the data flows by the sink: the Intent constructor is the sink for 13 applications; setAction() is the sink for 16 applications; and setComponent() is the sink for 8 applications. These sets are disjoint. Of the 37 applications, we found that 17 applications set the target component class explicitly (all except 3 use the setAction() data flow sink), e.g., to relay the action string from a broadcast receiver to a service. We also found four false positives due to our assumption that all Intent objects come from IPC (a few exceptions exist). For the remaining 16 cases, we observe:
Finding 25 - Some applications define intent addresses based on IPC input. Three applications use IPC input strings to specify the package and component names for the setComponent() data flow sink. Similarly, one application uses the IPC "extras" input to specify an action to an Intent constructor. Two additional applications start an activity based on the action string returned as a result from a previously started activity. However, to exploit this vulnerability, the applications must first start a malicious activity. In the remaining cases, the action string used to start a component is copied directly into a new intent object. A malicious application can exploit this vulnerability by specifying the vulnerable component's name directly and controlling the action string.
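The vulnerable relay pattern can be sketched as follows (our example; the Intent APIs are real, the relay method is hypothetical):

```java
// Sketch of the intent-injection pattern: an action string received
// over IPC is copied directly into a new intent.
import android.content.Intent;

class RelaySketch {
    static Intent relay(Intent received) {
        // Untrusted input (the received action) becomes the address of a
        // new intent; a malicious sender thus controls where it is delivered.
        return new Intent(received.getAction());
    }
}
```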
5.4.5 Delegating Control
Applications can delegate actions to other applications using a "pending intent." An application first creates an intent message as if it were performing the action. It then creates a reference to the intent based on the target component type (restricting how the intent can be used). The pending intent recipient cannot change existing values, but it can fill in missing fields. Therefore, if the intent address is unspecified, the remote application can redirect the action, which is performed with the original application's permissions.
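An unsafe delegation can be sketched as follows (our example; the action string and receiver class are hypothetical, PendingIntent.getBroadcast() is real):

```java
// Sketch of unsafe delegation: the base intent leaves its address
// unspecified, so the recipient may fill it in and redirect an action
// that runs with this application's permissions.
import android.app.PendingIntent;
import android.content.Context;
import android.content.Intent;

class PendingIntentSketch {
    static PendingIntent unsafeDelegate(Context ctx) {
        Intent base = new Intent("pkgname.intent.DO_WORK"); // no component set
        return PendingIntent.getBroadcast(ctx, 0, base, 0);
        // Safer: set an explicit component on 'base' (e.g., via setClass())
        // before wrapping it in the PendingIntent.
    }
}
```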
Finding 26 - Few applications unsafely delegate actions. We found 300 unsafe pending intent objects in 116 applications (10.5%). Sampling these applications, we found an overwhelming number of pending intents used for either: (1) Android's UI notification service; (2) Android's alarm service; or (3) communicating between a UI widget and the main application. None of these cases allow manipulation by a malicious application. We found two applications that send unsafe pending intents via IPC. However, exploiting these vulnerabilities appears to provide negligible adversarial advantage. We also note that a more sophisticated analysis framework could be used to eliminate the aforementioned false positives.
5.4.6 Null Checks on IPC Input
Android applications frequently process information from intent messages received from other applications. Null dereferences cause an application to crash, and can thus be used as a denial of service.
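The bug class can be sketched as follows (our example; the extra key is hypothetical, getStringExtra() is real):

```java
// Sketch of the common null-dereference bug on IPC input.
import android.content.Intent;

class NullCheckSketch {
    static int unsafeLength(Intent received) {
        // getStringExtra() returns null when the sender omits the extra;
        // a malicious sender can crash this component at will.
        return received.getStringExtra("query").length();
    }

    static int safeLength(Intent received) {
        String q = received.getStringExtra("query");
        return (q == null) ? 0 : q.length();
    }
}
```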
Finding 27 - Applications frequently do not perform null checks on IPC input. We found 3,925 potential null dereferences on IPC input in 591 applications (53.7%). Most occur in classes for activity components (2,484 dereferences in 481 applications). Null dereferences in activity components have minimal impact, as the application crash is obvious to the user. We found 746 potential null dereferences in 230 applications within classes defining broadcast receiver components. Applications commonly use broadcast receivers to start background services; therefore, it is unclear what effect a null dereference in a broadcast receiver will have. Finally, we found 72 potential null dereferences in 36 applications within classes defining service components. Application crashes corresponding to these null dereferences have a higher probability of going unnoticed. The remaining potential null dereferences are not easily associated with a component type.
5.4.7 SDcard Use
Any application that has access to read or write data on the SDcard can read or write any other application's data on the SDcard. We found 657 references to the SDcard in 251 applications (22.8%). Sampling these applications, we found a few unexpected uses. For example, the com/tapjoy ad library (used by com.jnj.mocospace.android) determines the free space available on the SDcard. Another application (com.rent) obtains a URL from a file named connRentInfo.dat at the root of the SDcard.
5.4.8 JNI Use
Applications can include functionality in native libraries using the Java Native Interface (JNI). As these methods are not written in Java, they have inherent dangers. We found 2,762 calls to native methods in 69 applications (6.3%). Investigating the application package files, we found that 71 applications contain .so files. This indicates that two applications with an .so file either do not call any native methods, or the code calling the native methods was not decompiled. Across these 71 applications, we found 95 .so files, 82 of which have unique names.
6 Study Limitations
Our study was limited in three ways: a) the studied applications were selected with a bias towards popularity; b) the program analysis tool cannot compute data and control flows for IPC between components; and c) source code recovery failures interrupt data and control flows. Missing data and control flows may lead to false negatives. In addition to the recovery failures, the program analysis tool could not parse 8,042 classes, reducing coverage to 91.34% of the classes.
Additionally, a portion of the recovered source code was obfuscated before distribution. Code obfuscation significantly impedes manual inspection. It likely exists to protect intellectual property; Google suggests obfuscation using ProGuard (proguard.sf.net) for applications using its licensing service [23]. ProGuard protects against readability but does not obfuscate control flow; therefore, it has limited impact on program analysis.
Many forms of obfuscated code are easily recognizable: e.g., class, method, and field names are converted to single letters, producing single-letter Java filenames (e.g., a.java). For a rough estimate of the use of obfuscation, we searched for applications containing a.java. In total, 396 of the 1,100 applications contain this file. As discussed in Section 5.3, several advertisement and analytics libraries are obfuscated. To obtain a closer estimate of the number of applications whose main code is obfuscated, we searched for a.java within a file path equivalent to the package name (e.g., com/foo/appname for com.foo.appname). Only 20 applications (1.8%) have this obfuscation property, which is expected for free applications (as opposed to paid applications). However, we stress that the a.java heuristic is not intended to be a firm characterization of the percentage of obfuscated code, but rather a means of acquiring insight.
7 What This All Means
Identifying a singular take-away from a broad study such as this is non-obvious. We come away from the study with two central thoughts: one having to do with the study apparatus, and the other regarding the applications.

ded and the program analysis specifications are enabling technologies that open a new door for application certification. We found the approach rather effective despite existing limitations. In addition to further studies of this kind, we see the potential to integrate these tools into an application certification process. We leave such discussions for future work, noting that such integration is challenging for both logistical and technical reasons [30].
On a technical level, we found the security characteristics of the top 1,100 free popular applications to be consistent with smaller studies (e.g., Enck et al. [14]). Our findings indicate an overwhelming concern for misuse of privacy-sensitive information such as phone identifiers and location information. One might speculate that this occurs due to the difficulty of assigning malicious intent.

Arguably more important than identifying the existence of information misuse, our manual source code inspection sheds more light on how information is misused. We found phone identifiers, e.g., phone number, IMEI, IMSI, and ICC-ID, used for everything from "cookie-esque" tracking to account numbers. Our findings also support the existence of databases external to cellular providers that link identifiers such as the IMEI to personally identifiable information.
Our analysis also identified significant penetration of ad and analytics libraries, occurring in 51% of the studied applications. While this might not be surprising for free applications, the number of ad and analytics libraries included per application was unexpected. One application included as many as eight different libraries. It is unclear why an application needs more than one advertisement and one analytics library.
From a vulnerability perspective, we found that many developers fail to take necessary security precautions. For example, sensitive information is frequently written to Android's centralized logs, as well as occasionally broadcast to unprotected IPC. We also identified the potential for IPC injection attacks; however, no cases were readily exploitable.
Finally, our study only characterized one edge of the application space. While we found no evidence of telephony misuse, background recording of audio or video, or abusive network connections, one might argue that such malicious functionality is less likely to occur in popular applications. We focused our study on popular applications to characterize those most frequently used. Future studies should take samples that span application popularity. However, even these samples may miss the existence of truly malicious applications. Future studies should also consider several additional attacks, including installing new applications [43], JNI execution [34], address book exfiltration, destruction of SDcard contents, and phishing [20].
8 Related Work
Many tools and techniques have been designed to identify security concerns in software. Software written in C is particularly susceptible to programming errors that result in vulnerabilities. Ashcraft and Engler [7] use compiler extensions to identify errors in range checks. MOPS [11] uses model checking to scale to large amounts of source code [42]. Java applications are inherently safer than C applications and avoid simple vulnerabilities such as buffer overflows. Ware and Fox [46] compare eight different open source and commercially available Java source code analysis tools, finding that no one tool detects all vulnerabilities. Hovemeyer and Pugh [22] study six popular Java applications and libraries using FindBugs extended with additional checks. While the analysis included non-security bugs, the results motivate a strong need for automated analysis by all developers. Livshits and Lam [28] focus on Java-based Web applications. In the Web server environment, inputs are easily controlled by an adversary and, left unchecked, can lead to SQL injection, cross-site scripting, HTTP response splitting, path traversal, and command injection. Felmetsger et al. [19] also study Java-based web applications; they advance vulnerability analysis by providing automatic detection of application-specific logic errors.
Spyware and privacy-breaching software have also been studied. Kirda et al. [26] consider behavioral properties of BHOs and toolbars. Egele et al. [13] target information leaks by browser-based spyware explicitly using dynamic taint analysis. Panorama [47] considers privacy-breaching malware in general using whole-system, fine-grained taint tracking. Privacy Oracle [24] uses differential black box fuzz testing to find privacy leaks in applications.
On smartphones, TaintDroid [14] uses system-wide dynamic taint tracking to identify privacy leaks in Android applications. By using static analysis, we were able to study a far greater number of applications (1,100 vs. 30). However, TaintDroid's analysis confirms the exfiltration of information, while our static analysis only confirms the potential for it. Kirin [16] also uses static analysis, but focuses on permissions and other application configuration data, whereas our study analyzes source code. Finally, PiOS [12] performs static analysis on iOS applications for the iPhone. The PiOS study found that the majority of analyzed applications leak the device ID and that over half include advertisement and analytics libraries.
9 Conclusions
Smartphones are rapidly becoming a dominant computing platform. Low barriers to entry for application developers increase the security risk for end users. In this paper, we described the ded decompiler for Android applications and used decompiled source code to perform a breadth study of both dangerous functionality and vulnerabilities. While our findings of exposure of phone identifiers and location are consistent with previous studies, our analysis framework allows us to observe not only the existence of dangerous functionality, but also how it occurs within the context of the application.
Moving forward, we foresee ded and our analysis specifications as enabling technologies that will open new doors for application certification. However, the integration of these technologies into an application certification process requires overcoming logistical and technical challenges. Our future work will consider these challenges, and broaden our analysis to new areas, including application installation, malicious JNI, and phishing.
Acknowledgments
We would like to thank Fortify Software Inc. for providing us with a complimentary copy of Fortify SCA to perform the study. We also thank Suneel Sundar and Joy Marie Forsythe at Fortify for helping us debug custom rules. Finally, we thank Kevin Butler, Stephen McLaughlin, Patrick Traynor, and the SIIS lab for their editorial comments during the writing of this paper. This material is based upon work supported by the National Science Foundation Grants No. CNS-0905447, CNS-0721579, and CNS-0643907. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
References
[1] Fernflower - Java Decompiler. http://www.reversed-java.com/fernflower/.
[2] Fortify 360 Source Code Analyzer (SCA). https://www.fortify.com/products/fortify360/source-code-analyzer.html.
[3] Jad. http://www.kpdus.com/jad.html.
[4] JD Java Decompiler. http://java.decompiler.free.fr/.
[5] Mocha, the Java Decompiler. http://www.brouhaha.com/~eric/software/mocha/.
[6] ADMOB. AdMob Android SDK: Installation Instruc-
tions. http://www.admob.com/docs/AdMob_Android_SDK_
Instructions.pdf. Accessed November 2010.
[7] ASHCRAFT, K., AND ENGLER, D. Using Programmer-Written
Compiler Extensions to Catch Security Holes. In Proceedings of
the IEEE Symposium on Security and Privacy (2002).
[8] BBC NEWS. New iPhone worm can act like botnet
say experts. http://news.bbc.co.uk/2/hi/technology/
8373739.stm, November 23, 2009.
[9] BORNSTEIN, D. Google I/O 2008 - Dalvik Virtual Machine Internals. http://www.youtube.com/watch?v=ptjedOZEXPM.
[10] BURNS, J. Developing Secure Mobile Applications for Android.
iSEC Partners, October 2008. http://www.isecpartners.
com/files/iSEC_Securing_Android_Apps.pdf.
[11] CHEN, H., DEAN, D., AND WAGNER, D. Model Checking One
Million Lines of C Code. In Proceedings of the 11th Annual Net-
work and Distributed System Security Symposium (Feb. 2004).
[12] EGELE, M., KRUEGEL, C., KIRDA, E., AND VIGNA, G. PiOS:
Detecting Privacy Leaks in iOS Applications. In Proceedings of
the Network and Distributed System Security Symposium (2011).
[13] EGELE, M., KRUEGEL, C., KIRDA, E., YIN, H., AND SONG,
D. Dynamic Spyware Analysis. In Proceedings of the USENIX
Annual Technical Conference (June 2007), pp. 233–246.
[14] ENCK, W., GILBERT, P., CHUN, B.-G., COX, L. P., JUNG,
J., MCDANIEL, P., AND SHETH, A. N. TaintDroid: An
Information-Flow Tracking System for Realtime Privacy Moni-
toring on Smartphones. In Proceedings of the USENIX Sympo-
sium on Operating Systems Design and Implementation (2010).
[15] ENCK, W., OCTEAU, D., MCDANIEL, P., AND CHAUDHURI,
S. A Study of Android Application Security. Tech. Rep. NAS-
TR-0144-2011, Network and Security Research Center, Depart-
ment of Computer Science and Engineering, Pennsylvania State
University, University Park, PA, USA, January 2011.
[16] ENCK, W., ONGTANG, M., AND MCDANIEL, P. On Lightweight Mobile Phone Application Certification. In Proceedings of the 16th ACM Conference on Computer and Communications Security (CCS) (Nov. 2009).
[17] ENCK, W., ONGTANG, M., AND MCDANIEL, P. Understand-
ing Android Security. IEEE Security & Privacy Magazine 7, 1
(January/February 2009), 50–57.
[18] F-SECURE CORPORATION. Virus Description: Viver.A.
http://www.f-secure.com/v-descs/trojan_symbos_
viver_a.shtml.
[19] FELMETSGER, V., CAVEDON, L., KRUEGEL, C., AND VIGNA,
G. Toward Automated Detection of Logic Vulnerabilities in Web
Applications. In Proceedings of the USENIX Security Symposium
(2010).
[20] FIRST TECH CREDIT UNION. Security Fraud: Rogue Android
Smartphone app created. http://www.firsttechcu.com/
home/security/fraud/security_fraud.html, Dec. 2009.
[21] GOODIN, D. Backdoor in top iphone games stole
user data, suit claims. The Register, November 2009.
http://www.theregister.co.uk/2009/11/06/iphone_
games_storm8_lawsuit/.
[22] HOVEMEYER, D., AND PUGH, W. Finding Bugs is Easy. In Pro-
ceedings of the ACM conference on Object-Oriented Program-
ming Systems, Languages, and Applications (2004).
[23] JOHNS, T. Securing Android LVL Applications.
http://android-developers.blogspot.com/2010/
09/securing-android-lvl-applications.html, 2010.
[24] JUNG, J., SHETH, A., GREENSTEIN, B., WETHERALL, D.,
MAGANIS, G., AND KOHNO, T. Privacy Oracle: A System for
Finding Application Leaks with Black Box Differential Testing.
In Proceedings of the ACM conference on Computer and Com-
munications Security (2008).
[25] KASPERSKY LAB. First SMS Trojan detected for smartphones running Android. http://www.kaspersky.com/news?id=207576158, August 2010.
[26] KIRDA, E., KRUEGEL, C., BANKS, G., VIGNA, G., AND KEM-
MERER, R. A. Behavior-based Spyware Detection. In Proceed-
ings of the 15th USENIX Security Symposium (Aug. 2006).
[27] KRALEVICH, N. Best Practices for Handling Android User
Data. http://android-developers.blogspot.com/2010/
08/best-practices-for-handling-android.html, 2010.
[28] LIVSHITS, V. B., AND LAM, M. S. Finding Security Vulnera-
bilities in Java Applications with Static Analysis. In Proceedings
of the 14th USENIX Security Symposium (2005).
[29] LOOKOUT. Update and Clarification of Analysis of Mobile Applications at Blackhat 2010. http://blog.mylookout.com/2010/07/mobile-application-analysis-blackhat/, July 2010.
[30] MCDANIEL, P., AND ENCK, W. Not So Great Expectations:
Why Application Markets Haven’t Failed Security. IEEE Secu-
rity & Privacy Magazine 8, 5 (September/October 2010), 76–78.
[31] MIECZNIKOWSKI, J., AND HENDREN, L. Decompiling Java Using Staged Encapsulation. In Proceedings of the Eighth Working Conference on Reverse Engineering (2001).
[32] MIECZNIKOWSKI, J., AND HENDREN, L. J. Decompiling Java Bytecode: Problems, Traps and Pitfalls. In Proceedings of the 11th International Conference on Compiler Construction (2002).
[33] MILNER, R. A Theory of Type Polymorphism in Programming. Journal of Computer and System Sciences 17 (August 1978).
[34] OBERHEIDE, J. Android Hax. In Proceedings of SummerCon
(June 2010).
[35] OCTEAU, D., ENCK, W., AND MCDANIEL, P. The ded Decom-
piler. Tech. Rep. NAS-TR-0140-2010, Network and Security Re-
search Center, Department of Computer Science and Engineer-
ing, Pennsylvania State University, University Park, PA, USA,
Sept. 2010.
[36] ONGTANG, M., BUTLER, K., AND MCDANIEL, P. Porscha:
Policy Oriented Secure Content Handling in Android. In Proc. of
the Annual Computer Security Applications Conference (2010).
[37] ONGTANG, M., MCLAUGHLIN, S., ENCK, W., AND MC-
DANIEL, P. Semantically Rich Application-Centric Security in
Android. In Proceedings of the Annual Computer Security Appli-
cations Conference (2009).
[38] PORRAS, P., SAIDI, H., AND YEGNESWARAN, V. An Analysis
of the Ikee.B (Duh) iPhone Botnet. Tech. rep., SRI International,
Dec. 2009. http://mtc.sri.com/iPhone/.
[39] PROEBSTING, T. A., AND WATTERSON, S. A. Krakatoa: Decompilation in Java (Does Bytecode Reveal Source?). In Proceedings of the USENIX Conference on Object-Oriented Technologies and Systems (1997).
[40] RAPHEL, J. Google: Android wallpaper apps were not security
threats. Computerworld (August 2010).
[41] SCHLEGEL, R., ZHANG, K., ZHOU, X., INTWALA, M., KAPADIA, A., AND WANG, X. Soundcomber: A Stealthy and Context-Aware Sound Trojan for Smartphones. In Proceedings of the Network and Distributed System Security Symposium (2011).
[42] SCHWARZ, B., CHEN, H., WAGNER, D., MORRISON, G.,
WEST, J., LIN, J., AND TU, W. Model Checking an Entire
Linux Distribution for Security Violations. In Proceedings of the
Annual Computer Security Applications Conference (2005).
[43] STORM, D. Zombies and Angry Birds attack: mobile phone mal-
ware. Computerworld (November 2010).
[44] TIURYN, J. Type Inference Problems: A Survey. In Proceedings of the Mathematical Foundations of Computer Science (1990).
[45] VALLÉE-RAI, R., GAGNON, E., HENDREN, L., LAM, P., POMINVILLE, P., AND SUNDARESAN, V. Optimizing Java Bytecode Using the Soot Framework: Is It Feasible? In International Conference on Compiler Construction, LNCS 1781 (2000), pp. 18–34.
[46] WARE, M. S., AND FOX, C. J. Securing Java Code: Heuristics
and an Evaluation of Static Analysis Tools. In Proceedings of the
Workshop on Static Analysis (SAW) (2008).
[47] YIN, H., SONG, D., EGELE, M., KRUEGEL, C., AND KIRDA,
E. Panorama: Capturing System-wide Information Flow for Mal-
ware Detection and Analysis. In Proceedings of the ACM confer-
ence on Computer and Communications Security (2007).
Notes
1. The undx and dex2jar tools attempt to decompile .dex files, but were non-functional at the time of this writing.
2. Note that it is sufficient to find any type-exposing instruction for a register assignment. Any code that could result in different types for the same register would be illegal. If this were to occur, the primitive type would be dependent on the path taken at run time, a clear violation of Java's type system.
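The type-inference observation above can be sketched as a single pass over per-register instructions: record the first type-exposing instruction seen for each register, and treat any later conflict as illegal bytecode. This is an illustrative sketch only, assuming a simplified `(opcode, register)` instruction format and a hypothetical opcode-to-type table; it is not ded's actual implementation.

```python
def infer_register_types(instructions):
    """instructions: list of (opcode, register) tuples.

    Returns {register: primitive type}, using the first
    type-exposing instruction seen for each register."""
    # Hypothetical mapping from type-exposing opcodes to primitive types.
    exposing = {"add-int": "int", "add-float": "float", "int-to-byte": "int"}
    types = {}
    for opcode, reg in instructions:
        t = exposing.get(opcode)
        if t is None:
            continue  # e.g., an ambiguous const; keep scanning
        if reg in types and types[reg] != t:
            # Two conflicting types for one register: illegal bytecode,
            # since the type would depend on the run-time path taken.
            raise ValueError(f"conflicting types for {reg}: illegal bytecode")
        types.setdefault(reg, t)
    return types

code = [("const", "v0"), ("add-int", "v0"), ("const", "v1"), ("add-float", "v1")]
print(infer_register_types(code))  # {'v0': 'int', 'v1': 'float'}
```

Because any conflict is illegal, the scan can stop at the first type-exposing instruction per register, which is what makes this check sufficient.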
3. Fortunately, these dangerous applications are now nonfunctional, as the imnet.us NS entry is NS1.SUSPENDED-FOR.SPAM-AND-ABUSE.COM.