Electronic Notes in Theoretical Computer Science10(1998)
URL:www.elsevier.nl/locate/entcs/volume10.html7pages
A Type System for Object Initialization
In the Java TM Bytecode Language
Stephen N.Freund John C.Mitchell
Department of Computer Science
Stanford University
Stanford,CA94305-9045
{freunds,mitchell}@cs.stanford.edu
1Introduction情歌 歌词
The Java programming language is a statically-typed general-purpose pro-gramming language with an
implementation architecture that is designed to facilitate transmission of compiled code across a network.In the standard implementation,a Java language program is compiled to Java bytecode and this bytecode is then interpreted by the Java Virtual Machine.The interme-diate bytecode language,which we refer to as JVML,is a typed,machine-independent form of assembly language with some low-level instructions that reflect specific high-level Java source language constructs.For example,classes are a basic notion in JVML.Since bytecode may be written by hand,or corrupted during network transmission,the Java Virtual Machine contains a bytecode verifier that performs a number of consistency checks before code is interpreted.This protects the receiver from certain security risks and various forms of attack.
In this paper,we develop a specification for a fragment of the bytecode language that includes object creation(allocation of memory)and initializa-tion.This work is based on a prior study of the bytecodes for local subroutine
c 1998Publishe
d by Elsevier Scienc
e B.V.
call and return[2].We prove soundness of the type system by a traditional method using operational semantics.It follows from the soundness theorem that any bytecode program that passes the static checks will initialize every object before it is used.
2Object Initialization
As in many other object-oriented languages,the Java implementation creates new objects in two steps.Thefirst step is to allocate space for the object.This usually requires some environment specific operation to obtain an appropriate region of memory.In the second step,user-defined code is executed to initialize the object.In Java,the user initialization code is provided by a constructor defined in the class of the object.Only after both of these steps are completed can a method be invoked on an object.
In the Java source language,allocation and initialization are combined into a single statement.This is illustrated in the following code fragment:
Point p=new Point(3);
p.Print();
Since every Java object is created by a statement like the one in thefirst line here,it does not seem difficult to prevent Java source language programs from invoking methods on objects that have not been initialized.
It is much more difficult to recognize initialization-before-use in bytecode. This can be seen by looking at thefive lines of bytecode that are produced by compiling the two lines of source code above:
1:new#1<Class Point>
2:dup
3:iconst_3
4:invokespecial#4<Method Point(int)>
花样的年华
于湉女朋友5:invokevirtual#5<Method void Print()>
The most striking difference is that memory allocation(line1)is separated from the constructor invocation(line4)by two lines of code.Thefirst inter-vening line,dup,duplicates the pointer to the uniniti
alized object.This line is needed due to the calling convention of the Java Virtual Machine.The second line,iconst
and stored and loaded in the local variables,the analysis also duplicates these line numbers.All references having the same line number are assumed to refer to the same object.When an object is initialized,all pointers that refer to objects created at the same line number are set to initialized.In other words, all references to objects of a certain type are partitioned into equivalence classes according to what is statically known about each reference.Since aliasing is irrelevant for objects that have been initialized,it is not necessary to track aliasing once a reference leads to an initialized object.
Since our approach is type based,the status information associated with each reference for the alias analysis is recorded as part of its type.
3JVML i
This section describes the JVML i language,a subset of JVML encompass-ing basic constructs and object initialization.The run-time environment for JVML i consists only of an operand stack and afinite set of local variables.
A JVML i program will be a map from Addr to instructions drawn from the following list:
instruction::=push0|inc|pop
|if L
|store x|load x
第一滴泪歌词|newσ|initσ|useσ
|halt
where x is a local variable name,σis an object type,and L is an address of another instruction in the program.We refer the reader to Stata and Abadi for a description of those instructions not defined below:
newσ:allocates a new,uninitialized object of typeσ.
initσ:initializes a previously uninitialized object of typeσ.This represents calling the constructor of an object.
useσ:performs an operation on an initialized object of typeσ.This corre-sponds to several operations in JVML,including method invocation(invoke-virtual),accessing an instancefield(putfield/getfield),etc.
4Operational and Static Semantics
4.1Values and Types
JVML i types are generated by the grammar:
万茜丈夫τ::=Int|σ|ˆσi|Top
whereσ∈T,the set of valid object types.The typeˆσi will be given to values resulting from allocating an object of typeσon line i of a program.The type Int will be used for the type of integers.Onefinal type is Top.Any value of
3
P[pc]=newσ
ˆa∈Aˆσpc,Unused(ˆa,f,s)
P⊢ pc,f,ˆa·s → pc+1,[a/ˆa]f,[a/ˆa]s
P[pc]=useσ
a∈Aσ
v:Top n is an integer
a:τ
a∈Aˆτi ǫ:ǫ
a:τs:α
(new)
P[i]=newσ
σ∈T
F i+1=F i
S i+1=ˆσi·S i
ˆσi∈S i
∀y∈Dom(F i).F i[y]=ˆσi
i+1∈Dom(P)
F,S,i⊢P
(use)
P[i]=useσ
σ∈T
F i+1=F i
S i=σ·S i+1
i+1∈Dom(P)
Unused(a,f,s)
This will allow the virtual machine to pick a value that is currently not used by the program.The operational rules not presented here are discussed in[2]. These rules have been designed so that a step cannot be made from an illegal state,such as being at a pop statement when there is an empty stack.
When a new object is created,a currently unused value of an uninitialized object type is placed on the stack.The type of that value is determined by the object type named in the instruction and the line number of the instruction. When the value for an uninitialized object is initialized by an init instruction, all occurrences of that value are replaced by a value corresponding to an initialized object.
4.3Static Semantics
A program P is well typed if there exist F and S such that
F,S⊢P.
F is a map from Addr to functions mapping local variables to types.As described in[2],elements in a
orange range
map over Addr are accessed as F i instead of F[i].Thus,F i[y]is the type of local variable y at line i of a program.Likewise, S is a map from Addr to stack types such that S i is the type of the operand stack at location i of the program.The judgment which allows us to conclude
5
that a program P is well typed by F and S is
(wt prog)
F1=F Top
S1=ǫ
∀i∈Dom(P).F,S,i⊢P