A cheat sheet for custom TensorFlow layers and models
How to redefine everything from numpy, and some actually useful tricks for part assignment, saving custom layers, etc.
We’re talking about creating custom TensorFlow layers and models in:
TensorFlow v2.5.0.
No wasting time with introductions, let’s go:
Numpy mappings to TensorFlow
- Keep in mind for all operations: in TensorFlow, the first axis is the batch_size.
- Trick to get past the batch size when it’s inconvenient: wrap linear algebra in
tf.map_fn
, e.g. converting vectors of sizen
in a batch(batch_size x n)
to diagonal matrices of size(batch_size x n x n)
:
mats = tf.map_fn(lambda vec: tf.linalg.tensor_diag(vec), vecs)
- For most operations such as
tf.matmul
, the batch_size is already taken care of:
The inputs must, following any transpositions, be tensors of rank >= 2 where the inner 2 dimensions specify valid matrix multiplication dimensions, and any further outer dimensions specify matching batch size.
- Sum an array:
np.sum(x)
→tf.map.reduce_sum(x)
- Dot product of two vectors:
np.dot(x,y)
→tf.tensordot(x,y,1)
- Matrix product with vector
A.b
:np.dot(A,b)
→tf.linalg.matvec(A,b)
- Trigonometry:
np.sin
→tf.math.sin
- Transpose
(batch_size x n x m)
→(batch_size x m x n)
:np.transpose(x, axes=(0,2,1))
→tf.transpose(x, perm=[0,2,1])
- Vector to diagonal matrix:
np.diag(x)
→tf.linalg.tensor_diag(x)
- Concatenate matrices:
np.concatenate((A,B),axes=0)
→tf.concat([A,B],1)
- Matrix flatten:
[[A,B],[C,D]]
into a single matrix:
tmp1 = tf.concat([A,B],2)
tmp2 = tf.concat([C,D],2)
mat = tf.concat([tmp1,tmp2],1)
- Kronecker product of a vector
vec
of sizen
(makes a matrixn x n
):tf.tensordot(vec,vec,axes=0)
except the vectors usually come in a batchvecs
of size(batch_size x n)
, so we need to map:
mats = tf.map_fn(lambda vec: tf.tensordot(vec,vec,axes=0),vecs)
- Zeros:
np.zeros(n)
→tf.zeros_like(n)
. Note that to avoid the error “Cannot convert a partially known TensorShape to a Tensor”, you should usetf.zeros_like
instead oftf.zeros
since the former does not require the size to be known until runtime, see also here.
Part assignment by indexes
In numpy
we can just write:
a = np.random.rand(3)
a[0] = 5.0
This doesn’t work in TensorFlow:
a = tf.Variable(initial_value=np.random.rand(3),dtype='float32')
a[0] = 5.0# TypeError: 'ResourceVariable' object does not support item assignment
Instead, you can add/subtract unit vectors/matrices/tensors with the appropriate values. The tf.one_hot
function can be used to construct these:
e = tf.one_hot(indices=0,depth=3,dtype='float32')# <tf.Tensor: shape=(3,), dtype=float32, numpy=array([1., 0., 0.], dtype=float32)>
and then change the value like this:
a = a + (new_val - old_val) * e
where a
is the TF variable, new_val
and old_val
are floats, and e
is the unit vector.
For example, in the previous example:
Part assignment by indexes for matrices, etc.
Same as the vector example, but to construct the unit matrices you can use this helper function:
which you can adapt to higher order tensors as well.
Getting the shape of things
Use tf.shape(x)
instead of x.shape
. You can read this discussion here: or:
Do not use .shape; rather use tf.shape. However, h = tf.shape(x)[1]; w = tf.shape(x)[2] will let h, w be symbolic or graph-mode tensors (integer) that will contain a dimension of x. The value will be determined runtime. In such a case, tf.reshape(x, [-1, w, h]) will produce a (symbolic) tensor of shape [?, ?, ?] (still unknown) whose tensor shape will be known on runtime.
What to subclass — tf.keras.layers.Layer or tf.keras.Model?
Model
if you want to use fit
or evaluate
or something similar, otherwise Layer
.
From the docs:
Typically you inherit from keras.Model when you need the model methods like: Model.fit,Model.evaluate, and Model.save (see Custom Keras layers and models for details). One other feature provided by keras.Model (instead of keras.layers.Layer) is that in addition to tracking variables, a keras.Model also tracks its internal layers, making them easier to inspect.
What do I need to implement to subclass a Layer?
- Obviously call
super
in the constructor. - Obviously implement the
call
method. - You should implement
get_config
andfrom_config
— they are needed often for saving the layer, e.g. ifsave_traces=False
in the model. @tf.keras.utils.register_keras_serializable(package="my_package")
should be added at the top of the class. From the docs:
This decorator injects the decorated class or function into the Keras custom object dictionary, so that it can be serialized and deserialized without needing an entry in the user-provided custom object dict.
This really means that if you do not put it, you must later load your model containing the layer like this Keras documentation describes:
loaded_1 = keras.models.load_model(
"my_model", custom_objects={"MyLayer": MyLayer}
)
which is kind of a pain.
What do I need to implement to subclass a Model?
Saving and loading models and layers
If you have correctly defined get_config
and from_config
for each of your layers and models as above, you can just save using:
model.save("model", save_traces=False)# Or just save the weights
model.save_weights("model_weights")
Note that you do not need save_traces
if you have defined those correctly. This also makes the saved models much smaller!
To load, just use:
model = tf.keras.models.load_model("model")# Or just load the weights for an already existing model
model.load_weights("model_weights")
Note that we don’t need to specify and custom_objects
if we correctly used the
@tf.keras.utils.register_keras_serializable(package="my_package")
decorator everywhere.
How to reduce the size of TensorBoard callbacks if you have a large custom model
Similar to how save_traces
reduces the size of saved models, you can use write_graph=False
in the TensorBoard callback to reduce the size of these files:
logdir = os.path.join("logs",datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir,
histogram_freq=1,
write_graph=False
)
Or, for a ModelCheckpoint
, you can use save_weights_only=True
to just save the weights and not the model at all:
val_checkpoint = tf.keras.callbacks.ModelCheckpoint(
'trained',
monitor='val_loss',
verbose=1,
save_best_only=True,
save_weights_only=True,
mode='auto',
save_frequency=1
)
Conclusion
Hope this starter-set of tricks helped!