Logger

Using a Logger

Spinning Up ships with basic logging tools, implemented in the classes Logger and EpochLogger. The Logger class contains most of the basic functionality for saving diagnostics, hyperparameter configurations, the state of a training run, and the trained model. The EpochLogger class adds a thin layer on top of that to make it easy to track the average, standard deviation, min, and max value of a diagnostic over each epoch and across MPI workers.

You Should Know

All Spinning Up algorithm implementations use an EpochLogger.

Examples

First, let's look at a simple example of how an EpochLogger keeps track of a diagnostic value:

>>> from spinup.utils.logx import EpochLogger
>>> epoch_logger = EpochLogger()
>>> for i in range(10):
...     epoch_logger.store(Test=i)
>>> epoch_logger.log_tabular('Test', with_min_and_max=True)
>>> epoch_logger.dump_tabular()
-------------------------------------
|     AverageTest |             4.5 |
|         StdTest |            2.87 |
|         MaxTest |               9 |
|         MinTest |               0 |
-------------------------------------

The store method is used to save all values of Test to the epoch_logger's internal state. Then, when log_tabular is called, it computes the average, standard deviation, min, and max of Test over all of the values in the internal state. The internal state is wiped clean after the call to log_tabular (to prevent leakage into the statistics at the next epoch). Finally, dump_tabular is called to write the diagnostics to file and to stdout.
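
To see the wipe in action, continue the session above: a second batch of stored values produces statistics over that batch alone. (The expected output below assumes the same table format.)

>>> for i in range(10, 20):
...     epoch_logger.store(Test=i)
>>> epoch_logger.log_tabular('Test', with_min_and_max=True)
>>> epoch_logger.dump_tabular()
-------------------------------------
|     AverageTest |            14.5 |
|         StdTest |            2.87 |
|         MaxTest |              19 |
|         MinTest |              10 |
-------------------------------------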

Next, let's look at a full training procedure with the logger embedded, to highlight configuration and model saving as well as diagnostic logging:

 import numpy as np
 import tensorflow as tf
 import time
 from spinup.utils.logx import EpochLogger


 def mlp(x, hidden_sizes=(32,), activation=tf.tanh, output_activation=None):
     for h in hidden_sizes[:-1]:
         x = tf.layers.dense(x, units=h, activation=activation)
     return tf.layers.dense(x, units=hidden_sizes[-1], activation=output_activation)


 # Simple script for training an MLP on MNIST.
 def train_mnist(steps_per_epoch=100, epochs=5,
                 lr=1e-3, layers=2, hidden_size=64,
                 logger_kwargs=dict(), save_freq=1):

     logger = EpochLogger(**logger_kwargs)
     logger.save_config(locals())

     # Load and preprocess MNIST data
     (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
     x_train = x_train.reshape(-1, 28*28) / 255.0

     # Define inputs & main outputs from computation graph
     x_ph = tf.placeholder(tf.float32, shape=(None, 28*28))
     y_ph = tf.placeholder(tf.int32, shape=(None,))
     logits = mlp(x_ph, hidden_sizes=[hidden_size]*layers + [10], activation=tf.nn.relu)
     predict = tf.argmax(logits, axis=1, output_type=tf.int32)

     # Define loss function, accuracy, and training op
     y = tf.one_hot(y_ph, 10)
     loss = tf.losses.softmax_cross_entropy(y, logits)
     acc = tf.reduce_mean(tf.cast(tf.equal(y_ph, predict), tf.float32))
     train_op = tf.train.AdamOptimizer().minimize(loss)

     # Prepare session
     sess = tf.Session()
     sess.run(tf.global_variables_initializer())

     # Setup model saving
     logger.setup_tf_saver(sess, inputs={'x': x_ph},
                                 outputs={'logits': logits, 'predict': predict})

     start_time = time.time()

     # Run main training loop
     for epoch in range(epochs):
         for t in range(steps_per_epoch):
             idxs = np.random.randint(0, len(x_train), 32)
             feed_dict = {x_ph: x_train[idxs],
                          y_ph: y_train[idxs]}
             outs = sess.run([loss, acc, train_op], feed_dict=feed_dict)
             logger.store(Loss=outs[0], Acc=outs[1])

         # Save model
         if (epoch % save_freq == 0) or (epoch == epochs-1):
             logger.save_state(state_dict=dict(), itr=None)

         # Log info about epoch
         logger.log_tabular('Epoch', epoch)
         logger.log_tabular('Acc', with_min_and_max=True)
         logger.log_tabular('Loss', average_only=True)
         logger.log_tabular('TotalGradientSteps', (epoch+1)*steps_per_epoch)
         logger.log_tabular('Time', time.time()-start_time)
         logger.dump_tabular()

 if __name__ == '__main__':
     train_mnist()

In this example, observe that the logger is used at every stage of the run: save_config records the hyperparameter configuration at the top, setup_tf_saver prepares model saving once the computation graph is built, store accumulates the per-batch loss and accuracy inside the training loop, save_state checkpoints the model every save_freq epochs, and log_tabular / dump_tabular report the diagnostics at the end of each epoch. Note that the keys passed to log_tabular match the keys used with store.

Logging and MPI

You Should Know

Several algorithms in RL are easily parallelized by using MPI to average gradients and/or other key quantities. The Spinning Up loggers are designed to be well-behaved when using MPI: things will only get written to stdout and to file from the process with rank 0. But information from other processes isn't lost if you use the EpochLogger: everything which is passed into EpochLogger via store, regardless of which process it's stored in, gets used to compute average/std/min/max values for a diagnostic.
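
For instance, here is a minimal sketch of cross-process aggregation, assuming Spinning Up's mpi_fork and proc_id helpers from spinup.utils.mpi_tools:

from spinup.utils.logx import EpochLogger
from spinup.utils.mpi_tools import mpi_fork, proc_id

mpi_fork(4)  # relaunch this script under mpirun with 4 processes

logger = EpochLogger()
logger.store(Rank=proc_id())  # each process stores its own rank
logger.log_tabular('Rank', with_min_and_max=True)  # stats pool values from all processes
logger.dump_tabular()  # only the rank-0 process writes to stdout and file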

Logger Classes

class spinup.utils.logx.Logger(output_dir=None, output_fname='progress.txt', exp_name=None)

A general-purpose logger.

Makes it easy to save diagnostics, hyperparameter configurations, the state of a training run, and the trained model.

__init__(output_dir=None, output_fname='progress.txt', exp_name=None)

Initialize a Logger.

Parameters:
  • output_dir (string) – A directory for saving results to. If None, defaults to a temp directory of the form /tmp/experiments/somerandomnumber.
  • output_fname (string) – Name for the tab-separated-value file containing metrics logged throughout a training run. Defaults to progress.txt.
  • exp_name (string) – Experiment name. If you run multiple training runs and give them all the same exp_name, the plotter will know to group them. (Use case: if you run the same hyperparameter configuration with multiple random seeds, you should give them all the same exp_name.)
dump_tabular()

Write all of the diagnostics from the current iteration.

Writes both to stdout, and to the output file.

log(msg, color='green')

Print a colorized message to stdout.

log_tabular(key, val)

Log a value of some diagnostic.

Call this only once for each diagnostic quantity, each iteration. After using log_tabular to store values for each diagnostic, make sure to call dump_tabular to write them out to file and stdout (otherwise they will not get saved anywhere).
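
A minimal sketch of the log_tabular / dump_tabular cycle on a base Logger (the output directory is illustrative):

from spinup.utils.logx import Logger

logger = Logger(output_dir='/tmp/experiments/demo')
for epoch in range(3):
    logger.log_tabular('Epoch', epoch)
    logger.log_tabular('Loss', 1.0 / (epoch + 1))  # one value per diagnostic per iteration
    logger.dump_tabular()  # writes this iteration's diagnostics to stdout and progress.txt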

save_config(config)

Log an experiment configuration.

Call this once at the top of your experiment, passing in all important config vars as a dict. This will serialize the config to JSON, while handling anything which can't be serialized in a graceful way (writing as informative a string as possible).

Example use:

logger = EpochLogger(**logger_kwargs)
logger.save_config(locals())

save_state(state_dict, itr=None)

Saves the state of an experiment.

To be clear: this is about saving state, not logging diagnostics. All diagnostic logging is separate from this function. This function will save whatever is in state_dict (usually just a copy of the environment) and the most recent parameters for the model you previously set up saving for with setup_tf_saver.

Call with any frequency you prefer. If you only want to maintain a single state and overwrite it at each call with the most recent version, leave itr=None. If you want to keep all of the states you save, provide unique (increasing) values for itr.

Parameters:
  • state_dict (dict) – Dictionary containing essential elements to describe the current state of training.
  • itr – An int, or None. Current iteration of training.
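
A sketch of both modes, where env is a hypothetical environment object:

# Maintain a single snapshot, overwritten at each call:
logger.save_state(state_dict=dict(env=env), itr=None)

# Keep every snapshot, tagged by the current epoch:
logger.save_state(state_dict=dict(env=env), itr=epoch)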

setup_tf_saver(sess, inputs, outputs)

Set up easy model saving for tensorflow.

Call once,after defining your computation graph but before training.

Parameters:
  • sess – The Tensorflow session in which you train your computation graph.
  • inputs (dict) – A dictionary that maps from keys of your choice to the tensorflow placeholders that serve as inputs to the computation graph. Make sure that all of the placeholders needed for your outputs are included!
  • outputs (dict) – A dictionary that maps from keys of your choice to the outputs from your computation graph.
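
For example, a minimal sketch reusing the mlp helper and imports from the training script above:

x_ph = tf.placeholder(tf.float32, shape=(None, 28*28))
logits = mlp(x_ph, hidden_sizes=[64, 64, 10], activation=tf.nn.relu)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Call once, after the graph is defined and before training begins:
logger.setup_tf_saver(sess, inputs={'x': x_ph}, outputs={'logits': logits})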

class spinup.utils.logx.EpochLogger(*args, **kwargs)

Bases: spinup.utils.logx.Logger

A variant of Logger tailored for tracking average values over epochs.

Typical use case: there is some quantity which is calculated many times throughout an epoch, and at the end of the epoch, you would like to report the average / std / min / max value of that quantity.

With an EpochLogger, each time the quantity is calculated, you would use

epoch_logger.store(NameOfQuantity=quantity_value)

to load it into the EpochLogger's state. Then at the end of the epoch, you would use

epoch_logger.log_tabular(NameOfQuantity, **options)

to record the desired values.

get_stats(key)

Lets an algorithm ask the logger for mean/std/min/max of a diagnostic.
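
A sketch of typical use inside a training loop, assuming get_stats returns a (mean, std, min, max) tuple over the values stored so far this epoch (episode_return and reward_threshold are hypothetical):

logger.store(EpRet=episode_return)
mean_ret = logger.get_stats('EpRet')[0]  # assumed (mean, std, min, max) ordering
if mean_ret > reward_threshold:
    print('Target performance reached.')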

log_tabular(key, val=None, with_min_and_max=False, average_only=False)

Log a value or possibly the mean/std/min/max values of a diagnostic.

Parameters:
  • key (string) – The name of the diagnostic. If you are logging a diagnostic whose state has previously been saved with store, the key here has to match the key you used there.
  • val – A value for the diagnostic. If you have previously saved values for this key via store, do not provide a val here.
  • with_min_and_max (bool) – If true, log min and max values of the diagnostic over the epoch.
  • average_only (bool) – If true, do not log the standard deviation of the diagnostic over the epoch.
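
The two calling modes, as used in the training script above:

logger.log_tabular('Epoch', epoch)                # raw value: pass val directly
logger.log_tabular('Acc', with_min_and_max=True)  # previously stored: omit val
logger.log_tabular('Loss', average_only=True)     # previously stored: report mean only
logger.dump_tabular()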

store(**kwargs)

Save something into the epoch_logger's current state.

Provide an arbitrary number of keyword arguments with numerical values.

Loading Saved Graphs

spinup.utils.logx.restore_tf_graph(sess, fpath)

Loads graphs saved by Logger.

Will output a dictionary whose keys and values are from the 'inputs' and 'outputs' dict you specified with logger.setup_tf_saver().

Parameters:
  • sess – A Tensorflow session.
  • fpath – Filepath to save directory.

Returns:

A dictionary mapping from keys to tensors in the computation graph loaded from fpath.

When you use this method to restore a graph saved by a Spinning Up implementation, you can minimally expect it to include the following:

Key     Value
x       Tensorflow placeholder for state input.
pi      Samples an action from the agent, conditioned on states in x.

The relevant value functions for an algorithm are also typically stored. For details of what else gets saved by a given algorithm, see its documentation page.
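
A sketch of restoring and running a saved policy (the simple_save path under the run's output directory and the observation are illustrative):

import numpy as np
import tensorflow as tf
from spinup.utils.logx import restore_tf_graph

sess = tf.Session()
model = restore_tf_graph(sess, '/tmp/experiments/demo/simple_save')

obs = np.zeros(8)  # hypothetical observation for an 8-dimensional state space
feed_dict = {model['x']: obs.reshape(1, -1)}
action = sess.run(model['pi'], feed_dict=feed_dict)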