Dynamic graphs
If you are just getting started with Dagster, we strongly recommend you use assets rather than ops to build your data pipelines. The ops documentation is for Dagster users who need to manage existing ops, or who have complex use cases.
The ability for portions of a graph to be duplicated at runtime.
Relevant APIs
Name | Description |
---|---|
DynamicOut | Declare that an op will return dynamic outputs |
DynamicOutput | The object that an op will yield repeatedly, each containing a value and a unique mapping_key |
Overview
The basic unit of computation in Dagster is the op. In certain cases it is desirable to run the same op multiple times on different pieces of similar data.
Dynamic outputs are the tool Dagster provides to allow resolving the pieces of data at runtime and having downstream copies of the ops created for each piece.
Using dynamic outputs
A static job
Here we start with a contrived example of a job containing a single expensive op:
@dg.op
def data_processing():
large_data = load_big_data()
interesting_result = expensive_processing(large_data)
return analyze(interesting_result)
@dg.job
def naive():
data_processing()
While, the implementation of expensive_processing
can internally do something to parallelize the compute, if anything goes wrong with any part we have to restart the whole computation.