Friday, May 2, 2025

Prompt chaining reimagined with type inference

At work I’ve been researching how to improve the ergonomics of prompt engineering and I wanted to share and open source some of what I’ve done. This initial post is about how I’ve been experimenting with using bidirectional type inference to streamline prompt chaining.

“Prompt chaining” is a prompt engineering technique that splits a larger task/prompt into multiple smaller tasks/prompts which are chained together using code. For example, instead of prompting a model to generate a poem in one prompt like this:

Write a poem based off this idea:

${idea}

… by following this process:

  • First think through the form, stanza count, lines per stanza, and rhyme scheme
  • Then choose a poetic style (tone, voice, and literary devices) based on the poem’s form
  • Then write a complete poem based on that plan

… you can split it into smaller prompts, like this:

structure prompt:

Plan the structure of a new poem based on this idea

${idea}

Describe its form, stanza count, lines per stanza, and rhyme scheme

style prompt:

Given this poem structure:

  • Form: ${structure.form}
  • Stanzas: ${structure.stanzaCount}
  • Lines per stanza: ${structure.linesPerStanza}
  • Rhyme scheme: ${structure.rhymeScheme}

Choose a poetic style: tone, voice, and literary devices to emphasize

poem prompt:

Write a complete poem based on this idea:

${idea}

Structure:

  • Form: ${structure.form}
  • Stanzas: ${structure.stanzaCount}
  • Lines per stanza: ${structure.linesPerStanza}
  • Rhyme scheme: ${structure.rhymeScheme}

Style:

  • Tone: ${style.tone}
  • Voice: ${style.voice}
  • Literary Devices: ${style.literaryDevices}

Why might you want to do this?

  • to improve the quality of the results

    Models perform better when working on more constrained subproblems. Splitting a larger prompt into smaller prompts helps the model stay focused at each step.

  • to introspect intermediate results

    This comes in handy when you want to log, validate, or correct intermediate results.

  • to perform actions in between prompts

    You might want to take the output of one prompt, use that to call some tool, then use the output of that tool to decide what the next prompt should be, which you can’t do with a single prompt.

In other words, prompt chaining unlocks greater accuracy, control, and flexibility for prompt engineering.

The problem

The main issue with prompt chaining is that it is a huge pain in the ass; if you start to do anything a little bit complicated you need to start using structured outputs (i.e. JSON), which adds a whole lot of boilerplate to the process:

  • you have to define the schema for each intermediate step of the process

    You typically do this by defining your data model in your host programming language (e.g. a Pydantic model in Python) or directly defining your JSON schema

  • You have to instruct the model to produce JSON and explain the shape of the expected output

  • (Depending on the framework) you have to decode the JSON into your data model

For small prompt chaining pipelines this isn’t too hard, but it starts to get annoying to define all these schemas when you scale this up to more sophisticated prompt chaining pipelines.
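
To make that boilerplate concrete, here is a sketch of what a single step of the poem pipeline might look like in a general-purpose language (Haskell with aeson; callLLM is a hypothetical stand-in for whatever API client you actually use):

{-# LANGUAGE DeriveGeneric #-}

import Data.Aeson (FromJSON, eitherDecode)
import GHC.Generics (Generic)

import qualified Data.ByteString.Lazy as Lazy

-- 1. Define the schema for the intermediate step
data Structure = Structure
    { form :: String
    , stanzaCount :: Int
    , linesPerStanza :: Int
    , rhymeScheme :: String
    } deriving (Generic, Show)

instance FromJSON Structure

-- A hypothetical stand-in for your LLM API client
callLLM :: String -> IO Lazy.ByteString
callLLM = error "replace with a real API call"

getStructure :: String -> IO (Either String Structure)
getStructure idea = do
    -- 2. Instruct the model to produce JSON and describe the
    --    shape of the expected output
    response <- callLLM (unlines
        [ "Plan the structure of a new poem based on this idea:"
        , idea
        , "Respond with a JSON object with string fields \"form\" and"
        , "\"rhymeScheme\" and integer fields \"stanzaCount\" and \"linesPerStanza\"."
        ])
    -- 3. Decode the JSON into the data model
    pure (eitherDecode response)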

So as a thought experiment I wanted to create a research prototype that handled all of that for you so that you didn’t need to specify any schemas at all. In other words I wanted to build a programming language that harnessed bidirectional type inference to perform schema inference for prompts with structured JSON outputs.

Example

I’ll cut to the chase by showing the above prompt chain written as a program in this language:

let concatSep =
      https://raw.githubusercontent.com/Gabriella439/grace/refs/heads/main/prelude/text/concatSep.ffg

let lines = concatSep "\n"

let generatePoem idea =
        let structure = prompt
                { model: "gpt-4o"
                , text: lines
                    [ "Plan the structure of a new poem based on this idea:"
                    , ""
                    , idea
                    , ""
                    , "Describe its form, stanza count, lines per stanza, and rhyme scheme."
                    ]
                }

        let renderedStructure = lines
                [ "- Form: " + structure.form
                , "- Stanzas: " + Real/show (structure.stanzaCount : Integer)
                , "- Lines per stanza: " + Real/show (structure.linesPerStanza : Integer)
                , "- Rhyme scheme: " + structure.rhymeScheme
                ]

        let style = prompt
                { model: "gpt-4o"
                , text: lines
                    [ "Given this poem structure:"
                    , renderedStructure
                    , ""
                    , "Choose a poetic style: tone, voice, and literary devices to emphasize."
                    ]
                }

        let renderedStyle = lines
                [ "- Tone: " + style.tone
                , "- Voice: " + style.voice
                , "- Literary Devices: " + concatSep ", " style.literaryDevices
                ]

        let poem : Text = prompt
                { model: "gpt-4o"
                , text: lines
                    [ "Write a complete poem based on this idea:"
                    , ""
                    , idea
                    , ""
                    , "Structure:"
                    , renderedStructure
                    , ""
                    , "Style:"
                    , renderedStyle
                    ]
                }

        in { structure: structure, style: style, poem: poem }

in  generatePoem "The stillness of early morning"

The above prompt chain is a real program in a research programming language named Fall-from-Grace (or “Grace” for short). Specifically, this is implemented in a branch of the language that adds a magic prompt keyword that works like in the above example.

If you run the above example, you might get an output like this:

{
  "structure": {
    "form": "Free verse with a sprinkle of rhyme",
    "linesPerStanza": 4,
    "rhymeScheme": "ABCB",
    "stanzaCount": 5
  },
  "style": {
    "literaryDevices": [
      "imagery",
      "alliteration",
      "personification",
      "enjambment"
    ],
    "tone": "nostalgic",
    "voice": "reflective"
  },
  "poem": "### The Stillness of Early Morning\n\n**I**  \nIn the gentle rise of dawn's embrace,  \nA whisper of winds through sleepy skies,  \nThe world lies suspended in an echo  \nOf yesterday's dreams, reluctantly saying goodbye.\n\n**II**  \nSoft orange light graces fields of dew,  \nCradling tenacious tendrils of the past.  \nMorning breathes upon the waiting world  \nWith promises that this serenity will last.\n\n**III**  \nWaves of silence surge across the meadow,  \nNature's symphony gently cradles rest.  \nA chorus of leaves sings its soft refrain,  \nComforted by morning's tender caress.\n\n**IV**  \nThe silence dips to caress weary trees,  \nWhose branches waltz to whispers soft and slow.  \nEach tendril holds close the morning's intention,  \nBefore the day's demands, the chaos, start to show.\n\n**V**  \nEarly rays dance upon a tranquil sea,  \nA canvas painted through a dreamy haze.  \nTime sits by, savoring the gentle hush,  \nAs the world awakens, stirs, and finally plays."
}

… and if you’re curious what the formatted poem looks like:

The Stillness of Early Morning

I
In the gentle rise of dawn’s embrace,
A whisper of winds through sleepy skies,
The world lies suspended in an echo
Of yesterday’s dreams, reluctantly saying goodbye.

II
Soft orange light graces fields of dew,
Cradling tenacious tendrils of the past.
Morning breathes upon the waiting world
With promises that this serenity will last.

III
Waves of silence surge across the meadow,
Nature’s symphony gently cradles rest.
A chorus of leaves sings its soft refrain,
Comforted by morning’s tender caress.

IV
The silence dips to caress weary trees,
Whose branches waltz to whispers soft and slow.
Each tendril holds close the morning’s intention,
Before the day’s demands, the chaos, start to show.

V
Early rays dance upon a tranquil sea,
A canvas painted through a dreamy haze.
Time sits by, savoring the gentle hush,
As the world awakens, stirs, and finally plays.

Type inference

The sample Grace program specifies hardly any types (mainly the final expected type for the poem: Text). This works because Grace supports bidirectional type inference, which means that the type checker can work backwards from how intermediate results are used to infer their schemas.

I’ll illustrate this with a contrived Grace example:

let numbers = prompt{ text: "Give me two numbers" }

in  { x: numbers.x
    , y: numbers.y
    , sum: numbers.x + numbers.y : Integer
    }

… which might produce an output like this:

$ grace interpret ./numbers.ffg
{ "x": 7, "y": 14, "sum": 21 }

When Grace analyzes this program the type checker works backwards from this expression:

numbers.x + numbers.y : Integer

… and reasons about it like this:

  • the addition produces an Integer, therefore numbers.x and numbers.y must also be Integers

  • therefore numbers is a record with two fields, x and y, both of which are Integers

    … or using Grace syntax, the inferred type of numbers is: { x: Integer, y: Integer }

  • therefore the output of the prompt command must have the same type

… and then Grace generates a JSON schema for the prompt which looks like this:

{ "type": "object",
  "properties": {
    "x": { "type": "integer" },
    "y": { "type": "integer" }
  },
  "required": [ "x", "y" ],
  "additionalProperties": false
}

Of course, you can specify types if you want (and they’re more lightweight than schemas in traditional prompt chaining frameworks). For example:

$ grace repl
>>> prompt{ text: "Give me a first and last name" } : { first: Text, last: Text }
{ "first": "Emily", "last": "Johnson" }
>>> prompt{ text: "Give me a list of names" } : List Text
[ "Alice"
, "Bob"
, "Charlie"
, "Diana"
, "Ethan"
, "Fiona"
, "George"
, "Hannah"
, "Isaac"
, "Jack"
]

However in our original example we don’t need to specify intermediate types because when the type-checker sees this code:

let structure = prompt
        { model: "gpt-4o"
        , text: lines
            [ "Plan the structure of a new poem based on this idea:"
            , ""
            , idea
            , ""
            , "Describe its form, stanza count, lines per stanza, and rhyme scheme."
            ]
        }

let renderedStructure = lines
        [ "- Form: " + structure.form
        , "- Stanzas: " + Real/show (structure.stanzaCount : Integer)
        , "- Lines per stanza: " + Real/show (structure.linesPerStanza : Integer)
        , "- Rhyme scheme: " + structure.rhymeScheme
        ]

… the compiler can reason backwards from how the structure value is used to infer that the JSON schema for the prompt needs to be:

{ "type": "object",
  "properties": {
    "form": { "type": "string" },
    "stanzaCount": { "type": "integer" },
    "linesPerStanza": { "type": "integer" },
    "rhymeScheme": { "type": "string" }
  },
  "required": [
    "form",
    "stanzaCount",
    "linesPerStanza",
    "rhymeScheme"
    ],
  "additionalProperties": false
}

Tool use

Grace also supports generating sum types (a.k.a. tagged unions), and you can imagine using this to subsume traditional tool use frameworks.

For example, consider this Grace program:

let concatSep = https://raw.githubusercontent.com/Gabriella439/grace/refs/heads/main/prelude/text/concatSep.ffg

let call = merge
      { HttpRequest: \x -> "curl " + x.url
      , ShellCommand: \x -> concatSep " " ([ x.executable ] + x.arguments)
      }

in  List/map call (prompt{ text: "Call some tools" })

This doesn’t actually run any tools (I haven’t added any callable tools to my work-in-progress branch yet), but just renders the tool use as a string for now:

$ grace interpret ./tools.ffg
[ "curl https://api.example.com/data", "ls -l -a" ]

However, the idea is that you can model a tool as a sum type with one constructor per callable tool, and in the above example the type checker infers that the sum type representing one tool call is:

< HttpRequest: { url: Text }
| ShellCommand: { executable: Text, arguments: List Text }
>

… so the inferred type of call is:

call : < HttpRequest: …, ShellCommand: … > -> Text

… but since we List/map the call function over the output of the prompt the type checker infers that the prompt needs to generate a List of tool calls:

prompt{ text: "Call some tools" } : List < HttpRequest: …, ShellCommand: … >

… and then Grace does some magic under the hood to convert that type to the equivalent JSON schema.
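
I haven’t shown the exact schema this branch generates, but one plausible encoding of the tool-call sum type is a tagged union, along these lines:

{ "type": "array",
  "items": {
    "anyOf": [
      { "type": "object",
        "properties": {
          "tag": { "type": "string", "enum": [ "HttpRequest" ] },
          "url": { "type": "string" }
        },
        "required": [ "tag", "url" ],
        "additionalProperties": false
      },
      { "type": "object",
        "properties": {
          "tag": { "type": "string", "enum": [ "ShellCommand" ] },
          "executable": { "type": "string" },
          "arguments": { "type": "array", "items": { "type": "string" } }
        },
        "required": [ "tag", "executable", "arguments" ],
        "additionalProperties": false
      }
    ]
  }
}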

What’s particularly neat about this example is that the prompt is so incredibly bare (“Call some tools”) because all the information the model needs is present in the schema.

Schema-driven prompting

We can explore this idea of using the schema to drive the prompt instead of prose using an example like this:

prompt{ text: "Generate some characters for a story", model: "gpt-4o" }
  : List
    { "The character's name": Text
    , "The most memorable thing about the character": Text
    , "The character's personal arc": Text
    }
[ { "The character's name": "Aveline Thatcher"
  , "The character's personal arc":
      "Aveline starts as a skeptical journalist who doubts the stories of mythical creatures. Over time, she becomes a firm believer, risking her career to uncover the truth and protect these creatures."
  , "The most memorable thing about the character":
      "The intricate tattoo of a phoenix on her forearm that seems to glow when she discovers hidden truths."
  }
, { "The character's name": "Kelan Frost"
  , "The character's personal arc":
      "A former rogue alchemist who turns hero after he inadvertently creates a dangerous substance. Driven by guilt, Kelan seeks redemption by finding an antidote and saving his village."
  , "The most memorable thing about the character":
      "His iridescent blue eyes that seem to see into one's soul, a side effect of his alchemical experiments."
  }
, { "The character's name": "Luciana Blair"
  , "The character's personal arc":
      "Luciana is a reclusive artist who initially fears the world outside her home. After a mysterious vision rejuvenates her, she sets out on a journey of self-discovery, ultimately finding both her voice and courage."
  , "The most memorable thing about the character":
      "Her ability to paint scenes before they happen, which she attributes to the visions she sees in her dreams."
  }
, { "The character's name": "Ezra Hartman"
  , "The character's personal arc":
      "Once a charismatic but self-centered lawyer, Ezra is confronted with a moral crisis that forces him to reevaluate his values. He chooses a path of integrity, becoming an advocate for justice."
  , "The most memorable thing about the character":
      "His perfectly tailored suits that slowly become more casual, symbolizing his transformation and shifting priorities."
  }
, { "The character's name": "Seraphine Mora"
  , "The character's personal arc":
      "Seraphine is a young music prodigy who loses her hearing after an accident. Battling despair, she learns to embrace a new way of 'hearing' music through vibrations and her other senses."
  , "The most memorable thing about the character":
      "The ethereal way she 'dances' with the music, using her entire body to express each note's emotion."
  }
]

Grace is a superset of JSON and since JSON supports arbitrary field names so does Grace! Field names in Grace support arbitrary capitalization, punctuation, and whitespace as long as you quote them, and we can use the field names to “smuggle” the description of each field into the schema.

Conclusion

Hopefully this gives you some idea of why I’ve begun to think of prompt chaining as a programming languages problem. Type inference is just the beginning and I think it is possible to use a domain-specific programming language not just to simplify the code but to ultimately unlock greater reasoning power.

I’m going to continue to use Grace as a research vehicle for prompt chaining but my LLM-enabled branch of Grace (like Grace itself) is not really intended to be used in production and I created it mainly as a proof-of-concept for where I’d like prompt chaining frameworks to go. If I do end up eventually productionizing this research I will create a proper fork with its own name and the whole works.

Thursday, November 14, 2024

The Haskell inlining and specialization FAQ

This post is an FAQ answering the most common questions people ask me related to inlining and specialization. I’ve also structured it as a blog post that you can read from top to bottom.

What is inlining?

“Inlining” means a compiler substituting a function call or a variable with its definition when compiling code. A really simple example of inlining is if you write code like this:

module Example where

x :: Int
x = 5

y :: Int
y = x + 1

… then at compile time the Haskell compiler can (and will) substitute the last occurrence of x with its definition (i.e. 5):

y :: Int
y = 5 + 1

… which then allows the compiler to further simplify the code to:

y :: Int
y = 6

In fact, we can verify that for ourselves by having the compiler dump its intermediate “core” representation like this:

$ ghc -O2 -fforce-recomp -ddump-simpl -dsuppress-all Example.hs

… which will produce this output:

==================== Tidy Core ====================
Result size of Tidy Core
  = {terms: 20, types: 7, coercions: 0, joins: 0/0}

x = I# 5#

$trModule4 = "main"#

$trModule3 = TrNameS $trModule4

$trModule2 = "Example"#

$trModule1 = TrNameS $trModule2

$trModule = Module $trModule3 $trModule1

y = I# 6#

… which we can squint at a little bit and read as:

x = 5

y = 6

… and ignore the other stuff.

A slightly more interesting example of inlining is a function call, like this one:

f :: Int -> Int
f x = x + 1

y :: Int
y = f 5

The compiler will be smart enough to inline f by replacing f 5 with 5 + 1 (here x is 5):

y :: Int
y = 5 + 1

… and just like before the compiler will simplify that further to y = 6, which we can verify from the core output:

y = I# 6#

What is specialization?

“Specialization” means replacing a “polymorphic” function with a “monomorphic” function. A “polymorphic” function is a function whose type has a type variable, like this one:

-- Here `f` is our type variable
example :: Functor f => f Int -> f Int
example = fmap (+ 1)

… and a “monomorphic” version of the same function replaces the type variable with a specific (concrete) type or type constructor:

example2 :: Maybe Int -> Maybe Int
example2 = fmap (+ 1)

Notice that example and example2 are defined in the same way, but they are not exactly the same function:

  • example is more flexible and works on strictly more type constructors

    example works on any type constructor f that implements Functor, whereas example2 only works on the Maybe type constructor (which implements Functor).

  • example and example2 compile to very different core representations

In fact, they don’t even have the same “shape” as far as GHC’s core representation is concerned. Under the hood, the example function takes two extra “hidden” function arguments compared to example2, which we can see if we dump the core output (and I’ve tidied up the output a lot for clarity):

example @f $Functor = fmap $Functor (\v -> v + 1)

example2 Nothing = Nothing
example2 (Just a) = Just (a + 1)

The two extra function arguments are:

  • @f: This represents the type variable f

    Yes, the type variable that shows up in the type signature also shows up at the term level in the GHC core representation. If you want to learn more about this you might be interested in my Polymorphism for Dummies post.

  • $Functor: This represents the Functor instance for f

    Yes, the Functor instance for a type like f is actually a first-class value passed around within the GHC core representation. If you want to learn more about this you might be interested in my Scrap your Typeclasses post.

Notice how the compiler cannot optimize example as well as it can optimize example2 because the compiler doesn’t (yet) know which type constructor f we’re going to call example on and also doesn’t (yet) know which Functor f instance we’re going to use. However, once the compiler does know which type constructor we’re using it can optimize a lot more.

In fact, we can see this for ourselves by changing our code a little bit to simply define example2 in terms of example:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)

example2 :: Maybe Int -> Maybe Int
example2 = example

This compiles to the exact same code as before (you can check for yourself if you don’t believe me).

Here we would say that example2 is “example specialized to the Maybe type constructor”. When we write something like this:

example2 :: Maybe Int -> Maybe Int
example2 = example

… what’s actually happening under the hood is that the compiler is doing something like this:

example2 = example @Maybe $FunctorMaybe

In other words, the compiler is taking the more general example function (which works on any type constructor f and any Functor f instance) and then “applying” it to a specific type constructor (@Maybe) and the corresponding Functor instance ($FunctorMaybe).

In fact, we can see this for ourselves if we generate core output with optimization disabled (-O0 instead of -O2) and if we remove the -dsuppress-all flag:

$ ghc -O0 -fforce-recomp -ddump-simpl Example.hs

This outputs (among other things):

…

example2 = example @Maybe $FunctorMaybe
…

And when we enable optimizations (with -O2):

$ ghc -O2 -fforce-recomp -ddump-simpl -dsuppress-all Example.hs

… then GHC inlines the definition of example and simplifies things further, which is how it generates this much more optimized core representation for example2:

example2 Nothing = Nothing
example2 (Just a) = Just (a + 1)

In fact, specialization is essentially the same thing as inlining under the hood (I’m oversimplifying a bit, but they are morally the same thing). The main distinction between inlining and specialization is:

  • specialization simplifies function calls with “type-level” arguments

    By “type-level” arguments I mean (hidden) function arguments that are types, type constructors, and type class instances

  • inlining simplifies function calls with “term-level” arguments

    By “term-level” arguments I mean the “ordinary” (visible) function arguments you know and love
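
You can even see both kinds of arguments side by side in ordinary source syntax using the TypeApplications extension (a toy illustration):

{-# LANGUAGE TypeApplications #-}

-- show :: Show a => a -> String takes a hidden type-level argument
-- (the type a, along with its Show instance) and one term-level
-- argument (the value being shown)
rendered :: String
rendered = show @Int 42

-- Specialization simplifies away @Int (and the Show Int instance);
-- inlining then simplifies away the application to 42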

Does GHC always inline or specialize code?

NO. GHC does not always inline or specialize code, for two main reasons:

  • Inlining is not always an optimization

    Inlining can sometimes make code slower. In particular, it can often be better to not inline a function with a large implementation because then the corresponding CPU instructions can be cached.

  • Inlining a function requires access to the function’s source code

    In particular, if the function is defined in a different module from where the function is used (a.k.a. the “call site”) then the call site does not necessarily have access to the function’s source code.

To expand on the latter point, Haskell modules are compiled separately (in other words, each module is a separate “compilation unit”), and the compiler generates two outputs when compiling a module:

  • a .o file containing object code (e.g. Example.o)

    This object code is what is linked into the final executable to generate a runnable program.

  • a .hi file containing (among other things) source code

    The compiler can optionally store the source code for any compiled functions inside this .hi file so that it can inline those functions when compiling other modules.

However, the compiler does not always save the source code for all functions that it compiles because there are downsides to storing source code for functions:

  • this slows down compilation

    This slows down compilation both for the “upstream” module (the module defining the function we might want to inline) and the “downstream” module (the module calling the function we might want to inline). The upstream module takes longer to compile because now the full body of the function needs to be saved in the .hi file and the downstream module takes longer to compile because inlining isn’t free (all optimizations, including inlining, generate more work for the compiler).

  • this makes the .hi file bigger

    The .hi file gets bigger because it’s storing the source code of the function.

  • this can also make the object code larger, too

    Inlining a function multiple times can lead to duplicating the corresponding object code for that function.

This is why by default the compiler uses its own heuristic to decide which functions are worth storing in the .hi file. The compiler does not indiscriminately save the source code for all functions.

You can override the compiler’s heuristic, though, using …

Compiler directives

There are a few compiler directives (a.k.a. “pragmas”) related to inlining and specialization that we’ll cover here:

  • INLINABLE
  • INLINE
  • NOINLINE
  • SPECIALIZE

My general rule of thumb for these compiler directives is:

  • don’t use any compiler directive until you benchmark your code to show that it helps
  • if you do use a compiler directive, INLINABLE is probably the one you should pick

I’ll still explain what all the compiler directives mean, though.

INLINABLE

INLINABLE is a compiler directive that you use like this:

f :: Int -> Int
f x = x + 1
{-# INLINABLE f #-}

The INLINABLE directive tells the compiler to save the function’s source code in the .hi file in order to make that function available for inlining downstream. HOWEVER, INLINABLE does NOT force the compiler to inline that function. The compiler will still use its own judgment to decide whether or not the function should be inlined (and the compiler’s judgment tends to be fairly good).
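
To see what “available for inlining downstream” means in practice, here is a two-module sketch (the module and function names are made up for illustration):

-- Upstream.hs
module Upstream where

sumSquares :: Num a => [a] -> a
sumSquares xs = sum [ x * x | x <- xs ]
{-# INLINABLE sumSquares #-}

-- Downstream.hs
module Downstream where

import Upstream (sumSquares)

-- Because the source code of sumSquares was saved in Upstream.hi,
-- GHC can inline and specialize it here (to Int) even though it is
-- defined in a different compilation unit
total :: [Int] -> Int
total = sumSquares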

INLINE

INLINE is a compiler directive that you use in a similar manner as INLINABLE:

f :: Int -> Int
f x = x + 1
{-# INLINE f #-}

INLINE behaves like INLINABLE except that it also heavily biases the compiler in favor of inlining the function. There are still some cases where the compiler will refuse to fully inline the function (for example, if the function is recursive), but generally speaking the INLINE directive overrides the compiler’s own judgment for whether or not to inline the function.

I would argue that you usually should prefer the INLINABLE pragma over the INLINE pragma because the compiler’s judgment for whether or not to inline things is usually good. If you override the compiler’s judgment there’s a good chance you’re making things worse unless you have benchmarks showing otherwise.

NOINLINE

If you mark a function as NOINLINE:

f :: Int -> Int
f x = x + 1
{-# NOINLINE f #-}

… then the compiler will refuse to inline that function. It’s pretty rare to see people use a NOINLINE annotation for performance reasons (although there are circumstances where NOINLINE can be an optimization). It’s far, far, far more common to see people use NOINLINE in conjunction with unsafePerformIO because that’s what the unsafePerformIO documentation recommends:

Use {-# NOINLINE foo #-} as a pragma on any function foo that calls unsafePerformIO. If the call is inlined, the I/O may be performed more than once.
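
The canonical instance of that pattern is a top-level mutable reference:

import Data.IORef (IORef, newIORef)
import System.IO.Unsafe (unsafePerformIO)

-- A global counter that should be initialized exactly once
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)
{-# NOINLINE counter #-}

-- Without the NOINLINE pragma the compiler could inline counter at
-- each use site, re-running newIORef and creating a separate IORef
-- per occurrence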

SPECIALIZE

SPECIALIZE lets you hint to the compiler that it should compile a polymorphic function for a monomorphic type ahead of time. For example, if we define a polymorphic function like this:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)

… we can tell the compiler to go ahead and specialize the example function for the special case where f is Maybe, like this:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)
{-# SPECIALIZE example :: Maybe Int -> Maybe Int #-}

This tells the compiler to go ahead and compile the more specialized version, too, because we expect some other module to use that more specialized version. This is nice if we want to get the benefits of specialization without exporting the function’s source code (so we don’t bloat the .hi file) or if we want more precise control over when specialization does and does not happen.

In practice, though, I find that most Haskell programmers don’t want to go to the trouble of anticipating and declaring all possible specializations, which is why I endorse INLINABLE as the more ergonomic alternative to SPECIALIZE.
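
For example, fully anticipating every downstream use of example means stacking up one pragma per monomorphic type:

example :: Functor f => f Int -> f Int
example = fmap (+ 1)
{-# SPECIALIZE example :: Maybe Int -> Maybe Int #-}
{-# SPECIALIZE example :: [Int] -> [Int] #-}
{-# SPECIALIZE example :: Either String Int -> Either String Int #-}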

Thursday, August 29, 2024

Firewall rules: not as secure as you think

This post introduces some tricks for jailbreaking hosts behind “secure” enterprise firewalls in order to enable arbitrary inbound and outbound requests over any protocol. You’ll probably find the tricks outlined in the post useful if you need to deploy software in a hostile networking environment.

The motivation for these tricks is that you might be a vendor that sells software that runs in a customer’s datacenter (a.k.a. on-premises software), so your software has to run inside of a restricted network environment. You (the vendor) can ask the customer to open their firewall for your software to communicate with the outside world (e.g. your own datacenter or third party services), but customers will usually be reluctant to open their firewall more than necessary.

For example, you might want to ssh into your host so that you can service, maintain, or upgrade the host, but if you ask the customer to open their firewall to let you ssh in they’ll usually push back on or outright reject the request. Moreover, this isn’t one of those situations where you can just ask for forgiveness instead of permission because you can’t begin to do anything without explicitly requesting some sort of firewall change on their part.

So I’m about to teach you a bunch of tricks for efficiently tunneling whatever you want over seemingly innocuous openings in a customer’s firewall. These tricks will culminate with the most cursed trick of all, which is tunneling inbound SSH connections inside of outbound HTTPS requests. This will grant you full command-line access to your on-premises hosts using the most benign firewall permission that a customer can grant. Moreover, this post is accompanied by a repository named holepunch containing NixOS modules automating this ultimate trick which you can either use directly or consult as a working proof-of-concept for how the trick works.

Overview

Most of the tricks outlined in this post assume that you control the hosts on both ends of the network request. In other words, we’re going to assume that there is some external host in your datacenter and some internal host in the customer’s datacenter and you control the software running on both hosts.

There are four tricks in our arsenal that we’re going to use to jailbreak internal hosts behind a restrictive customer firewall:

  • forward proxies
  • reverse proxies
  • reverse tunnels
  • corkscrew

Once you master these four tools you will typically be able to do basically anything you want using the slimmest of firewall permissions.

You might also want to read another post of mine: Forward and reverse proxies explained. It’s not required reading for this post, but you might find it helpful or interesting if you like this post.

Proxies

We’re going to start with proxies since they’re the easiest thing to explain and require no other conceptual dependencies.

A proxy is a host that can connect to other hosts on a client’s behalf (instead of the client making a direct connection to those other hosts). We will call these other hosts “upstream hosts”.

One of the most common tricks when jailbreaking an internal host (in the customer’s datacenter) is to create an external host (in your datacenter) that is a proxy. This is really effective because the customer has no control over traffic between the proxy and upstream hosts. The customer’s firewall can only see, manage, and intercept traffic between the internal host and the proxy, but everything else is invisible to them.

There are two types of proxies, though: forward proxies and reverse proxies. Both types of proxies are going to come in handy for jailbreaking our internal host.

Forward proxy

A forward proxy is a proxy that lets the client decide which upstream host to connect to. In our case, the “client” is the internal host that resides in the customer datacenter that is trying to bypass the firewall.

Forward proxies come in handy when the customer restricts which hosts you’re allowed to connect to. For example, suppose that your external host’s address is external.example.com and your internal host’s address is internal.example.com. Your customer might have a firewall rule that prevents internal.example.com from connecting to any host other than external.example.com. The intention here is to prevent your machine from connecting to other (potentially malicious) machines. However, this firewall rule is quite easy for a vendor to subvert.

All you have to do is host a forward proxy at external.example.com and then any time internal.example.com wants to connect to any other domain (e.g. google.com) it can just route the request through the forward proxy hosted at external.example.com. squid is one forward proxy that you can use for this purpose, and you could configure it like this:

acl internal src ${SUBNET OF YOUR INTERNAL SERVER(S)}

http_access allow internal
http_access deny all

… and then squid will let any program on internal.example.com connect to any host reachable from external.example.com so long as the program configured http://external.example.com:3128 as the forward proxy. For example, you’d be able to run this command on internal.example.com:

$ curl --proxy http://external.example.com:3128 https://google.com

… and the request would succeed despite the firewall because from the customer’s point of view they can’t tell that you’re using a forward proxy. Or can they?

Reverse proxy

Well, actually the customer can tell that you’re doing something suspicious. The connection to squid isn’t encrypted (note that the scheme for our forward proxy URI is http and not https), and most modern firewalls will be smart enough to monitor unencrypted traffic and notice that you’re trying to evade the firewall by using a forward proxy (and they will typically block your connection if you try this). Oops!

Fortunately, there’s a very easy way to evade this: encrypt the traffic to the proxy! There are quite a few ways to do this, but the most common approach is to put a “TLS-terminating reverse proxy” in front of any service that needs to be encrypted.

So what’s a “reverse proxy”? A reverse proxy is a proxy where the proxy decides which upstream host to connect to (instead of the client deciding). A TLS-terminating reverse proxy is one whose sole purpose is to provide an encrypted endpoint that clients can connect to and then it forwards unencrypted traffic to some (fixed) upstream endpoint (e.g. squid running on external.example.com:3128 in this example).

There are quite a few services created for doing this sort of thing, but the three I’ve personally used the most throughout my career are:

  • nginx
  • haproxy
  • stunnel

For this particular case, I actually will be using stunnel to keep things as simple as possible (nginx and haproxy require a bit more configuration to get working for this).

You would run stunnel on external.example.com with a configuration that would look something like this:

[default]
accept = 443
connect = localhost:3128
cert = /path/to/your-certificate.pem

… and now connections to https://external.example.com are encrypted and handled by stunnel, which will decrypt the traffic and route those requests to squid running on port 3128 of the same machine.

In order for this to work you’re going to need a valid certificate for external.example.com, which you can obtain for free using Let’s Encrypt. Then you staple the certificate’s public key and private key together to generate the final PEM file that you reference in the above stunnel configuration.
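
Concretely, with Let’s Encrypt’s certbot client that might look something like this (the paths below assume certbot’s default layout; adjust them to match your setup):

$ certbot certonly --standalone --domain external.example.com
$ cat /etc/letsencrypt/live/external.example.com/fullchain.pem \
      /etc/letsencrypt/live/external.example.com/privkey.pem \
      > /path/to/your-certificate.pem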

So if you’ve gotten this far your server can now access any publicly reachable address despite the customer’s firewall restriction. Moreover, the customer can no longer detect that anything is amiss because all of your connections to the outside world will appear to the customer’s firewall as encrypted HTTPS connections to external.example.com:443, which is an extremely innocuous type of connection.

Reverse tunnel

We’re only getting started, though! By this point we can make whatever outbound connections we want, but WHAT ABOUT INBOUND CONNECTIONS?

As it turns out, there is a trick known as a reverse tunnel which lets you tunnel inbound connections over outbound connections. Most reverse tunnels exploit two properties of TCP connections:

  • TCP connections may be long-lived (sometimes very long-lived)
  • TCP connections must necessarily support network traffic in both directions

Now, in the common case a lot of TCP connections are short-lived. For example, when you open https://google.com in your browser that is an HTTPS request which is layered on top of a TCP connection. The HTTP request message is data sent in one direction over the TCP connection and the HTTP response message is data sent in the other direction over the TCP connection and then the TCP connection is closed.

But TCP is much more powerful than that and reverse tunnels exploit that latent protocol power. To illustrate how that works I’ll use the most widely known type of reverse tunnel: the SSH reverse tunnel.

You typically create an SSH reverse tunnel by running a command like this from the internal machine (e.g. internal.example.com):

$ ssh -R "${EXTERNAL_PORT}:localhost:${INTERNAL_PORT}" -N external.example.com

In an SSH reverse tunnel, the internal machine (e.g. internal.example.com) initiates an outbound TCP request to the SSH daemon (sshd) listening on the external machine (e.g. external.example.com). When sshd receives this TCP request it keeps the TCP connection alive and then listens for inbound requests on EXTERNAL_PORT of the external machine. sshd forwards all requests received on that port through the still-alive TCP connection back to INTERNAL_PORT on the internal machine. This works fine because TCP connections permit arbitrary data flow both ways and the protocol does not care if the usual request/response flow is suddenly reversed.

In fact, an SSH reverse tunnel doesn’t just let you make inbound connections to the internal machine; it lets you make inbound connections to any machine reachable from the internal machine (e.g. other machines inside the customer’s datacenter). However, those kinds of connections to other internal hosts can be noticed and blocked by the customer’s firewall.

From the point of view of the customer’s firewall, our internal machine has just made a single long-lived outbound connection to external.example.com and they cannot easily tell that the real requests are coming in the other direction (inbound) because those requests are being tunneled inside of the outbound request.

However, this is not foolproof, for two reasons:

  • A customer’s firewall can notice (and ban) a long-lived connection

    I believe it is possible to disguise a long-lived connection as a series of shorter-lived connections, but I’ve never personally done that before so I’m not equipped to explain how to do that.

  • A customer’s firewall will notice that you’re making an SSH connection of some sort

    Even when the SSH connection is encrypted it is still possible for a firewall to detect that the SSH protocol is being used. A lot of firewalls will be configured to ban SSH traffic by default unless explicitly approved.

However, there is a great solution to that latter problem, which is …

corkscrew

corkscrew is an extremely simple tool that wraps an SSH connection in an HTTP connection. This lets us disguise SSH traffic as HTTP traffic (which we can then further disguise as HTTPS traffic by encrypting the connection using stunnel).

Normally, the only thing we’d need to do is to extend our ssh -R command to add this option:

ssh -R -o 'ProxyCommand /path/to/corkscrew external.example.com 443 %h %p' …

… but this doesn’t work because corkscrew doesn’t support HTTPS connections (it’s an extremely simple program written in just a couple hundred lines of C code). So in order to work around that we’re going to use stunnel again, but this time we’re going to run stunnel in “client mode” on internal.example.com so that it can handle the HTTPS logic on behalf of corkscrew.

[default]
client = yes
accept = 3128
connect = external.example.com:443

… and then the correct ssh command is:

$ ssh -R -o 'ProxyCommand /path/to/corkscrew localhost 3128 %h %p' …

… and now you are able to disguise an outbound SSH request as an outbound HTTPS request.

MOREOVER, you can use that disguised outbound SSH request to create an SSH reverse tunnel which you can use to forward inbound traffic from external.example.com to any INTERNAL_PORT on internal.example.com. Can you guess what INTERNAL_PORT we’re going to pick?

That’s right, we’re going to forward inbound traffic to port 22: sshd. Also, we’re going to arbitrarily set EXTERNAL_PORT to 17705:

$ ssh -R 17705:localhost:22 -N external.example.com

Now, (separately from the above command) we can ssh into our internal server via our external server like this:

$ ssh -p 17705 external.example.com

… and we have complete command-line access to our internal server and the customer is none the wiser.

From the customer’s perspective, we just ask them for an innocent-seeming firewall rule permitting outbound HTTPS traffic from internal.example.com to external.example.com. That is the most innocuous firewall change we can possibly request (short of not opening the firewall at all).

Conclusion

I don’t think all firewall rules are ineffective or bad, but if the same person or organization controls both ends of a connection then typically anything short of completely disabling internet access can be jailbroken in some way with off-the-shelf open source tools. It does require some work, but as you can see with the associated holepunch repository even moderately sophisticated firewall escape hatches can be neatly packaged for others to reuse.