
My name is Joe Testa. I'm the co-founder of Positron Security, a local computer security company, and I'm here to present a new open source project called Bitclamp. It's a tool to publish documents and files into the Bitcoin blockchain, so I'm going to go into the details of that. I know not everybody is experienced with Bitcoin, so I figured I'd give a little bit of a primer. To really go into all the details of Bitcoin would easily be its own presentation, so I'm just going to give a quick primer here. Actually, show of hands: how many people have at least some
experience using Bitcoin? Okay, about half. So here's a quick primer. Bitcoin is a 100% digital currency. It uses strong cryptography to protect against fraud. It's designed to be decentralized, and I guess you can say it still is, although that's a little debatable these days. Payments can be anonymous. A common thing people like to say is that Bitcoin is anonymous; I would say that payments can be anonymous, depending on how you go about using it. And payments are permanent. That's one of the central ideas behind it: if you send a payment to somebody, it's gone forever unless they choose to send it back.
That's there forever.

So the big question is: why would you use Bitcoin to publish? Like I said before, anything you do in the Bitcoin network is permanent, so that's a pretty interesting property to have if you're trying to publish files. It's also, like I said before, semi-anonymous. As for removing anything: I said it's permanent, but technically speaking you could, in theory, remove something from the transaction log. It would just require ridiculous amounts of computing power. You'd need about 1,300 petahash per second, or something around there. That's a massive amount of computing power, so it's very unlikely that anybody
else has that amount of power. So this is the perfect situation for whistleblower-type documents. If you could figure out a way to put data into the Bitcoin blockchain, which is the transaction log, those documents are going to be available to everybody, and they're going to be there forever.

So here's an outline of what I'm going to cover. I'm going to go into some of the technicals behind Bitcoin and some of the background. There are two prior methods of publication. This idea of publishing into Bitcoin is not new, and I'm not presenting it as new; it's been done before. But the existing
methods have some pretty serious shortcomings, so I'm going to talk about that. Then I'm going to unveil a new method that I came up with myself and analyze what its properties are, and then hopefully I'll have time to talk about some of the more advanced features.

So, a little bit more background, just to understand the later parts of my presentation. Bitcoin has its own scripting language, which is pretty interesting. If I send coins to some address, there can be a script embedded in the transaction, so that the recipient has to satisfy its logic somehow. This language is stack-based. There are conditional statements, there's bitwise logic you can do,
there's arithmetic that you can do in this language, and of course there are cryptographic functions. Many of them are actually disabled for standard transactions, and I'll get into that a little bit later. If anybody's interested, there's a link to the documentation of this scripting language.

So, a little bit more background: transaction fees. I'm trying to think how to describe this, because I don't want to get too deep into Bitcoin. The people that process the transactions in Bitcoin are called miners. What they do is run the SHA-256 hash function over and over and over again to try to, what's called, solve a block of
unconfirmed transactions. The idea is that if you succeed in this, you get paid: you get a block reward. At this point the block reward is 25 Bitcoin. So that's their motivation for doing all this hashing to confirm transactions: if they succeed at it, they get a whole bunch of money. The current price is, I think, $450 per Bitcoin, so that times 25 equals quite a lot of money. The block reward happens to be the main motivator, but it's not the only one. There are transaction fees that can be attached. They aren't exactly mandatory, but the transaction system is pretty much a free
market. If I want to send money to some other person and I want to get that transaction confirmed quickly, I would have to attach a fee. I could try to cheap out and pay either no fee or a low fee, but the miners are self-interested. If they see a whole bunch of high-fee transactions, they're going to prioritize those and deprioritize the low-fee or no-fee transactions. So it's a free market: I can try to cheap out if I want, but it just means my transactions won't be confirmed very quickly. The current rate is 0.003 Bitcoin per kilobyte, which is a
pretty weird system. It's not like a traditional system, like credit cards or PayPal, where they take a percentage of what you're transferring. This is a fee per kilobyte, so it depends on how large your transaction is in bytes. It's a pretty weird thing that trips up a lot of users, because it doesn't really make sense to them. Sometimes, if you send a thousand dollars' worth of Bitcoin to somebody, you might pay a very small fee, but later on you might send a small amount of money and suddenly you've got to pay five times the fee. It doesn't
entirely make sense. It all has to do with how the transactions are structured under the
hood.

So now I'm going to go into the first method that somebody used to put data into the blockchain. A little bit of background on this first. To generate a Bitcoin address, this is basically how it's structured. You start with an ECDSA public key. You run two hash functions on it, first SHA-256 and then RIPEMD-160. You prepend a zero byte, and then you append a checksum, which doesn't really matter in this context. Then you Base58-encode it all to get that string in the middle. Now, Base58 is basically Base64, just
with some of the characters removed, and that's basically it. The punch line is that Base58 is reversible, just like Base64.

So one thing to notice is that the RIPEMD-160 output happens to be 20 bytes. If you wanted to publish 20 bytes of data, you just set your data to be those 20 bytes. Then you prepend that zero byte, append the checksum, and generate the address. Let me back up there. Basically, instead of calculating that RIPEMD-160 output, you just put your 20 bytes of data in its place, calculate the checksum, prepend the zero byte, Base58 that, and there you go: you've got a Bitcoin address that appears to be valid. The network doesn't know that it's invalid. It just
sees that address and says, okay, well, the checksum's valid, so
okay.

So pretty much you can do this over and over again. You just generate all these fake addresses and send a very tiny amount of Bitcoin to them. In a single transaction you actually have up to 256 outputs. What that means is, in a single transaction, I can say: I want to give half a Bitcoin to this person, 0.3 Bitcoin to this person, 0.03 Bitcoin to this person, and so on. So one transaction could actually have 256 of those, in theory. In one transaction you've got 20 bytes times 256, so you've got about 5K (actually, it's exactly 5K), which is kind of small to publish a
lot of files. If you've got something bigger, you just use another transaction. Just as an example, to go over this again: if you have the text ABCDEFG, you first convert it to ASCII, and there's the hex, 65 66 67. You calculate that checksum, you run it through Base58, and you send money to that address, and just keep repeating.

Somebody used this method to publish the original Bitcoin paper, the original PDF. I think it was maybe 175K. So they've used this to publish into the blockchain already. When I was making these slides, though, I couldn't find it
anywhere, which is kind of funny, because I think that says a lot about this publication method as a whole. It's like, okay, it's there, but I can't find it and I can't extract it, so how useful is it really? It might as well not be there.

So, major disadvantages to this method. You're actually destroying coins. Because you're generating bogus addresses, those addresses don't have a valid private key, so nobody can ever spend those coins again. They're just destroyed forever. A big one, and this is what the Bitcoin developers really had an issue with, is that it pollutes the UTXO database. That stands for the unspent
transaction outputs database. It's pretty much what it sounds like: the Bitcoin network maintains a list of all the transaction outputs that are unspent at any given time. That way, when a new transaction comes in, the network can say, okay, you're referencing this one, which happens to be unspent, so it passes this check. The problem with that method of generating all these bogus addresses is that those outputs will always remain in the UTXO database. An entry only gets removed from that database when it becomes spent at some point in the future, and these can never be spent. So all those entries are in the
UTXO database and they will never leave, which results in pollution over time. The Bitcoin developers had a big problem with this, and I'll get into that probably in the next slide.

Another major disadvantage is that it's very expensive. The current minimum that you can send to an address, I think, is 0.00000000 504 Bitcoin. You would have to send that minimum amount for every 20 bytes you want to publish, and that gets really expensive. On top of that, you have to pay transaction fees. So one megabyte, if you wanted to push it out with this method, would cost about 0.6 Bitcoin, which at today's rate is
$255. That's pretty expensive for just one megabyte.

So, like I said before, the developers really didn't like this UTXO pollution, so they came up with a compromise. Like I said before, the Bitcoin network has a scripting language, and one of the opcodes is OP_RETURN, which in programming-language terms is basically saying "return false". They changed this to allow anybody to put in, at first, 40 bytes, and then I think they extended it to 80 bytes, of just arbitrary data. The punch line here is that OP_RETURN always results in failure, so the output can never be spent, and the network knows this. So it knows that
it doesn't ever have to add it to the UTXO database to begin with, which prevents this pollution problem. On the other hand, you've only got about 80 bytes to work with, which is very tiny. That's one of the main problems with this: the payload is so tiny. If you wanted to reference another transaction, say to continue publishing blocks, the transaction IDs, I think, are 32 bytes, so you'd have to reference that in the header, and then you've only got 48 bytes remaining for a payload. It's just extremely inefficient. If you
want to publish anything large, it's extremely inefficient. One of the things I said before is that you've got up to 256 outputs, so one might think, okay, that's 80 bytes times 256, I guess that's maybe not so bad. No, unfortunately there's a restriction that says you can only do one OP_RETURN per transaction, so you're limited to 80 bytes per transaction, period. Regardless, people are still using this method. They're using it to publish hashes, and there's a website here, Eternity Wall, that lets you type in a box, click publish, and publish some small amount of text into the blockchain. So this is a method that's
actively used, just in a limited fashion.

So here are my goals for a new method, since these two existing methods have so many downsides. There's no way that you can use them to publish files of arbitrary size, so that's one of my goals: I want to be able to publish a file of any size. I also want to be able to search and/or extract the data. Like I said before, if you can't do that, the data might as well not be there. I also want to minimize the cost, and I want these tools to be available to everybody.
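To make the first prior method concrete, here is a minimal Python sketch of the address trick described earlier in the talk: 20 bytes of arbitrary data stand in for the RIPEMD-160 output, a zero byte is prepended, a checksum is appended, and the result is Base58-encoded. The function names are made up for illustration and are not taken from any existing tool. The talk does not spell out the checksum, so the sketch assumes the standard Base58Check rule (the first four bytes of a double SHA-256), and the zero-padding of short chunks is also an assumption.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(raw: bytes) -> str:
    """Encode bytes as Base58; each leading zero byte becomes a leading '1'."""
    n = int.from_bytes(raw, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = BASE58_ALPHABET[rem] + out
    pad = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * pad + out


def base58_decode(text: str) -> bytes:
    """Reverse base58_encode (Base58 is reversible, as the talk points out)."""
    n = 0
    for ch in text:
        n = n * 58 + BASE58_ALPHABET.index(ch)
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")


def data_to_address(chunk: bytes) -> str:
    """Build an address-shaped string that carries up to 20 bytes of data."""
    if len(chunk) > 20:
        raise ValueError("only 20 bytes fit in one address")
    # Zero byte, then the data in place of the RIPEMD-160 output.
    # Padding short chunks with zero bytes is an assumption of this sketch.
    payload = b"\x00" + chunk.ljust(20, b"\x00")
    # Assumed checksum: first 4 bytes of SHA-256(SHA-256(payload)).
    checksum = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    return base58_encode(payload + checksum)


def address_to_data(address: str) -> bytes:
    """Recover the 20 data bytes from an address built by data_to_address."""
    raw = base58_decode(address)
    payload, checksum = raw[:-4], raw[-4:]
    expected = hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]
    if checksum != expected:
        raise ValueError("bad checksum")
    return payload[1:]


address = data_to_address(b"ABCDEFG")
recovered = address_to_data(address)

# Capacity of one transaction under the 256-output figure quoted in the talk.
BYTES_PER_TRANSACTION = 20 * 256  # 5120 bytes, exactly 5 KiB
```

Nobody holds a private key for an address built this way, which is why coins sent to it are destroyed and the output stays in the UTXO database.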
So I started digging around the Bitcoin code, and I happened upon this IsStandardTx method call. This happens to be the gatekeeper for whether or not a transaction will be relayed across the network. If you want to play around with any of the various scripting language components to try to jam data into a transaction, you need to pass this IsStandardTx call. So I started by analyzing that, and I started looking at these multisig addresses, which happen to be the key to solving everything. A little bit about these multisig addresses: it's kind of an
obscure feature, but pretty interesting. It's a system where you can say, here are M public keys, and to spend these coins, any N of them are needed, and that's up to 15 keys. So you can say, here are seven keys, and in order to spend these coins, any three signatures need to be present. Or you can say, here are 11 keys, and five of these need to be present. This can be useful if you want to protect against, let's say, malware or theft. You can say, here's a two-of-two scheme. So let's say you have one key on your laptop and one key on your phone.
Let's say you lose your phone or somebody steals it, and they recover the key from it. In that case, whoever stole your phone can't spend your coins, because they would need a signature from the key that's on your laptop. So it's a separation of keys in order to allow spending. And just FYI, there at the bottom I have an example, if you wanted to spot what a multisig address looks like. They begin with the number three, versus, I think, the standard ones, which begin with a one. It just gives you an easy way to identify
them.

So I looked at this and I said, all right, there are up to 15 keys supported, so let's use them. How about we generate one legitimate key and set up a system that says only one out of these 15 is needed. One is the legitimate key, and the other 14 happen to be just data: the data that you want to publish. In this case, the keys that are presented are the raw ECDSA keys, and those are 32 bytes, so you've got 14 × 32 = 448 bytes. And this isn't just per transaction. This
is per output. So in theory (I'm going to get into the restrictions later), you can have 448 bytes times the 256 maximum outputs, and that's quite a lot.

So, the advantages. Large payload. Another big one: transactions remain spendable. The coins are never destroyed, because you always have that one legitimate key to ensure that they remain spendable, and that means the UTXO database doesn't get polluted. That's a key point right here. You only need to pay transaction fees for this, if at all, and I'm going to get into some situations
later. Now, I said before it's 448 bytes per output, but unfortunately there's a hard limit of 10,000 bytes, so your transactions need to fit within that. But I think you can get somewhere around 90 to 95% efficiency, because there's going to be overhead in terms of the data structure of your transaction and so on. So you can get pretty close to that 10,000 bytes, and maybe 9,000 bytes would be your data payload. That's pretty good.

So here are some numbers. To publish one megabyte using this method will cost up to 0.35 Bitcoin, which is about $150. Now notice I've got in italics
there "up to", and I'll get into more about that later. Contrast this with the about $255 that you are guaranteed to burn with the first method. So that's pretty good, just right off the
bat.

The fastest publication time that you could get for one megabyte would be 20 hours, and that's because Bitcoin targets about one new block every 10 minutes. So if you're publishing about 9,000 bytes every 10 minutes, it's going to take you roughly 20 hours to publish a megabyte. Now, as I kept alluding to, there's a trade-off between the fees paid and the time to publish. If you wanted to pay less and you were okay with waiting longer, you just reduce your fees. I don't have any hard data as to the effects of this, but you could lower your
transaction fees lower and lower until you got a publication time of days or weeks or even months. Depending on what situation you're in, you might not care. Like, okay, it takes me six weeks to publish this document; I'm okay with that. Remember, this isn't like your own personal cloud backup service, you know, like, "Oh, I just took pictures of me at the park, I want to store them in the Bitcoin blockchain." That's not really the use case here. This is: you want to publish something very serious, and you want it there forever so that nobody can ever delete it. So in that kind of
situation you might be okay with, all right, it's going to take five weeks to publish something and I'm going to pay, I don't know, 20 or 30 dollars. That might not be so bad.

One thing to notice is that Bitcoin isn't the only thing that you can publish into. There are other cryptocurrencies that pretty much started by copying the Bitcoin source code and making a few changes from there. Litecoin was one of the first ones, and then a lot of other cryptocurrencies started off of that. Unless the developers actively went back and disabled it, those cryptocurrencies all inherited the multisig functionality, so
you can publish into other blockchains, like Dogecoin. So here's a little bit of an analysis of that. In Dogecoin the block time is 1 minute, versus 10 minutes for Bitcoin, and another major advantage is that there are extremely low, or a lot of times even no, transaction fees at all. The disadvantage, though, is that the network has a much lower hash rate than Bitcoin, and the combined hash rate of all the miners is what specifically protects the blockchain from being reverted. So it is a bit more vulnerable to transactions being reverted, which means publications being reverted. There's also a question of will
it survive? Dogecoin was very popular about two years ago, and then in recent times it died down and sort of leveled off. So there's more of a question of whether it will survive, and in this context, where I want to publish something permanently, there's sort of a question mark there. But maybe that's okay. There are many trade-offs in this whole system, and you might be okay with that one, because those advantages are pretty massive. One megabyte can be published for just 24 cents, because the transaction fees are so low, and because that block time is 10 times faster, you can publish a
megabyte in two hours. Even more is the fact that their blocks tend to be mostly empty, so there's no contention there. What that means is you can probably flood the network with multiple transactions per block, so it's possible that you could get much less than two hours. You could potentially publish a megabyte in just a matter of minutes. Let me just check my time here.
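The capacity and timing figures quoted above can be reproduced with a little arithmetic. The Python sketch below uses only numbers from the talk (14 data keys of 32 bytes each, roughly 9,000 payload bytes per transaction, 10-minute Bitcoin blocks, 1-minute Dogecoin blocks). The function names are invented for illustration, and "one megabyte" is assumed here to mean 1,048,576 bytes.

```python
import math

KEY_BYTES = 32            # size of one raw key, as quoted in the talk
DATA_KEYS_PER_OUTPUT = 14  # 15 keys per output, minus the one legitimate key
PAYLOAD_PER_TX = 9_000     # speaker's estimate within the 10,000-byte limit
ONE_MEGABYTE = 1_048_576   # assumption: binary megabyte


def payload_per_output() -> int:
    """Data bytes carried by one 1-of-15 multisig output."""
    return DATA_KEYS_PER_OUTPUT * KEY_BYTES


def transactions_needed(file_bytes: int) -> int:
    """Transactions required if each carries about PAYLOAD_PER_TX bytes."""
    return math.ceil(file_bytes / PAYLOAD_PER_TX)


def publication_hours(file_bytes: int, block_minutes: float,
                      txs_per_block: int = 1) -> float:
    """Best-case publication time, assuming txs_per_block confirm per block."""
    blocks = math.ceil(transactions_needed(file_bytes) / txs_per_block)
    return blocks * block_minutes / 60


bitcoin_hours = publication_hours(ONE_MEGABYTE, block_minutes=10)
dogecoin_hours = publication_hours(ONE_MEGABYTE, block_minutes=1)
```

With these inputs a megabyte needs 117 transactions, which works out to about 19.5 hours at one transaction per Bitcoin block and just under 2 hours on Dogecoin, in line with the "20 hours" and "two hours" figures. Raising `txs_per_block` models the case where several transactions land in one mostly empty block.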
So here are some special features that I have implemented. By default it supports temporal encryption. If you publish a file, by default it'll use gpg2 to encrypt the file with a random key, and the key is divulged in the last block. This protects the content while you're publishing, which might be a very useful feature to have: you might not want anybody seeing what it is that's being published until it's done. On the other hand, you might be in a situation where you want to jam in as much as possible as quickly as possible, because maybe somebody's on to you and they're about
to stop you, so you can turn off encryption. That's up to you.

I began implementing another feature, a dead man's switch, and this is pretty interesting. This can be an insurance policy for some people in some very special situations. The idea is that you would publish an encrypted archive, except that the last block doesn't have the key. Then you can set up some kind of external process where this person would have to check in on some kind of regular basis, like every X hours or X days, or once a week, or something like that. And the idea is that if this person
disappears, if they get arrested or fall down a flight of stairs, this process will automatically publish the key. So it can be an insurance mechanism. This is called a dead man's switch. I don't have it fully implemented yet in the code, but it's
coming.

So a pretty big question that I think comes up would be: can Bitcoin block this? Can the developers block this? I think there are some things that they can do, but not really without changing the support for the multisig functionality. They can reduce the number of keys from 15 to something lower, but that can obviously interfere with what people are legitimately using it for. And if they did that, you can just bump up the number of outputs per transaction, so you should always be able to fill up that 10,000-byte limit one way or another. So it seems like
they probably can't block this. The thing is, they might not even care. One of the biggest things they cared about in the beginning was the UTXO pollution, which would create a pretty major problem, and this system doesn't result in that. So I'm thinking they probably don't even care. On top of that, the miners still get transaction fees, so it's not like you're abusing the network in any way. The miners are still getting rewarded for this data that you're giving them.

So, the code. I've got it available right now; there's the link to it. It's up on GitHub. It's currently in beta. I
kind of wish I had more time to finish some of the features and things like that, but unfortunately I could only get it into a beta state. So if you guys want to test it out and help me get that to a version 1.0 stable, I'm hoping that'll be done by the summer.

And here's kind of the punch line of this whole project: I don't want this to be just another nerd tool. It's a command-line Python application there on GitHub. How many end users are going to be able to figure this out? The techies can, but
normal everyday people, it's like, "What's a Python script? How do I command line?" It doesn't make any sense to them. So I don't want this to be some kind of obscure hacker tool. I want the general public to be able to do this. So I'm going to be writing a website front end for it. People will be able to go to a website and browse: here are the last 10 things in the Bitcoin blockchain, here are the last 10 things in Dogecoin. You'll be able to click and download stuff. There'll be a search bar there, so you can search based on file
name, based on file type, stuff like that. The ETA of that, I think, is also going to be the summertime. And of course you'd be able to publish on the website as well. So that's really the punch line for me for this
project.

Any questions? Yes.

So all your data is going into the extra addresses for the multisig transaction, right? So I'm just wondering what happens regarding increasing the potential for collision between addresses.

Yeah, I suppose that can happen. But those addresses, so it'll send a very, very small amount, the bare minimum of Bitcoin, to that address, but what gets sent there is not controllable except by that one legitimate key. The bogus keys in this case cannot be used to control those funds. The only thing that could be used to control the funds would be that one legitimate key. The way it works is, you start off the process at the command line: you say, here's a file I want to publish, plus some other options, and it'll say,
all right, to start it off, send X amount of Bitcoin to this address. Then you externally send that to that address, and it'll notice that, and then it starts sending the Bitcoin back and forth to itself, over and over again, every transaction, and each time it's publishing up to about 9,000 bytes or so. Every time it's sending these coins back and forth, it's using that one legitimate key that only you have, because you generated it specially for your own publication. Does that answer your question?

Not entirely, but I think that's probably a longer conversation.

I think, were you
trying to ask, like, is it possible for somebody to hijack the...

I mean, all of that, right. But some of the other thoughts that came to mind were just as far as the address space. The address space is immense, but all those addresses are being generated in a very random way right now. As you start to generate addresses based on things that are not random, I just see that at some point there could be a collision between users using this. Or, I don't know, it just seems that they're monkeying with the map. It's cool, but I just wonder if
there's an inherent flaw there, because we're taking things that are more deterministic and using them to replace something that's very random.

Yeah, I didn't include any slides on how I'm generating these multisig addresses, because I kind of thought I was going to throw way too much information at you guys as it is. But this multisig address at the bottom, the way that's generated is a hash of all the 15 keys: 14 fields which are data, and then the one legitimate key. They all get hashed in a certain way, eventually leading to that. So those 14 pieces of data may not be entirely random, because if
everybody's publishing the same file over and over again, those are always going to be the same. But everybody generates a random key themselves, so then you append all this stuff, and there's a little bit of randomness in there because of that unique key, and then all of that is hashed to get that multisig address.

Okay.

And so then funds are sent to this, and then another address is generated, funds are sent to that one, and so on.

So you actually have two legitimate addresses to be able to send back and forth?

Yeah, well, it pretty much always maintains two, and just keeps sending funds
back and forth, back and forth. And then in the transaction it says, oh, and by the way, here are all the 15 keys; that's included in there as well. Yep.

Yes?

You mentioned you're going to include an easier method of
trying...

That's another slide that I intentionally didn't include, just because I didn't want to flood people with information. So because you can publish about 9,000 bytes per block, it pretty much starts with a publication header that says, here's the file name, here's a description of what's being published, here's the file type, here's whether it's encrypted or not. Then the continuation block will say, here's a reference back to the prior transaction that has the prior data, so basically they're all chained together, and at the end there's a termination header. So basically, to search through the
blockchain, you just parse through and keep looking for this header, and then once you find that...

What's that? You have to index when you're running, to be able to search through all the transactions without having just access...

Yeah, you do have to enable the TX index. Once you enable that, you've got all the data you need, and it uses the RPC calls to pull down all the raw data and pretty much looks for that header. Right now the code doesn't support searching. That's obviously a very major feature. Right now the code just supports real time: it just
pulls down what's being published in real time, and that's just a current limitation.

Question over here?

Yeah, do you see possibly greater throughput with Segregated Witness?

Segregated Witness?

It's an improvement...

Oh, is it one of the proposed...

Yeah.

Oh, okay. Yeah, no, I actually don't know anything about that, so I don't know.

Basically, it makes all the transactions cheaper.

Okay.

It's called Segregated Witness. Basically, it separates the signature of the transaction from the transaction itself, and so miners, they still need to get the
signatures, effec...

Oh, okay. Yeah, I suppose I should look into that. Any other questions? Oh, yep.

Have you considered or looked at Ethereum for this kind of project?

I've heard about it, I've read a bit about it, but not so much.

Worth your time.

Is it? Okay, all right, I suppose I will. Any other questions? All right, thank you very much. I'll be around if anybody has any questions.
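
The chained layout described in the Q&A (a publication header, continuation blocks that each point back to the prior transaction, and a termination header at the end) can be illustrated with a small Python sketch. Everything in it is hypothetical: the `Record` fields, the `kind` labels, and the `reassemble` helper are invented for illustration and do not reflect the tool's actual on-chain format, which the talk does not specify.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Record:
    """Hypothetical view of the data carried by one publishing transaction."""
    kind: str                        # "header", "continuation", or "termination"
    payload: bytes                   # file bytes carried by this transaction
    prev_txid: Optional[str] = None  # back-reference; None on the header
    filename: str = ""               # only meaningful on the header
    encrypted: bool = False          # only meaningful on the header


def reassemble(records: Dict[str, Record], last_txid: str) -> Tuple[str, bytes]:
    """Walk back-references from the last transaction to the header."""
    chunks = []
    txid: Optional[str] = last_txid
    while txid is not None:
        record = records[txid]
        chunks.append(record.payload)
        if record.kind == "header":
            # Chunks were collected newest-first, so reverse them.
            return record.filename, b"".join(reversed(chunks))
        txid = record.prev_txid
    raise ValueError("chain ended without a publication header")


# Toy chain standing in for transactions pulled from a node.
chain = {
    "tx1": Record("header", b"hello ", filename="note.txt"),
    "tx2": Record("continuation", b"wor", prev_txid="tx1"),
    "tx3": Record("termination", b"ld", prev_txid="tx2"),
}
name, contents = reassemble(chain, "tx3")
```

In this toy model a search would scan transactions for records of kind "header", which matches the "keep looking for this header" approach mentioned above.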