Seeing Through The Deception - A Model for Detecting High Interaction Honeypots in the Wild

Name: Seeing Through The Deception - A Model for Detecting High Interaction Honeypots in the Wild
Uploaded: 2021-10-15
Duration: 45 min 9 s
Description: [NOTE: upload is missing some audio at the beginning of the presentation - please accept our apologies from the production crew] Session #3 Starts: 11:10, 40 mins in Fletcher Hall Seeing Through The Deception - A Model for Detecting High Interaction Honeypots in the Wild Presented by: Jason Pittma

BSides RDU · 2021 · 45:09 · 55 views · Published 2021-10

Speakers

Jason Pittman

Tags

Category: Research

Research: Empirical Research, Technical Deep-dives

Style: Talk

Mentioned in this talk

Tools used

Cowrie, OpenSSH

About this talk

[NOTE: upload is missing some audio at the beginning of the presentation - please accept our apologies from the production crew]

Session #3. Starts: 11:10, 40 mins, in Fletcher Hall. Seeing Through The Deception - A Model for Detecting High Interaction Honeypots in the Wild. Presented by: Jason Pittman

A honeypot is an intentionally vulnerable computing system designed to deceive attackers into revealing their tools and techniques. The objective of deceiving attackers is either to develop new countermeasures or to distract them from targeting production systems. But do honeypots work? Are they effective in these roles? The honest answer is that no one knows. The reason no one knows is that honeypot technology has a hidden bias: it only captures attacks directed at it. This bias might be self-evident, but a critical side effect might not be so obvious. Spitzner introduced honeypots in 1999, and within five years research appeared asserting a method to detect and attack them. There are two reasons underpinning the significance of such a discovery. First, defenders deploy enough honeypots that adversaries have a vested interest in understanding how to attack them. Second, and less obvious, attackers must have a reliable methodology to detect honeypots. To that end, we don't know how many attackers don't attack honeypots. Think about it: do you know how an attacker might detect your honeypot or differentiate between deceptive and legitimate systems on your perimeter? After all, attackers have a vested interest in understanding how to detect honeypots in the wild. They don't want to waste time. They don't want to increase risk without raising the potential payout. They want to maintain confidentiality of their tools and techniques. Oddly enough, existing research is not clear on what network or system attributes reveal a high interaction honeypot. Consequently, no one knows to what extent attackers may be detecting and avoiding honeypots.
At best, this limits the ability of researchers to develop more robust honeypots that can evade detection. The worst case, however, is that it calls into question the legitimacy of the technology overall. Accordingly, in this talk I provide a potential model for honeypot detection along with the results of scanning 59,392 IP addresses during a series of validation experiments. The findings demonstrate a set of characteristics usable as identification criteria.

-- Jason Pittman

Dr. Jason M. Pittman is a collegiate faculty member at University of Maryland Global Campus, where he serves in the School of Cybersecurity and Technology. He recently served as an associate professor at High Point University in the computer science department. Previously, he was at California State Polytechnic University (Pomona) and Capitol Tech. University. Pittman holds a Bachelor of Arts in English Literature with a secondary in Biology from Malone College. He received his Master of Science in network security and Doctor of Science in information assurance from Capitol College. His areas of expertise are cybersecurity and artificial intelligence. His research interests include secure cloud architectures, artificial immune systems for cyber network defense, and information privacy. He brings with him ten years of experience at other academic institutions as well as more than 15 years of industry experience. The majority of these years have been spent at tech-focused startups, most notably as Vice President of Security Research and Development at Silent Circle. He is an active scholar with over a hundred books, essays, journal articles, invited lectures, and conference presentations collectively in his portfolio.

https://bsidesrdu.org/session-3
https://youtu.be/FlM4pTiE5cs

Transcript [en]

well what the hell are they so i'm going to give you my little definition if you catch a little bit of my bio my undergrad is actually split between english literature and biology so i wanted to be a field scientist but i got burned out so i switched to english literature i graduated undergrad and my mom said to me who's a librarian mind you what are you going to do with an english degree i'm going to cherish it look at me now mom look at me so words are important to me so i'm not going to assume that you don't know what a honeypot is but i'm going to tell you what my definition definition is so that

when we're operationalizing these concepts you can understand where i'm coming from i wanted to be a field biologist because i love observation i love watching if you hang out with me you'll notice that i stare a lot it's not because i'm trying to be intimidating i'm not an intimidating person i don't want to be but i like watching how people behave and how things behave because what things are and what people are is very important that's what makes them them and not something else and so to me a honeypot is a service or a system we deploy to lure behavior into us why why would you want to do that well primarily you want to study it

right really if you think anybody here loved going to the mall and watching yeah well the pandemic ruined it [ __ ] yeah honey pots i love them because that's the digital 21st century version of mall watching right you really think about it you're luring in behavior so you can watch it now most of us my profession is in academia for the most part and so i use honey pots to study tools techniques behaviors of adversaries i i would argue that that's mostly what people use honey pots for at this point however we do have the ability to deploy it as a deceptive control the idea being that i'll lure the adversary over here so that they don't

attack over here anybody run a honeypot in production yeah i did it one company um i guess it happens but here's the crazy thing about honey pots is nobody knows other than those of us that will be honest and transparent on this path to glory of how many there actually are now that's what a honey pot is that's why we use honey pots so how do they work this is where the semantics become important i'm going to primarily talk about what i refer to as high interaction the polar opposite that is low interaction completely makes sense right high low you have medium you have honey tokens you have hybrids you have honey nets all this crazy stuff that's not

what i'm talking about what i'm going to try and do is give you base concepts and ideas so that you can take this and apply it in different contexts that's my job today and so i'm interested in high interaction honeypots what i mean by that is i want to lure you in and let you do whatever you want to do and i need you to do whatever you want to do for as long as possible so that i can maximally study your behavior if that becomes limited in any way if i limit a blue team i'm not fulfilling my purpose my honeypot is flawed now low interactions for those of you that might be curious

but not aware of this really are just connection watchers a low interaction honeypot really just wants things to connect to it so it can track the connection itself not the behavior remember i come from the liberal arts behavior is important to me observation's important to me as a field scientist so speaking of behavior what do you guys think would make a good high interaction honey pot a good high interaction honey pot because a good high interaction honey pot is not a bad high interaction honey pot is it the word's different it's not a medium interaction honey pot and it's not a low interaction pot so that must mean something what are those characteristics i'll go

back to you since you were so brave the first time what do you think would make a good high interaction honeypot a good high interaction honeypot

um okay okay anybody else or am i still calling on people i'm a professor by trade i'll call you all day long yes my man right okay so i'm specifically interested in ssh and so if i have a high interaction honeypot that is mimicking ssh it better behave like ssh i should not be able to tell the difference right because if i can that's bad anybody else ideas for characteristics yes doesn't look like a honeypot it should be sticky now and i mean that because we can actually quantify that one of my research teams two years ago we published some research in which we were able to operationalize that variable of time and stickiness as

what's called sojourn time and this is why you being in the honeypot maximally matters if you connect and disconnect it it's a waste of a socket i need you to run around and run amok which is a whole set of can of worms but we'll get to that shortly anybody else any ideas yes lucrative right i mean why do you why does winnie the pooh stick his face in the honeypot and get stuck repeatedly because honey tastes good dude like right who in here doesn't like honey all right i'm gonna give you my definition these are great ideas it must be discoverable right so if i'm exposing ssh but as a honeypot i need you to be able to find it

if you can't find it you can't connect to it if you can't connect to it you can't behave within it right it has to be accessible so i actually have to let you connect as if it's a real ssh connection and it better behave like that or somebody's like something's up that's weird it's not doing what it's supposed to do and last but not least it must be interactable you have to be able to do what you would normally do i would assume most of us have some experience with ssh in just a normal sysadmin type role right anybody use telnet can i have your ip address listen i'm before lunch if you don't laugh at my jokes i'm not letting you go

i'll talk non-stop i'll filibuster this [ __ ] i'm not your typical academic right i get it dude it's all right okay so that's honeypot now let's talk about detecting and that's the rest of the run for us okay as a researcher why would detecting honeypots be important and these are all bifurcated there's two sides to each one one of these so i'll tell you this the idea of detecting a honeypot and how to do it is not new i did not invent this this is one of the beautiful things about science all we have to do is find the right shoulders to stand upon we only have to find the right heroes to follow to the dragon

it makes your job way easier you just have to pay attention to be observant as a researcher i want to know if my honeypot is detectable because if it's detectable as a honeypot you're not going to connect you're not going to interact that's not good for me if i'm studying something on the other hand i want to know that when i'm going out and doing some type of let's say offensive research that what i'm connecting to is not a honeypot because believe it or not students do crazy [ __ ] and in a lab they'll set up a honeypot instead of a real server and they'll be like oh you can't hack me this is why i learned real quick not to

place bets right like you think oh man i'm hot [ __ ] i'm the professor i'll i'll pop your box and i'll win 20 bucks not that i would ever take 20 bucks off a student i'd take a beer though for sure right they get clever okay so what about for adversaries what if you're red teaming what if you're doing a pen test what if you're just curious like does anybody ever get curious and just scan the internet i did i did and i'll tell you about it in a minute but if you're an adversary and you're motivated to attack a system don't you want to know that that's the actual system you intend to attack not a honeypot

like that might be good to know and so this this research is motivated upon a decent albeit small base related work is very important to me as a scientist this is what's crazy i'll blow your mind if you had to guess this is a blind guess i'm assuming on most everybody's part do you think i'm talking about the scientific literature do you think there's hundreds of studies on detecting honey pots do you think there's thousands do you think there's like five anybody got a guess five anybody else it's non-zero and it's not negative hey man loopholes dude right five's a really good guess 20. the entire scientific field of detecting honey pots consists of approximately 20

20 studies i'll further blow your minds you i hope i am you tell me if i'm not act like you are make me feel good of the 20 there are two that are the nexus of that graph of citation that's it two and they're all within the last 10 years there's a span of time from like 2010 to 2020 there's not much before and there's not much after until now now yes please correct correct in journals conferences so for that's a great question great segue thanks if we take the community or sometimes what i refer to lovingly as the subculture that is the hacker cons how many do you think there are there and these are exclusive there's no

overlap here which is an interesting side tidbit your guess of five is almost spot on now this is also one of my motivational missions here is i'm going to try and give you my view of it as a as a legit academic researcher and try to have a dialogue with you so that we can bridge this because the fact that those two families are that small and exclusive they don't know each other but they have overlap and concept is bad for the community that should never happen we should all be working together with our skill sets and views and perspectives and pushing this forward i'll tell you why at the end i'll tell you why i did

it was a great question thank you so before i move on i'll just say this my work's built on prior work right there's a few things that are new in here that i'll tell you when they're new if not assume that i'm citing something if you want citations i'll give you my references that's not a problem but why can't i leave this alone because it's curious to me as a scientist there's a 10-year span with 20 studies five subculture pieces of research and that's it this is what i would refer to and what i would push to you especially in this new beginner mindset right that when something catches your interest that that's you talking to you

that's the spirit mercurio pay attention to that it's going to lead you somewhere fruitful the other thing is honeypots have an enormous amount of potential they're an underutilized technology especially for researching whether that's me in an academic setting or us as a community doing our research there's really no distinct difference you don't think you're a scientist but you are you're following the scientific method you just may not be aware of it that's the beauty of science it doesn't care if you're aware of it it still exists right they have a lot of potential one of the things i'm interested in and what i think this could head towards if we fix these flaws that i'm going to

point out to you in detecting them has anybody ever it's this big marketing thing and i don't mean the marketing thing because it's pet peeve of mine it really gets me angry of an artificial immune system honey pots will play an indelible role in that why do we need an artificial immune system what's going to take care of the starship when we're traveling to the edge of the galaxy and we're all in hypersleep not you you're in hypersleep you're gonna trust an ai to do it that's a bad [ __ ] idea we need technology that we can understand and control and implement that will work the way we wanted to and there's good analogs to be mapped

between the way a honeypot could work for us in detecting things passively and how a mammalian specifically a mammalian immune system works actually i like the plant model better but that's a whole other presentation the other thing is it's hard who in here wants to work on something easy i don't think anybody in here wants to do the easy thing that's not why we're here we're here to slay dragons it should be hard that makes it meaningful plus i think they're pretty cool honeypots okay so here's what i did i took a model from one of the scientific studies dang et al um pretty cool paper sometimes if you've read any scientific literature there's a bad habit of people

leaving out critical parts and i'm here to tell you somebody that's had to go through peer review endlessly for the last 10 years it's not always your choice as the author there's a lot of editorialing or editorializing if you will of what gets put into a paper when it's published sometimes things get cut so i'm not blaming dang et al for this but they presented a model that made sense to me but it was incomplete and what they left out was the part that i'm really going to start talking about now which is these characteristics what makes a high interaction honeypot emulating ssh a high interaction honeypot that's emulating the ssh and how can those

characteristics exist such that it's not able to be differentiated from a real let's say openssh server and so that's what i looked at specifically i looked at cowrie which is probably one of the best well-known not best honeypot for ssh specifically and i looked at openssh and so the first thing i did because i tend to be theoretically oriented um this is the humanities and and the liberal arts in me right so i tend to think in concepts first and build models so that then i can rig up validation experiments and test those models i scan a lot of ips okay there's of that 20 there's two studies that scanned the internet

one of those studies quantified that and it was 8.4 million endpoints they scanned i'll talk about that at the end because of course it's important for us to calibrate our work i have results but where do those results situate within this continuum of research but i scanned about 60 000 ip addresses across one provider anybody live in triad piedmont i live in winston-salem and so our main isp there is north state and i scan their entire backbone not once not twice but thrice three three points right triangulation enough side note i remember when i first got into the community i went to defcon it must have been like defcon 10. and i was blown away

by going and watching and listening to fyodor talk about nmap and how he had got fail banned and repeated cease and desist letters from his isp for scanning i scanned three times with no limits and i was wait i was really hoping i'd get a letter or an email i didn't nobody gives a [ __ ] anymore it's so pedestrian a port scan but there's so much we can learn by the way all of my raw scan data is in the github repo that's associated with this presentation so if you want to go look at that you'll see everything that's in there there's all kinds of cool stuff i mentioned that because one it's

contextualized for the work but north state because it serves primarily residential communities constrains this work a little bit i took the ip information munged it which i'll talk about in my tool chain in a second and then i put it through this detection model some of that well i'll say this in the beginning it was manual i like to calibrate my instrumentation manually first and then automate pieces that make sense to automate and then i did a series of validation experiments i did three for triangulation two of them were in a closed isolated lab and then the third one was against those ip addresses here's how i did it trusty nmap did a full connect scan with os

fingerprinting and just let it go it took a while it took a while man it took a while and i have pretty decent internet right what i found is those endpoints don't i swear some of them have been on dial-up uh tcpdump i didn't capture traffic for all the scans because that's a lot of pcap but i have one complete set of pcaps those are not in the github repo because they're far too large if anybody ever wants to look at them i'll give you my email at the end i'm happy to share them i just can't host them there then nmap2csv i like comma separated values nice and neat to sort and search

and do stuff with except it had one problem for me uh i didn't want to look at empty results i don't care that a node was there with no ports that doesn't help me for this right so i had to actually modify that instrument i have a pull request open with the original author i don't know if he's active anymore but i have a updated fork in my github and then a lot of skull sweat which i don't think any of us are afraid of are we sometimes you just gotta get in and tinker perfect this is the model yes so basically what this comes down to is i have a set of targets that's s i have

a set of characteristics that's what we're discovering together i gave you my definition of some and then what i want to do is generate a set of results that are binary zero one either it is or it's not a honeypot now this is where you have to be careful i am never going to tell you that i am certain something is what it is what i'm going to tell you and what my model produces is let's let's call it a likelihood or a percent similarity that's what i'm looking for okay that's how these characteristics are rigged up to work so real quick science always has constraints and we ought to be honest about our assumptions and our

limitations one it's blind i have no idea what's on north state's network other than what i have i could have gotten banned instantly and this would have been a real short talk right you don't know what's going to happen in the wild that's one of the cool things in one way but it's blind it's also remote all this work was remote that's important i'm going to say it's passive and active and this is one of the conceptual flaws i'll talk about in a little bit after the results about honeypots um a lot of the literature that exists both the community literature and the scientific literature works on detecting honeypots by actually trying to log in logging in and

doing stuff looking at kernel modules looking at directory structures looking at what binaries are available or not available what permissions are there things of that nature that's great that's awesome that's one way to do it and it's probably a better way to do it if you want to get closer to 100 certainty i don't require 100 certainty and i sure [ __ ] don't want to log in and get trapped right i'm trying to trap the trap i don't want to fall into the trap and get trapped right make sense okay so what i'm looking for very quick simple tcp handshake one packet done i'm going to call that passive if you want to call it active i'm not going to

argue about it but that's my definition for the context of this work okay let's call it low interaction for a high interaction honeypot that's clever put that on the t-shirt dude fingerprinting i don't want fingerprints i don't want signatures a lot of the literature works off of very discrete antivirus type signatures if it's this it's that that's great as long as that is that but when that changes your signature's dead it's a waste of time what i'm interested in are universal concepts okay so what's the difference between a firewall and a router i'll give you an analogy because they can act the same way they work the same layer in osi right their purpose is different the way they

do what they do is fundamentally different and so if you're able to talk about that as a concept and then rig it up into a model and then apply that model as a characteristic bam that's what i'm talking about that's what i'm talking about because now you cannot change something that breaks my detection i will always be able to detect you as you we've all been wearing masks for like two years now right and at first it was really hard to differentiate between people because we don't have the physical clues but i think we've all tell me if i'm wrong adapted somewhat to being able to differentiate people even though half your face is hidden

right that's concept that's what i'm talking about okay oh so what could we use we talked about what you think some characteristics are what do you think i actually did any brave heroes left on the road to glory i guessed yes yes i started with a priori guesses about what do i think it ought to be what would it reveal to me with minimal probing such that i can then operationalize that into the model i really look for some funny cartoons right i mean you got to have them uh this one really made me giggle it did man i don't know what to tell you so this is what i operationalized to begin with this is not the end

okay this is not the complete set i'll give you an example of a bad guess because i think scientists in particular are bad about telling you when they're wrong one of the first things i thought about was time if i have cowrie and i make a connection attempt to it and i have openssh and i make the equivalent connection to it right my guess was i bet cowrie takes longer why why would i guess that it's not written in the same code it's not the same language right cowrie is python written on top of twisted conch openssh is c driven it better be faster okay well jason didn't think about the internet and the internet being the equalizer of

time right so that was a hidden constraint that i didn't think about in that guess so i'm telling you right now you can try this if you want your mileage may vary but i sat for an entire day making connections even within my lan lab right not across the internet and time does not work there's too much variance in how long things take so off the list connection what do i mean by that what i'm interested in what i observed very quickly was that cowrie and openssh do not behave the same way when the connection is initiated they both do tcp handshake because they're tcp right and that's governed by the kernel for the stack there's no

variance there but above that when the software's loading it and managing it and interacting with it there's big differences so right off the bat i noticed that the protocol banner that ssh service sent back to you so that the client and the server then know they're on the same version they don't act the same way now you won't see that in a client you got to see that in a packet capture but it's clear as day it stood out i'm like oh eureka i even texted my wife i'm like dude i did it eureka [ __ ] yeah then i thought well that's only one point like i could probably have a lot of false

positives so state here what i'm interested in is what happens next because i'm going to chain these on a conditional probability so if that protocol banner mismatch happens that's true what next could be true so that i'm more reasonably sure that that is a honeypot or not state so here what i'm looking at is how that connection after it's established is managed and what's going on well what happens next in an ssh connection we have the protocol it matches client and server can talk because they're on the same version well now we got to negotiate crypto right do i speak the same algorithms who in here thinks cowrie and openssh speak the same algorithms don't be shy because i guessed yes why

wouldn't they wrong wrong they're different perfect two points how about behavior now this is where things get fuzzy because remember we don't want to log in and tinker we don't want to log in and tinker but what if oh god what if what if i just hit enter i don't type a username i have my third point does this make sense now i have three characteristics in my c-set i'm ready to go if you're curious here's all the blocks belonging to north state that are public that i scanned and that's about 60 000 ip
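
The model described up to this point (a set of targets S, a set of characteristics C, and a result per target) can be sketched roughly as follows. This is a minimal illustration and not the speaker's code: the `Observation` type, the field names, the equal weighting, and the `threshold` parameter are all assumptions made for the example. The talk only states that the checks are chained conditionally and that the output is treated as a likelihood or percent similarity rather than a certainty.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """What was observed for one target in S (field names are hypothetical)."""
    banner_mismatch: bool     # protocol banner exchange behaves unlike openssh
    algorithms_differ: bool   # offered algorithm set differs from openssh
    empty_auth_differs: bool  # reaction to empty authentication differs


# The characteristic set C, in the order the checks are chained.
CHARACTERISTICS = ("banner_mismatch", "algorithms_differ", "empty_auth_differs")


def honeypot_similarity(obs: Observation) -> float:
    """Return the fraction of chained characteristics that held, in [0, 1].

    A check only counts if every earlier check in the chain was true,
    mirroring "if that is true, what next could be true".
    """
    matched = 0
    for name in CHARACTERISTICS:
        if not getattr(obs, name):
            break  # the chain stops at the first characteristic that fails
        matched += 1
    return matched / len(CHARACTERISTICS)


def classify(obs: Observation, threshold: float = 1.0) -> int:
    """Reduce the similarity to a binary result: 1 = likely honeypot, 0 = not."""
    return 1 if honeypot_similarity(obs) >= threshold else 0
```

With the default threshold, a target is only flagged when all three characteristics hold; lowering the threshold trades certainty for coverage, which is a design choice of this sketch and not something the talk specifies.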

i'll come back to the volatility i didn't expect this especially in so much residential net space but there are a lot of systems that weren't there that were there that were there then weren't it was really volatile and i didn't i didn't anticipate that which affects the work but so i do the scans i got my data i got my characteristics and now i need to pull this data into my model and see if i actually have something here so i run nmap to csv get it into a mungeable format i write a little bit of uh code to help me i isolate out results that are only tcp 22. now as a side note i'll call this up

back to this later if you ran a honeypot because if you don't know this cowrie serves ssh and telnet which is why i asked about telnet right if you were running an ssh honeypot cowrie would you also expose 23 to try and maximize the amount of connections you could take i really thought about this because what i ended up with um yeah let's just get into this what i ended up with out of that 60 000 ips was 405 hosts running ssh okay that's workable that's workable i didn't work against this but i called it out because it's interesting to me of the 405 exposing 22 200 of them around half also exposed 23.
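
The filtering step described here (drop empty results, keep hosts with tcp 22 open, then count how many of those also expose 23) might look something like the sketch below. The column names `ip`, `port`, and `protocol`, the comma delimiter, and the sample rows are assumptions for illustration; the actual layout of the converted scan data is not shown in the talk, and the addresses are documentation-range placeholders.

```python
import csv
import io
from collections import defaultdict


def hosts_by_open_tcp_ports(csv_text: str) -> dict:
    """Map each IP to the set of open TCP ports listed for it."""
    ports = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if not row["port"]:
            continue  # host was up but had no open ports: not useful here
        if row["protocol"].lower() == "tcp":
            ports[row["ip"]].add(int(row["port"]))
    return dict(ports)


def ssh_and_telnet(ports_by_host: dict) -> tuple:
    """Return (hosts exposing 22, subset of those also exposing 23)."""
    ssh = {ip for ip, ports in ports_by_host.items() if 22 in ports}
    both = {ip for ip in ssh if 23 in ports_by_host[ip]}
    return ssh, both


# Hypothetical sample input, one row per open port.
SAMPLE = """ip,port,protocol
192.0.2.10,22,tcp
192.0.2.10,23,tcp
192.0.2.11,22,tcp
192.0.2.12,,
192.0.2.13,80,tcp
"""

if __name__ == "__main__":
    ssh_hosts, ssh_telnet_hosts = ssh_and_telnet(hosts_by_open_tcp_ports(SAMPLE))
    print(len(ssh_hosts), "hosts expose 22;", len(ssh_telnet_hosts), "also expose 23")
```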

that could be a characteristic but i didn't think it was reliable and i didn't want to work with two different protocols at the same time i just wanted to establish the model validate it and then move forward here's the operators the operation of those characteristics into the detection function now get into the results you cheer for me and then we'll all go have lunch in that order yeah i'm going to tell you about the total number of hosts i just kind of gave you a primer to that i told you about the host running ssh and then you're really curious about did i detect any honeypots across north state's backbone and if i did how many are there

that's the heart of it and then i'll give you some takeaways some interesting notes some ideas for future work um by the way if anybody ever wants to collaborate on this i'm always happy to party with people here's the protocol banner i'm talking about um a and b sometimes i like to pretend i'm an optometrist i've spent a lot of time at the optometrist so i'm used to the ab12 tests if you had to guess between a and b which one is the honeypot because one of them is you got 50 50 shot right this is a blind constraint which one do you think is the honeypot a or b oh controversy at hand yes my job is

fulfilled b b what was the clue what was the giveaway for you

it's lacking the operating system specification right but that that first part of it looks pretty normal right if you only saw b you wouldn't guess it's a honey pot would you no but right away bam this is the importance of observation especially in isolation is because you can look for specific things and you don't have distraction as a scientist we call those confounding variables that's probably not strange to you guys right we don't want confounding extraneous variables i want to look at what i'm looking at and so right away all i did was initiate the connection i haven't pressed enter yet right not at the login prompt now the way i got this the way i got this

was i made the connection the server sends its string back right and then i just take that string and i send it right back to it i just echo the string back and that's what causes that difference in behavior to me that's the geeky 21st century of having rats in a maze
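
The probe just described (connect, read the string the server sends, and echo that same string straight back) can be sketched with the standard library alone. This is an illustrative reconstruction, not the speaker's code: the function names are made up for the example, and it assumes the first line the server sends is its identification string, which is the common case but not guaranteed. What a given server sends in reply is deliberately left uninterpreted here. Only probe systems you are authorized to test.

```python
import socket


def read_line(sock: socket.socket, limit: int = 255) -> bytes:
    """Read one byte at a time until a newline or the length limit is reached."""
    data = b""
    while not data.endswith(b"\n") and len(data) < limit:
        chunk = sock.recv(1)
        if not chunk:
            break  # peer closed the connection
        data += chunk
    return data


def echo_banner(sock: socket.socket) -> tuple:
    """Read the server's first line, send it back unchanged, return (banner, reply)."""
    banner = read_line(sock)
    sock.sendall(banner)
    try:
        reply = sock.recv(4096)
    except (socket.timeout, ConnectionError):
        reply = b""  # no answer, or the server dropped the connection
    return banner, reply


def echo_banner_probe(host: str, port: int = 22, timeout: float = 5.0) -> tuple:
    """Open a TCP connection to host:port and run the echo probe on it."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return echo_banner(sock)


def banner_comment(banner: bytes) -> str:
    """Return whatever follows the first space in the banner, or '' if nothing does.

    On many servers this trailing field carries operating system or package
    information, which is the part the audience noticed was missing.
    """
    text = banner.decode("ascii", "replace").strip()
    _, _, comment = text.partition(" ")
    return comment
```

Comparing the `(banner, reply)` pairs from a known openssh server and a known cowrie instance in a lab is one way to check whether the difference in behavior is visible before applying the idea to unknown hosts.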

oh by the way before i go on because this ties to the next algorithm piece so that next burst of bytes that comes in are the set of algorithms and key exchange mechanisms that those servers can use do you see the difference yeah right i mean a is much more robust well yeah it's a real openssh server right cowrie is much more limited well and and we can make guesses right it's not using the same libraries if you will to to create that so it doesn't have access to the same crypto that led that was a clue to the next one okay now this is my client i've initiated the connection so i'm at

the same step in the flow but now i'm looking at my command line client right that last one was all done through python code that way i could capture the response and send it back what's the difference same question a or b which one's the honeypot
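
The burst of algorithm names mentioned a moment ago arrives in the server's SSH_MSG_KEXINIT message (RFC 4253, section 7.1). As a hedged sketch, assuming you have already captured that message and stripped the 4-byte packet length and 1-byte padding length so that only the payload remains, the name-lists can be pulled out and compared like this. `parse_kexinit` and `only_in_first` are hypothetical helper names, not part of the speaker's tooling.

```python
import struct

# Order of the ten name-lists in SSH_MSG_KEXINIT, per RFC 4253 section 7.1.
KEXINIT_FIELDS = [
    "kex_algorithms",
    "server_host_key_algorithms",
    "encryption_algorithms_client_to_server",
    "encryption_algorithms_server_to_client",
    "mac_algorithms_client_to_server",
    "mac_algorithms_server_to_client",
    "compression_algorithms_client_to_server",
    "compression_algorithms_server_to_client",
    "languages_client_to_server",
    "languages_server_to_client",
]


def parse_kexinit(payload):
    """Return {field name: [algorithm names]} from a KEXINIT payload."""
    if not payload or payload[0] != 20:  # 20 is SSH_MSG_KEXINIT
        raise ValueError("not an SSH_MSG_KEXINIT payload")
    offset = 17  # skip 1 byte message type + 16 byte cookie
    lists = {}
    for name in KEXINIT_FIELDS:
        (length,) = struct.unpack_from(">I", payload, offset)
        offset += 4
        raw = payload[offset:offset + length]
        offset += length
        lists[name] = raw.decode("ascii").split(",") if raw else []
    return lists


def only_in_first(first, second):
    """Algorithms the first server offers that the second does not, per field."""
    diff = {}
    for name in KEXINIT_FIELDS:
        missing = sorted(set(first[name]) - set(second[name]))
        if missing:
            diff[name] = missing
    return diff
```

A shorter or older-looking set of offers is only a hint. A legitimately hardened or outdated server could produce the same difference, so this is one characteristic to weigh alongside others.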

there got any b's just for the controversy yeah it's b yeah right well and i don't know why right that we don't we don't have to we're not charged with explaining why this is but there's a difference in the key that's being used that means the algorithm that's being used in the encryption for ssh is different right cowrie uses rsa i think openssh is using a much more modern elliptic curve that's why it's using that right funny so i got those that's when i got into well is the time different and the way they're responding i think there might be some things because there's other people in the literature that have found let's say artifacts in the key

negotiation that can be leveraged i put those aside because i just wanted something quick and easy to work with and i just got curious so i looked at time wrong a couple other things wrong and wrong i really thought i was not doing well and then actually a fortuitous mistake i hit enter thinking i was disconnecting from the session and i sent an empty password empty login i should say i misspoke empty authentication let's call it that's behaviorally different is it not so we ain't gonna break the trend now a or b which one is the honeypot b anybody else a's the honeypot i tricked you i did two b's and an a right
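
One way to make the empty-authentication behavior comparable across servers is to save the client's terminal output for each session and reduce it to a sequence of events. The sketch below assumes the capture is plain text containing the OpenSSH client's usual `password:` prompts and `Permission denied` messages; the function names and the sample text in the usage note are invented for illustration and are not real output from either server.

```python
def summarize_session(transcript):
    """Reduce captured client output to an ordered list of 'prompt' / 'denied' events."""
    events = []
    for line in transcript.splitlines():
        lowered = line.lower().rstrip()
        if "permission denied" in lowered:
            events.append("denied")
        elif lowered.endswith("password:"):
            events.append("prompt")
    return events


def prompts_before_first_denial(events):
    """Count how many password prompts appear before the server first says no."""
    count = 0
    for event in events:
        if event == "denied":
            break
        if event == "prompt":
            count += 1
    return count
```

For example, a made-up capture with three bare prompts followed by one denial message would give `prompts_before_first_denial(...) == 3`, while one that denies after every prompt would give `1`. Repeating the capture several times per host matters here, since a single session can be misleading.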

yeah now that is strange because what really caught my eye and i almost missed this i almost missed this this is why you do multiple tests right a's cowrie and for the first three it doesn't react to empty authentication it just prompts you again after three and it's always three it's always three it reacts just like openssh so it's it's like it's slow it's stuck in the honey that's a joke i'm not gonna let you go to lunch and then it acts like openssh now the other thing you'll notice too that's different is that last line what it actually says when it's denying you now here's another like partial fail for a characteristic but if anybody wants to

take this and run with it have at it observationally but i'm not able to confirm it yet have you had this experience where you're connecting to a server and you mistype or you forget and you type password password password and after the third time bang you're kind of timed out right and that's a good mechanism to have for anti-brute force mitigation right there were several times where i was able to get openssh to act the way you think it would which is after x number of logins failed it just won't take that connection on that socket anymore at the same time i had cowrie never do that but then sometimes it would i don't have

an explanation for that i didn't dive into it but that's an idea because again this is minimal interaction like how many failed login attempts do you get on ssh every day if you're watching your logs millions there's actually a quantity called [ __ ] load and that probably fits that right it's a lot so it's it's pretty innocuous as things go so we've not really disclosed ourselves as trying to detect this honeypot we talked about that oh this is something interesting too by the way uh these are the types of ssh servers in that sample of 400 and some uh dropbear's the most that's home routers and modems um there's a bunch in there that

personally i've never even heard of openssh second most and that's where the honeypots are that's where the honeypots are this is what you wanted to know did i detect honeypots yes i did well i detected endpoints that exhibit characteristics similar to a honeypot that's the right way to say that okay they had those characteristics and they behaved in those manners consistent with the model consistent with my lab validations seven of them does that seem like a lot to you guys out of 400 it's about .01 percent how would you tell well we go to the literature and remember i told you of that 20 there's two where they scanned the internet at large

and they reported their results and so uh morishita there was a japanese team that scanned the internet and i've actually reached out to them and talked to them about how they developed their signatures and their characteristics they didn't use a model like this um and so we had some good dialogue and we got into it and they scanned 8.4 million endpoints what do you think their rate of detection was because they found honeypots what if i told you it was 0.01 that's what they reported the other study um scanned if i remember correctly about three times the amount of hosts it was somewhere in the order of like 30 to 40 million endpoints they detected honeypots

their detection rate was 0.04 and so we're within this range of demonstrable evidence right and so that kind of makes me stand back and be like okay like this has a probability of being accurate i wouldn't say precise i don't need precision it's not certain but it seems to be consistent with the model that's what we're looking for right so a couple of interesting side notes north state's volatile there definitely were things that were always there across all the scans i did and like i said there were batches and what's interesting they were in batches they were in batches when i did the nmap dump to csv and i looked at it and sorted it

not only were the systems exposing 22 in groups funny enough right they were adjacent makes you curious but then the groups of hosts that were coming in and out between scans were in contiguous segments it's kind of weird now i don't know what it's like down here in durham raleigh but where i live i live a little bit out in the country and a lot of that part of winston-salem is rural it could be power there were storms coming through who knows i'm talking about the volatility of machines being there not being there there's all kinds of explanations timing's unreliable i talked about that scanning is cool there is so much if you want to get started

one thing you can do is just fire up nmap learning how to use that tool is fantastic i've actually used it way more than troubleshooting as a network engineer up until now but it's a fabulous tool mapping a network you'll learn so much last but not least honeypots are conceptually flawed um the whole idea of a fingerprint is flawed how we construct a priori a honeypot is flawed we're not luring people in and maximizing our sojourn time we're making a best guess at how to emulate an existing service and hoping that we get just enough information to make it worth our while to do that right um for those of us that have used them in

production to deflect i don't how do you demonstrate that it actually worked i don't know i don't know you're proving a negative in some cases i mean just because something hit that but didn't hit your production system does it mean that happened because it's a honeypot i don't know i'm also very curious if anybody wants to do any follow-up i got the scan results in github of where honeypots get placed in a network like how many of us put our routers at dot one and our core switch at dot two super common right credentials are usually first initial last name or some variant we have habits behavior so where do people put honeypots i don't know but we could find out

because i have the data now at least for north state and i think some of that will help us fix these flaws honeypots are real they're there so now there's at least three pieces of evidence between my work and the other two teams they exist and they're out there which i was actually kind of surprised i thought they were like the boogeyman or bigfoot people talk about them there's books written about them but who actually uses them and deals with them it's a very small community within a community within a community i think this model has value for us i think we can extend it by adding more characteristics if we're very clever which that's why i come to you as a

community we're nothing if not clever and honeypots are flawed and i'm interested in fixing them um

i don't think it's as simple as taking a real system and putting it on the internet and saying have at it attackers that has problems if that box gets popped and you don't have a way to archive that information for research what if i don't use a real box and let it have vulnerabilities how do i entice you to stay on that system so i can study you so i got to give you some honey but how do i do that so i think the answer is what we can call pattern recognition receptors this is a biological term this is how the immune system works there are and this is key active components of our immune system

that look for foreign agents and when they detect those foreign agents they don't do anything they send a signal and they go get the people that really do it and this is one of the things that's flawed about the honeypot is it itself is inherently passive it only collects what it can based on who connects to it and so if nobody ever connects to a honeypot it has no value that's immensely conceptually flawed so how do you create an active honeypot you need a receptor something that'll actively go look for attackers and bait them into it i'll leave it there because i'm not a lawyer but i think there's potential for what i would call a honey bot

that can go crawl things and look for opportunities to set itself up and lure behavior into itself and with that said if there's any more questions i'd be happy to answer them on our way to lunch you can catch me i'm here all day you can always email me that's the github where this presentation and the scan file is and again it's a raw scan that's not the wrangled munged data um if you want pcaps you can hit me up we'll arrange a way to get them because they're large they're not small and there's a lot of them uh and with that said now's the time you clap

