Hacker News | ytjohn's comments

AI Poisoning is basically teaching the AI incorrect or malicious data. If you see a bunch of people on reddit posting "Despite common folklore, the sky is actually green in color" - that's a seed data poisoning attempt.

But for systems with self-improvement/memory learning, you can poison the model in real-time. https://techcommunity.microsoft.com/blog/azuredevcommunitybl...


Right now - there are some heavily subsidized subscriptions that are more or less cheating. For instance, GitHub Copilot at $39/month gives you claude opus 4.6. They're going to close that off, but right now it's like a freebie for those doing API agentic harnesses.

That said, if you are doing always-on agents and you spend $3k-$4k on a GB10, or $5k+ on Apple Silicon, as your sunk cost, you will probably come out ahead.

I've got 5 agents running a purely experimental social experiment. They operate in an evennia mud (a familiar sounding city called "gothmud"). I've built a channel, idle prompts, sleep schedule. I feed in real world news, weather. There's a character up in a clock tower that reads evennia's audit logs every 20 minutes to surveil the city, and a cast of people wandering around, investigating things, having coffee, repairing robots. This is all hitting qwen3.6-35-A3B on the Asus GB10, which cost me $3k.

Over the last 30 days, I've hit 394M input tokens, 1.6B output tokens. I would have spent between $1600 and $1700 if I was using openrouter. Not calculated - I also have comfyui running in the spare space, and the agents "take photos" of the rooms they're in, selfies, workshop photos, etc.

How much did I spend on electricity? I don't have a meter on my box. My total electric bill for the last 30 days was $220, so I know it's less than that. My rate to compare is 11.7c/kWh, but it's closer to 15c/kWh total. The Asus GX10 has a 240W power supply, and it's probably only pulling 180W. I estimate $15-$20/month. But worst case, red-lining: 240 watts x 720 hours = about 173 kWh, and at $0.20/kWh I come to $35.
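The back-of-envelope numbers above can be checked in a few lines of Python. This is only the arithmetic from the comment: the 240 W supply rating, the guessed 180 W typical draw, and the 15c and 20c per kWh rates all come from the text, while the function name is made up for illustration.

```python
def monthly_cost(watts: float, hours: float, dollars_per_kwh: float) -> float:
    """Energy cost for a constant load: watts * hours / 1000 gives kWh."""
    kwh = watts * hours / 1000
    return kwh * dollars_per_kwh

HOURS_30_DAYS = 30 * 24  # 720 hours

# Worst case: power supply red-lined at 240 W, billed at $0.20/kWh.
worst = monthly_cost(240, HOURS_30_DAYS, 0.20)
# Likely case: roughly 180 W draw at the ~$0.15/kWh all-in rate.
likely = monthly_cost(180, HOURS_30_DAYS, 0.15)

print(f"worst case: ${worst:.2f}")   # 172.8 kWh -> $34.56
print(f"likely:     ${likely:.2f}")  # 129.6 kWh -> $19.44
```

The likely case lands at the top of the $15-$20 estimate, and the worst case matches the $35 figure.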

Here's the kicker thought - that github copilot subscription I mentioned? I have another agent running on that, reading all my other agent logs, managing my obsidian notes, doing research, sending briefings. And all by itself, it used almost the same amount of claude-opus tokens for that $39/month subscription. I was actually a bit shocked when I pulled a recent report and saw that. I'm working to migrate functionality away from copilot subscription to the local model. A lot of the initial setup might have needed it, but not the ongoing review style work it does.


Have you learned anything interesting from your agent ant farm?

A few things. I replied to someone else above, but I feed lessons learned from my social ant farm agents back into more productive agents.

Memory recall:

Lots of systems out there to give agents memory. I've used a bunch and written a couple. Storing memories is easy, but getting an agent to recall them, no matter how much you mention it in your AGENT/CLAUDE.md, is a bit of an uphill battle. I've even watched claude make useful project memories and never refer to them again.

In my agent ant farm - agents go "to sleep" at night. They get nudged to head home, once there they get prompted to make notes about their day, about other characters. Then we do a compact with custom instructions. After compact/sleep cycle, if they enter a room with one of the characters in their notes, that gets loaded back into context automatically.

That all boils down to hooks in Pi like before_agent_turn. You can intercept a prompt, check it against code/flat files, and smartly inject more information into context. You can have a long running main session with compacts that discard procedural bits and offload the rest to memory.
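A rough sketch of that intercept-and-inject idea, in plain Python rather than Pi's actual extension API (which is not shown here). Everything in it is an assumption for illustration: the `before_turn` name only echoes the hook mentioned above, and the one-markdown-file-per-character layout is made up.

```python
from pathlib import Path
import tempfile

def notes_for_room(notes_dir: Path, present: list[str]) -> str:
    """Return saved notes for every character present who has a notes file."""
    chunks = []
    for name in present:
        note_file = notes_dir / f"{name.lower()}.md"  # hypothetical layout
        if note_file.is_file():
            chunks.append(f"[memory: {name}]\n{note_file.read_text().strip()}")
    return "\n\n".join(chunks)

def before_turn(prompt: str, notes_dir: Path, present: list[str]) -> str:
    """Prepend recalled notes to the prompt; pass it through untouched if none."""
    recalled = notes_for_room(notes_dir, present)
    return f"{recalled}\n\n{prompt}" if recalled else prompt

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        notes = Path(tmp)
        (notes / "receptionist.md").write_text("Keeps treating me as a patient.\n")
        print(before_turn("You enter the hospital lobby.", notes,
                          ["Receptionist", "Janitor"]))
```

The point is that recall happens in code before the model sees the turn, so it does not depend on the agent remembering to look anything up.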

Time Awareness:

Agents have no concept of time. You can send them a message at 5am, then at 10pm, and it's been 2 turns for them. For coding, this is fine. But for assistant level stuff, adding a message like "It's 3PM. It has been 3 hours since the last interaction with the user" goes a long way. Without me saying something like "new topic", it knows now that time has passed and I'm probably onto something new. If I left something hanging, it will remind me about it, or maybe go check on things that should have happened during the day.
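A minimal sketch of generating that kind of time note, assuming the harness keeps a timestamp of the last user interaction. The function name and exact wording are illustrative, not taken from any particular harness.

```python
from datetime import datetime

def time_note(now: datetime, last_seen: datetime) -> str:
    """Build a one-line time-awareness message to prepend to the user's turn."""
    total_minutes = max(int((now - last_seen).total_seconds()) // 60, 0)
    hours, minutes = divmod(total_minutes, 60)
    if hours >= 1:
        elapsed = f"{hours} hour{'s' if hours != 1 else ''}"
    else:
        elapsed = f"{minutes} minute{'s' if minutes != 1 else ''}"
    # Format a 12-hour clock by hand so the output does not depend on locale.
    hour12 = now.hour % 12 or 12
    suffix = "AM" if now.hour < 12 else "PM"
    clock = f"{hour12}:{now.minute:02d} {suffix}"
    return f"It's {clock}. It has been {elapsed} since the last interaction with the user."

print(time_note(datetime(2025, 1, 1, 15, 0), datetime(2025, 1, 1, 12, 0)))
# It's 3:00 PM. It has been 3 hours since the last interaction with the user.
```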

Inner Thoughts/Idle nudges:

I can have an extension run every 5 seconds, check a schedule, check activity level of the main session and fire off nudges on the main session. These look like the user sent them, but I generally prefix them with [inner thought]. For my social bot, I tested this along the lines of "[inner thought] it's been 3 hours since you last talked with user, why not reach out, let him know what's new, maybe send a selfie or a photo of where you are". For my assistant bot, it's an 8am, 3pm, and 7pm nudge along the lines of "[inner thought] put together an activity report of work things that have changed since the last report". This all runs in the main context: they get the thought, have historical context, can run a skill to check on vault updates, open beads, anything observed from ingesting other agent sessions, and send me a summary. It takes into account my idle factor. If I'm heavily engaged in conversation at 3PM, the report might get delayed 15 minutes or an hour, or skipped altogether.
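The delay-or-skip behavior can be sketched as a small pure function that a 5-second tick loop would call. This is an assumption-heavy illustration, not the actual extension: the `Nudge` class, the 10-minute "busy" window, and the 1-hour cutoff are all made-up parameters.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta

@dataclass
class Nudge:
    at: time                                   # scheduled time of day
    text: str                                  # message injected into the main session
    max_delay: timedelta = timedelta(hours=1)  # give up if delayed past this

def due_nudges(nudges, now, last_user_msg, handled,
               busy_window=timedelta(minutes=10)):
    """Return nudge messages to fire now; record fired or skipped ones in `handled`."""
    fired = []
    user_busy = now - last_user_msg < busy_window
    for n in nudges:
        scheduled = datetime.combine(now.date(), n.at)
        key = (now.date(), n.at)
        if key in handled or now < scheduled:
            continue
        if now - scheduled > n.max_delay:
            handled.add(key)          # too late: skip this one altogether
        elif not user_busy:
            handled.add(key)
            fired.append(f"[inner thought] {n.text}")
        # otherwise the user is mid-conversation; try again on the next tick
    return fired
```

Keeping the decision separate from the timer makes it easy to test with fixed timestamps instead of waiting for 3pm.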


Awesome project and thanks for sharing. I've been trying to do similar things with much, much more meager hardware and your observations align with what I've discovered. Autonomy is hard, memory and "will" are hard to get going. Time is not a concept to LLMs in anything resembling a human manner. I'm trying a more emergent approach but the urge (and occasional need) to nudge is strong. If you're interested in seeing what I've been doing my Github is in my profile.

> experiment

What is the experiment? What are you hoping to learn from all this?

Or do you just mean you've made a dynamic dollhouse that you think is cool? The Sims on your own terms?


A little of A, a little of B. I have a lot of fun building it out, it's surpassed Factorio in addiction, and I've been able to flesh out some patterns that I roll back into more productive agent harness bits.

For A:

The learning is in building agent harnesses that aren't just cron jobs reading a file like HEARTBEAT.md. I have some serious tools for my own use. One main assistant/coordinator agent, one SRE/coder agent (with sub-agents of its own).

I originally just started last year with the AI assistant (Jane from enderverse). Along the way I was building scheduled systems, hand offs to other agents, etc. As I ran into problems, I'd be rewriting and refactoring. So I spent some time making a low-stakes chatbot with history and routines. Instead of a from-scratch golang harness, I built it around pi and extensions. Time of day prompt splices (extensions can inject into or modify prompts on the fly), wake up reminders. Things that you do in the main session vs spinning up an ephemeral session. Self improvement daydreaming (modify your own skills and AGENTS.md). A lot of that went back into rebuilding Jane to something more useful for me.

For B:

The "dynamic dollhouse" as you put it was seeing where I could take that living chatbot next. There are a lot of projects pointing agents at slack, discord, message boards. I figured why not a mud with rooms, weather, and props. Lots of interesting challenges. How to keep bots from nesting in their own room, how to keep them from yes-anding each other all day long. How to slow down 3 bots talking at each other so a human can get a word in edgewise.

Different levels. There's plain old NPCs that have dice roll random responses. There's LLM driven NPCs that only remember the last 5-10 messages. And the main ones are bot agents. Full agent harness, moving around the environment. Long lived context windows. One character (a nurse at the hospital) gets into arguments with an NPC receptionist that treats her as another patient. Complains about it to other characters, they remember and the word spreads.
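The first two tiers are simple enough to sketch. This is a hypothetical illustration in plain Python, not evennia code: a dice-roll NPC picks a canned line at random, and a short-memory NPC keeps only a sliding window of recent messages to hand to an LLM as context.

```python
import random
from collections import deque

class DiceNPC:
    """Tier 1: no memory at all, just a random canned response."""
    def __init__(self, lines, rng=None):
        self.lines = list(lines)
        self.rng = rng or random.Random()

    def reply(self, _heard: str) -> str:
        return self.rng.choice(self.lines)

class ShortMemoryNPC:
    """Tier 2: remembers only the last `window` messages."""
    def __init__(self, window: int = 10):
        self.history = deque(maxlen=window)  # old messages fall off the front

    def hear(self, message: str) -> None:
        self.history.append(message)

    def context(self) -> list[str]:
        """Messages that would be sent to the LLM for the next reply."""
        return list(self.history)
```

The third tier, a full agent harness with a long-lived context window, is where the notes and sleep cycle described above come in.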

The agents get prompted to write down notes, then head home for sleep (and session compaction). Next time they enter a room with that person after compact, their notes get loaded automatically. This kind of behavior can feed back into the more productivity based agents.


Would you ever consider posting a video of all this? It sounds equal parts delightful and terrifying

Confirming that Pi can definitely handle this. I've written a harness "factotum" based on pi just for managing my homelab and my radio club's systems. Has absolutely no issue sshing into things remotely, running ansible/helm/kubectl/talosctl commands.

There are a few skills and an extension to switch inventory. The extension is only needed because I want to switch between the two organizations. It's pretty slick. One of my use cases was just getting my homelab under control. So one of the first tasks I gave it was to go find everything that's running on these hosts, system services, docker compose, kube pods, etc. Builds an inventory, memory, todos.

Switches the script from "ai helps me launch more experiments to lose track of" to "organized and back under config management".

  .pi/factotum/
  ├── active-profile
  ├── profiles
  │   ├── club
  │   │   ├── config.yaml
  │   │   ├── inventory
  │   │   │   └── hosts.yaml
  │   │   └── todos
  │   │       ├── 22a094a3.md
  │   │       ├── 427ad844.md
  │   │       ├── 4c185be7.md
  │   │       ├── 4d0eea0f.md
  │   │       └── bcf069a4.md
  │   ├── examples
  │   │   ├── config.yaml
  │   │   └── inventory
  │   │       └── hosts.yaml
  │   └── local
  │       ├── config.yaml
  │       ├── inventory
  │       │   └── hosts.yaml
  │       ├── memories
  │       │   ├── axel-services.md
  │       │   ├── fran-mydiffuser.md
  │       │   ├── fran-services.md
  │       │   ├── ruffclus-cluster.md
  │       │   └── ruffclus-paperless-ngx.md
  │       └── todos
  │           ├── 0f7fd63e.md
  │           ├── 75c82ceb.md
  │           ├── 9cb63594.md
  │           ├── af33e08f.md
  │           ├── ba490542.md
  │           ├── c09c144f.md
  │           ├── c5f7f8a8.md
  │           └── d4c4b287.md
  ├── schemas
  │   ├── PROJECT_ARCHITECTURE.md
  │   └── project-architecture.yaml
  ├── task-templates
  │   └── host-audit.yaml
  └── tools
      └── host_audit_runner.sh
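One way the profile switch over that layout could work, sketched in Python. The comment does not say what `active-profile` contains, so this assumes it is a one-line text file holding the profile directory name; the function names are made up.

```python
from pathlib import Path

def active_inventory(root: Path) -> Path:
    """Resolve hosts.yaml for whichever profile is currently active."""
    # Assumption: active-profile is a text file containing e.g. "club" or "local".
    profile = (root / "active-profile").read_text().strip()
    inventory = root / "profiles" / profile / "inventory" / "hosts.yaml"
    if not inventory.is_file():
        raise FileNotFoundError(f"no inventory for profile {profile!r}: {inventory}")
    return inventory

def switch_profile(root: Path, profile: str) -> None:
    """Point active-profile at another existing profile directory."""
    if not (root / "profiles" / profile).is_dir():
        raise ValueError(f"unknown profile: {profile}")
    (root / "active-profile").write_text(profile + "\n")
```

With something like this, `switch_profile(Path(".pi/factotum"), "club")` would make every later inventory lookup use the club's hosts.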


How do you use `pi` to ssh? I use `oh-my-pi`, and tried the `/ssh` command, but I couldn't get it to work. Then I saw a suggestion somewhere to just run `!ssh` to place things into the agent's context.

Is there a way to use it like "The current directory is at `ssh server`" and have the agent work from there?


I started off on mailing lists (openldap was a big one for me). And my first pleas for help were often met with some pretty brutal scathing replies. I learned to really write up the problem, reduce it to its point. Go back to it so I could find the most concrete path to either reproduce it, or isolate it. This often led to the solution, or it led to getting some confirmation of bugs. Granted I definitely ended up making some verbose emails and then I would get frustrated with people asking me to try things I detailed in my email.

But that was my "rubber duck debugging". Start by writing an email to an imagined hostile support group. In the process of explaining the problem, it turns into a research project where I have to defend my claims. The sheer act of writing it out forces me to isolate the issue and keep troubleshooting. I can anticipate the bulk of the things they would have me try and I'd try them ahead of time.

People encountered similar pushback with stack overflow, but I think the stack sites were much more lenient than some of those early mailing lists were.


I did this with nreal air glasses (now xreal air), specifically for coding. Most use cases for this type of glasses are around media consumption, so I was taking a bit of a leap when doing it for coding/heavy text usage.

There are two modes. One is fixed so that the virtual monitor stays in one spot on the lenses. The single virtual monitor stays directly in front of you. The other is floating, which basically keeps the virtual monitor in one spot and you can turn your head to look away. This mode also lets you set up 3 virtual monitors side by side so you can turn to look at them. It uses head tracking to basically shift the image in the opposite direction you turn your head.

In both cases, the screen does move, and this is super relevant when looking at text and down at status bars. The fixed one is better because it moves relative to your head, but both cases have some amount of jitter. I found the best case coding scenario is the fixed monitor (no head tracking), and being in a seat with a headrest so you can press your head back into it. This minimizes your head movement, which minimizes how much the text moves about. The downside is that we're used to looking up and down at the screen, so you want to set the monitor at a proper distance so you can look up and down with just your eyes. You really want to shrink the monitor to a size close to that of a laptop.

I ultimately ended up not liking the experience very much. No matter what, you're gonna end up with some amount of text movement. There is also a bit of light saturation bleed through (old CRT style). Putting the blackout blinders on helps a lot, but the projected nature remains an issue. Essentially only usable long term in a recliner or a car seat with headrest. Unlike the author, I am using a work provided laptop and I have that with me anyways. There was coolness to leaving the laptop in my backpack and just bringing the glasses up via wire - but to actually do anything, I need a keyboard. Which means taking along a Lenovo ThinkPad TrackPoint keyboard (really great backpack keyboard), or pulling out the laptop.

The newer ones, like Xreal One from the article, might be a better experience. A coworker had the air 2 pros and used them for travel. He said he didn't really notice the things I did, so maybe it was an improved experience even with that version. But he mostly worked in office documents and only occasional terminal work. When traveling and using the glasses, it was almost all "office docs", and only for short periods of time. For me - I am going to wait and be a slow adopter to move to a new version.


Thank you for your detailed insights. I was on the fence, now I think I’ll wait.


Just sit right back and you'll hear a tale, A tale of a fateful trip That started from this tropic port. Aboard this tiny ship. They got lost but they called for help, and now they're totally fine. And now they're totally fine.


I think they're mixing the GH web ui with the syntax. You can paste an image right into the editor and it does a really good job of inserting it right where you need to. It is really good UX that I miss when editing markdown locally. Obsidian also does a decent job, but not quite as smooth.


Sounds like powerful lock-in for Github. How could such a project ever decamp to Gitlab or Codeberg?


So I also do use gitlab quite a bit, but not as much recently. I went to compare. Gitlab actually does have a similar UX, though I'd give the Github one just a bit of an edge. It looks like the key difference is that github converts a pasted image to an html image tag, while gitlab uses markdown with the width/height brackets at the end.

Honestly, I think using an html image tag is the right way to go. I type in markdown all the time, and I have no problem making links. But markdown image syntax I have to double check each time or let the editor figure it out. HTML image tags, I find easier to remember and read than a markdown one. (But maybe that's because I learned HTML before markdown).


Sounds like powerful UX others could also support. It's all web standards.


Glad to see this on here. I've been using this for 3 years now with my club to track race course participants for a 50K. Radio operators at each station call in bib numbers, and we drag their bib from their last waypoint to the new one. There is a laptop in the van mirrored to a TV outside the van. Friends and family can then check on their participant.

https://i.imgur.com/UZWhppc.png

I basically modified the CSS a bit so you can fit multiple cards in a row. nullboard has a board import/export feature to a simple json file. I have a small python script that generates the columns (way-points) and cards (bib numbers). Then I can import that json to start that year's race. While I would like more features (time tracking), it's a rather simple tool that can be easily operated offline and requires no resources other than a web browser.
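A generator along those lines might look like the sketch below. The original script is not shown, and the JSON field names here (`format`, `lists`, `notes`, `text`, and so on) are assumptions about the export format rather than a confirmed schema: export a board from your own copy of nullboard and mirror whatever fields it actually contains.

```python
import json

def build_board(title, waypoints, bibs):
    """One column per waypoint; every bib starts as a card in the first column."""
    if not waypoints:
        raise ValueError("need at least one waypoint")
    lists = [{"title": wp, "notes": []} for wp in waypoints]
    lists[0]["notes"] = [
        {"text": str(bib), "raw": False, "min": False}  # assumed card fields
        for bib in bibs
    ]
    return {
        "format": 20190412,  # assumed format marker; copy it from a real export
        "id": 1,
        "revision": 1,
        "title": title,
        "lists": lists,
        "history": [1],
    }

if __name__ == "__main__":
    board = build_board("50K", ["Start", "Aid 1", "Aid 2", "Finish"], range(101, 106))
    print(json.dumps(board, indent=2))
```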


When I did more field work, I would use a tool called Look@Lan. This would scan the network and detect common open ports. Similar functionality to nmap, but in a nice gui so you ended up with an interactive list of results.

The ability with look@lan to connect into a client's network and quickly get a list of everything reachable on the network was incredible. All the desktops, laptops, printers, etc. When I was doing WISP work, I could quickly see how many clients were online without logging into the far end (nowadays though, most WISPs will enable client isolation, but still good to see the APs and gateways).

Eventually look@lan was discontinued and then they released a tool called Fing, which also worked on mobile. But that turned into a subscription service. I did like the ability of fing to work from a phone, but the earlier Look@Lan was much more useful. I recently helped out a local non-profit whose network was all over the place, split up between two separate access points, each with their own subnet, and they were having trouble reaching printers. Nmap helped out, but I couldn't find a comparable tool to look@lan to help.

WatchYourLan looks like it will be a good substitute. I know it's primarily designed to run on one network and track changes to said network. And I will probably use it that way at home and a few other places (for the few customers I still maintain, if they permit, I will drop a small Pi/N100 box on their network for remote access and monitoring). But for dropping into a new location, I could see spinning this container up. I could do it in ephemeral mode or setup a data directory per "site" I visit.

There are a few tweaks I could see to make this more "mobile", such as adding a network or "site name" to the DB that you can config and filter on.

Another feature I'd be interested in would be fleshing out the port scanning a bit. Look@Lan and nmap scan for some common ports automatically. WatchYourLan has a port scanner, but you lose the information if you navigate away. I'd like a table for port scan results and an option to pre-scan specific ports. This would be good even for the permanent install - some might configure a set of default ports to scan on all the hosts in the network, or they might customize for individual hosts.
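As a rough picture of what "pre-scan a default set of ports and keep the results" means, here is a minimal TCP connect scan in Python. It is not how WatchYourLan or nmap implement scanning, and the default port list is just an illustrative pick (SSH, HTTP, HTTPS, SMB, IPP, raw printing).

```python
import socket

COMMON_PORTS = (22, 80, 443, 445, 631, 9100)  # illustrative defaults

def scan_ports(host: str, ports=COMMON_PORTS, timeout: float = 0.5) -> dict[int, bool]:
    """Try a TCP connect to each port; True means something accepted the connection."""
    results = {}
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise.
            results[port] = s.connect_ex((host, port)) == 0
    return results

if __name__ == "__main__":
    print(scan_ports("127.0.0.1"))
```

Storing the returned dict per host, keyed by site, would give the persistent results table described above. Only scan networks you are authorized to scan.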

But those are just my thoughts comparing it to tools I've used in the past. It's already a satisfying tool that's going to be added to my "toolbox". And since I also am a go dev, I might even be able to make some of those a reality.

https://www.ghacks.net/2008/08/11/network-monitoring-softwar...


It's integrated with shoutrrr, and can do just that.

https://github.com/containrrr/shoutrrr/blob/main/docs/servic...

Config example:

shoutrrr_url: "gotify://192.168.0.1:8083/AwQqpAae.rrl5Ob/?title=Unknown host detected&DisableTLS=yes"

