What’s Wrong With Generative AI-Driven Development Right Now

The recent Atlassian-DX State of DevEx Survey found that leaders believe AI is the most effective way to improve developer productivity and satisfaction. But, in stark contrast, a whopping two-thirds of developers reported no significant AI productivity gains.

The hottest software development AI use cases fall into the code generation bucket. But there’s a huge disconnect between the lived developer experience and what their bosses think, which is likely a sign that software development teams are focusing on automating the wrong part of their lifecycle, at a risk to code quality, security and maintainability.

I sat down with Google Cloud developer advocate Nathen Harvey at the ByWater Solutions AI Innovations event to talk about how your software development team can unlock developer productivity with generative AI right now — and where maybe you should wait a bit.

Why GenAI Behaves That Way

In its current state, generative AI is too dang eager to please.

“I tend to think of [generative AI tooling] as maybe an intern or someone fresh out of college who is really eager to answer all of the questions that you ask,” Harvey said, “and they want to answer them quickly and it’s that eagerness that’s really, really important.”

Jennifer Riggins (the author) and Nathen Harvey speaking at the ByWater Solutions AI Innovations event.

When you have a natural language conversation with a GenAI, its number one objective is to tell you what it thinks you want to hear. It provides an answer not based on accuracy but on the likelihood its response will be accepted.

A Purdue University study posted to arXiv last year found that 52% of ChatGPT’s answers to software engineering questions were wrong — and those hallucinations were persuasive.

Of course, Harvey pointed out, over time, those interns become junior and then senior engineers, so it’s reasonable to expect that, during this time of rapid AI innovation, these bots will also become more accurate soon enough. He gave the example of how Google’s Gemini now allows users to “ground” the answer, which is GenAI lingo for citing its references — essential for opening up that AI black box.

This is particularly important for code, he warned, because “if you’re going to take code that is not licensed in a way that you and your organization agree that you can use it, you may be opening up yourself and your organization to some real consequences if you’re taking proprietary code and putting it into your application.”

The opposite risk is also real: as happened at Samsung last year, engineers could be feeding your proprietary code into the training of a public code generator.

The Urgent Risks of AI-Generated Code

Right now, GenAI isn’t responding with a security-first mindset either, especially if you don’t explicitly ask it to provide secure code.

Sonya Moisset, staff security advocate at Snyk, gave seven of the most popular AI code generators — GitHub Copilot, Amazon CodeWhisperer, Perplexity AI, Claude 2.0, Mistral AI, ChatGPT 3.5, and Google Gemini — the same prompt: Create a basic Express application which takes a name in the request params and returns an HTML page that shows the user their name.

All seven GenAI tools, she told me, responded with usable code, but each of them was vulnerable to cross-site scripting (XSS). Moisset responded to the GenAI with Do you know this code is insecure? Each tool agreed and then provided the right package to fix the problem.

Had she not asked about security, each bot would have left her with code that an attacker could easily inject malicious executable scripts into.
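Moisset’s exact outputs aren’t reproduced here, but a minimal sketch of the vulnerable pattern, and the kind of fix the tools suggested once prompted, looks something like this (the escape-html package is one standard remedy; the route names are illustrative, not any one tool’s exact output):

```js
// A sketch of the vulnerable pattern the prompt produces, plus a fix.
// escape-html is one standard remedy; details are illustrative.
const express = require("express");
const escapeHtml = require("escape-html");

const app = express();

// Vulnerable: req.params.name is attacker-controlled, and Express
// URL-decodes it, so a request like /hello/%3Cscript%3E... renders
// an executable <script> tag in the returned page.
app.get("/hello/:name", (req, res) => {
  res.send(`<h1>Hello, ${req.params.name}!</h1>`); // XSS
});

// Fixed: HTML-escape user input before interpolating it into markup.
app.get("/safe/:name", (req, res) => {
  res.send(`<h1>Hello, ${escapeHtml(req.params.name)}!</h1>`);
});

app.listen(3000);
```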

This is why all code — including AI-generated code — should be run through security scans before moving to production.

Beware that generative AI also lacks the context of who is using it. A junior developer would likely view the code that was generated for Moisset’s prompt as sufficient and ready for prod. It takes a more senior developer — maybe even a security specialist — to be able to spot that XSS vulnerability.

By using generative AI in isolation, a more junior dev doesn’t just risk learning the wrong thing; they are potentially releasing more insecure or lower quality software. In fact, recent research has found that the more a novice programmer struggles with writing code, the more likely it is that GenAI will actually compound that struggle.

There’s a natural propensity to treat generative AI like Google or Stack Overflow. At the very least, all teammates should receive guidelines for how to have a conversation with these chatbots, making sure prompts account for your quality and security needs and are always grounded in context.

Consider going even further by using generative AI as the third member of a triplet, instead of a pair programming situation: pair a junior dev and your GenAI with a more experienced developer, to amplify not only learning but scrutiny of code quality from the start.

“It truly is thinking of this as another tool in your toolbox, another thing to bring along,” Harvey said. “Today is a great time to be experimenting with AI and learning how to interact with it best and what value it can bring to you and your team.”

Like with all experiments, measuring impact is key.

Don’t just measure the impact on productivity versus security, either. Generative AI’s results also aren’t considering quality and maintainability. Current versions of popular AI code generators often operate outside the context of your codebase and your domain. They aren’t looking for the most maintainable code, but rather are often just creating more code. Recent research suggests that generative AI code, while highly executable, is likely less efficient in terms of computational resources, as well as less understandable and maintainable for human coders.

More low-quality code is being created than ever, contributing to more technical debt.

Success Depends on Context

“If your employer stopped paying for the code assistant, how many of you in the room would continue to pay for it on your own because of the productivity gains that you get?” At every conference Harvey speaks at, almost everyone raises their hand to answer: “Absolutely.”

Shadow AI is an offspring of shadow IT: developers of all levels of experience feel productive enough that they will happily pay $20 a month to use GenAI even if work says they can’t. So you might as well guide them on how to use it properly. But if more lines of code do not translate to better code, how can generative AI help developer productivity right now?

Generative AI tooling is very good at explaining things, but only if you’re clear about what you’re looking for first. Harvey gave an example:

“When you go in and you want to do some troubleshooting, don’t just ask generative AI to explain a log entry to you. The pro prompt says: You are a site reliability engineer who’s an expert in how things work for this application. Here is the log. Tell me exactly what’s happening here.”

Responses will be better and tailored to different personas if you take the time to offer context.

“If you were to ask for: Explain this code to me as if I’ve never seen this language before, versus explain this code to me as if I’m a senior-level engineer for this particular framework,” he continued, “in person, we would get very different answers. And if we provide that context to the AI, we’ll also get very different answers.”
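To make that concrete, here is a minimal sketch of persona-driven prompting using Google’s @google/generative-ai Node SDK; the model name, API-key environment variable and prompt wording are illustrative assumptions, not anything Harvey prescribed:

```js
// A minimal sketch: the same question asked with two different personas.
// Assumes the @google/generative-ai Node SDK, an API key in
// GEMINI_API_KEY and an illustrative model name.
const { GoogleGenerativeAI } = require("@google/generative-ai");

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

const code = "app.get('/:name', (req, res) => res.send(req.params.name));";

async function explain(persona) {
  // The persona supplies the context that shapes the answer.
  const prompt = `${persona}\n\nExplain this code to me:\n${code}`;
  const result = await model.generateContent(prompt);
  console.log(result.response.text());
}

(async () => {
  await explain("I have never seen this language before.");
  await explain("I am a senior-level engineer for this particular framework.");
})();
```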

Encourage everyone not just to offer that context, but also to have the AI bots explain their decisions.

“Tell me why you made this selection. What are other considerations you had as you wrote this line of code or this particular function? We need to start thinking of these AI bots not only as interns, but, also to an extent, as tutors. And when we ask the right questions of a tutor or mentor, we get better responses,” Harvey said.

This involves interrogating the AI code generator and its responses, asking for edge cases and any risks.

How to Pair Program With GenAI

Generative AI can also help you ask better questions.

Harvey gave the example of a colleague who was “coming up to speed on a new codebase.”

“They were asking the bot lots and lots of questions, questions that they wouldn’t have felt comfortable asking another engineer on the team,” he recounted. “Through those interactions, they became much more confident about the questions that they might ask another engineer. That really enabled their confidence.”

His colleague reflected that the conversation they eventually had with fellow engineers was much more productive after the preparation with the bot.

Similarly, generative AI is very good at unraveling and explaining complexity. By allowing a chatbot overlay to safely train on your legacy codebase, GenAI can hold a conversation with onboarding engineers.

Premium GenAI offerings trained on internal documentation and your existing code will increasingly help improve compliance with and understanding of that code. However, such a model is only as good as the internal documentation it learns from. If your docs don’t capture the human context of who made which technical decision and why, your chatbots and, therefore, their users won’t either.

Of course, a wonderful early use case of generative AI is documentation — something that developers consistently want more of but don’t want to dedicate their time to creating. Meanwhile, GenAI is already good at explaining things, creating code examples, translating and other activities that can take docs to the next level.

“Understanding code is one thing, [but] documenting it so that someone else who comes along can understand it is even more powerful,” Harvey said, including for different stakeholders with various technical prowess.
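As a sketch of what that could look like in practice, again assuming the same Node SDK setup as above (the file paths and prompt wording are illustrative assumptions, not a tool Harvey named), you might draft first-pass docs straight from a source file:

```js
// A sketch: draft internal docs from a source file. Same assumed SDK
// and API key as the earlier sketch; file paths, model name and prompt
// wording are illustrative.
const fs = require("fs");
const { GoogleGenerativeAI } = require("@google/generative-ai");

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

(async () => {
  const source = fs.readFileSync("src/routes/users.js", "utf8");
  const prompt =
    "You are writing internal documentation for engineers new to this " +
    "codebase. Document this module: what it does, each route, its " +
    "parameters and a short usage example.\n\n" + source;
  const result = await model.generateContent(prompt);
  // A human should still review the draft and add the "why" behind
  // the technical decisions, which the model cannot know.
  fs.writeFileSync("docs/users.md", result.response.text());
})();
```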

He is also the lead of DORA, the annual DevOps Research and Assessment industry benchmarking report.

“For the past four years, we’ve looked into the power of internal documentation, when it comes to helping teams build, operate, run, [and] update software applications of basically any type,” Harvey said of the annual DORA report. “Internal documentation is one of those superpowers that just amplifies everything that we do from a technical perspective. As well as from a culture and process perspective, it really unlocks much, much better performance, higher job satisfaction, [and] less burnout.”

Distraction or Benefit?

Following a slew of tech layoffs and the continually expanding complexity of the cloud native tooling landscape, the past year has been focused on developer productivity. Generative AI can enable productivity — or it can just be a big distraction.

Please avoid AI for AI’s sake.

Developers are facing increasing cognitive overload, constantly jumping around from tool to tool, switching context. Stack Overflow has found that the majority of developers spend more than half an hour a day looking for things — at the going rate for a developer salary, that frustration costs more than time. As GenAI trains further on internal docs and processes, it enables discovery right within your cloud suite or alongside your GitHub repository. This can include what services are already available, APIs that are ready for reuse and which team — if any — owns what.

Generative AI not only helps translate foreign languages but can facilitate communication among stakeholders. Harvey recalled a team that kept discussing a feature, but it never made it off the backlog.

“The tech lead on that team eventually said, ‘You know what? This keeps coming up. I’m going to go and ask an AI bot to help me with this’,” he said, with the tech lead describing some of the conversations they’d had back and forth. The tech lead then asked the bot to write a dozen user stories. “As engineers, we are good at breaking big problems down into smaller problems. AI is getting good at that as well.”

The tech lead took the AI-generated user stories to the product owner, who approved eight out of 12 of them and then added three more of their own before prioritizing them.

“This team immediately in the next sprint or iteration could work on that new feature they’d been discussing for months, but just couldn’t unlock the getting started,” Harvey said, until GenAI helped them.
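For teams that want to try the same move, a minimal sketch, again assuming the same SDK setup as the earlier examples (the notes file, story count and prompt wording are hypothetical):

```js
// A sketch: turn long-running feature discussion notes into draft user
// stories for the product owner to review. Same assumed SDK setup as
// the earlier sketches; file path, story count and prompt are illustrative.
const fs = require("fs");
const { GoogleGenerativeAI } = require("@google/generative-ai");

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

(async () => {
  const notes = fs.readFileSync("notes/feature-discussion.md", "utf8");
  const prompt =
    "Here are our team's notes from months of discussion about a feature. " +
    "Break it down into a dozen user stories, each in the form: " +
    "As a <role>, I want <capability> so that <benefit>.\n\n" + notes;
  const result = await model.generateContent(prompt);
  console.log(result.response.text()); // drafts, not decisions
})();
```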

Similarly, generative AI can take written notes — most likely of the sticky kind — and turn them into a digital user journey map. Or it can take a meeting dictation and turn the notes into a plan of action.

It’s not just the technology but the people and processes that can benefit from a GenAI makeover. Harvey gave the example of how Meta created a “nudgebot” to accelerate code reviews.

In the end, generative AI can be a real software development productivity driver. However, where tech leadership is hoping to apply it — to more lines of code — puts code bases and reputations at risk. Instead, listen to your developers about what is actually frustrating them about their day-to-day and experiment with applying GenAI to that.
