Posts Tagged ‘programming’

Java: Determine if String is a URL/URI or file

In the spirit of making a more polymorphous app, you may need to pull off this trick, as I did in a recent assignment at Berkeley. I compiled a few different ways of getting the job done:
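import java.net.MalformedURLException;
import java.net.URL;
// A string that does not parse as a URL is assumed to be a local file path.
public boolean isLocalFile(String file) {
     try {
         new URL(file);
         return false;
     } catch (MalformedURLException e) {
         return true;
     }
}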
There’s also a util for this in Android’s toolkit (android.webkit.URLUtil), though it’s not worth grabbing unless you’re specifically writing for Android.
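Another way to get the job done is to look at the URI scheme instead of waiting for an exception. This is only a rough sketch: the class and method names are made up for illustration, and treating a one-letter scheme as a Windows drive letter is my own assumption, not a rule from any spec.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriCheck {
    // Hypothetical helper (not from the original post): decides by URI scheme.
    static boolean looksLikeLocalFile(String s) {
        try {
            String scheme = new URI(s).getScheme();
            // No scheme ("data.txt"), a file: URI, or a one-letter "scheme"
            // that is really a Windows drive letter ("c:/Users/...").
            return scheme == null
                    || scheme.equalsIgnoreCase("file")
                    || scheme.length() == 1;
        } catch (URISyntaxException e) {
            // Spaces and backslashes are not legal in a URI; assume a path.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(looksLikeLocalFile("http://example.com/a.txt"));
        System.out.println(looksLikeLocalFile("file:///tmp/a.txt"));
        System.out.println(looksLikeLocalFile("c:/Users/HackR/data.txt"));
    }
}
```

Unlike the exception-based check, this version counts a file: URI as local, which may or may not be what your app wants.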
Another semi-related thing, for when your program can’t find the file you’re pointing it at:
  1. Make sure the filename is correct (proper capitalization, matching extension, etc.).
  2. Use the Class.getResource method to locate your file in the classpath – don’t rely on the current directory:
    URL url = insertionSort.class.getResource("10_Random");
    File file = new File(url.toURI());
  3. Specify the absolute file path via command-line arguments:
    File file = new File(args[0]);

In Eclipse:

  1. Choose “Run configurations”
  2. Go to the “Arguments” tab
  3. Put your “c:/Users/HackR/somewhere/10_myjava.txt.or.something” into the “Program arguments” section
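Options 2 and 3 can be combined: take the path from the command line when one is given, and fall back to the classpath otherwise. A minimal sketch, assuming a helper class I’m calling InputLocator (hypothetical) and the same "10_Random" resource name as above:

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;

class InputLocator {
    // Hypothetical helper: prefer a path from the command line, else fall
    // back to a classpath resource located relative to this class.
    static File locate(String[] args, String resourceName) {
        if (args.length > 0) {
            return new File(args[0]);
        }
        URL url = InputLocator.class.getResource(resourceName);
        if (url == null) {
            throw new IllegalArgumentException("Not on classpath: " + resourceName);
        }
        try {
            // Works for resources in a directory; a resource inside a JAR
            // has a non-file URI, which new File(URI) rejects.
            return new File(url.toURI());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad resource URL: " + url, e);
        }
    }

    public static void main(String[] args) {
        try {
            File input = locate(args, "10_Random");
            System.out.println(input + " exists: " + input.exists());
        } catch (IllegalArgumentException e) {
            System.out.println("Could not locate input: " + e.getMessage());
        }
    }
}
```

Checking for a null URL matters here, since getResource returns null rather than throwing when the resource is missing.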


Software Sec: C / C++ Buffer overflows and Robert Morris

Buffer Overflow = any access of a buffer outside of its allotted bounds
  •      over-read or over-write
  •      could be during iteration (running off the end), or direct access (pointer arithmetic)
  •      this is a general definition; some people use more specific definitions of differing types of buffer overflows
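Java checks array bounds for you, so a real overflow can’t be shown in it directly. As a rough simulation only, the sketch below treats one byte array as flat memory, with a "buffer" in the first 8 bytes and unrelated data right after it. All names and sizes are invented for illustration.

```java
class OverflowSim {
    static final int BUF_LEN = 8;        // the buffer owns memory[0..7]
    static final int NEIGHBOUR = 8;      // unrelated data starts at memory[8]

    // Unchecked copy: never checks the destination size, like C's strcpy.
    static void uncheckedCopy(byte[] memory, byte[] input) {
        for (int i = 0; i < input.length; i++) {
            memory[i] = input[i];        // may run past BUF_LEN
        }
    }

    // Checked copy: rejects input that does not fit the buffer.
    static boolean checkedCopy(byte[] memory, byte[] input) {
        if (input.length > BUF_LEN) {
            return false;
        }
        System.arraycopy(input, 0, memory, 0, input.length);
        return true;
    }

    public static void main(String[] args) {
        byte[] memory = new byte[16];
        memory[NEIGHBOUR] = 42;                      // data next to the buffer
        byte[] input = new byte[12];                 // 4 bytes too long
        java.util.Arrays.fill(input, (byte) 1);

        System.out.println(checkedCopy(memory, input));   // input is rejected
        uncheckedCopy(memory, input);
        System.out.println(memory[NEIGHBOUR]);            // neighbour was overwritten
    }
}
```

The unchecked version is the over-write case from the list above: the loop runs off the end of the buffer and clobbers whatever sits next to it.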

A buffer overflow is a bug that affects low-level code, typically C and C++, with significant security implications.

Normally causes a crash, but can be used to:
  • dump/steal information
  • corrupt information
  • run code (payload)
They also share common features with other bugs.
C and C++ are among the most popular languages (behind Java), which makes buffer overflows a major class of vulnerability. C/C++ are heavily used in:
  •      OS Kernels
  •      embedded systems
  •      HPC servers
The first widely publicized buffer overflow exploit came in 1988 from a student named Robert Morris, as part of a self-propagating computer worm that attacked fingerd on VAX machines (Morris was caught and punished but is now an MIT professor); the worm affected roughly 10% of the Internet.
In 2001, Code Red exploited a buffer overflow in the MS-IIS server and infected >300,000 machines in 14 hours.
In 2003, the SQL Slammer worm infected 75,000 machines in 10 minutes by exploiting a buffer overflow in MS-SQL Server.
In 2014, a latent buffer overflow bug was found in X11 that had been present for over 23 years.



Loaders and Linkers

A linker or link editor is a computer program that takes one or more object files generated by a compiler and combines them into a single executable file, library file, or another object file.

A loader is the part of an operating system that is responsible for loading programs and libraries. It is one of the essential stages in the process of starting a program, as it places programs into memory and prepares them for execution. Loading a program involves reading the contents of the executable file containing the program instructions into memory, and then carrying out other required preparatory tasks to prepare the executable for running. Once loading is complete, the operating system starts the program by passing control to the loaded program code.

All operating systems that support program loading have loaders, apart from highly specialized computer systems that only have a fixed set of specialized programs. Embedded systems typically do not have loaders, and instead the code executes directly from ROM. In order to load the operating system itself, as part of booting, a specialized boot loader is used. In many operating systems the loader is permanently resident in memory, although some operating systems that support virtual memory may allow the loader to be located in a region of memory that is pageable.

In the case of operating systems that support virtual memory, the loader may not actually copy the contents of executable files into memory, but rather may simply declare to the virtual memory subsystem that there is a mapping between a region of memory allocated to contain the running program’s code and the contents of the associated executable file. (See memory-mapped file.) The virtual memory subsystem is then made aware that pages within that region of memory need to be filled on demand if and when program execution actually hits those areas of unfilled memory. This may mean parts of a program’s code are not copied into memory until they are actually used, and unused code may never be loaded into memory at all.
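The same mapping idea is available to ordinary programs. As a loose illustration (this is not how a loader is implemented), Java’s FileChannel.map hands back a buffer backed by the file, and the OS is generally free to page the contents in only when they are touched. The class name and the temp-file demo are my own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedRead {
    // Maps the whole file read-only and reads one byte at the given offset.
    static byte byteAt(Path file, int offset) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer view = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return view.get(offset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a small temp file and reads it back through a mapping.
    static byte demo() {
        try {
            Path tmp = Files.createTempFile("mapped-demo", ".bin");
            Files.write(tmp, new byte[] {10, 20, 30});
            return byteAt(tmp, 1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```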


C Primer Plus

C: C v. C++, Object Orientation

C++ grafts object-oriented programming* tools onto the C language

In the 1990s many companies began using C++ for large programming projects
C++ is nearly a superset of C: any C program is (or almost is) a valid C++ program

*The object-oriented programming philosophy attempts to mold the language to fit a problem instead of molding a problem to fit the language

C: Virtues and Shortcomings of C


Virtues:
  • Powerful control structures
  • Fast (and efficient, like an assembly language)
  • Compact code (small programs)
  • Portable (more so than other languages)
Its design makes it desirable for top-down planning, structured programming and modular design.
C is especially popular for programming embedded systems


Shortcomings:
  • Use of pointers: errors are hard to trace
  • Can be difficult to follow


C: Dennis Ritchie, history of C

C was created by Dennis Ritchie of Bell Labs in 1972, while Ritchie worked with Ken Thompson on designing Unix.

C was based on Ken Thompson’s language, B.


2 Problems with Inductive Logic Programming

Two problems ILP systems have with data-mining problems:
1. They deal artificially with functions (by using modes to indicate that one argument of a predicate is the output, thus emulating a function).
2. Data-mining problems are characterized by the absence of negative evidence, and ILP systems must avoid the most general hypothesis.

Inductive Logic Programming (ILP), Link Discovery (LD), Evidence Extraction and Link Discovery (EELD)

Inductive logic programming (ILP) is a subfield of machine learning which uses logic programming as a uniform representation for examples, background knowledge and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples.

Schema: positive examples + negative examples + background knowledge => hypothesis.
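The acceptance test in that schema can be sketched in a few lines. This is only the consistency check, not an ILP learner: hypotheses are plain Java predicates rather than logic programs, there is no background knowledge, and the names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

class IlpCheck {
    // A hypothesis is acceptable if it covers every positive example
    // and none of the negative ones.
    static <T> boolean isConsistent(Predicate<T> hypothesis,
                                    List<T> positives, List<T> negatives) {
        return positives.stream().allMatch(hypothesis)
                && negatives.stream().noneMatch(hypothesis);
    }

    public static void main(String[] args) {
        List<Integer> positives = Arrays.asList(2, 4, 6);
        List<Integer> negatives = Arrays.asList(3, 5);
        Predicate<Integer> even = n -> n % 2 == 0;
        Predicate<Integer> anything = n -> true;   // the most general hypothesis

        System.out.println(isConsistent(even, positives, negatives));
        System.out.println(isConsistent(anything, positives, negatives));
        System.out.println(isConsistent(anything, positives, Collections.<Integer>emptyList()));
    }
}
```

The last call shows why missing negative evidence is a problem: with no negatives, the most general hypothesis passes the check trivially.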

Inductive logic programming is particularly useful in bioinformatics and natural language processing.

Ehud Shapiro laid the theoretical foundation for inductive logic programming[1][2] and built its first implementation (Model Inference System) in 1981:[3] a Prolog program that inductively inferred logic programs from positive and negative examples. The term Inductive Logic Programming was first introduced[4] in a paper by Stephen Muggleton in 1991.[5] The term “inductive” here refers to philosophical (i.e. suggesting a theory to explain observed facts) rather than mathematical (i.e. proving a property for all members of a well-ordered set) induction.

Link discovery (LD) is an important task in data mining for counter-terrorism and is the focus of DARPA’s Evidence Extraction and Link Discovery (EELD) research program. Link discovery concerns the identification of complex relational patterns that indicate potentially threatening activities in large amounts of relational data. Most data-mining methods assume data is in the form of a feature vector (a single relational table) and cannot handle multi-relational data.
Inductive logic programming is a form of relational data mining that discovers rules in first-order logic from multi-relational data. This paper discusses the application of ILP to learning patterns for link discovery.
Source: “Relational Data Mining with Inductive Logic Programming for Link Discovery”

Introduction to the Stack

A fundamental concept in computer science is the stack. Ironically, as coding and scripting become more and more removed from the underlying basic software and hardware operations, fewer and fewer code authors have an understanding of concepts such as this… but if nothing else you must’ve wondered where the name StackOverflow comes from, right?

The stack stores information about how a function is called, the parameters it takes, and how it should return after it is finished executing.

Some important points to keep in mind:

  • The stack is a First In, Last Out (FILO) structure
  • Arguments are pushed onto the stack for a function call and popped off the stack when the function is finished
  • The stack grows from high memory addresses to low memory addresses (on x86, for example)
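The First In, Last Out behavior is easy to see with a stack data structure. The sketch below uses Java’s ArrayDeque as a stand-in; it is not the call stack itself, and the class name is just for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackDemo {
    // Pushes every item, then pops until empty, recording the pop order.
    static String popOrder(String... items) {
        Deque<String> stack = new ArrayDeque<>();
        for (String item : items) {
            stack.push(item);
        }
        StringBuilder order = new StringBuilder();
        while (!stack.isEmpty()) {
            order.append(stack.pop());
        }
        return order.toString();
    }

    public static void main(String[] args) {
        System.out.println(popOrder("a", "b", "c"));   // prints cba
    }
}
```

The first item pushed ("a") is the last one to come back off, which is the same discipline function arguments and return information follow on the real stack.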


White & Black box Debuggers, Intelligent Debugging, and Dynamic Analysis

Debugging is a common task for data scientists, programmers, and security experts alike. In good ole RStudio we have a nice, simple built-in white-box debugger. For many analysis-oriented coders, the basic debugging functionality of an IDE like RStudio is all they know, and it may be a surprise that debugging is a bigger, much sexier, topic. Below I define and describe key topics in debugging and dynamic analysis, as well as provide links to the most cutting-edge free debuggers I use.

Dynamic Analysis: Runtime tracing of a process, usually performed using a debugger. Dynamic Analysis is critical for exploit development, fuzzer assistance, and malware inspection.

Debugger: a program that is used to test and troubleshoot other programs.

Intelligent Debugger: a scriptable debugger that supports extended features such as call hooking, e.g. Immunity Debugger and PyDbg.

White Box Debugger: Debuggers built into IDEs and other dev platforms, which enable developers to trace through source code with a high degree of control, to aid in troubleshooting functions and other code breakages.
Black Box Debugger: Used by bug hunters and reverse engineers, black box debuggers operate on compiled programs when the source code is not available and the only information is available in a disassembled format. There are two broad subclasses of black box debuggers, which are user mode (i.e. ring 3) and kernel mode (i.e. ring 0).
User-mode black box debugger: works in user mode, the processor mode under which your applications run, usually with the least amount of privilege (e.g. double-clicking PuTTY.exe launches a user-mode process).
Kernel-mode black box debugger: works in kernel mode, the processor mode where the core of the OS runs with the highest amount of privilege (e.g. capturing packets with a network adapter that is in passive mode).
User-mode debuggers commonly used among reverse engineers:
  • WinDbg by Microsoft
  • OllyDbg by Oleh Yuschuk, a F.O.S.S. debugger
  • GNU Debugger (gdb), a F.O.S.S. Linux debugger by the community